This repository contains tools used during Open-Assistant development to measure the quality/capabilities of OA models and potential candidate base models for fine-tuning. We also collect results like sampling reports here.
- Make sure Python 3.10 is installed.
- Create a virtual environment: `python3.10 -m venv .venv`
- Activate the venv: `source .venv/bin/activate`
- Install the dependencies by executing `pip install -r requirements.txt` in the root directory of this repository.
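If you want to double-check the environment before installing dependencies, a quick sanity check (hypothetical, not part of this repo) run from inside the activated venv could look like this:

```python
# Optional sanity check: confirm the interpreter inside the venv is Python 3.10
# before installing the dependencies.
import sys

assert sys.version_info[:2] == (3, 10), f"expected Python 3.10, got {sys.version}"
print("Python version OK:", sys.version)
```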
You can use the script `model_eval/manual/sampling_report.py` to generate continuations for a number of prompts specified in a JSONL file, with sampling parameters defined in a configuration file. You can find 100 random prompts from the Open-Assistant prompt database in the file `data/en_100_text.jsonl` and simple sampling configurations in the `config/` directory.
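If you want to sample on your own prompts, you can create a similar JSONL file yourself. The sketch below assumes the prompts file contains one JSON-encoded prompt string per line; check `data/en_100_text.jsonl` in this repo for the authoritative format, and note that the file name used here is made up for illustration.

```python
# Hypothetical helper for writing a custom prompts file.
# Assumption: one JSON-encoded prompt string per line (JSONL).
import json

prompts = [
    "Explain the difference between a list and a tuple in Python.",
    "Write a short poem about open-source software.",
]

with open("my_prompts.jsonl", "w", encoding="utf-8") as f:
    for prompt in prompts:
        f.write(json.dumps(prompt) + "\n")
```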
Example command to run `sampling_report.py` with `facebook/galactica-125m` (a small model good for testing):

```
python sampling_report.py --model-name facebook/galactica-125m --config config/default.json --prompts data/en_100_text.jsonl --report report_file.json --verbose --num-samples 2 --half
```
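Once the run has finished, you can take a quick look at the generated report from Python. This sketch makes no assumptions about the report's key names; open `report_file.json` to see the actual layout produced by `sampling_report.py`.

```python
# Rough sketch for inspecting a generated report file.
import json

with open("report_file.json", encoding="utf-8") as f:
    report = json.load(f)

print("top-level keys:", sorted(report.keys()))
print(json.dumps(report, indent=2)[:2000])  # short excerpt of the nested content
```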
```
$ python sampling_report.py --help
Using pytorch version 1.13.1+cu117
usage: sampling_report.py [-h] [--device DEVICE] [--device-index DEVICE_INDEX] [--model-name MODEL_NAME] [--mode MODE] [--prompts PROMPTS] [--report REPORT] [--seed SEED] [--verbose] [-n N]
                          [--num-samples NUM_SAMPLES] [--config CONFIG] [--half] [--skip-special-tokens] [--model-type MODEL_TYPE] [--max-input-len MAX_INPUT_LEN]

options:
  -h, --help            show this help message and exit
  --device DEVICE       device to use
  --device-index DEVICE_INDEX
                        device index
  --model-name MODEL_NAME
  --mode MODE           legacy, v2
  --prompts PROMPTS     jsonl string prompts input file name
  --report REPORT       json sampling report output file name
  --seed SEED           pseudo random number generator seed
  --verbose
  -n N                  number of prompts to use (default: all)
  --num-samples NUM_SAMPLES
                        number of sampling runs per configuration
  --config CONFIG       configuration file path
  --half                use float16
  --skip-special-tokens
  --model-type MODEL_TYPE
                        CausalLM, T5Conditional
  --max-input-len MAX_INPUT_LEN
                        max token counts for input
```
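If you want to drive the script from another Python program, for example to sweep over several candidate models, a `subprocess` call mirroring the example command above could look like the following. The flags are the ones shown in the `--help` output; the report file names are made up for illustration.

```python
# Sketch: run sampling_report.py from Python for several models.
import subprocess

models = ["facebook/galactica-125m"]  # extend with other candidate models

for model in models:
    report = f"report_{model.replace('/', '_')}.json"
    subprocess.run(
        [
            "python", "sampling_report.py",
            "--model-name", model,
            "--config", "config/default.json",
            "--prompts", "data/en_100_text.jsonl",
            "--report", report,
            "--num-samples", "2",
            "--half",
        ],
        check=True,  # raise if the sampling run fails
    )
```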
Once the report file has been generated, you can use the Model Output Comparer to compare the sampling results with those of other models and sampling configurations.
Use the Model Output Comparer to compare sampling results of different models and standard Hugging Face Transformers sampling configurations.

You can load JSON report files either by specifying their URLs or by clicking into the file drop-zone and selecting a file in the browser's file selector. As the name implies, you can also drag & drop files onto the drop-zone. For comparison, you can select models from the `sampling_reports` folder.

Here are some example URLs you can copy & paste into the input box directly below the Model Output Comparer title:
https://raw.githubusercontent.com/Open-Assistant/oasst-model-eval/main/sampling_reports/pythia/2023-03-01_theblackcat102_pythia-3b-deduped-sft_sampling_default.json
https://raw.githubusercontent.com/Open-Assistant/oasst-model-eval/main/sampling_reports/chip2_7b_instruct_alpha/2023-03-02_chip2_7b_instruct_alpha_sampling_default.json
https://raw.githubusercontent.com/Open-Assistant/oasst-model-eval/main/sampling_reports/bloomz-7b1-mt/2023-03-02_bigscience_bloomz-7b1-mt_sampling_default.json
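If you prefer to inspect a published report outside the browser, a small script can fetch one of the URLs above and show its top-level structure. This uses the third-party `requests` package, which is not part of this repo's tooling.

```python
# Hypothetical peek at a published sampling report.
import json
import requests

URL = (
    "https://raw.githubusercontent.com/Open-Assistant/oasst-model-eval/main/"
    "sampling_reports/pythia/2023-03-01_theblackcat102_pythia-3b-deduped-sft_sampling_default.json"
)

report = requests.get(URL, timeout=30).json()
print("top-level keys:", sorted(report.keys()))
print(json.dumps(report, indent=2)[:1000])  # short excerpt of the nested content
```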
To run a front-end tool to compare model outputs:

```
cd model_comparer
npm start
```

See `model_comparer/README.md` for more information.
To deploy to GitHub Pages:

```
cd model_comparer
npm install
npm run deploy
```