Add docstring docs (#413)
* Add Reference docs with Pipeline docs

* Pin numpy<2

* Add Tasks docs

* Add more Tasks docs

* Add Models docs

* Fix Models docs

* Remove AdapterModel that requires peft

* Remove NanotronLightevalModel and VLLMModel that require nanotron and vllm

* Fix markdown comment syntax

* Add Metrics docs

* Fix typo

* Remove Main classes section

* Add Datasets docs

* Create Main classes section with Pipeline

* Add EvaluationTracker docs

* Add ModelConfig docs

* Add ParallelismManager to Pipeline docs

* Add inter-links from using-the-python-api

* Fix inter-links

* Add more Metrics docs

* Comment Metrics enum

* Fix typo

* Add explanation and GH issue to comment in Metrics enum

* Add inter-link to Metrics

* Add subsection titles to LightevalTask

* Add inter-link to LightevalTaskConfig

* Add inter-link to section heading anchor

* Add more Metrics docs

* Add inter-link to SampleLevelMetric and Grouping

* Add inter-link to LightevalTaskConfig

* Fix section title with trailing colon

* Add sections to Models docs

* Move Models docs to Main classes section

* Document you can pass either model or model config to Pipeline

* Move Datasets docs to Tasks docs

* Add logging docs
albertvillanova authored Dec 3, 2024
1 parent 0c80801 commit 9bfa1ea
Showing 14 changed files with 216 additions and 14 deletions.
18 changes: 18 additions & 0 deletions docs/source/_toctree.yml
@@ -28,3 +28,21 @@
- local: available-tasks
title: Available Tasks
title: API
- sections:
- sections:
- local: package_reference/evaluation_tracker
title: EvaluationTracker
- local: package_reference/models
title: Models
- local: package_reference/model_config
title: ModelConfig
- local: package_reference/pipeline
title: Pipeline
title: Main classes
- local: package_reference/metrics
title: Metrics
- local: package_reference/tasks
title: Tasks
- local: package_reference/logging
title: Logging
title: Reference
8 changes: 5 additions & 3 deletions docs/source/adding-a-custom-task.mdx
@@ -45,8 +45,9 @@ def prompt_fn(line, task_name: str = None):
```python
)
```

Then, you need to choose a metric, you can either use an existing one (defined
in `lighteval/metrics/metrics.py`) or [create a custom one](adding-a-new-metric).
Then, you need to choose a metric: you can either use an existing one (defined
in [`lighteval.metrics.metrics.Metrics`]) or [create a custom one](adding-a-new-metric).
[//]: # (TODO: Replace lighteval.metrics.metrics.Metrics with ~metrics.metrics.Metrics once its autodoc is added)

```python
custom_metric = SampleLevelMetric(
@@ -59,7 +60,8 @@ custom_metric = SampleLevelMetric(
)
```

Then, you need to define your task. You can define a task with or without subsets.
Then, you need to define your task using [`~tasks.lighteval_task.LightevalTaskConfig`].
You can define a task with or without subsets.
To define a task with no subsets:

```python
```
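For orientation, the task definition this guide builds up to might look roughly like the sketch below; the dataset repository, splits, and suite are placeholder values, `prompt_fn` and `custom_metric` refer to the objects defined earlier in the guide, and the exact `LightevalTaskConfig` fields should be checked against the Tasks reference page added by this commit.

```python
from lighteval.tasks.lighteval_task import LightevalTaskConfig

# Rough sketch only: repository, splits and suite are placeholders, and the
# field names should be verified against the LightevalTaskConfig reference docs.
my_task = LightevalTaskConfig(
    name="mytask",
    prompt_function=prompt_fn,          # prompt formatting function from the guide
    suite=["community"],                # suite(s) the task is registered under
    hf_repo="my_org/my_dataset",        # placeholder Hugging Face dataset repo
    hf_subset="default",
    hf_avail_splits=["train", "test"],
    evaluation_splits=["test"],
    few_shots_split=None,
    few_shots_select=None,
    metric=[custom_metric],             # an existing Metrics entry or the custom metric above
)
```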
10 changes: 6 additions & 4 deletions docs/source/adding-a-new-metric.mdx
@@ -1,8 +1,8 @@
# Adding a New Metric

First, check if you can use one of the parametrized functions in
[src.lighteval.metrics.metrics_corpus]() or
[src.lighteval.metrics.metrics_sample]().
[Corpus Metrics](package_reference/metrics#corpus-metrics) or
[Sample Metrics](package_reference/metrics#sample-metrics).

If not, you can use the `custom_task` system to register your new metric:

@@ -49,7 +49,8 @@ def agg_function(items):
```python
    return score
```

Finally, you can define your metric. If it's a sample level metric, you can use the following code:
Finally, you can define your metric. If it's a sample level metric, you can use the following code
with [`~metrics.utils.metric_utils.SampleLevelMetric`]:

```python
my_custom_metric = SampleLevelMetric(
@@ -62,7 +63,8 @@ my_custom_metric = SampleLevelMetric(
)
```

If your metric defines multiple metrics per sample, you can use the following code:
If your metric defines multiple metrics per sample, you can use the following code
with [`~metrics.utils.metric_utils.SampleLevelMetricGrouping`]:

```python
custom_metric = SampleLevelMetricGrouping(
```
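Pieced together, a sample-level metric of the kind this guide describes might look roughly like the sketch below; the scoring logic is a toy placeholder, and the `MetricCategory`/`MetricUseCase` values and their import path are assumptions to verify against the new Metrics reference page.

```python
import numpy as np

from lighteval.metrics.utils.metric_utils import (  # assumed import path
    MetricCategory,
    MetricUseCase,
    SampleLevelMetric,
)


def exact_answer_score(predictions: list[str], formatted_doc, **kwargs) -> float:
    """Toy per-sample score: 1.0 if the first prediction matches the gold choice."""
    response = predictions[0].strip()
    gold = formatted_doc.choices[formatted_doc.gold_index]
    return 1.0 if response == gold else 0.0


my_custom_metric = SampleLevelMetric(
    metric_name="exact_answer",
    higher_is_better=True,
    category=MetricCategory.GENERATIVE,   # assumed category value
    use_case=MetricUseCase.ACCURACY,      # assumed use-case value
    sample_level_fn=exact_answer_score,   # computed once per sample
    corpus_level_fn=np.mean,              # aggregation over all samples
)
```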
6 changes: 3 additions & 3 deletions docs/source/contributing-to-multilingual-evaluations.mdx
@@ -51,7 +51,7 @@ Browse the list of all templates [here](https://github.com/huggingface/lighteval
Then, when ready, to define your own task, you should:
1. create a Python file as indicated in the above guide
2. import the relevant templates for your task type (XNLI, Copa, Multiple choice, Question Answering, etc)
3. define one or a list of tasks for each relevant language and evaluation formulation (for multichoice) using our parametrizable `LightevalTaskConfig` class
3. define one or a list of tasks for each relevant language and evaluation formulation (for multichoice) using our parametrizable [`~tasks.lighteval_task.LightevalTaskConfig`] class

```python
your_tasks = [
```
@@ -101,7 +101,7 @@ your_tasks = [
4. then, you can go back to the guide to test if your task is correctly implemented!

> [!TIP]
> All `LightevalTaskConfig` parameters are strongly typed, including the inputs to the template function. Make sure to take advantage of your IDE's functionality to make it easier to correctly fill these parameters.
> All [`~tasks.lighteval_task.LightevalTaskConfig`] parameters are strongly typed, including the inputs to the template function. Make sure to take advantage of your IDE's functionality to make it easier to correctly fill these parameters.


Once everything is good, open a PR, and we'll be happy to review it!
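To make the per-language pattern concrete, such a task list might be built roughly as sketched below; the prompt function, dataset repository, language codes, and metric are placeholders standing in for the template helpers and `LightevalTaskConfig` fields described above.

```python
from lighteval.metrics.metrics import Metrics
from lighteval.tasks.lighteval_task import LightevalTaskConfig


# Placeholder prompt function: in practice, import and parametrize the template
# helper matching your task type (XNLI, Copa, multiple choice, QA, ...) instead.
def my_prompt_fn(line, task_name: str = None):
    ...


your_tasks = [
    LightevalTaskConfig(
        name=f"mytask_{language}",
        prompt_function=my_prompt_fn,
        suite=["community"],
        hf_repo="my_org/my_multilingual_dataset",  # placeholder dataset repo
        hf_subset=language,
        hf_avail_splits=["test"],
        evaluation_splits=["test"],
        metric=[Metrics.loglikelihood_acc],        # assumed existing metric
    )
    for language in ["fra", "swa", "tur"]          # placeholder language codes
]
```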
2 changes: 1 addition & 1 deletion docs/source/metric-list.mdx
@@ -69,7 +69,7 @@ These metrics need the model to generate an output. They are therefore slower.
- `quasi_exact_match_gsm8k`: Fraction of instances where the normalized prediction matches the normalized gold (normalization done for gsm8k, where latex symbols, units, etc are removed)
- `maj_at_8_gsm8k`: Majority choice evaluation, using the gsm8k normalisation for the predictions and gold

## LLM-as-Judge:
## LLM-as-Judge
- `llm_judge_gpt3p5`: Can be used for any generative task, the model will be scored by a GPT3.5 model using the OpenAI API
- `llm_judge_llama_3_405b`: Can be used for any generative task, the model will be scored by a Llama 3 405B model using the HuggingFace API
- `llm_judge_multi_turn_gpt3p5`: Can be used for any generative task, the model will be scored by a GPT3.5 model using the OpenAI API. It is used for multiturn tasks like mt-bench.
3 changes: 3 additions & 0 deletions docs/source/package_reference/evaluation_tracker.mdx
@@ -0,0 +1,3 @@
# EvaluationTracker

[[autodoc]] logging.evaluation_tracker.EvaluationTracker
12 changes: 12 additions & 0 deletions docs/source/package_reference/logging.mdx
@@ -0,0 +1,12 @@
# Loggers

## GeneralConfigLogger
[[autodoc]] logging.info_loggers.GeneralConfigLogger
## DetailsLogger
[[autodoc]] logging.info_loggers.DetailsLogger
## MetricsLogger
[[autodoc]] logging.info_loggers.MetricsLogger
## VersionsLogger
[[autodoc]] logging.info_loggers.VersionsLogger
## TaskConfigLogger
[[autodoc]] logging.info_loggers.TaskConfigLogger
70 changes: 70 additions & 0 deletions docs/source/package_reference/metrics.mdx
@@ -0,0 +1,70 @@
# Metrics

## Metrics
[//]: # (TODO: aenum.Enum raises error when generating docs: not supported by inspect.signature. See: https://github.com/ethanfurman/aenum/issues/44)
[//]: # (### Metrics)
[//]: # ([[autodoc]] metrics.metrics.Metrics)
### Metric
[[autodoc]] metrics.utils.metric_utils.Metric
### CorpusLevelMetric
[[autodoc]] metrics.utils.metric_utils.CorpusLevelMetric
### SampleLevelMetric
[[autodoc]] metrics.utils.metric_utils.SampleLevelMetric
### MetricGrouping
[[autodoc]] metrics.utils.metric_utils.MetricGrouping
### CorpusLevelMetricGrouping
[[autodoc]] metrics.utils.metric_utils.CorpusLevelMetricGrouping
### SampleLevelMetricGrouping
[[autodoc]] metrics.utils.metric_utils.SampleLevelMetricGrouping

## Corpus Metrics
### CorpusLevelF1Score
[[autodoc]] metrics.metrics_corpus.CorpusLevelF1Score
### CorpusLevelPerplexityMetric
[[autodoc]] metrics.metrics_corpus.CorpusLevelPerplexityMetric
### CorpusLevelTranslationMetric
[[autodoc]] metrics.metrics_corpus.CorpusLevelTranslationMetric
### matthews_corrcoef
[[autodoc]] metrics.metrics_corpus.matthews_corrcoef

## Sample Metrics
### ExactMatches
[[autodoc]] metrics.metrics_sample.ExactMatches
### F1_score
[[autodoc]] metrics.metrics_sample.F1_score
### LoglikelihoodAcc
[[autodoc]] metrics.metrics_sample.LoglikelihoodAcc
### NormalizedMultiChoiceProbability
[[autodoc]] metrics.metrics_sample.NormalizedMultiChoiceProbability
### Probability
[[autodoc]] metrics.metrics_sample.Probability
### Recall
[[autodoc]] metrics.metrics_sample.Recall
### MRR
[[autodoc]] metrics.metrics_sample.MRR
### ROUGE
[[autodoc]] metrics.metrics_sample.ROUGE
### BertScore
[[autodoc]] metrics.metrics_sample.BertScore
### Extractiveness
[[autodoc]] metrics.metrics_sample.Extractiveness
### Faithfulness
[[autodoc]] metrics.metrics_sample.Faithfulness
### BLEURT
[[autodoc]] metrics.metrics_sample.BLEURT
### BLEU
[[autodoc]] metrics.metrics_sample.BLEU
### StringDistance
[[autodoc]] metrics.metrics_sample.StringDistance
### JudgeLLM
[[autodoc]] metrics.metrics_sample.JudgeLLM
### JudgeLLMMTBench
[[autodoc]] metrics.metrics_sample.JudgeLLMMTBench
### JudgeLLMMixEval
[[autodoc]] metrics.metrics_sample.JudgeLLMMixEval
### MajAtK
[[autodoc]] metrics.metrics_sample.MajAtK

## LLM-as-a-Judge
### JudgeLM
[[autodoc]] metrics.llm_as_judge.JudgeLM
12 changes: 12 additions & 0 deletions docs/source/package_reference/model_config.mdx
@@ -0,0 +1,12 @@
# ModelConfig

[[autodoc]] models.model_config.BaseModelConfig

[[autodoc]] models.model_config.AdapterModelConfig
[[autodoc]] models.model_config.DeltaModelConfig
[[autodoc]] models.model_config.InferenceEndpointModelConfig
[[autodoc]] models.model_config.InferenceModelConfig
[[autodoc]] models.model_config.TGIModelConfig
[[autodoc]] models.model_config.VLLMModelConfig

[[autodoc]] models.model_config.create_model_config
30 changes: 30 additions & 0 deletions docs/source/package_reference/models.mdx
@@ -0,0 +1,30 @@
# Models

## Model
### LightevalModel
[[autodoc]] models.abstract_model.LightevalModel

## Accelerate and Transformers Models
### BaseModel
[[autodoc]] models.base_model.BaseModel
[//]: # (TODO: Fix import error)
[//]: # (### AdapterModel)
[//]: # ([[autodoc]] models.adapter_model.AdapterModel)
### DeltaModel
[[autodoc]] models.delta_model.DeltaModel

## Inference Endpoints and TGI Models
### InferenceEndpointModel
[[autodoc]] models.endpoint_model.InferenceEndpointModel
### ModelClient
[[autodoc]] models.tgi_model.ModelClient

[//]: # (TODO: Fix import error)
[//]: # (## Nanotron Model)
[//]: # (### NanotronLightevalModel)
[//]: # ([[autodoc]] models.nanotron_model.NanotronLightevalModel)

[//]: # (TODO: Fix import error)
[//]: # (## VLLM Model)
[//]: # (### VLLMModel)
[//]: # ([[autodoc]] models.vllm_model.VLLMModel)
13 changes: 13 additions & 0 deletions docs/source/package_reference/pipeline.mdx
@@ -0,0 +1,13 @@
# Pipeline

## Pipeline

[[autodoc]] pipeline.Pipeline

## PipelineParameters

[[autodoc]] pipeline.PipelineParameters

## ParallelismManager

[[autodoc]] pipeline.ParallelismManager
38 changes: 38 additions & 0 deletions docs/source/package_reference/tasks.mdx
@@ -0,0 +1,38 @@
# Tasks

## LightevalTask
### LightevalTaskConfig
[[autodoc]] tasks.lighteval_task.LightevalTaskConfig
### LightevalTask
[[autodoc]] tasks.lighteval_task.LightevalTask

## PromptManager

[[autodoc]] tasks.prompt_manager.PromptManager

## Registry

[[autodoc]] tasks.registry.Registry

## Requests

[[autodoc]] tasks.requests.Request

[[autodoc]] tasks.requests.LoglikelihoodRequest

[[autodoc]] tasks.requests.LoglikelihoodSingleTokenRequest

[[autodoc]] tasks.requests.LoglikelihoodRollingRequest

[[autodoc]] tasks.requests.GreedyUntilRequest

[[autodoc]] tasks.requests.GreedyUntilMultiTurnRequest

## Datasets

[[autodoc]] data.DynamicBatchDataset
[[autodoc]] data.LoglikelihoodDataset
[[autodoc]] data.LoglikelihoodSingleTokenDataset
[[autodoc]] data.GenerativeTaskDataset
[[autodoc]] data.GenerativeTaskDatasetNanotron
[[autodoc]] data.GenDistributedSampler
7 changes: 4 additions & 3 deletions docs/source/using-the-python-api.mdx
@@ -1,8 +1,9 @@
# Using the Python API

Lighteval can be used from a custom python script. To evaluate a model you will
need to setup an `evaluation_tracker`, `pipeline_parameters`, `model_config`
and a `pipeline`.
Lighteval can be used from a custom python script. To evaluate a model you will need to set up an
[`~logging.evaluation_tracker.EvaluationTracker`], [`~pipeline.PipelineParameters`],
a [`model`](package_reference/models) or a [`model_config`](package_reference/model_config),
and a [`~pipeline.Pipeline`].

After that, simply run the pipeline and save the results.

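Concretely, that flow might look like the minimal sketch below; the class names follow the reference pages added in this commit (EvaluationTracker, PipelineParameters, ParallelismManager, Pipeline, BaseModelConfig), while the model name, task string, constructor arguments, and method names are illustrative assumptions to verify against those pages.

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.model_config import BaseModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters

# Minimal sketch: argument names and values are assumptions to check against
# the Pipeline, ModelConfig, and EvaluationTracker reference pages.
evaluation_tracker = EvaluationTracker(output_dir="./results", save_details=True)
pipeline_params = PipelineParameters(launcher_type=ParallelismManager.ACCELERATE)
model_config = BaseModelConfig(pretrained="HuggingFaceH4/zephyr-7b-beta")

pipeline = Pipeline(
    tasks="leaderboard|truthfulqa:mc|0|0",   # placeholder task specification
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,               # an instantiated model can be passed instead
)

pipeline.evaluate()
pipeline.save_and_push_results()
pipeline.show_results()
```

As the commit message notes, either a model instance or a model config can be handed to `Pipeline`.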
1 change: 1 addition & 0 deletions pyproject.toml
@@ -59,6 +59,7 @@ dependencies = [
"torch>=2.0,<2.5",
"GitPython>=3.1.41", # for logging
"datasets>=2.14.0",
"numpy<2", # pinned to avoid incompatibilities
# Prettiness
"termcolor==2.3.0",
"pytablewriter",
