Adjust vocab size calculation of DPO model to be dynamic #141

p-ferreira · 2023-08-28T13:44:48Z

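The current DPO model returns a hardcoded value of -11 when a completion is empty or too short. This is a log-probability: exp(-11) is about 1.67e-5, and -11 is the nearest integer below the log of the uniform probability across all logits, log(1/50257), which is about -10.82. See #138 for reference.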

Relevant snippet from openvalidators/reward/dpo.py:

...
        # Check if completion is
        if completion.strip() == '' or len(completion) <= 5:
            return -11 # exp(-11)=1.67e-5 < 2e-5=1/50257 (typical vocab size)
...

The vocab size of 50257 is assumed to be typical, but it can differ for other models and tokenizers.
Ideally, this value would be calculated automatically from something like 1 / model.vocab_size, rather than hard-coded.
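
A minimal sketch of what a dynamic version might look like, assuming the vocab size is available as an integer at the point where the reward is computed. The helper names `uniform_logprob_floor` and `short_completion_penalty` are hypothetical and not part of openvalidators. With Hugging Face models the size is commonly exposed as `model.config.vocab_size` or `tokenizer.vocab_size`, but where it comes from in dpo.py is an assumption here.

```python
import math
from typing import Optional


def uniform_logprob_floor(vocab_size: int) -> int:
    """Return the nearest integer at or below log(1 / vocab_size).

    For vocab_size = 50257 this reproduces the currently hardcoded -11.
    (Hypothetical helper, not part of openvalidators.)
    """
    if vocab_size < 1:
        raise ValueError("vocab_size must be a positive integer")
    # log(1 / V) == -log(V); floor rounds toward negative infinity.
    return math.floor(-math.log(vocab_size))


def short_completion_penalty(completion: str, vocab_size: int) -> Optional[int]:
    """Mirror the existing check, with the penalty derived from vocab_size.

    Returns the penalty for empty or very short completions, otherwise None
    so the caller can continue with the normal reward computation.
    (Hypothetical helper, not part of openvalidators.)
    """
    if completion.strip() == '' or len(completion) <= 5:
        return uniform_logprob_floor(vocab_size)
    return None


# Example: the GPT-2 sized vocab from the issue gives the same value as today.
penalty = short_completion_penalty("", 50257)
```

With a larger vocabulary the penalty becomes more negative, which keeps it below the uniform log-probability for that tokenizer instead of relying on the 50257 assumption.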
