Exllama2 Entropy Sampling implementation for Text-Generation-WebUI
This is kind of a hackjob, but it works.
Replace the sampler.py in /text-generation-webui-main/installer_files/env/Lib/site-packages/exllamav2/generator with this one, and it should work.
- Set your temperature to 1.84; this value overrides the normal temperature and enables the Dynamic Temp sampling.
- You can edit the EntropyTemp.txt file that gets created in your /text-generation-webui-main/ folder to change the minimum and maximum temperature. By default, these are 0.0 (minimum) and 2.0 (maximum).
Entropy sampling works by measuring the amount of randomness (entropy) in the token probabilities. When there is high certainty in the predictions, the sampler uses a value closer to your minimum temperature; the more good choices there are, the more random or 'creative' it gets. Lowering Top P or Top K feeds into this, since narrowing the pool down to the more probable choices changes the distribution the entropy is measured on, which causes the temperature to scale higher.
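To illustrate the idea, here's a minimal sketch of this kind of entropy-to-temperature mapping. This is not the actual sampler.py code: the function name, the normalization step, and the way min_temp/max_temp are passed in are all my own assumptions.

```python
import torch

def entropy_scaled_temperature(probs: torch.Tensor,
                               min_temp: float = 0.0,
                               max_temp: float = 2.0) -> float:
    # Shannon entropy of the candidate distribution; eps guards against log(0)
    eps = 1e-10
    entropy = -torch.sum(probs * torch.log(probs + eps))
    # Normalize against the maximum possible entropy (a uniform distribution)
    # so the result lands in [0, 1] however many candidates remain. This
    # normalization is an assumption, not necessarily what sampler.py does.
    max_entropy = torch.log(torch.tensor(float(probs.numel())))
    normalized = float(entropy / max_entropy) if probs.numel() > 1 else 0.0
    # Confident predictions (low entropy) stay near min_temp; uncertain
    # predictions (high entropy) scale toward max_temp
    return min_temp + normalized * (max_temp - min_temp)

# Example: a confident distribution vs. a flat one
confident = torch.tensor([0.97, 0.01, 0.01, 0.01])
flat = torch.tensor([0.25, 0.25, 0.25, 0.25])
print(entropy_scaled_temperature(confident))  # close to min_temp
print(entropy_scaled_temperature(flat))       # close to max_temp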
EDIT: Reuploaded to fix a 0-temperature issue specific to ExLlama, which doesn't handle it gracefully like kobold does.
EDIT EDIT: Reuploaded again to fix a random typo. Oops.
ADDITIONAL NOTE:
- sampler_ALT is an alternative version; you will need to replace 'sampler.py' with it in the same way. The difference is that it measures the raw probability distribution before the other samplers modify it, which may be more consistent across different sampler settings (e.g. Top P 0.90, etc.). Your mileage may vary; this is untested, just a hunch that measuring consistently might improve results across settings. Keep in mind, though, that a 2.0 max temp might be too low for this version, considering how high the raw entropy can (theoretically) scale. See the sketch after this list for where the measurement differs.
- EDIT: I'm being told that sampler_ALT trends a lot more deterministic, but it still seems to introduce a good amount of variation into heavily quantized (think 2.5bpw) 70b models. YMMV though!
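For the curious, here's a rough sketch of the difference between the two versions, namely where the entropy gets measured. None of this is the actual code; top_p_truncate and shannon_entropy are made-up helpers, and 32000 is just a stand-in vocab size.

```python
import torch

def top_p_truncate(probs: torch.Tensor, top_p: float) -> torch.Tensor:
    # Standard nucleus (Top P) truncation: keep the smallest set of tokens
    # whose cumulative probability reaches top_p, then renormalize
    sorted_probs, _ = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = cumulative - sorted_probs < top_p
    kept = sorted_probs[keep]
    return kept / kept.sum()

def shannon_entropy(probs: torch.Tensor) -> float:
    eps = 1e-10
    return float(-torch.sum(probs * torch.log(probs + eps)))

logits = torch.randn(32000)                 # stand-in vocab-sized logits
raw = torch.softmax(logits, dim=-1)
truncated = top_p_truncate(raw, top_p=0.90)

# sampler.py-style: entropy measured after Top P has narrowed the pool,
# so changing Top P changes the measurement
print(shannon_entropy(truncated))
# sampler_ALT-style: entropy measured on the raw distribution, so sampler
# settings don't move it; note it can run far above 2.0 here
# (up to ln(32000), about 10.4, for a uniform distribution)
print(shannon_entropy(raw))
```

The trade-off is that the ALT measurement is insulated from your Top P / Top K settings, but the raw entropy covers a much wider range, which is why the 2.0 max temp caveat above applies.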