Currently supports GGUF models.
![drawing](https://user-images.githubusercontent.com/195927/271113494-1f23cbf7-5e76-445e-b231-aedd213b5712.png)
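The parameters documented below closely mirror llama.cpp's sampling options. As a rough illustration only (this uses the llama-cpp-python bindings, not necessarily this project's API; the model path is a placeholder), here is what the same knobs look like at their default values:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Model path is a placeholder.
llm = Llama(model_path="./models/your-model.gguf")

# The sampling parameters documented below, shown at their defaults.
out = llm(
    "Q: What is a GGUF file? A:",
    max_tokens=-1,          # maximum new tokens; <= 0 means generate until done
    stop=[],                # stop sequences (default: empty array)
    top_k=40,
    top_p=0.95,
    temperature=0.8,
    repeat_penalty=1.1,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    mirostat_mode=0,        # 0 = disabled
    mirostat_tau=5.0,
    mirostat_eta=0.1,
)
print(out["choices"][0]["text"])
```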
- Description: Specifies how many tokens from the initial prompt should be retained.
- Default: `0`
- Description: Maximum number of new tokens to predict.
- Default: `-1` (infinite, until completion)
- Description: A dictionary mapping specific tokens to their logit biases (see the sketch below).
- Default: `null`
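To make the bias dictionary concrete, here is a minimal NumPy sketch (an illustration of the usual technique, not this project's code) of how such a mapping is typically applied to the raw logits before sampling; the token IDs are arbitrary examples:

```python
import numpy as np

def apply_logit_bias(logits: np.ndarray, logit_bias: dict | None) -> np.ndarray:
    """Add a per-token bias to the raw logits; None (the default) means no biasing."""
    if logit_bias is None:
        return logits
    out = logits.copy()
    for token_id, bias in logit_bias.items():
        out[token_id] += bias  # positive favors the token, negative suppresses it
    return out

# Example: strongly discourage token 13 and mildly favor token 42.
logits = np.zeros(100, dtype=np.float32)
biased = apply_logit_bias(logits, {13: -100.0, 42: 2.0})
```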
- Description: Sequences where generation stops.
- Default: `[]` (empty array)
- Description: File path for saving/loading model eval state.
- Default: `""` (empty string)
- Description: Suffix to add to user inputs.
- Default: `""` (empty string)
- Description: Prefix to add to user inputs.
- Default: `""` (empty string)
- Description: Number of most probable tokens to consider for generation (see the sketch below).
- The topK parameter changes how the model selects tokens for output.
- A topK of 1 means the selected token is the most probable of all tokens in the model’s vocabulary (also called greedy decoding), while a topK of 3 means the next token is selected from the 3 most probable tokens, using the temperature.
- At each token-selection step, the topK tokens with the highest probabilities are kept; candidates are then further filtered by topP, and the final token is selected with temperature sampling.
- Default: `40`
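A minimal NumPy sketch of top-k filtering (an illustration of the standard algorithm, not this project's implementation):

```python
import numpy as np

def top_k_filter(logits: np.ndarray, k: int = 40) -> np.ndarray:
    """Mask everything but the k highest logits so only they can be sampled."""
    if k <= 0 or k >= logits.size:
        return logits                        # no filtering
    kth_largest = np.sort(logits)[-k]        # k-th largest logit
    return np.where(logits >= kth_largest, logits, -np.inf)

# k = 1 keeps only the single best token, i.e. greedy decoding.
```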
- Description: Cumulative probability mass threshold (nucleus sampling; see the sketch below).
- The topP parameter changes how the model selects tokens for output.
- Tokens are considered from most to least probable until the sum of their probabilities reaches the topP value; the remaining tokens are excluded.
- For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the topP value is 0.5, then the model will select either A or B as the next token (using the temperature) and exclude C as a candidate.
- Default: `0.95`
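A matching sketch of top-p (nucleus) filtering, again only illustrative:

```python
import numpy as np

def top_p_filter(probs: np.ndarray, top_p: float = 0.95) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    order = np.argsort(probs)[::-1]                  # most to least probable
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1  # tokens needed to reach top_p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()                 # renormalize

# With probabilities A=0.3, B=0.2, C=0.1 (rest spread elsewhere) and top_p=0.5,
# only A and B survive, matching the example above.
```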
- Description: Undocumented. Given the default of `1.0` (which disables it) and its position after topP, this likely corresponds to llama.cpp's tail-free sampling parameter (tfs_z).
- Default: `1.0`
- Description: Undocumented. Likely corresponds to llama.cpp's locally typical sampling parameter (typical_p); `1.0` disables it.
- Default: `1.0`
- Description: Controls the degree of randomness in token selection (see the sketch below).
- The temperature is used for sampling during response generation, after topK and topP filtering have been applied.
- Lower temperatures are good for prompts that require a more deterministic, less open-ended response, while higher temperatures can lead to more diverse or creative results.
- A temperature of 0 is deterministic, meaning the highest-probability token is always selected.
- Default: `0.8`
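A sketch of temperature sampling (illustrative only): the temperature rescales the logits before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it.

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float = 0.8) -> int:
    """Sample a token id; temperature rescales logits before the softmax."""
    if temperature <= 0:
        return int(np.argmax(logits))          # temperature 0: always the best token
    scaled = logits / temperature              # < 1 sharpens, > 1 flattens
    probs = np.exp(scaled - scaled.max())      # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```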
- Description: Penalty applied to repeated tokens.
- Default: `1.1`
- Description: Number of most recent tokens considered for the repetition penalty.
- Default: `64`
- Description: Coefficient for the frequency penalty, which penalizes tokens in proportion to how often they have already appeared (a combined sketch of all four penalty parameters follows below).
- Default: `0.0` (disabled)
- Description: Coefficient for the presence penalty, which penalizes tokens that have appeared at all.
- Default: `0.0` (disabled)
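The four penalty parameters above work together. The following sketch is in the style of llama.cpp's repetition-penalty sampler (an illustration under that assumption, not this project's code):

```python
import numpy as np
from collections import Counter

def apply_penalties(
    logits: np.ndarray,
    recent_tokens: list,               # the last `repeat_last_n` generated tokens
    repeat_penalty: float = 1.1,
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> np.ndarray:
    """Discourage recently seen tokens, llama.cpp-style."""
    out = logits.copy()
    for token_id, count in Counter(recent_tokens).items():
        # Repetition penalty: divide positive logits, multiply negative ones,
        # so the token always becomes less likely.
        if out[token_id] > 0:
            out[token_id] /= repeat_penalty
        else:
            out[token_id] *= repeat_penalty
        # OpenAI-style penalties: frequency scales with the count,
        # presence applies once per token that has appeared at all.
        out[token_id] -= count * frequency_penalty + presence_penalty
    return out
```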
- Description: Mirostat sampling mode, based on the paper https://arxiv.org/abs/2007.14966. Mirostat adaptively tunes sampling so the output's surprise stays near a target value (see the sketch after the Mirostat entries).
- Default: disabled
- Description: Target entropy (tau) for Mirostat.
- Default: `5.0`
- Description: Learning rate (eta) for Mirostat.
- Default: `0.1`
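A simplified Mirostat 2.0 step, following the shape of llama.cpp's implementation (a sketch under that assumption, not this project's code), showing how tau and eta interact:

```python
import numpy as np

def mirostat_v2_step(probs: np.ndarray, mu: float, tau: float = 5.0, eta: float = 0.1):
    """One Mirostat 2.0 sampling step (https://arxiv.org/abs/2007.14966).

    Tokens whose surprise (-log2 p) exceeds mu are dropped; after sampling,
    mu is nudged so the observed surprise tracks the target entropy tau.
    """
    surprise = -np.log2(np.maximum(probs, 1e-12))
    allowed = surprise < mu
    if not allowed.any():
        allowed[np.argmax(probs)] = True     # always keep at least the best token
    filtered = np.where(allowed, probs, 0.0)
    filtered /= filtered.sum()
    token = int(np.random.choice(len(probs), p=filtered))
    observed = -np.log2(filtered[token])
    mu -= eta * (observed - tau)             # learning-rate update toward tau
    return token, mu

# By convention mu starts at 2 * tau and is carried across steps.
```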
- Description: Whether newlines are treated as repeatable tokens (i.e., subject to the repetition penalty).
- Default: `true`