Both models use the same `MistralForCausalLM` architecture and appear to share some config parameters, such as `intermediate_size`, so I was wondering whether it would be possible to merge them together.
The tokenizers and vocabulary sizes are radically different between the models (assuming Mistral v0.3 7B), as is the hidden size. I would be surprised if the result were coherent.
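Sharing an architecture class does not imply matching tensor shapes. A minimal sketch of a pre-merge sanity check, comparing the config fields that determine parameter shapes (all config values below are illustrative placeholders, not taken from real model cards):

```python
# Hypothetical pre-merge compatibility check: two checkpoints can only be
# naively parameter-merged if the shape-determining config fields agree.

SHAPE_KEYS = ("hidden_size", "intermediate_size", "vocab_size",
              "num_hidden_layers", "num_attention_heads")

def shape_mismatches(cfg_a: dict, cfg_b: dict) -> list:
    """Return the shape-determining keys on which the two configs disagree."""
    return [k for k in SHAPE_KEYS if cfg_a.get(k) != cfg_b.get(k)]

# Illustrative configs: same intermediate_size, but different hidden size
# and vocabulary size, which is enough to block a direct merge.
model_a = {"hidden_size": 4096, "intermediate_size": 14336,
           "vocab_size": 32768, "num_hidden_layers": 32,
           "num_attention_heads": 32}
model_b = {"hidden_size": 3072, "intermediate_size": 14336,
           "vocab_size": 131072, "num_hidden_layers": 32,
           "num_attention_heads": 32}

mismatches = shape_mismatches(model_a, model_b)
print(mismatches)  # any differing key means the weight tensors cannot line up
```

With real checkpoints the same comparison could be run over configs loaded via `transformers.AutoConfig.from_pretrained`; a mismatch in `vocab_size` alone already means the embedding and output matrices have incompatible shapes.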