Both models use the same `MistralForCausalLM` architecture and appear to share some config parameters, such as `intermediate_size`, so I was wondering whether it would be possible to merge them together.
The tokenizers and vocabulary sizes are radically different between the models (assuming Mistral v0.3 7B), as is the hidden size. I would be surprised if the result were coherent.
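Sharing an architecture class does not imply matching tensor shapes. A minimal sketch of a pre-merge sanity check, comparing the config fields that determine parameter shapes (all config values below are illustrative placeholders, not taken from real model cards):

```python
# Hypothetical pre-merge compatibility check: two checkpoints can only be
# naively parameter-merged if the shape-determining config fields agree.

SHAPE_KEYS = ("hidden_size", "intermediate_size", "vocab_size",
              "num_hidden_layers", "num_attention_heads")

def shape_mismatches(cfg_a: dict, cfg_b: dict) -> list:
    """Return the shape-determining keys on which the two configs disagree."""
    return [k for k in SHAPE_KEYS if cfg_a.get(k) != cfg_b.get(k)]

# Illustrative configs: same intermediate_size, but different hidden size
# and vocabulary size, which is enough to block a direct merge.
model_a = {"hidden_size": 4096, "intermediate_size": 14336,
           "vocab_size": 32768, "num_hidden_layers": 32,
           "num_attention_heads": 32}
model_b = {"hidden_size": 3072, "intermediate_size": 14336,
           "vocab_size": 131072, "num_hidden_layers": 32,
           "num_attention_heads": 32}

mismatches = shape_mismatches(model_a, model_b)
print(mismatches)  # any differing key means the weight tensors cannot line up
```

With real checkpoints the same comparison could be run over configs loaded via `transformers.AutoConfig.from_pretrained`; a mismatch in `vocab_size` alone already means the embedding and output matrices have incompatible shapes.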