Mixture-of-Experts-Sentence-Similarity

This repository is the official code base for the paper "Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings".

Logan Hallee, Rohan Kapur, Arjun Patel, Jason P. Gleghorn, and Bohdan Khomtchouk

Preprint: Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings

Peer review: preparing submission

Data and models

Huggingface

Main findings

Extending BERT models with N experts copied from their MLP section is highly effective for fine-tuning on downstream tasks, including multitask or multidomain data.
N experts are almost as effective as N individual models trained on N domains for sentence similarity tasks. We have tested up to five experts.
Enforced routing of experts can be handled with added special tokens for sentence-wise routing or with designed token type IDs for token-wise routing, even when the router is a single linear layer. Enforced routing can also be accomplished by passing a list of desired expert indices. A mutual-information-based loss with top-k outputs also works well to correlate expert activation with specific types of data.
Cocitation networks are highly effective for gathering similar niche papers.
Using dot product with a learned temperature may be a more effective contrastive loss than standard Multiple Negatives Ranking loss.
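As an illustration of the last finding, here is a minimal, framework-free sketch of an in-batch contrastive loss computed on raw dot products and scaled by a temperature term. It is not the repository's implementation: the function names and the `log_temp` parameter are hypothetical, and treating the temperature as a multiplicative scale stored in log space is an assumption. In real training `log_temp` would be a trainable parameter updated by the optimizer; here it is a plain float.

```python
import math

def dot(u, v):
    """Plain dot product; embeddings are NOT length-normalized."""
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(anchors, positives, log_temp):
    """In-batch contrastive loss on dot products scaled by exp(log_temp).

    Row i of `anchors` is paired with row i of `positives`; every other
    row of `positives` in the batch acts as a negative for anchor i.
    """
    scale = math.exp(log_temp)  # log-space parameter keeps the scale positive
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * dot(a, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # cross-entropy with the diagonal as target
    return total / len(anchors)

# Toy batch of two pairs (made-up numbers).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.2, 0.8]]
loss_small_scale = contrastive_loss(anchors, positives, log_temp=0.0)
loss_large_scale = contrastive_loss(anchors, positives, log_temp=math.log(10.0))
```

On this toy batch the correct pairs already have the largest dot products, so a larger scale sharpens the softmax and lowers the loss. A learned scale lets the optimizer pick that sharpness instead of fixing it by hand.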

Surprising findings

Small BERT models are not more effective with N experts, perhaps because their shared attention layers are small. Our data suggest that the threshold is roughly 100 million parameters.
An MoE extension with a single MoE layer achieves 85% of what the full MoE extension achieves, on average.
Token-wise routing is better than sentence-wise routing in general, even for sentence-wise tasks.
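The routing variants discussed in these findings can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the repository's code: the helper names (`linear_router`, `top_k`, `route_tokens`, `route_sentence`) and the `enforced` argument are hypothetical, and the sentence-wise variant assumes the decision is read from the first token, where an added special token could sit.

```python
def linear_router(token_vec, weights):
    """Single linear layer: one score per expert (one weight row per expert)."""
    return [sum(w * x for w, x in zip(row, token_vec)) for row in weights]

def top_k(scores, k):
    """Indices of the k highest-scoring experts, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def route_tokens(tokens, weights, k=1, enforced=None):
    """Token-wise routing: every token picks its own experts.

    If `enforced` is given (one expert index per token, playing the role
    of designed token type IDs), the router is bypassed.
    """
    if enforced is not None:
        return [[e] for e in enforced]
    return [top_k(linear_router(t, weights), k) for t in tokens]

def route_sentence(tokens, weights, k=1, enforced=None):
    """Sentence-wise routing: one decision shared by all tokens.

    Assumption: the decision is made from the first token only.
    """
    if enforced is not None:
        choice = [enforced]
    else:
        choice = top_k(linear_router(tokens[0], weights), k)
    return [list(choice) for _ in tokens]

# Toy example: three 2-d token vectors, two experts (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.9]]
weights = [[1.0, 0.0], [0.0, 1.0]]
per_token = route_tokens(tokens, weights)
per_sentence = route_sentence(tokens, weights)
forced_tokens = route_tokens(tokens, weights, enforced=[1, 1, 0])
forced_sentence = route_sentence(tokens, weights, enforced=1)
```

Token-wise routing lets different tokens of the same sentence reach different experts, while sentence-wise routing commits the whole sequence to one choice. Enforced routing replaces the router's decision with the indices supplied by the caller.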

Applications of this work

Better vector databases / retrieval augmentation
Extending any sufficiently large BERT model with N experts for N tasks.
Vocabulary extension of BERT models with N experts for N vocabularies.
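The extension these applications rely on, N experts initialized as copies of a pretrained MLP section, can be sketched as follows. This is a minimal stand-in rather than the repository's code: the `MLP` class and `make_experts` helper are hypothetical, and a ReLU replaces BERT's GELU activation for brevity.

```python
import copy

class MLP:
    """Stand-in for the feed-forward (MLP) section of a transformer block."""

    def __init__(self, w_in, w_out):
        self.w_in = w_in    # hidden_dim rows, each of length model_dim
        self.w_out = w_out  # model_dim rows, each of length hidden_dim

    def __call__(self, x):
        # ReLU used here for brevity (assumption; BERT uses GELU).
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.w_in]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]

def make_experts(pretrained_mlp, num_experts):
    """Create N independent experts, each starting as a copy of the pretrained MLP."""
    return [copy.deepcopy(pretrained_mlp) for _ in range(num_experts)]

# Toy pretrained MLP with made-up weights.
mlp = MLP(w_in=[[1.0, -1.0], [0.5, 0.5]], w_out=[[1.0, 0.0], [0.0, 2.0]])
experts = make_experts(mlp, 3)

x = [2.0, 1.0]
before = [e(x) for e in experts]   # all experts start out identical
experts[0].w_out[0][0] = 0.0       # simulate a fine-tuning update to one expert
after = [e(x) for e in experts]    # only expert 0 has changed
```

Because each expert is a deep copy, every expert begins from the pretrained behavior and then diverges independently during fine-tuning, while the original weights stay untouched.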

Docs

Coming soon

Please cite

@article{hallee2024contrastive,
      title={Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings}, 
      author={Logan Hallee and Rohan Kapur and Arjun Patel and Jason P. Gleghorn and Bohdan Khomtchouk},
      year={2024},
      eprint={2401.15713},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

and upvote on Huggingface!

