My collection of paper implementations and experiments. Built to be modular, easy to extend, and easy to experiment with.
- Multi-head Latent Attention
- Differential Transformer
- Rotary Embeddings
- Attention with Linear Biases
- Llama
- GPT
- TODO: swap the config structure to importing modules instead
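
To give a feel for one item in the list above, here is a minimal pure-Python sketch of Attention with Linear Biases (ALiBi). It is not the code from this repo; the function names `alibi_slopes` and `alibi_bias` are made up for illustration, and it assumes a power-of-two head count and causal attention. Each head gets a slope from a geometric sequence, and the bias for a query at position `i` attending to a key at position `j` is the negative slope times the distance `i - j`.

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n = num_heads."""
    # Assumption: only power-of-two head counts are handled in this sketch.
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Build bias[h][i][j] = -slope_h * (i - j) for j <= i, and -inf for j > i."""
    neg_inf = float("-inf")  # future positions are masked out (causal attention)
    return [
        [
            [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for slope in alibi_slopes(num_heads)
    ]


if __name__ == "__main__":
    # Print the bias rows for head 0 of a 2-head, length-4 example.
    for row in alibi_bias(num_heads=2, seq_len=4)[0]:
        print(row)
```

The resulting bias is added to the raw attention scores before the softmax, so more distant keys are penalised linearly and no positional embedding is needed on the inputs.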