Inspired by Andrej Karpathy's https://github.com/karpathy/build-nanogpt
- fineweb.py: Builds the FineWeb training data.
- hellaswag.py: Evaluates a model on the HellaSwag dataset.
- train_gpt2.py: Trains a nano GPT-2 model from scratch.
- train_llama.py: Trains a nano LLaMA-3 model from scratch.
- train_mixtral.py: Trains a nano Mixtral 8x model from scratch.
- train_deepseek.py: Trains a nano DeepSeek MoE model from scratch.
- play_with_llama.ipynb: Compares the training results.
- mixtral_result.ipynb: Compares the MoE training results.
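
For readers unfamiliar with how a HellaSwag-style evaluation is usually scored, here is a minimal sketch. It assumes the model has already produced a per-token loss for each candidate ending, and it picks the ending with the lowest mean loss. The helper names `pick_ending` and `accuracy` are hypothetical and are not taken from hellaswag.py, which may score examples differently.

```python
from typing import List, Sequence


def pick_ending(token_losses: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate ending with the lowest mean token loss.

    token_losses[i] holds the per-token losses of candidate ending i
    (assumed to be computed elsewhere by the language model).
    """
    if not token_losses:
        raise ValueError("need at least one candidate ending")
    mean_losses: List[float] = []
    for losses in token_losses:
        if not losses:
            raise ValueError("each candidate needs at least one token loss")
        # Averaging normalizes for ending length, so long endings are not penalized.
        mean_losses.append(sum(losses) / len(losses))
    # Index of the smallest mean loss; ties resolve to the earliest candidate.
    return min(range(len(mean_losses)), key=mean_losses.__getitem__)


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of examples where the predicted ending matches the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


if __name__ == "__main__":
    # Made-up losses for one example with four candidate endings.
    example = [[2.0, 3.0], [1.0, 1.5], [4.0], [2.5, 2.5]]
    print(pick_ending(example))
```

Averaging over tokens instead of summing is a design choice: it keeps longer endings from being disadvantaged just because they contain more tokens.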