Build GPT, from scratch

Based on the fantastic lecture series by AK: https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ

The code is slightly altered:

  1. I use @dataclass decorators, which don't play well with nn.Module subclasses out of the box, but I fix that
  2. I add DataSet and DataLoader classes, so I don't have too many global variables lying around
  3. no GPU at all (the device is CPU only): the Mac I have at the moment has no GPU, which forces me to think carefully about optimizing instead of just scaling up the net
  4. my own comments, for personal reference
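
To illustrate item 2, here is a minimal sketch of how a dataset class and a loader class can replace module-level globals. It is not the code from this repo: the names `CharDataset` and `BatchLoader`, their fields, and the character-level encoding are assumptions made for the example, and it uses plain Python lists instead of tensors so that it runs without any dependencies.

```python
import random
from dataclasses import dataclass, field


@dataclass
class CharDataset:
    """Hypothetical character-level dataset (not the repo's actual class)."""
    text: str
    block_size: int
    vocab: list = field(init=False)
    ids: list = field(init=False)

    def __post_init__(self):
        # Build the vocabulary and encode the text once, instead of
        # keeping these as global variables in the training script.
        self.vocab = sorted(set(self.text))
        stoi = {ch: i for i, ch in enumerate(self.vocab)}
        self.ids = [stoi[ch] for ch in self.text]

    def __len__(self):
        # Number of start positions that leave room for a full target window.
        return len(self.ids) - self.block_size

    def __getitem__(self, i):
        # Input is a window of block_size ids; target is the same window
        # shifted one position to the right (next-character prediction).
        x = self.ids[i : i + self.block_size]
        y = self.ids[i + 1 : i + self.block_size + 1]
        return x, y


@dataclass
class BatchLoader:
    """Hypothetical loader that samples random windows from a dataset."""
    dataset: CharDataset
    batch_size: int
    seed: int = 0

    def __post_init__(self):
        # A private random generator keeps sampling reproducible.
        self.rng = random.Random(self.seed)

    def next_batch(self):
        starts = [self.rng.randrange(len(self.dataset)) for _ in range(self.batch_size)]
        pairs = [self.dataset[i] for i in starts]
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        return xs, ys


ds = CharDataset("hello world", block_size=4)
loader = BatchLoader(ds, batch_size=2)
xb, yb = loader.next_batch()
```

With this layout the training loop only needs a loader object; the vocabulary, the encoded text, and the block size all travel with the dataset. A tensor-based version would convert `xs` and `ys` to tensors before returning them.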

files

The structure is the same as AK's:

  1. module.py has all the module definitions
  2. train.py has the data gathering, object instantiations and training loop