StableMax and ⊥Grad (OrthoGrad) are new LLM training methods introduced in the paper "Grokking at the Edge of Numerical Stability". Here is their repo.

The authors report that these methods substantially speed up generalization, which would make training and fine-tuning faster and more efficient.

Can I drop these algorithms into MLX training/fine-tuning as-is, or do they need to be integrated into MLX more formally?

These look cool! I think the ⊥Grad (OrthoGrad) optimizer would be more suitable as a PR to https://github.com/stockeh/mlx-optimizers. Regarding StableMax, I think it should be fairly easy to add to any project, for example as follows.
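Here is a minimal sketch of what that could look like with `mlx.core` (not taken from the authors' repo; `stablemax` and `stablemax_cross_entropy` are hypothetical helper names). It follows the paper's definition s(x) = x + 1 for x ≥ 0 and s(x) = 1 / (1 − x) for x < 0, normalized over the class axis:

```python
import mlx.core as mx

def _s(x):
    # StableMax replaces softmax's exp with a function that grows linearly:
    # s(x) = x + 1 for x >= 0, 1 / (1 - x) for x < 0.
    # Clamping the negative branch avoids division by zero for x >= 1,
    # since mx.where evaluates both branches.
    return mx.where(x >= 0, x + 1, 1 / (1 - mx.minimum(x, 0)))

def stablemax(logits, axis=-1):
    sx = _s(logits)
    return sx / mx.sum(sx, axis=axis, keepdims=True)

def stablemax_cross_entropy(logits, targets):
    # Drop-in replacement for softmax cross-entropy:
    # negative log of the StableMax probability of the target class.
    probs = stablemax(logits, axis=-1)
    target_probs = mx.take_along_axis(probs, mx.expand_dims(targets, -1), axis=-1)
    return -mx.log(mx.squeeze(target_probs, axis=-1))
```

Swapping `nn.losses.cross_entropy` for a function like this in the training loop should be enough to try it out; no changes to MLX itself would be required.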
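And here is a sketch of how ⊥Grad could be tried without a formal integration, by orthogonalizing each gradient with respect to its parameter before handing the gradients to any existing MLX optimizer. This is an illustrative assumption rather than the paper's code: `orthogonal_gradients` is a hypothetical helper, and the projection g − (⟨w, g⟩ / ⟨w, w⟩)·w with a norm-preserving rescale follows the paper's description of ⊥Grad:

```python
import mlx.core as mx
from mlx.utils import tree_map

def _orthogonalize(g, w, eps=1e-30):
    # Remove the component of the gradient that is parallel to the weights:
    # g_perp = g - (<w, g> / <w, w>) * w
    proj = mx.sum(w * g) / (mx.sum(w * w) + eps)
    g_perp = g - proj * w
    # Rescale so the projected gradient keeps the original gradient's norm.
    g_norm = mx.sqrt(mx.sum(g * g))
    g_perp_norm = mx.sqrt(mx.sum(g_perp * g_perp))
    return g_perp * (g_norm / (g_perp_norm + eps))

def orthogonal_gradients(grads, params):
    # Apply the projection leaf-wise over the gradient/parameter trees.
    return tree_map(_orthogonalize, grads, params)
```

In a standard MLX training loop this would sit between `nn.value_and_grad` and `optimizer.update(model, grads)`, e.g. `grads = orthogonal_gradients(grads, model.parameters())`. Packaging the same projection inside an optimizer subclass is what would make sense as a PR to mlx-optimizers.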