StableMax/⊥Grad compatibility #1850

HydrogenBombaklot · 2025-02-09T15:04:52Z

StableMax and ⊥Grad are new LLM training algorithms introduced in the paper "Grokking at the Edge of Numerical Stability". Here is their repo.

According to the paper, these algorithms significantly speed up generalization, which should make training and fine-tuning faster and more efficient.

Can I drop these algorithms into MLX training/fine-tuning as-is, or do they need to be integrated more formally into MLX?

angeloskath · 2025-02-14T00:16:04Z

These look cool! I think the orthograd optimizer would be more suitable as a PR to https://github.com/stockeh/mlx-optimizers. Regarding StableMax, I think it should be fairly easy to add to any project as follows:

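from functools import partial
import mlx.core as mx

@partial(mx.compile, shapeless=True)
def stablemax(x, axis=-1):
    # Map x to log(s(x)), where s(x) = x + 1 for x >= 0 and 1 / (1 - x) for x < 0
    x = mx.where(x < 0, -mx.log1p(-x), mx.log1p(x))
    return mx.softmax(x, axis=axis)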
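
For anyone wondering why a softmax over log1p values gives StableMax: as I understand the paper, StableMax is s(x_i) / sum_j s(x_j) with s(x) = x + 1 for x >= 0 and 1 / (1 - x) for x < 0, so taking softmax of log(s(x)) should give the same result. Below is a rough pure-Python sanity check of that equivalence. The helper names (`s`, `stablemax_reference`, `stablemax_logspace`) are made up for illustration and are not part of MLX or the paper's repo, and the definition of s(x) is my reading of the paper rather than something confirmed in this thread.

```python
import math

def s(x):
    # Piecewise map assumed from the paper: linear for x >= 0, 1 / (1 - x) for x < 0.
    return x + 1.0 if x >= 0 else 1.0 / (1.0 - x)

def stablemax_reference(xs):
    # Direct form: s(x_i) / sum_j s(x_j).
    vals = [s(x) for x in xs]
    total = sum(vals)
    return [v / total for v in vals]

def stablemax_logspace(xs):
    # Mirrors the MLX snippet: compute log(s(x)) with log1p, then a plain softmax.
    logs = [-math.log1p(-x) if x < 0 else math.log1p(x) for x in xs]
    m = max(logs)  # subtract the max before exponentiating, as a softmax normally does
    exps = [math.exp(v - m) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-3.0, -0.5, 0.0, 2.0, 10.0]
ref = stablemax_reference(logits)
log = stablemax_logspace(logits)

# Both forms should agree up to floating-point rounding.
assert all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(ref, log))
```

If the two forms agree on your inputs, the MLX version can be swapped in wherever softmax is applied to logits, assuming the rest of the loss is written to accept probabilities from it.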

HydrogenBombaklot · 2025-02-14T00:31:47Z

Thank you kind stranger!

angeloskath closed this as completed Feb 14, 2025
