Issues: AI-Hypercomputer/torchprime
Issues list
[launcher] Specify an existing docker image URL that contains torchprime, for reproducibility.
good first issue
Good for newcomers
#141
opened Mar 3, 2025 by
tengyifei
[Llama 3.1 405B] Optimize device ID assignment to avoid data shuffling after DCN all-gather
#140
opened Mar 3, 2025 by
tengyifei
Adopt apply_xla_patch_to_nn_linear to replace all nn.Linear ops with einsum
#139
opened Mar 3, 2025 by
tengyifei
Add best practices or guides for troubleshooting the top 3 OOM errors
#130
opened Feb 28, 2025 by
cloudchrischan
Print of co["Flops"] in train.py errors out - TypeError: string indices must be integers
#128
opened Feb 28, 2025 by
brianchunkang
Able to persist logs into GCS
good first issue
Good for newcomers
#116
opened Feb 20, 2025 by
tengyifei
[torch_xla] Changing the sharding of model.embed_tokens.weight produces NaN gradients in Llama 3.1 405B
#114
opened Feb 18, 2025 by
tengyifei