AI-Hypercomputer / torchprime Public

Issues: AI-Hypercomputer/torchprime

45 Open 24 Closed

Issues list

[launcher] Specify an existing docker image URL that contains torchprime, for reproducibility. (label: good first issue)

#141 opened Mar 3, 2025 by tengyifei

[Llama 3.1 405B] Optimize device ID assignment to avoid data shuffling after DCN all-gather

#140 opened Mar 3, 2025 by tengyifei

Adopt apply_xla_patch_to_nn_linear to replace all nn.Linear ops with einsum

#139 opened Mar 3, 2025 by tengyifei

Mixtral 8x7B perf data

#137 opened Mar 1, 2025 by cloudchrischan

Llama 3.1-70B perf data

#136 opened Mar 1, 2025 by cloudchrischan

Llama 3.1-8B perf data

#135 opened Mar 1, 2025 by cloudchrischan

Clarify how to use non xpk clusters

#134 opened Feb 28, 2025 by cloudchrischan

Performance of Llama 3.1 8B matches Huggingface fork

#133 opened Feb 28, 2025 by tengyifei · milestone: Good quality Llama 3.1 8B and 70B in torch_xla_models

Add example datasets

#132 opened Feb 28, 2025 by cloudchrischan

Add a guide for how to pick the best sharding strategy

#131 opened Feb 28, 2025 by cloudchrischan

Add best practices or guides for troubleshooting the top 3 OOM errors

#130 opened Feb 28, 2025 by cloudchrischan

Address correctness guarantees

#129 opened Feb 28, 2025 by cloudchrischan

Print of co["Flops"] in train.py errors out - TypeError: string indices must be integers

#128 opened Feb 28, 2025 by brianchunkang

Document the trick of reshaping 64x4 device mesh into ring

#124 opened Feb 26, 2025 by tengyifei

Able to persist logs into GCS (label: good first issue)

#116 opened Feb 20, 2025 by tengyifei

Able to load a Huggingface dataset from a GCS cache

#115 opened Feb 20, 2025 by tengyifei

[torch_xla] Changing the sharding of model.embed_tokens.weight produces NaN gradients in Llama 3.1 405B

#114 opened Feb 18, 2025 by tengyifei

Introduce a torch and torchax roller

#103 opened Feb 10, 2025 by tengyifei

Create nightly and stable release docker images

#99 opened Feb 8, 2025 by tengyifei

[Performance Report] What metrics to report?

#96 opened Feb 7, 2025 by zpcore

pytest --fork makes test easily hang

#94 opened Feb 7, 2025 by tengyifei

[torch_xla] MVP correctness check for Llama 3.0 8B

#90 opened Feb 6, 2025 by tengyifei · milestone: Good quality Llama 3.1 8B and 70B in torch_xla_models

Bring up Stable Diffusion 2.1

#88 opened Feb 5, 2025 by tengyifei

Bring up Stable Diffusion 2.0

#87 opened Feb 5, 2025 by tengyifei

Hide ShardedModule from state dict

#82 opened Feb 5, 2025 by tengyifei
