Fix bugs in et_converter #50

Merged
7 commits merged on May 9, 2024

TaekyungHeo (contributor) commented on May 8, 2024

Summary

Fix bugs in et_converter

  • Fix ruff formatting issues in et_converter/pytorch2chakra_converter.py
  • Remove UniqueIdAssigner from et_converter/pytorch2chakra_converter.py
  • Update pytorch2chakra_converter to support various versions
  • Exclude et_converter/ from Pyre checks
  • Use the GPU operator's name to identify the communication type
  • Fix the communication size calculation logic
  • Remove inter-stream dependencies
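The two communication fixes in the list above (classifying a collective by its GPU kernel name, and computing its size) can be illustrated with a minimal sketch. It is not the converter's actual code: the `CommType` enum, the keyword table, and the dtype-size table are hypothetical, and real NCCL kernel names differ across NCCL versions (older traces tend to show names like `ncclKernel_AllReduce_RING_LL_Sum_half`, newer ones names like `ncclDevKernel_AllReduce_Sum_f32_RING_LL`). The size is taken as element count times bytes per element.

```python
import math
from enum import Enum
from typing import Optional, Sequence


class CommType(Enum):
    # Hypothetical subset of collective types; not the converter's own enum.
    ALL_REDUCE = "all_reduce"
    ALL_GATHER = "all_gather"
    REDUCE_SCATTER = "reduce_scatter"
    BROADCAST = "broadcast"


# Hypothetical keywords matched against a normalized GPU kernel name.
_KERNEL_KEYWORDS = {
    "allreduce": CommType.ALL_REDUCE,
    "allgather": CommType.ALL_GATHER,
    "reducescatter": CommType.REDUCE_SCATTER,
    "broadcast": CommType.BROADCAST,
}

# Hypothetical bytes-per-element table keyed by PyTorch ET-style type strings.
_DTYPE_BYTES = {
    "double": 8,
    "float": 4,
    "c10::Half": 2,
    "c10::BFloat16": 2,
    "long int": 8,
    "int": 4,
}


def comm_type_from_kernel_name(kernel_name: str) -> Optional[CommType]:
    """Classify a GPU kernel by its name; return None for non-NCCL kernels."""
    # Lowercase and drop underscores so "AllReduce" and "all_reduce" both match.
    normalized = kernel_name.lower().replace("_", "")
    if "nccl" not in normalized:
        return None
    for keyword, comm_type in _KERNEL_KEYWORDS.items():
        if keyword in normalized:
            return comm_type
    return None


def comm_size_bytes(shape: Sequence[int], dtype: str) -> int:
    """Message size in bytes: number of elements times bytes per element."""
    return math.prod(shape) * _DTYPE_BYTES[dtype]
```

Requiring the `nccl` prefix before keyword matching is a design choice of this sketch, meant to keep compute kernels whose names happen to contain words like "broadcast" from being misclassified.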

Test Plan

1. Run trace_link

chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_0.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_0.json --output-file ~/megatron_0.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_1.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_1.json --output-file ~/megatron_1.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_2.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_2.json --output-file ~/megatron_2.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_3.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_3.json --output-file ~/megatron_3.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_4.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_4.json --output-file ~/megatron_4.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_5.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_5.json --output-file ~/megatron_5.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_6.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_6.json --output-file ~/megatron_6.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_7.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_7.json --output-file ~/megatron_7.json &
wait

2. Run et_converter

chakra_converter --input_filename ~/megatron_0.json --output_filename megatron_0.chakra --input_type PyTorch > /tmp/rank_0 &
chakra_converter --input_filename ~/megatron_1.json --output_filename megatron_1.chakra --input_type PyTorch > /tmp/rank_1 &
chakra_converter --input_filename ~/megatron_2.json --output_filename megatron_2.chakra --input_type PyTorch > /tmp/rank_2 &
chakra_converter --input_filename ~/megatron_3.json --output_filename megatron_3.chakra --input_type PyTorch > /tmp/rank_3 &
chakra_converter --input_filename ~/megatron_4.json --output_filename megatron_4.chakra --input_type PyTorch > /tmp/rank_4 &
chakra_converter --input_filename ~/megatron_5.json --output_filename megatron_5.chakra --input_type PyTorch > /tmp/rank_5 &
chakra_converter --input_filename ~/megatron_6.json --output_filename megatron_6.chakra --input_type PyTorch > /tmp/rank_6 &
chakra_converter --input_filename ~/megatron_7.json --output_filename megatron_7.chakra --input_type PyTorch > /tmp/rank_7 &
wait
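The two test-plan steps above can also be scripted per rank, so that each `chakra_converter` run starts only after the `chakra_trace_link` output it reads exists. This is a minimal sketch that uses only the flags shown in the commands above; the helper names, the directory parameters, and the thread-pool parallelism are assumptions for illustration, not part of Chakra.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def trace_link_cmd(rank: int, trace_dir: Path, out_dir: Path) -> List[str]:
    """Build the step-1 command for one rank (flags copied from the test plan)."""
    return [
        "chakra_trace_link",
        "--pytorch-et-file", str(trace_dir / f"megatron_et_{rank}.json"),
        "--kineto-file", str(trace_dir / f"megatron_kineto_{rank}.json"),
        "--output-file", str(out_dir / f"megatron_{rank}.json"),
    ]


def converter_cmd(rank: int, out_dir: Path) -> List[str]:
    """Build the step-2 command for one rank; it reads step 1's output file."""
    return [
        "chakra_converter",
        "--input_filename", str(out_dir / f"megatron_{rank}.json"),
        "--output_filename", f"megatron_{rank}.chakra",
        "--input_type", "PyTorch",
    ]


def process_rank(rank: int, trace_dir: Path, out_dir: Path) -> None:
    """Run both steps for one rank in order; raise if either command fails."""
    subprocess.run(trace_link_cmd(rank, trace_dir, out_dir), check=True)
    # Mirror the test plan's "> /tmp/rank_N" redirection of converter stdout.
    with open(f"/tmp/rank_{rank}", "w") as log:
        subprocess.run(converter_cmd(rank, out_dir), stdout=log, check=True)


def run_all(trace_dir: Path, out_dir: Path, num_ranks: int = 8) -> None:
    """Process all ranks in parallel, like the backgrounded (&) shell commands."""
    with ThreadPoolExecutor(max_workers=num_ranks) as pool:
        futures = [pool.submit(process_rank, r, trace_dir, out_dir) for r in range(num_ranks)]
        for future in futures:
            future.result()  # re-raise any CalledProcessError from a worker
```

For example, `run_all(Path("/Users/theo/Downloads/llama_pytorch24.05"), Path.home())` would repeat the eight-rank run, with `Path.home()` standing in for `~`. Running the two steps back to back inside each worker keeps ranks independent while preserving the step-1-before-step-2 order per rank.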

3. Results
Screenshot 2024-05-08 at 7:42:27 PM (image not reproduced)

@TaekyungHeo requested a review from a team as a code owner on May 8, 2024 23:55

github-actions bot commented May 8, 2024

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@TaekyungHeo force-pushed the et-converter-bugfix branch 2 times, most recently from 319e63d to af4e466 on May 9, 2024 00:09
@TaekyungHeo changed the title from "Bugfix for et_converter" to "Fix bugs in et_converter" on May 9, 2024
@TaekyungHeo force-pushed the et-converter-bugfix branch from af4e466 to 0d2c3c4 on May 9, 2024 00:12
Inter-stream dependencies cause bugs by introducing false dependencies,
especially when a single CPU operator launches multiple GPU operators.
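The false-dependency problem described in this commit message can be shown with a toy model. This is a minimal sketch, not the converter's code: `GpuOp`, the stream IDs, the kernel names, and both helper functions are hypothetical. Absent events or explicit synchronization, operations on different CUDA streams are not ordered relative to each other, so chaining every GPU operator to the previously issued one (on any stream) adds edges that nothing actually enforces; keeping only same-stream edges avoids them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class GpuOp:
    # Hypothetical minimal record for a GPU operator taken from a trace.
    op_id: int
    stream: int
    name: str


def issue_order_deps(ops: List[GpuOp]) -> Dict[int, Optional[int]]:
    """Naive: each op depends on the previously issued op, on any stream."""
    deps: Dict[int, Optional[int]] = {}
    prev: Optional[int] = None
    for op in ops:  # ops are assumed to be in issue order
        deps[op.op_id] = prev
        prev = op.op_id
    return deps


def intra_stream_deps(ops: List[GpuOp]) -> Dict[int, Optional[int]]:
    """Each op depends only on the previous op issued to the same stream."""
    deps: Dict[int, Optional[int]] = {}
    last_on_stream: Dict[int, int] = {}
    for op in ops:  # ops are assumed to be in issue order
        deps[op.op_id] = last_on_stream.get(op.stream)
        last_on_stream[op.stream] = op.op_id
    return deps


# One CPU operator launches op 1 on stream 7 and op 2 on stream 20;
# op 3 is then issued on stream 7.
ops = [
    GpuOp(1, 7, "kernel_a"),
    GpuOp(2, 20, "ncclDevKernel_AllReduce_Sum_f32_RING_LL"),
    GpuOp(3, 7, "kernel_b"),
]
print(issue_order_deps(ops))   # {1: None, 2: 1, 3: 2}
print(intra_stream_deps(ops))  # {1: None, 2: None, 3: 1}
```

In the naive version, op 3 waits on the all-reduce (op 2) only because it was issued later; the per-stream version keeps just the edge from op 1 to op 3.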
@TaekyungHeo force-pushed the et-converter-bugfix branch from 0d2c3c4 to c06a14e on May 9, 2024 00:14
@srinivas212 merged commit f316efc into main on May 9, 2024
7 checks passed
@github-actions bot locked and limited the conversation to collaborators on May 9, 2024
@TaekyungHeo deleted the et-converter-bugfix branch on May 9, 2024 18:28