Chakra's Kineto trace adds metadata to the NCCL kernels, including message size and process group (PG) attributes (name, description, ranks, etc.).
However, in a trace I just recorded, some kernels don't include any metadata.
The trace records the `gpt3/175b_fp8` training script from NeMo Launcher.

Here is the report of a normal all_reduce operation, where we can see the PG info:
![image](https://private-user-images.githubusercontent.com/33977996/400770636-d44ab1b9-522e-4f92-852a-3359259ac574.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMDA0NTgsIm5iZiI6MTczOTAwMDE1OCwicGF0aCI6Ii8zMzk3Nzk5Ni80MDA3NzA2MzYtZDQ0YWIxYjktNTIyZS00ZjkyLTg1MmEtMzM1OTI1OWFjNTc0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDA3MzU1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTM0NWU4MzQyYTg0NDhmNWEzNTgyNjNhN2RjNzJiN2VjYTg0YzhmZDEyYWUwZmI5ZGRjM2Q3NTlhYzM3ZmUyZWMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.l96Yfb_DkxDxrT3-SimqFbt3JiQrJTBPpnICM9uml5A)
By contrast, here is the report of the NCCL coalesced allgather, which displays no metadata:
![image](https://private-user-images.githubusercontent.com/33977996/400770808-f1d71f43-04e0-444d-8bcb-0b1d7a2f0c45.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMDA0NTgsIm5iZiI6MTczOTAwMDE1OCwicGF0aCI6Ii8zMzk3Nzk5Ni80MDA3NzA4MDgtZjFkNzFmNDMtMDRlMC00NDRkLThiY2ItMGIxZDdhMmYwYzQ1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDA3MzU1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWUzMWJhZjkyOTcwY2E1NTM4YjAxZDg4NjYwMmU0MjBjOGNkOTAwNTNkMDBiMDMyMDk4MmE2ZWVjYjQ3MTQzOWUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.phlVIHYxskWYM99AWDkp1RHpI3oOVEho7Iqo-DTdeAE)
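
To see how widespread the gap is beyond these two screenshots, a small script along these lines could list the NCCL kernels that carry no PG attributes. This is only a sketch: it assumes the trace is Chrome-trace-style JSON with a `traceEvents` list, that GPU kernels have `cat == "kernel"`, and that PG attributes appear under arg keys such as `Process Group Name`. Those key names and the helper names are my assumptions, so adjust them to match what your trace actually contains.

```python
import json
from collections import Counter

# Assumed arg-key names for PG metadata; change these to match your trace.
PG_KEYS = ("Process Group Name", "Process Group Description", "Process Group Ranks")


def nccl_kernels_missing_pg(events, pg_keys=PG_KEYS):
    """Return NCCL kernel events whose args contain none of pg_keys."""
    missing = []
    for ev in events:
        name = ev.get("name", "")
        # Only look at GPU kernel events whose name mentions NCCL.
        if ev.get("cat") != "kernel" or "nccl" not in name.lower():
            continue
        args = ev.get("args") or {}
        if not any(key in args for key in pg_keys):
            missing.append(ev)
    return missing


def summarize(trace_path, pg_keys=PG_KEYS):
    """Count, per kernel name, the NCCL kernels that lack PG metadata."""
    with open(trace_path) as f:
        trace = json.load(f)
    # Accept either {"traceEvents": [...]} or a bare list of events.
    events = trace.get("traceEvents", []) if isinstance(trace, dict) else trace
    return Counter(ev["name"] for ev in nccl_kernels_missing_pg(events, pg_keys))
```

Calling `summarize("kineto_trace.json")` (a placeholder path) returns a per-kernel-name count, which should show whether only the coalesced allgather kernels are affected or other collectives as well.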