Pull requests: NVIDIA/Megatron-LM

Pull requests list

Enabling Alternative Path ABC implementations
#1393 opened Feb 10, 2025 by ashvinnihalani
Fix typo in GPTModel forward function comments
#1391 opened Feb 9, 2025 by Zzhiter
add qkv_bias
#1388 opened Feb 7, 2025 by Chandler-Bing
Update LICENSE
#1382 opened Feb 6, 2025 by maximevtush
Add LongRoPE support
#1377 opened Feb 5, 2025 by 16bitmood
add .sh file
#1373 opened Feb 4, 2025 by umsanmaru
KV-cache for T5 model
#1358 opened Jan 17, 2025 by YK-Fu
fix typo
#1352 opened Jan 10, 2025 by Jintao-Huang
fix param overwrite problem in saver_mcore
#1351 opened Jan 9, 2025 by Force1ess
Fix typo
#1347 opened Jan 4, 2025 by deep-sci
fix bugs of data preprocessing with multiple json keys
#1337 opened Dec 25, 2024 by junjzhang
Create python-package.yml [stale: no activity in 60 days]
#1332 opened Dec 21, 2024 by invisiblepancake
Add Mamba TRTLLM support
#1320 opened Dec 12, 2024 by meatybobby
update network interface env [stale: no activity in 60 days]
#1319 opened Dec 12, 2024 by lizamd
fix args.mock_data bug caused by func get_blend_and_blend_per_split [stale: no activity in 60 days]
#1306 opened Nov 29, 2024 by 1195343015
[Update] Print training log in rank0
#1296 opened Nov 21, 2024 by shijungg
support qwen2 hf<->mcore ckpt converter [stale: no activity in 60 days]
#1290 opened Nov 19, 2024 by wenyujin333
Fix: Resolve multimodal model errors and update README usage instructions [stale: no activity in 60 days]
#1286 opened Nov 13, 2024 by singleheart
Set torch.multiprocessing start method as 'spawn' [stale: no activity in 60 days]
#1285 opened Nov 12, 2024 by hxdtest