Pull requests: NVIDIA/Megatron-LM
Pull requests list
fix: return float instead of tensor from get_rotary_seq_len
#1419
opened Feb 20, 2025 by
jasonchiu-codeium
Fix document regarding GQA (--group-query-attention) argument
#1401
opened Feb 12, 2025 by
eagle705
Fix issue in converting Mixtral 8x7B checkpoints from HF to MCore and update doc
#1397
opened Feb 11, 2025 by
yeahdongcn
support bf16 dtype for optimizer states using precision-aware optimizer in TransformerEngine
#1390
opened Feb 8, 2025 by
XiaobingSuper
Draft
add moe_router_device_choice_method argument to choose method …
#1381
opened Feb 6, 2025 by
bzantium
fix bugs of data preprocessing with multiple json keys
#1337
opened Dec 25, 2024 by
junjzhang
Create python-package.yml
stale
No activity in 60 days on issue or PR
#1332
opened Dec 21, 2024 by
invisiblepancake
Fix: prevent double accumulation of load balancing loss and z-loss wi…
#1331
opened Dec 20, 2024 by
thuwzt
update network interface env
stale
No activity in 60 days on issue or PR
#1319
opened Dec 12, 2024 by
lizamd
fix args.mock_data bug caused by func get_blend_and_blend_per_split
stale
No activity in 60 days on issue or PR
#1306
opened Nov 29, 2024 by
1195343015
support qwen2 hf<->mcore ckpt converter
stale
No activity in 60 days on issue or PR
#1290
opened Nov 19, 2024 by
wenyujin333
Fix: Resolve multimodal model errors and update README usage instructions
stale
No activity in 60 days on issue or PR
#1286
opened Nov 13, 2024 by
singleheart
Set torch.multiprocessing start method as 'spawn'
stale
No activity in 60 days on issue or PR
#1285
opened Nov 12, 2024 by
hxdtest