NVIDIA / Megatron-LM Public

Pull requests: NVIDIA/Megatron-LM

163 Open 268 Closed

Pull requests list

fix: return float instead of tensor from get_rotary_seq_len

#1419 opened Feb 20, 2025 by jasonchiu-codeium

Fix document regarding GQA (--group-query-attention) argument

#1401 opened Feb 12, 2025 by eagle705

Fix issue in converting Mixtral 8x7B checkpoints from HF to MCore and update doc

#1397 opened Feb 11, 2025 by yeahdongcn

Enabling Alternative Path ABC implementations

#1393 opened Feb 10, 2025 by ashvinnihalani

Fix typo in GPTModel forward function comments

#1391 opened Feb 9, 2025 by Zzhiter

support bf16 dtype for optimizer states using precision-aware optimizer in TransformerEngine

#1390 opened Feb 8, 2025 by XiaobingSuper • Draft

add qkv_bias

#1388 opened Feb 7, 2025 by Chandler-Bing

Update LICENSE

#1382 opened Feb 6, 2025 by maximevtush

add moe_router_device_choice_method argument to choose method …

#1381 opened Feb 6, 2025 by bzantium

Add LongRoPE support

#1377 opened Feb 5, 2025 by 16bitmood

add .sh file

#1373 opened Feb 4, 2025 by umsanmaru

KV-cache for T5 model

#1358 opened Jan 17, 2025 by YK-Fu

fix typo

#1352 opened Jan 10, 2025 by Jintao-Huang

fix param overwrite problem in saver_mcore

#1351 opened Jan 9, 2025 by Force1ess

Fix typo

#1347 opened Jan 4, 2025 by deep-sci

fix bugs of data preprocessing with multiple json keys

#1337 opened Dec 25, 2024 by junjzhang

Create python-package.yml [stale: no activity in 60 days on issue or PR]

#1332 opened Dec 21, 2024 by invisiblepancake

Fix: prevent double accumulation of load balancing loss and z-loss wi…

#1331 opened Dec 20, 2024 by thuwzt

Add Mamba TRTLLM support

#1320 opened Dec 12, 2024 by meatybobby

update network interface env [stale]

#1319 opened Dec 12, 2024 by lizamd

fix args.mock_data bug caused by func get_blend_and_blend_per_split [stale]

#1306 opened Nov 29, 2024 by 1195343015

[Update] Print training log in rank0

#1296 opened Nov 21, 2024 by shijungg

support qwen2 hf<->mcore ckpt converter [stale]

#1290 opened Nov 19, 2024 by wenyujin333

Fix: Resolve multimodal model errors and update README usage instructions [stale]

#1286 opened Nov 13, 2024 by singleheart

Set torch.multiprocessing start method as 'spawn' [stale]

#1285 opened Nov 12, 2024 by hxdtest
