Releases · microsoft/mscclpp

16 Jul 00:37

chhwang

v0.5.2

40cb196

MSCCL++ v0.5.2 (Latest)

What's Changed

Add C++ executor test by @chhwang in #304
Cumulative Updates by @Binyang2014 in #309
Add NPKit GPU event support by @yzygitzh in #310
Fix NPKit support for AMD by @yzygitzh in #312
Add "packet type" option for executor test by @Binyang2014 in #313
Add support for multicast reduce instruction by @roshandathathri in #316
Update quickstart.md by @angelica-moreira in #314
Simplify/improve barrier in AllReduce6 by @roshandathathri in #317
Support NCCL APIs by @caiomcbr in #319
Update allreduce_bench.py by @angelica-moreira in #318
Separate NPKit CPU timestamp access from different blocks for AMD platform by @yzygitzh in #321
AllReduce Kernel for Small Messages by @caiomcbr in #322
Resolve clang++ warnings by @chhwang in #325
Support writing packets via uint2 by @Binyang2014 in #327
Double buffering for NCCL APIs by @caiomcbr in #324
v0.5.2 by @chhwang in #328

New Contributors

@angelica-moreira made their first contribution in #314
@caiomcbr made their first contribution in #319

Full Changelog: v0.5.1...v0.5.2

Contributors

chhwang, Binyang2014, and 4 other contributors

26 May 21:32

chhwang

v0.5.1

cddffbc

MSCCL++ v0.5.1

What's Changed

Upgrade gtest by @chhwang in #300
Rename executor.cpp to executor_py.cpp by @chhwang in #301
Fix assert declaration & add a compile test by @chhwang in #303
Fix security issue by @Binyang2014 in #305
v0.5.1 by @chhwang in #308

Full Changelog: v0.5.0...v0.5.1

Contributors

chhwang and Binyang2014

04 May 23:53

chhwang

v0.5.0

9c2a960

MSCCL++ v0.5.0

What's Changed

Fix a typo name by @chhwang in #286
Add executor to execute schedule-plan file by @Binyang2014 in #283
Allow binding allocated memory to NVLS multicast pointer by @roshandathathri in #290
Separate headers for GPU data types by @chhwang in #291
Refactoring NVLS interfaces by @chhwang in #293
Include GPU data types only for kernel code by @chhwang in #292
Ethernet support by @chhwang in #284
Resolve multi-nodes test failure issue by @Binyang2014 in #295
Move pipeline to Azure org by @Binyang2014 in #296
Optimized the execution kernel by @Binyang2014 in #294
Allow obtaining CUDA stream handle from PyTorch stream when launching kernel by @aashaka in #297
v0.5.0 by @chhwang in #298

New Contributors

@roshandathathri made their first contribution in #290

Full Changelog: v0.4.3...v0.5.0

Contributors

chhwang, Binyang2014, and 2 other contributors

27 Mar 18:55

chhwang

v0.4.3

1a7cb98

MSCCL++ v0.4.3

What's Changed

Add optional prefix to installation paths by @chhwang in #235
Fix #235 by @chhwang in #239
Check nvidia_peermem during runtime by @chhwang in #234
Do not check value of __HIP_PLATFORM_AMD__ by @chhwang in #240
Fix crash in static variable destructor by @Binyang2014 in #238
Update interface to let user change fifo size by @Binyang2014 in #243
Mask each field of the trigger by @chhwang in #244
Minor improvement on device syncer by @chhwang in #231
Remove make pylib-copy command by @Binyang2014 in #249
Increase MSCCLPP_BITS_REGMEM_HANDLE to 9 by @aashaka in #251
Add putWithSignal() latency tests by @chhwang in #246
NVLS support by @saeedmaleki in #250
Fix wrong offset calculation by @chhwang in #257
Fix NVLS support by @chhwang in #258
Allow MSCCL++ CommGroup to take PyTorch tensors in args by @aashaka in #255
Fix multi-nodes test failure by @Binyang2014 in #262
Allow semaphores and memory to be registered separately in ProxyService by @aashaka in #264
Remove cuda-python from project by @Binyang2014 in #245
Fix comm.py for NVLS by @saeedmaleki in #267
New packet format & optimizations by @chhwang in #256
Fix multi-node ci pipeline by @Binyang2014 in #272
Add launch_bounds for mscclpp_test by @Binyang2014 in #273
Fix bootstrapping mechanism by @chhwang in #278
v0.4.3 by @chhwang in #279

New Contributors

@aashaka made their first contribution in #251

Full Changelog: v0.4.2...v0.4.3

Contributors

chhwang, Binyang2014, and 2 other contributors

20 Dec 12:25

chhwang

v0.4.2

f1605b7

MSCCL++ v0.4.2

What's Changed

Include cstdint in packet_device.hpp by @chhwang in #233
Fix & improve perf for ROCm by @chhwang in #232
v0.4.2 by @chhwang in #236

Full Changelog: v0.4.1...v0.4.2

Contributors

chhwang

06 Dec 02:14

chhwang

v0.4.1

c15a166

MSCCL++ v0.4.1

What's Changed

Fix performance downgrade issue & update doc by @Binyang2014 in #229
Add a documentation issue template by @chhwang in #230

Full Changelog: v0.4.0...v0.4.1

Contributors

chhwang and Binyang2014

24 Nov 09:09

chhwang

v0.4.0

351b95b

MSCCL++ v0.4.0

Add Python benchmark
Update documentation
Add ROCm support
Bug fixes

See #160 for details.

11 Oct 14:37

chhwang

v0.3.0

8c0f9e8

MSCCL++ v0.3.0

Updated interfaces
Add Python bindings and interfaces
Add Python unit tests
Add more configurable parameters
Add a new single-node AllReduce kernel
Fix bugs

See #89 for details.

Full Changelog: v0.2.0...v0.3.0

27 Mar 11:18

chhwang

v0.1.0

c706990

MSCCL++ v0.1.0 (Pre-release)

Features

Transport setup
- Bootstrap (initial meta-data exchange between ranks)
- Connection setup for P2P NVLink and InfiniBand
- CPU proxies for P2P NVLink and InfiniBand
Transport interface
- Trigger FIFO
- put-signal-wait interface
Tests
- AllToAll
- AllGather based on AllToAll
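
The put-signal-wait pattern named in the feature list can be illustrated with a toy, in-process model. The sketch below is not the MSCCL++ API: `ToyChannel`, its methods, and `run_demo` are hypothetical names, and a Python thread with a semaphore stands in for the remote rank and its transport. It only shows the ordering the pattern relies on: the sender writes into the receiver's buffer, then signals, and the receiver reads only after its wait returns.

```python
import threading


class ToyChannel:
    """Hypothetical stand-in for a one-way channel between two ranks."""

    def __init__(self, size):
        # Memory that conceptually belongs to the receiving rank.
        self.remote_buffer = bytearray(size)
        # Counts signals that have been sent but not yet waited on.
        self._pending = threading.Semaphore(0)

    def put(self, offset, data):
        # Sender side: write data into the receiver's buffer.
        self.remote_buffer[offset:offset + len(data)] = data

    def signal(self):
        # Sender side: tell the receiver that earlier puts are complete.
        self._pending.release()

    def wait(self):
        # Receiver side: block until one signal has arrived.
        self._pending.acquire()


def run_demo():
    chan = ToyChannel(16)
    received = []

    def receiver():
        chan.wait()  # do not touch the buffer before the signal
        received.append(bytes(chan.remote_buffer[:5]))

    worker = threading.Thread(target=receiver)
    worker.start()
    chan.put(0, b"hello")
    chan.signal()
    worker.join()
    return received[0]


if __name__ == "__main__":
    print(run_demo())
```

A counting semaphore is used here so that a signal sent before the receiver starts waiting is not lost; how the real library tracks signals is not stated in these notes.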

Full Changelog: https://github.com/microsoft/mscclpp/commits/v0.1.0

MSCCL++ v0.2.0

Communication Features and Interfaces

GPU-side communication interfaces (DeviceChannel)

Host-side interfaces

Transports support

Performance Optimization

Development Pipeline

Documents