
v0.56.0-rc4

Pre-release
github-actions released this on 01 Feb 02:07 · commit 70206b9

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, rather than the documentation on the main branch. There may be differences between the latest main and the most recent release.

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/13083742254

📦 Uncategorized

  • #0: Add a way to specify custom dispatch topology
  • Initialize work_executor_ and set WorkExecutorMode::SYNCHRONOUS in MeshDevice constructor
  • [UMD] Switching to new coord API
  • #0: Increase create heads test coverage for Llama shapes
  • #16503: Optimize semaphore and CB writes
  • Quiet down the CMake output from dependencies
  • #16847: update to address the unaligned noc_async_copy from DRAM to L1
  • remove ND-failing big shape from transpose failures; it is already tracked and disabled in test_transpose_2d
  • #14898: pass in pad value to transpose in reduce
  • rm -rf build.yaml
  • #16982: Fixing program cache issues with reshape
  • Delete convd_host_weights and update all tests using conv2d
  • Update CODEOWNERS for the public API
  • Aliu/bug fix
  • #0: Move distributed headers into the public API directory
  • [Llama3.2-11b-vision] Add support for text-only inference through generator api
  • Remove references to ARCH_NAME in programming example
  • Enable PR Gate
  • #16806: Fixed watcher assert on reshape in debug mode
  • #0: Fix doc links and make them point to the new location
  • Remove ARCH_NAME references in prog_examples
  • Prefer MOLD over LLD over LD
  • Restore build-wrapper.yaml with updated method
  • [TT-Train] Fix text generation
  • LightMetal - Add Flatbuffers into cmake infra/build as cpm package (#17039)
  • Format broken Kernel APIs Tables
  • #0: Use MeshBuffer to store MeshWorkload kernel binaries
  • Increase rms_norm and layernorm coverage for Llama shapes
  • #17213: update fused and matmul trace sweep tests
  • Add support for reading from / writing to partial buffer regions that are page size aligned for sharded buffers
  • #0: Fix clang-format for dataflow_api.h
  • Kkabilar tt single card perf
  • [FABRIC] ASYNC_WR_ATOMIC_INC
  • #9945: Enable and fix SD device perf test
  • Check context switch pointer for eth cores before resetting
  • Pull llrt.hpp out of public interface
  • Update perf and latest features for llm models (Jan 27)
  • #0: (MINOR) Bump to generate RCs for v0.57.0
  • Remove dead includes of host_api.hpp from ttnn
  • Prevent UNet Shallow perf report entry from being overwritten
  • Fix setup.py for Anaconda
  • Do not run PR Gate on Draft PRs
  • Add a timeout for docker image building
  • LightMetal - New APIs LightMetalBeginCapture() and LightMetalEndCapture() and docs (#17039)
  • #0: Update distributed tests build to account for arch
  • #17227: Make dispatch core order match for single chip 2 CQ and multi-chip 2 CQ topologies
  • #17215: Add explicit dealloc for mesh buffer
  • #0: Add validation test for dispatched remote circular buffer config to device
  • Remove get_completion_queue_reader_core() API from Device
  • Add resharding to post all gather layernorm/ rms norm op
  • #0: Fix ttnn shared libs build
  • #0: Schedule runs for single card new models tests
  • Implement JointAttention
  • Revert "Add resharding to post all gather layernorm/ rms norm op (#17156)
  • Update memory config when using view op with height sharded tensors
  • #16812: Reordering cbs in reduce_init_delta
  • #17083: Add support for watcher printing phys coords
  • #16945: Add auto retries to post commit on branches
  • Remove CommandQueue redirecting usages straight to HWCQ
  • 1D support for tilize/reshape ops
  • #16138: W-broadcasting for sharded tensors
  • #0: Add PR Gate to data pipeline
  • #15174: Re-enable mistral7b demo test after fw upgrade
  • LightMetal - Add LoadTrace() API and move TraceDescriptor out of detail namespace (#17039)
  • #15974: Create device tensors table in report database
  • Privatize dprint_server.hpp
  • Uplift Allocator to be its own class + migrate calls to Allocator APIs
  • Bump CMake in the Docker image
  • Add perf reporting for ccl async mode
  • Fix debug checks for bank assignments when initializing the allocator
  • Add perf report for reduce scatter async
  • #0: expand halo documentation and fix images
  • Fix shard and physical height mismatch in ttnn.convert_to_chw tests
  • #17134: Remove unused components
  • #0: Fix retry comparison which causes endless retries until pass
  • Make runtime_args_ptr const ref to solve clang-tidy error due to 58f9654
  • Use padded shape to construct output memory config in ttnn.view
  • #17322 Remove transpose cpp unit tests
  • #17215: Initial MeshBuffer integration with TTNN
  • fix sub-height height sharded WH transpose by setting output memory config to width sharded
  • #0: Only produce cicd data on workflow runs that are success/failure/cancelled (ignore skipped runs)
  • #0: Use posted writes for profiler.
  • #16149: Add DeviceCommandCalculator to calculate command size
  • #15889: Fix handling of mantissa rounding to respect ties round to even
  • #17312: Fix type error when saving report config to json file
  • Implement ttnn.sampling op for top-k, top-p sampling (see the usage sketch after this list)
  • Test for a bad state before building
  • #0: Fix incorrect assertion introduced in #17259
  • #0: Add missing include for work_split.hpp
  • Replace ttnn::Shape/LegacyShape with SimpleShape in Python
  • #0: Correcting bad dim check in CCL tests
  • [tt-train] Add scatter workaround while proper version is in development
  • [skip-ci] Bump timeout
  • #15414: Read annotation data to determine job-level failure signature and reason
  • #0: TT-Mesh bug fix MeshCQ/MeshWorkload on device indexing
  • #17374: Add concurrency group for _produce-data.yaml
  • Revert "#17374: Add concurrency group for _produce-data.yaml"
  • #0: Fix broken link for "Programming Mesh of Devices" tech report
  • Revert "#15414: Read annotation data to determine job-level failure signature and reason"
  • Debug strange error
  • Extract CommandQueue interface
  • Revert "[skip-ci] Bump timeout (#17397)"
  • Move dispatch_s queries out of the device interface to a dedicated dispatch query layer
  • disable-pr-gate
  • Remove deprecated get_shape() from Moreh ops
  • #17059: Update golden functions
  • Ucheema/tt fabric async wr api
  • #0: separate dispatch constants
  • #0: Add programming examples for TT-Metalium multi-device native APIs
  • Expanded im2col documentation
  • Removing skip_for_blackhole and fix some minor issues for blackhole conv2d
  • Delete tilize cpp tests
  • Page size assertion error
  • #0: Fuse batch slicing with create_qkv_heads
  • Post all gather layernorm/rms norm with resharding
  • Xuncai/all gather fixes
  • #15414: Read annotation data to determine job-level failure signature and reason
  • #0: Change noc_inline_dw_write to use write_at_cmd_buf
  • #0: Update pgm_dispatch_golden.json
  • #16380: add nD support for generic reductions
  • Address clang tidy error
  • Add Qwen2.5 vLLM generator (based on LlamaGenerator), fix batch 1 issue with generator's decode forward
  • Step 1 of normalizing Boost as a dependency
  • Add DeepSeek perf and instructions
  • [Llama3.2-11b-vision] Add max_cross_attn_tokens property to vLLM generator class
  • #0: Add failure signature for runner shutdown
  • #0: fix bug in createqkv
  • #0: Modify DeviceRange semantics to match CoreRange
  • Remove ttnn::Shape and LegacyShape from the codebase
  • Update the .rst to match code
  • Rename ttnn::SimpleShape to ttnn::Shape
  • Update vLLM commit for deepseek-distill-llama70b in README.md
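
For the new ttnn.sampling op listed above, the sketch below shows how a top-k/top-p call might look. The op name and its purpose come from this changelog entry; the parameter names `k` and `p` and the exact call signature are assumptions, so consult the ttnn API documentation for the released interface.

```python
# Hypothetical usage of the new ttnn.sampling op (top-k / top-p sampling).
# The op name comes from this release's changelog; the `k` and `p` parameter
# names and the exact signature are assumptions, not the confirmed API.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# Logits for a batch of 1 over a small vocabulary.
logits = torch.randn(1, 32)
logits_tt = ttnn.from_torch(
    logits, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device
)

# Keep the 10 highest-probability tokens (top-k), then sample from the
# smallest set whose cumulative probability exceeds 0.9 (top-p).
token_tt = ttnn.sampling(logits_tt, k=10, p=0.9)  # assumed signature

print(ttnn.to_torch(token_tt))
ttnn.close_device(device)
```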