
1.1.0

@dame-cell released this 26 Oct 10:31

This release completes the implementation of our custom linear layer by adding a fully-functional backward pass, enabling end-to-end training.

What's New

  • Complete Backward Pass Implementation: Added three specialized Triton kernels for efficient backpropagation (a plain-PyTorch reference for the gradients they compute is sketched after this list):

    • backward_input_kernel: Computes gradients with respect to input
    • backward_weight_kernel: Computes gradients with respect to weights
    • fused_relu_bias_backward_kernel: Fused computation of bias gradients and ReLU backward pass
  • Performance Optimizations:

    • Kernel fusion to minimize memory operations
    • Autotuned configurations for optimal performance
    • Block-based computation patterns for efficient GPU utilization
    • Mixed precision (float32 for computation, float16 for storage)
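
For reference, the three kernels correspond to the standard gradients of a fused linear + ReLU layer. The sketch below is a plain-PyTorch `torch.autograd.Function` that computes the same three quantities; it is a readability aid, not the project's Triton implementation, and the class name is illustrative.

```python
import torch

class _LinearReLUReference(torch.autograd.Function):
    """Plain-PyTorch reference (not the Triton code) for y = relu(x @ W^T + b)."""

    @staticmethod
    def forward(ctx, x, weight, bias):
        pre = x @ weight.t() + bias   # linear transform
        out = torch.relu(pre)         # fused with the matmul in the Triton kernel
        ctx.save_for_backward(x, weight, out)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        x, weight, out = ctx.saved_tensors
        # fused_relu_bias_backward_kernel: zero gradients where ReLU was inactive,
        # then reduce over the batch dimension for the bias gradient.
        grad_pre = grad_out * (out > 0)
        grad_bias = grad_pre.sum(dim=0)
        # backward_input_kernel: dL/dx = dL/dpre @ W
        grad_x = grad_pre @ weight
        # backward_weight_kernel: dL/dW = (dL/dpre)^T @ x
        grad_w = grad_pre.t() @ x
        return grad_x, grad_w, grad_bias
```

In the release itself these three computations run as Triton kernels rather than dense PyTorch matmuls, with the bias and ReLU steps fused into a single pass over the output gradient.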

Previous Release

  • Forward pass implementation with fused linear transformation and ReLU activation

Technical Details

The backward pass maintains the same performance philosophy as the forward pass (a generic sketch of this kernel pattern follows the list below):

  • Leverages Triton for GPU acceleration
  • Uses autotuning to optimize kernel configurations
  • Implements efficient memory access patterns
  • Maintains numerical stability through careful handling of data types
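
As an illustration of that philosophy, the sketch below shows the generic shape of an autotuned, block-tiled Triton kernel that accumulates in float32 and stores in float16. It is a stand-alone example of the pattern, not this repository's kernel code; the kernel name, block configurations, and launch wrapper are assumptions made for the sketch.

```python
import torch
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 32}, num_warps=4),
        triton.Config({"BLOCK_M": 128, "BLOCK_N": 64, "BLOCK_K": 32}, num_warps=8),
    ],
    key=["M", "N", "K"],
)
@triton.jit
def block_matmul_kernel(
    a_ptr, b_ptr, c_ptr,
    M, N, K,
    stride_am, stride_ak,
    stride_bk, stride_bn,
    stride_cm, stride_cn,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
):
    # Each program instance owns one BLOCK_M x BLOCK_N tile of the output.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn

    # Accumulate in float32 for numerical stability.
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs, mask=(offs_m[:, None] < M) & (offs_k[None, :] + k < K), other=0.0)
        b = tl.load(b_ptrs, mask=(offs_k[:, None] + k < K) & (offs_n[None, :] < N), other=0.0)
        acc += tl.dot(a, b)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk

    # Store the result in float16.
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    c_mask = (offs_m[:, None] < M) & (offs_n[None, :] < N)
    tl.store(c_ptrs, acc.to(tl.float16), mask=c_mask)

def block_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # a: (M, K) float16, b: (K, N) float16, both on the GPU.
    M, K = a.shape
    _, N = b.shape
    c = torch.empty((M, N), device=a.device, dtype=torch.float16)
    grid = lambda meta: (triton.cdiv(M, meta["BLOCK_M"]), triton.cdiv(N, meta["BLOCK_N"]))
    block_matmul_kernel[grid](
        a, b, c, M, N, K,
        a.stride(0), a.stride(1),
        b.stride(0), b.stride(1),
        c.stride(0), c.stride(1),
    )
    return c
```

The autotuner benchmarks the listed configurations on the first call for a given (M, N, K) and caches the fastest one, which is what "autotuned configurations" refers to above.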

Usage

The layer can now be used as a drop-in replacement for nn.Linear in training scenarios:

```python
import torch

layer = TritonLinear(in_features=512, out_features=256)
optimizer = torch.optim.Adam(layer.parameters())

# input_tensor, target, and criterion come from your existing training loop
# Forward pass
output = layer(input_tensor)
loss = criterion(output, target)

# Backward pass (now supported!)
loss.backward()
optimizer.step()
```