# 1.1.0

This release completes the implementation of our custom linear layer by adding a fully functional backward pass, enabling end-to-end training.
## What's New

- **Complete Backward Pass Implementation**: Added three specialized Triton kernels for efficient backpropagation:
  - `backward_input_kernel`: Computes gradients with respect to the input
  - `backward_weight_kernel`: Computes gradients with respect to the weights
  - `fused_relu_bias_backward_kernel`: Fused computation of the bias gradients and the ReLU backward pass (sketched after this list)
- **Performance Optimizations**:
  - Kernel fusion to minimize memory operations
  - Autotuned configurations for optimal performance
  - Block-based computation patterns for efficient GPU utilization
  - Mixed precision (float32 for computation, float16 for storage)
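For illustration, here is a minimal sketch of what a fused ReLU/bias backward kernel along these lines might look like. The signature, strides, block sizes, and autotuning configurations are assumptions made for the example, not the kernel shipped in this release.

```python
# Illustrative sketch only: signature, strides, and tile sizes are assumptions.
import triton
import triton.language as tl


@triton.autotune(
    configs=[
        triton.Config({"BLOCK_M": 64, "BLOCK_N": 64}, num_warps=4),
        triton.Config({"BLOCK_M": 128, "BLOCK_N": 64}, num_warps=8),
    ],
    key=["M", "N"],
    # grad_bias is accumulated atomically, so zero it between autotuning runs.
    reset_to_zero=["grad_bias_ptr"],
)
@triton.jit
def fused_relu_bias_backward_kernel(
    grad_out_ptr,   # (M, N) upstream gradient, float16
    act_ptr,        # (M, N) saved post-ReLU activations, float16
    grad_z_ptr,     # (M, N) out: gradient w.r.t. the pre-activation, float16
    grad_bias_ptr,  # (N,)   out: bias gradient, float32
    M, N,
    stride_m, stride_n,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
):
    # Each program instance owns one (BLOCK_M, BLOCK_N) tile of the matrices.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    rows = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    cols = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    mask = (rows[:, None] < M) & (cols[None, :] < N)
    offs = rows[:, None] * stride_m + cols[None, :] * stride_n

    # Load float16 values, compute in float32 for numerical stability.
    g = tl.load(grad_out_ptr + offs, mask=mask, other=0.0).to(tl.float32)
    a = tl.load(act_ptr + offs, mask=mask, other=0.0).to(tl.float32)

    # ReLU backward: the gradient flows only where the activation was positive.
    gz = tl.where(a > 0.0, g, 0.0)

    # Store the pre-activation gradient back in float16.
    tl.store(grad_z_ptr + offs, gz.to(tl.float16), mask=mask)

    # Bias gradient: column-wise partial sum, accumulated across row tiles.
    partial = tl.sum(gz, axis=0)
    tl.atomic_add(grad_bias_ptr + cols, partial, mask=cols < N)
```

Fusing the ReLU mask and the bias reduction into one pass means the upstream gradient is read from global memory only once, which is the kind of saving the kernel-fusion bullet refers to.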
## Previous Release
- Forward pass implementation with fused linear transformation and ReLU activation
## Technical Details

The backward pass maintains the same performance philosophy as the forward pass (the sketch after this list shows how the three kernels map onto the gradient math):
- Leverages Triton for GPU acceleration
- Uses autotuning to optimize kernel configurations
- Implements efficient memory access patterns
- Maintains numerical stability through careful handling of data types
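To make the data flow concrete, the sketch below shows one way the three backward computations could be wired into a `torch.autograd.Function`. The structure and class name are assumptions, not the project's actual code, and the Triton kernel launches are replaced with equivalent `torch` expressions so the example stays self-contained.

```python
# Structural sketch only: the real backward would launch the Triton kernels
# instead of these equivalent torch expressions.
import torch


class TritonLinearReLUFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight, bias):
        # Forward kernel: fused y = relu(x @ W^T + b).
        y = torch.relu(x @ weight.t() + bias)
        ctx.save_for_backward(x, weight, y)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        x, weight, y = ctx.saved_tensors
        # fused_relu_bias_backward_kernel: mask the upstream gradient through
        # the ReLU and reduce it column-wise for the bias gradient.
        grad_z = grad_output * (y > 0).to(grad_output.dtype)
        grad_bias = grad_z.sum(dim=0)
        # backward_input_kernel: dL/dx = dL/dz @ W
        grad_input = grad_z @ weight
        # backward_weight_kernel: dL/dW = dL/dz^T @ x
        grad_weight = grad_z.t() @ x
        return grad_input, grad_weight, grad_bias
```

A module like `TritonLinear` would presumably call `TritonLinearReLUFunction.apply(x, self.weight, self.bias)` in its `forward`.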
## Usage

The layer can now be used as a drop-in replacement for `nn.Linear` in training scenarios:
```python
import torch
# TritonLinear is this project's custom layer;
# input_tensor, target, and criterion are assumed to be defined elsewhere.

layer = TritonLinear(in_features=512, out_features=256)
optimizer = torch.optim.Adam(layer.parameters())

# Forward pass
output = layer(input_tensor)
loss = criterion(output, target)

# Backward pass (now supported!)
loss.backward()
optimizer.step()
```
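As a quick sanity check (a sketch, not part of the release), the gradients can be compared against a plain PyTorch reference. This assumes `TritonLinear` exposes `nn.Linear`-style `weight` and `bias` parameters, fuses a ReLU into its forward pass as described above, and accepts float32 input while handling float16 storage internally.

```python
# Hypothetical gradient check against a pure PyTorch reference.
import torch
import torch.nn.functional as F

layer = TritonLinear(in_features=512, out_features=256).cuda()
x = torch.randn(32, 512, device="cuda", requires_grad=True)

out = layer(x)              # custom Triton forward
out.sum().backward()        # custom Triton backward
grad_custom = x.grad.detach().clone()

x.grad = None
ref = torch.relu(F.linear(x, layer.weight.float(), layer.bias.float()))
ref.sum().backward()        # pure PyTorch reference backward

# float16 storage in the kernels means only loose tolerances are expected.
torch.testing.assert_close(out.float(), ref, rtol=1e-2, atol=1e-2)
torch.testing.assert_close(grad_custom.float(), x.grad, rtol=1e-2, atol=1e-2)
```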