Commit
Refactor code generation for pointwise operation & PointwiseDynamicFunction
1. Generate code with nd tiles and a 1d grid with a grid-stride loop, where n is the ndim of the task space.
2. Add simple logic to simplify the task space: when all operands have the same shape and the same strides, and all of them are non-overlapping and dense, the task space is collapsed into a 1d space. A better policy is possible, but is left for future work.
3. Use a smarter policy for output layout inference: the output follows the stride order of the first tensor that has the same shape as the broadcasted shape, with pre-defined outputs taking priority over all input tensors; otherwise, the output is c-contiguous.
4. Make the tile size and grid size in the generated code configurable.
5. Work around the problem that a store to a block pointer does not automatically cast the value to the pointer's dtype.
6. Work around the problem that values loaded from a pointer to bool are int8, and that a block pointer made from a pointer to bool has dtype int8.
7. Fix the bitwise-* operators without those workarounds, and add test cases with bool inputs and outputs for them.
8. Add TypedPtr and StridedBuffer as drop-in replacements for torch.Tensor in generated Triton kernels and wrappers. They allow some unsafe reinterpretations of tensors (dtype, shape, stride, offset) that cannot be done through torch APIs.
9. Fix a bug in the flip op where the flipped view of the input (shifted data pointer and negative strides) is applied to the output.
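The policies in items 2 and 3 can be sketched in plain Python over (shape, strides) pairs. This is only an illustration under stated assumptions, not the code from this commit: the function names (`simplify_task_space`, `infer_output_strides`, and the helpers) are hypothetical, tensors are modelled as tuples rather than torch.Tensor, and the tie-breaking rule for equal strides is a guess.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]
Strides = Tuple[int, ...]
Operand = Tuple[Shape, Strides]  # hypothetical stand-in for a tensor


def is_non_overlapping_and_dense(shape: Shape, strides: Strides) -> bool:
    """True if the elements fill one gap-free, non-overlapping memory range."""
    # Size-1 dimensions never move the pointer, so their strides are ignored.
    dims = sorted((st, sz) for sz, st in zip(shape, strides) if sz != 1)
    expected = 1
    for st, sz in dims:  # walk from the smallest stride outwards
        if st != expected:
            return False
        expected *= sz
    return True


def simplify_task_space(task_shape: Shape, operands: Sequence[Operand]) -> Shape:
    """Collapse to a 1d task space when every operand shares one dense layout."""
    shapes = {shape for shape, _ in operands}
    strides = {st for _, st in operands}
    if len(shapes) == 1 and len(strides) == 1:
        shape, st = operands[0]
        if is_non_overlapping_and_dense(shape, st):
            numel = 1
            for size in shape:
                numel *= size
            return (numel,)
    return task_shape  # keep the nd task space unchanged


def c_contiguous_strides(shape: Shape) -> Strides:
    """Row-major strides, counted in elements."""
    out: List[int] = []
    acc = 1
    for size in reversed(shape):
        out.append(acc)
        acc *= max(size, 1)
    return tuple(reversed(out))


def strides_following_order(shape: Shape, ref: Strides) -> Strides:
    """Dense strides for `shape` with the same dimension ordering as `ref`."""
    # Fastest-varying dimension first; ties go to the later dimension (assumed).
    order = sorted(range(len(shape)), key=lambda d: (abs(ref[d]), -d))
    out = [0] * len(shape)
    acc = 1
    for d in order:
        out[d] = acc
        acc *= max(shape[d], 1)
    return tuple(out)


def infer_output_strides(
    task_shape: Shape,
    inputs: Sequence[Operand],
    predefined_outputs: Sequence[Operand] = (),
) -> Strides:
    """Follow the first full-shape tensor, checking pre-defined outputs first."""
    for shape, st in list(predefined_outputs) + list(inputs):
        if shape == task_shape:
            return strides_following_order(task_shape, st)
    return c_contiguous_strides(task_shape)
```

For example, two operands of shape (2, 3) that both use column-major strides (1, 2) collapse to the 1d task space (6,), and an output inferred from such an input keeps the column-major ordering instead of defaulting to c-contiguous.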