This project implements a SIMD processor that offloads matrix and vector operations from the Zynq SoC. Matrix operations are broken down into four vector operations: add, subtract, dot product, and transpose.
The processor has four pipeline stages, each taking two clock cycles:
- Instruction fetch and decode
- Load data
- Execute
- Store result
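
The timing implied by the four stages above can be estimated with a short calculation. This is a rough sketch only: it assumes an ideal pipeline in which a new instruction enters every stage slot and there are no stalls or hazards, which the project description does not state. The names `STAGES`, `instruction_latency`, and `total_cycles` are hypothetical and exist only for this illustration.

```python
# Stage names mirror the list above; the identifiers are illustrative.
STAGES = ("fetch_decode", "load", "execute", "store")
CYCLES_PER_STAGE = 2  # from the description: each stage takes 2 clock cycles


def instruction_latency() -> int:
    """Clock cycles for one instruction to pass through every stage."""
    return len(STAGES) * CYCLES_PER_STAGE


def total_cycles(num_instructions: int) -> int:
    """Clock cycles to finish a run of instructions.

    Assumption: fully overlapped stages with no stalls, so the run takes
    (stages + instructions - 1) stage slots.
    """
    if num_instructions <= 0:
        return 0
    return (len(STAGES) + num_instructions - 1) * CYCLES_PER_STAGE


if __name__ == "__main__":
    print(instruction_latency())
    print(total_cycles(10))
```

Under these assumptions a single instruction has a latency of 8 cycles, and once the pipeline is full one instruction completes every 2 cycles.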
The compiler converts each matrix operation into a series of vector operations.
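
As an illustration of that decomposition, the sketch below expresses two matrix operations using only the four vector operations named earlier. It is a software reference model, not the project's compiler: the function names are hypothetical, and matrix multiplication is an assumed example workload, since the description does not list which matrix operations are supported.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def vec_add(x: Vector, y: Vector) -> Vector:
    """Element-wise add of two equal-length vectors."""
    return [a + b for a, b in zip(x, y)]


def vec_sub(x: Vector, y: Vector) -> Vector:
    """Element-wise subtract of two equal-length vectors."""
    return [a - b for a, b in zip(x, y)]


def vec_dot(x: Vector, y: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns so that columns can be read as row vectors."""
    return [list(col) for col in zip(*m)]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    """Matrix add as one vector add per row."""
    return [vec_add(ra, rb) for ra, rb in zip(a, b)]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix multiply (assumed example) as a transpose plus dot products."""
    bt = transpose(b)  # columns of b become rows
    return [[vec_dot(row, col) for col in bt] for row in a]


if __name__ == "__main__":
    print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In this model an `m x n` by `n x p` multiplication becomes one transpose followed by `m * p` dot products, each of which could be issued as a separate vector instruction.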
AXI CDMA is used to transfer data between the Zynq processing system (PS) and the block RAMs.