CFU Playground

Want a faster ML processor? Do it yourself!

This project provides a framework that an engineer, intern, or student can use to design and evaluate enhancements to an FPGA-based “soft” processor, specifically to increase the performance of machine learning (ML) tasks. The goal is to abstract away most infrastructure details so that the user can get up to speed quickly and focus solely on adding new processor instructions, exploiting them in the computation, and measuring the results.

This project enables rapid iteration on processor improvements -- multiple iterations per day.

This is how it works at the highest level:

  • Choose a TensorFlow Lite model; a quantized person detection model is provided
  • Execute the inference on the Arty FPGA board to get cycle counts per layer
  • Choose a TFLite operator to accelerate, and dig into that code
  • Design new instruction(s) that can replace multiple basic operations
  • Build a custom function unit (a small amount of hardware) that performs the new instruction(s)
  • Modify the TFLite/Micro library kernel to use the new instruction(s), which are available as intrinsics with function call syntax
  • Rebuild the FPGA SoC, recompile the TFLM library, and rerun to measure improvement (simple make targets are provided)
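To make the "replace multiple basic operations" step concrete, here is a minimal sketch in plain C++ that runs on the host. It compares a scalar inner loop with a software model of one possible custom instruction: a four-lane int8 multiply-accumulate over two packed 32-bit operands. The names `dot4_scalar`, `pack4`, and `cfu_mac4_model` are hypothetical and are not part of this project's API. On the real SoC the model function would be replaced by an intrinsic backed by your custom function unit.

```cpp
#include <cstdint>

// Baseline: the work a kernel inner loop does using only basic operations
// (four loads per operand, four multiplies, four adds).
int32_t dot4_scalar(const int8_t* a, const int8_t* b) {
  int32_t acc = 0;
  for (int i = 0; i < 4; ++i) {
    acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
  }
  return acc;
}

// Pack four int8 values into one 32-bit word, lane 0 in the low byte.
// Two 32-bit register operands are assumed here because that is what a
// RISC-V R-type instruction provides.
uint32_t pack4(const int8_t* v) {
  uint32_t w = 0;
  for (int i = 0; i < 4; ++i) {
    w |= static_cast<uint32_t>(static_cast<uint8_t>(v[i])) << (8 * i);
  }
  return w;
}

// HYPOTHETICAL software model of a custom instruction: treat each operand
// as four signed 8-bit lanes, multiply lane by lane, and return the sum.
// In hardware this would be a single instruction instead of a loop.
int32_t cfu_mac4_model(uint32_t rs1, uint32_t rs2) {
  int32_t acc = 0;
  for (int i = 0; i < 4; ++i) {
    const int8_t x = static_cast<int8_t>((rs1 >> (8 * i)) & 0xFF);
    const int8_t y = static_cast<int8_t>((rs2 >> (8 * i)) & 0xFF);
    acc += static_cast<int32_t>(x) * static_cast<int32_t>(y);
  }
  return acc;
}
```

A host-side model like this is useful as a reference: for any pair of four-element int8 inputs, `cfu_mac4_model(pack4(a), pack4(b))` should equal `dot4_scalar(a, b)`, and the hardware implementation can be checked against the same values.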

The focus here is performance, not demos. The inputs to the ML inference are canned/faked, and the only output is cycle counts.
It would be possible to export the improvements made here to an actual demo, but no pathway has been set up for doing so.

Disclaimer: This is not an officially supported Google project. Support and/or new releases may be limited.

With the exception of Vivado, everything used by this project is open source.

Required Hardware/OS

  • Currently, the only supported target is the Arty 35T board from Digilent.
  • The only supported host OS is Linux (Debian / Ubuntu).

If you want to try things out using Renode simulation, then you don't need either the Arty board or Vivado software. You can also perform Verilog-level cycle-accurate simulation with Verilator, but this is much slower.

Assumed Software

  • Vivado must be manually installed.

Other required packages will be checked for and, if on a Debian-based system, automatically installed by the setup script below.

Setup

Clone this repo, cd into it, then run:

scripts/setup

Use

Build the SoC and load the bitstream onto Arty:

cd proj/proj_template
make prog

This builds the SoC with the default CFU from proj/proj_template. Later you'll make your own project, and rerun those make commands with a modified PROJ=proj_myproj definition.

Build a program and execute it on the SoC you just loaded onto the Arty:

make load

To run on the Renode simulator on your own machine instead, execute:

make renode

Underlying open-source technology

  • LiteX: Open-source framework for assembling the SoC (CPU + peripherals)
  • VexRiscv: Open-source RISC-V soft CPU optimized for FPGAs
  • nMigen: Python toolbox for building digital hardware

Licensed under Apache-2.0 license

See the file LICENSE.

Contribution guidelines

If you want to contribute to CFU Playground, be sure to review the contribution guidelines. This project adheres to Google's code of conduct. By participating, you are expected to uphold this code.
