diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/README.md b/VLSI24/submitted_notebooks/SJSystolicArray/README.md
new file mode 100644
index 00000000..e85bd4e0
--- /dev/null
+++ b/VLSI24/submitted_notebooks/SJSystolicArray/README.md
@@ -0,0 +1,2 @@
+# SiliconJackets Systolic Array
+This notebook walks through the design specification, simulation, and implementation of a systolic array with open-source tools and PDKs. The parallel computation and data reuse of a systolic array are crucial for accelerating neural networks, and this notebook, together with its reusable design, aims to contribute to the open-source hardware community and enable more efficient ML applications. This project explains the principles behind how a systolic array performs 2D convolution, demonstrates the performance of our implementation with image results, and shows the final GDS generated with an open-source flow. Additionally, to further demonstrate the feasibility of the open-source flow and our design, we are also submitting this systolic array design to the open-source silicon initiative [Tiny Tapeout](https://tinytapeout.com/). This submission was completed by members of SiliconJackets, a student-run organization at Georgia Tech that introduces students to semiconductor design, verification, and implementation through a large collaborative project. We hope to use this notebook as an example for future members of the club.
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/SystolicArray.ipynb b/VLSI24/submitted_notebooks/SJSystolicArray/SystolicArray.ipynb
new file mode 100644
index 00000000..c394e27d
--- /dev/null
+++ b/VLSI24/submitted_notebooks/SJSystolicArray/SystolicArray.ipynb
@@ -0,0 +1,1598 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "VV2vOq0Oq_CF"
+ },
+ "source": [
+ "# Row Stationary Systolic Array With OpenLane\n",
+ "\n",
+ "```\n",
+ "Copyright 2024 SiliconJackets @ Georgia Institute of Technology\n",
+ "SPDX-License-Identifier: GPL-3.0-or-later\n",
+ "```\n",
+ "\n",
+ "Running a 3x3 systolic array design inspired by the [EYERISS](https://courses.cs.washington.edu/courses/cse550/21au/papers/CSE550.Eyeriss.pdf) design through the [OpenLane](https://github.com/The-OpenROAD-Project/OpenLane/) RTL to GDS flow targeting the [open source SKY130 PDK](https://github.com/google/skywater-pdk/)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "pzPuBWmSrjK_"
+ },
+ "source": [
+ "|Name|Affiliation| Email |IEEE Member|SSCS Member|\n",
+ "|:--:|:----------:|:----------:|:----------:|:----------:|\n",
+ "|Zachary Ellis|Georgia Institute of Technology|zellis7@gatech.edu|Yes|Yes|\n",
+ "|Nealson Li|Georgia Institute of Technology|nealson@gatech.edu|Yes|Yes|\n",
+ "|Addison Elliott|Georgia Institute of Technology|addisonelliott@gatech.edu|No|No|\n",
+ "|Zeyan Wu|Georgia Institute of Technology|zwu477@gatech.edu|No|No|\n",
+ "|Jingsong Guo|Georgia Institute of Technology|guojingsong@gatech.edu|No|No|"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "wCAFcKzsBMgE"
+ },
+ "source": [
+ "This notebook walks through the design specification, simulation, and implementation of a systolic array with open-source tools and PDKs. The parallel computation and data reuse of a systolic array are crucial for accelerating neural networks, and this notebook, together with its reusable design, aims to contribute to the open-source hardware community and enable more efficient ML applications. This project explains the principles behind how a systolic array performs 2D convolution, demonstrates the performance of our implementation with image results, and shows the final GDS generated with an open-source flow. Additionally, to further demonstrate the feasibility of the open-source flow and our design, we are also submitting this systolic array design to the open-source silicon initiative [Tiny Tapeout](https://tinytapeout.com/). This submission was completed by members of SiliconJackets, a student-run organization at Georgia Tech that introduces students to semiconductor design, verification, and implementation through a large collaborative project. We hope to use this notebook as an example for future members of the club.\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "4RZvXIjpdd7d"
+ },
+ "source": [
+ "## Introduction\n",
+ "---\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cmDD-DzHEB0H"
+ },
+ "source": [
+ "In this notebook, we first explain what a systolic array is and how it is applied, referencing the row stationary dataflow introduced in [EYERISS](https://courses.cs.washington.edu/courses/cse550/21au/papers/CSE550.Eyeriss.pdf), on which our design is loosely based. Then we explain the hardware specification and the design of the high-level architecture and the processing unit. Next, we demonstrate the performance by simulating the hardware design as it performs convolution for an edge detection task, and verify it against the software golden reference. Lastly, the systolic array is pushed through the [OpenLane](https://github.com/The-OpenROAD-Project/OpenLane/) RTL to GDS flow with the open-source [SKY130 PDK](https://github.com/google/skywater-pdk/)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "w-Nig77dBMgF"
+ },
+ "source": [
+ "## Systolic Array\n",
+ "---"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "c_0UEEX4fn75"
+ },
+ "source": [
+ "### What is a Systolic Array\n",
+ "\n",
+ "A systolic array is a 2D array of processing elements (PEs), each of which can compute independently. Systolic arrays allow massively parallel computation, which is especially useful for machine learning applications that require a large number of MACs (multiply-accumulate operations) that do not depend on one another. This array construction not only enables massively parallel computation but also facilitates data reuse, which reduces the memory bottleneck. By scheduling operations correctly for something like matrix multiplication or 2D+ convolution, PEs that are next to each other can use the same data in their operations. Because of the PE arrangement, this data can be passed between PEs directly, which means it only has to be fetched once from memory. In this notebook we present a 3x3 systolic array which uses row stationary dataflow and show how it passes data between PEs for maximum data reuse and minimum required bandwidth."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Convolutions and Systolic Arrays\n",
+ "In the realm of signal processing and machine learning, convolution plays a fundamental role in various applications such as image processing, video processing, and digital filtering. A two-dimensional convolution (Conv2D) is a mathematical operation involving sliding a filter matrix over a larger input matrix to produce an output, which is a fundamental operation in many algorithms, including those employed in computer vision and machine learning applications.\n",
+ "The convolution operation naturally allows for significant data reuse, as any value from the input, filter, or output matrices will be used many times in different multiplication/addition calculations (MACs). The key to exploiting this opportunity for efficient convolutions is to use highly parallel hardware to reuse data loaded from memory as much as possible before it's returned to memory.\n",
+ "In a systolic array, data is loaded from memory and flows through a grid of identical processing elements (PEs), being reused differently in each PE over different clock cycles. The flow of data through the system can be compared to the flow of blood being pumped through the circulatory system.\n",
+ "Systolic arrays are very useful for matrix multiplication (GEMM), and before the row-stationary (RS) dataflow was used, convolution operations were converted into large GEMM operations before they could flow through the array of PEs. However, with the modified RS dataflow which we implemented in our design, the systolic-like array directly computes the Conv2D of the input and filter matrix efficiently."
+ ],
+ "metadata": {
+ "id": "fwC2HFbnlfwD"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XwpY9TT-fs0C"
+ },
+ "source": [
+ "#### Row Stationary Dataflow"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "pgWw7tPeBMgF"
+ },
+ "source": [
+ "In a row stationary dataflow, each processing element in the systolic array has a small amount of scratchpad memory devoted to keeping row data in place while it is operated on. In this mode, each processing element computes a single output of a 1D convolution, and those partial sums are then accumulated along the columns to form the final outputs. During the initial loading of the filter weights and row data, the full scratchpads need to be populated before any computation can occur. As the convolution moves across the rows, however, only one new byte of data needs to be read per PE, which makes this form of 2D convolution very memory efficient.\n",
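+ "\n",
+ "As a sketch of the arithmetic behind this dataflow (the symbols $I$, $F$, and $O$ for the input, the 3x3 filter, and the output are notation introduced here for illustration, and the no-flip convolution convention common in CNNs is assumed), each output value can be written as three 1D row convolutions whose partial sums are accumulated along a column of PEs:\n",
+ "\n",
+ "$$O[i][j] = \\sum_{r=0}^{2} \\underbrace{\\sum_{c=0}^{2} F[r][c] \\cdot I[i+r][j+c]}_{\\text{1D convolution in one PE}}$$\n",
+ "\n",
+ "The inner sum corresponds to the MACs one PE performs on a single filter row and a single input row, and the outer sum corresponds to the partial-sum accumulation across the three PEs of a column.\n",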
+ "\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "rk799ZBaBMgF"
+ },
+ "source": [
+ "#### Applications"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XJCgjjfJBMgH"
+ },
+ "source": [
+ "The main application for row stationary systolic arrays is 2D convolution. A convolution operation applies a filter kernel to a 2D input (for example, an image) and transforms it to pull out specific details. A single convolution may be able to pick out the edges of objects, as shown in this notebook, while a chain of convolutions, such as in a convolutional neural network, may be able to extract more complex shapes for object recognition, for example to identify a dog. Using a systolic array for 2D convolution is fast and efficient, which is why this hardware is the basis for many machine learning accelerators."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "nE8mvJTtf8Vj"
+ },
+ "source": [
+ "### How is the hardware designed?\n",
+ "\n",
+ "In order to show off the high memory efficiency of row stationary dataflow, the external memory connections for the top level of this design are very limited. With 2 read ports and 1 write port, this design is only able to read in 16 bits of data each cycle and write 8 bits. However, this data is reused across PEs allowing up to 9 MACs a cycle with different data combinations.\n",
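+ "\n",
+ "A rough estimate of the reuse this implies, assuming each MAC consumes two 8-bit operands (an assumption for illustration, not a measured figure): at the peak of 9 MACs per cycle, the PEs consume\n",
+ "\n",
+ "$$9 \\times 2 \\times 8 = 144 \\text{ bits of operands per cycle,}$$\n",
+ "\n",
+ "while only 16 bits are read from memory, so on average each fetched bit is used about $144 / 16 = 9$ times at peak.\n",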
+ "\n",
+ "\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IiS5mhFrBMgH"
+ },
+ "source": [
+ "#### Top Level Design"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xjEMHpZ2BMgH"
+ },
+ "source": [
+ "The top-level controller is responsible for controlling the timing of data reads and operation starts for all the PEs. After taking in the size of the input from the memory interface on the first cycle, the top-level controller schedules the control signals so that each PE reads the data on the memory bus when it is its turn. When a PE has the data it needs and it is its turn in the sequence to start its 1D convolution, the top-level controller asserts the start signal for that PE. Because the start times are staggered, the state machines inside the PEs run such that the data is automatically summed up the column of PEs and only one result is available for writing at a time.\n",
+ "\n",
+ "\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Wxlw9LhCBMgH"
+ },
+ "source": [
+ "#### PE Design"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "fMnogy0ZBMgH"
+ },
+ "source": [
+ "In order to reduce complexity and area the control structure inside each PE is kept very simple. When the PE sees a control signal to read in a new input or filter value from the top-level controller, it will read in a new value into the scratchpad and shift existing values over evicting the oldest value (with a depth of 3). Once the PE sees a start signal it will spend 3 cycles doing MACs with the scratchpad values and then sum with the input psum. With the PE start signals staggered across the array, the psum_o for one PE in a column becomes psum_i for the PE above it with the top PE presenting a final value at the output. These PEs always rely on the correct data being present at the correct time which is possible with the scheduling of the memory transactions and top-level controller.\n",
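+ "\n",
+ "In equation form (with $w_k$ and $x_k$ as illustrative names for the three filter and input values held in the scratchpads, and $p_{\\text{in}}$, $p_{\\text{out}}$ standing for psum_i and psum_o), the value a PE passes on after its 3 MAC cycles is roughly:\n",
+ "\n",
+ "$$p_{\\text{out}} = p_{\\text{in}} + \\sum_{k=0}^{2} w_k \\cdot x_k$$\n",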
+ "\n",
+ "\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "PXz0NAIzgDob"
+ },
+ "source": [
+ "## Simulation\n",
+ "---"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qxSqLtXoBMgH"
+ },
+ "source": [
+ "### Edge Detection with 2D Convolution Accelerated by a Systolic Array\n",
+ "\n",
+ "\n",
+ "To demonstrate our systolic array's ability to accelerate convolution, we perform Canny edge detection, which requires convolving an image. The edges of an image are enhanced after it is convolved with Sobel filters in the x and y directions separately. The filters are the 3x3 kernels shown below:\n",
+ "\n",
+ "$$\\text{Sobel Filter x} =\n",
+ "\\begin{bmatrix}\n",
+ "-0.5 & 0 & 0.5 \\\\\n",
+ "-1.0 & 0 & 1.0 \\\\\n",
+ "-0.5 & 0 & 0.5\n",
+ "\\end{bmatrix}$$\n",
+ "\n",
+ "$$\\text{Sobel Filter y} =\n",
+ "\\begin{bmatrix}\n",
+ "-0.5 & -1.0 & -0.5 \\\\\n",
+ "0 & 0 & 0 \\\\\n",
+ "0.5 & 1.0 & 0.5\n",
+ "\\end{bmatrix}$$\n",
+ "\n",
+ "The results are the first derivatives in the x and y directions, $grad_x$ and $grad_y$. We can then iterate through all the pixels and calculate the intensity gradient of the image, which represents the edges, with:\n",
+ "\n",
+ "$$\\text{Grad Intensity} = \\sqrt{grad_x ^ 2 + grad_y ^ 2}$$\n",
+ "\n",
+ "We use a Python implementation of the Canny edge detection algorithm as our golden reference to verify our systolic array design. A dedicated data sequence generator, developed to match the data flow of the hardware architecture, processes the image and kernels and generates the input sequence for the systolic array. The example image that undergoes the convolution operation in both the hardware simulation and the software is:\n",
+ "\n",
+ "\n",
+ "*(Example input image: `rubiks_cube.jpg`)*\n",
+ "\n",
+ "The demonstration has the following steps:\n",
+ "1. Install the software dependencies\n",
+ "2. Download the Python and Verilog files of our design\n",
+ "3. Run convolution in both software and hardware:\n",
+ "\n",
+ " a. Grayscale and resize the input image to 256 by 256\n",
+ " \n",
+ " b. In software, perform convolution and generate the golden image\n",
+ " \n",
+ " c. In hardware, perform convolution\n",
+ "\n",
+ " d. In software, process the hardware result and generate the output image\n",
+ "\n",
+ "4. Compare the golden image and the output image\n",
+ "\n",
+ "We first demonstrate with the Rubik's cube image. After this example, you can upload any image to try it out and see how well the systolic-array-accelerated edge detection performs.\n",
+ "\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [],
+ "metadata": {
+ "id": "DfQc3lWL4722"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "_nx0Xss6Nu3u",
+ "outputId": "8374f171-c07d-4e18-b3e7-6e3ecc8ceaf7"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Reading package lists... Done\n",
+ "Building dependency tree... Done\n",
+ "Reading state information... Done\n",
+ "verilator is already the newest version (4.038-1).\n",
+ "0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.\n",
+ "Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (2.2.1+cu121)\n",
+ "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch) (3.13.4)\n",
+ "Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch) (4.11.0)\n",
+ "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch) (1.12)\n",
+ "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch) (3.3)\n",
+ "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch) (3.1.3)\n",
+ "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch) (2023.6.0)\n",
+ "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch) (12.1.105)\n",
+ "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch) (12.1.105)\n",
+ "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch) (12.1.105)\n",
+ "Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /usr/local/lib/python3.10/dist-packages (from torch) (8.9.2.26)\n",
+ "Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.10/dist-packages (from torch) (12.1.3.1)\n",
+ "Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.10/dist-packages (from torch) (11.0.2.54)\n",
+ "Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.10/dist-packages (from torch) (10.3.2.106)\n",
+ "Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.10/dist-packages (from torch) (11.4.5.107)\n",
+ "Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.10/dist-packages (from torch) (12.1.0.106)\n",
+ "Requirement already satisfied: nvidia-nccl-cu12==2.19.3 in /usr/local/lib/python3.10/dist-packages (from torch) (2.19.3)\n",
+ "Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch) (12.1.105)\n",
+ "Requirement already satisfied: triton==2.2.0 in /usr/local/lib/python3.10/dist-packages (from torch) (2.2.0)\n",
+ "Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.10/dist-packages (from nvidia-cusolver-cu12==11.4.5.107->torch) (12.4.127)\n",
+ "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch) (2.1.5)\n",
+ "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch) (1.3.0)\n",
+ "Requirement already satisfied: torchvision in /usr/local/lib/python3.10/dist-packages (0.17.1+cu121)\n",
+ "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from torchvision) (1.25.2)\n",
+ "Requirement already satisfied: torch==2.2.1 in /usr/local/lib/python3.10/dist-packages (from torchvision) (2.2.1+cu121)\n",
+ "Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /usr/local/lib/python3.10/dist-packages (from torchvision) (9.4.0)\n",
+ "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (3.13.4)\n",
+ "Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (4.11.0)\n",
+ "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (1.12)\n",
+ "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (3.3)\n",
+ "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (3.1.3)\n",
+ "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (2023.6.0)\n",
+ "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (12.1.105)\n",
+ "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (12.1.105)\n",
+ "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (12.1.105)\n",
+ "Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (8.9.2.26)\n",
+ "Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (12.1.3.1)\n",
+ "Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (11.0.2.54)\n",
+ "Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (10.3.2.106)\n",
+ "Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (11.4.5.107)\n",
+ "Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (12.1.0.106)\n",
+ "Requirement already satisfied: nvidia-nccl-cu12==2.19.3 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (2.19.3)\n",
+ "Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (12.1.105)\n",
+ "Requirement already satisfied: triton==2.2.0 in /usr/local/lib/python3.10/dist-packages (from torch==2.2.1->torchvision) (2.2.0)\n",
+ "Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.10/dist-packages (from nvidia-cusolver-cu12==11.4.5.107->torch==2.2.1->torchvision) (12.4.127)\n",
+ "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch==2.2.1->torchvision) (2.1.5)\n",
+ "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch==2.2.1->torchvision) (1.3.0)\n",
+ "Requirement already satisfied: opencv-python in /usr/local/lib/python3.10/dist-packages (4.8.0.76)\n",
+ "Requirement already satisfied: numpy>=1.21.2 in /usr/local/lib/python3.10/dist-packages (from opencv-python) (1.25.2)\n",
+ "Requirement already satisfied: fxpmath in /usr/local/lib/python3.10/dist-packages (0.4.9)\n",
+ "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from fxpmath) (1.25.2)\n",
+ "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (1.25.2)\n"
+ ]
+ }
+ ],
+ "source": [
+ "#@title Install Dependencies {display-mode: \"form\"}\n",
+ "#@markdown Click the ▷ button to setup the simulation environment.\n",
+ "\n",
+ "#@markdown Main components we will install\n",
+ "\n",
+ "#@markdown * verilator : a free and open-source software tool which converts Verilog (a hardware description language) to a cycle-accurate behavioral model in C++ or SystemC.\n",
+ "#@markdown * pytorch : Used to format input data for the systolic array from the image files and do edge detection in software for the golden reference\n",
+ "#@markdown * opencv : Used for input image manipulation\n",
+ "#@markdown * fxpmath : This module helps emulate the fixed-point math behavior of our systolic array\n",
+ "\n",
+ "%load_ext autoreload\n",
+ "%autoreload 2\n",
+ "!apt-get install verilator\n",
+ "!pip install torch\n",
+ "!pip install torchvision\n",
+ "!pip install opencv-python\n",
+ "!pip install fxpmath\n",
+ "!pip install numpy"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "id": "ekdNyNjnBMgI",
+ "cellView": "form"
+ },
+ "outputs": [],
+ "source": [
+ "%%capture\n",
+ "\n",
+ "#@title Download Systolic Array Files\n",
+ "\n",
+ "#@markdown Click the ▷ button to download the RTL files.\n",
+ "#@markdown The files will be downloaded to the SystolicArray directory\n",
+ "#@markdown the file structure is described below:\n",
+ "\n",
+ "#@markdown * SystolicArray/src\n",
+ "#@markdown * python/\n",
+ "#@markdown * `canny.py` : python implementation of the Canny Edge Detection algorithm\n",
+ "#@markdown * `full_flow.py` : performs the edge detection on a given image with either software or hardware\n",
+ "#@markdown * `rubiks_cube.jpg` : the default example image\n",
+ "#@markdown * `seq_generator.py` : generates the Systolic Array input sequence from the image and kernel\n",
+ "#@markdown * `PE.sv` : This is the implementation of an individual processing element that is instantiated in a 3x3 array\n",
+ "#@markdown * `tb_top.cpp` : This is the verilator testbench used for running image data through the systolic array\n",
+ "#@markdown * `top.sv` : This is the top level file where all modules are instantiated and connected to each other / IO ports\n",
+ "#@markdown * `topLevelControl.sv` : This module generates control signals for all of the PEs based on the size of the input. It coordinates data loading as well as starts the 1D convolution operations in each PE\n",
+ "\n",
+ "%cd /content/\n",
+ "!rm -rf SystolicArray\n",
+ "!git clone https://github.com/SiliconJackets/sscs-ose-code-a-chip.github.io.git SystolicArray\n",
+ "!mv SystolicArray/VLSI24/submitted_notebooks/SJSystolicArray/src SystolicArray/\n",
+ "!mv SystolicArray/VLSI24/submitted_notebooks/SJSystolicArray/img SystolicArray/\n",
+ "!rm -rf SystolicArray/ISSCC23/\n",
+ "!rm -rf SystolicArray/ISSCC24/\n",
+ "!rm -rf SystolicArray/VLSI23/\n",
+ "!rm -rf SystolicArray/VLSI24/\n",
+ "!rm SystolicArray/*.md\n",
+ "!rm SystolicArray/LICENSE"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uE5D3C5IxlWy"
+ },
+ "source": [
+ "### Compile Verilator Testbench"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "QXfKFOdVTMJF",
+ "outputId": "59970d1c-b267-4f29-86cc-3c179e407884"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "/content\n",
+ "make: Entering directory '/content/obj_dir'\n",
+ "g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -Os -c -o tb_top.o ../SystolicArray/src/tb_top.cpp\n",
+ "g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -Os -c -o verilated.o /usr/share/verilator/include/verilated.cpp\n",
+ "g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -Os -c -o verilated_vcd_c.o /usr/share/verilator/include/verilated_vcd_c.cpp\n",
+ "/usr/bin/perl /usr/share/verilator/bin/verilator_includer -DVL_INCLUDE_OPT=include Vtop.cpp Vtop__Trace.cpp Vtop__Slow.cpp Vtop__Syms.cpp Vtop__Trace__Slow.cpp > Vtop__ALL.cpp\n",
+ "g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -Os -c -o Vtop__ALL.o Vtop__ALL.cpp\n",
+ "ar -cr Vtop__ALL.a Vtop__ALL.o\n",
+ "ranlib Vtop__ALL.a\n",
+ "g++ tb_top.o verilated.o verilated_vcd_c.o Vtop__ALL.a -o Vtop\n",
+ "make: Leaving directory '/content/obj_dir'\n"
+ ]
+ }
+ ],
+ "source": [
+ "%cd /content/\n",
+ "!rm -rf obj_dir\n",
+ "!verilator --trace --cc SystolicArray/src/top.sv SystolicArray/src/topLevelControl.sv SystolicArray/src/PE.sv --exe SystolicArray/src/tb_top.cpp\n",
+ "!make -C obj_dir -f Vtop.mk Vtop"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "z0vRPk19x0z2"
+ },
+ "source": [
+ "### Run 2D Convolution in both Software and Hardware\n",
+ "Using `full_flow.py`, we run an image through a software-only version of the edge detection algorithm to get a golden reference. Then we generate `seq_x.txt` and `seq_y.txt`, which are used as input data for the Verilator testbench. The results from Verilator are collected by the script and written to the image file `edge_rubiks_cube_sa.jpg`. All files can be found in the `SystolicArray/src/python/` directory."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "c4xh0hf5m552",
+ "outputId": "7e05221b-0f99-4ff4-e604-e0df64151a99"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "/content/SystolicArray/src/python\n"
+ ]
+ }
+ ],
+ "source": [
+ "%cd /content/SystolicArray/src/python/\n",
+ "!python3 full_flow.py rubikscube"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "cellView": "form",
+ "id": "4X678sSQxyc2",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 305
+ },
+ "outputId": "b950f39d-2601-4522-85b3-18c6a28b2fdd"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Systolic Array Edge Detection')"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 5
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": "\n"
+ },
+ "metadata": {}
+ }
+ ],
+ "source": [
+ "#@title Compare Results\n",
+ "\n",
+ "#@markdown Because the hardware is limited to 8-bit integer math, the output is not as bright as the software version, but it still achieves a similar-looking result.\n",
+ "\n",
+ "\n",
+ "# code for displaying multiple images in one figure\n",
+ "\n",
+ "#import libraries\n",
+ "import cv2\n",
+ "from matplotlib import pyplot as plt\n",
+ "\n",
+ "# create figure\n",
+ "fig = plt.figure(figsize=(10, 7))\n",
+ "\n",
+ "# setting values to rows and column variables\n",
+ "rows = 1\n",
+ "columns = 3\n",
+ "\n",
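+ "# reading images (OpenCV loads BGR; convert the color original to RGB for matplotlib)\n",
+ "Image1 = cv2.cvtColor(cv2.imread('/content/SystolicArray/src/python/rubiks_cube.jpg'), cv2.COLOR_BGR2RGB)\n",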
+ "Image2 = cv2.imread('/content/SystolicArray/src/python/edge_rubiks_cube.jpg')\n",
+ "Image3 = cv2.imread('/content/SystolicArray/src/python/edge_rubiks_cube_sa.jpg')\n",
+ "\n",
+ "#Adds a subplot at the 1st position\n",
+ "fig.add_subplot(rows, columns, 1)\n",
+ "\n",
+ "# showing image\n",
+ "plt.imshow(Image1)\n",
+ "plt.axis('off')\n",
+ "plt.title(\"Original\")\n",
+ "\n",
+ "# Adds a subplot at the 2nd position\n",
+ "fig.add_subplot(rows, columns, 2)\n",
+ "\n",
+ "# showing image\n",
+ "plt.imshow(Image2)\n",
+ "plt.axis('off')\n",
+ "plt.title(\"Software Edge Detection\")\n",
+ "\n",
+ "# Adds a subplot at the 3rd position\n",
+ "fig.add_subplot(rows, columns, 3)\n",
+ "\n",
+ "# showing image\n",
+ "plt.imshow(Image3)\n",
+ "plt.axis('off')\n",
+ "plt.title(\"Systolic Array Edge Detection\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "t4auaKP0BMgJ"
+ },
+ "source": [
+ "### Try it yourself"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "lwsAIO0WBMgJ",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 67,
+ "referenced_widgets": [
+ "0e66c80f52844968a03503ce83a094d3",
+ "466bae39c60e4d9e959805ad72c5b0ca",
+ "69a7b45aa0c8418dbf29b03f181842f8",
+ "a190641ce2f2487b9acfc91e931f7ece",
+ "dbeebae2d6e54cffb3f2bb1ba751071d",
+ "330f9dbba2714e35bcc1335fd74f8dcd"
+ ]
+ },
+ "cellView": "form",
+ "outputId": "f15a0271-6179-4b49-a1c6-cfa5f094a2b5"
+ },
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "FileUpload(value={}, accept='.jpg', description='Upload')"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "a190641ce2f2487b9acfc91e931f7ece"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Image successfully uploaded!\n"
+ ]
+ }
+ ],
+ "source": [
+ "#@markdown Click the ▷ button to upload your own image for edge detection\n",
+ "\n",
+ "from google.colab import files\n",
+ "import ipywidgets as widgets\n",
+ "from IPython.display import display, clear_output\n",
+ "\n",
+ "UPLOADED = False\n",
+ "\n",
+ "def upload_image(_):\n",
+ "    clear_output()\n",
+ "    upload_widget = widgets.FileUpload(accept='.jpg', multiple=False)\n",
+ "    display(upload_widget)\n",
+ "    upload_widget.observe(save_image, names='value')\n",
+ "\n",
+ "def save_image(change):\n",
+ "    global UPLOADED\n",
+ "    if change.new:\n",
+ "        uploaded_filename = next(iter(change.new))\n",
+ "        content = change.new[uploaded_filename]['content']\n",
+ "        with open('/content/SystolicArray/src/python/uploadedimage.jpg', 'wb') as f:\n",
+ "            f.write(content)\n",
+ "        UPLOADED = True\n",
+ "        print('Image successfully uploaded!')\n",
+ "    else:\n",
+ "        print('Please select a file.')\n",
+ "\n",
+ "upload_button = widgets.Button(description=\"Upload Image\")\n",
+ "upload_button.on_click(upload_image)\n",
+ "display(upload_button)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "2XB-luVMBMgJ",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 305
+ },
+ "cellView": "form",
+ "outputId": "e14c7766-a732-4ea5-ba07-8237428c68d8"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "/content/SystolicArray/src/python\n"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": "\n"
+ },
+ "metadata": {}
+ }
+ ],
+ "source": [
+ "#@markdown Click the ▷ button to start the demonstration with your image\n",
+ "\n",
+ "%cd /content/SystolicArray/src/python/\n",
+ "import cv2\n",
+ "from matplotlib import pyplot as plt\n",
+ "if not UPLOADED:\n",
+ "    print(\"First, upload a jpg in the cell above\")\n",
+ "else:\n",
+ "    !python3 full_flow.py userinput\n",
+ "    #@markdown Because the hardware is limited to 8-bit integer math, its output is not as bright as the software version, but the result still looks similar.\n",
+ "\n",
+ "\n",
+ "    # code for displaying multiple images in one figure\n",
+ "\n",
+ "\n",
+ "    # create figure\n",
+ "    fig = plt.figure(figsize=(10, 7))\n",
+ "\n",
+ "    # setting values to rows and column variables\n",
+ "    rows = 1\n",
+ "    columns = 3\n",
+ "\n",
+ "    # reading images (OpenCV loads them as BGR, so convert to RGB for matplotlib)\n",
+ "    Image1 = cv2.cvtColor(cv2.imread('/content/SystolicArray/src/python/uploadedimage.jpg'), cv2.COLOR_BGR2RGB)\n",
+ "    Image2 = cv2.cvtColor(cv2.imread('/content/SystolicArray/src/python/edge_uploadedimage.jpg'), cv2.COLOR_BGR2RGB)\n",
+ "    Image3 = cv2.cvtColor(cv2.imread('/content/SystolicArray/src/python/edge_uploadedimage_sa.jpg'), cv2.COLOR_BGR2RGB)\n",
+ "\n",
+ "    # Adds a subplot at the 1st position\n",
+ "    fig.add_subplot(rows, columns, 1)\n",
+ "\n",
+ "    # showing image\n",
+ "    plt.imshow(Image1)\n",
+ "    plt.axis('off')\n",
+ "    plt.title(\"Original\")\n",
+ "\n",
+ "    # Adds a subplot at the 2nd position\n",
+ "    fig.add_subplot(rows, columns, 2)\n",
+ "\n",
+ "    # showing image\n",
+ "    plt.imshow(Image2)\n",
+ "    plt.axis('off')\n",
+ "    plt.title(\"Software Edge Detection\")\n",
+ "\n",
+ "    # Adds a subplot at the 3rd position\n",
+ "    fig.add_subplot(rows, columns, 3)\n",
+ "\n",
+ "    # showing image\n",
+ "    plt.imshow(Image3)\n",
+ "    plt.axis('off')\n",
+ "    plt.title(\"Systolic Array Edge Detection\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "FeDMr6K6BMgJ"
+ },
+ "source": [
+ "### RTL2GDS Flow"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#@markdown For OpenLane to function properly, we need to remove the previously installed version of Verilator and install libparse. For everything to run the first time through the notebook, we also need to reset the notebook state, which this cell does with `%reset`. After you click the ▷ button for this cell, it prompts **Once deleted, variables cannot be recovered. Proceed (y/[n])?** at the bottom of its output. Please type `y`.\n",
+ "!apt remove -y verilator\n",
+ "!pip install libparse\n",
+ "%reset"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "xY2thuIDURi7",
+ "outputId": "2ba8a375-25d5-4614-a5bc-00fe55ab3420",
+ "cellView": "form"
+ },
+ "execution_count": 10,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Reading package lists... Done\n",
+ "Building dependency tree... Done\n",
+ "Reading state information... Done\n",
+ "Package 'verilator' is not installed, so not removed\n",
+ "The following packages were automatically installed and are no longer required:\n",
+ " libsystemc libsystemc-dev\n",
+ "Use 'apt autoremove' to remove them.\n",
+ "0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.\n",
+ "Requirement already satisfied: libparse in ./conda-env/lib/python3.7/site-packages (0.3.1)\n",
+ "Requirement already satisfied: wheel in ./conda-env/lib/python3.7/site-packages (from libparse) (0.38.4)\n",
+ "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
+ "\u001b[0mOnce deleted, variables cannot be recovered. Proceed (y/[n])? y\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Y6snMoxpslsr",
+ "outputId": "3db41f66-6e1d-4252-a26f-e9abb77d7227"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Reading package lists... Done\n",
+ "Building dependency tree... Done\n",
+ "Reading state information... Done\n",
+ "Package 'verilator' is not installed, so not removed\n",
+ "The following packages were automatically installed and are no longer required:\n",
+ " libsystemc libsystemc-dev\n",
+ "Use 'apt autoremove' to remove them.\n",
+ "0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.\n",
+ "Empty environment created at prefix: /content/conda-env\n",
+ "\n",
+ "Pinned packages:\n",
+ " - python 3.7*\n",
+ "\n",
+ "\n",
+ "Transaction\n",
+ "\n",
+ " Prefix: /content/conda-env\n",
+ "\n",
+ " Updating specs:\n",
+ "\n",
+ " - openlane=2023.11.03_0_gf4f8dad8\n",
+ " - open_pdks.sky130a=1.0.458_0_g8c68aca\n",
+ " - openroad=2.0_10927_g0922eecb9\n",
+ " - verilator=5.018_57_ga022b672a\n",
+ "\n",
+ "\n",
+ " Package Version Build Channel Size\n",
+ "──────────────────────────────────────────────────────────────────────────────────────────────────────────────\n",
+ " Install:\n",
+ "──────────────────────────────────────────────────────────────────────────────────────────────────────────────\n",
+ "\n",
+ " \u001b[32m+ open_pdks.sky130a \u001b[0m 1.0.458_0_g8c68aca 20231104_052339 litex-hub\u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ _libgcc_mutex \u001b[0m 0.1 main main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libstdcxx-ng \u001b[0m 11.2.0 h1234567_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ ca-certificates \u001b[0m 2024.3.11 h06a4308_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libstdcxx-devel_linux-64 \u001b[0m 11.2.0 h1234567_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libgcc-devel_linux-64 \u001b[0m 11.2.0 h1234567_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ ld_impl_linux-64 \u001b[0m 2.38 h1181459_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libgomp \u001b[0m 11.2.0 h1234567_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ _openmp_mutex \u001b[0m 5.1 1_gnu main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libgcc-ng \u001b[0m 11.2.0 h1234567_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libuuid \u001b[0m 1.41.5 h5eee18b_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ yaml \u001b[0m 0.1.7 had09818_2 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ gmp \u001b[0m 6.2.1 h295c915_3 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ lz4-c \u001b[0m 1.9.4 h6a678d5_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ bzip2 \u001b[0m 1.0.8 h5eee18b_5 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libev \u001b[0m 4.33 h7f8727e_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ c-ares \u001b[0m 1.19.1 h5eee18b_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libffi \u001b[0m 3.4.4 h6a678d5_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ fmt \u001b[0m 8.1.1 hd09550d_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ xz \u001b[0m 5.4.6 h5eee18b_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ pixman \u001b[0m 0.40.0 h7f8727e_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ ncurses \u001b[0m 6.4 h6a678d5_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ openssl \u001b[0m 1.1.1w h7f8727e_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libxcb \u001b[0m 1.15 h7f8727e_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ jpeg \u001b[0m 9e h5eee18b_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ icu \u001b[0m 58.2 he6710b0_3 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libiconv \u001b[0m 1.16 h7f8727e_2 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ expat \u001b[0m 2.5.0 h6a678d5_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ m4 \u001b[0m 1.4.18 h4e445db_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ zlib \u001b[0m 1.2.13 h5eee18b_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ spdlog \u001b[0m 1.9.2 hd09550d_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libedit \u001b[0m 3.1.20230828 h5eee18b_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ readline \u001b[0m 8.2 h5eee18b_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libssh2 \u001b[0m 1.10.0 h37d81fd_2 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ flex \u001b[0m 2.6.4 ha10e3a4_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ zstd \u001b[0m 1.5.5 hc292b87_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libnghttp2 \u001b[0m 1.52.0 ha637b67_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libpng \u001b[0m 1.6.39 h5eee18b_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ pcre2 \u001b[0m 10.42 hebb0a14_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ tk \u001b[0m 8.6.12 h1ccaba5_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ krb5 \u001b[0m 1.20.1 h568e23c_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ sqlite \u001b[0m 3.41.2 h5eee18b_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ bison \u001b[0m 3.7.5 h2531618_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libboost \u001b[0m 1.73.0 h28710b8_12 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ freetype \u001b[0m 2.12.1 h4a9f257_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libglib \u001b[0m 2.78.4 hdc74915_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ ruby \u001b[0m 2.5.1 haf1161a_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libcurl \u001b[0m 8.2.1 h91b91d3_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ python \u001b[0m 3.7.16 h7a1cb2a_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ glib-tools \u001b[0m 2.78.4 h6a678d5_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ curl \u001b[0m 8.2.1 h37d81fd_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ certifi \u001b[0m 2022.12.7 py37h06a4308_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ wheel \u001b[0m 0.38.4 py37h06a4308_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libgit2 \u001b[0m 1.6.4 ha637b67_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ setuptools \u001b[0m 65.6.3 py37h06a4308_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ pip \u001b[0m 22.3.1 py37h06a4308_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ pyyaml \u001b[0m 5.3.1 py37h7b6447c_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ zipp \u001b[0m 3.11.0 py37h06a4308_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ glib \u001b[0m 2.78.4 h6a678d5_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ gstreamer \u001b[0m 1.14.1 h5eee18b_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ dbus \u001b[0m 1.13.18 hb2f20db_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ gst-plugins-base \u001b[0m 1.14.1 h6a678d5_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ _sysroot_linux-64_curr_repodata_hack\u001b[0m 3 haa98f57_10 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ flit-core \u001b[0m 3.6.0 pyhd3eb1b0_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ kernel-headers_linux-64 \u001b[0m 3.10.0 h57e8cba_10 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ sysroot_linux-64 \u001b[0m 2.17 h57e8cba_10 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ libxml2 \u001b[0m 2.9.9 20220706_155948 litex-hub\u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ netgen \u001b[0m 1.5.272_0_g178af5f 20240223_100318 litex-hub\u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ yosys \u001b[0m 0.38_93_g84116c9a3 20240223_100318_py37 litex-hub\u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ tcllib \u001b[0m 1_21_150_g102aa4b6e 20240223_100318 litex-hub\u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ typing_extensions \u001b[0m 4.4.0 py37h06a4308_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ binutils_impl_linux-64 \u001b[0m 2.38 h2a08ee3_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ fontconfig \u001b[0m 2.13.0 h9420a91_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ importlib-metadata \u001b[0m 4.11.3 py37h06a4308_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ gcc_impl_linux-64 \u001b[0m 11.2.0 h1234567_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ binutils_linux-64 \u001b[0m 2.38.0 hc2dff05_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ cairo \u001b[0m 1.14.12 h8948797_3 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ qt \u001b[0m 5.9.7 h5867ecd_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ click \u001b[0m 8.0.4 py37h06a4308_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ gxx_impl_linux-64 \u001b[0m 11.2.0 h1234567_1 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ gcc_linux-64 \u001b[0m 11.2.0 h5c386dc_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ gxx_linux-64 \u001b[0m 11.2.0 hc2dff05_0 main \u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ magic \u001b[0m 8.3.450_0_g2133660 20231104_052339 litex-hub\u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ klayout \u001b[0m 0.28.17_212_gfa14afbbf 20240223_100318_py37 litex-hub\u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ openroad \u001b[0m 2.0_10927_g0922eecb9 20231104_052339_py37 litex-hub\u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ verilator \u001b[0m 5.018_57_ga022b672a 20231104_052339 litex-hub\u001b[32m Cached\u001b[0m\n",
+ " \u001b[32m+ openlane \u001b[0m 2023.11.03_0_gf4f8dad8 20231104_052339_py37 litex-hub\u001b[32m Cached\u001b[0m\n",
+ "\n",
+ " Summary:\n",
+ "\n",
+ " Install: 87 packages\n",
+ "\n",
+ " Total download: 0 B\n",
+ "\n",
+ "──────────────────────────────────────────────────────────────────────────────────────────────────────────────\n",
+ "\n",
+ "\n",
+ "\n",
+ "Transaction starting\n",
+ "Linking open_pdks.sky130a-1.0.458_0_g8c68aca-20231104_052339\n",
+ "Linking _libgcc_mutex-0.1-main\n",
+ "Linking libstdcxx-ng-11.2.0-h1234567_1\n",
+ "Linking ca-certificates-2024.3.11-h06a4308_0\n",
+ "Linking libstdcxx-devel_linux-64-11.2.0-h1234567_1\n",
+ "Linking libgcc-devel_linux-64-11.2.0-h1234567_1\n",
+ "Linking ld_impl_linux-64-2.38-h1181459_1\n",
+ "Linking libgomp-11.2.0-h1234567_1\n",
+ "Linking _openmp_mutex-5.1-1_gnu\n",
+ "Linking libgcc-ng-11.2.0-h1234567_1\n",
+ "Linking libuuid-1.41.5-h5eee18b_0\n",
+ "Linking yaml-0.1.7-had09818_2\n",
+ "Linking gmp-6.2.1-h295c915_3\n",
+ "Linking lz4-c-1.9.4-h6a678d5_0\n",
+ "Linking bzip2-1.0.8-h5eee18b_5\n",
+ "Linking libev-4.33-h7f8727e_1\n",
+ "Linking c-ares-1.19.1-h5eee18b_0\n",
+ "Linking libffi-3.4.4-h6a678d5_0\n",
+ "Linking fmt-8.1.1-hd09550d_1\n",
+ "Linking xz-5.4.6-h5eee18b_0\n",
+ "Linking pixman-0.40.0-h7f8727e_1\n",
+ "Linking ncurses-6.4-h6a678d5_0\n",
+ "Linking openssl-1.1.1w-h7f8727e_0\n",
+ "Linking libxcb-1.15-h7f8727e_0\n",
+ "Linking jpeg-9e-h5eee18b_1\n",
+ "Linking icu-58.2-he6710b0_3\n",
+ "Linking libiconv-1.16-h7f8727e_2\n",
+ "Linking expat-2.5.0-h6a678d5_0\n",
+ "Linking m4-1.4.18-h4e445db_0\n",
+ "Linking zlib-1.2.13-h5eee18b_0\n",
+ "Linking spdlog-1.9.2-hd09550d_0\n",
+ "Linking libedit-3.1.20230828-h5eee18b_0\n",
+ "Linking readline-8.2-h5eee18b_0\n",
+ "Linking libssh2-1.10.0-h37d81fd_2\n",
+ "Linking flex-2.6.4-ha10e3a4_1\n",
+ "Linking zstd-1.5.5-hc292b87_0\n",
+ "Linking libnghttp2-1.52.0-ha637b67_1\n",
+ "Linking libpng-1.6.39-h5eee18b_0\n",
+ "Linking pcre2-10.42-hebb0a14_0\n",
+ "Linking tk-8.6.12-h1ccaba5_0\n",
+ "Linking krb5-1.20.1-h568e23c_1\n",
+ "Linking sqlite-3.41.2-h5eee18b_0\n",
+ "Linking bison-3.7.5-h2531618_1\n",
+ "Linking libboost-1.73.0-h28710b8_12\n",
+ "Linking freetype-2.12.1-h4a9f257_0\n",
+ "Linking libglib-2.78.4-hdc74915_0\n",
+ "Linking ruby-2.5.1-haf1161a_0\n",
+ "Linking libcurl-8.2.1-h91b91d3_0\n",
+ "Linking python-3.7.16-h7a1cb2a_0\n",
+ "Linking glib-tools-2.78.4-h6a678d5_0\n",
+ "Linking curl-8.2.1-h37d81fd_0\n",
+ "Linking certifi-2022.12.7-py37h06a4308_0\n",
+ "Linking wheel-0.38.4-py37h06a4308_0\n",
+ "Linking libgit2-1.6.4-ha637b67_0\n",
+ "Linking setuptools-65.6.3-py37h06a4308_0\n",
+ "Linking pip-22.3.1-py37h06a4308_0\n",
+ "Linking pyyaml-5.3.1-py37h7b6447c_0\n",
+ "Linking zipp-3.11.0-py37h06a4308_0\n",
+ "Linking glib-2.78.4-h6a678d5_0\n",
+ "Linking gstreamer-1.14.1-h5eee18b_1\n",
+ "Linking dbus-1.13.18-hb2f20db_0\n",
+ "Linking gst-plugins-base-1.14.1-h6a678d5_1\n",
+ "Linking _sysroot_linux-64_curr_repodata_hack-3-haa98f57_10\n",
+ "Linking flit-core-3.6.0-pyhd3eb1b0_0\n",
+ "Linking kernel-headers_linux-64-3.10.0-h57e8cba_10\n",
+ "Linking sysroot_linux-64-2.17-h57e8cba_10\n",
+ "Linking libxml2-2.9.9-20220706_155948\n",
+ "Linking netgen-1.5.272_0_g178af5f-20240223_100318\n",
+ "Linking yosys-0.38_93_g84116c9a3-20240223_100318_py37\n",
+ "Linking tcllib-1_21_150_g102aa4b6e-20240223_100318\n",
+ "Linking typing_extensions-4.4.0-py37h06a4308_0\n",
+ "Linking binutils_impl_linux-64-2.38-h2a08ee3_1\n",
+ "\u001b[33m\u001b[1mwarning libmamba\u001b[m [binutils_impl_linux-64-2.38-h2a08ee3_1] The following files were already present in the environment:\n",
+ " - lib/liblsan.so\n",
+ " - lib/liblsan.so.0\n",
+ " - lib/liblsan.so.0.0.0\n",
+ " - share/info/libgomp.info\n",
+ " - share/info/libquadmath.info\n",
+ " - share/licenses/gcc-libs/RUNTIME.LIBRARY.EXCEPTION\n",
+ " - share/licenses/gcc-libs/RUNTIME.LIBRARY.EXCEPTION.gomp_copy\n",
+ " - share/licenses/libstdc++/RUNTIME.LIBRARY.EXCEPTION\n",
+ "Linking fontconfig-2.13.0-h9420a91_0\n",
+ "Linking importlib-metadata-4.11.3-py37h06a4308_0\n",
+ "Linking gcc_impl_linux-64-11.2.0-h1234567_1\n",
+ "Linking binutils_linux-64-2.38.0-hc2dff05_0\n",
+ "Linking cairo-1.14.12-h8948797_3\n",
+ "Linking qt-5.9.7-h5867ecd_1\n",
+ "Linking click-8.0.4-py37h06a4308_0\n",
+ "Linking gxx_impl_linux-64-11.2.0-h1234567_1\n",
+ "Linking gcc_linux-64-11.2.0-h5c386dc_0\n",
+ "Linking gxx_linux-64-11.2.0-hc2dff05_0\n",
+ "Linking magic-8.3.450_0_g2133660-20231104_052339\n",
+ "Linking klayout-0.28.17_212_gfa14afbbf-20240223_100318_py37\n",
+ "Linking openroad-2.0_10927_g0922eecb9-20231104_052339_py37\n",
+ "Linking verilator-5.018_57_ga022b672a-20231104_052339\n",
+ "Linking openlane-2023.11.03_0_gf4f8dad8-20231104_052339_py37\n",
+ "\n",
+ "Transaction finished\n",
+ "\n",
+ "To activate this environment, use:\n",
+ "\n",
+ " micromamba activate /content/conda-env\n",
+ "\n",
+ "Or to execute a single command in this environment, use:\n",
+ "\n",
+ " micromamba run -p /content/conda-env mycommand\n",
+ "\n",
+ "\u001b[33m\u001b[1mwarning libmamba\u001b[m [libblas-3.9.0-16_linux64_openblas] The following files were already present in the environment:\n",
+ " - lib/libblas.so\n",
+ "\u001b[33m\u001b[1mwarning libmamba\u001b[m [libcblas-3.9.0-16_linux64_openblas] The following files were already present in the environment:\n",
+ " - lib/libcblas.so\n",
+ "\u001b[33m\u001b[1mwarning libmamba\u001b[m [liblapack-3.9.0-16_linux64_openblas] The following files were already present in the environment:\n",
+ " - lib/liblapack.so\n",
+ "Collecting libparse\n",
+ " Using cached libparse-0.3.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)\n",
+ "Requirement already satisfied: wheel in ./conda-env/lib/python3.7/site-packages (from libparse) (0.38.4)\n",
+ "Installing collected packages: libparse\n",
+ "Successfully installed libparse-0.3.1\n",
+ "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
+ "\u001b[0menv: CONDA_PREFIX=/content/conda-env\n",
+ "env: PATH=/content/conda-env/bin:/content/conda-env/bin:/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin\n"
+ ]
+ }
+ ],
+ "source": [
+ "#@title Install Dependencies {display-mode: \"form\"}\n",
+ "#@markdown Click the ▷ button to set up the digital design environment based on [conda-eda](https://github.com/hdl/conda-eda).\n",
+ "\n",
+ "#@markdown Main components we will install:\n",
+ "\n",
+ "#@markdown * open_pdks.sky130a: a PDK installer for open-source EDA tools, here providing the sky130A PDK.\n",
+ "#@markdown * OpenLane: an automated RTL to GDSII flow based on several components including OpenROAD, Yosys, Magic, Netgen, CVC, SPEF-Extractor, KLayout and a number of custom scripts for design exploration and optimization.\n",
+ "#@markdown * GDSTK: a C++ library for creating and manipulating GDSII and OASIS files.\n",
+ "\n",
+ "!apt remove -y verilator\n",
+ "#openlane_version = 'custom_set' #@param {type:\"string\"}\n",
+ "#open_pdks_version = 'custom_set' #@param {type:\"string\"}\n",
+ "\n",
+ "#if openlane_version == 'latest':\n",
+ "# openlane_version = ''\n",
+ "#if open_pdks_version == 'latest':\n",
+ "# open_pdks_version = ''\n",
+ "\n",
+ "import os\n",
+ "import pathlib\n",
+ "\n",
+ "!curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xj bin/micromamba\n",
+ "conda_prefix_path = pathlib.Path('conda-env')\n",
+ "CONDA_PREFIX = str(conda_prefix_path.resolve())\n",
+ "!bin/micromamba create --yes --prefix $CONDA_PREFIX\n",
+ "!echo 'python ==3.7*' >> {CONDA_PREFIX}/conda-meta/pinned\n",
+ "!CI=0 bin/micromamba install --yes --prefix $CONDA_PREFIX \\\n",
+ " --channel litex-hub \\\n",
+ " --channel main \\\n",
+ " openlane={\"2023.11.03_0_gf4f8dad8\"} \\\n",
+ " open_pdks.sky130a={\"1.0.458_0_g8c68aca\"} \\\n",
+ " openroad={\"2.0_10927_g0922eecb9\"} \\\n",
+ " verilator={\"5.018_57_ga022b672a\"}\n",
+ "!bin/micromamba install --quiet \\\n",
+ " --yes \\\n",
+ " --prefix $CONDA_PREFIX \\\n",
+ " --channel conda-forge \\\n",
+ " --channel main \\\n",
+ " gdstk\n",
+ "\n",
+ "!pip install libparse\n",
+ "PATH = os.environ['PATH']\n",
+ "%env CONDA_PREFIX={CONDA_PREFIX}\n",
+ "%env PATH={CONDA_PREFIX}/bin:{PATH}\n",
+ "#%reset"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "2JPlqKcqZjg8",
+ "outputId": "6dfd44e3-143a-4559-bfe9-54437a8f114b"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Overwriting config.json\n"
+ ]
+ }
+ ],
+ "source": [
+ "%%writefile config.json\n",
+ "{\n",
+ " \"DESIGN_NAME\": \"top\",\n",
+ " \"VERILOG_FILES\": \"dir::SystolicArray/src/*.sv\",\n",
+ " \"CLOCK_PERIOD\": 40,\n",
+ " \"CLOCK_NET\": \"clk\",\n",
+ " \"CLOCK_PORT\": \"clk\",\n",
+ "\n",
+ " \"FP_SIZING\": \"absolute\",\n",
+ " \"DIE_AREA\": \"0 0 480 200\",\n",
+ " \"PL_TARGET_DENSITY\": 0.8\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Run Flow\n",
+ "If the flow fails with a Verilator (linter) error or a libparse error (at step 34), restart the runtime and rerun the Install Dependencies cell. Rerunning Install Dependencies without restarting may also work."
+ ],
+ "metadata": {
+ "id": "2fgWsyDvzvgD"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "pgpxBIVAaSIo",
+ "outputId": "1293ffa6-c504-4680-adb8-bb3f31eae7cf"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "env: PDK=sky130A\n",
+ "OpenLane 2023.11.03_0_gf4f8dad8-conda\n",
+ "All rights reserved. (c) 2020-2022 Efabless Corporation and contributors.\n",
+ "Available under the Apache License, version 2.0. See the LICENSE file for more details.\n",
+ "\n",
+ "\u001b[36m[INFO]: Using configuration in 'config.json'...\u001b[39m\n",
+ "\u001b[36m[INFO]: PDK Root: /content/conda-env/share/pdk\u001b[39m\n",
+ "\u001b[36m[INFO]: Process Design Kit: sky130A\u001b[39m\n",
+ "\u001b[36m[INFO]: Standard Cell Library: sky130_fd_sc_hd\u001b[39m\n",
+ "\u001b[36m[INFO]: Optimization Standard Cell Library: sky130_fd_sc_hd\u001b[39m\n",
+ "\u001b[36m[INFO]: Run Directory: /content/runs/RUN_2024.04.15_18.01.03\u001b[39m\n",
+ "\u001b[36m[INFO]: Saving runtime environment...\u001b[39m\n",
+ "\u001b[36m[INFO]: Preparing LEF files for the nom corner...\u001b[39m\n",
+ "\u001b[36m[INFO]: Preparing LEF files for the min corner...\u001b[39m\n",
+ "\u001b[36m[INFO]: Preparing LEF files for the max corner...\u001b[39m\n",
+ "\u001b[33m[WARNING]: PNR_SDC_FILE is not set. It is recommended to write a custom SDC file for the design. Defaulting to BASE_SDC_FILE\u001b[39m\n",
+ "\u001b[33m[WARNING]: SIGNOFF_SDC_FILE is not set. It is recommended to write a custom SDC file for the design. Defaulting to BASE_SDC_FILE\u001b[39m\n",
+ "\u001b[36m[INFO]: Running linter (Verilator) (log: runs/RUN_2024.04.15_18.01.03/logs/synthesis/linter.log)...\u001b[39m\n",
+ "\u001b[36m[INFO]: 0 errors found by linter\u001b[39m\n",
+ "\u001b[33m[WARNING]: 10 warnings found by linter\u001b[39m\n",
+ "[STEP 1]\n",
+ "\u001b[36m[INFO]: Running Synthesis (log: runs/RUN_2024.04.15_18.01.03/logs/synthesis/1-synthesis.log)...\u001b[39m\n",
+ "[STEP 2]\n",
+ "\u001b[36m[INFO]: Running Single-Corner Static Timing Analysis (log: runs/RUN_2024.04.15_18.01.03/logs/synthesis/2-sta.log)...\u001b[39m\n",
+ "[STEP 3]\n",
+ "\u001b[36m[INFO]: Running Initial Floorplanning (log: runs/RUN_2024.04.15_18.01.03/logs/floorplan/3-initial_fp.log)...\u001b[39m\n",
+ "\u001b[36m[INFO]: Floorplanned with width 468.74 and height 176.8.\u001b[39m\n",
+ "[STEP 4]\n",
+ "\u001b[36m[INFO]: Running IO Placement (log: runs/RUN_2024.04.15_18.01.03/logs/floorplan/4-io.log)...\u001b[39m\n",
+ "[STEP 5]\n",
+ "\u001b[36m[INFO]: Running Tap/Decap Insertion (log: runs/RUN_2024.04.15_18.01.03/logs/floorplan/5-tap.log)...\u001b[39m\n",
+ "\u001b[36m[INFO]: Power planning with power {VPWR} and ground {VGND}...\u001b[39m\n",
+ "[STEP 6]\n",
+ "\u001b[36m[INFO]: Generating PDN (log: runs/RUN_2024.04.15_18.01.03/logs/floorplan/6-pdn.log)...\u001b[39m\n",
+ "[STEP 7]\n",
+ "\u001b[36m[INFO]: Running Global Placement (log: runs/RUN_2024.04.15_18.01.03/logs/placement/7-global_skip_io.log)...\u001b[39m\n",
+ "[STEP 8]\n",
+ "\u001b[36m[INFO]: Running Single-Corner Static Timing Analysis (log: runs/RUN_2024.04.15_18.01.03/logs/placement/8-gpl_sta.log)...\u001b[39m\n",
+ "[STEP 9]\n",
+ "\u001b[36m[INFO]: Running IO Placement (log: runs/RUN_2024.04.15_18.01.03/logs/placement/9-io.log)...\u001b[39m\n",
+ "[STEP 10]\n",
+ "\u001b[36m[INFO]: Running Global Placement (log: runs/RUN_2024.04.15_18.01.03/logs/placement/10-global.log)...\u001b[39m\n",
+ "[STEP 11]\n",
+ "\u001b[36m[INFO]: Running Single-Corner Static Timing Analysis (log: runs/RUN_2024.04.15_18.01.03/logs/placement/11-gpl_sta.log)...\u001b[39m\n",
+ "[STEP 12]\n",
+ "\u001b[36m[INFO]: Running Placement Resizer Design Optimizations (log: runs/RUN_2024.04.15_18.01.03/logs/placement/12-resizer.log)...\u001b[39m\n",
+ "[STEP 13]\n",
+ "\u001b[36m[INFO]: Running Detailed Placement (log: runs/RUN_2024.04.15_18.01.03/logs/placement/13-detailed.log)...\u001b[39m\n",
+ "[STEP 14]\n",
+ "\u001b[36m[INFO]: Running Single-Corner Static Timing Analysis (log: runs/RUN_2024.04.15_18.01.03/logs/placement/14-dpl_sta.log)...\u001b[39m\n",
+ "[STEP 15]\n",
+ "\u001b[36m[INFO]: Running Clock Tree Synthesis (log: runs/RUN_2024.04.15_18.01.03/logs/cts/15-cts.log)...\u001b[39m\n",
+ "[STEP 16]\n",
+ "\u001b[36m[INFO]: Running Single-Corner Static Timing Analysis (log: runs/RUN_2024.04.15_18.01.03/logs/cts/16-cts_sta.log)...\u001b[39m\n",
+ "[STEP 17]\n",
+ "\u001b[36m[INFO]: Running Placement Resizer Timing Optimizations (log: runs/RUN_2024.04.15_18.01.03/logs/cts/17-resizer.log)...\u001b[39m\n",
+ "[STEP 18]\n",
+ "\u001b[36m[INFO]: Running Global Routing Resizer Design Optimizations (log: runs/RUN_2024.04.15_18.01.03/logs/routing/18-resizer_design.log)...\u001b[39m\n",
+ "[STEP 19]\n",
+ "\u001b[36m[INFO]: Running Single-Corner Static Timing Analysis (log: runs/RUN_2024.04.15_18.01.03/logs/routing/19-rsz_design_sta.log)...\u001b[39m\n",
+ "[STEP 20]\n",
+ "\u001b[36m[INFO]: Running Global Routing Resizer Timing Optimizations (log: runs/RUN_2024.04.15_18.01.03/logs/routing/20-resizer_timing.log)...\u001b[39m\n",
+ "[STEP 21]\n",
+ "\u001b[36m[INFO]: Running Single-Corner Static Timing Analysis (log: runs/RUN_2024.04.15_18.01.03/logs/routing/21-rsz_timing_sta.log)...\u001b[39m\n",
+ "[STEP 22]\n",
+ "\u001b[36m[INFO]: Running Global Routing (log: runs/RUN_2024.04.15_18.01.03/logs/routing/22-global.log)...\u001b[39m\n",
+ "\u001b[36m[INFO]: Starting OpenROAD Antenna Repair Iterations...\u001b[39m\n",
+ "[STEP 23]\n",
+ "\u001b[36m[INFO]: Writing Verilog (log: runs/RUN_2024.04.15_18.01.03/logs/routing/22-global_write_netlist.log)...\u001b[39m\n",
+ "[STEP 24]\n",
+ "\u001b[36m[INFO]: Running Single-Corner Static Timing Analysis (log: runs/RUN_2024.04.15_18.01.03/logs/routing/24-grt_sta.log)...\u001b[39m\n",
+ "[STEP 25]\n",
+ "\u001b[36m[INFO]: Running Fill Insertion (log: runs/RUN_2024.04.15_18.01.03/logs/routing/25-fill.log)...\u001b[39m\n",
+ "[STEP 26]\n",
+ "\u001b[36m[INFO]: Running Detailed Routing (log: runs/RUN_2024.04.15_18.01.03/logs/routing/26-detailed.log)...\u001b[39m\n",
+ "\u001b[36m[INFO]: No DRC violations after detailed routing.\u001b[39m\n",
+ "[STEP 27]\n",
+ "\u001b[36m[INFO]: Checking Wire Lengths (log: runs/RUN_2024.04.15_18.01.03/logs/routing/27-wire_lengths.log)...\u001b[39m\n",
+ "[STEP 28]\n",
+ "\u001b[36m[INFO]: Running SPEF Extraction at the min process corner (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/28-parasitics_extraction.min.log)...\u001b[39m\n",
+ "[STEP 29]\n",
+ "\u001b[36m[INFO]: Running Multi-Corner Static Timing Analysis at the min process corner (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/29-rcx_mcsta.min.log)...\u001b[39m\n",
+ "[STEP 30]\n",
+ "\u001b[36m[INFO]: Running SPEF Extraction at the max process corner (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/30-parasitics_extraction.max.log)...\u001b[39m\n",
+ "[STEP 31]\n",
+ "\u001b[36m[INFO]: Running Multi-Corner Static Timing Analysis at the max process corner (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/31-rcx_mcsta.max.log)...\u001b[39m\n",
+ "[STEP 32]\n",
+ "\u001b[36m[INFO]: Running SPEF Extraction at the nom process corner (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/32-parasitics_extraction.nom.log)...\u001b[39m\n",
+ "[STEP 33]\n",
+ "\u001b[36m[INFO]: Running Multi-Corner Static Timing Analysis at the nom process corner (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/33-rcx_mcsta.nom.log)...\u001b[39m\n",
+ "\u001b[33m[WARNING]: Module sky130_fd_sc_hd__tapvpwrvgnd_1 blackboxed during sta\u001b[39m\n",
+ "\u001b[33m[WARNING]: Module sky130_ef_sc_hd__decap_12 blackboxed during sta\u001b[39m\n",
+ "\u001b[33m[WARNING]: Module sky130_fd_sc_hd__fill_1 blackboxed during sta\u001b[39m\n",
+ "\u001b[33m[WARNING]: Module sky130_fd_sc_hd__fill_2 blackboxed during sta\u001b[39m\n",
+ "[STEP 34]\n",
+ "\u001b[36m[INFO]: Creating IR Drop Report (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/34-irdrop.log)...\u001b[39m\n",
+ "\u001b[33m[WARNING]: VSRC_LOC_FILES is not defined. The IR drop analysis will run, but the values may be inaccurate.\u001b[39m\n",
+ "[STEP 35]\n",
+ "\u001b[36m[INFO]: Running Magic to generate various views...\u001b[39m\n",
+ "\u001b[36m[INFO]: Streaming out GDSII with Magic (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/35-gdsii.log)...\u001b[39m\n",
+ "\u001b[36m[INFO]: Generating MAGLEF views...\u001b[39m\n",
+ "\u001b[36m[INFO]: Generating lef with Magic (/content/runs/RUN_2024.04.15_18.01.03/logs/signoff/35-lef.log)...\u001b[39m\n",
+ "[STEP 36]\n",
+ "\u001b[36m[INFO]: Streaming out GDSII with KLayout (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/36-gdsii-klayout.log)...\u001b[39m\n",
+ "\u001b[33m[WARNING]: 'runs/RUN_2024.04.15_18.01.03/results/signoff/top.klayout.gds' wasn't found. Skipping GDS XOR.\u001b[39m\n",
+ "[STEP 37]\n",
+ "\u001b[36m[INFO]: Running Magic Spice Export from LEF (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/37-spice.log)...\u001b[39m\n",
+ "[STEP 38]\n",
+ "\u001b[36m[INFO]: Writing Powered Verilog (logs: runs/RUN_2024.04.15_18.01.03/logs/signoff/38-write_powered_def.log, runs/RUN_2024.04.15_18.01.03/logs/signoff/38-write_powered_verilog.log)...\u001b[39m\n",
+ "[STEP 39]\n",
+ "\u001b[36m[INFO]: Writing Verilog (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/38-write_powered_verilog.log)...\u001b[39m\n",
+ "[STEP 40]\n",
+ "\u001b[36m[INFO]: Running LVS (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/40-lvs.lef.log)...\u001b[39m\n",
+ "[STEP 41]\n",
+ "\u001b[36m[INFO]: Running Magic DRC (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/41-drc.log)...\u001b[39m\n",
+ "\u001b[36m[INFO]: Converting Magic DRC database to various tool-readable formats...\u001b[39m\n",
+ "\u001b[36m[INFO]: No DRC violations after GDS streaming out.\u001b[39m\n",
+ "[STEP 42]\n",
+ "\u001b[36m[INFO]: Running OpenROAD Antenna Rule Checker (log: runs/RUN_2024.04.15_18.01.03/logs/signoff/42-arc.log)...\u001b[39m\n",
+ "\u001b[36m[INFO]: Saving current set of views in 'runs/RUN_2024.04.15_18.01.03/results/final'...\u001b[39m\n",
+ "\u001b[36m[INFO]: Saving runtime environment...\u001b[39m\n",
+ "\u001b[36m[INFO]: Generating final set of reports...\u001b[39m\n",
+ "\u001b[36m[INFO]: Created manufacturability report at 'runs/RUN_2024.04.15_18.01.03/reports/manufacturability.rpt'.\u001b[39m\n",
+ "\u001b[36m[INFO]: Created metrics report at 'runs/RUN_2024.04.15_18.01.03/reports/metrics.csv'.\u001b[39m\n",
+ "\u001b[33m[WARNING]: There are max fanout violations in the design at the Typical corner. Please refer to 'runs/RUN_2024.04.15_18.01.03/reports/signoff/33-sta-rcx_nom/multi_corner_sta.checks.rpt'.\u001b[39m\n",
+ "\u001b[36m[INFO]: There are no hold violations in the design at the Typical corner.\u001b[39m\n",
+ "\u001b[36m[INFO]: There are no setup violations in the design at the Typical corner.\u001b[39m\n",
+ "\u001b[32m[SUCCESS]: Flow complete.\u001b[39m\n",
+ "\u001b[36m[INFO]: Note that the following warnings have been generated:\u001b[39m\n",
+ "\u001b[33m[WARNING]: PNR_SDC_FILE is not set. It is recommended to write a custom SDC file for the design. Defaulting to BASE_SDC_FILE\n",
+ "[WARNING]: SIGNOFF_SDC_FILE is not set. It is recommended to write a custom SDC file for the design. Defaulting to BASE_SDC_FILE\n",
+ "[WARNING]: 10 warnings found by linter\n",
+ "[WARNING]: Module sky130_fd_sc_hd__tapvpwrvgnd_1 blackboxed during sta\n",
+ "[WARNING]: Module sky130_ef_sc_hd__decap_12 blackboxed during sta\n",
+ "[WARNING]: Module sky130_fd_sc_hd__fill_1 blackboxed during sta\n",
+ "[WARNING]: Module sky130_fd_sc_hd__fill_2 blackboxed during sta\n",
+ "[WARNING]: VSRC_LOC_FILES is not defined. The IR drop analysis will run, but the values may be inaccurate.\n",
+ "[WARNING]: 'runs/RUN_2024.04.15_18.01.03/results/signoff/top.klayout.gds' wasn't found. Skipping GDS XOR.\n",
+ "[WARNING]: There are max fanout violations in the design at the Typical corner. Please refer to 'runs/RUN_2024.04.15_18.01.03/reports/signoff/33-sta-rcx_nom/multi_corner_sta.checks.rpt'.\n",
+ "\u001b[39m\n"
+ ]
+ }
+ ],
+ "source": [
+ "%env PDK=sky130A\n",
+ "!flow.tcl -design ."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "gnz4w7-ofBv4",
+ "cellView": "form"
+ },
+ "outputs": [],
+ "source": [
+ "#@title View Results\n",
+ "#@markdown Click the ▷ button to generate an SVG from the GDS\n",
+ "#@markdown In our testing, the SVG sometimes failed to display or was too large to render properly, so we converted it to PNG offline for viewing. The result is displayed below.\n",
+ "import pathlib\n",
+ "import gdstk\n",
+ "\n",
+ "gdss = sorted(pathlib.Path('runs').glob('*/results/final/gds/*.gds'))\n",
+ "library = gdstk.read_gds(gdss[-1])\n",
+ "top_cells = library.top_level()\n",
+ "top_cells[0].write_svg('systolicarray.svg')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ ""
+ ],
+ "metadata": {
+ "id": "Le05msrOFkVZ"
+ }
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.18"
+ },
+ "widgets": {
+ "application/vnd.jupyter.widget-state+json": {
+ "0e66c80f52844968a03503ce83a094d3": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "ButtonModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "ButtonModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "ButtonView",
+ "button_style": "",
+ "description": "Upload Image",
+ "disabled": false,
+ "icon": "",
+ "layout": "IPY_MODEL_466bae39c60e4d9e959805ad72c5b0ca",
+ "style": "IPY_MODEL_69a7b45aa0c8418dbf29b03f181842f8",
+ "tooltip": ""
+ }
+ },
+ "466bae39c60e4d9e959805ad72c5b0ca": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "69a7b45aa0c8418dbf29b03f181842f8": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "ButtonStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "ButtonStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "button_color": null,
+ "font_weight": ""
+ }
+ },
+ "a190641ce2f2487b9acfc91e931f7ece": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "FileUploadModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_counter": 1,
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "FileUploadModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "FileUploadView",
+ "accept": ".jpg",
+ "button_style": "",
+ "data": [
+ null
+ ],
+ "description": "Upload",
+ "description_tooltip": null,
+ "disabled": false,
+ "error": "",
+ "icon": "upload",
+ "layout": "IPY_MODEL_dbeebae2d6e54cffb3f2bb1ba751071d",
+ "metadata": [
+ {
+ "name": "Gibbon_Wallpaper.jpg",
+ "type": "image/jpeg",
+ "size": 280843,
+ "lastModified": 1709176092406
+ }
+ ],
+ "multiple": false,
+ "style": "IPY_MODEL_330f9dbba2714e35bcc1335fd74f8dcd"
+ }
+ },
+ "dbeebae2d6e54cffb3f2bb1ba751071d": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "330f9dbba2714e35bcc1335fd74f8dcd": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "ButtonStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "ButtonStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "button_color": null,
+ "font_weight": ""
+ }
+ }
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/img/Ctrl.png b/VLSI24/submitted_notebooks/SJSystolicArray/img/Ctrl.png
new file mode 100644
index 00000000..946e16c6
Binary files /dev/null and b/VLSI24/submitted_notebooks/SJSystolicArray/img/Ctrl.png differ
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/img/PE.png b/VLSI24/submitted_notebooks/SJSystolicArray/img/PE.png
new file mode 100644
index 00000000..af22ed36
Binary files /dev/null and b/VLSI24/submitted_notebooks/SJSystolicArray/img/PE.png differ
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/img/Top.png b/VLSI24/submitted_notebooks/SJSystolicArray/img/Top.png
new file mode 100644
index 00000000..9d140889
Binary files /dev/null and b/VLSI24/submitted_notebooks/SJSystolicArray/img/Top.png differ
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/img/systolic_array_flow.gif b/VLSI24/submitted_notebooks/SJSystolicArray/img/systolic_array_flow.gif
new file mode 100644
index 00000000..c389ad12
Binary files /dev/null and b/VLSI24/submitted_notebooks/SJSystolicArray/img/systolic_array_flow.gif differ
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/img/systolicarray.jpg b/VLSI24/submitted_notebooks/SJSystolicArray/img/systolicarray.jpg
new file mode 100644
index 00000000..148aecd0
Binary files /dev/null and b/VLSI24/submitted_notebooks/SJSystolicArray/img/systolicarray.jpg differ
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/img/systolicarray.png b/VLSI24/submitted_notebooks/SJSystolicArray/img/systolicarray.png
new file mode 100644
index 00000000..688e0e94
Binary files /dev/null and b/VLSI24/submitted_notebooks/SJSystolicArray/img/systolicarray.png differ
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/src/PE.sv b/VLSI24/submitted_notebooks/SJSystolicArray/src/PE.sv
new file mode 100644
index 00000000..282c6c4e
--- /dev/null
+++ b/VLSI24/submitted_notebooks/SJSystolicArray/src/PE.sv
@@ -0,0 +1,109 @@
+module PE
+ (
+ input logic clk_i, rstn_i,
+ input logic [9:0] psum_i,
+ input logic [7:0] filter_i,
+ input logic [7:0] ifmap_i,
+ input logic read_new_filter_val,
+ input logic read_new_ifmap_val,
+ input logic start_conv,
+ output logic [9:0] psum_o,
+ output logic psum_valid_o
+ );
+
+ //Scratchpad regs
+ logic signed [7:0] filter_spad [0:2];
+ logic signed [7:0] ifmap_spad [0:2];
+ logic signed [9:0] psum_spad;
+
+ //psum buffer reg
+ logic signed [9:0] psum_buffer;
+
+ //datapath wires
+ // logic signed [DATA_SIZE-1:0] mult_input_filter, mult_input_ifmap; //wires between regs and multiplier
+ logic signed [15:0] mult_out_raw; //full multiplication result
+ logic signed [9:0] mult_out_trunc;
+ logic signed [9:0] adder_input, adder_output, psum_spad_input; // adder_input is a mux output: selects either the truncated product or the psum from the PE above
+
+ //counter reg and wires
+ logic [1:0] counter; //Tells which regs to use in scratchpad
+ logic [1:0] next_counter; // 1 + index
+ logic acc_psum;
+
+ //state reg and wire
+ logic next_calculating;
+ logic calculating;
+
+
+ always_comb begin
+ //============= Time to accumulate psum? ===============
+ acc_psum = (counter == 2'd3);
+
+ //============= Next State ==================
+ if ((!calculating && start_conv) || (calculating && !acc_psum)) next_calculating = '1;
+ else next_calculating = '0;
+
+ //============= Next Counter =================
+ next_counter = calculating ? counter + 1 : '0;
+
+ //============= Multiplication ===============
+ mult_out_raw = filter_spad[counter] * ifmap_spad[counter];
+ mult_out_trunc = mult_out_raw[15:6]; //truncate to 10 bits
+
+ //============= Accumulation ================
+ adder_input = acc_psum ? psum_i : mult_out_trunc;
+ adder_output = adder_input + psum_spad;
+ psum_spad_input = (calculating && !acc_psum) ? adder_output : '0;
+
+ //============= Set Output =================
+ psum_o = psum_buffer;
+ end
+
+ always_ff @(posedge clk_i, negedge rstn_i) begin
+ if (!rstn_i) begin
+ //============ set all the registers to 0 =========
+ counter <= '0;
+ for (int i = 0; i < 3; i++) begin
+ filter_spad[i] <= '0;
+ ifmap_spad[i] <= '0;
+ end
+ psum_spad <= '0;
+ psum_buffer <= '0;
+ calculating <= '0;
+ psum_valid_o <= '0;
+
+ end else begin
+ //========== update state ===========
+ calculating <= next_calculating;
+
+ //========== update counter =============
+ counter <= next_counter;
+
+ //========== update filter scratchpad =============
+ if (read_new_filter_val) begin
+ for (int i = 0; i < 2; i++) begin
+ filter_spad[i] <= filter_spad[i+1];
+ end
+ filter_spad[2] <= filter_i;
+ end
+
+ //========== update ifmap scratchpad =============
+ if (read_new_ifmap_val) begin
+ for (int i = 0; i < 2; i++) begin
+ ifmap_spad[i] <= ifmap_spad[i+1];
+ end
+ ifmap_spad[2] <= ifmap_i;
+ end
+
+ //========= update psum buffer ==========
+ if (acc_psum) psum_buffer <= adder_output;
+
+ //========= update psum scratchpad ======
+ psum_spad <= psum_spad_input;
+
+ //============= valid bit ===================
+ psum_valid_o <= acc_psum;
+ end
+ end
+
+endmodule
\ No newline at end of file
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/src/python/canny.py b/VLSI24/submitted_notebooks/SJSystolicArray/src/python/canny.py
new file mode 100644
index 00000000..49c43ccd
--- /dev/null
+++ b/VLSI24/submitted_notebooks/SJSystolicArray/src/python/canny.py
@@ -0,0 +1,220 @@
+'''
+Reference:
+https://towardsdatascience.com/implement-canny-edge-detection-from-scratch-with-pytorch-a1cccfa58bed
+'''
+import numpy as np
+import torch
+import torch.nn as nn
+import torchvision.transforms as transforms
+import cv2
+
+def get_gaussian_kernel(k=3, mu=0, sigma=1, normalize=True):
+ # compute 1 dimension gaussian
+ gaussian_1D = np.linspace(-1, 1, k)
+ # compute a grid distance from center
+ x, y = np.meshgrid(gaussian_1D, gaussian_1D)
+ distance = (x ** 2 + y ** 2) ** 0.5
+
+ # compute the 2 dimension gaussian
+ gaussian_2D = np.exp(-(distance - mu) ** 2 / (2 * sigma ** 2))
+ gaussian_2D = gaussian_2D / (2 * np.pi *sigma **2)
+
+ # normalize part (mathematically)
+ if normalize:
+ gaussian_2D = gaussian_2D / np.sum(gaussian_2D)
+ return gaussian_2D
+
+def get_sobel_kernel(k=3):
+    # get the axis range (named axis_range to avoid shadowing the built-in range)
+    axis_range = np.linspace(-(k // 2), k // 2, k)
+    # compute a grid of the numerator and the axis distances
+    x, y = np.meshgrid(axis_range, axis_range)
+    sobel_2D_numerator = x
+    sobel_2D_denominator = (x ** 2 + y ** 2)
+    sobel_2D_denominator[:, k // 2] = 1  # avoid division by zero
+    sobel_2D = sobel_2D_numerator / sobel_2D_denominator
+    return sobel_2D
+
+
+def get_thin_kernels(start=0, end=360, step=45):
+ k_thin = 3 # actual size of the directional kernel
+ # temporarily enlarge the kernel to avoid interpolation artifacts when rotating
+ k_increased = k_thin + 2
+
+ # get 0° angle directional kernel
+ thin_kernel_0 = np.zeros((k_increased, k_increased))
+ thin_kernel_0[k_increased // 2, k_increased // 2] = 1
+ thin_kernel_0[k_increased // 2, k_increased // 2 + 1:] = -1
+
+ # rotate the 0° angle directional kernel to get the other ones
+ thin_kernels = []
+ for angle in range(start, end, step):
+ (h, w) = thin_kernel_0.shape
+ # get the center to not rotate around the (0, 0) coord point
+ center = (w // 2, h // 2)
+ # apply rotation
+ rotation_matrix = cv2.getRotationMatrix2D(center, angle, 1)
+ kernel_angle_increased = cv2.warpAffine(thin_kernel_0, rotation_matrix, (w, h), cv2.INTER_NEAREST)
+
+ # get the k=3 kernel
+ kernel_angle = kernel_angle_increased[1:-1, 1:-1]
+ is_diag = (abs(kernel_angle) == 1) # because of the interpolation
+ kernel_angle = kernel_angle * is_diag # because of the interpolation
+ thin_kernels.append(kernel_angle)
+ return thin_kernels
+
+
+def write_to_pt_file(data, filename, print_data=False):
+ torch.save(data, filename)
+ if print_data:
+ print(data)
+
+
+class CannyFilter(nn.Module):
+ def __init__(self,
+ k_gaussian=3,
+ mu=0,
+ sigma=1,
+ k_sobel=3,
+ use_cuda=False):
+ super(CannyFilter, self).__init__()
+ # device
+ self.device = 'cuda' if use_cuda else 'cpu'
+
+ # sobel
+ sobel_2D = get_sobel_kernel(k_sobel)
+ self.sobel_filter_x = nn.Conv2d(in_channels=1,
+ out_channels=1,
+ kernel_size=k_sobel,
+ padding=k_sobel // 2,
+ bias=False)
+ self.sobel_filter_y = nn.Conv2d(in_channels=1,
+ out_channels=1,
+ kernel_size=k_sobel,
+ padding=k_sobel // 2,
+ bias=False)
+ with torch.no_grad():
+ self.sobel_filter_x.weight.copy_(
+ torch.from_numpy(sobel_2D).unsqueeze(0).unsqueeze(0).float())
+ with torch.no_grad():
+ self.sobel_filter_y.weight.copy_(
+ torch.from_numpy(sobel_2D.T).unsqueeze(0).unsqueeze(0).float())
+
+
+ # thin
+ thin_kernels = get_thin_kernels()
+ directional_kernels = np.stack(thin_kernels)
+ self.directional_filter = nn.Conv2d(in_channels=1,
+ out_channels=8,
+ kernel_size=thin_kernels[0].shape,
+ padding=thin_kernels[0].shape[-1] // 2,
+ bias=False)
+ with torch.no_grad():
+ self.directional_filter.weight.copy_(
+ torch.from_numpy(directional_kernels).unsqueeze(1).float())
+
+ # hysteresis
+ hysteresis = np.ones((3, 3)) + 0.25
+ self.hysteresis = nn.Conv2d(in_channels=1,
+ out_channels=1,
+ kernel_size=3,
+ padding=1,
+ bias=False)
+ with torch.no_grad():
+ self.hysteresis.weight.copy_(
+ torch.from_numpy(hysteresis).unsqueeze(0).unsqueeze(0).float())
+
+
+ def forward(self, img, low_threshold=None, high_threshold=None, hysteresis=False,
+ use_sa=False, grad_x_sa=0, grad_y_sa=0):
+ # initialize the tensors for each step
+ B, C, H, W = img.shape
+ grad_x = torch.zeros((B, 1, H, W)).to(self.device)
+ grad_y = torch.zeros((B, 1, H, W)).to(self.device)
+ grad_magnitude = torch.zeros((B, 1, H, W)).to(self.device)
+ grad_orientation = torch.zeros((B, 1, H, W)).to(self.device)
+
+ # sobel
+ if use_sa: # calculate the grads with the Systolic Array
+ grad_x = grad_x_sa
+ grad_y = grad_y_sa
+ else: # calculate the grads with Python
+ for c in range(C):
+ soble_result_x = self.sobel_filter_x(img[:, c:c+1])
+ soble_result_y = self.sobel_filter_y(img[:, c:c+1])
+ grad_x = grad_x + soble_result_x
+ grad_y = grad_y + soble_result_y
+ write_to_pt_file(img[:, c:c+1], f'img_{c}.pt')
+ write_to_pt_file(soble_result_x, f'soble_result_x_{c}.pt')
+ write_to_pt_file(soble_result_y, f'soble_result_y_{c}.pt')
+ write_to_pt_file(self.sobel_filter_x.weight, f'soble_filter_x_weight.pt')
+ write_to_pt_file(self.sobel_filter_y.weight, f'soble_filter_y_weight.pt')
+
+ # thick edges
+ grad_x, grad_y = grad_x / C, grad_y / C
+ grad_magnitude = (grad_x ** 2 + grad_y ** 2) ** 0.5
+ grad_orientation = torch.atan(grad_y / grad_x)
+ grad_orientation = grad_orientation * (360 / np.pi) + 180 # convert to degree
+ grad_orientation = torch.round(grad_orientation / 45) * 45 # keep a split by 45
+
+ # thin edges
+ directional = self.directional_filter(grad_magnitude)
+ # get indices of positive and negative directions
+ positive_idx = (grad_orientation / 45) % 8
+ negative_idx = ((grad_orientation / 45) + 4) % 8
+ thin_edges = grad_magnitude.clone()
+ # non maximum suppression direction by direction
+ for pos_i in range(4):
+ neg_i = pos_i + 4
+ # get the oriented grad for the angle
+ is_oriented_i = (positive_idx == pos_i) * 1
+ is_oriented_i = is_oriented_i + (positive_idx == neg_i) * 1
+ pos_directional = directional[:, pos_i]
+ neg_directional = directional[:, neg_i]
+ selected_direction = torch.stack([pos_directional, neg_directional])
+ # get the local maximum pixels for the angle
+ is_max = selected_direction.min(dim=0)[0] > 0.0
+ is_max = torch.unsqueeze(is_max, dim=1)
+ # apply non maximum suppression
+ to_remove = (is_max == 0) * 1 * (is_oriented_i) > 0
+ thin_edges[to_remove] = 0.0
+
+ # thresholds
+ if low_threshold is not None:
+ low = thin_edges > low_threshold
+ if high_threshold is not None:
+ high = thin_edges > high_threshold
+ # get black/gray/white only
+ thin_edges = low * 0.5 + high * 0.5
+ if hysteresis:
+ # get weaks and check if they are high or not
+ weak = (thin_edges == 0.5) * 1
+ weak_is_high = (self.hysteresis(thin_edges) > 1) * weak
+ thin_edges = high * 1 + weak_is_high * 1
+ else:
+ thin_edges = low * 1
+ return grad_x, grad_y, grad_magnitude, grad_orientation, thin_edges
+
+
+def main():
+ # Load the input image
+ image = cv2.imread('rubiks_cube.jpg')
+ image = cv2.resize(image, (256, 256)) # original 256*256
+
+ gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
+
+
+ # Convert the image to Torch tensor
+ img_tensor = torch.from_numpy(image) # transform(image)
+ img_tensor = img_tensor.permute(2, 0, 1).unsqueeze(0)
+
+ # Run inference
+ model = CannyFilter()
+ grad_x, grad_y, grad_magnitude, grad_orientation, thin_edges = model(img_tensor.float())
+
+ # Save image results
+ cv2.imwrite('edge_rubiks_cube.jpg', grad_magnitude[0].permute(1, 2, 0).detach().numpy())
+
+
+if __name__ == '__main__':
+ main()
\ No newline at end of file
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/src/python/full_flow.py b/VLSI24/submitted_notebooks/SJSystolicArray/src/python/full_flow.py
new file mode 100644
index 00000000..ff48b884
--- /dev/null
+++ b/VLSI24/submitted_notebooks/SJSystolicArray/src/python/full_flow.py
@@ -0,0 +1,126 @@
+import torch
+import torch.nn.functional as F
+import cv2
+from canny import CannyFilter
+from seq_generator import (gen_load_seq_idx, gen_load_seq, binary, fp32_to_fxps86binary,
+ fp32_to_fxps86, add_result_seq, gen_load_result_seq)
+
+import os
+import sys
+
+if len(sys.argv) != 2:
+    print("Usage: python3 full_flow.py <filename>\n filename is either \"rubikscube\" or \"userinput\"")
+    sys.exit(1)
+filenamearg = sys.argv[1].strip()
+if filenamearg == "rubikscube":
+    name = 'rubiks_cube'
+elif filenamearg == "userinput":
+    name = 'uploadedimage'
+else:
+    sys.exit("Usage: python3 full_flow.py <filename>\n filename is either \"rubikscube\" or \"userinput\"")  # exit here so `name` is never left undefined
+filename = name + '.jpg'
+edge_filename = 'edge_' + name + '.jpg'
+edge_sa_filename = 'edge_' + name + '_sa.jpg'
+
+
+filter_size = 3
+pad = filter_size // 2
+image_size = 256
+psum_size = int(image_size - 2*((filter_size-1)/2) + 2*pad)
+
+
+# Read image, resize and convert to grayscale
+image = cv2.imread(filename)
+image = cv2.resize(image, (image_size, image_size)) # original 256*256
+gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
+
+# Convert the image to Torch tensor
+img_tensor = torch.from_numpy(gray_image).unsqueeze(2) # transform(image)
+img_tensor = img_tensor.permute(2, 0, 1).unsqueeze(0)
+
+# Run inference
+model = CannyFilter()
+grad_x, grad_y, grad_magnitude, grad_orientation, thin_edges = model(img_tensor.float())
+
+# Save image results
+cv2.imwrite(edge_filename, grad_magnitude[0].permute(1, 2, 0).detach().numpy())
+
+# Generate load sequence for Systolic Array
+ifmap = torch.load('img_0.pt').squeeze()//2
+ifmap = F.pad(ifmap, (pad, pad, pad, pad), "constant", 0)
+psum = torch.zeros((psum_size, psum_size), dtype=torch.int8)
+load_seq_idx = gen_load_seq_idx(image_size+2, filter_size)
+load_seq_idx = add_result_seq(load_seq_idx)
+
+# Sequence for convolving image and sobel_filter_x to get grad_x
+filter = torch.load('soble_filter_x_weight.pt').squeeze()
+filter = fp32_to_fxps86binary(filter.squeeze())
+result_x = torch.load('soble_result_x_0.pt').squeeze()//2
+result_x = torch.clamp(result_x, min=-127, max=127)
+load_seq_x = gen_load_result_seq(ifmap, filter, psum, result_x, load_seq_idx)
+f = open("seq_x.txt", "w")
+f.write("255,214,0\n")
+for seq in torch.tensor(load_seq_x):
+ print_seq = ','.join([str(x) for x in seq.to(dtype=torch.uint8).tolist()])
+ f.write(f'{print_seq}\n')
+f.close()
+
+# Sequence for convolving image and sobel_filter_y to get grad_y
+filter = torch.load('soble_filter_y_weight.pt').squeeze()
+filter = fp32_to_fxps86binary(filter.squeeze())
+result_y = torch.load('soble_result_y_0.pt').squeeze()//2
+result_y = torch.clamp(result_y, min=-127, max=127)
+load_seq_y = gen_load_result_seq(ifmap, filter, psum, result_y, load_seq_idx)
+f = open("seq_y.txt", "w")
+f.write("255,214,0\n")
+for seq in torch.tensor(load_seq_y):
+ print_seq = ','.join([str(x) for x in seq.to(dtype=torch.uint8).tolist()])
+ f.write(f'{print_seq}\n')
+f.close()
+
+os.system("cp seq_x.txt /content/convInput.txt")
+os.system("/content/obj_dir/Vtop > /content/SystolicArray/src/python/seq_x_SA.txt")
+
+# Get the Systolic Array result
+result_x_seq_sa = []
+with open("seq_x_SA.txt", "r") as filestream:
+ filestream.readline()
+ for line in filestream:
+ line = line.strip()
+ if line:
+ elems = line.split(",")
+ result_x_seq_sa += [[int(elems[0]), int(elems[1]), int(elems[2])]]
+result_x_seq_sa = torch.tensor(result_x_seq_sa)
+result_x_sa = torch.zeros(result_x.shape, dtype=torch.int8)
+for seq, ele in zip(load_seq_idx, result_x_seq_sa):
+ if seq[2][0] != -1:
+ result_x_sa[seq[2][1], seq[2][2]] = ele[2]
+
+os.system("cp seq_y.txt /content/convInput.txt")
+os.system("/content/obj_dir/Vtop > /content/SystolicArray/src/python/seq_y_SA.txt")
+#os.system("rm /content/convInput.txt")
+
+result_y_seq_sa = []
+with open("seq_y_SA.txt", "r") as filestream:
+ filestream.readline()
+ for line in filestream:
+ line = line.strip()
+ if line:
+ elems = line.split(",")
+ result_y_seq_sa += [[int(elems[0]), int(elems[1]), int(elems[2])]]
+result_y_seq_sa = torch.tensor(result_y_seq_sa)
+result_y_sa = torch.zeros(result_y.shape, dtype=torch.int8)
+for seq, ele in zip(load_seq_idx, result_y_seq_sa):
+ if seq[2][0] != -1:
+ result_y_sa[seq[2][1], seq[2][2]] = ele[2]
+
+# Verify Systolic Array result
+compare = result_x_sa.eq(result_x)
+# print(f'Systolic Array Result Correct: {torch.all(compare)}')
+
+# Plot Systolic Array result
+grad_x, grad_y, grad_magnitude, grad_orientation, thin_edges = model(img_tensor.float(),
+ use_sa=True,
+ grad_x_sa=result_x_sa.unsqueeze(0).unsqueeze(0)*2,
+ grad_y_sa=result_y_sa.unsqueeze(0).unsqueeze(0)*2)
+cv2.imwrite(edge_sa_filename, grad_magnitude[0].permute(1, 2, 0).detach().numpy())
\ No newline at end of file
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/src/python/rubiks_cube.jpg b/VLSI24/submitted_notebooks/SJSystolicArray/src/python/rubiks_cube.jpg
new file mode 100644
index 00000000..052a789f
Binary files /dev/null and b/VLSI24/submitted_notebooks/SJSystolicArray/src/python/rubiks_cube.jpg differ
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/src/python/seq_generator.py b/VLSI24/submitted_notebooks/SJSystolicArray/src/python/seq_generator.py
new file mode 100644
index 00000000..cc3023db
--- /dev/null
+++ b/VLSI24/submitted_notebooks/SJSystolicArray/src/python/seq_generator.py
@@ -0,0 +1,231 @@
+
+import torch
+import argparse
+import numpy as np
+from fxpmath import Fxp
+
+INIT, LOAD, RELOAD, RSLT = 0, 1, 2, 3
+IFMP, FILT, PSUM = 0, 1, 2
+IN1, IN2 = 0, 1
+NUM_PE = 3
+
+def get_int8_matrix(m, n):
+ return torch.randint(-127, 127, (m, n), dtype=torch.int8)
+
+def get_rand01_matrix(m, n):
+ return torch.rand(m, n)
+
+def binary(x, reverse=False, bits=8):
+ if reverse: # for readability
+ mask = 2**torch.arange(bits-1,-1,-1).to(x.device, x.dtype)
+ else:
+ mask = 2**torch.arange(bits).to(x.device, x.dtype)
+ return x.unsqueeze(-1).bitwise_and(mask).ne(0).byte()
+
+
+def fp32_to_fxps86(matrix):
+ return Fxp(matrix.detach().numpy(), dtype='fxp-s8/6') # 8bit: 1b-sign, 1b-int, 6b-frac
+
+
+def fxps86_to_binary(matrix):
+ matrix = [[np.packbits(list(map(int, ele))) for ele in row] for row in matrix.bin()]
+ return torch.tensor(np.array(matrix)).squeeze()
+
+
+def fp32_to_fxps86binary(matrix):
+ return fxps86_to_binary(fp32_to_fxps86(matrix))
+
+
+def ceildiv(a, b):
+ return int(-(a // -b))
+
+
+def gen_load_seq_idx(ifmap_size, filter_size):
+ # data_idx = [matrix, row, col]
+ load_seq = []
+
+ edge = int((filter_size - 1)/2)
+ output_size = ifmap_size - edge*2
+ num_row_iter = ceildiv(output_size, NUM_PE)
+ row_coverage = NUM_PE+edge*2
+
+ state = INIT
+ for iter in range(num_row_iter+1):
+ row_offset = iter*NUM_PE # 0, 3, 6 ...
+ for ifmp_col in range(ifmap_size): # 0, 1 ..., 27
+ for row in range(row_coverage): # 0, 1, 2, 3, 4
+ target_row = row_offset + row
+ in1_idx = [IFMP, target_row, ifmp_col]
+ if state == INIT:
+ if row < NUM_PE: # 0, 1, 2
+ in2_idx = [FILT, target_row, ifmp_col]
+ elif row == row_coverage-1: # 4
+ load_seq[-1][IN2] = [IFMP, target_row, ifmp_col]
+ if ifmp_col == (filter_size-1):
+ state = LOAD # enter load state sequence
+ in1_idx, in2_idx = [-1,-1,-1], [-1,-1,-1]
+ load_seq += [[in1_idx, in2_idx]]
+ continue
+ load_seq += [[in1_idx, in2_idx]]
+ elif state == LOAD:
+ if row < row_coverage-1:
+ in1_idx = [IFMP, target_row, ifmp_col]
+ in2_idx = [-1,-1,-1]
+ else:
+ in1_idx = [-1,-1,-1]
+ in2_idx = [IFMP, target_row, ifmp_col]
+ if row < NUM_PE: # PSUM load handling
+ if not (target_row >= output_size):
+ if row == 0:
+ load_seq[-1][IN1] = [PSUM, target_row, ifmp_col-3]
+ else:
+ load_seq[-1][IN2] = [PSUM, target_row, ifmp_col-3]
+ if (ifmp_col == (ifmap_size-1)) and (row == (row_coverage-1)) :
+ state = RELOAD
+ if target_row >= ifmap_size:
+ in1_idx, in2_idx = [-1,-1,-1], [-1,-1,-1]
+ load_seq += [[in1_idx, in2_idx]]
+ continue
+ load_seq += [[in1_idx, in2_idx]]
+ elif state == RELOAD:
+ if row < NUM_PE and ifmp_col==0: # PSUM load handling
+ if not (target_row-filter_size >= output_size):
+ if row == 0:
+ load_seq[-1][IN1] = [PSUM, target_row-filter_size, output_size-1]
+ else:
+ load_seq[-1][IN2] = [PSUM, target_row-filter_size, output_size-1]
+ if row == row_coverage-1: # 4
+ if not (target_row >= ifmap_size):
+ load_seq[-1][IN2] = [IFMP, target_row, ifmp_col]
+ if ifmp_col == (filter_size-1):
+ state = LOAD # enter load state sequence
+ in1_idx, in2_idx = [-1,-1,-1], [-1,-1,-1]
+ load_seq += [[in1_idx, in2_idx]]
+ continue
+ if target_row >= ifmap_size:
+ in1_idx, in2_idx = [-1,-1,-1], [-1,-1,-1]
+ load_seq += [[in1_idx, in2_idx]]
+ continue
+ in2_idx = [-1,-1,-1]
+ load_seq += [[in1_idx, in2_idx]]
+ return load_seq
+
+
+def add_result_seq(load_seq):
+ for idx, seq in enumerate(load_seq):
+ if len(seq)==3:
+ in1, in2, in3 = seq
+ else:
+ in1, in2 = seq
+ load_seq[idx] += [[-1,-1,-1]]
+ if in1[0]==PSUM:
+ load_seq[idx+3] += [[RSLT, in1[1], in1[2]]]
+ elif in2[0]==PSUM:
+ load_seq[idx+3] += [[RSLT, in2[1], in2[2]]]
+
+ return load_seq
+
+
+def gen_load_seq(ifmap, filter, psum, load_seq_idx):
+ load_seq = []
+ in_matrix = [ifmap, filter, psum]
+
+ for seq in load_seq_idx:
+ if seq[0][0] == -1:
+ in1 = 0
+ else:
+ in1 = in_matrix[seq[0][0]][seq[0][1]][seq[0][2]]
+ if seq[1][0] == -1:
+ in2 = 0
+ else:
+ in2 = in_matrix[seq[1][0]][seq[1][1]][seq[1][2]]
+ load_seq += [[in1, in2]]
+
+ return load_seq
+
+
+def gen_load_result_seq(ifmap, filter, psum, result, load_seq_idx):
+ load_seq = []
+ in_matrix = [ifmap, filter, psum, result]
+
+ for seq in load_seq_idx:
+ if seq[0][0] == -1:
+ in1 = 0
+ else:
+ in1 = in_matrix[seq[0][0]][seq[0][1]][seq[0][2]]
+ if seq[1][0] == -1:
+ in2 = 0
+ else:
+ in2 = in_matrix[seq[1][0]][seq[1][1]][seq[1][2]]
+ if seq[2][0] == -1:
+ in3 = 0
+ else:
+ in3 = in_matrix[seq[2][0]][seq[2][1]][seq[2][2]]
+ load_seq += [[in1, in2, in3]]
+
+ return load_seq
+
+
+def main():
+ parser = argparse.ArgumentParser(description='PyTorch INT8 Conv Example')
+ parser.add_argument('--m', type=int, default=6,
+ help='ifmap m x m (default: 6)')
+ parser.add_argument('--n', type=int, default=3,
+ help='filter n x n (default: 3)')
+
+ parser.add_argument('--no-binary', action='store_true', default=False,
+ help='disables printing in binary 2s complement')
+ parser.add_argument('--no-cuda', action='store_true', default=False,
+ help='disables CUDA')
+ parser.add_argument('--seed', type=int, default=1,
+ help='random seed (default: 1)')
+ args = parser.parse_args()
+ torch.manual_seed(args.seed)
+
+ use_cuda = not args.no_cuda and torch.cuda.is_available()
+ device = torch.device("cuda") if use_cuda else torch.device("cpu")
+
+ filter_fp32 = get_rand01_matrix(2,3)
+ filter_fxp = fp32_to_fxps86(filter_fp32)
+ filter_fxpbin = fxps86_to_binary(filter_fxp)
+ print(filter_fp32)
+ print(filter_fxp)
+ print(filter_fxpbin)
+
+ load_seq_idx = gen_load_seq_idx(args.m, args.n)
+ # load_seq_idx = (torch.tensor(load_seq_idx) + 1).tolist()
+ for seq_idx in load_seq_idx:
+ print(seq_idx)
+
+ ifmap = get_int8_matrix(args.m, args.m)
+ filter = get_int8_matrix(args.n, args.n)
+ result = torch.nn.functional.conv2d(torch.unsqueeze(torch.unsqueeze(ifmap,0),0).type(torch.int32),
+ torch.unsqueeze(torch.unsqueeze(filter,0),0).type(torch.int32))
+ result = torch.squeeze(result)
+
+ psum = torch.zeros((args.m - args.n + 1, args.m - args.n + 1), dtype=torch.int8) # output is (m-n+1) x (m-n+1)
+ load_seq = gen_load_seq(ifmap, filter, psum, load_seq_idx)
+ print("========================")
+ for seq in torch.tensor(load_seq):
+ print(seq.tolist())
+
+ print("========================")
+ for seq in binary(torch.tensor(load_seq)):
+ print(seq.tolist())
+
+ print("\n- ifmap - INT8 ---------------")
+ print(ifmap)
+ print("\n- filter - INT8 ---------------")
+ print(filter)
+ print("\n- Conv result - INT32 ----------")
+ print(result)
+ if not args.no_binary:
+ print("\n- ifmap - 2's ----------------")
+ print(binary(ifmap, reverse=True))
+ print("\n- filter - 2's ----------------")
+ print(binary(filter, reverse=True))
+ print("\n- result - 2's -----------")
+ print(binary(result, reverse=True, bits=32))
+
+if __name__ == '__main__':
+ main()
\ No newline at end of file
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/src/tb_top.cpp b/VLSI24/submitted_notebooks/SJSystolicArray/src/tb_top.cpp
new file mode 100644
index 00000000..57182b10
--- /dev/null
+++ b/VLSI24/submitted_notebooks/SJSystolicArray/src/tb_top.cpp
@@ -0,0 +1,123 @@
+#include <cmath>
+#include <fstream>
+#include <iostream>
+#include <sstream>
+#include <string>
+#include <vector>
+#include <verilated.h>
+#include <verilated_vcd_c.h>
+#include "Vtop.h"
+
+VerilatedVcdC* trace = NULL; // Waveform Generation
+static Vtop* top; // DUT
+vluint64_t sim_time = 0; // Simulation time
+const int TESTCASE_SIZE = 3; // Values per test case (one line of the input file)
+
+// Function to evaluate the DUT and dump waveforms
+void eval_and_dump_wave() {
+
+ top->eval(); // Evaluate the DUT
+ trace->dump(sim_time++); // Dump waveforms to VCD file and increment simulation time
+}
+
+// Function to execute a single clock cycle
+void single_cycle() {
+
+ top->clk = 0; // Set clock low
+ eval_and_dump_wave(); // Evaluate and dump waveforms
+ top->clk = 1; // Set clock high
+ eval_and_dump_wave(); // Evaluate and dump waveforms
+}
+
+// Function to reset the DUT
+void reset(int n) {
+
+ top->nRST = 0; // Assert reset
+ while(n-->0) // Loop for specified number of cycles
+ single_cycle();
+ top->nRST = 1; // Deassert reset
+}
+
+// Function to initialize simulation
+void sim_init() {
+
+ trace = new VerilatedVcdC;
+ top = new Vtop;
+ top->trace(trace,0); // Attach the VCD tracer (second argument is the hierarchy depth to trace)
+ trace->open("dump.vcd"); // Open VCD file for writing waveform
+
+ top->readA = 0; // Initialize readA signal
+ top->readB = 0; // Initialize readB signal
+ reset(2); // Reset the DUT for 2 cycles
+}
+
+// Function to finalize simulation and clean up
+int sim_exit() {
+
+ eval_and_dump_wave();
+ top->final(); // Finalize DUT
+ trace->close(); // Close VCD file
+ delete top; // Delete DUT instance
+
+ return EXIT_SUCCESS;
+}
+
+// Function to read in test case to a vector
+void readNumbers(const std::string& filename, std::vector<int>& numbers) {
+
+ // Open test case file
+ std::ifstream file(filename);
+ if (!file.is_open()) {
+ std::cerr << "Failed to open file: " << filename << std::endl;
+ return;
+ }
+
+ // Read test case file line by line, extract tokens separated by ','
+ std::string line;
+ while (std::getline(file, line)) {
+ std::istringstream iss(line);
+ std::string token;
+ while (std::getline(iss, token, ',')) {
+ numbers.push_back(std::stoi(token));
+ }
+ }
+
+ // Close test case file
+ file.close();
+}
+
+// Test main
+int main(int argc, char** argv) {
+
+ // Initialize test
+ Verilated::commandArgs(argc, argv);
+ Verilated::traceEverOn(true);
+ sim_init();
+
+ // Read in test cases
+ std::vector<int> test_cases;
+ readNumbers("/content/convInput.txt", test_cases);
+ int cycle = test_cases.size() / TESTCASE_SIZE; // Number of complete test cases (integer division truncates)
+
+ std::ofstream dumpfile;
+ //dumpfile.open("resultdump.txt");
+
+ // Loop through test cases
+ for (int i = 0; i < cycle; i++) {
+ uint8_t a = test_cases[i * TESTCASE_SIZE] & 0xFF;
+ uint8_t b = test_cases[i * TESTCASE_SIZE + 1] & 0xFF;
+ uint8_t w = test_cases[i * TESTCASE_SIZE + 2] & 0xFF;
+
+ // Assign signals
+ top->readA = a;
+ top->readB = b;
+
+ std::cout << "0,0," << +top->write << std::endl;
+
+ single_cycle();
+ }
+
+ //dumpfile.close();
+
+ return sim_exit();
+}
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/src/top.sv b/VLSI24/submitted_notebooks/SJSystolicArray/src/top.sv
new file mode 100644
index 00000000..68ba2563
--- /dev/null
+++ b/VLSI24/submitted_notebooks/SJSystolicArray/src/top.sv
@@ -0,0 +1,208 @@
+module top(
+ input logic clk,
+ input logic nRST,
+ input logic [7:0] readA,
+ input logic [7:0] readB,
+ output logic [7:0] write
+);
+
+ logic [4:0] PERead;
+ logic [4:0] PEStart;
+ logic [2:0] filtRead;
+
+ logic [2:0] PENewOuput;
+
+ topLevelControl U1(
+ .clk(clk),
+ .nRST(nRST),
+ .readA(readA),
+ .readB(readB),
+ .PERead(PERead),
+ .PEStart(PEStart),
+ .filtRead(filtRead)
+ );
+
+ //PE Group 0
+ //PE 0,0
+ logic [9:0] psum_o00;
+
+ PE U2(
+ .clk_i(clk),
+ .rstn_i(nRST),
+ .psum_i({{3{readA[7]}}, readA[6:0]}),
+ .filter_i(readB),
+ .ifmap_i(readA),
+ .read_new_filter_val(filtRead[0]),
+ .read_new_ifmap_val(PERead[0]),
+ .start_conv(PEStart[0]),
+ .psum_o(psum_o00),
+ .psum_valid_o()
+ );
+
+ //PE Group 1
+ //PE 1,0
+
+ logic [9:0] psum_o10;
+
+ PE U3(
+ .clk_i(clk),
+ .rstn_i(nRST),
+ .psum_i(psum_o00),
+ .filter_i(readB),
+ .ifmap_i(readA),
+ .read_new_filter_val(filtRead[1]),
+ .read_new_ifmap_val(PERead[1]),
+ .start_conv(PEStart[1]),
+ .psum_o(psum_o10),
+ .psum_valid_o()
+ );
+
+ //PE 0,1
+ logic [9:0] psum_o01;
+
+ PE U4(
+ .clk_i(clk),
+ .rstn_i(nRST),
+ .psum_i({{3{readB[7]}}, readB[6:0]}),
+ .filter_i(readB),
+ .ifmap_i(readA),
+ .read_new_filter_val(filtRead[0]),
+ .read_new_ifmap_val(PERead[1]),
+ .start_conv(PEStart[1]),
+ .psum_o(psum_o01),
+ .psum_valid_o()
+ );
+
+ //PE Group 2
+ //PE 2,0
+ logic [9:0] psum_o20;
+
+ PE U5(
+ .clk_i(clk),
+ .rstn_i(nRST),
+ .psum_i(psum_o10),
+ .filter_i(readB),
+ .ifmap_i(readA),
+ .read_new_filter_val(filtRead[2]),
+ .read_new_ifmap_val(PERead[2]),
+ .start_conv(PEStart[2]),
+ .psum_o(psum_o20),
+ .psum_valid_o(PENewOuput[2])
+ );
+
+ //PE 1,1
+ logic [9:0] psum_o11;
+
+ PE U6(
+ .clk_i(clk),
+ .rstn_i(nRST),
+ .psum_i(psum_o01),
+ .filter_i(readB),
+ .ifmap_i(readA),
+ .read_new_filter_val(filtRead[1]),
+ .read_new_ifmap_val(PERead[2]),
+ .start_conv(PEStart[2]),
+ .psum_o(psum_o11),
+ .psum_valid_o()
+ );
+
+ //PE 0,2
+ logic [9:0] psum_o02;
+
+ PE U7(
+ .clk_i(clk),
+ .rstn_i(nRST),
+ .psum_i({{3{readB[7]}}, readB[6:0]}),
+ .filter_i(readB),
+ .ifmap_i(readA),
+ .read_new_filter_val(filtRead[0]),
+ .read_new_ifmap_val(PERead[2]),
+ .start_conv(PEStart[2]),
+ .psum_o(psum_o02),
+ .psum_valid_o()
+ );
+
+ //PE Group 3
+ //PE 2,1
+ logic [9:0] psum_o21;
+
+ PE U8(
+ .clk_i(clk),
+ .rstn_i(nRST),
+ .psum_i(psum_o11),
+ .filter_i(readB),
+ .ifmap_i(readA),
+ .read_new_filter_val(filtRead[2]),
+ .read_new_ifmap_val(PERead[3]),
+ .start_conv(PEStart[3]),
+ .psum_o(psum_o21),
+ .psum_valid_o(PENewOuput[1])
+ );
+
+ //PE 1,2
+ logic [9:0] psum_o12;
+
+ PE U9(
+ .clk_i(clk),
+ .rstn_i(nRST),
+ .psum_i(psum_o02),
+ .filter_i(readB),
+ .ifmap_i(readA),
+ .read_new_filter_val(filtRead[1]),
+ .read_new_ifmap_val(PERead[3]),
+ .start_conv(PEStart[3]),
+ .psum_o(psum_o12),
+ .psum_valid_o()
+ );
+
+ //PE Group 4
+ //PE 2,2
+ logic [9:0] psum_o22;
+
+ PE U10(
+ .clk_i(clk),
+ .rstn_i(nRST),
+ .psum_i(psum_o12),
+ .filter_i(readB),
+ .ifmap_i(readB),
+ .read_new_filter_val(filtRead[2]),
+ .read_new_ifmap_val(PERead[4]),
+ .start_conv(PEStart[4]),
+ .psum_o(psum_o22),
+ .psum_valid_o(PENewOuput[0])
+ );
+
+ logic[9:0] writeIntermediate;
+ logic overflowPos;
+ logic overflowNeg;
+
+ always_comb begin //select which PE is routed to output
+ casez({PENewOuput})
+ 3'b1??: begin
+ writeIntermediate = psum_o20;
+ end
+ 3'b01?: begin
+ writeIntermediate = psum_o21;
+ end
+ 3'b001: begin
+ writeIntermediate = psum_o22;
+ end
+ default: begin
+ writeIntermediate = '0;
+ end
+ endcase
+
+ write = {writeIntermediate[9], writeIntermediate[6:0]}; //truncate to 8 bits; the overflow checks below saturate the output to +127 or -128
+
+ overflowPos = !writeIntermediate[9] & (writeIntermediate[8] | writeIntermediate[7]);
+ overflowNeg = writeIntermediate[9] & (!writeIntermediate[8] | !writeIntermediate[7]);
+
+ if(overflowPos) begin
+ write[6:0] = '1;
+ end
+ if(overflowNeg) begin
+ write[6:0] = '0;
+ end
+ end
+
+endmodule
\ No newline at end of file
diff --git a/VLSI24/submitted_notebooks/SJSystolicArray/src/topLevelControl.sv b/VLSI24/submitted_notebooks/SJSystolicArray/src/topLevelControl.sv
new file mode 100644
index 00000000..f83376cf
--- /dev/null
+++ b/VLSI24/submitted_notebooks/SJSystolicArray/src/topLevelControl.sv
@@ -0,0 +1,144 @@
+module topLevelControl (
+ input logic clk,
+ input logic nRST,
+ input logic [7:0] readA,
+ input logic [7:0] readB,
+ output logic [4:0] PERead,
+ output logic [4:0] PEStart,
+ output logic [2:0] filtRead
+);
+
+typedef enum logic [1:0] {
+ idle,
+ loadInit,
+ loadSingle,
+ reload
+} state_t;
+
+state_t state, nextState;
+
+logic [4:0] PEReadNaive;
+
+logic [2:0] countPE, nextCountPE;
+logic [7:0] countRow, nextCountRow;
+logic [6:0] countTile, nextCountTile;
+
+logic [7:0] rowLen, nextRowLen;
+logic [6:0] colTiles, nextColTiles;
+
+always_ff @(posedge clk, negedge nRST) begin
+ if(nRST == '0) begin
+ state <= idle;
+ countPE <= '0;
+ countRow <= '0;
+ countTile <= '0;
+ rowLen <= '0;
+ colTiles <= '0;
+ end
+ else begin
+ state <= nextState;
+ countPE <= nextCountPE;
+ countRow <= nextCountRow;
+ countTile <= nextCountTile;
+ rowLen <= nextRowLen;
+ colTiles <= nextColTiles;
+ end
+end
+
+always_comb begin
+ nextState = state;
+ nextCountPE = countPE;
+ nextCountRow = countRow;
+ nextCountTile = countTile;
+
+ nextRowLen = rowLen;
+ nextColTiles = colTiles;
+
+ PEReadNaive = '0;
+
+ PERead = '0;
+ PEStart = '0;
+ filtRead = '0;
+
+ case(countPE)
+ 3'd1: PEReadNaive[0] = 1'b1;
+ 3'd2: PEReadNaive[1] = 1'b1;
+ 3'd3: PEReadNaive[2] = 1'b1;
+ 3'd4: PEReadNaive[3] = 1'b1;
+ 3'd5: PEReadNaive[4] = 1'b1;
+ default:PEReadNaive = '0;
+ endcase
+
+ case(state)
+ idle: begin
+ nextCountPE = '0;
+ nextCountRow = '0;
+ nextCountTile = '0;
+ if(readA[7]) begin
+ nextRowLen = {readA[6:0], readB[7]};
+ nextColTiles = readB[6:0];
+ nextCountPE = 3'd1;
+ nextState = loadInit;
+ end
+ end
+ loadInit: begin
+ PERead = {PEReadNaive[3], PEReadNaive[3:0]};
+ filtRead = PEReadNaive[2:0];
+ nextCountPE = countPE + 3'd1;
+ if(countPE == 3'd4) begin
+ nextCountPE = 3'd1;
+ nextCountRow = countRow + 8'd1;
+ end
+ if(countRow == 8'd2) begin
+ PEStart = PEReadNaive;
+ end
+ if(countRow == 8'd3) begin
+ nextCountPE = 3'd1;
+ PEStart[4] = 1'b1;
+ filtRead = '0;
+ PERead = '0;
+ nextState = loadSingle;
+ nextCountRow = 8'd1;
+ end
+ end
+ loadSingle: begin
+ PEStart = PEReadNaive;
+ PERead = PEReadNaive;
+ nextCountPE = countPE + 3'd1;
+ if(countPE == 3'd5) begin
+ nextCountPE = 3'd1;
+ nextCountRow = countRow + 8'd1;
+ if(countRow == rowLen) begin
+ nextState = reload;
+ nextCountTile = countTile + 7'd1;
+ nextCountRow = '0;
+ end
+ end
+ end
+ reload: begin
+ PERead = {PEReadNaive[3], PEReadNaive[3:0]};
+ nextCountPE = countPE + 3'd1;
+ if(countPE == 3'd4) begin
+ nextCountPE = 3'd1;
+ nextCountRow = countRow + 8'd1;
+ end
+ if(countRow == 8'd2) begin
+ PEStart = PEReadNaive;
+ end
+ if(countRow == 8'd3) begin
+ nextCountPE = 3'd1;
+ PEStart[4] = 1'b1;
+ filtRead = '0;
+ PERead = '0;
+ nextState = loadSingle;
+ nextCountRow = 8'd1;
+ end
+ if(countTile == colTiles) begin
+ nextState = idle;
+ PERead = '0;
+ end
+ end
+ endcase
+end
+
+endmodule
\ No newline at end of file