
Repository containing work on "Sparsity Through Spiking Convolutional Neural Network for Audio Classification at the Edge".


CongSheng/SpikingConvNN-AudioClassification


Sparsity through Spiking Convolutional Neural Network (SCNN) for Audio Classification at the Edge

Cong Sheng Leow, Wang Ling Goh, Yuan Gao
Published in IEEE International Symposium on Circuits and Systems 2023 (ISCAS 2023).

Abstract: Convolutional neural networks (CNNs) have been shown to be effective for audio classification. However, deep CNNs can be computationally heavy and unsuitable for edge intelligence, as embedded devices are generally constrained by memory and energy requirements. Spiking neural networks (SNNs) offer potential as energy-efficient networks but typically underperform conventional deep neural networks in accuracy. This paper proposes a spiking convolutional neural network (SCNN) that achieves excellent accuracy of above 98 % on a multi-class audio classification task. Accuracy remains high with weight quantization to INT8 precision. Additionally, this paper examines the role of neuron parameters in co-optimizing activation sparsity and accuracy.

Fig 1: Model A network (SCNN), inspired by the original CNN used for the Free Spoken Digit Dataset (FSDD).
Fig 2: Model B network (SCNN), with a leaner architecture than Model A.
Fig 3: Training loss plotted against training epochs for Model B.
Fig 4: Confusion matrix for Model B.

| Model | Architecture | No. of time steps | Test Accuracy | Quantized Test Accuracy | No. of Parameters | FLOPS |
|---|---|---|---|---|---|---|
| Control (CNN) | 6C3-P2-16C3-128FC-64FC-10FC | NA | 97.3 % | - | 0.837 M | 3.326 M |
| A | 6C3-16C3-128FC-64FC-10FC | 10 | 98.7 ± 0.57 % | 74.1 ± 9.22 % | 1.616 M | 281.000 M |
| A | 6C3-16C3-128FC-64FC-10FC | 1 | 95.4 ± 3.24 % | 61.6 ± 12.30 % | 1.616 M | 28.085 M |
| B | 6C3-128FC-10FC | 10 | 98.4 ± 0.35 % | 97.0 ± 0.77 % | 0.702 M | 88.930 M |
| B | 6C3-128FC-10FC | 1 | 93.1 ± 0.77 % | 91.3 ± 2.40 % | 0.702 M | 8.893 M |

Table showing the results for each model. The full benchmark table is available in the original paper.
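To make the architecture strings concrete, below is a minimal sketch of a Model-B-style network (6C3-128FC-10FC) in snnTorch; the 1x32x32 input size and the beta/threshold defaults are illustrative assumptions, and the actual models live in the models folder.

```python
# Minimal Model-B-style SCNN sketch in snnTorch (assumed 1x32x32 input).
import torch
import torch.nn as nn
import snntorch as snn

class ModelB(nn.Module):
    def __init__(self, beta=0.9, threshold=1.0, num_steps=10):
        super().__init__()
        self.num_steps = num_steps
        self.conv = nn.Conv2d(1, 6, kernel_size=3)            # 6C3
        self.lif1 = snn.Leaky(beta=beta, threshold=threshold)
        self.fc1 = nn.Linear(6 * 30 * 30, 128)                # 128FC
        self.lif2 = snn.Leaky(beta=beta, threshold=threshold)
        self.fc2 = nn.Linear(128, 10)                         # 10FC (ten digits)
        self.lif3 = snn.Leaky(beta=beta, threshold=threshold)

    def forward(self, x):
        mem1, mem2, mem3 = (self.lif1.init_leaky(),
                            self.lif2.init_leaky(),
                            self.lif3.init_leaky())
        spk_out = []
        for _ in range(self.num_steps):  # present the static input at every step
            spk1, mem1 = self.lif1(self.conv(x), mem1)
            spk2, mem2 = self.lif2(self.fc1(spk1.flatten(1)), mem2)
            spk3, mem3 = self.lif3(self.fc2(spk2), mem3)
            spk_out.append(spk3)
        return torch.stack(spk_out)      # [num_steps, batch, 10]
```

The FLOPS column was presumably obtained with a flop counter such as fvcore (linked under Citations and Acknowledgements).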

Table of Contents

  1. Structure
  2. Dependencies
  3. Usage
  4. Scripts
  5. Future Work
  6. Citations and Acknowledgements

Structure

C:.
│   .gitignore
│   automatedSearch.py
│   feature_exploration.py
│   iii_quantize.py
│   ii_spikingTest.py
│   iv_layerAnalysis.py
│   i_print_dict.py
│   main.py
│   manualSearch.py
│   README.md
│   requirements.txt
│   scriptRun.py
│
├───checkpoints    
│
├───datasets
│   │   customDataset.py
│   │   mfcc_dataset.py
│   │   __init__.py
│           
├───Expt
│   │   expt.log
│   │   exptPlot.log
│   │   exptProfile.log
│
├───figures
├───free-spoken-digit-dataset-v1.0.8
│   └───FSDD
│       │   .gitignore
│       │   metadata.py
│       │   pip_requirements.txt
│       │   README.md
│       │   __init__.py
│       │   
│       ├───acquire_data
│       │       say_numbers_prompt.py
│       │       split_and_label_numbers.py
│       │       
│       ├───recordings
│       │    
│       └───utils
│               fsdd.py
│               spectogramer.py
│               trimmer.py
│               __init__.py
│               
├───hyperTuning
│   │   parameterScriptRun.log
│   │   profileScriptRun.log
│   │   results.log
│   │   resultsScriptRun.log
│   │   tuningPara.log
│   │       
│   ├───confuseMatrix
│   │       
│   └───Train
│           
├───logs
├───models
│   │   AlexCNN.py
│   │   CustomCNN.py
│   │   LeNet.py
│   │   test.py
│   │   train.py
│           
├───transformedData
│               
├───utils
│   │   audioProcessing.py
│   │   loggingFn.py
│   │   plotFigure.py
│   │   spikingNeuron.py
│   │   __init__.py

Dependencies

It is recommended to run this project in Docker or a virtual environment; set one up before installing the packages.

conda install pip
conda install --file requirements.txt

or run conda create --name <environment_name> --file requirements.txt to create a new environment with the packages ready.

Usage

The whole repository can be divided into several levels:

  1. Input
  2. Output
  3. Utility and Self-defined Packages
  4. Python Scripts

Input

You should first download the dataset into the home directory. This project focuses on the Free Spoken Digit Dataset (FSDD), which can be downloaded from https://github.com/Jakobovski/free-spoken-digit-dataset.
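Once downloaded, a recording can be loaded and converted to MFCC features along these lines; the transform parameters are assumptions and may differ from what datasets/mfcc_dataset.py actually uses.

```python
# Load one FSDD recording and extract MFCC features (assumed parameters).
import torchaudio

# FSDD files are named "<digit>_<speaker>_<index>.wav" and sampled at 8 kHz.
wav_path = "free-spoken-digit-dataset-v1.0.8/FSDD/recordings/0_jackson_0.wav"
waveform, sample_rate = torchaudio.load(wav_path)

mfcc = torchaudio.transforms.MFCC(
    sample_rate=sample_rate,
    n_mfcc=13,  # assumed number of coefficients
)(waveform)     # shape: [1, n_mfcc, time]
label = int(wav_path.split("/")[-1].split("_")[0])  # digit is the first token
```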

Output

This project works with several types of files, including .png (figures), .log (log files), and .pt (model checkpoints and transformed features). These are mainly stored in their respective folders, but the output directory can be specified in the scripts that produce them (typically near the top, where the log or figure path is configured).
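Since the .pt files are ordinary torch.save outputs, they can be inspected directly; the filename and dictionary keys below are assumptions for illustration.

```python
# Save and reload a checkpoint dictionary (assumed keys; checkpoints/ must exist).
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for an actual model
ckpt_path = "checkpoints/example.pt"  # hypothetical filename
torch.save({"state_dict": model.state_dict(), "epoch": 0}, ckpt_path)

checkpoint = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])
print(checkpoint.keys())  # the kind of inspection i_print_dict.py performs
```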

Utility and Self-defined Packages

Utility packages are written mainly for re-use across the scripts, for purposes such as signal processing, logging, and plotting figures. These are located in the utils folder. The models and the scripts to train or test the networks are located in the models folder.

Python Scripts

Ideally, most modifications should be made only to the scripts in the home directory. These scripts expose configuration options at the command line through argparse for simple changes (see the sketch below). Realistically, modifications can also be made within the scripts themselves, especially for directory configuration, plot sizes, etc. The scripts are explained further in the following section.
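As a sketch of the pattern (the flag names below are hypothetical; run any script with --help for the real options):

```python
# Hypothetical argparse pattern used by the top-level scripts.
import argparse

parser = argparse.ArgumentParser(description="Train/evaluate an SCNN on FSDD.")
parser.add_argument("--model", default="B", choices=["A", "B"],
                    help="which architecture to build (hypothetical flag)")
parser.add_argument("--num-steps", type=int, default=10,
                    help="number of spiking time steps (hypothetical flag)")
parser.add_argument("--log-dir", default="logs",
                    help="where .log files are written (hypothetical flag)")
args = parser.parse_args()
print(args)
```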

Scripts

The scripts can be divided into numbered and non-numbered scripts. Numbered scripts were mainly used to better understand the functions or logic behind the tools and options available, while non-numbered scripts are more stand-alone. Instructions on the arguments are given in each script's argparse --help output.

Numbered Scripts
i_print_dict.py: Used to visualize dictionaries, such as the checkpoints.
ii_spikingTest.py: Used to visualize spiking neurons through printed plots.
iii_quantize.py: Used for quantization-aware training (QAT); see the sketch after this list.
iv_layerAnalysis.py: Used to visualize and analyze a layer's weights and outputs.
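As a rough illustration of the QAT step, here is a minimal Brevitas sketch; the layer shapes below are illustrative assumptions, not iii_quantize.py's actual configuration.

```python
# Minimal quantization-aware-training sketch with Brevitas (assumed shapes).
from brevitas.nn import QuantConv2d, QuantLinear

# Drop-in replacements for nn.Conv2d / nn.Linear with INT8 fake-quantized weights:
conv = QuantConv2d(1, 6, kernel_size=3, weight_bit_width=8)
fc = QuantLinear(128, 10, bias=True, weight_bit_width=8)
# Train these layers as usual; the weights learn to tolerate INT8 precision,
# which is how accuracy can stay high after quantization (see the table above).
```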

Non-numbered Scripts
automatedSearch.py: Validation/search over neuron parameters using Ray Tune; see the sketch after this list.
Fig 5: Random and Bayesian search for highest accuracy across neuron parameters (beta, number of time steps, and threshold voltage) for both Model A and Model B.
feature_exploration.py: Used to explore different features of a single audio file.
main.py: Main script to train and evaluate the different models under different configurations.
manualSearch.py: Manually sweep individual neuron parameters to examine accuracy and sparsity.
Fig 6: Manual search for Model A: varying a single parameter while keeping the rest constant.
Fig 7: Manual search for Model B: varying a single parameter while keeping the rest constant.
scriptRun.py: Similar to main.py, but without argparse; requires modification directly in the script.
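For a sense of how the automated search is wired up, here is a minimal Ray Tune sketch; the trainable, search ranges, and sample count are illustrative assumptions (a placeholder stands in for actually training the SCNN), and the classic tune.report API is shown, which may differ across Ray versions.

```python
# Minimal neuron-parameter search sketch with Ray Tune (classic API).
from ray import tune

def train_fn(config):
    # Build an SCNN from config["beta"], config["threshold"], config["num_steps"],
    # train it briefly, and measure validation accuracy.
    acc = config["beta"]  # placeholder for the measured accuracy
    tune.report(accuracy=acc)

analysis = tune.run(
    train_fn,
    config={
        "beta": tune.uniform(0.5, 0.99),       # membrane decay
        "threshold": tune.uniform(0.5, 2.0),   # firing threshold voltage
        "num_steps": tune.choice([1, 5, 10]),  # simulation time steps
    },
    num_samples=20,  # random search; a Bayesian searcher can be plugged in instead
)
print(analysis.get_best_config(metric="accuracy", mode="max"))
```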

Future Work

This project is limited in its scope and more can be done to improve it. Some of the future directions include:

  1. Implementation and testing in hardware.
  2. Multi-objective optimization (Accuracy, sparsity and hardware resources).
  3. Online optimization.

Citations and Acknowledgements

This repository can be slightly messy, but I hope you are still able to get the scripts to work. If you have any questions or feedback, do email me (leowcs AT umich.edu) or drop me a LinkedIn message. If this has been useful for your work, do cite it, and feel free to let me know too! Cheers!

C. S. Leow, W. L. Goh, and Y. Gao, “Sparsity through spiking convolutional neural network for audio classification at the edge,” IEEE Int. Symp. Circuits Syst. (ISCAS), 2023, pp. 1‑5.
[Citation will be updated once proceeding becomes available.]

This work builds upon the great work of others, some of which is listed below. For the full list of references, please refer to the bibliography of the original publication. Other useful works:
snnTorch: https://github.com/jeshraghian/snntorch/tree/master
Brevitas: https://github.com/Xilinx/brevitas
Flop Counter: https://github.com/facebookresearch/fvcore/blob/main/docs/flop_count.md
Ray (Tune): https://github.com/ray-project/ray
