Cong Sheng Leow, Wang Ling Goh, Yuan Gao
Published in IEEE International Symposium on Circuits and Systems 2023 (ISCAS 2023).
Abstract: Convolutional neural networks (CNNs) have been shown to be effective for audio classification. However, deep CNNs can be computationally heavy and unsuitable for edge intelligence, as embedded devices are generally constrained by memory and energy requirements. Spiking neural networks (SNNs) offer potential as energy-efficient networks but typically underperform conventional deep neural networks in accuracy. This paper proposes a spiking convolutional neural network (SCNN) that achieves excellent accuracy of above 98% on a multi-class audio classification task. Accuracy remains high with weight quantization to INT8 precision. Additionally, this paper examines the role of neuron parameters in co-optimizing activation sparsity and accuracy.

Fig 1: Model A network (SCNN), inspired by the original CNN used for the free spoken digit dataset (FSDD).
Fig 2: Model B network (SCNN), with a leaner architecture than Model A.
Fig 3: Training loss plotted against training epochs for Model B.
Fig 4: Confusion matrix for Model B.
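Quantization in this project is handled with Brevitas (see the references at the end); the stand-alone sketch below only illustrates the basic idea behind INT8 weight quantization, namely symmetric per-tensor mapping of float weights onto the integer range [-127, 127]. The weight values and function names here are illustrative, not taken from the repository.

```python
# Hedged sketch: symmetric per-tensor INT8 weight quantization.
# Illustrative only -- the repository uses Brevitas for
# quantization-aware training rather than this post-hoc scheme.

def quantize_int8(weights):
    """Quantize float weights to INT8, returning (int8_values, scale)
    such that each weight w is approximately q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map INT8 values back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.42, -0.17, 0.93, -0.55, 0.08]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
```

The rounding error per weight is bounded by half the scale, which is why accuracy can stay high after quantization when the weight distribution is well behaved.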
| Model | Architecture | No. of time steps | Test Accuracy | Quantized Test Accuracy | No. of Parameters | FLOPS |
|---|---|---|---|---|---|---|
| Control (CNN) | 6C3-P2-16C3-128FC-64FC-10FC | NA | 97.3% | - | 0.837 M | 3.326 M |
| A | 6C3-16C3-128FC-64FC-10FC | 10 | 98.7 ± 0.57% | 74.1 ± 9.22% | 1.616 M | 281.000 M |
| | | 1 | 95.4 ± 3.24% | 61.6 ± 12.30% | | 28.085 M |
| B | 6C3-128FC-10FC | 10 | 98.4 ± 0.35% | 97.0 ± 0.77% | 0.702 M | 88.930 M |
| | | 1 | 93.1 ± 0.77% | 91.3 ± 2.40% | | 8.893 M |
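The FLOPS column was produced with fvcore's flop counter (linked at the end of this README). As a rough sanity check, the sketch below shows how multiply-accumulate (MAC) counts for conv and fully-connected layers are computed; the 28x28 single-channel input is an illustrative assumption, not the project's actual MFCC feature size, so the numbers will not match the table.

```python
# Hedged sketch: MAC counting for conv and fully-connected layers.
# fvcore's FlopCountAnalysis reports one "flop" per multiply-add;
# the 28x28 input below is an assumption for illustration only.

def conv2d_macs(in_c, out_c, k, out_h, out_w):
    # one multiply-add per kernel element per output position
    return in_c * out_c * k * k * out_h * out_w

def linear_macs(in_f, out_f):
    return in_f * out_f

# "6C3" = conv layer with 6 output channels and a 3x3 kernel;
# a 28x28 input with no padding yields a 26x26 output map
macs = conv2d_macs(1, 6, 3, 26, 26) + linear_macs(128, 10)
```

Note that an SNN repeats this cost once per time step, which is why the 10-step rows show roughly 10x the FLOPS of the 1-step rows.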
C:.
│ .gitignore
│ automatedSearch.py
│ feature_exploration.py
│ iii_quantize.py
│ ii_spikingTest.py
│ iv_layerAnalysis.py
│ i_print_dict.py
│ main.py
│ manualSearch.py
│ README.md
│ requirements.txt
│ scriptRun.py
│
├───checkpoints
│
├───datasets
│ │ customDataset.py
│ │ mfcc_dataset.py
│ │ __init__.py
│
├───Expt
│ │ expt.log
│ │ exptPlot.log
│ │ exptProfile.log
│
├───figures
├───free-spoken-digit-dataset-v1.0.8
│ └───FSDD
│ │ .gitignore
│ │ metadata.py
│ │ pip_requirements.txt
│ │ README.md
│ │ __init__.py
│ │
│ ├───acquire_data
│ │ say_numbers_prompt.py
│ │ split_and_label_numbers.py
│ │
│ ├───recordings
│ │
│ └───utils
│ fsdd.py
│ spectogramer.py
│ trimmer.py
│ __init__.py
│
├───hyperTuning
│ │ parameterScriptRun.log
│ │ profileScriptRun.log
│ │ results.log
│ │ resultsScriptRun.log
│ │ tuningPara.log
│ │
│ ├───confuseMatrix
│ │
│ └───Train
│
├───logs
├───models
│ │ AlexCNN.py
│ │ CustomCNN.py
│ │ LeNet.py
│ │ test.py
│ │ train.py
│
├───transformedData
│
├───utils
│ │ audioProcessing.py
│ │ loggingFn.py
│ │ plotFigure.py
│ │ spikingNeuron.py
│ │ __init__.py
It is recommended to run this in Docker or a virtual environment. Click on the links above to get started before installing the packages.

```shell
conda install pip
conda install --file requirements.txt
```

or

```shell
conda create --name <environment_name> --file requirements.txt
```

to create a new environment with the packages ready.
The whole repository can be divided in several levels:
- Input
- Output
- Utility and Self-defined Packages
- Python Scripts
You should first download the dataset to the home directory. This project focuses on the free spoken digit dataset (FSDD) which can be downloaded via https://github.com/Jakobovski/free-spoken-digit-dataset.
There are different types of files this project works with. Some of these include: .png (figures), .log (log files), and .pt (model checkpoints and transformed features). These are mainly stored in their respective folders, but the output directory can be specified in the scripts that produce them (typically at the top, where the log or figure path is configured).
Utility packages are written mainly for re-use in the scripts, for purposes such as signal processing, logging, or plotting figures. These are located in the `utils` folder. On the other hand, the models and the scripts to train or test the networks are located inside the `models` folder.
Ideally, most modifications should only be made to the scripts within the home directory. These scripts offer configuration options at the command-line level through `argparse` for simple modifications. Realistically, modifications can also be made at the script level, especially for directory configuration, plot sizes, etc. More explanation is given in the following section on the scripts.
The scripts can be segmented into numbered and non-numbered scripts. Numbered scripts were mainly used to better understand the functions or logic behind the tools and options available. Non-numbered scripts, on the other hand, are more stand-alone. Instructions on the arguments are documented in each script's `argparse` `--help` section.
Numbered Scripts

- `i_print_dict.py`: Used to visualize dictionaries, such as the checkpoints.
- `ii_spikingTest.py`: Used to visualize spiking neurons through printed plots.
- `iii_quantize.py`: Used for Quantization-Aware Training (QAT).
- `iv_layerAnalysis.py`: Used to visualize and analyze a layer's weights and outputs.
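The spiking neurons that `ii_spikingTest.py` plots are built on snnTorch in this project; the self-contained sketch below shows the underlying leaky integrate-and-fire (LIF) dynamics in plain Python. Beta (the membrane decay) and the threshold voltage are the neuron parameters the search scripts tune; the constant input current and the soft-reset choice here are illustrative assumptions.

```python
# Hedged sketch of a leaky integrate-and-fire (LIF) neuron.
# beta = membrane decay per time step; threshold = spiking voltage.
# A soft reset (subtract threshold) is assumed; snnTorch supports
# other reset mechanisms as well.

def lif_neuron(inputs, beta=0.9, threshold=1.0):
    """Simulate an LIF neuron over discrete time steps.
    Returns the spike train and the membrane-potential trace."""
    mem = 0.0
    spikes, trace = [], []
    for x in inputs:
        mem = beta * mem + x          # leaky integration of input current
        spk = 1 if mem >= threshold else 0
        if spk:
            mem -= threshold          # soft reset after a spike
        spikes.append(spk)
        trace.append(mem)
    return spikes, trace

# constant input current over 10 time steps (arbitrary values)
spikes, trace = lif_neuron([0.4] * 10)
```

Raising the threshold or lowering beta makes the neuron spike less often, which is exactly the accuracy-versus-sparsity trade-off the paper examines.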
Non-numbered Scripts

- `automatedSearch.py`: Validation/search through neuron parameters using Tune.

Fig 5: Random and Bayesian search for highest accuracy across neuron parameters (beta, number of time steps, and threshold voltage) for both Model A and Model B.

- `feature_exploration.py`: Used to explore different features on a single audio file.
- `main.py`: Main script to train and evaluate different models at different configurations.
- `manualSearch.py`: Manually sweeps along individual neuron parameters to examine accuracy and sparsity.

Fig 6: Manual search for Model A: varying a single parameter while keeping the rest constant.
Fig 7: Manual search for Model B: varying a single parameter while keeping the rest constant.
- `scriptRun.py`: Similar to `main.py`, but without `argparse`; requires modification directly in the script.
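The kind of one-parameter sweep that `manualSearch.py` performs can be sketched in miniature: vary the threshold while holding beta and the input fixed, and record activation sparsity (here, the fraction of silent time steps). The tiny LIF model, input, and parameter grid below are illustrative assumptions, not the repository's actual sweep.

```python
# Hedged sketch of a single-parameter sweep over the threshold voltage,
# recording spike sparsity. The LIF model and values are illustrative.

def spike_count(inputs, beta, threshold):
    """Count spikes of a soft-reset LIF neuron driven by `inputs`."""
    mem, count = 0.0, 0
    for x in inputs:
        mem = beta * mem + x
        if mem >= threshold:
            count += 1
            mem -= threshold
    return count

inputs = [0.5] * 20            # constant input current, 20 time steps
sparsity = {}
for threshold in (0.5, 1.0, 2.0, 4.0):
    n = spike_count(inputs, beta=0.8, threshold=threshold)
    sparsity[threshold] = 1.0 - n / len(inputs)
```

As the threshold rises, sparsity increases monotonically toward 1.0 (no spikes at all), which is the trend the manual-search figures trace against accuracy.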
This project is limited in its scope, and more can be done to improve it. Some future directions include:
- Implementation and testing of the network in hardware.
- Multi-objective optimization (accuracy, sparsity, and hardware resources).
- Online optimization.
This repository can be slightly messy, but I hope you are still able to get the scripts to work. If you have any questions or feedback, do email me (leowcs AT umich.edu) or drop me a LinkedIn message. If this has been useful for your work, do cite it and feel free to let me know too! Cheers!
C. S. Leow, W. L. Goh, and Y. Gao, "Sparsity through spiking convolutional neural network for audio classification at the edge," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2023, pp. 1-5.
[Citation will be updated once the proceedings become available.]
This work is built upon the great work of others, some of which is listed below. For the full list of references, please refer to the bibliography of the original publication.
Other useful works:
snnTorch: https://github.com/jeshraghian/snntorch/tree/master
Brevitas: https://github.com/Xilinx/brevitas
Flop Counter: https://github.com/facebookresearch/fvcore/blob/main/docs/flop_count.md
Ray (Tune): https://github.com/ray-project/ray