Due to copyright and a signed non-disclosure agreement, the models, data, and further information about the data have been removed from the repository. Further information about the approaches and theory can be found in the corresponding directory's `README.md`.
This work was part of a seminar during my mathematics studies, focusing on applying deep learning techniques to classify time-series data. The goal was to develop a deep-learning approach to classify time-series (TS) data using LSTMs, CNNs, and wavelet transforms. As an optional task, generative models were explored to deliver synthetic data, providing a simple way to generate fresh data in the future and potentially improving the performance of the models.
- # of time-series: 33,727
- # of features per time-series: 6 ($feature_1, feature_2, \ldots, feature_6$)
- # of possible labels $y$: 4 ($class_0, class_1, class_2, class_4$)
- Relative frequency of labels: (11.5%, 24.9%, 37.2%, 26.4%)
Since most deep learning models require input data of equal length, each time-series has been interpolated to a common length using `scipy.interpolate.interp1d`. After that, the data was scaled to the range [-1, 1] by dividing each value by its absolute maximum, which makes training more stable.
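A minimal sketch of this preprocessing step, assuming each raw series is a NumPy array of shape `(length, 6)`; the target length and the per-series scaling are illustrative assumptions, not the project's actual settings:

```python
import numpy as np
from scipy.interpolate import interp1d

TARGET_LENGTH = 128  # illustrative; the true common length is not disclosed

def preprocess(series: np.ndarray, target_length: int = TARGET_LENGTH) -> np.ndarray:
    """Interpolate a (length, 6) series to a fixed length and scale it to [-1, 1]."""
    old_grid = np.linspace(0.0, 1.0, num=series.shape[0])
    new_grid = np.linspace(0.0, 1.0, num=target_length)
    # Linear interpolation of every feature channel onto the common grid.
    resampled = interp1d(old_grid, series, axis=0)(new_grid)
    # Max-abs scaling: values end up in [-1, 1] (done per series here; the
    # project may instead scale over the whole dataset).
    return resampled / np.max(np.abs(resampled))
```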
In the following, the task has been addressed by applying an LSTM, a CNN, and an FCN (Fully Convolutional Network). Since CNNs are known to perform well on grid-like data structures, each time-series was additionally transformed into a wavelet image, and a 2D CNN was applied to it.
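The exact wavelet transform is not reproduced here; a common choice is the continuous wavelet transform (CWT), e.g. via PyWavelets, applied per feature channel so the resulting scalograms can be stacked as image channels. The Morlet wavelet and scale range below are illustrative assumptions:

```python
import numpy as np
import pywt

def to_scalogram(channel: np.ndarray, scales=np.arange(1, 65), wavelet: str = "morl") -> np.ndarray:
    """Continuous wavelet transform of a single feature channel into a 2D image."""
    coefficients, _ = pywt.cwt(channel, scales, wavelet)
    # The magnitudes form a (len(scales), len(channel)) image suitable for Conv2D.
    return np.abs(coefficients)
```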
Hyperparameter tuning was done using the HyperBand search algorithm from `keras-tuner`; a minimal tuner sketch follows the list below.
- LSTM on time-series using `LSTM`
- CNN on time-series using `Conv1D`
- CNN on wavelets using `Conv2D`
- FCN on time-series using `Conv1D` and `GlobalAveragePooling1D`
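For illustration, this is roughly how HyperBand from `keras-tuner` could be wired up for one of these models. The input shape, layer sizes, and search ranges are assumptions, not the repository's actual configuration:

```python
import keras_tuner as kt
from tensorflow import keras

def build_model(hp):
    """Illustrative Conv1D classifier whose width, kernel size, and learning rate are tuned."""
    model = keras.Sequential([
        keras.layers.Input(shape=(128, 6)),  # assumed (length, features) input shape
        keras.layers.Conv1D(
            filters=hp.Int("filters", min_value=32, max_value=128, step=32),
            kernel_size=hp.Choice("kernel_size", [3, 5, 7]),
            activation="relu",
        ),
        keras.layers.GlobalAveragePooling1D(),
        keras.layers.Dense(4, activation="softmax"),  # 4 classes
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(hp.Float("lr", 1e-4, 1e-2, sampling="log")),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = kt.Hyperband(build_model, objective="val_accuracy", max_epochs=30,
                     directory="tuning", project_name="ts_cnn")
# tuner.search(x_train, y_train, validation_split=0.2)
```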
In the domain scenario, misclassifications cannot all be treated equally, for safety reasons. Thus, in addition to `SparseCategoricalCrossentropy`, a weighted loss function has been derived to account for these inequalities.
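As a hedged sketch of such a weighted loss, one can scale the per-sample crossentropy by the misclassification cost of the true class; the weights below are hypothetical, not the ones derived in the project:

```python
import tensorflow as tf

# Hypothetical per-class misclassification costs; the actual values used in
# Metrics/metrics.ipynb are not disclosed.
CLASS_WEIGHTS = tf.constant([4.0, 1.0, 1.0, 2.0])

def weighted_sparse_categorical_crossentropy(y_true, y_pred):
    """Crossentropy where each sample is weighted by the cost of its true class."""
    per_sample = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    weights = tf.gather(CLASS_WEIGHTS, tf.cast(tf.reshape(y_true, [-1]), tf.int32))
    return per_sample * weights
```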
The following table summarizes the test results:

| Algorithm | Precision (Test) | Loss (Test) |
| --- | --- | --- |
Initial approaches to generating synthetic data have been made by leveraging GAN architectures. Since these approaches require considerably more computational power, only first raw samples could be produced; these already suggest that the original data can be approximated very well. This can be further explored in future work.
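For illustration, a DCGAN-style generator for multivariate time-series could look roughly as follows; the latent dimension, layer widths, and output length are assumptions rather than the repository's actual architecture:

```python
from tensorflow import keras

def build_generator(latent_dim: int = 64, length: int = 128, channels: int = 6) -> keras.Model:
    """Illustrative DCGAN-style generator mapping noise to a (length, channels) series."""
    return keras.Sequential([
        keras.layers.Input(shape=(latent_dim,)),
        keras.layers.Dense((length // 4) * 32, activation="relu"),
        keras.layers.Reshape((length // 4, 32)),
        keras.layers.Conv1DTranspose(32, kernel_size=4, strides=2, padding="same", activation="relu"),
        # tanh output matches the [-1, 1] scaling of the real data
        keras.layers.Conv1DTranspose(channels, kernel_size=4, strides=2, padding="same", activation="tanh"),
    ])
```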
- `CNN`: Contains the implementation of the CNN models.
  - `HyperSearchCNN.ipynb`: A notebook for performing hyperparameter search on the classical CNN model.
  - `EvaluateCNN.ipynb`: A notebook for evaluating the best CNN model identified during the hyperparameter search.
- `FCN`: Contains the implementation of the FCN models.
  - `HyperSearchFCN.ipynb`: A notebook for performing hyperparameter search on the FCN model.
  - `EvaluateFCN.ipynb`: A notebook for evaluating the best FCN model identified during the hyperparameter search.
- `GenerativeModel`:
  - `GAN`: Implements the GAN models.
    - `DCGAN.ipynb`: Experiments using a Deep Convolutional Generative Adversarial Network.
    - `TimeGAN.ipynb`: Experiments using Time-GAN by Jinsung Yoon et al. (2019).
    - `CompareGAN.ipynb`: Loads the current DCGAN and TimeGAN models and creates synthetic data for demonstration.
- `LSTM`: Contains the implementation of the LSTM models.
  - `HyperSearchLSTM.ipynb`: A notebook for performing hyperparameter search on the classical LSTM model.
  - `EvaluateLSTM.ipynb`: A notebook for comparing and evaluating the two best LSTM models identified during the hyperparameter search.
- `Metrics`:
  - `metrics.ipynb`: Implements a metric for imbalanced datasets and a weighted loss function.
- `utilities`: A collection of frequently used functions.