This repository contains the evaluation of a new modelling formalism called "Stochastic Petri Nets with Exogenous Dependencies" (Exo-SLPN). It includes two folders, which contain the files used to generate the evaluation figures and tables.
The code (which uses the ProM framework) used to generate the data within each folder can be found here: ExogenousData Plugin. A more in-depth breakdown of the code and the reproduction steps can be found in the README in each sub-folder.
Git LFS has been used to track and upload some files. If you do not have Git LFS enabled, pulling down some files may require further steps. See docs.github.com/en/repositories/working-with-files/managing-large-files/collaboration-with-git-large-file-storage for more information.
This evaluation uses three event logs, from different domains, paired with exogenous data.
The first event log comes from the MIMIC III dataset [1]. It contains 'MICU' ward admissions, focusing on the first 48 hours of admission, and follows the preparation outlined in our previous work [2]. For this log, blood pressure measurements collected from nurse observations are used as exogenous data.
The second log comes from a smart factory [3], where only the 'WF_101' process and the start events within its executions are considered (as these are associated with IoT sensors).
The third log used is the road fines event log [4]. For this log, inter-case variables from the log were used as exogenous data, i.e. the total number of unpaid fines and the amount of unpaid fines seen in the event log.
During testing, it was noted that larger logs required excessive amounts of system memory (in the range of 256GB to 1TB) for discovery and for measuring unit Earth movers' conformance (including the data-aware variants). As such, both the road fines and MIMIC logs were randomly reduced, through trial and error, so that the evaluation could run on a system with the following specifications:
- RAM: 64GB (4 x 16GB DDR4 @ 1053 MHz, Dual Channel)
- CPU: AMD Ryzen 9 3900XT
Notably, not all existing techniques have a threaded version, so in some cases runtime could be improved. However, both the Exo-SLPN discovery and the conformance checking implemented for this study are threaded.
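The random reduction of the larger logs can be pictured as sampling whole cases without replacement. The following is a minimal sketch under stated assumptions, not the procedure used in this repository: the flat `(case_id, activity)` event representation and the names `reduce_log`, `keep_fraction` and `seed` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def reduce_log(events, keep_fraction, seed=0):
    """Keep a random subset of cases; every event of a kept case is retained.

    `events` is a list of (case_id, activity) pairs. This flat format is an
    assumption for illustration, not the format used by the repository.
    """
    # Group events by case so that traces are never split by the sampling.
    by_case = defaultdict(list)
    for case_id, activity in events:
        by_case[case_id].append(activity)

    # Sort case ids so that the result depends only on the seed.
    case_ids = sorted(by_case)
    rng = random.Random(seed)
    n_keep = max(1, round(len(case_ids) * keep_fraction))

    # Sample cases without replacement.
    kept = rng.sample(case_ids, n_keep)
    return {case_id: by_case[case_id] for case_id in kept}
```

In such a sketch, `keep_fraction` would be lowered step by step until discovery and conformance checking fit within the available memory, which mirrors the trial-and-error reduction described above.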
To show that our approach can identify the stochastic nature of a process, we considered how the model quality changes as we apply our approach to progressively more complete samples of a given log. The intuition is that, with a more complete understanding of the original, the discovered stochastic nature should better reflect the original, or at least not degrade.
For each sample, we discovered an Exo-SLPN (recording runtime and memory usage)
and then, using the complete log, computed duEMSC to quantify the quality of
the discovered Exo-SLPN. We created many sample logs of the road fines log,
where the n-th sample consists of
In the sub-folder 'log-completeness' there are three logging files which describe the performance of Exo-SLPNs. Each logging file describes the performance outcomes for a single equation form for Exo-SLPNs, i.e. individual multiplicative (invmut), individual additive (invadd), and global additive (globadd).
These logging files are visualised using Python, and the visualisation can be rerun to reproduce the figures relating to log-incompleteness.
To compare our approach against existing techniques, we discovered a variety of stochastic extensions of Petri nets and considered how closely these nets represent the stochastic nature of a log. A variety of extensions from the inductive miner family were selected to discover control-flow models, to which a variety of stochastic miners were then applied.
The procedure for discovering and quantifying a model consisted of: (i) discovering a control-flow model with the original log, (ii) sampling the original log with replacement, (iii) discovering stochastic weights using the sampled log and control-flow model, and (iv) measuring the discovered stochastic model using the original log. The same sampled log from step (ii) is used for step (iii) across all techniques.
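The four steps above can be sketched as a small driver loop. This is a minimal illustration under stated assumptions rather than the ProM-based implementation: `evaluate`, `sample_with_replacement`, the dictionaries of callables and the `seed` parameter are hypothetical, and drawing as many traces as the original log contains is also an assumption, since the sample size is not stated here.

```python
import random


def sample_with_replacement(traces, seed):
    """Step (ii): draw len(traces) traces uniformly at random, with replacement.

    The sample size (equal to the size of the original log) is an assumption.
    """
    rng = random.Random(seed)
    return rng.choices(traces, k=len(traces))


def evaluate(original_log, control_flow_models, weight_estimators, measure, seed=42):
    """Run steps (ii) to (iv) for every model/estimator combination.

    `control_flow_models` maps a name to a model already discovered from the
    original log (step (i)). `weight_estimators` maps a name to a callable
    estimate(model, sampled_log). `measure(stochastic_model, log)` returns a
    quality score. All three are hypothetical placeholders.
    """
    # Sample once so that every technique sees exactly the same sampled log.
    sampled_log = sample_with_replacement(original_log, seed)

    results = {}
    for model_name, model in control_flow_models.items():
        for estimator_name, estimate in weight_estimators.items():
            stochastic_model = estimate(model, sampled_log)      # step (iii)
            score = measure(stochastic_model, original_log)      # step (iv)
            results[(model_name, estimator_name)] = score
    return results
```

Fixing the seed and sampling outside the loops is what keeps the comparison fair: differences in the scores then come from the techniques rather than from different samples.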
In the sub-folder 'model-quality' there are the discovered Exo-SLPNs and the measured (data-aware) unit Earth movers' conformance results. Where possible, all original data is kept in the data folders, along with the scripts for extracting exogenous data.
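For readers unfamiliar with the measure, the plain (non-data-aware) unit Earth movers' stochastic conformance is commonly defined as one minus the probability mass that the log assigns to traces in excess of what the model assigns to them. The sketch below illustrates only that plain form. It does not cover the data-aware variant measured in this repository, and representing the model as a finite dictionary of trace probabilities (`model_probabilities`) is a simplifying assumption, as is the function name `uemsc`.

```python
from collections import Counter


def uemsc(log_traces, model_probabilities):
    """Plain unit Earth movers' stochastic conformance, as commonly defined:

        1 - sum over traces t in the log of max(L(t) - M(t), 0)

    where L(t) is the relative frequency of t in the log and M(t) is the
    probability the model assigns to t. The dictionary-based model is an
    assumption made for illustration.
    """
    counts = Counter(tuple(trace) for trace in log_traces)
    total = sum(counts.values())

    surplus = 0.0
    for trace, count in counts.items():
        log_probability = count / total
        model_probability = model_probabilities.get(trace, 0.0)
        # Only mass that the log has and the model lacks is penalised.
        surplus += max(log_probability - model_probability, 0.0)
    return 1.0 - surplus
```

For example, a log with three occurrences of the trace `("a", "b")` and one of `("a", "c")`, compared against a model that gives each trace probability 0.5, yields a score of 0.75, because the log places 0.25 more mass on `("a", "b")` than the model does.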
[1] Johnson, A.E., Pollard, T.J., Shen, L., Lehman, L.W.H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L.A., Mark, R.G.: MIMIC-III, a freely accessible critical care database. Scientific Data 3(1) (2016)
[2] Banham, A., Leemans, S.J.J., Wynn, M.T., Andrews, R., Laupland, K.B., Shinners, L.: xPM: Enhancing exogenous data visibility. Artif. Intell. Medicine (2022)
[3] Malburg, L., Grüger, J., Bergmann, R.: An IoT-enriched event log for process mining in smart factories. Zenodo (2023)
[4] de Leoni, M., Mannhardt, F.: Road traffic fine management process (2015)