This repository contains the full code and some small example data to reproduce our results on the kinetic ensemble of amyloid-β 42. See also the original implementation of the constrained VAMPNets.
The analysis was performed on a single Google compute engine instance using 12 vCPUs, 78 GB memory and 1x NVIDIA Tesla V100 GPU. The instance used the c3-deeplearning-tf-ent-2-1-cu100-20200131 image with CUDA 10.0 and tensorflow 2.1, based on Debian 10. The original environment for this machine is provided in env-tf-original.yml. Training a single network takes approximately 2 hours on this architecture.
The full dataset can be found on Zenodo. It is around 45 GB in size and includes the following directories:
trajectories/: The simulation trajectories, subsampled to 250 ps timesteps, performed in 5 rounds with 1024 individual trajectories each. The aggregated simulated time is 314 µs for the reduced form of Aβ42 and 317 µs for the oxidised form. Also includes the chemical shifts back-calculated with CamShift as implemented in Plumed.
intermediate/: Intermediate data files, such as the calculated inter-residue minimum distances, and the full model outputs in the form of transition matrices, weights, and timescales.
models/: The neural network models, including weights and trajectory indices used for training.
structures{,-alt}/: State structures sampled from the trajectories in two different ways.
figs/: Raw figures for the paper, generated by the notebooks.
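As a rough sanity check on the figures quoted for trajectories/, the sketch below derives an approximate frame count and mean trajectory length from the stated totals. It assumes that each form (reduced and oxidised) has its own 5 × 1024 trajectories and that the aggregated times are taken at face value, neither of which is guaranteed by the description above; the names used are illustrative only and do not appear in the repository.

```python
# Back-of-the-envelope check of the dataset description.
# Assumptions: 5 x 1024 trajectories per form, and the quoted
# aggregated times (which are rounded) taken at face value.

PS_PER_US = 1_000_000  # picoseconds per microsecond

ROUNDS = 5
TRAJECTORIES_PER_ROUND = 1024
FRAME_SPACING_PS = 250  # trajectories are subsampled to 250 ps

AGGREGATED_TIME_US = {"reduced": 314, "oxidised": 317}


def summarise(total_us):
    """Return (approximate frame count, mean trajectory length in ns)."""
    n_trajectories = ROUNDS * TRAJECTORIES_PER_ROUND
    total_ps = total_us * PS_PER_US
    n_frames = total_ps // FRAME_SPACING_PS
    mean_length_ns = total_ps / n_trajectories / 1000
    return n_frames, mean_length_ns


for form, total_us in AGGREGATED_TIME_US.items():
    n_frames, mean_ns = summarise(total_us)
    print(f"{form}: ~{n_frames:,} frames, ~{mean_ns:.1f} ns per trajectory")
```

Under these assumptions each form comprises on the order of a million frames, which helps when estimating memory requirements before loading the intermediate distance files.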
The easiest way to try out the notebooks is by using conda. We include several environment specifications: env-tf.yml specifies the environment to be used for running the neural network, env-analysis.yml specifies the packages needed for the analysis and plotting of the results, and env-msmbuilder.yml specifies the environment for use with the msm-classic.ipynb notebook for the conventional Markov models. Because the installation of tensorflow is often highly specific to your machine, we strongly recommend following the official installation instructions. To create an environment, run, for example, conda env create -f env-analysis.yml and activate the new environment with conda activate analysis. You will also need to install a tensorflow 2.* compatible version of vamptools for training.
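The environment setup can also be scripted. The following is a minimal sketch that builds the conda env create -f command for each of the three environment files named above and, by default, only prints them. The helper names (conda_create_command, create_environments) and the dry_run switch are hypothetical and not part of this repository.

```python
import shutil
import subprocess

# Environment files named in this README.
ENV_FILES = ("env-tf.yml", "env-analysis.yml", "env-msmbuilder.yml")


def conda_create_command(env_file):
    """Build the argument list for `conda env create -f <env_file>`."""
    return ["conda", "env", "create", "-f", env_file]


def create_environments(env_files=ENV_FILES, dry_run=True):
    """Print each command; run it only if dry_run is False and conda is on PATH."""
    commands = [conda_create_command(env_file) for env_file in env_files]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run and shutil.which("conda"):
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    create_environments()  # dry run: only prints the commands
```

Given the recommendation to install tensorflow by following the official instructions, env-tf.yml may need adjusting for your machine before it is used this way.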
msm-vampe-hyperpar.ipynb: Hyperparameter search code, can be run with papermill and the env-tf.yml environment.
msm-vampe-training.ipynb: Training code, can be run with papermill and the env-tf.yml environment.
msm-vampe-convergence.ipynb: Simple convergence test using subsets of the trajectory data with the trained models. Run with the env-tf.yml environment.
msm-vampe-analysis.ipynb: Full analysis and plots of the ensemble, run with the env-analysis.yml environment.
msm-classic.ipynb: Classic MSM building attempt, run with the env-msmbuilder.yml environment.
model.py: The neural network model code.
data.py: The data generators and handlers without any tensorflow related code, for msm-vampe-analysis.ipynb.
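To make the pairing of notebooks and environments explicit, the sketch below records the mapping listed above and builds a papermill command line for a given notebook. It only constructs and prints the command; it does not execute anything. The output filename and the run_id parameter are hypothetical placeholders, since the parameters actually exposed by the notebooks are not documented here.

```python
# Notebook -> environment file, as listed in this README.
NOTEBOOK_ENV = {
    "msm-vampe-hyperpar.ipynb": "env-tf.yml",
    "msm-vampe-training.ipynb": "env-tf.yml",
    "msm-vampe-convergence.ipynb": "env-tf.yml",
    "msm-vampe-analysis.ipynb": "env-analysis.yml",
    "msm-classic.ipynb": "env-msmbuilder.yml",
}


def papermill_command(notebook, output, parameters=None):
    """Build a papermill CLI call: papermill IN OUT [-p NAME VALUE ...]."""
    if notebook not in NOTEBOOK_ENV:
        raise KeyError(f"unknown notebook: {notebook}")
    cmd = ["papermill", notebook, output]
    for name, value in (parameters or {}).items():
        cmd += ["-p", name, str(value)]
    return cmd


# Hypothetical output path and parameter name, for illustration only.
cmd = papermill_command(
    "msm-vampe-training.ipynb",
    "msm-vampe-training-out.ipynb",
    parameters={"run_id": 0},
)
print("environment file:", NOTEBOOK_ENV["msm-vampe-training.ipynb"])
print(" ".join(cmd))
```

The printed command is meant to be run from a shell in which the matching environment is active and papermill is installed.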