Note that while you can use the pretrained ChromWave models provided here to make transcription factor and nucleosome binding predictions without a GPU, training new models requires a GPU.
Note: These installation steps assume that you are in a Linux environment.
We recommend installing ChromWave in a conda environment. If you are on Windows or a Mac, you will need to download the right Miniconda3 installation file and install the conda and pip packages manually as described below. We provide all necessary files.
First, clone ChromWave using git:
git clone https://github.com/luslab/ChromWave.git
If you are on Linux or Mac, you can use the provided Miniconda install script in the conda installation files folder. In a terminal window, cd into the ChromWave folder and install Miniconda from the provided script by typing:
bash Miniconda3-latest-Linux-x86_64.sh
If you are on macOS, use the provided Mac files, e.g.
bash Miniconda3-latest-MacOSX-x86_64.sh
Once you have Miniconda3 installed, create the chromwave environment (still in the ChromWave folder):
conda env create -f environment_linux.yml
or, if you are on macOS, type
conda env create -f environment_mac.yml
Alternatively, if you have your own version of conda installed, create a new environment using Python 3.7 and then install all dependencies in conda:
conda create -n chromwave python=3.7
conda activate chromwave
and install TensorFlow for CPU
conda install -c anaconda tensorflow
or, if you have access to a GPU,
conda install -c anaconda tensorflow-gpu
conda install keras==2.3.1 pandas --channel conda-forge
conda install scikit-learn matplotlib biopython pyyaml h5py --channel conda-forge
conda install scikit-image joblib hyperas --channel conda-forge
followed by the pip dependencies:
pip install roman pydot pydot-ng
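After a manual install it can be useful to confirm that every dependency is visible to the Python interpreter in the active environment. The following is a minimal sketch, not part of ChromWave: the function name `find_missing` is hypothetical, and the import names in `REQUIRED_MODULES` are assumed from the install commands above (several packages are imported under a different name than the one they are installed with, e.g. scikit-learn as `sklearn`).

```python
import importlib.util

# Import names assumed from the conda and pip commands above.
# Install name -> import name differs for some packages:
# scikit-learn -> sklearn, biopython -> Bio, pyyaml -> yaml,
# scikit-image -> skimage, pydot-ng -> pydot_ng.
REQUIRED_MODULES = [
    "tensorflow", "keras", "pandas", "sklearn", "matplotlib", "Bio",
    "yaml", "h5py", "skimage", "joblib", "hyperas", "roman",
    "pydot", "pydot_ng",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            # find_spec locates a top-level module without importing it.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

Run it with the chromwave environment activated. It only checks that the modules can be located; it does not check their versions.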
You may also consider installing the following optional dependencies:
- cuDNN (recommended if you plan on running Keras on GPU).
- HDF5 and h5py (required if you plan on saving Keras models to disk). H5py is included in the provided conda environments.
- graphviz and pydot_ng (used by visualization utilities to plot model graphs). Pydot_ng is included in the provided conda environments and enables visualisation of the model architectures.
- jupyter-lab via conda:
conda install -c conda-forge jupyterlab
- hyperopt and hyperas to run hyperparameter optimisations as in the examples provided (both are included in the provided conda environments).
Now you can install ChromWave:
First, activate the chromwave environment (source activate chromwave), cd into the ChromWave folder and run the install command:
cd ChromWave
python setup.py install
If you plan to continue developing the package and want local changes to be reflected in your installation, run instead:
python setup.py develop
We have assembled some generic workflows as tutorials in the form of Jupyter notebooks. These include a quick start on using the models and on loading and preprocessing data (Tutorials 1&2), as well as other workflows showing how to produce most of the figures in our publication 'ChromWave: Deciphering the DNA-encoded competition between DNA-binding factors with deep neural networks'.
We also provide scripts that can be run in the command line for the optimisation of the hyperparameters with hyperopt.
In the model directory, we provide all model files needed to load the models again and run them on your own genomic data to produce DNA-binding prediction profiles. Each model directory also contains a prediction folder with bigWig files of the predictions: transcription factor and nucleosome binding in the yeast genome (TF-Nucleosome model), nucleosome occupancy in the yeast genome in vitro and in vivo, and nucleosome occupancy +/-1000bp around the annotated transcriptional start sites in hg38. For more information on these models, see our publication 'ChromWave: Deciphering the DNA-encoded competition between DNA-binding factors with deep neural networks' and the provided Jupyter notebooks.
The data directory contains all data necessary to train the provided models. This includes the genomic data and TSS annotation in the case of the human promoter model, as well as all binding profile data. ChromWave can readily be trained from csv files with binding occupancies, computeMatrix output of deeptools, and bed files. If you are planning to use bed files, please make sure that you have no ranges of width>1, i.e. you must have a score at each base-pair position (missing values will be imputed with the genomic average). We provide an example script showing how to do this in R in the data directory.
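To illustrate the bed file requirement, here is a minimal Python sketch of expanding ranges into one score per base-pair position. It is not the R script from the data directory and not ChromWave's own code: the function names `expand_bed_scores` and `to_single_base_bed` are hypothetical, and the "genomic average" is assumed here to be the mean score over all covered positions of the chromosome being processed, which may differ from how ChromWave imputes missing values.

```python
def expand_bed_scores(intervals, chrom_length):
    """Expand (start, end, score) ranges into one score per base pair.

    Coordinates follow the bed convention: 0-based start, exclusive end.
    Positions not covered by any range are filled with the mean score of
    the covered positions (an assumed stand-in for the genomic average).
    """
    scores = [None] * chrom_length
    for start, end, score in intervals:
        # Clip each range to the chromosome so indexing stays valid.
        for pos in range(max(start, 0), min(end, chrom_length)):
            scores[pos] = float(score)

    covered = [s for s in scores if s is not None]
    fill = sum(covered) / len(covered) if covered else 0.0
    return [fill if s is None else s for s in scores]


def to_single_base_bed(chrom, scores):
    """Turn per-base scores into bed rows of width 1."""
    return [(chrom, pos, pos + 1, score) for pos, score in enumerate(scores)]


if __name__ == "__main__":
    # Two ranges of width 2 with a gap at positions 2 and 3.
    per_base = expand_bed_scores([(0, 2, 1.0), (4, 6, 3.0)], chrom_length=6)
    for row in to_single_base_bed("chrI", per_base):
        print("\t".join(str(field) for field in row))
```

For whole genomes a vectorised approach would be faster; the loop is kept explicit to show the logic.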
Our models can easily be loaded in R to predict DNA-binding profiles in the genome of choice using the packages reticulate and KerasR. We have provided some examples of how to do this as R scripts.