eulertx/ChromWave

ChromWave - a deep learning model for sequential genomic data

Note that while you can use the pretrained ChromWave models provided here to predict transcription factor and nucleosome binding without a GPU, training new models requires a GPU.

Installation

Note: These installation steps assume that you are in a Linux environment.

Install Miniconda for python 3

We recommend installing ChromWave in a conda environment. If you are on Windows or a Mac, you will need to download the right Miniconda3 installation file and install the conda and pip packages manually as described below. We provide all necessary files.

First clone ChromWave using git:

git clone https://github.com/luslab/ChromWave.git

If you are on Linux or macOS, you can use the provided Miniconda install script in the conda installation files folder. In a terminal window, cd into the ChromWave folder and install Miniconda from the provided script by typing

bash Miniconda3-latest-Linux-x86_64.sh

If you are on macOS, use the provided Mac files, e.g.

bash Miniconda3-latest-MacOSX-x86_64

Once you have Miniconda3 installed, create the chromwave environment (still in the ChromWave folder)

conda env create -f environment_linux.yml

or, if you are on macOS, type

conda env create -f environment_mac.yml

Alternatively, if you have your own version of conda installed, create a new environment using Python 3.7 and then install all dependencies with conda:

conda create -n chromwave python=3.7
conda activate chromwave

and install TensorFlow for CPU

conda install -c anaconda tensorflow 

or, if you have access to a GPU,

conda install -c anaconda tensorflow-gpu

In either case, then install the remaining conda dependencies:

conda install keras==2.3.1 pandas --channel conda-forge
conda install scikit-learn matplotlib biopython pyyaml h5py --channel conda-forge
conda install scikit-image joblib hyperas --channel conda-forge

followed by the pip dependencies:

pip install roman pydot pydot-ng
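
After a manual installation it can help to confirm that every dependency is importable from the active environment. The following is only a minimal sketch using the Python standard library; it is not part of ChromWave. The mapping from install names to import names (for example scikit-learn to `sklearn`, biopython to `Bio`) and the helper name `missing_packages` are assumptions made for illustration.

```python
import importlib.util

# Install name -> import name. This mapping is an assumption based on the
# packages listed above; several packages are imported under a different name.
CHROMWAVE_DEPENDENCIES = {
    "tensorflow": "tensorflow",
    "keras": "keras",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "biopython": "Bio",
    "pyyaml": "yaml",
    "h5py": "h5py",
    "scikit-image": "skimage",
    "joblib": "joblib",
    "hyperas": "hyperas",
    "roman": "roman",
    "pydot": "pydot",
    "pydot-ng": "pydot_ng",
}


def missing_packages(dependencies):
    """Return the install names whose top-level module cannot be found.

    Hypothetical helper: it only checks that the module can be located,
    it does not import it or verify its version.
    """
    return [
        package
        for package, module in dependencies.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(CHROMWAVE_DEPENDENCIES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

Run it with the chromwave environment activated; any name it reports can then be installed with the corresponding conda or pip command above.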

You may also consider installing the following optional dependencies:

  • cuDNN (recommended if you plan on running Keras on GPU).
  • HDF5 and h5py (required if you plan on saving Keras models to disk). h5py is included in the provided conda environments.
  • graphviz and pydot_ng (used by visualization utilities to plot model graphs). pydot_ng is included in the provided conda environments and enables visualisation of the model architectures.
  • jupyter-lab via conda: conda install -c conda-forge jupyterlab
  • hyperopt and hyperas to run hyperparameter optimisations as in the examples provided (both are included in the provided conda environments)

Install ChromWave

Now you can install ChromWave:

First, activate the chromwave environment (source activate chromwave), cd into the ChromWave folder and run the install command:

cd ChromWave
python setup.py install

If you plan to continue developing the package and want local changes to be reflected in your installation, run the following instead:

python setup.py develop

Quick start

We have assembled some generic workflows as tutorials in the form of Jupyter notebooks. These include a quick start on using the models and on loading and preprocessing data (Tutorials 1&2), as well as other workflows showing how to produce most of the figures in our publication 'ChromWave: Deciphering the DNA-encoded competition between DNA-binding factors with deep neural networks'.

We also provide scripts that can be run from the command line to optimise the hyperparameters with hyperopt.

The models

The model directory contains all files needed to reload the provided models and run them on your own genomic data to produce DNA-binding prediction profiles. Each model directory also contains a prediction folder with bigWig files of the predictions: transcription factor and nucleosome binding in the yeast genome (TF-Nucleosome model), nucleosome occupancy in the yeast genome in vitro and in vivo, and nucleosome occupancy +/-1000 bp around the annotated transcriptional start sites in hg38. For more information on these models, see our publication 'ChromWave: Deciphering the DNA-encoded competition between DNA-binding factors with deep neural networks' and the provided Jupyter notebooks.

The data

The data directory contains all data necessary to train the provided models. This includes the genomic data and, in the case of the human promoter model, the TSS annotation, as well as all binding profile data. ChromWave can readily be trained from CSV files with binding occupancies, from the computeMatrix output of deepTools, or from BED files. If you plan to use BED files, please make sure that no range is wider than 1 bp, i.e. you must have a score at each base-pair position (missing values will be imputed with the genomic average). We provide an example R script showing how to do this in the data directory.
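
To illustrate the single-base requirement for BED input, here is a minimal Python sketch of the idea. It is not the R script shipped in the data directory. It assumes bedGraph-style records of (chromosome, start, end, score) with 0-based, half-open coordinates, and it assumes that "genomic average" means the mean score over all covered positions; the function name `expand_to_single_bases` and the `chrom_sizes` argument are hypothetical.

```python
def expand_to_single_bases(intervals, chrom_sizes):
    """Expand scored intervals into one record per base-pair position.

    intervals   -- iterable of (chrom, start, end, score), 0-based half-open
                   (assumed layout, not confirmed by the ChromWave docs)
    chrom_sizes -- dict mapping chromosome name to its length in bp

    Positions without a score are filled with the mean over all scored
    positions (our reading of "genomic average").
    """
    # One slot per base; None marks a position with no score yet.
    per_base = {chrom: [None] * size for chrom, size in chrom_sizes.items()}
    for chrom, start, end, score in intervals:
        for pos in range(start, end):
            per_base[chrom][pos] = score

    observed = [s for scores in per_base.values() for s in scores if s is not None]
    if not observed:
        raise ValueError("no scored positions found")
    genomic_average = sum(observed) / len(observed)

    # Emit width-1 ranges, imputing the gaps.
    rows = []
    for chrom, scores in per_base.items():
        for pos, score in enumerate(scores):
            value = genomic_average if score is None else score
            rows.append((chrom, pos, pos + 1, value))
    return rows


if __name__ == "__main__":
    example = [("chrI", 0, 2, 1.0), ("chrI", 3, 4, 4.0)]
    for row in expand_to_single_bases(example, {"chrI": 5}):
        print(*row, sep="\t")
```

This sketch holds one list entry per base in memory, which is fine for a toy example or a small genome; for larger genomes an array-based or streaming approach would be more appropriate.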

R integration

Our models can easily be loaded in R to predict DNA-binding profiles in the genome of your choice using the packages reticulate and KerasR. We provide some examples of how to do this as R scripts.
