Comparing changes

base repository: Teichlab/tracer
base: v0.6.0
head repository: Teichlab/tracer
compare: master
186 changes: 135 additions & 51 deletions Dockerfile
@@ -1,51 +1,135 @@
# run as:
# docker build -t tracer .

#start off with a plain Debian
FROM debian:latest

#basic setup stuff, including bowtie2
RUN apt-get update && apt-get -y upgrade
RUN apt-get -y install wget curl unzip build-essential zlib1g-dev git python3 python3-pip bowtie2 openjdk-8-jre

#Trinity - depends on zlib1g-dev and openjdk-8-jre installed previously
RUN wget https://github.com/trinityrnaseq/trinityrnaseq/archive/Trinity-v2.4.0.zip
RUN unzip Trinity-v2.4.0.zip && rm Trinity-v2.4.0.zip
RUN cd /trinityrnaseq-Trinity-v2.4.0 && make

#IgBLAST, plus the setup of its super weird internal_data thing. don't ask. just needs to happen
#and then on top of that, the environmental variable thing facilitates the creation of a shell wrapper. fun
RUN wget ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/1.7.0/ncbi-igblast-1.7.0-x64-linux.tar.gz
RUN tar -xzvf ncbi-igblast-1.7.0-x64-linux.tar.gz && rm ncbi-igblast-1.7.0-x64-linux.tar.gz
RUN cd /ncbi-igblast-1.7.0/bin/ && wget -r ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/internal_data && \
mv ftp.ncbi.nih.gov/blast/executables/igblast/release/internal_data . && rm -r ftp.ncbi.nih.gov

#aligners - kallisto and salmon
RUN wget https://github.com/pachterlab/kallisto/releases/download/v0.43.1/kallisto_linux-v0.43.1.tar.gz
RUN tar -xzvf kallisto_linux-v0.43.1.tar.gz && rm kallisto_linux-v0.43.1.tar.gz
RUN wget https://github.com/COMBINE-lab/salmon/releases/download/v0.8.2/Salmon-0.8.2_linux_x86_64.tar.gz
RUN tar -xzvf Salmon-0.8.2_linux_x86_64.tar.gz && rm Salmon-0.8.2_linux_x86_64.tar.gz

#graphviz, along with its sea of dependencies that otherwise trip up the dpkg -i
RUN apt-get -y install libgd3 libgts-0.7-5 liblasi0 libltdl7 freeglut3 libglade2-0 libglu1-mesa libglu1 libgtkglext1 libxaw7
RUN wget http://www.graphviz.org/pub/graphviz/stable/ubuntu/ub13.10/x86_64/libgraphviz4_2.38.0-1~saucy_amd64.deb
RUN dpkg -i libgraphviz4_2.38.0-1~saucy_amd64.deb && apt-get -y -f install
RUN wget http://www.graphviz.org/pub/graphviz/stable/ubuntu/ub13.10/x86_64/graphviz_2.38.0-1~saucy_amd64.deb
RUN dpkg -i graphviz_2.38.0-1~saucy_amd64.deb && apt-get -y -f install
RUN rm libgraphviz4_2.38.0-1~saucy_amd64.deb && rm graphviz_2.38.0-1~saucy_amd64.deb

#tracer proper
COPY . /tracer
RUN cd /tracer && pip3 install -r docker_helper_files/requirements_stable.txt && python3 setup.py install

#obtaining the transcript sequences. no salmon/kallisto indices as they make dockerhub unhappy for some reason
RUN mkdir GRCh38 && cd GRCh38 && wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.transcripts.fa.gz && \
gunzip gencode.v27.transcripts.fa.gz && python3 /tracer/docker_helper_files/gencode_parse.py gencode.v27.transcripts.fa && rm gencode.v27.transcripts.fa
RUN mkdir GRCm38 && cd GRCm38 && wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M15/gencode.vM15.transcripts.fa.gz && \
gunzip gencode.vM15.transcripts.fa.gz && python3 /tracer/docker_helper_files/gencode_parse.py gencode.vM15.transcripts.fa && rm gencode.vM15.transcripts.fa

#placing a preconfigured tracer.conf in ~/.tracerrc
RUN cp /tracer/docker_helper_files/docker_tracer.conf ~/.tracerrc

#this is a tracer container, so let's point it at a tracer wrapper that sets the silly IgBLAST environment variable thing
ENTRYPOINT ["bash", "/tracer/docker_helper_files/docker_wrapper.sh"]
FROM ubuntu:20.04

ENV DEBIAN_FRONTEND=noninteractive

ARG bowtie_version=2.4.5
ARG samtools_version=1.15.1
ARG trinity_version=v2.14.0
ARG igblast_version=1.19.0
ARG kallisto_version=v0.48.0
ARG salmon_version=1.9.0
ARG gencode_human_version=41
ARG gencode_mouse_version=M30

#Install OS packages
RUN apt-get update && \
apt-get -y upgrade && \
apt-get -y install wget curl unzip build-essential zlib1g-dev git python3 python3-pip default-jre procps cmake \
libcairo2-dev pkg-config jellyfish autoconf libbz2-dev liblzma-dev graphviz libgirepository1.0-dev libncurses5-dev \
python-is-python3

#Install Bowtie2
RUN wget https://sourceforge.net/projects/bowtie-bio/files/bowtie2/${bowtie_version}/bowtie2-${bowtie_version}-linux-x86_64.zip/download -O /opt/bowtie2-${bowtie_version}-linux-x86_64.zip && \
cd /opt && \
unzip bowtie2-${bowtie_version}-linux-x86_64.zip && \
rm /opt/bowtie2-${bowtie_version}-linux-x86_64.zip

#Install samtools
RUN wget https://github.com/samtools/samtools/releases/download/${samtools_version}/samtools-${samtools_version}.tar.bz2 && \
tar -xvf samtools-${samtools_version}.tar.bz2 -C /opt && \
cd /opt/samtools-${samtools_version} && \
./configure && \
make && \
make install && \
rm /samtools-${samtools_version}.tar.bz2

#Install Trinity
RUN wget https://github.com/trinityrnaseq/trinityrnaseq/releases/download/Trinity-${trinity_version}/trinityrnaseq-${trinity_version}.FULL.tar.gz && \
tar -xvzf trinityrnaseq-${trinity_version}.FULL.tar.gz -C /opt && \
cd /opt/trinityrnaseq-${trinity_version} && \
make && \
rm /trinityrnaseq-${trinity_version}.FULL.tar.gz

#Install igblast
RUN wget https://ftp.ncbi.nih.gov/blast/executables/igblast/release/${igblast_version}/ncbi-igblast-${igblast_version}-x64-linux.tar.gz && \
tar -xvf ncbi-igblast-${igblast_version}-x64-linux.tar.gz -C /opt && \
rm /ncbi-igblast-${igblast_version}-x64-linux.tar.gz

##Set IGDATA variable to see the internal_files
ENV IGDATA="/opt/ncbi-igblast-${igblast_version}"

#Install Kallisto
RUN wget https://github.com/pachterlab/kallisto/releases/download/${kallisto_version}/kallisto_linux-${kallisto_version}.tar.gz && \
tar -xzvf kallisto_linux-${kallisto_version}.tar.gz -C /opt && \
rm kallisto_linux-${kallisto_version}.tar.gz

#Install Salmon
RUN wget https://github.com/COMBINE-lab/salmon/releases/download/v${salmon_version}/salmon-${salmon_version}_linux_x86_64.tar.gz && \
tar -xzvf salmon-${salmon_version}_linux_x86_64.tar.gz -C /opt && \
rm salmon-${salmon_version}_linux_x86_64.tar.gz

#Copy gencode_parse.py from GitHub repo
COPY gencode_parse.py /gencode_parse.py

#Installing transcript sequences for human and mouse
RUN mkdir -p /var/GRCh38 && \
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_${gencode_human_version}/gencode.v${gencode_human_version}.transcripts.fa.gz -O /var/GRCh38/gencode.v${gencode_human_version}.transcripts.fa.gz && \
gunzip /var/GRCh38/gencode.v${gencode_human_version}.transcripts.fa.gz && \
python3 /gencode_parse.py /var/GRCh38/gencode.v${gencode_human_version}.transcripts.fa && \
mv /genemap.tsv /var/GRCh38/ && \
mv /transcripts.fasta /var/GRCh38/ && \
rm /var/GRCh38/gencode.v${gencode_human_version}.transcripts.fa

RUN mkdir -p /var/GRCm38 && \
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_${gencode_mouse_version}/gencode.v${gencode_mouse_version}.transcripts.fa.gz -O /var/GRCm38/gencode.v${gencode_mouse_version}.transcripts.fa.gz && \
gunzip /var/GRCm38/gencode.v${gencode_mouse_version}.transcripts.fa.gz && \
python3 /gencode_parse.py /var/GRCm38/gencode.v${gencode_mouse_version}.transcripts.fa && \
mv /genemap.tsv /var/GRCm38/ && \
mv /transcripts.fasta /var/GRCm38/ && \
rm /var/GRCm38/gencode.v${gencode_mouse_version}.transcripts.fa && \
rm /gencode_parse.py

#Copy requirements.txt, setup.py and tracerlib from GitHub repo

COPY requirements.txt /requirements.txt

COPY setup.py /setup.py

RUN mkdir -p /opt/tracer

COPY tracerlib /opt/tracer/tracerlib

#Install tracer

RUN pip3 install pycairo

RUN cd /opt/tracer && \
python /setup.py install && \
rm /setup.py && \
rm /requirements.txt

#Setting up tracer config file

COPY tracer.conf /home/.tracerrc

#Adding software to path

ENV PATH="${PATH}:/opt/kallisto:/opt/bowtie2-${bowtie_version}-linux-x86_64:/opt/ncbi-igblast-${igblast_version}/bin:/opt/salmon-${salmon_version}_linux_x86_64/bin:/opt/samtools-${samtools_version}:/opt/trinityrnaseq-${trinity_version}:/opt/tracer"

#Building Indexes
RUN kallisto index -i /var/GRCh38/kallisto.idx /var/GRCh38/transcripts.fasta

RUN kallisto index -i /var/GRCm38/kallisto.idx /var/GRCm38/transcripts.fasta

RUN salmon index --index /var/GRCh38/salmon --transcripts /var/GRCh38/transcripts.fasta

RUN salmon index --index /var/GRCm38/salmon --transcripts /var/GRCm38/transcripts.fasta

#Copying resources

RUN mkdir -p /usr/local/bin/resources

COPY resources /usr/local/bin/resources
COPY docker_wrapper.sh /tracer/docker_wrapper.sh

#Saving software versions to a file
RUN echo "bowtie2 version: ${bowtie_version}" >> versions.txt && \
echo "samtools version: ${samtools_version}" >> versions.txt && \
echo "trinity version: ${trinity_version}" >> versions.txt && \
echo "igblast version: ${igblast_version}" >> versions.txt && \
echo "kallisto version: ${kallisto_version}" >> versions.txt && \
echo "salmon version: ${salmon_version}" >> versions.txt && \
echo "human transcript version: ${gencode_human_version}" >> versions.txt && \
echo "mouse transcript version: ${gencode_mouse_version}" >> versions.txt

ENTRYPOINT ["bash", "/tracer/docker_wrapper.sh"]
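The final `RUN` step of the new Dockerfile records the build arguments in `versions.txt`, one `<name> version: <value>` line per tool. A minimal sketch of reading that file back, assuming the line shape produced by the `echo` commands above; the function name `parse_versions` is hypothetical and not part of TraCeR:

```python
def parse_versions(text):
    """Parse '<name> version: <value>' lines into a dict keyed by name.

    Hypothetical helper: the line format is taken from the echo commands
    in the Dockerfile; nothing here is part of TraCeR itself.
    """
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the literal separator written by the Dockerfile.
        name, sep, value = line.partition(" version: ")
        if not sep:
            continue  # ignore lines that do not match the expected shape
        versions[name] = value.strip()
    return versions


if __name__ == "__main__":
    sample = (
        "bowtie2 version: 2.4.5\n"
        "samtools version: 1.15.1\n"
        "human transcript version: 41\n"
    )
    for tool, version in parse_versions(sample).items():
        print(tool, version)
```

Because the Dockerfile writes `versions.txt` with a relative path and sets no `WORKDIR`, the file probably ends up at the image root, but that location is an assumption worth checking in a built image.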
18 changes: 10 additions & 8 deletions README.md
@@ -15,9 +15,11 @@ TraCeR - reconstruction of T cell receptor sequences from single-cell RNA-seq da
## Introduction
This tool reconstructs the sequences of rearranged and expressed T cell receptor genes from single-cell RNA-seq data. It then uses the TCR sequences to identify cells that have the same receptor sequences and so derive from the same original clonally-expanded cell.

For more information on TraCeR, its validation and how it can be applied to investigate T cell populations during infection, see our [paper in Nature Methods](http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3800.html) or the [bioRxiv preprint](http://biorxiv.org/content/early/2015/08/28/025676) that preceded it.
For more information on TraCeR, its validation and how it can be applied to investigate T cell populations during infection, see our [paper in Nature Methods](http://dx.doi.org/10.1038/nmeth.3800) or the [bioRxiv preprint](http://biorxiv.org/content/early/2015/08/28/025676) that preceded it.

Please email questions / problems to ms31@sanger.ac.uk
Please email questions / problems to mjt.stubbington \[at\] gmail.com.

_Also, please note that I (Mike) created and developed TraCeR while I was at EMBL-EBI and subsequently at the Sanger Institute. Now that I have left Sanger, I will continue to maintain the code when possible **as a personal project** independent of any employment that I have._

## Installation
TraCeR is written in Python and so can just be downloaded, made executable (with `chmod u+x tracer`) and run or run with `python tracer`. Download the latest version and accompanying files from www.github.com/teichlab/tracer.
@@ -40,7 +42,7 @@ Note that TraCeR is compatible with both Python 2 and 3.
6. [Graphviz](http://www.graphviz.org) - Dot and Neato drawing programs required for visualisation of clonotype graphs. This is optional - see the [`--no_networks` option](#options-1) to [`summarise`](#summarise-summary-and-clonotype-networks).

##### Installing IgBlast
Downloading the executable files from `ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/<version_number>` is not sufficient for a working IgBlast installation. You must also download the `internal_data` directory (ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/internal_data) and put it into the same directory as the igblast executable. This is also described in the igblast README file.
Downloading the executable files from `ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/<version_number>` is not sufficient for a working IgBlast installation. You must also download the `internal_data` directory (ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/internal_data) and `optional_file` directory (ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/optional_file/) and put them into the same directory as the igblast executable. This is also described in the igblast README file.

You should also set the `$IGDATA` environment variable to point to the location of the IgBlast executable. For example, run `export IGDATA=/<path_to_igblast>/igblast/1.4.0/bin`.
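The IgBlast requirements described here can be checked before running TraCeR. The sketch below is only an illustration: `check_igdata` is a hypothetical helper, and it assumes that `internal_data` and `optional_file` sit directly inside the directory named by `$IGDATA`, which may not match every installation layout.

```python
import os


def check_igdata(environ=None):
    """Return a list of problems with the IgBlast data setup (empty if none).

    Hypothetical helper, not part of TraCeR. Assumes internal_data and
    optional_file live directly under the directory named by IGDATA.
    """
    env = os.environ if environ is None else environ
    igdata = env.get("IGDATA")
    if not igdata:
        return ["IGDATA is not set"]
    if not os.path.isdir(igdata):
        return ["IGDATA is not a directory: " + igdata]
    problems = []
    for sub in ("internal_data", "optional_file"):
        if not os.path.isdir(os.path.join(igdata, sub)):
            problems.append("missing directory: " + sub)
    return problems


if __name__ == "__main__":
    for problem in check_igdata():
        print(problem)
```

Returning a list rather than raising on the first failure lets a user see every missing piece in one pass.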

@@ -139,8 +141,7 @@ Type of sequence to be analysed. Since TraCeR currently only works with TCR sequ
Mmus = /path/to/kallisto/transcriptome_for_Mmus
Hsap = /path/to/kallisto/transcriptome_for_Hsap

Location of the transcriptome fasta file to which the specific TCR sequences will be appended from each cell. Can be downloaded from http://bio.math.berkeley.edu/kallisto/transcriptomes/ and many other places. This must be a plain-text fasta file so decompress it if necessary (files from the Kallisto link are gzipped).

Location of the transcriptome fasta file to which the specific TCR sequences will be appended from each cell. Instructions on how to make reference transcriptomes can be found at: http://www.nxn.se/valent/2016/10/3/the-first-steps-in-rna-seq-expression-analysis-single-cell-and-other. Reference transcriptomes should be uncompressed fasta files.
#### Base indices for Kallisto/Salmon
[salmon_base_indices]
Mmus = /path/to/salmon/index_for_Mmus
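The configuration entries shown here look like INI-style sections, so they can probably be read with Python's standard `configparser`. This is a sketch under that assumption; `read_salmon_indices` is a hypothetical name, and whether TraCeR itself parses the file this way is not confirmed by this diff.

```python
import configparser


def read_salmon_indices(config_text):
    """Return the species -> index path mapping from [salmon_base_indices].

    Hypothetical helper: assumes the TraCeR config is INI-style text.
    """
    # interpolation=None keeps '%' characters in paths from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    # Preserve key case so 'Mmus' and 'Hsap' are not lower-cased.
    parser.optionxform = str
    parser.read_string(config_text)
    if not parser.has_section("salmon_base_indices"):
        return {}
    return dict(parser.items("salmon_base_indices"))


if __name__ == "__main__":
    sample = "[salmon_base_indices]\nMmus = /path/to/salmon/index_for_Mmus\n"
    print(read_salmon_indices(sample))
```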
@@ -168,12 +169,12 @@ See salmon [documentation](http://salmon.readthedocs.io/en/latest/salmon.html) f
tracer_path = /path/to/tracer

Location of the cloned TraCeR repository containing tracerlib, test_data, resources etc. Eg. `/user/software/tracer`.
Needed for localisation of resources and test_data if running TraCeR with the tracer binary.
Needed for localisation of resources and test_data if running TraCeR with the tracer binary. Note: this is the path to the **directory** and not the executable.
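Since `tracer_path` must name the repository directory rather than the executable, a quick check can catch the common mistake. The sketch below uses a hypothetical helper, `check_tracer_path`, and assumes the directory should contain `tracerlib`, `test_data` and `resources`, as listed in the description above.

```python
import os

# Entries the README says the cloned repository contains.
EXPECTED_ENTRIES = ("tracerlib", "test_data", "resources")


def check_tracer_path(path):
    """Return a list of problems with a tracer_path value (empty if none).

    Hypothetical helper, not part of TraCeR.
    """
    if not os.path.isdir(path):
        # Covers both a missing path and a path pointing at the executable.
        return ["tracer_path is not a directory: " + path]
    return [
        "missing entry: " + name
        for name in EXPECTED_ENTRIES
        if not os.path.exists(os.path.join(path, name))
    ]


if __name__ == "__main__":
    for problem in check_tracer_path("/user/software/tracer"):
        print(problem)
```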

## Testing TraCeR
TraCeR comes with a small dataset in `test_data/` (containing only TCRA or TCRB reads for a single cell) that you can use to test your installation and config file and confirm that all the prerequisites are working. Run it as:

tracer test -p <ncores> -c <config_file> -o <output>
tracer test -p <ncores> -c <config_file>

**Note:** The data used in the test are derived from mouse T cells so make sure that the config file points to the appropriate mouse resource files.

@@ -227,6 +228,7 @@ Tracer has three modes: *assemble*, *summarise* and *build*.
* `--single_end` : use this option if your data are single-end reads. If this option is set you must specify fragment length and fragment sd as below.
* `--fragment_length` : Estimated average fragment length in the sequencing library. Used for Kallisto quantification. Required for single-end data. Can also be set for paired-end data if you don't want Kallisto to estimate it directly.
* `--fragment_sd` : Estimated standard deviation of average fragment length in the sequencing library. Used for Kallisto quantification. Required for single-end data. Can also be set for paired-end data if you don't want Kallisto to estimate it directly.
* `--max_junc_len` : Maximum permitted length of junction string in recombinant identifier. Used to filter out artefacts. May need to be longer for TCR Delta.
* `--invariant_sequences`: Custom invariant sequence file. Use the default example in 'resources/Mmus/invariant_seqs.csv'

#### Output
@@ -284,7 +286,7 @@ The following output files are generated:
1. `TCR_summary.txt`
Summary statistics describing successful TCR reconstruction rates and the numbers of cells with 0, 1, 2 or more recombinants for each locus.
2. `recombinants.txt`
List of TCR identifiers, lengths and productivities for each cell.
List of TCR identifiers, lengths, productivities, and CDR3 sequences (nucleotide and amino acid) for each cell. **Note:** It's possible for non-productive rearrangements to still have a detectable CDR3 if a frameshift introduces a stop-codon after this region. In these cases, the CDR3 nucleotides and amino acids are still reported but are shown in lower-case.
3. `reconstructed_lengths_TCR[A|B].pdf` and `reconstructed_lengths_TCR[A|B].txt`
Distribution plots (and text files with underlying data) showing the lengths of the VDJ regions from assembled TCR contigs. Longer contigs give higher-confidence segment assignments. Text files are only generated if at least one TCR is found for a locus. Plots are only generated if at least two TCRs are found for a locus.
4. `clonotype_sizes.pdf` and `clonotype_sizes.txt`
54 changes: 0 additions & 54 deletions docker_helper_files/docker_tracer.conf

This file was deleted.

6 changes: 0 additions & 6 deletions docker_helper_files/docker_wrapper.sh

This file was deleted.

28 changes: 0 additions & 28 deletions docker_helper_files/requirements_stable.txt

This file was deleted.

4 changes: 4 additions & 0 deletions docker_wrapper.sh
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

tracer "$@"
File renamed without changes.
44 changes: 27 additions & 17 deletions requirements.txt
@@ -1,18 +1,28 @@
biopython>=1.66
cycler>=0.10.0
decorator>=4.0.9
matplotlib>=1.5.1
biopython==1.77
cryptography==38.0.3
cycler==0.11.0
decorator==5.1.1
future==0.18.2
idna==3.4
keyring==23.11.0
keyrings.alt==4.2.0
matplotlib==3.6.2
mock==4.0.3
networkx==1.11
numpy>=1.11.0
pandas>=0.18.0
prettytable>=0.7.2
pydotplus>=2.0.2
pyparsing>=2.0.3
python-dateutil>=2.5.2
python-Levenshtein>=0.12.0
pytz>=2016.3
scipy>=0.17.0
seaborn>=0.7.0
six>=1.10.0
mock>=2.0.0
future>=0.15.2
numpy==1.23.4
pandas==1.5.1
pbr==5.11.0
prettytable==3.5.0
pyasn1==0.5.0
pycrypto==2.6.1
pydotplus==2.0.2
pygobject==3.42.2
pyparsing==2.4.6
python-dateutil==2.8.2
python-Levenshtein==0.20.8
pytz==2022.6
pyxdg==0.28
scipy==1.9.3
seaborn==0.12.0
SecretStorage==3.3.3
six==1.14.0
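With the requirements now pinned to exact versions, an environment can be compared against the pins. A minimal sketch, assuming only `name==version` lines matter; `parse_pins` and `find_mismatches` are hypothetical helpers, and the comparison is a plain string match rather than full version normalisation.

```python
from importlib import metadata


def parse_pins(text):
    """Collect 'name==version' pins from requirements-style text."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blanks and non-exact specifiers such as '>='
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Map each package whose installed version differs to (wanted, installed).

    A package that is not installed is reported with installed=None.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    sample = "biopython==1.77\nnumpy==1.23.4\n"
    for name, (wanted, installed) in find_mismatches(parse_pins(sample)).items():
        print(name, "wanted", wanted, "installed", installed)
```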