Skip to content

Commit

Permalink
Included JPA
Browse files Browse the repository at this point in the history
  • Loading branch information
banskt committed Nov 22, 2018
1 parent 08238bf commit 1a93cf8
Show file tree
Hide file tree
Showing 10 changed files with 183 additions and 84 deletions.
22 changes: 16 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,9 @@
(Currently in development)

The following methods will be included:
* [x] MatrixEQTL
* [ ] GNetLMM
* [ ] CPMA (implemented as JPA within TEJAAS)
* [x] [MatrixEQTL](http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/)
* [ ] [GNetLMM](https://github.com/PMBio/GNetLMM) (see [Installation instructions](https://github.com/banskt/trans-eqtl-pipeline/wiki/Install-GNetLMM-in-GWDG-cluster))
* [x] [CPMA](https://github.com/cotsapaslab/CPMAtranseqtl) (The authors did not provide updated software, the pipeline uses JPA scores from TEJAAS)
* [x] TEJAAS

We also want to compare:
Expand All @@ -26,8 +26,18 @@ The pipeline expects the following input files:
* Genotype (in gzipped dosage format)
* Expression (tab-separated text file, gene name in column1, expression for `N` patients in the next `N` columns, header line starting with `gene_id` in first column and sample ids in the next `N` columns)
* Sample (a dummy [sample file in Oxford format](http://www.stats.ox.ac.uk/~marchini/software/gwas/file_format.html))
* GENCODE file
* gene position file (for MatrixEQTL)
* MAF file from 1000Genomes

## How to run
1. Update the file paths in `main/PATH`.
2. Create a `CONFIG` file (see example in `configs/CONFIG`).
3. Run the different scripts from within `main` directory.
1. Within `bsubfiles` folder, change the job submission criteria and module loadings as per your requirements (GWDG users, skip this step)
2. Modify `main/utils/submit_job` to your own job scheduling mechanism (`bsub` users, skip this step)
3. Update the path of external programs `main/EXTERNAL`
4. Update the path of your datasets in `main/DATA`.
5. Create a `CONFIG` file (see example in `configs/CONFIG`).
6. Run the pipeline from within `main` directory.
```
cd main
./01_validation_pipeline.sh configs/CONFIG
```
16 changes: 10 additions & 6 deletions main/01_validation_pipeline.sh
Original file line number Diff line number Diff line change
Expand Up @@ -10,13 +10,15 @@ fi

source ${CONFIGFILE}
source PATHS

RANDSTRING=`cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 4 | head -n 1`

# get the different functions
source ${UTILSDIR}/submit_job
source ${UTILSDIR}/add_deps
source ${UTILSDIR}/unset_vars
source EXTERNAL

RANDSTRING=`cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 4 | head -n 1`
RUNJPA=false # used for submitting jpa-only jobs
SHUFFLE=false # used for controlling shuffling
JOBDEPS="None" # used for controlling job dependencies

for MDATA in ${DATASETS}; do

Expand All @@ -27,14 +29,16 @@ for MDATA in ${DATASETS}; do
if [ ! -z "$EXPRESSIONFILE" ]; then

echo "Submitting jobs for $MDATA"
JOBDEPS=""

if [ "${bMatrixEqtl}" = "true" ]; then source ${UTILSDIR}/matrix_eqtl; fi
if [ "${bMEqtlRandom}" = "true" ]; then source ${UTILSDIR}/matrix_eqtl; fi
if [ "${bMEqtlRandom}" = "true" ]; then SHUFFLE=true; source ${UTILSDIR}/matrix_eqtl; fi
if [ "${bTejaas}" = "true" ]; then source ${UTILSDIR}/tejaas; fi
if [ "${bTejaasJPA}" = "true" ]; then RUNJPA=true; source ${UTILSDIR}/tejaas; fi

fi

# echo ${JOBDEPS}

unset_vars DATA

done
Expand Down
5 changes: 5 additions & 0 deletions main/EXTERNAL
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
#!/bin/bash

LDSTORE="${HOME}/packages/ldstore/ldstore_v1.1_x86_64/ldstore"
TEJAAS="${HOME}/trans-eQTL/codebase/tejaas/bin/tejaas"
GNETLMMDIR="${HOME}/packages/GNetLMM"
15 changes: 7 additions & 8 deletions main/PATHS
Original file line number Diff line number Diff line change
@@ -1,18 +1,17 @@
#!/bin/bash

CURDIR=`pwd`

# external programs required for the pipeline
LDSTORE="${HOME}/packages/ldstore/ldstore_v1.1_x86_64/ldstore"
TEJAAS="${HOME}/trans-eQTL/codebase/tejaas/bin/tejaas"

# Pipeline directories
UTILSDIR="${CURDIR}/utils"
CONFIGDIR="${CURDIR}/configs"
SCRIPTDIR="${CURDIR}/../scripts"
MASTER_BSUBDIR="${CURDIR}/../bsubfiles"
JOBSUBDIR="${CURDIR}/../jobsubs"
ANALYSISDIR="${CURDIR}/../analysis"
CURDIRUP="$(dirname "$(pwd)")"
UTILSDIR="${CURDIRUP}/main/utils"
CONFIGDIR="${CURDIRUP}/main/configs"
SCRIPTDIR="${CURDIRUP}/scripts"
MASTER_BSUBDIR="${CURDIRUP}/bsubfiles"
JOBSUBDIR="${CURDIRUP}/jobsubs"
ANALYSISDIR="${CURDIRUP}/analysis"

# Script files
MATRIXEQTL="${SCRIPTDIR}/matrixeqtl.R"
6 changes: 3 additions & 3 deletions main/configs/CONFIG
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
#CHRNUMS="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22"
#CHRNUMS="1 2 3"
CHRNUMS="11 12"
DATASETS="cardio-mono cardio-macro"
DATASETS="cardio-mono"

# MatrixEQTL options
bMatrixEqtl=false # run the method if true
Expand All @@ -13,10 +13,10 @@ MEQTL_PVALTHRES_TRANS=0.00001
MATRIXEQTL_MODEL="modelLINEAR"

# TEJAAS options
bTejaas=true # run the method if true
bTejaas=false # run the method if true
TEJAAS_MODEL="jpa-rr" # only one model allowed
TEJAAS_NULL="perm maf" # list of null models
TEJAAS_SIGMA_BETA="0.01 0.001" # list of sigma beta
TEJAAS_SIGMA_BETA="0.01" # list of sigma beta
MAX_NSNP_PERJOB=20000 # number of SNPs for analysis by TEJAAS per job
TEJAAS_SNPS_THRES=0.001 # all trans-eQTLs are printed, but target genes are printed for those trans-eQTLs below this threshold
TEJAAS_GENE_THRES=0.001 # even for the above, not all target genes are printed; only those with p-values below this threshold
Expand Down
2 changes: 1 addition & 1 deletion main/utils/add_deps
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ function add_deps() {
local _JobDeps=$1
local _AddJob=$2
local _Result=""
if [ ! -z "$_JobDeps" ]; then
if [ ! "$_JobDeps" == "None" ]; then
_Result="${_JobDeps} && done(${_AddJob})"
else
_Result="done(${_AddJob})"
Expand Down
3 changes: 2 additions & 1 deletion main/utils/matrix_eqtl
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,10 @@
THISMETHOD="matrixeqtl"
THISJOBDEPS=""
EXTRAFLAGS=""
if [ "${bMEqtlRandom}" = "true" ]; then
if [ "${SHUFFLE}" = "true" ]; then
THISMETHOD="matrixeqtl_rand"
EXTRAFLAGS="--randomize"
SHUFFLE=false
fi

JOBSUBDIR_MATRIX_EQTL="${JOBSUBDIR_DATA}/${THISMETHOD}"
Expand Down
22 changes: 11 additions & 11 deletions main/utils/submit_job
Original file line number Diff line number Diff line change
Expand Up @@ -31,14 +31,14 @@ function submit_job_later() {
cd ${CWD}
}

# ok=0
# # workaround, sometimes the file was submitted to the cluster but not written to disk yet
# while [ $ok -lt 1 ]; do
# if [ -s ${JOBNAME}.bsub ]; then
# bsub < ${JOBNAME}.bsub
# ok=1
# else
# echo "File is empty, retrying.."
# sleep 1;
# fi
# done
# ok=0
# # workaround, sometimes the file was submitted to the cluster but not written to disk yet
# while [ $ok -lt 1 ]; do
# if [ -s ${JOBNAME}.bsub ]; then
# bsub < ${JOBNAME}.bsub
# ok=1
# else
# echo "File is empty, retrying.."
# sleep 1;
# fi
# done
81 changes: 81 additions & 0 deletions main/utils/submit_tejaas
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
#!/bin/bash

function submit_tejaas() {
local tejaas=$1
local chrm=$2
local genotype=$3
local expression=$4
local sample=$5
local geneinfo=$6
local maffile=$7
local method=$8
local null=$9
local snpthres=${10}
local genethres=${11}
local sbeta=${12}
local extraflags=${13}
local jobsubdir=${14}
local outdir=${15}
local jobprefix=${16}
local nmax=${17}
local thisjobdep=${18}
local bsubdir=${19}
local __jobdeps=${20}
local jobdeps=${21}
local jobname=0
local i=0
local index=0
local includesnps=0
local startsnp=0
local endsnp=0
local outprefix=0

## echo $tejaas $chrm $genotype ${expression} ${sample} ${geneinfo} $maffile
## echo "Method: " ${method} ${null} ${snpthres} ${genethres} ${sbeta}
## echo "Extraflags: " ${extraflags}
## echo $jobsubdir $outdir $jobprefix
## echo $nmax $thisjobdep $bsubdir $__jobdeps $jobdeps

ntot=`zcat ${genotype} | wc -l`
njobs=$(echo $(( ntot/nmax )))

if [ -d ${jobsubdir} ]; then rm -rf ${jobsubdir}; fi; mkdir -p ${jobsubdir}
if [ ! -d ${outdir} ]; then mkdir -p ${outdir}; fi

for (( i=0; i <= ${njobs}; i++ )); do
index=`echo ${i} | awk '{printf "%03d", $1}'`
jobname="${jobprefix}_${index}"

startsnp=$(( nmax * i + 1 ))
endsnp=$(( nmax * (i + 1) ))
if [ $endsnp -gt $ntot ]; then
endsnp=${ntot}
fi
includesnps="${startsnp}:${endsnp}"

outprefix="${outdir}/chunk${index}"

# create the job submission file
sed "s|_i_NAME|${jobname}|g;
s|_TJS_BINR|${tejaas}|g;
s|_GT_FILE_|${genotype}|g;
s|_SAM_FILE|${sample}|g;
s|_EXPR_FL_|${expression}|g;
s|_GEN_POSF|${geneinfo}|g;
s|_TJ_METHD|${method}|g;
s|_NULL_MDL|${null}|g;
s|_OUT_PRFX|${outprefix}|g;
s|_STRT_END|${includesnps}|g;
s|_SNP_CUT_|${snpthres}|g;
s|_GEN_CUT_|${genethres}|g;
s|_SIG_BETA|${sbeta}|g;
s|_CHRM_NUM|${chrm}|g;
s|_MAF_FILE|${maffile}|g;
s|_EXT_FLAG|\"${extraflags}\"|g;
" ${bsubdir}/tejaas.bsub > ${jobsubdir}/${jobname}.bsub

submit_job ${jobsubdir} ${jobname} ${thisjobdep}
jobdeps=`add_deps "${jobdeps}" ${jobname}`
done
eval $__jobdeps="'$jobdeps'"
}
95 changes: 47 additions & 48 deletions main/utils/tejaas
Original file line number Diff line number Diff line change
@@ -1,66 +1,65 @@
#!/bin/bash

THISMETHOD="tejaas"
THISJOBDEPS=""
THISJOBDEPS="None"

source ${UTILSDIR}/submit_tejaas

for CHRM in ${CHRNUMS}; do
GENOTYPEFILE=${GENO_FMT/\[CHRM\]/${CHRM}}
TOTALSNPS=`zcat ${GENOTYPEFILE} | wc -l`
NJOBS=$(echo $(( TOTALSNPS/MAX_NSNP_PERJOB )))
MAF_1KG_FILE=${MAF_1KG_FMT/\[CHRM\]/${CHRM}}

for NULL in ${TEJAAS_NULL}; do
for SBETA in ${TEJAAS_SIGMA_BETA}; do
if [ "${RUNJPA}" = "true" ]; then
METHOD_VARIANT="jpa"
SPECIFIC_JOBSUBDIR="${JOBSUBDIR_DATA}/${THISMETHOD}/${METHOD_VARIANT}/chr${CHRM}"
SPECIFIC_OUTDIR="${OUTDIR_DATA}/${THISMETHOD}/${METHOD_VARIANT}/chr${CHRM}"

METHOD_VARIANT="${NULL}null_sb${SBETA}"
SPECIFIC_JOBSUBDIR="${JOBSUBDIR_DATA}/${THISMETHOD}/${METHOD_VARIANT}/chr${CHRM}"
SPECIFIC_OUTDIR="${OUTDIR_DATA}/${THISMETHOD}/${METHOD_VARIANT}/chr${CHRM}"
EXTRAFLAGS="--dosage"
if [ "${DATATYPE}" = "cardiogenics" ]; then
EXTRAFLAGS="$EXTRAFLAGS --trim"
fi

if [ -d ${SPECIFIC_JOBSUBDIR} ]; then rm -rf ${SPECIFIC_JOBSUBDIR}; fi; mkdir -p ${SPECIFIC_JOBSUBDIR}
if [ ! -d ${SPECIFIC_OUTDIR} ]; then mkdir -p ${SPECIFIC_OUTDIR}; fi

JOBPREFIX="${THISMETHOD}_${METHOD_VARIANT}_${MDATA}_chr${CHRM}_${RANDSTRING}"

for (( JOB=0; JOB <= ${NJOBS}; JOB++ )); do
INDEX=`echo ${JOB} | awk '{printf "%03d", $1}'`
JOBNAME="${JOBPREFIX}_${INDEX}"

STARTSNP=$(( MAX_NSNP_PERJOB * JOB + 1 ))
ENDSNP=$(( MAX_NSNP_PERJOB * (JOB+1) ))
if [ $ENDSNP -gt $TOTALSNPS ]; then
ENDSNP=${TOTALSNPS}
fi
INCSNP="${STARTSNP}:${ENDSNP}"

OUTPREFIX="${SPECIFIC_OUTDIR}/chunk${INDEX}"
JOBPREFIX="${THISMETHOD}_${METHOD_VARIANT}_${MDATA}_chr${CHRM}_${RANDSTRING}"

submit_tejaas ${TEJAAS} ${CHRM} ${GENOTYPEFILE} ${EXPRESSIONFILE} ${SAMPLEFILE} \
${GENEINFOFILE} ${MAF_1KG_FILE} \
jpa perm \
${TEJAAS_SNPS_THRES} ${TEJAAS_GENE_THRES} 0.01 \
"${EXTRAFLAGS}" \
${SPECIFIC_JOBSUBDIR} ${SPECIFIC_OUTDIR} ${JOBPREFIX} \
${MAX_NSNP_PERJOB} \
${THISJOBDEPS} ${MASTER_BSUBDIR} \
JOBDEPS "${JOBDEPS}"

else
for NULL in ${TEJAAS_NULL}; do
for SBETA in ${TEJAAS_SIGMA_BETA}; do

METHOD_VARIANT="${NULL}null_sb${SBETA}"
SPECIFIC_JOBSUBDIR="${JOBSUBDIR_DATA}/${THISMETHOD}/${METHOD_VARIANT}/chr${CHRM}"
SPECIFIC_OUTDIR="${OUTDIR_DATA}/${THISMETHOD}/${METHOD_VARIANT}/chr${CHRM}"

EXTRAFLAGS="--dosage"
if [ "${DATATYPE}" = "cardiogenics" ]; then
EXTRAFLAGS="$EXTRAFLAGS --trim"
fi

# create the job submission file
sed "s|_JOB_NAME|${JOBNAME}|g;
s|_TJS_BINR|${TEJAAS}|g;
s|_GT_FILE_|${GENOTYPEFILE}|g;
s|_SAM_FILE|${SAMPLEFILE}|g;
s|_EXPR_FL_|${EXPRESSIONFILE}|g;
s|_GEN_POSF|${GENEINFOFILE}|g;
s|_TJ_METHD|${TEJAAS_MODEL}|g;
s|_NULL_MDL|${NULL}|g;
s|_OUT_PRFX|${OUTPREFIX}|g;
s|_STRT_END|${INCSNP}|g;
s|_SNP_CUT_|${TEJAAS_SNPS_THRES}|g;
s|_GEN_CUT_|${TEJAAS_GENE_THRES}|g;
s|_SIG_BETA|${SBETA}|g;
s|_CHRM_NUM|${CHRM}|g;
s|_MAF_FILE|${MAF_1KG_FILE}|g;
s|_EXT_FLAG|\"${EXTRAFLAGS}\"|g;
" ${MASTER_BSUBDIR}/tejaas.bsub > ${SPECIFIC_JOBSUBDIR}/${JOBNAME}.bsub

submit_job ${SPECIFIC_JOBSUBDIR} ${JOBNAME} ${THISJOBDEPS}
JOBDEPS=`add_deps "${JOBDEPS}" ${JOBNAME}`

JOBPREFIX="${THISMETHOD}_${METHOD_VARIANT}_${MDATA}_chr${CHRM}_${RANDSTRING}"

submit_tejaas ${TEJAAS} ${CHRM} ${GENOTYPEFILE} ${EXPRESSIONFILE} ${SAMPLEFILE} \
${GENEINFOFILE} ${MAF_1KG_FILE} \
${TEJAAS_MODEL} ${NULL} \
${TEJAAS_SNPS_THRES} ${TEJAAS_GENE_THRES} ${SBETA}\
"${EXTRAFLAGS}" \
${SPECIFIC_JOBSUBDIR} ${SPECIFIC_OUTDIR} ${JOBPREFIX} \
${MAX_NSNP_PERJOB} \
${THISJOBDEPS} ${MASTER_BSUBDIR} \
JOBDEPS "${JOBDEPS}"

done
done
done

fi
done

RUNJPA=false

0 comments on commit 1a93cf8

Please sign in to comment.