-
Notifications
You must be signed in to change notification settings - Fork 9
Building you own graphs from scratch preparing them for Graph Peak Caller
If you have your own set of variants (snps/indels), then vg has a very good guide for creating whole-genome graphs from a vcf file and a linear reference. In case the vg team changes or removes their guide, we have copied and slightly rewritten the version of the guide that we used to create graphs when testing Graph Peak Caller. Our version of the guide can be found here.
The following assumes you have created your graphs following that vg tutorial, and now have one vg graph for each chromosome. We will now convert these to Offset Based Graphs and produce the necessary files needed in order to run a whole-genome ChIP-seq experiment.
This tutorial assumes you are having chromosomes 1-5. If you have something else, simply change seq 1 5
to what you have (for human, you can use seq 1 22; echo X; echo Y
).
We convert the vg graphs to json and create ob graphs:
for chr in $(seq 1 5);
do
vg view -Vj $chr.vg > $chr.json
graph_peak_caller create_ob_graph $chr.json
done
Graph Peak Caller also has a script for splitting reads into separate files for each chromosome. If you want to use that, you will need to find the start and end node of each graph (this is quick to run):
for chr in $(seq 1 5);
do
vg stats -r $chr.vg | awk '{print $2}' > node_range_$chr.txt
done
This is optional and not needed for peak calling. However, it is required if you want to analyse graph peaks to linear peaks.
for chr in $(seq 1 5);
do
graph_peak_caller find_linear_path -g $chr.nobg $chr.json $chr ${chr}_linear_pathv2.interval &
done
You now have all the required data you need in order to do a full graph-based ChIP-seq experiment.