Skip to content

Building you own graphs from scratch preparing them for Graph Peak Caller

ivargr edited this page Apr 12, 2019 · 10 revisions

If you have your own set of variants (snps/indels), then vg has a very good guide for creating whole-genome graphs from a vcf file and a linear reference. In case the vg team changes or removes their guide, we have copied and slightly rewritten the version of the guide that we used to create graphs when testing Graph Peak Caller. Our version of the guide can be found here.

The following assumes you have created your graphs following that vg tutorial, and now have one vg graph for each chromosome. We will now convert these to Offset Based Graphs and produce the necessary files needed in order to run a whole-genome ChIP-seq experiment.

This tutorial assumes you are having chromosomes 1-5. If you have something else, simply change seq 1 5 to what you have (for human, you can use seq 1 22; echo X; echo Y).

Creating ob-graphs

We convert the vg graphs to json and create ob graphs:

for chr in $(seq 1 5);
do
    vg view -Vj $chr.vg > $chr.json
    graph_peak_caller create_ob_graph $chr.json
done

Finding start and end nodes

Graph Peak Caller also has a script for splitting reads into separate files for each chromosome. If you want to use that, you will need to find the start and end node of each graph (this is quick to run):

for chr in $(seq 1 5);
do
    vg stats -r $chr.vg  | awk '{print $2}' > node_range_$chr.txt
done

Creating an indexed path through each graph

This is optional and not needed for peak calling. However, it is required if you want to analyse graph peaks to linear peaks.

for chr in $(seq 1 5);
do
    graph_peak_caller find_linear_path -g $chr.nobg $chr.json $chr ${chr}_linear_pathv2.interval &
done

You now have all the required data you need in order to do a full graph-based ChIP-seq experiment.