Dear all,

Shannan suggested I start providing URLs for the materials I'll re-use in the edX course. Here goes:

- I've used the general framework of our Galaxy course on Exome-Seq, ripping out all parts that are not relevant.
- As we will probably have to provide more in-depth information on NGS itself, I was planning to recycle materials from UC Davis' Intro to NGS presentation (PDF) and its more recent 2014 version.
- For QC I'll stick to our approach, but use the command line instead; I might pull in some additional examples from the Davis slides (PDF).
- We already have read alignment (PDF) covered nicely, but can expand if needed.
- The core component is going to use FreeBayes (PDF) instead of GATK. Luckily, Eric has created an awesome 'Getting started' tutorial that I will be retracing almost completely. There is more information on the FreeBayes GitHub page and in a 2013 presentation (PDF).
- Despite that, it might make sense to go through the GATK Best Practices. I might also pull in more details from their in-depth description if there's sufficient time, but a few excerpts from UC Davis' Notes on GATK presentation (PDF) should be sufficient.
- For VCF interpretation, GATK's summary is a good starting point; I will stick to regular VCFs rather than tackling gVCFs.
- Aaron has a Gemini tutorial that we can retrace. The major drawback is that this adds 1.5 GB to the download.
- For the 'next steps' I wanted to walk them through a bcbio tutorial. The problem here is that this is unlikely to work on participants' laptops or VMs. Ideas welcome.
- Ideally I'd like to use the bcbio somatic variant calling pipeline as an example, but given the IT constraints I am thinking we might cover somatic variant calling entirely as slides/presentation.
- We can use this as a reminder about reproducibility, scalability, etc., using slides from Brad and Rory (see some recent talks).
- AWS could also be part of the 'next steps'; again, this is now supported by bcbio. We can point people to the UC Davis AWS signup tutorial (again) or use our own.

The Linux/Unix parts are completely independent right now. I was going to re-use materials from the Unix Intro and the Unix primer for biologists, but if you have something ready to go from the other course, I'd love to delegate this. Participants will need to learn basic redirects and pipes, but that's as far as it goes.

At this point I am tempted to drop structural variant calling, as it almost invariably requires whole-genome data, which I do not want to handle in this course. We can explore CNVs, but I am not sure we will find anything in the shallow sequencing data we are using. If we absolutely want to include it, the section will be based on CNVkit.