From 8acdc6e199ac4b062c11e12a407d2ac98e102ddd Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Tue, 12 Oct 2021 11:07:57 +0100 Subject: [PATCH 01/29] Updated list of citations List of citations now obtained from: https://europepmc.org/search?query=CITES%3A30254741_MED --- fastq_screen_documentation.md | 20 +------------------- 1 file changed, 1 insertion(+), 19 deletions(-) diff --git a/fastq_screen_documentation.md b/fastq_screen_documentation.md index 02dba50..2278b17 100644 --- a/fastq_screen_documentation.md +++ b/fastq_screen_documentation.md @@ -294,25 +294,7 @@ FastQ Screen is distributed under a "GNU General Public License", a copy of whic Papers citing FastQ Screen ========================== -Picornell AC, Echavarria I, Alvarez E, et al.: Breast cancer PAM50 signature: correlation and concordance between RNA-Seq and digital multiplexed gene expression technologies in a triple negative breast cancer series. BMC Genomics. 2019; DOI: 10.1186/s12864-019-5849-0 - -Laufer BI, Hwang H, Vogel Ciernia A et al., Whole genome bisulfite sequencing of Down syndrome brain reveals regional DNA hypermethylation and novel disorder insights. Epigenetics. 2019; 14(7), 672-684; DOI:10.1080/15592294.2019.1609867 - -Chana-Muñoz A, Jendroszek Agnieszka Sønnichsen M, et al.: Origin and diversification of the plasminogen activation system among chordates. BMC evolutionary biology 2019; 19(1); DOI:10.1186/s12862-019-1353-z - -Dawidowska M, Jaksik Roman, Szarzyńska-Zawadzka B, el al.: Comprehensive Investigation of miRNome Identifies Novel Candidate miRNA-mRNA Interactions Implicated in T-Cell Acute Lymphoblastic Leukemia. Neoplasia. 2019; 21(3), 294—310; DOI:10.1016/j.neo.2019.01.004 - -Woodham EF, Paul NR, Tyrrell B, et al.: Coordination by Cdc42 of Actin, Contractility, and Adhesion for Melanoblast Movement in Mouse Skin. Curr Biol. 
2017; 27(5): 624–637 - -Perrin S, Firmo C, Lemoine S, et al.: Aozan: an automated post-sequencing data-processing pipeline. Bioinformatics. 2017; 33(14): 2212–2213. - -O'Sullivan NJ, Teasdale MD, Mattiangeli V, et al.: A whole mitochondria analysis of the Tyrolean Iceman's leather provides insights into the animal sources of Copper Age clothing. Sci Rep. 2016; 6: 31279. - -Ewels P, Magnusson M, Lundin S, et al.: MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016; 32(19): 3047–3048. - -Fiddyment S, Holsinger B, Ruzzier C, et al.: Animal origin of 13th-century uterine vellum revealed using noninvasive peptide fingerprinting. Proc Natl Acad Sci U S A. 2015; 112(49): 15066–15071. - -Rose G, Wooldridge DJ, Anscombe C, et al.: Challenges of the Unknown: Clinical Application of Microbial Metagenomics. Int J Genomics. 2015; 2015: 292950. +`https://europepmc.org/search?query=CITES%3A30254741_MED `_ How to cite FastQ Screen From 9c617280b742f10b7c0294763d7becc101fd5366 Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Tue, 12 Oct 2021 14:32:03 +0100 Subject: [PATCH 02/29] Added _config.yml For GitHub Pages themes --- _config.yml | 1 + 1 file changed, 1 insertion(+) create mode 100644 _config.yml diff --git a/_config.yml b/_config.yml new file mode 100644 index 0000000..6d92ae8 --- /dev/null +++ b/_config.yml @@ -0,0 +1 @@ +remote_theme: carlosperate/jekyll-theme-rtd From 1d7d62d48042ef046531b1ae807233b8c74a704c Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Tue, 12 Oct 2021 14:41:26 +0100 Subject: [PATCH 03/29] Created docs folder The project documentation is now kept in a docs folder. This is intended for subsequent use with GitHub pages. 
--- .../fastq_screen_documentation.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename fastq_screen_documentation.md => docs/fastq_screen_documentation.md (100%) diff --git a/fastq_screen_documentation.md b/docs/fastq_screen_documentation.md similarity index 100% rename from fastq_screen_documentation.md rename to docs/fastq_screen_documentation.md From c40731f49811a88e473638984ec2c55593d4005d Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Tue, 12 Oct 2021 14:48:13 +0100 Subject: [PATCH 04/29] Renamed documentation file Renamed docs/fastq_screen_documentation.md to docs/README.md so it can be recognised by GitHub pages. --- docs/{fastq_screen_documentation.md => README.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/{fastq_screen_documentation.md => README.md} (100%) diff --git a/docs/fastq_screen_documentation.md b/docs/README.md similarity index 100% rename from docs/fastq_screen_documentation.md rename to docs/README.md From cb4bd45337adee649c6724ac2312ee76ecde66cd Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Tue, 12 Oct 2021 15:11:14 +0100 Subject: [PATCH 05/29] Set theme jekyll-theme-slate --- docs/_config.yml | 1 + 1 file changed, 1 insertion(+) create mode 100644 docs/_config.yml diff --git a/docs/_config.yml b/docs/_config.yml new file mode 100644 index 0000000..c741881 --- /dev/null +++ b/docs/_config.yml @@ -0,0 +1 @@ +theme: jekyll-theme-slate \ No newline at end of file From 30e7755e1c8a88aa4a8465f646f1262709585fb4 Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Tue, 12 Oct 2021 15:14:32 +0100 Subject: [PATCH 06/29] Using a Read The Docs Theme for GitHub Pages --- docs/_config.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/_config.yml b/docs/_config.yml index c741881..6d92ae8 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -1 +1 
@@ -theme: jekyll-theme-slate \ No newline at end of file +remote_theme: carlosperate/jekyll-theme-rtd From 6d14cd5f6338a2cd6debac7681dc4ee0d5872ba2 Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Tue, 12 Oct 2021 15:48:31 +0100 Subject: [PATCH 07/29] Rename README.md to fastq_screen_documentation.md Test whether renaming to previous name works with GitHub pages. --- docs/{README.md => fastq_screen_documentation.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/{README.md => fastq_screen_documentation.md} (100%) diff --git a/docs/README.md b/docs/fastq_screen_documentation.md similarity index 100% rename from docs/README.md rename to docs/fastq_screen_documentation.md From 2bda0244844d7a7b085320f64f96051cb70ec384 Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Tue, 12 Oct 2021 15:53:41 +0100 Subject: [PATCH 08/29] Rename docs/fastq_screen_documentation.md to README.md File needs to be named docs/fastq_screen_documentation.md for GitHub pages to work. --- docs/fastq_screen_documentation.md => README.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/fastq_screen_documentation.md => README.md (100%) diff --git a/docs/fastq_screen_documentation.md b/README.md similarity index 100% rename from docs/fastq_screen_documentation.md rename to README.md From c64594122c50d99a2036c08c92028feaba91fdba Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Tue, 12 Oct 2021 15:56:13 +0100 Subject: [PATCH 09/29] Rename README.md to docs/README.md Move to correct folder. 
--- README.md => docs/README.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename README.md => docs/README.md (100%) diff --git a/README.md b/docs/README.md similarity index 100% rename from README.md rename to docs/README.md From 16147594a56b2490eb79055f456d533ad9873edb Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Tue, 11 Jan 2022 14:40:47 +0000 Subject: [PATCH 10/29] Update _config.yml --- _config.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_config.yml b/_config.yml index 6d92ae8..78392bc 100644 --- a/_config.yml +++ b/_config.yml @@ -1 +1,3 @@ remote_theme: carlosperate/jekyll-theme-rtd + +edit_on_github: false From 7b41ee82a248ff19f1bc49debcccdd5363166cdf Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Wed, 12 Jan 2022 17:28:25 +0000 Subject: [PATCH 11/29] Update _config.yml --- docs/_config.yml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/_config.yml b/docs/_config.yml index 6d92ae8..a4e95ee 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -1 +1,4 @@ remote_theme: carlosperate/jekyll-theme-rtd + +github: + is_project_page: false From 200ceebfe703808464478ea5bb4691cb8ffc410a Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Thu, 13 Jan 2022 15:37:38 +0000 Subject: [PATCH 12/29] Update README.md --- docs/README.md | 186 ++++++++++++++++++++++++++++--------------------- 1 file changed, 105 insertions(+), 81 deletions(-) diff --git a/docs/README.md b/docs/README.md index 2278b17..c99a70f 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,6 +1,9 @@ +drawing + + Introduction ============ -FastQ Screen is a simple application which allows you to search a large sequence dataset against a panel of different genomes to determine from where the sequences in your data originate. 
It was built as a QC check for sequencing pipelines but may also be useful in characterising metagenomic samples. When running a sequencing pipeline it is useful to know that your sequencing runs contain the types of sequence they're supposed to. Your search libraries might contain the genomes of all of the organisms you work on, along with PhiX, Vectors or other contaminants commonly seen in sequencing experiments. +FastQ Screen is a simple application which allows you to search a large sequence dataset against a panel of different genomes to determine from where the sequences in your data originate. It was built as a QC check for sequencing pipelines but may also be useful to characterise metagenomic samples. When running a sequencing pipeline it is useful to know that your sequencing runs contain the types of sequence they're supposed to. Your search libraries might contain the genomes of all of the organisms you work on, along with PhiX, Vectors or other contaminants commonly seen in sequencing experiments. Although the program wasn't built with any particular technology in mind it is probably only really suitable for processing short reads due to the use of either Bowtie, Bowtie2 or BWA as the searching application. @@ -8,35 +11,43 @@ The program generates both text and graphical output to inform you what proporti ***(Please note, in version 0.9.4 the graphs colour scheme changed from that shown below to a similar, but colour-blind safe colour scheme.)*** - .. image:: http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/good_sequence_screen.png +![Good Sequencing Results](http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/good_sequence_screen.png) In contrast, poor sequencing results will include results from one or more unexpected species. Identifying such reads may help the user discover the source of the contamination. - .. 
image:: http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/bad_sequence_screen.png +![Poor Sequencing Results](http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/bad_sequence_screen.png) FastQ Screen online tutorials ============================= To assist your understanding of FastQ Screen and how it should be used, we have prepared a series of short training videos. -`Training video 1: Introduction to FastQ Screen `_ +**Training video 1: Introduction to FastQ Screen** + +[![Training video 1: Introduction to FastQ Screen](https://img.youtube.com/vi/8IsGdikLhaE/0.jpg)](https://www.youtube.com/watch?v=8IsGdikLhaE) + +**Training video 2: Downloading, configuring and running FastQ Screen** + +[![Training video 2: Downloading, configuring and running FastQ Screen](https://img.youtube.com/vi/WqiKPRxHzNU/0.jpg)](https://www.youtube.com/watch?v=WqiKPRxHzNU) -`Training video 2: Downloading, configuring and running FastQ Screen `_ +**Training video 3: Interpreting FastQ Screen results** -`Training video 3: Interpreting FastQ Screen results `_ +[![Training video 3: Interpreting FastQ Screen results](https://img.youtube.com/vi/x32k84HHqjQ/0.jpg)](https://www.youtube.com/watch?v=x32k84HHqjQ) -`Training video 4: Filtering FASTQ Files `_ +**Training video 4: Filtering FASTQ files** + +[![Training video 4: Filtering FASTQ Files](https://img.youtube.com/vi/eJcAv-Dt57I/0.jpg)](https://www.youtube.com/watch?v=eJcAv-Dt57I) **We recommend watching these before using FastQ Screen for the first time.** In total the videos take no longer than 20 minutes to watch, and should cover everything you need to get started with the software. Project Homepage ================ -The FastQ Screen Homepage can be found `here `_ +The FastQ Screen Homepage can be found [here.](http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen) Download ======== -FastQ Screen may be obtained from the `Babraham Bioinformatics download page.
`_ +FastQ Screen may be obtained from the [GitHub download page.](https://github.com/StevenWingett/FastQ-Screen/releases/latest) Requirements summary @@ -54,7 +65,7 @@ Installation ============ Before running FastQ Screen there are a few prerequisites that will need to be installed: -1. A sequence aligner. FastQ Screen is compatible with `Bowtie `_, `Bowtie2 `_ or `BWA `_. It's easier if you put the chosen aligner in your path, but if not you can configure its location in the config file. +1. A sequence aligner. FastQ Screen is compatible with [Bowtie](http://bowtie-bio.sourceforge.net), [Bowtie2](http://bowtie-bio.sourceforge.net) or [BWA](http://bio-bwa.sourceforge.net). It's easier if you put the chosen aligner in your path, but if not you can configure its location in the config file. 2. We recommend running FastQ Screen in a Linux system, on which the programming language Perl should already be installed. @@ -63,50 +74,50 @@ Before running FastQ Screen there are a few prerequisites that will need to be i You can use the built in CPAN shell to install this module: -``perl -MCPAN -e "install GD"`` + perl -MCPAN -e "install GD" Because GD graph uses GD this will be brought in as a dependency. GD may be easier to install using a package manager on many linux distributions. On Fedora for example you can install GD using: -``yum install perl-GD`` + yum install perl-GD ..before doing the CPAN install of GD::Graph Actually installing Fastq Screen is very simple. Download the tar.gz distribution file and then do: -``tar -xzf fastq_screen_v0.x.x.tar.gz`` + tar -xzf fastq_screen_v0.x.x.tar.gz -You will see a folder called fastq\_screen\_v0.x.x has been created and the program is inside that. You can add the program to your path either by linking the program into: -``usr/local/bin`` or by adding the program installation directory to your search path. +You will see a folder called fastq_screen_v0.x.x has been created and the program is inside that. 
You can add the program to your path either by linking the program into: +/usr/local/bin or by adding the program installation directory to your search path. Configuration ============= In order to use FastQ Screen you will need to configure some genome databases for the program to search. This will involve downloading the sequences for the databases in FASTA format and then using either Bowtie, Bowtie2 or BWA to build the relevant index files. Please note: the aligner used to build the index files must be used to map the reads -Once you have built your index you can configure the FastQ Screen program. You do this by editing the fastq\_screen.conf.example file which is distributed with the program. This shows an example set of database configurations which you will need to change to reflect the actual databases you have set up. FastQ Screen can process up to a maximum of 32 reference genomes. Rename the file to fastq\_screen.conf after you have finished editing. +Once you have built your index you can configure the FastQ Screen program. You do this by editing the fastq_screen.conf.example file which is distributed with the program. This shows an example set of database configurations which you will need to change to reflect the actual databases you have set up. FastQ Screen can process up to a maximum of 32 reference genomes. Rename the file to fastq_screen.conf after you have finished editing. The other options you can set in the config file are the location of the aligner binary (if it's not in your path),and the number of threads you want to allocate to the aligner when performing your screen. The number of threads will be the number of CPU cores the code will run on so you shouldn't set this value higher than the number of physical cores you have in your machine. The more threads you can allow the faster the searching part of the screen will run. An example command is shown below. 
This would process two FASTQ files and would create the screen output in the same directory as the original files. -``fastq_screen sample1.fastq sample2.fastq`` + fastq_screen sample1.fastq sample2.fastq -By default the program looks for a configuration file named "fastq\_screen.conf" in the folder where the FastQ Screen script is located. If you wish to specify a different configuration file, which may be placed in a different folder, then use the --conf option: +By default the program looks for a configuration file named "fastq_screen.conf" in the folder where the FastQ Screen script is located. If you wish to specify a different configuration file, which may be placed in a different folder, then use the --conf option: -``fastq_screen --conf /home/myConfig.conf sample1.fastq sample2.fastq`` + fastq_screen --conf /home/myConfig.conf sample1.fastq sample2.fastq Full documentation for the FastQ Screen options can be obtained by running: -``fastq_screen --help`` + fastq_screen --help Obtaining reference genomes =========================== -The sequence aligners Bowtie, Bowtie2 and BWA require reference genomes against which to map FASTQ reads. If you do not have these genomes already in place on your system, you can build them by downloading genome sequence FASTA files from a public database (such as those made available at the `NCBI website `_). Then, simply create genome indices from the FASTA files as detailed in the instructions for your chosen aligner. +The sequence aligners Bowtie, Bowtie2 and BWA require reference genomes against which to map FASTQ reads. If you do not have these genomes already in place on your system, you can build them by downloading genome sequence FASTA files from a public database (such as those made available at the [NCBI website](https://www.ncbi.nlm.nih.gov/genome)). Then, simply create genome indices from the FASTA files as detailed in the instructions for your chosen aligner.
Alternatively, pre-built Bowtie2 indices of commonly used genomes may be downloaded directly from the Babraham Bioinformatics website with the command: -``fastq_screen --get_genomes`` + fastq_screen --get_genomes The genome indices will be downloaded to a folder named "FastQ_Screen_Genomes" in your current working directory (or to another location if --outdir is specified). In addition to the genome indices, the folder FastQ_Screen_Genomes will contain a configuration file named "fastq_screen.conf", which is ready to use and lists the correct paths to the newly downloaded reference genomes. This configuration file can be passed to fastq_screen with the --conf command, or may be used as the default configuration by copying the file to the folder containing the fastq_screen script. @@ -114,11 +125,11 @@ The genome indices will be downloaded to a folder named "FastQ_Screen_Genomes" i Test Dataset ============ -To confirm FastQ Screen functions correctly on your system please download the `Test Dataset `_ The file 'fastq\_screen\_test\_dataset.fastq.gz' contains reads in Sanger FASTQ format. +To confirm FastQ Screen functions correctly on your system please download the [Test Dataset.](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/fastq_screen_test_dataset.tar.gz) The file 'fastq_screen_test_dataset.fastq.gz' contains reads in Sanger FASTQ format. -1. Extract the tar archive before processing: -``tar xvzf fastq_screen_test_dataset.tar.gz`` + 1\. Extract the tar archive before processing: + tar xvzf fastq_screen_test_dataset.tar.gz 2. If not present already, create index files of recent versions of the Mouse and Human genomes (how the index files are generated will depend on the aligner used for the mapping i.e. refer to either the Bowtie, Bowtie2 or BWA documentation for further details). 
@@ -129,19 +140,19 @@ To confirm FastQ Screen functions correctly on your system please download the ` Interpreting the results from a large number of datasets ======================================================== -FastQ Screen output is compatible with `MultiQC `_, a specialist tool for aggregating results from bioinformatics analyses across many samples into a single report. We recommend using this tool for quickly interpreting the FastQ Screen results from a large number of datasets. +FastQ Screen output is compatible with [MultiQC](http://multiqc.info), a specialist tool for aggregating results from bioinformatics analyses across many samples into a single report. We recommend using this tool for quickly interpreting the FastQ Screen results from a large number of datasets. Screening Bisulfite Samples =========================== -Mapping bisulfite converted sequences is possible with FastQ Screen, which uses the tool `Bismark `_ to process the FASTQ files. After downloading and setting-up Bismark, provide the path to Bismark in the configuration file and run FastQ Screen in bisulfite mode. +Mapping bisulfite converted sequences is possible with FastQ Screen, which uses the tool [Bismark](http://www.bioinformatics.babraham.ac.uk/projects/bismark) to process the FASTQ files. After downloading and setting-up Bismark, provide the path to Bismark in the configuration file and run FastQ Screen in bisulfite mode. -``fastq_screen --bisulfite sample3.fastq`` + fastq_screen --bisulfite sample3.fastq FastQ Screen, when run in Bisulfite mode, reports to which strand the reads aligned (original top strand, complementary to original top strand, complementary to original bottom strand, or original bottom -strand). Refer to the `Bismark `_ documentation for more details on these bisulfite strand definitions. +strand). Refer to the [Bismark](http://www.bioinformatics.babraham.ac.uk/projects/bismark) documentation for more details on these bisulfite strand definitions. 
Filtering FastQ Files @@ -150,23 +161,23 @@ You may want to filter your data to remove reads mapping to a certain species. To create a tagged FASTQ file, enter on the command line something similar to that below: -``fastq_screen --tag sample4.fastq`` + fastq_screen --tag sample4.fastq To filter the tagged file, enter on the command line something similar to that below: -``fastq_screen --filter 1000 sample5.fastq`` + fastq_screen --filter 1000 sample5.fastq This instructs FastQ Screen to extract from the FASTQ file reads that map uniquely to genome 1, but not to genomes 2, 3 or 4 (genome order set by the order entered in the configuration file). See the table in the FastQ Screen Options Summary for further details of the --filter options. It is also possible to tag and filter a file in a single operation: -``fastq_screen --tag --filter 0001 sample6.fastq`` + fastq_screen --tag --filter 0001 sample6.fastq In this example the file is tagged and reads that map to a single location on genome 4, but do not align to any of the other three genomes, are written to the output file. Adjust the filter options as required: -``fastq_screen --tag --filter 5555 --pass 1 sample7.fastq`` + fastq_screen --tag --filter 5555 --pass 1 sample7.fastq The --pass command allows the user to specify how many filters need to be passed for a read to be written to the output file. By default, all the filters should be passed. Consequently the example above will remove reads that map uniquely to any of the genomes. @@ -174,17 +185,17 @@ Another useful option is --inverse. This option inverts the --filter results i. It is also possible to extract reads mapping to none of the reference genomes with the option --nohits: -``fastq_screen --nohits sample7.fastq`` + fastq_screen --nohits sample7.fastq The option --nohits is equivalent to --tag --filter 0000 (zero for every genome screened).
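The filter logic described in this section can be sketched in a few lines of Python. This is only an illustration of the documented rules, not FastQ Screen's own (Perl) implementation: the function name `read_passes` and its parameters are hypothetical, the filter characters follow the table in the FastQ Screen Options Summary, and the handling of `-` positions (excluded from the count that --pass is compared against) is an assumption.

```python
# Per-genome tag scores, as described for --tag:
# 0 = did not align, 1 = aligned uniquely, 2 = aligned more than once.
# Each filter character accepts a set of scores (see the Options Summary table).
ALLOWED_SCORES = {
    "0": {0},      # read does not map
    "1": {1},      # read maps uniquely
    "2": {2},      # read multi-maps
    "3": {1, 2},   # read maps (one or more times)
    "4": {0, 1},   # passes filter 0 or filter 1
    "5": {0, 2},   # passes filter 0 or filter 2
}


def read_passes(scores, filter_string, min_pass=None, inverse=False):
    """Return True if a read with the given per-genome scores is kept.

    scores        -- one tag score (0, 1 or 2) per genome, in config-file order
    filter_string -- one filter character per genome, e.g. "003" or "--1"
    min_pass      -- hypothetical stand-in for --pass; None means "all filters"
    inverse       -- hypothetical stand-in for --inverse
    """
    if len(scores) != len(filter_string):
        raise ValueError("need exactly one filter character per genome")

    # Assumption: genomes marked '-' are ignored entirely.
    active = [(s, c) for s, c in zip(scores, filter_string) if c != "-"]
    for _, char in active:
        if char not in ALLOWED_SCORES:
            raise ValueError(f"unknown filter character: {char!r}")

    passed = sum(1 for score, char in active if score in ALLOWED_SCORES[char])
    required = len(active) if min_pass is None else min_pass
    keep = passed >= required
    return not keep if inverse else keep


if __name__ == "__main__":
    # Three genomes A, B, C: '003' keeps reads absent from A and B but mapping to C.
    print(read_passes([0, 0, 2], "003"))
    # '--1' keeps reads mapping uniquely to C, whatever happens in A and B.
    print(read_passes([2, 1, 1], "--1"))
    # The --nohits equivalent for four genomes.
    print(read_passes([0, 0, 0, 0], "0000"))
```

Under these assumptions, passing `min_pass=1` makes the per-genome filters behave like a Boolean OR, and `inverse=True` simply negates the final decision.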
By adjusting the filters and, if necessary, undergoing several rounds of filtering it should be possible for a user to extract the desired reads. -Filtering paired-end reads files separately will generate files with un-paired reads e.g. a read may be present in File1, but its corresponding pair may not be found in File2. Also, the order of the reads in processed files may not correspond to one another. Consequently, the resulting file pairs will need processing after filtering with FastQ Screen. `Several tools are available (although not currently produced by us) to achieve this re-pairing `_ +Filtering paired-end reads files separately will generate files with un-paired reads e.g. a read may be present in File1, but its corresponding pair may not be found in File2. Also, the order of the reads in processed files may not correspond to one another. Consequently, the resulting file pairs will need processing after filtering with FastQ Screen. [Several tools are available (although not currently produced by us) to achieve this re-pairing.](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/repair-guide) There may also be occasions when, after filtering a FASTQ file, the tags need to be removed from the headers of each read. This can be achieved using the script Misc/remove_tags.pl. -A video tutorial explaining how to filter FASTQ files may be found `here `__ +A video tutorial explaining how to filter FASTQ files may be found [here.](https://www.youtube.com/watch?v=eJcAv-Dt57I) Performance @@ -193,22 +204,20 @@ The memory requirements and the time taken to process a dataset will vary substa The table below summarises the time taken to process large and small FASTQ files (output from HiSeq and MiSeq sequencers respectively). Both FASTQ files, which were derived from sequencing human samples, were processed using 14 threads on a 256 node compute cluster, running CentOS v6.2 and using Bowtie2 v2.3.2 as the aligner.
-=============================== ========= =========== -Classification File A File B -=============================== ========= =========== -Number of reads 7,535,739 250,033,919 -QC Mode Wallclock time 00:02:03 00:17:15 -QC Mode System time 00:01:14 00:02:29 -QC Mode CPU time 00:06:48 00:30:22 -QC Mode Maximum memory (GB) 4.620 4.621 -Filter Mode Wallclock time 00:36:48 15:09:58 -Filter Mode System time 00:38:34 1:03:03:32 -Filter Mode CPU time 05:11:01 4:06:56:22 -Filter Mode Maximum memory (GB) 4.733 12.037 -=============================== ========= =========== +| Classification | File A | File B | +| -------------- | ------ | ------ | +| Number of reads | 7,535,739 | 250,033,919 | +| QC Mode Wallclock time | 00:02:03 | 00:17:15 | +| QC Mode System time | 00:01:14 | 00:02:29 | +| QC Mode CPU time | 00:06:48 | 00:30:22 | +| QC Mode Maximum memory (GB) | 4.620 | 4.621 | +| Filter Mode Wallclock time | 00:36:48 | 15:09:58 | +| Filter Mode System time | 00:38:34 | 1:03:03:32 | +| Filter Mode CPU time | 05:11:01 | 4:06:56:22 | +| Filter Mode Maximum memory (GB) | 4.733 | 12.037 | Many factors will determine the memory requirements of FASTQ Screen and the time taken to process a file. Listed below are the most important factors to consider: -* System processor, memory and other jobs being processed simulateously +* System processor, memory and other jobs being processed simultaneously * Number of threads * Number of genomes to screen * Number of reads to process @@ -240,51 +249,50 @@ FastQ Screen Options Summary Below gives an explanation of each character. 
-========== =========================================== -Character Explanation -========== =========================================== - 0 Read does not map - 1 Read maps uniquely - 2 Read multi-maps - 3 Read maps (one or more times) - 4 Passes filter 0 or filter 1 - 5 Passes filter 0 or filter 2 - \- Ignore whether a read maps to this genome -========== =========================================== + +| Character | Explanation | +| --------- | ----------- | +| 0 | Read does not map | +| 1 | Read maps uniquely | +| 2 | Read multi-maps | +| 3 | Read maps (one or more times) | +| 4 | Passes filter 0 or filter 1 | +| 5 | Passes filter 0 or filter 2 | +| - | Ignore whether a read maps to this genome | Consider mapping to three genomes (A, B and C): the string '003' produces a file in which reads do not map to genomes A or B, but map (once or more) to genome C. The string '--1' would generate a file in which reads uniquely map to genome C. Whether reads map to genome A or B would be ignored. When --filter is used in conjunction with --tag, FASTQ files shall be mapped, tagged and then filtered. If the --tag option is not selected however, the input FASTQ file should have been previously tagged. -**force :** Do not terminate if output files already exist, instead overwrite the files. +- **force :** Do not terminate if output files already exist, instead overwrite the files. -**get_genomes :** Download pre-indexed Bowtie2 genomes for a range of commonly studied species and sequences. If used with --bisulfite, Bismark bisulfite Bowtie2 indices will be downloaded instead. +- **get_genomes :** Download pre-indexed Bowtie2 genomes for a range of commonly studied species and sequences. If used with --bisulfite, Bismark bisulfite Bowtie2 indices will be downloaded instead. -**help :** Print program help and exit. +- **help :** Print program help and exit. -**illumina1_3 :** Assume that the quality values are encoded in Illumina v1.3 format.
Defaults to Sanger format if this flag is not specified. +- **illumina1_3 :** Assume that the quality values are encoded in Illumina v1.3 format. Defaults to Sanger format if this flag is not specified. -**inverse :** Inverts the --filter results i.e. reads that pass the --filter parameter will not pass when --filter --inverse are specified together, and vice versa. +- **inverse :** Inverts the --filter results i.e. reads that pass the --filter parameter will not pass when --filter --inverse are specified together, and vice versa. -**nohits :** Writes to a file the sequences that did not map to any of the specified genomes. This option is equivalent to specifying --tag --filter 0000 (number of zeros corresponds to the number of genomes screened). By default the whole input file will be mapped, unless overridden by --subset. +- **nohits :** Writes to a file the sequences that did not map to any of the specified genomes. This option is equivalent to specifying --tag --filter 0000 (number of zeros corresponds to the number of genomes screened). By default the whole input file will be mapped, unless overridden by --subset. -**outdir \ :** Specify a directory in which to save output files. If no directory is specified then output files are saved in the current working directory. +- **outdir \ :** Specify a directory in which to save output files. If no directory is specified then output files are saved in the current working directory. -**pass \ :** Used in conjunction with --filter. By default all genome filters must be passed for a read to pass the --filter option. However, a minimum number of genome filters may be specified that a read needs to pass to be considered to pass the --filter option. (--pass 1 effectively acts as an OR boolean operator for the genome filters.) +- **pass \ :** Used in conjunction with --filter. By default all genome filters must be passed for a read to pass the --filter option.
However, a minimum number of genome filters may be specified that a read needs to pass to be considered to pass the --filter option. (--pass 1 effectively acts as an OR boolean operator for the genome filters.) -**quiet :** Suppress all progress reports on stderr and only report errors. +- **quiet :** Suppress all progress reports on stderr and only report errors. -**subset \ :** Don't use the whole sequence file, but create a temporary dataset of this specified number of reads. The dataset created will be of approximately (within a factor of 2) of this size. If the real dataset is smaller than twice the specified size then the whole dataset will be used. Subsets will be taken evenly from throughout the whole original dataset. By Default FastQ Screen runs with this parameter set to 100000. To process an entire dataset however, adjust --subset to 0. +- **subset \ :** Don't use the whole sequence file, but create a temporary dataset of this specified number of reads. The dataset created will be of approximately (within a factor of 2) of this size. If the real dataset is smaller than twice the specified size then the whole dataset will be used. Subsets will be taken evenly from throughout the whole original dataset. By Default FastQ Screen runs with this parameter set to 100000. To process an entire dataset however, adjust --subset to 0. -**tag :** Label each FASTQ read header with a tag listing to which genomes the read did, or did not align. The first read in the output FASTQ file will list the full genome names along with a score denoting whether the read did not align (0), aligned uniquely to the specified genome (1), or aligned more than once (2). In subsequent reads the genome names are omitted and only the score is printed, in the same order as the first line. +- **tag :** Label each FASTQ read header with a tag listing to which genomes the read did, or did not align. 
The first read in the output FASTQ file will list the full genome names along with a score denoting whether the read did not align (0), aligned uniquely to the specified genome (1), or aligned more than once (2). In subsequent reads the genome names are omitted and only the score is printed, in the same order as the first line. This option results in the he whole file being processed unless overridden explicitly by the user with the --subset parameter -**threads \ :** Specify across how many threads bowtie will be allowed to run. Overrides the default value set in the configuration file. +- **threads \ :** Specify across how many threads bowtie will be allowed to run. Overrides the default value set in the configuration file. -**top \/\ :** Don't use the whole sequence file, but create a temporary dataset of the specified number of reads taken from the top of the original file. It is also possible to specify the number of lines to skip before beginning the selection e.g. --top 100000,5000000 skips the first five million reads and selects the subsequent one hundred thousand reads. While this option is usually faster than comparable --subset operations, it does not prevent biases arising from non-uniform distribution of reads in the original FastQ file. This option should only be used when minimising processing time is of highest priority. +- **top \/\ :** Don't use the whole sequence file, but create a temporary dataset of the specified number of reads taken from the top of the original file. It is also possible to specify the number of lines to skip before beginning the selection e.g. --top 100000,5000000 skips the first five million reads and selects the subsequent one hundred thousand reads. While this option is usually faster than comparable --subset operations, it does not prevent biases arising from non-uniform distribution of reads in the original FastQ file. This option should only be used when minimising processing time is of highest priority. 
-**version :** Print the program version and exit.
+- **version :** Print the program version and exit.


 Terms of use
@@ -292,9 +300,27 @@ Terms of use
 FastQ Screen is distributed under a "GNU General Public License", a copy of which is distributed with the software.


-Papers citing FastQ Screen
-==========================
-`https://europepmc.org/search?query=CITES%3A30254741_MED `_
+Selected Papers citing FastQ Screen
+===================================
+- Picornell AC, Echavarria I, Alvarez E, et al.: Breast cancer PAM50 signature: correlation and concordance between RNA-Seq and digital multiplexed gene expression technologies in a triple negative breast cancer series. BMC Genomics. 2019; DOI: 10.1186/s12864-019-5849-0
+
+- Laufer BI, Hwang H, Vogel Ciernia A et al., Whole genome bisulfite sequencing of Down syndrome brain reveals regional DNA hypermethylation and novel disorder insights. Epigenetics. 2019; 14(7), 672-684; DOI:10.1080/15592294.2019.1609867
+
+- Chana-Muñoz A, Jendroszek Agnieszka Sønnichsen M, et al.: Origin and diversification of the plasminogen activation system among chordates. BMC evolutionary biology 2019; 19(1); DOI:10.1186/s12862-019-1353-z
+
+- Dawidowska M, Jaksik Roman, Szarzyńska-Zawadzka B, el al.: Comprehensive Investigation of miRNome Identifies Novel Candidate miRNA-mRNA Interactions Implicated in T-Cell Acute Lymphoblastic Leukemia. Neoplasia. 2019; 21(3), 294—310; DOI:10.1016/j.neo.2019.01.004
+
+- Woodham EF, Paul NR, Tyrrell B, et al.: Coordination by Cdc42 of Actin, Contractility, and Adhesion for Melanoblast Movement in Mouse Skin. Curr Biol. 2017; 27(5): 624–637
+
+- Perrin S, Firmo C, Lemoine S, et al.: Aozan: an automated post-sequencing data-processing pipeline. Bioinformatics. 2017; 33(14): 2212–2213.
+
+- O'Sullivan NJ, Teasdale MD, Mattiangeli V, et al.: A whole mitochondria analysis of the Tyrolean Iceman's leather provides insights into the animal sources of Copper Age clothing. Sci Rep. 2016; 6: 31279.
+
+- Ewels P, Magnusson M, Lundin S, et al.: MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016; 32(19): 3047–3048.
+
+- Fiddyment S, Holsinger B, Ruzzier C, et al.: Animal origin of 13th-century uterine vellum revealed using noninvasive peptide fingerprinting. Proc Natl Acad Sci U S A. 2015; 112(49): 15066–15071.
+
+- Rose G, Wooldridge DJ, Anscombe C, et al.: Challenges of the Unknown: Clinical Application of Microbial Metagenomics. Int J Genomics. 2015; 2015: 292950.


 How to cite FastQ Screen
@@ -307,6 +333,4 @@ Wingett SW and Andrews S. FastQ Screen: A tool for multi-genome mapping and qual

 Report problems
 ===============
-If you have any problems running this program you can report them on `GitHub `_.
-
-Please email any other queries to: steven.wingett@babraham.ac.uk
+If you have any problems running this program you can report them on [GitHub.](https://github.com/StevenWingett/FastQ-Screen/issues)

From 2aa6e4aec0b616c4fcacaafa687f603bacfcdb64 Mon Sep 17 00:00:00 2001
From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com>
Date: Thu, 13 Jan 2022 15:39:11 +0000
Subject: [PATCH 13/29] Set theme jekyll-theme-modernist

---
 docs/_config.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/_config.yml b/docs/_config.yml
index a4e95ee..55ab52a 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -2,3 +2,5 @@ remote_theme: carlosperate/jekyll-theme-rtd

 github:
   is_project_page: false
+
+theme: jekyll-theme-modernist
\ No newline at end of file

From 39918035f60432c10cc4c16c2ede71b5068dc5a7 Mon Sep 17 00:00:00 2001
From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com>
Date: Thu, 13 Jan 2022 15:39:47 +0000
Subject: [PATCH 14/29] Update _config.yml

---
 docs/_config.yml | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/docs/_config.yml b/docs/_config.yml
index 55ab52a..75e2535 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -1,6 +1 @@
-remote_theme: carlosperate/jekyll-theme-rtd
-
-github:
-  is_project_page: false
-
-theme: jekyll-theme-modernist
\ No newline at end of file
+theme: jekyll-theme-modernist

From 9cc1885bd152cfa952ef2411adb690c569d9d71f Mon Sep 17 00:00:00 2001
From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com>
Date: Thu, 13 Jan 2022 15:42:48 +0000
Subject: [PATCH 15/29] Update README.md

---
 docs/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/README.md b/docs/README.md
index c99a70f..9185dbb 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,4 +1,4 @@
-drawing
+drawing


 Introduction

From edac7e724cccac5bcb61cd8bfcc57c22db406561 Mon Sep 17 00:00:00 2001
From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com>
Date: Thu, 13 Jan 2022 15:44:22 +0000
Subject: [PATCH 16/29] Update README.txt

---
 README.txt | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/README.txt b/README.txt
index ffdd69b..b971749 100644
--- a/README.txt
+++ b/README.txt
@@ -11,12 +11,10 @@ The FastQ Screen Homepage is found at:
 https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen

 Full documentation on using the software is provided at:
-https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/_build/html/index.html
+https://stevenwingett.github.io/FastQ-Screen/

 The codebase is maintained at:
 https://github.com/StevenWingett/FastQ-Screen

 Bug reports, queries or suggestions can be made at:
 https://github.com/StevenWingett/FastQ-Screen/issues
-
-Alternatively, please email steven.wingett@babraham.ac.uk
\ No newline at end of file

From 52c3dbbde052eaf350150bf07e79653c25c0577b Mon Sep 17 00:00:00 2001
From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com>
Date: Thu, 13 Jan 2022 15:53:37 +0000
Subject: [PATCH 17/29] Update README.md

---
 docs/README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/README.md b/docs/README.md
index 9185dbb..d8cc393 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,5 +1,4 @@
-drawing
-
+drawing

 Introduction
 ============

From 962e2eeceec198aa1a6ef7dc27262ba3f7a15b79 Mon Sep 17 00:00:00 2001
From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com>
Date: Thu, 13 Jan 2022 15:56:28 +0000
Subject: [PATCH 18/29] Update README.md

---
 docs/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/README.md b/docs/README.md
index d8cc393..991cf4a 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,4 +1,4 @@
-drawing
+drawing

 Introduction
 ============

From af3bb4df9ad9f742411f5b36165ebccda07258e6 Mon Sep 17 00:00:00 2001
From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com>
Date: Fri, 14 Jan 2022 14:11:14 +0000
Subject: [PATCH 19/29] Added Zenodo link

---
 docs/README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/README.md b/docs/README.md
index 991cf4a..299304b 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,5 +1,7 @@
 drawing

+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5838377.svg)](https://doi.org/10.5281/zenodo.5838377)
+
 Introduction
 ============
 FastQ Screen is a simple application which allows you to search a large sequence dataset against a panel of different genomes to determine from where the sequences in your data originate. It was built as a QC check for sequencing pipelines but may also be useful to characterise metagenomic samples. When running a sequencing pipeline it is useful to know that your sequencing runs contain the types of sequence they're supposed to. Your search libraries might contain the genomes of all of the organisms you work on, along with PhiX, Vectors or other contaminants commonly seen in sequencing experiments.
From ddb22e08b5b74a6540e6d23889c1d594254c4fe9 Mon Sep 17 00:00:00 2001
From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com>
Date: Fri, 14 Jan 2022 14:30:44 +0000
Subject: [PATCH 20/29] Improved documentation wording and layout

---
 docs/README.md | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/docs/README.md b/docs/README.md
index 299304b..98b3f51 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -18,6 +18,17 @@ In contrast, poor sequencing results will include results from one or more unexp

 ![Poor Sequencing Results](http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/bad_sequence_screen.png)

+
+Publication and how to cite FastQ Screen
+========================================
+[FastQ Screen was published in the open access journal F1000Research.](https://doi.org/10.12688/f1000research.15931.2)
+
+FastQ Screen should be cited as follows:
+
+Wingett SW and Andrews S. FastQ Screen: A tool for multi-genome mapping and quality control [version 2; referees: 4 approved]. F1000Research 2018, 7:1338
+(https://doi.org/10.12688/f1000research.15931.2)
+
+
 FastQ Screen online tutorials
 =============================
 To assist your understanding of FastQ Screen and how it should be used, we have prepared a series of short training videos.
@@ -324,14 +335,6 @@ Selected Papers citing FastQ Screen
 - Rose G, Wooldridge DJ, Anscombe C, et al.: Challenges of the Unknown: Clinical Application of Microbial Metagenomics. Int J Genomics. 2015; 2015: 292950.


-How to cite FastQ Screen
-========================
-FastQ Screen was published in the open access journal F1000Research.
-
-Wingett SW and Andrews S. FastQ Screen: A tool for multi-genome mapping and quality control [version 2; referees: 4 approved]. F1000Research 2018, 7:1338
-(https://doi.org/10.12688/f1000research.15931.2)
-
-
-Report problems
+Feedback
 ===============
-If you have any problems running this program you can report them on [GitHub.](https://github.com/StevenWingett/FastQ-Screen/issues)
+Bug reports, queries or suggestions can be made via the [FastQ Screen GitHub page.](https://github.com/StevenWingett/FastQ-Screen/issues)

From 11dcfe7bbbd28da3ac84e29a4f606f3730fe1483 Mon Sep 17 00:00:00 2001
From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com>
Date: Sat, 15 Jan 2022 19:27:16 +0000
Subject: [PATCH 21/29] Update README.md

Added release badge to README.

---
 docs/README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/README.md b/docs/README.md
index 98b3f51..837b4e2 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,5 +1,6 @@
 drawing

+![GitHub release (latest by date)](https://img.shields.io/github/v/release/StevenWingett/Fastq-Screen)
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5838377.svg)](https://doi.org/10.5281/zenodo.5838377)

 Introduction

From 348156230b4f1f62346c9e20e75aa498a342372a Mon Sep 17 00:00:00 2001
From: StevenWingett
Date: Wed, 26 Jan 2022 10:42:29 +0000
Subject: [PATCH 22/29] Added folder download_genomes to provide information when the FastQ Screen genomes are downloaded

---
 download_genomes/README.md | 3 +++
 download_genomes/genome_locations.txt | 2 ++
 2 files changed, 5 insertions(+)
 create mode 100644 download_genomes/README.md
 create mode 100644 download_genomes/genome_locations.txt

diff --git a/download_genomes/README.md b/download_genomes/README.md
new file mode 100644
index 0000000..3c62a45
--- /dev/null
+++ b/download_genomes/README.md
@@ -0,0 +1,3 @@
+This folder contains details on the downloaded regular (Bowtie2) and bisulphite genomes (Bismark/Bowtie2), should they need to be re-made / uploaded at some future date.
+
+This folder contains a tree of the files on the remote FTP, the regular/bisulphite configuration files downloaded from the FTP and the FTP location file downloaded from the Babraham server.
diff --git a/download_genomes/genome_locations.txt b/download_genomes/genome_locations.txt
new file mode 100644
index 0000000..45346af
--- /dev/null
+++ b/download_genomes/genome_locations.txt
@@ -0,0 +1,2 @@
+ftp1.babraham.ac.uk/ftpusr46/FastQ_Screen_Genomes/
+ftp1.babraham.ac.uk/ftpusr46/FastQ_Screen_Genomes_Bisulfite/

From a138b39843fbf9d9102025628ead3c57f200cb36 Mon Sep 17 00:00:00 2001
From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com>
Date: Wed, 26 Jan 2022 10:50:16 +0000
Subject: [PATCH 23/29] Delete fastq_screen_documentation.md

---
 fastq_screen_documentation.md | 328 ----------------------------------
 1 file changed, 328 deletions(-)
 delete mode 100644 fastq_screen_documentation.md

diff --git a/fastq_screen_documentation.md b/fastq_screen_documentation.md
deleted file mode 100644
index bc07451..0000000
--- a/fastq_screen_documentation.md
+++ /dev/null
@@ -1,328 +0,0 @@
-Introduction
-============
-FastQ Screen is a simple application which allows you to search a large sequence dataset against a panel of different genomes to determine from where the sequences in your data originate. It was built as a QC check for sequencing pipelines but may also be useful in characterising metagenomic samples. When running a sequencing pipeline it is useful to know that your sequencing runs contain the types of sequence they're supposed to. Your search libraries might contain the genomes of all of the organisms you work on, along with PhiX, Vectors or other contaminants commonly seen in sequencing experiments.
-
-Although the program wasn't built with any particular technology in mind it is probably only really suitable for processing short reads due to the use of either Bowtie, Bowtie2 or BWA as the searching application.
-
-The program generates both text and graphical output to inform you what proportion of your library was able to map, either uniquely or to more than one location, against each of your specified reference genomes. The user should therefore be able to identify a clean sequencing experiment in which the overwhelming majority of reads are probably derived from a single genomic origin.
-
-***(Please note, in version 0.9.4 the graphs colour scheme changed from that shown below to a similar, but colour-blind safe colour scheme.)***
-
- .. image:: http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/good_sequence_screen.png
-
-In contrast, poor sequencing results will include results from one or more unexpected species. Identifying such reads may help the user discover the source of the contamination.
- .. image:: http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/bad_sequence_screen.png
-
-
-FastQ Screen online tutorials
-=============================
-To assist your understanding of FastQ Screen and how it should be used, we have prepared a series of short training videos.
-
-`Training video 1: Introduction to FastQ Screen `_
-
-`Training video 2: Downloading, configuring and running FastQ Screen `_
-
-`Training video 3: Interpreting FastQ Screen results `_
-
-`Training video 4: Filtering FASTQ Files `_
-
-**We recommend watching these before using FastQ Screen for the first time.** In total the videos take no longer than 20 minutes to watch, and should could cover everything you need to get started with the software.
-
-
-Project Homepage
-================
-The FastQ Screen Homepage can be found `here `_
-
-
-Download
-========
-FastQ Screen may be obtained from the `Babraham Bioinformatics download page. `_
-
-
-Requirements summary
-====================
-* Requirements: Linux-based operating system
-* Language: Perl
-* Bowtie, Bowtie2 or BWA
-* gzip (optional)
-* SAMtools (optional)
-* GD::Graph (optional)
-* Bismark (bisulfite mapping only)
-
-
-Installation
-============
-Before running FastQ Screen there are a few prerequisites that will need to be installed:
-
-1. A sequence aligner. FastQ Screen is compatible with `Bowtie `_, `Bowtie2 `_ or `BWA `_. It's easier if you put the chosen aligner in your path, but if not you can configure its location in the config file.
-
-2. We recommend running FastQ Screen in a Linux system, on which the programming language Perl should already be installed.
-
-3. GD::Graph FastQ Screen uses the GD::Graph module to draw PNG format graphs summarising the mapping results. FastQ Screen will still produce both text and HTML format summaries of the results if GD::Graph is not installed.
-
-You can use the built in CPAN shell to install
-this module:
-
-``perl -MCPAN -e "install GD"``
-
-Because GD graph uses GD this will be brought in as a dependency. GD may be easier to install using a package manager on many linux distributions. On Fedora for example you can install GD using:
-
-``yum install perl-GD``
-
-..before doing the CPAN install of GD::Graph
-
-Actually installing Fastq Screen is very simple. Download the tar.gz distribution file and then do:
-
-``tar -xzf fastq_screen_v0.x.x.tar.gz``
-
-You will see a folder called fastq\_screen\_v0.x.x has been created and the program is inside that. You can add the program to your path either by linking the program into:
-``usr/local/bin`` or by adding the program installation directory to your search path.
-
-
-Configuration
-=============
-In order to use FastQ Screen you will need to configure some genome databases for the program to search. This will involve downloading the sequences for the databases in FASTA format and then using either Bowtie, Bowtie2 or BWA to build the relevant index files. Please note: the aligner used to build the index files must be used to map the reads
-
-Once you have built your index you can configure the FastQ Screen program. You do this by editing the fastq\_screen.conf.example file which is distributed with the program. This shows an example set of database configurations which you will need to change to reflect the actual databases you have set up. FastQ Screen can process up to a maximum of 32 reference genomes. Rename the file to fastq\_screen.conf after you have finished editing.
-
-The other options you can set in the config file are the location of the aligner binary (if it's not in your path),and the number of threads you want to allocate to the aligner when performing your screen. The number of threads will be the number of CPU cores the code will run on so you shouldn't set this value higher than the number of physical cores you have in your machine. The more threads you can allow the faster the searching part of the screen will run.
-
-An example command is shown below. This would process two FASTQ files and would create the screen output in the same directory as the original files.
-
-``fastq_screen sample1.fastq sample2.fastq``
-
-By default the program looks for a configuration file named "fastq\_screen.conf" in the folder where the FastQ Screen script it is located. If you wish to specify a different configuration file, which may be placed in different folder, then use the --conf option:
-
-``fastq_screen --conf /home/myConfig.conf sample1.fastq sample2.fastq``
-
-Full documentation for the FastQ Screen options can be obtained by running:
-
-``fastq_screen --help``
-
-
-Obtaining reference genomes
-===========================
-The sequence aligners Bowtie, Bowtie2 and BWA require reference genomes against which to map FASTQ reads. If you do not have these genomes already in place on your system, you can build them by downloading genome sequence FASTA files from a public database (such as those made available at the `NCBI website `_). Then, simply create genome indices from the FASTA files as detailed in the instructions for your chosen aligner.
-
-Alternatively, pre-built Bowtie2 indices of commonly used genomes may be downloaded directly from the Babraham Bioinformatics website with the command:
-
-``fastq_screen --get_genomes``
-
-The genome indices will be downloaded to a folder named "FastQ_Screen_Genomes" in your current working directory (or to another location if --outdir is specified). In addition to the genome indices, the folder FastQ_Screen_Genomes will contain a configuration file named "fastq_screen.conf", which is ready to use and lists the correct paths to the newly downloaded reference genomes. This configuration file can be passed to fastq_screen with the --conf command, or may be used as the default configuration by copying the file to the folder containing the fastq_screen script.
-
-
-
-Test Dataset
-============
-To confirm FastQ Screen functions correctly on your system please download the `Test Dataset `_ The file 'fastq\_screen\_test\_dataset.fastq.gz' contains reads in Sanger FASTQ format.
-
-1. Extract the tar archive before processing:
-``tar xvzf fastq_screen_test_dataset.tar.gz``
-
-
-2. If not present already, create index files of recent versions of the Mouse and Human genomes (how the index files are generated will depend on the aligner used for the mapping i.e. refer to either the Bowtie, Bowtie2 or BWA documentation for further details).
-
-3. Create a configuration file tailored to your system.
-
-4. Run FastQ Screen
-
-
-Interpreting the results from a large number of datasets
-========================================================
-FastQ Screen output is compatible with `MultiQC `_, a specialist tool for aggregating results from bioinformatics analyses across many samples into a single report. We recommend using this tool for quickly interpreting the FastQ Screen results from a large number of datasets.
-
-
-Screening Bisulfite Samples
-===========================
-Mapping bisulfite converted sequences is possible with FastQ Screen, which uses the tool `Bismark `_ to process the FASTQ files. After downloading and setting-up Bismark, provide the path to Bismark in the configuration file and run FastQ Screen in bisulfite mode.
-
-``fastq_screen --bisulfite sample3.fastq``
-
-FastQ Screen, when run in Bisulfite mode, reports to which strand the
-reads aligned (original top strand, complementary to original top
-strand, complementary to original bottom strand, or original bottom
-strand). Refer to the `Bismark `_ documentation for more details on these bisulfite strand definitions.
-
-
-Filtering FastQ Files
-=====================
-You may want to filter your data to remove reads mapping to a certain species. With FastQ Screen it is possible to generate a new FASTQ file in which each FASTQ read is tagged, listing to which genomes the read did, or did not align. This file may then be processed as required to select for, or filter out, reads aligning to given species. By default, selecting --tag will result in the whole file being processed, unless over-ridden by the --subset option.
-
-To create a tagged FASTQ file, enter on the command line something similar to that below:
-
-``fastq_screen --tag sample4.fastq``
-
-To filter the tag file, enter on the command line something similar to that below:
-
-``fastq_screen --filter 1000 sample5.fastq``
-
-This instructs FastQ Screen to extract from the FASTQ file reads that map uniquely to genome 1, but not to genomes 2, 3 or 4 (genome order set by the ordered entered in the configuration file). See the table in the FastQ Screen Option Summmary for further details of the --filter options.
-
-It is also possible to tag and filter a file in a single operation:
-
-``fastq_screen --tag --filter 0001 sample6.fastq``
-
-In this example the file is tagged and reads mapping to a single location on genome 4, but do not align to any of the other three genomes, are written to the output file.
-
-Adjust the filter options as required:
-
-``fastq_screen --tag --filter 5555 --pass 1 sample7.fastq``
-
-The --pass command allows the user to specify how many filters need to be passed for a read to be written to the output file. By default, all the filters should be passed. Consequently the example above will remove reads that map uniquely to any of the genomes.
-
-Another useful option is --inverse. This option inverts the --filter results i.e. reads that pass the --filter parameter will not pass when --filter --inverse are specified together, and vice versa.
-
-It is also possible to extract reads mapping to none of the reference genomes with the option --nohits:
-
-``fastq_screen --nohits sample7.fastq``
-
-The option --nohits is equivalent to --tag --filter 0000 (zero for every genome screened).
-
-By adjusting the filters and, if necessary, undergoing several rounds of filtering it should be possible for a user to extract the desired reads.
-
-Filtering paired-end reads files separately will generate files with un-paired reads e.g. a read may be present in File1, but its corresponding pair may not be found in File2. Also, the order of the reads in processed files may not correspond to on another. Consequently, the resulting file pairs will need processing after filtering with FastQ Screen. `Several tools are available (although not currently produced by us) to achieve this re-pairing `_
-
-There may also may be occasions when, after filtering a FASTQ file, the tags need to be removed from the headers of each read. This can be achieved using the script Misc/remove_tags.pl.
-
-A video tutorial explaining how to filter FASTQ files may be found `here `__
-
-
-Performance
-===========
-The memory requirements and the time taken to process a dataset will vary substantially depending on the input and user settings. The table below summarises the results of mapping two different FASTQ files against a panel of genomes (*H. sapiens, M. musculus, R. norvegicus, E. coli, D. melanogaster, C. elegans. A. thaliana, S. cerevisiae, PhiX174*, sequencing adapters, commonly used vectors, rRNA, mitochondria, lambda phage). The table below summarises the results.
-
-The table below summarises the time taken to process large and small FASTQ files (output from HiSeq and MiSeq sequencers respectively). Both FASTQ files, which were derived from sequencing human samples, were processed using 14 threads on a 256 node compute cluster, running CentOS v6.2 and using Bowtie2 v2.3.2 as the aligner.
-
-=============================== ========= ===========
-Classification                  File A    File B
-=============================== ========= ===========
-Number of reads                 7,535,739 250,033,919
-QC Mode Wallclock time          00:02:03  00:17:15
-QC Mode System time             00:01:14  00:02:29
-QC Mode CPU time                00:06:48  00:30:22
-QC Mode Maximum memory (GB)     4.620     4.621
-Filter Mode Wallclock time      00:36:48  15:09:58
-Filter Mode System time         00:38:34  1:03:03:32
-Filter Mode CPU time            05:11:01  4:06:56:22
-Filter Mode Maximum memory (GB) 4.733     12.037
-=============================== ========= ===========
-
-Many factors will determine the memory requirements of FASTQ Screen and the time taken to process a file. Listed below are the most important factors to consider:
-* System processor, memory and other jobs being processed simulateously
-* Number of threads
-* Number of genomes to screen
-* Number of reads to process
-* Whether FastQ Screen subsets the data prior to processing. Typically, for QC reports, a file is subset to 100,000 reads prior to mapping. When filtering files, subsetting is typically not performed.
-* Bisulfite libraries take considerably longer to process
-
-While it is not possible to cover every scenario, as a general rule using FastQ Screen to QC a dataset should take minutes whereas filtering a large dataset may take a several hours.
-
-
-FastQ Screen Options Summary
-============================
- **add_genome \ :** Edits the file 'fastq_screen.conf' (in the folder where this script is saved) to add a new genome. Specify the additional genome as a comma separated list: 'Database name','Genome path and basename','Notes'
-
-**aligner \ :** Specify the aligner to use for the mapping. Valid arguments are 'bowtie', bowtie2' (default) or 'bwa'. Bowtie maps with parameters -k 2, Bowtie 2 with parameters -k 2 --very-fast-local and BWA with mem -a. Local aligners such as BWA or Bowtie2 will be better at detecting the origin of chimeric reads.
-
-**bisulfite :** Process bisulfite libraries. Bismark runs in non-directional mode. The path to the bisulfite aligner (Bismark) may be specified in the configuration file. Either conventional or bisulfite libraries may be processed, but not both simultaneously. The --bisulfite option cannot be used in conjunction with --bwa.
-
-**bismark \ :** Specify extra parameters to be passed to Bismark. These parameters should be quoted to clearly delimit Bismark parameters from FastQ Screen parameters.
-
-**bowtie \ :** Specify extra parameters to be passed to Bowtie. These parameters should be quoted to clearly delimit bowtie parameters from FastQ Screen parameters. You should not try to use this option to override the normal search or reporting options for bowtie which are set automatically but it might be useful to allow reads to be trimmed before alignment etc.
-
-**bowtie2 \ :** Specify extra parameters to be passed to Bowtie 2. These parameters should be quoted to clearly delimit Bowtie 2 parameters from FastQ Screen parameters. You should not try to use this option to override the normal search or reporting options for bowtie which are set automatically but it might be useful to allow reads to be trimmed before alignment etc.
-
-**bwa \ :** Specify extra parameters to be passed to BWA. These parameters should be quoted to clearly delimit BWA parameters from FastQ Screen parameters. You should not try to use this option to override the normal search or reporting options for BWA which are set automatically but it might be useful to allow reads to be trimmed before alignment etc.
-
-**conf \ :** Manually specify a location for the configuration.
-
-**filter \ :** Produce a FASTQ file containing reads mapping to specified genomes. Pass the argument a string of characters (0, 1, 2, 3, -), in which each character corresponds to a reference genome (in the order the reference genome occurs in the configuration file).
-
-Below gives an explanation of each character.
- -========== =========================================== -Character Explanation -========== =========================================== - 0 Read does not map - 1 Read maps uniquely - 2 Read multi-maps - 3 Read maps (one or more times) - 4 Passes filter 0 or filter 1 - 5 Passes filter 0 or filter 2 - \- Ignore whether a read maps to this genome -========== =========================================== - -Consider mapping to three genomes (A, B and C), the string '003' produces a file in which reads do not map to genomes A or B, but map (once or more) to genome C. The string '--1' would generate a file in which reads uniquely map to genome C. Whether reads map to genome A or B would be ignored. - -When --filter is used in conjunction with --tag, FASTQ files shall be mapped, tagged and then filtered. If the --tag option is not selected however, the input FASTQ file should have been previously tagged. - -**force :** Do not terminate if output files already exist, instead overwrite the files. - -**get_genomes :** Download pre-indexed Bowtie2 genomes for a range of commonly studied species and sequences. If used with --bisulfite, Bismark bisulfite Bowtie2 indices will be downloaded instead. - -**help :** Print program help and exit. - -**illumina1_3 :** Assume that the quality values are in encoded in Illumina v1.3 format. Defaults to Sanger format if this flag is not specified. - -**inverse :** Inverts the --filter results i.e. reads that pass the --filter parameter will not pass when --filter --inverse are specified together, and vice versa. - -**nohits :** Writes to a file the sequences that did not map to any of the specified genomes. This option is equivalent to specifying --tag --filter 0000 (number of zeros corresponds to the number of genomes screened). By default the whole input file will be mapped, unless overridden by --subset. - -**outdir \ :** Specify a directory in which to save output files. 
If no directory is specified then output files are saved in the current working directory. - -**pass \ :** Used in conjuction with --filter. By default all genome filters must be passed for a read to pass the --filter option. However, a minimum number of genome filters may be specified that a read needs to pass to be considered to pass the --filter option. (--pass 1 effectively acts as an OR boolean operator for the genome filters.) - -**quiet :** Suppress all progress reports on stderr and only report errors. - -**subset \ :** Don't use the whole sequence file, but create a temporary dataset of this specified number of reads. The dataset created will be of approximately (within a factor of 2) of this size. If the real dataset is smaller than twice the specified size then the whole dataset will be used. Subsets will be taken evenly from throughout the whole original dataset. By Default FastQ Screen runs with this parameter set to 100000. To process an entire dataset however, adjust --subset to 0. - -**tag :** Label each FASTQ read header with a tag listing to which genomes the read did, or did not align. The first read in the output FASTQ file will list the full genome names along with a score denoting whether the read did not align (0), aligned uniquely to the specified genome (1), or aligned more than once (2). In subsequent reads the genome names are omitted and only the score is printed, in the same order as the first line. - -This option results in the he whole file being processed unless overridden explicitly by the user with the --subset parameter - -**threads \ :** Specify across how many threads bowtie will be allowed to run. Overrides the default value set in the configuration file. - -**top \/\ :** Don't use the whole sequence file, but create a temporary dataset of the specified number of reads taken from the top of the original file. It is also possible to specify the number of lines to skip before beginning the selection e.g. 
--top 100000,5000000 skips the first five million reads and selects the subsequent one hundred thousand reads. While this option is usually faster than comparable --subset operations, it does not prevent biases arising from non-uniform distribution of reads in the original FastQ file. This option should only be used when minimising processing time is of highest priority. - -**version :** Print the program version and exit. - - -Terms of use -============ -FastQ Screen is distributed under a "GNU General Public License", a copy of which is distributed with the software. - - -Selected Papers citing FastQ Screen -=================================== -Picornell AC, Echavarria I, Alvarez E, et al.: Breast cancer PAM50 signature: correlation and concordance between RNA-Seq and digital multiplexed gene expression technologies in a triple negative breast cancer series. BMC Genomics. 2019; DOI: 10.1186/s12864-019-5849-0 - -Laufer BI, Hwang H, Vogel Ciernia A et al., Whole genome bisulfite sequencing of Down syndrome brain reveals regional DNA hypermethylation and novel disorder insights. Epigenetics. 2019; 14(7), 672-684; DOI:10.1080/15592294.2019.1609867 - -Chana-Muñoz A, Jendroszek Agnieszka Sønnichsen M, et al.: Origin and diversification of the plasminogen activation system among chordates. BMC evolutionary biology 2019; 19(1); DOI:10.1186/s12862-019-1353-z - -Dawidowska M, Jaksik Roman, Szarzyńska-Zawadzka B, el al.: Comprehensive Investigation of miRNome Identifies Novel Candidate miRNA-mRNA Interactions Implicated in T-Cell Acute Lymphoblastic Leukemia. Neoplasia. 2019; 21(3), 294—310; DOI:10.1016/j.neo.2019.01.004 - -Woodham EF, Paul NR, Tyrrell B, et al.: Coordination by Cdc42 of Actin, Contractility, and Adhesion for Melanoblast Movement in Mouse Skin. Curr Biol. 2017; 27(5): 624–637 - -Perrin S, Firmo C, Lemoine S, et al.: Aozan: an automated post-sequencing data-processing pipeline. Bioinformatics. 2017; 33(14): 2212–2213. 
- -O'Sullivan NJ, Teasdale MD, Mattiangeli V, et al.: A whole mitochondria analysis of the Tyrolean Iceman's leather provides insights into the animal sources of Copper Age clothing. Sci Rep. 2016; 6: 31279. - -Ewels P, Magnusson M, Lundin S, et al.: MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016; 32(19): 3047–3048. - -Fiddyment S, Holsinger B, Ruzzier C, et al.: Animal origin of 13th-century uterine vellum revealed using noninvasive peptide fingerprinting. Proc Natl Acad Sci U S A. 2015; 112(49): 15066–15071. - -Rose G, Wooldridge DJ, Anscombe C, et al.: Challenges of the Unknown: Clinical Application of Microbial Metagenomics. Int J Genomics. 2015; 2015: 292950. - - -How to cite FastQ Screen -======================== -FastQ Screen was published in the open access journal F1000Research. - -Wingett SW and Andrews S. FastQ Screen: A tool for multi-genome mapping and quality control [version 2; referees: 4 approved]. F1000Research 2018, 7:1338 -(https://doi.org/10.12688/f1000research.15931.2) - - -Report problems -=============== -If you have any problems running this program you can report them on `GitHub `_. From 1fed612009f4a7c44ea5f477375b7a91556a5c0d Mon Sep 17 00:00:00 2001 From: Steven Wingett <9609778+StevenWingett@users.noreply.github.com> Date: Wed, 26 Jan 2022 10:54:20 +0000 Subject: [PATCH 24/29] Update git_actions.yml Git actions now only runs following changes to master branch. --- .github/workflows/git_actions.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/git_actions.yml b/.github/workflows/git_actions.yml index 0d0d241..c656d19 100644 --- a/.github/workflows/git_actions.yml +++ b/.github/workflows/git_actions.yml @@ -5,9 +5,9 @@ name: CI # Controls when the action will run. 
Triggers the workflow on push or pull request on: push: - branches: [ master, pre_release ] + branches: [ master ] pull_request: - branches: [ master, pre_release ] + branches: [ master ] # A workflow run is made up of one or more jobs that can run sequentially or in parallel jobs: From 04ea3c895ee71bf3214b82f320c3ada6371bea21 Mon Sep 17 00:00:00 2001 From: StevenWingett Date: Wed, 26 Jan 2022 11:03:10 +0000 Subject: [PATCH 25/29] Delete _config.yml with information on remote theme --- _config.yml | 3 --- 1 file changed, 3 deletions(-) delete mode 100644 _config.yml diff --git a/_config.yml b/_config.yml deleted file mode 100644 index 78392bc..0000000 --- a/_config.yml +++ /dev/null @@ -1,3 +0,0 @@ -remote_theme: carlosperate/jekyll-theme-rtd - -edit_on_github: false From e846ac0e938559c66d69d549b27f7dc99ac00399 Mon Sep 17 00:00:00 2001 From: StevenWingett Date: Wed, 26 Jan 2022 11:18:21 +0000 Subject: [PATCH 26/29] Updated RELEASE_NOTES.txt for next release --- RELEASE_NOTES.txt | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/RELEASE_NOTES.txt b/RELEASE_NOTES.txt index a99f7de..8b6bef4 100644 --- a/RELEASE_NOTES.txt +++ b/RELEASE_NOTES.txt @@ -1,3 +1,11 @@ +Release notes for FastQ Screen v0.15.2 (26 January 2022) +-------------------------------------------------------- +Updated documentation + +Provided details on the genomes downloaded with the +command --get_genomes for possible future reference. + + Release notes for FastQ Screen v0.15.1 (11 January 2022) -------------------------------------------------------- Updated contact details. 
From 4c6ddac0b84a8901ae5575112f8312d191fe1c55 Mon Sep 17 00:00:00 2001 From: StevenWingett Date: Wed, 26 Jan 2022 15:11:10 +0000 Subject: [PATCH 27/29] Updated documentation Now includes link to bisulfite test dataset --- docs/README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/README.md b/docs/README.md index 837b4e2..4db2461 100644 --- a/docs/README.md +++ b/docs/README.md @@ -165,7 +165,9 @@ Mapping bisulfite converted sequences is possible with FastQ Screen, which uses FastQ Screen, when run in Bisulfite mode, reports to which strand the reads aligned (original top strand, complementary to original top strand, complementary to original bottom strand, or original bottom -strand). Refer to the [Bismark](http://www.bioinformatics.babraham.ac.uk/projects/bismark) documentation for more details on these bisulfite strand definitions. +strand). Refer to the [Bismark](http://www.bioinformatics.babraham.ac.uk/projects/bismark) documentation for more details on these bisulfite strand definitions. 
+ +A bisulfite test dataset may be obtained from [here.](https://github.com/FelixKrueger/Bismark/blob/master/test_data.fastq) Filtering FastQ Files From 997e2c89d950d6a0df0c001e952aba35fe8ee0bf Mon Sep 17 00:00:00 2001 From: StevenWingett Date: Wed, 26 Jan 2022 16:48:59 +0000 Subject: [PATCH 28/29] Added information on genomes downloads --- .../fastq_screen.conf | 106 +++++ download_genomes/genomes_tree.txt | 373 ++++++++++++++++++ .../fastq_screen.conf | 127 ++++++ 3 files changed, 606 insertions(+) create mode 100644 download_genomes/bisulfite_genomes_config_file/fastq_screen.conf create mode 100644 download_genomes/genomes_tree.txt create mode 100644 download_genomes/regular_genomes_config_file/fastq_screen.conf diff --git a/download_genomes/bisulfite_genomes_config_file/fastq_screen.conf b/download_genomes/bisulfite_genomes_config_file/fastq_screen.conf new file mode 100644 index 0000000..4185c4c --- /dev/null +++ b/download_genomes/bisulfite_genomes_config_file/fastq_screen.conf @@ -0,0 +1,106 @@ +# This is a configuration file for fastq_screen + +############ +## Bowtie2 # +############ +## If the Bowtie2 binary is not in your PATH then you can +## set this value to tell the program where to find it. +## Uncomment the line below and set the appropriate location + +#BOWTIE2 /bi/apps/bowtie2/2.3.2/bowtie2 + + +############ +## BISMARK # +############ +## If the Bismark binary is not in your PATH then you can +## set this value to tell the program where to find it. +## Uncomment the line below and set the appropriate location + +#BISMARK /bi/apps/bismark/bismark + + +############ +## Threads # +############ +## Bowtie can be made to run across multiple CPU cores to +## speed up your searches. Set this value to the number +## of cores you want to use for your searches. + +THREADS 7 + +############## +## Databases # +############## +## This section allows you to configure multiple databases +## to search against in your screen. 
For each database +## you need to provide a database name (which can't contain +## spaces) and the location of the bowtie indices which +## you created for that database. +## +## The default entries shown below are only suggested examples +## you can add as many DATABASE sections as you like, and you +## can comment out or remove as many of the existing entries +## as you like. + + + +######### +## Human - sequences available from +## ftp://ftp.ensembl.org/pub/current/fasta/homo_sapiens/dna/ +DATABASE Human [FastQ_Screen_Genomes_Path]/Human/GRCh38 + + + +######### +## Mouse - sequence available from +## ftp://ftp.ensembl.org/pub/current/fasta/mus_musculus/dna/ +DATABASE Mouse [FastQ_Screen_Genomes_Path]/Mouse/GRCm38 + + + +######### +## Rat - sequence available from +## ftp://ftp.ensembl.org/pub/current/fasta/rattus_norvegicus/dna/ +DATABASE Rat [FastQ_Screen_Genomes_Path]/Rat/Rnor_6.0 + + + +############ +# Drosophila +DATABASE Drosophila [FastQ_Screen_Genomes_Path]/Drosophila_melanogaster/BDGP6 + + + +######### +## Worm +DATABASE Worm [FastQ_Screen_Genomes_Path]/C_elegans/WBcel235 + + + +######### +## Yeast - sequence available from +## ftp://ftp.ensembl.org/pub/current/fasta/saccharomyces_cerevisiae/dna/ +DATABASE Yeast [FastQ_Screen_Genomes_Path]/Yeast/R64-1-1 + + + +######### +## Arabidopsis - sequences available from +DATABASE Arabidopsis [FastQ_Screen_Genomes_Path]/Arabidopsis/TAIR10 + + + +######### +## Ecoli +## Sequence available from EMBL accession U00096.2 +DATABASE Ecoli [FastQ_Screen_Genomes_Path]/E_coli/NC_010473 + + + + +######## +## PhiX - sequence available from Refseq accession NC_001422.1 +DATABASE PhiX [FastQ_Screen_Genomes_Path]/PhiX/phiX174_plus_SNPs + + diff --git a/download_genomes/genomes_tree.txt b/download_genomes/genomes_tree.txt new file mode 100644 index 0000000..9ae8da6 --- /dev/null +++ b/download_genomes/genomes_tree.txt @@ -0,0 +1,373 @@ +. 
+├── FastQ_Screen_Genomes +│   ├── Adapters +│   │   ├── Contaminants.1.bt2 +│   │   ├── Contaminants.2.bt2 +│   │   ├── Contaminants.3.bt2 +│   │   ├── Contaminants.4.bt2 +│   │   ├── Contaminants.rev.1.bt2 +│   │   └── Contaminants.rev.2.bt2 +│   ├── Arabidopsis +│   │   ├── Arabidopsis_thaliana.TAIR10.1.bt2 +│   │   ├── Arabidopsis_thaliana.TAIR10.2.bt2 +│   │   ├── Arabidopsis_thaliana.TAIR10.3.bt2 +│   │   ├── Arabidopsis_thaliana.TAIR10.4.bt2 +│   │   ├── Arabidopsis_thaliana.TAIR10.rev.1.bt2 +│   │   └── Arabidopsis_thaliana.TAIR10.rev.2.bt2 +│   ├── Drosophila +│   │   ├── BDGP6.1.bt2 +│   │   ├── BDGP6.2.bt2 +│   │   ├── BDGP6.3.bt2 +│   │   ├── BDGP6.4.bt2 +│   │   ├── BDGP6.rev.1.bt2 +│   │   └── BDGP6.rev.2.bt2 +│   ├── E_coli +│   │   ├── Ecoli.1.bt2 +│   │   ├── Ecoli.2.bt2 +│   │   ├── Ecoli.3.bt2 +│   │   ├── Ecoli.4.bt2 +│   │   ├── Ecoli.rev.1.bt2 +│   │   └── Ecoli.rev.2.bt2 +│   ├── fastq_screen.conf +│   ├── Human +│   │   ├── Homo_sapiens.GRCh38.1.bt2 +│   │   ├── Homo_sapiens.GRCh38.2.bt2 +│   │   ├── Homo_sapiens.GRCh38.3.bt2 +│   │   ├── Homo_sapiens.GRCh38.4.bt2 +│   │   ├── Homo_sapiens.GRCh38.rev.1.bt2 +│   │   └── Homo_sapiens.GRCh38.rev.2.bt2 +│   ├── Lambda +│   │   ├── Lambda.1.bt2 +│   │   ├── Lambda.2.bt2 +│   │   ├── Lambda.3.bt2 +│   │   ├── Lambda.4.bt2 +│   │   ├── Lambda.rev.1.bt2 +│   │   └── Lambda.rev.2.bt2 +│   ├── Mitochondria +│   │   ├── mitochondria.1.bt2 +│   │   ├── mitochondria.2.bt2 +│   │   ├── mitochondria.3.bt2 +│   │   ├── mitochondria.4.bt2 +│   │   ├── mitochondria.rev.1.bt2 +│   │   └── mitochondria.rev.2.bt2 +│   ├── Mouse +│   │   ├── Mus_musculus.GRCm38.1.bt2 +│   │   ├── Mus_musculus.GRCm38.2.bt2 +│   │   ├── Mus_musculus.GRCm38.3.bt2 +│   │   ├── Mus_musculus.GRCm38.4.bt2 +│   │   ├── Mus_musculus.GRCm38.rev.1.bt2 +│   │   └── Mus_musculus.GRCm38.rev.2.bt2 +│   ├── PhiX +│   │   ├── phi_plus_SNPs.1.bt2 +│   │   ├── phi_plus_SNPs.2.bt2 +│   │   ├── phi_plus_SNPs.3.bt2 +│   │   ├── phi_plus_SNPs.4.bt2 +│  
 │   ├── phi_plus_SNPs.rev.1.bt2 +│   │   └── phi_plus_SNPs.rev.2.bt2 +│   ├── Rat +│   │   ├── Rnor_6.0.1.bt2 +│   │   ├── Rnor_6.0.2.bt2 +│   │   ├── Rnor_6.0.3.bt2 +│   │   ├── Rnor_6.0.4.bt2 +│   │   ├── Rnor_6.0.rev.1.bt2 +│   │   └── Rnor_6.0.rev.2.bt2 +│   ├── rRNA +│   │   ├── GRCm38_rRNA.1.bt2 +│   │   ├── GRCm38_rRNA.2.bt2 +│   │   ├── GRCm38_rRNA.3.bt2 +│   │   ├── GRCm38_rRNA.4.bt2 +│   │   ├── GRCm38_rRNA.fa +│   │   ├── GRCm38_rRNA.rev.1.bt2 +│   │   └── GRCm38_rRNA.rev.2.bt2 +│   ├── Vectors +│   │   ├── Vectors.1.bt2 +│   │   ├── Vectors.2.bt2 +│   │   ├── Vectors.3.bt2 +│   │   ├── Vectors.4.bt2 +│   │   ├── Vectors.rev.1.bt2 +│   │   └── Vectors.rev.2.bt2 +│   ├── Worm +│   │   ├── Caenorhabditis_elegans.WBcel235.1.bt2 +│   │   ├── Caenorhabditis_elegans.WBcel235.2.bt2 +│   │   ├── Caenorhabditis_elegans.WBcel235.3.bt2 +│   │   ├── Caenorhabditis_elegans.WBcel235.4.bt2 +│   │   ├── Caenorhabditis_elegans.WBcel235.rev.1.bt2 +│   │   └── Caenorhabditis_elegans.WBcel235.rev.2.bt2 +│   └── Yeast +│   ├── Saccharomyces_cerevisiae.R64-1-1.1.bt2 +│   ├── Saccharomyces_cerevisiae.R64-1-1.2.bt2 +│   ├── Saccharomyces_cerevisiae.R64-1-1.3.bt2 +│   ├── Saccharomyces_cerevisiae.R64-1-1.4.bt2 +│   ├── Saccharomyces_cerevisiae.R64-1-1.rev.1.bt2 +│   └── Saccharomyces_cerevisiae.R64-1-1.rev.2.bt2 +├── FastQ_Screen_Genomes_Bisulfite +│   ├── Arabidopsis +│   │   └── TAIR10 +│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.1.fa.gz +│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.2.fa.gz +│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.3.fa.gz +│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.4.fa.gz +│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.5.fa.gz +│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.Mt.fa.gz +│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.Pt.fa.gz +│   │   └── Bisulfite_Genome +│   │   ├── CT_conversion +│   │   │   ├── BS_CT.1.bt2 +│   │   │   ├── BS_CT.2.bt2 +│   │   │   
├── BS_CT.3.bt2 +│   │   │   ├── BS_CT.4.bt2 +│   │   │   ├── BS_CT.rev.1.bt2 +│   │   │   └── BS_CT.rev.2.bt2 +│   │   └── GA_conversion +│   │   ├── BS_GA.1.bt2 +│   │   ├── BS_GA.2.bt2 +│   │   ├── BS_GA.3.bt2 +│   │   ├── BS_GA.4.bt2 +│   │   ├── BS_GA.rev.1.bt2 +│   │   └── BS_GA.rev.2.bt2 +│   ├── C_elegans +│   │   └── WBcel235 +│   │   ├── Bisulfite_Genome +│   │   │   ├── CT_conversion +│   │   │   │   ├── BS_CT.1.bt2 +│   │   │   │   ├── BS_CT.2.bt2 +│   │   │   │   ├── BS_CT.3.bt2 +│   │   │   │   ├── BS_CT.4.bt2 +│   │   │   │   ├── BS_CT.rev.1.bt2 +│   │   │   │   └── BS_CT.rev.2.bt2 +│   │   │   └── GA_conversion +│   │   │   ├── BS_GA.1.bt2 +│   │   │   ├── BS_GA.2.bt2 +│   │   │   ├── BS_GA.3.bt2 +│   │   │   ├── BS_GA.4.bt2 +│   │   │   ├── BS_GA.rev.1.bt2 +│   │   │   └── BS_GA.rev.2.bt2 +│   │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.I.fa.gz +│   │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.II.fa.gz +│   │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.III.fa.gz +│   │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.IV.fa.gz +│   │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.MtDNA.fa.gz +│   │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.V.fa.gz +│   │   └── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.X.fa.gz +│   ├── Drosophila_melanogaster +│   │   └── BDGP6 +│   │   ├── Bisulfite_Genome +│   │   │   ├── CT_conversion +│   │   │   │   ├── BS_CT.1.bt2 +│   │   │   │   ├── BS_CT.2.bt2 +│   │   │   │   ├── BS_CT.3.bt2 +│   │   │   │   ├── BS_CT.4.bt2 +│   │   │   │   ├── BS_CT.rev.1.bt2 +│   │   │   │   └── BS_CT.rev.2.bt2 +│   │   │   └── GA_conversion +│   │   │   ├── BS_GA.1.bt2 +│   │   │   ├── BS_GA.2.bt2 +│   │   │   ├── BS_GA.3.bt2 +│   │   │   ├── BS_GA.4.bt2 +│   │   │   ├── BS_GA.rev.1.bt2 +│   │   │   └── BS_GA.rev.2.bt2 +│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.2L.fa.gz +│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.2R.fa.gz +│   │   
├── Drosophila_melanogaster.BDGP6.dna.chromosome.3L.fa.gz +│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.3R.fa.gz +│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.4.fa.gz +│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.dmel_mitochondrion_genome.fa.gz +│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.X.fa.gz +│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.Y.fa.gz +│   │   └── Drosophila_melanogaster.BDGP6.dna.nonchromosomal.fa.gz +│   ├── E_coli +│   │   └── NC_010473 +│   │   ├── Bisulfite_Genome +│   │   │   ├── CT_conversion +│   │   │   │   ├── BS_CT.1.bt2 +│   │   │   │   ├── BS_CT.2.bt2 +│   │   │   │   ├── BS_CT.3.bt2 +│   │   │   │   ├── BS_CT.4.bt2 +│   │   │   │   ├── BS_CT.rev.1.bt2 +│   │   │   │   └── BS_CT.rev.2.bt2 +│   │   │   └── GA_conversion +│   │   │   ├── BS_GA.1.bt2 +│   │   │   ├── BS_GA.2.bt2 +│   │   │   ├── BS_GA.3.bt2 +│   │   │   ├── BS_GA.4.bt2 +│   │   │   ├── BS_GA.rev.1.bt2 +│   │   │   └── BS_GA.rev.2.bt2 +│   │   └── NC_010473.fa.gz +│   ├── fastq_screen.conf +│   ├── Human +│   │   └── GRCh38 +│   │   ├── Bisulfite_Genome +│   │   │   ├── CT_conversion +│   │   │   │   ├── BS_CT.1.bt2 +│   │   │   │   ├── BS_CT.2.bt2 +│   │   │   │   ├── BS_CT.3.bt2 +│   │   │   │   ├── BS_CT.4.bt2 +│   │   │   │   ├── BS_CT.rev.1.bt2 +│   │   │   │   └── BS_CT.rev.2.bt2 +│   │   │   └── GA_conversion +│   │   │   ├── BS_GA.1.bt2 +│   │   │   ├── BS_GA.2.bt2 +│   │   │   ├── BS_GA.3.bt2 +│   │   │   ├── BS_GA.4.bt2 +│   │   │   ├── BS_GA.rev.1.bt2 +│   │   │   └── BS_GA.rev.2.bt2 +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.10.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.11.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.12.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.13.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.14.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.15.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.16.fa.gz +│   │   ├── 
Homo_sapiens.GRCh38.dna.chromosome.17.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.18.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.1.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.20.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.22.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.2.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.3.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.4.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.5.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.6.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.7.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.8.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.9.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.MT.fa.gz +│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.X.fa.gz +│   │   └── Homo_sapiens.GRCh38.dna.chromosome.Y.fa.gz +│   ├── Mouse +│   │   └── GRCm38 +│   │   ├── Bisulfite_Genome +│   │   │   ├── CT_conversion +│   │   │   │   ├── BS_CT.1.bt2 +│   │   │   │   ├── BS_CT.2.bt2 +│   │   │   │   ├── BS_CT.3.bt2 +│   │   │   │   ├── BS_CT.4.bt2 +│   │   │   │   ├── BS_CT.rev.1.bt2 +│   │   │   │   └── BS_CT.rev.2.bt2 +│   │   │   └── GA_conversion +│   │   │   ├── BS_GA.1.bt2 +│   │   │   ├── BS_GA.2.bt2 +│   │   │   ├── BS_GA.3.bt2 +│   │   │   ├── BS_GA.4.bt2 +│   │   │   ├── BS_GA.rev.1.bt2 +│   │   │   └── BS_GA.rev.2.bt2 +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.10.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.11.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.12.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.13.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.14.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.15.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.16.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.17.fa.gz +│ 
  │   ├── Mus_musculus.GRCm38.68.dna.chromosome.18.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.19.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.1.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.2.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.3.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.4.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.5.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.6.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.7.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.8.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.9.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.MT.fa.gz +│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.X.fa.gz +│   │   └── Mus_musculus.GRCm38.68.dna.chromosome.Y.fa.gz +│   ├── PhiX +│   │   └── phiX174_plus_SNPs +│   │   ├── Bisulfite_Genome +│   │   │   ├── CT_conversion +│   │   │   │   ├── BS_CT.1.bt2 +│   │   │   │   ├── BS_CT.2.bt2 +│   │   │   │   ├── BS_CT.3.bt2 +│   │   │   │   ├── BS_CT.4.bt2 +│   │   │   │   ├── BS_CT.rev.1.bt2 +│   │   │   │   └── BS_CT.rev.2.bt2 +│   │   │   └── GA_conversion +│   │   │   ├── BS_GA.1.bt2 +│   │   │   ├── BS_GA.2.bt2 +│   │   │   ├── BS_GA.3.bt2 +│   │   │   ├── BS_GA.4.bt2 +│   │   │   ├── BS_GA.rev.1.bt2 +│   │   │   └── BS_GA.rev.2.bt2 +│   │   └── phi_plus_SNPs.fa.gz +│   ├── Rat +│   │   └── Rnor_6.0 +│   │   ├── Bisulfite_Genome +│   │   │   ├── CT_conversion +│   │   │   │   ├── BS_CT.1.bt2 +│   │   │   │   ├── BS_CT.2.bt2 +│   │   │   │   ├── BS_CT.3.bt2 +│   │   │   │   ├── BS_CT.4.bt2 +│   │   │   │   ├── BS_CT.rev.1.bt2 +│   │   │   │   └── BS_CT.rev.2.bt2 +│   │   │   └── GA_conversion +│   │   │   ├── BS_GA.1.bt2 +│   │   │   ├── BS_GA.2.bt2 +│   │   │   ├── BS_GA.3.bt2 +│   │   │   ├── BS_GA.4.bt2 +│   │   │   ├── BS_GA.rev.1.bt2 +│   │   │   └── BS_GA.rev.2.bt2 +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.10.fa.gz +│   │   ├── 
Rattus_norvegicus.Rnor_6.0.dna.chromosome.11.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.12.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.13.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.14.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.15.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.16.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.17.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.18.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.19.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.1.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.20.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.2.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.3.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.4.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.5.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.6.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.7.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.8.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.9.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.MT.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.X.fa.gz +│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.Y.fa.gz +│   │   └── Rattus_norvegicus.Rnor_6.0.dna.nonchromosomal.fa.gz +│   └── Yeast +│   └── R64-1-1 +│   ├── Bisulfite_Genome +│   │   ├── CT_conversion +│   │   │   ├── BS_CT.1.bt2 +│   │   │   ├── BS_CT.2.bt2 +│   │   │   ├── BS_CT.3.bt2 +│   │   │   ├── BS_CT.4.bt2 +│   │   │   ├── BS_CT.rev.1.bt2 +│   │   │   └── BS_CT.rev.2.bt2 +│   │   └── GA_conversion +│   │   ├── BS_GA.1.bt2 +│   │   ├── BS_GA.2.bt2 +│   │   ├── BS_GA.3.bt2 +│   │   ├── BS_GA.4.bt2 +│   │   ├── BS_GA.rev.1.bt2 +│   │   └── BS_GA.rev.2.bt2 +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.I.fa.gz +│   ├── 
Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.II.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.III.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.IV.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.IX.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.Mito.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.V.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.VI.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.VII.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.VIII.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.X.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XI.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XII.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XIII.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XIV.fa.gz +│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XV.fa.gz +│   └── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XVI.fa.gz +└── genomes_tree.txt + +61 directories, 309 files diff --git a/download_genomes/regular_genomes_config_file/fastq_screen.conf b/download_genomes/regular_genomes_config_file/fastq_screen.conf new file mode 100644 index 0000000..1bdf601 --- /dev/null +++ b/download_genomes/regular_genomes_config_file/fastq_screen.conf @@ -0,0 +1,127 @@ +# This is a configuration file for fastq_screen + +########### +## Bowtie # +########### +## If the bowtie binary is not in your PATH then you can +## set this value to tell the program where to find it. +## Uncomment the line below and set the appropriate location +## + +#BOWTIE /usr/local/bin/bowtie/bowtie +#BOWTIE2 /bi/apps/bowtie2/2.3.2/bowtie2 + + +############ +## Threads # +############ +## Bowtie can be made to run across multiple CPU cores to +## speed up your searches. Set this value to the number +## of cores you want to use for your searches. 
+ +THREADS 7 + +############## +## Databases # +############## +## This section allows you to configure multiple databases +## to search against in your screen. For each database +## you need to provide a database name (which can't contain +## spaces) and the location of the bowtie indices which +## you created for that database. +## +## The default entries shown below are only suggested examples +## you can add as many DATABASE sections as you like, and you +## can comment out or remove as many of the existing entries +## as you like. + + + +######### +## Human - sequences available from +## ftp://ftp.ensembl.org/pub/current/fasta/homo_sapiens/dna/ +DATABASE Human [FastQ_Screen_Genomes_Path]/Human/Homo_sapiens.GRCh38 + + + +######### +## Mouse - sequence available from +## ftp://ftp.ensembl.org/pub/current/fasta/mus_musculus/dna/ +DATABASE Mouse [FastQ_Screen_Genomes_Path]/Mouse/Mus_musculus.GRCm38 + + + +######### +## Rat - sequence available from +## ftp://ftp.ensembl.org/pub/current/fasta/rattus_norvegicus/dna/ +DATABASE Rat [FastQ_Screen_Genomes_Path]/Rat/Rnor_6.0 + + + +############ +# Drosophila +DATABASE Drosophila [FastQ_Screen_Genomes_Path]/Drosophila/BDGP6 + + + +######### +## Worm +DATABASE Worm [FastQ_Screen_Genomes_Path]/Worm/Caenorhabditis_elegans.WBcel235 + + + +######### +## Yeast - sequence available from +## ftp://ftp.ensembl.org/pub/current/fasta/saccharomyces_cerevisiae/dna/ +DATABASE Yeast [FastQ_Screen_Genomes_Path]/Yeast/Saccharomyces_cerevisiae.R64-1-1 + + + +######### +## Arabidopsis - sequences available from +DATABASE Arabidopsis [FastQ_Screen_Genomes_Path]/Arabidopsis/Arabidopsis_thaliana.TAIR10 + + + +######### +## Ecoli +## Sequence available from EMBL accession U00096.2 +DATABASE Ecoli [FastQ_Screen_Genomes_Path]/E_coli/Ecoli + + + +########## +##rRNA - In house custom database +DATABASE rRNA [FastQ_Screen_Genomes_Path]/rRNA/GRCm38_rRNA + + + +############## +# Mitochondria +DATABASE MT 
[FastQ_Screen_Genomes_Path]/Mitochondria/mitochondria + + + +######## +## PhiX - sequence available from Refseq accession NC_001422.1 +DATABASE PhiX [FastQ_Screen_Genomes_Path]/PhiX/phi_plus_SNPs + + + +############## +# Lambda +DATABASE Lambda [FastQ_Screen_Genomes_Path]/Lambda/Lambda + + + +########## +## Vector - Sequence taken from the UniVec database +## http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html +DATABASE Vectors [FastQ_Screen_Genomes_Path]/Vectors/Vectors + + + +############ +## Adapters - sequence derived from the FastQC contaminants file +## www.bioinformatics.bbsrc.ac.uk/projects/fastqc/ +DATABASE Adapters [FastQ_Screen_Genomes_Path]/Adapters/Contaminants From af1dc394fc6b83ae8923b230d3506f34db939bdc Mon Sep 17 00:00:00 2001 From: StevenWingett Date: Wed, 26 Jan 2022 16:53:07 +0000 Subject: [PATCH 29/29] Removed file genomes_tree.txt inclusion in the tree - i.e. the file previously referred to itself. --- download_genomes/genomes_tree.txt | 539 +++++++++++++++--------------- 1 file changed, 269 insertions(+), 270 deletions(-) diff --git a/download_genomes/genomes_tree.txt b/download_genomes/genomes_tree.txt index 9ae8da6..eed04b5 100644 --- a/download_genomes/genomes_tree.txt +++ b/download_genomes/genomes_tree.txt @@ -100,274 +100,273 @@ │   ├── Saccharomyces_cerevisiae.R64-1-1.4.bt2 │   ├── Saccharomyces_cerevisiae.R64-1-1.rev.1.bt2 │   └── Saccharomyces_cerevisiae.R64-1-1.rev.2.bt2 -├── FastQ_Screen_Genomes_Bisulfite -│   ├── Arabidopsis -│   │   └── TAIR10 -│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.1.fa.gz -│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.2.fa.gz -│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.3.fa.gz -│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.4.fa.gz -│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.5.fa.gz -│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.Mt.fa.gz -│   │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.Pt.fa.gz -│   │   └── 
Bisulfite_Genome -│   │   ├── CT_conversion -│   │   │   ├── BS_CT.1.bt2 -│   │   │   ├── BS_CT.2.bt2 -│   │   │   ├── BS_CT.3.bt2 -│   │   │   ├── BS_CT.4.bt2 -│   │   │   ├── BS_CT.rev.1.bt2 -│   │   │   └── BS_CT.rev.2.bt2 -│   │   └── GA_conversion -│   │   ├── BS_GA.1.bt2 -│   │   ├── BS_GA.2.bt2 -│   │   ├── BS_GA.3.bt2 -│   │   ├── BS_GA.4.bt2 -│   │   ├── BS_GA.rev.1.bt2 -│   │   └── BS_GA.rev.2.bt2 -│   ├── C_elegans -│   │   └── WBcel235 -│   │   ├── Bisulfite_Genome -│   │   │   ├── CT_conversion -│   │   │   │   ├── BS_CT.1.bt2 -│   │   │   │   ├── BS_CT.2.bt2 -│   │   │   │   ├── BS_CT.3.bt2 -│   │   │   │   ├── BS_CT.4.bt2 -│   │   │   │   ├── BS_CT.rev.1.bt2 -│   │   │   │   └── BS_CT.rev.2.bt2 -│   │   │   └── GA_conversion -│   │   │   ├── BS_GA.1.bt2 -│   │   │   ├── BS_GA.2.bt2 -│   │   │   ├── BS_GA.3.bt2 -│   │   │   ├── BS_GA.4.bt2 -│   │   │   ├── BS_GA.rev.1.bt2 -│   │   │   └── BS_GA.rev.2.bt2 -│   │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.I.fa.gz -│   │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.II.fa.gz -│   │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.III.fa.gz -│   │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.IV.fa.gz -│   │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.MtDNA.fa.gz -│   │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.V.fa.gz -│   │   └── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.X.fa.gz -│   ├── Drosophila_melanogaster -│   │   └── BDGP6 -│   │   ├── Bisulfite_Genome -│   │   │   ├── CT_conversion -│   │   │   │   ├── BS_CT.1.bt2 -│   │   │   │   ├── BS_CT.2.bt2 -│   │   │   │   ├── BS_CT.3.bt2 -│   │   │   │   ├── BS_CT.4.bt2 -│   │   │   │   ├── BS_CT.rev.1.bt2 -│   │   │   │   └── BS_CT.rev.2.bt2 -│   │   │   └── GA_conversion -│   │   │   ├── BS_GA.1.bt2 -│   │   │   ├── BS_GA.2.bt2 -│   │   │   ├── BS_GA.3.bt2 -│   │   │   ├── BS_GA.4.bt2 -│   │   │   ├── BS_GA.rev.1.bt2 -│   │   │   └── BS_GA.rev.2.bt2 -│   │   ├── 
Drosophila_melanogaster.BDGP6.dna.chromosome.2L.fa.gz -│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.2R.fa.gz -│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.3L.fa.gz -│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.3R.fa.gz -│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.4.fa.gz -│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.dmel_mitochondrion_genome.fa.gz -│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.X.fa.gz -│   │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.Y.fa.gz -│   │   └── Drosophila_melanogaster.BDGP6.dna.nonchromosomal.fa.gz -│   ├── E_coli -│   │   └── NC_010473 -│   │   ├── Bisulfite_Genome -│   │   │   ├── CT_conversion -│   │   │   │   ├── BS_CT.1.bt2 -│   │   │   │   ├── BS_CT.2.bt2 -│   │   │   │   ├── BS_CT.3.bt2 -│   │   │   │   ├── BS_CT.4.bt2 -│   │   │   │   ├── BS_CT.rev.1.bt2 -│   │   │   │   └── BS_CT.rev.2.bt2 -│   │   │   └── GA_conversion -│   │   │   ├── BS_GA.1.bt2 -│   │   │   ├── BS_GA.2.bt2 -│   │   │   ├── BS_GA.3.bt2 -│   │   │   ├── BS_GA.4.bt2 -│   │   │   ├── BS_GA.rev.1.bt2 -│   │   │   └── BS_GA.rev.2.bt2 -│   │   └── NC_010473.fa.gz -│   ├── fastq_screen.conf -│   ├── Human -│   │   └── GRCh38 -│   │   ├── Bisulfite_Genome -│   │   │   ├── CT_conversion -│   │   │   │   ├── BS_CT.1.bt2 -│   │   │   │   ├── BS_CT.2.bt2 -│   │   │   │   ├── BS_CT.3.bt2 -│   │   │   │   ├── BS_CT.4.bt2 -│   │   │   │   ├── BS_CT.rev.1.bt2 -│   │   │   │   └── BS_CT.rev.2.bt2 -│   │   │   └── GA_conversion -│   │   │   ├── BS_GA.1.bt2 -│   │   │   ├── BS_GA.2.bt2 -│   │   │   ├── BS_GA.3.bt2 -│   │   │   ├── BS_GA.4.bt2 -│   │   │   ├── BS_GA.rev.1.bt2 -│   │   │   └── BS_GA.rev.2.bt2 -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.10.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.11.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.12.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.13.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.14.fa.gz -│   │   ├── 
Homo_sapiens.GRCh38.dna.chromosome.15.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.16.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.17.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.18.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.1.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.20.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.22.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.2.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.3.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.4.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.5.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.6.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.7.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.8.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.9.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.MT.fa.gz -│   │   ├── Homo_sapiens.GRCh38.dna.chromosome.X.fa.gz -│   │   └── Homo_sapiens.GRCh38.dna.chromosome.Y.fa.gz -│   ├── Mouse -│   │   └── GRCm38 -│   │   ├── Bisulfite_Genome -│   │   │   ├── CT_conversion -│   │   │   │   ├── BS_CT.1.bt2 -│   │   │   │   ├── BS_CT.2.bt2 -│   │   │   │   ├── BS_CT.3.bt2 -│   │   │   │   ├── BS_CT.4.bt2 -│   │   │   │   ├── BS_CT.rev.1.bt2 -│   │   │   │   └── BS_CT.rev.2.bt2 -│   │   │   └── GA_conversion -│   │   │   ├── BS_GA.1.bt2 -│   │   │   ├── BS_GA.2.bt2 -│   │   │   ├── BS_GA.3.bt2 -│   │   │   ├── BS_GA.4.bt2 -│   │   │   ├── BS_GA.rev.1.bt2 -│   │   │   └── BS_GA.rev.2.bt2 -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.10.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.11.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.12.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.13.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.14.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.15.fa.gz -│   │   
├── Mus_musculus.GRCm38.68.dna.chromosome.16.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.17.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.18.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.19.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.1.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.2.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.3.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.4.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.5.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.6.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.7.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.8.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.9.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.MT.fa.gz -│   │   ├── Mus_musculus.GRCm38.68.dna.chromosome.X.fa.gz -│   │   └── Mus_musculus.GRCm38.68.dna.chromosome.Y.fa.gz -│   ├── PhiX -│   │   └── phiX174_plus_SNPs -│   │   ├── Bisulfite_Genome -│   │   │   ├── CT_conversion -│   │   │   │   ├── BS_CT.1.bt2 -│   │   │   │   ├── BS_CT.2.bt2 -│   │   │   │   ├── BS_CT.3.bt2 -│   │   │   │   ├── BS_CT.4.bt2 -│   │   │   │   ├── BS_CT.rev.1.bt2 -│   │   │   │   └── BS_CT.rev.2.bt2 -│   │   │   └── GA_conversion -│   │   │   ├── BS_GA.1.bt2 -│   │   │   ├── BS_GA.2.bt2 -│   │   │   ├── BS_GA.3.bt2 -│   │   │   ├── BS_GA.4.bt2 -│   │   │   ├── BS_GA.rev.1.bt2 -│   │   │   └── BS_GA.rev.2.bt2 -│   │   └── phi_plus_SNPs.fa.gz -│   ├── Rat -│   │   └── Rnor_6.0 -│   │   ├── Bisulfite_Genome -│   │   │   ├── CT_conversion -│   │   │   │   ├── BS_CT.1.bt2 -│   │   │   │   ├── BS_CT.2.bt2 -│   │   │   │   ├── BS_CT.3.bt2 -│   │   │   │   ├── BS_CT.4.bt2 -│   │   │   │   ├── BS_CT.rev.1.bt2 -│   │   │   │   └── BS_CT.rev.2.bt2 -│   │   │   └── GA_conversion -│   │   │   ├── BS_GA.1.bt2 -│   │   │   ├── BS_GA.2.bt2 -│   │   │   ├── BS_GA.3.bt2 -│   │   │   ├── BS_GA.4.bt2 -│   │   │   ├── BS_GA.rev.1.bt2 -│   │   │   └── 
BS_GA.rev.2.bt2 -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.10.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.11.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.12.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.13.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.14.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.15.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.16.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.17.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.18.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.19.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.1.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.20.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.2.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.3.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.4.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.5.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.6.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.7.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.8.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.9.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.MT.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.X.fa.gz -│   │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.Y.fa.gz -│   │   └── Rattus_norvegicus.Rnor_6.0.dna.nonchromosomal.fa.gz -│   └── Yeast -│   └── R64-1-1 -│   ├── Bisulfite_Genome -│   │   ├── CT_conversion -│   │   │   ├── BS_CT.1.bt2 -│   │   │   ├── BS_CT.2.bt2 -│   │   │   ├── BS_CT.3.bt2 -│   │   │   ├── BS_CT.4.bt2 -│   │   │   ├── BS_CT.rev.1.bt2 -│   │   │   └── BS_CT.rev.2.bt2 -│   │   └── GA_conversion -│   │   ├── BS_GA.1.bt2 -│   │   ├── BS_GA.2.bt2 -│   │   ├── BS_GA.3.bt2 -│   │   ├── BS_GA.4.bt2 -│   │   ├── BS_GA.rev.1.bt2 -│   │   └── BS_GA.rev.2.bt2 -│   
├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.I.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.II.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.III.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.IV.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.IX.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.Mito.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.V.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.VI.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.VII.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.VIII.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.X.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XI.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XII.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XIII.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XIV.fa.gz -│   ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XV.fa.gz -│   └── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XVI.fa.gz -└── genomes_tree.txt +└── FastQ_Screen_Genomes_Bisulfite + ├── Arabidopsis + │   └── TAIR10 + │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.1.fa.gz + │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.2.fa.gz + │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.3.fa.gz + │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.4.fa.gz + │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.5.fa.gz + │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.Mt.fa.gz + │   ├── Arabidopsis_thaliana.TAIR10.31.dna.chromosome.Pt.fa.gz + │   └── Bisulfite_Genome + │   ├── CT_conversion + │   │   ├── BS_CT.1.bt2 + │   │   ├── BS_CT.2.bt2 + │   │   ├── BS_CT.3.bt2 + │   │   ├── BS_CT.4.bt2 + │   │   ├── BS_CT.rev.1.bt2 + │   │   └── BS_CT.rev.2.bt2 + │   └── GA_conversion + │   ├── BS_GA.1.bt2 + │   ├── BS_GA.2.bt2 + │   ├── BS_GA.3.bt2 + │   ├── BS_GA.4.bt2 + │  
 ├── BS_GA.rev.1.bt2 + │   └── BS_GA.rev.2.bt2 + ├── C_elegans + │   └── WBcel235 + │   ├── Bisulfite_Genome + │   │   ├── CT_conversion + │   │   │   ├── BS_CT.1.bt2 + │   │   │   ├── BS_CT.2.bt2 + │   │   │   ├── BS_CT.3.bt2 + │   │   │   ├── BS_CT.4.bt2 + │   │   │   ├── BS_CT.rev.1.bt2 + │   │   │   └── BS_CT.rev.2.bt2 + │   │   └── GA_conversion + │   │   ├── BS_GA.1.bt2 + │   │   ├── BS_GA.2.bt2 + │   │   ├── BS_GA.3.bt2 + │   │   ├── BS_GA.4.bt2 + │   │   ├── BS_GA.rev.1.bt2 + │   │   └── BS_GA.rev.2.bt2 + │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.I.fa.gz + │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.II.fa.gz + │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.III.fa.gz + │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.IV.fa.gz + │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.MtDNA.fa.gz + │   ├── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.V.fa.gz + │   └── Caenorhabditis_elegans.WBcel235.74.dna.chromosome.X.fa.gz + ├── Drosophila_melanogaster + │   └── BDGP6 + │   ├── Bisulfite_Genome + │   │   ├── CT_conversion + │   │   │   ├── BS_CT.1.bt2 + │   │   │   ├── BS_CT.2.bt2 + │   │   │   ├── BS_CT.3.bt2 + │   │   │   ├── BS_CT.4.bt2 + │   │   │   ├── BS_CT.rev.1.bt2 + │   │   │   └── BS_CT.rev.2.bt2 + │   │   └── GA_conversion + │   │   ├── BS_GA.1.bt2 + │   │   ├── BS_GA.2.bt2 + │   │   ├── BS_GA.3.bt2 + │   │   ├── BS_GA.4.bt2 + │   │   ├── BS_GA.rev.1.bt2 + │   │   └── BS_GA.rev.2.bt2 + │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.2L.fa.gz + │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.2R.fa.gz + │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.3L.fa.gz + │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.3R.fa.gz + │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.4.fa.gz + │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.dmel_mitochondrion_genome.fa.gz + │   ├── Drosophila_melanogaster.BDGP6.dna.chromosome.X.fa.gz + │   ├── 
Drosophila_melanogaster.BDGP6.dna.chromosome.Y.fa.gz + │   └── Drosophila_melanogaster.BDGP6.dna.nonchromosomal.fa.gz + ├── E_coli + │   └── NC_010473 + │   ├── Bisulfite_Genome + │   │   ├── CT_conversion + │   │   │   ├── BS_CT.1.bt2 + │   │   │   ├── BS_CT.2.bt2 + │   │   │   ├── BS_CT.3.bt2 + │   │   │   ├── BS_CT.4.bt2 + │   │   │   ├── BS_CT.rev.1.bt2 + │   │   │   └── BS_CT.rev.2.bt2 + │   │   └── GA_conversion + │   │   ├── BS_GA.1.bt2 + │   │   ├── BS_GA.2.bt2 + │   │   ├── BS_GA.3.bt2 + │   │   ├── BS_GA.4.bt2 + │   │   ├── BS_GA.rev.1.bt2 + │   │   └── BS_GA.rev.2.bt2 + │   └── NC_010473.fa.gz + ├── fastq_screen.conf + ├── Human + │   └── GRCh38 + │   ├── Bisulfite_Genome + │   │   ├── CT_conversion + │   │   │   ├── BS_CT.1.bt2 + │   │   │   ├── BS_CT.2.bt2 + │   │   │   ├── BS_CT.3.bt2 + │   │   │   ├── BS_CT.4.bt2 + │   │   │   ├── BS_CT.rev.1.bt2 + │   │   │   └── BS_CT.rev.2.bt2 + │   │   └── GA_conversion + │   │   ├── BS_GA.1.bt2 + │   │   ├── BS_GA.2.bt2 + │   │   ├── BS_GA.3.bt2 + │   │   ├── BS_GA.4.bt2 + │   │   ├── BS_GA.rev.1.bt2 + │   │   └── BS_GA.rev.2.bt2 + │   ├── Homo_sapiens.GRCh38.dna.chromosome.10.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.11.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.12.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.13.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.14.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.15.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.16.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.17.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.18.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.1.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.20.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.22.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.2.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.3.fa.gz + │   ├── 
Homo_sapiens.GRCh38.dna.chromosome.4.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.5.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.6.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.7.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.8.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.9.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.MT.fa.gz + │   ├── Homo_sapiens.GRCh38.dna.chromosome.X.fa.gz + │   └── Homo_sapiens.GRCh38.dna.chromosome.Y.fa.gz + ├── Mouse + │   └── GRCm38 + │   ├── Bisulfite_Genome + │   │   ├── CT_conversion + │   │   │   ├── BS_CT.1.bt2 + │   │   │   ├── BS_CT.2.bt2 + │   │   │   ├── BS_CT.3.bt2 + │   │   │   ├── BS_CT.4.bt2 + │   │   │   ├── BS_CT.rev.1.bt2 + │   │   │   └── BS_CT.rev.2.bt2 + │   │   └── GA_conversion + │   │   ├── BS_GA.1.bt2 + │   │   ├── BS_GA.2.bt2 + │   │   ├── BS_GA.3.bt2 + │   │   ├── BS_GA.4.bt2 + │   │   ├── BS_GA.rev.1.bt2 + │   │   └── BS_GA.rev.2.bt2 + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.10.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.11.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.12.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.13.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.14.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.15.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.16.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.17.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.18.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.19.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.1.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.2.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.3.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.4.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.5.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.6.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.7.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.8.fa.gz + │   ├── 
Mus_musculus.GRCm38.68.dna.chromosome.9.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.MT.fa.gz + │   ├── Mus_musculus.GRCm38.68.dna.chromosome.X.fa.gz + │   └── Mus_musculus.GRCm38.68.dna.chromosome.Y.fa.gz + ├── PhiX + │   └── phiX174_plus_SNPs + │   ├── Bisulfite_Genome + │   │   ├── CT_conversion + │   │   │   ├── BS_CT.1.bt2 + │   │   │   ├── BS_CT.2.bt2 + │   │   │   ├── BS_CT.3.bt2 + │   │   │   ├── BS_CT.4.bt2 + │   │   │   ├── BS_CT.rev.1.bt2 + │   │   │   └── BS_CT.rev.2.bt2 + │   │   └── GA_conversion + │   │   ├── BS_GA.1.bt2 + │   │   ├── BS_GA.2.bt2 + │   │   ├── BS_GA.3.bt2 + │   │   ├── BS_GA.4.bt2 + │   │   ├── BS_GA.rev.1.bt2 + │   │   └── BS_GA.rev.2.bt2 + │   └── phi_plus_SNPs.fa.gz + ├── Rat + │   └── Rnor_6.0 + │   ├── Bisulfite_Genome + │   │   ├── CT_conversion + │   │   │   ├── BS_CT.1.bt2 + │   │   │   ├── BS_CT.2.bt2 + │   │   │   ├── BS_CT.3.bt2 + │   │   │   ├── BS_CT.4.bt2 + │   │   │   ├── BS_CT.rev.1.bt2 + │   │   │   └── BS_CT.rev.2.bt2 + │   │   └── GA_conversion + │   │   ├── BS_GA.1.bt2 + │   │   ├── BS_GA.2.bt2 + │   │   ├── BS_GA.3.bt2 + │   │   ├── BS_GA.4.bt2 + │   │   ├── BS_GA.rev.1.bt2 + │   │   └── BS_GA.rev.2.bt2 + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.10.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.11.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.12.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.13.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.14.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.15.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.16.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.17.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.18.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.19.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.1.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.20.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.2.fa.gz + │   ├── 
Rattus_norvegicus.Rnor_6.0.dna.chromosome.3.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.4.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.5.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.6.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.7.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.8.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.9.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.MT.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.X.fa.gz + │   ├── Rattus_norvegicus.Rnor_6.0.dna.chromosome.Y.fa.gz + │   └── Rattus_norvegicus.Rnor_6.0.dna.nonchromosomal.fa.gz + └── Yeast + └── R64-1-1 + ├── Bisulfite_Genome + │   ├── CT_conversion + │   │   ├── BS_CT.1.bt2 + │   │   ├── BS_CT.2.bt2 + │   │   ├── BS_CT.3.bt2 + │   │   ├── BS_CT.4.bt2 + │   │   ├── BS_CT.rev.1.bt2 + │   │   └── BS_CT.rev.2.bt2 + │   └── GA_conversion + │   ├── BS_GA.1.bt2 + │   ├── BS_GA.2.bt2 + │   ├── BS_GA.3.bt2 + │   ├── BS_GA.4.bt2 + │   ├── BS_GA.rev.1.bt2 + │   └── BS_GA.rev.2.bt2 + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.I.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.II.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.III.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.IV.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.IX.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.Mito.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.V.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.VI.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.VII.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.VIII.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.X.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XI.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XII.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XIII.fa.gz + ├── 
Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XIV.fa.gz + ├── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XV.fa.gz + └── Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.XVI.fa.gz -61 directories, 309 files +61 directories, 308 files