Commit
Updating README file to reflect changes in 2.0. More fixes for handling inputs correctly.

It seems like in some cases we lost some globbed paths because the paths went out of scope. I've now forced them to be shared ptrs which ensures they stick around until the input handler goes out of scope. This seems to fix the issue.
Daniel Mapleson committed Sep 24, 2015
1 parent ad35386, commit 13faa79
Showing 12 changed files with 48 additions and 70 deletions.
```diff
@@ -1,12 +1,12 @@
 
 #KAT - The K-mer Analysis Toolkit
 
-KAT is a suite of tools that analyse jellyfish kmer hashes. The following tools are currently available in KAT:
+KAT is a suite of tools that analyse jellyfish hashes or sequence files (fasta or fastq) using kmer counts. The following tools are currently available in KAT:
 
-- **sect**: SEquence Coverage estimator Tool. Estimates the coverage of each sequence in a fasta file using K-mers from a jellyfish hash.
-- **comp**: K-mer comparison tool. Creates a matrix of shared K-mers between two jellyfish hashes.
+- **sect**: SEquence Coverage estimator Tool. Estimates the coverage of each sequence in a file using K-mers from another sequence file.
+- **comp**: K-mer comparison tool. Creates a matrix of shared K-mers between two (or three) sequence files or hashes.
 - **gcp:** K-mer GC Processor. Creates a matrix of the number of K-mers found given a GC count and a K-mer count.
-- **hist**: Create an histogram of k-mer occurrences from a jellyfish hash. Adds metadata in output for easy plotting.
+- **hist**: Create an histogram of k-mer occurrences from a sequence file. Adds metadata in output for easy plotting.
 - **plot**: Plotting tool. Contains several plotting tools to visualise K-mer and compare distributions. Requires gnuplot. The following plot tools are available:
 
 - **density**: Creates a density plot from a matrix created with the "comp" tool. Typically this is used to compare two K-mer hashes produced by different NGS reads.
```
```diff
@@ -27,16 +27,13 @@ Generic installation description can be found in the INSTALL file. Short summary
 - Ensure these tools are correctly installed and available on your system:
 - gcc tool chain
 - make
-- jellyfish = V1.1.10 or V1.1.11 - http://www.cbcb.umd.edu/software/jellyfish/jellyfish-1.1.11.tar.gz **IMPORTANT NOTE**: Please use jellyfish V1.1, we currently do not support jellyfish 2. We will update KAT to support newer versions of jellyfish in due course.
 - gnuplot (required for plotting at runtime, must be available on the path to use this functionality) - http://www.gnuplot.info
-- If you cloned the git repository you must first run "./autogen.sh" to create the configure and make files for your project. Do not worry if this fails due to missing dependencies at this stage. If you downloaded a source code distribution tarball then you can skip this step.
+- If you cloned the git repository you must first run "./autogen.sh" to create the configure and make files for your project. (If you downloaded a source code distribution tarball then you can skip this step.)
 - For a typical installation on a machine where you have root access type ```./configure; make; sudo make install;```
 
 The configure script can take several options as arguments. One commonly modified option is ```--prefix```, which will install KAT to a custom directory. By default this is "/usr/local", so the KAT executable would be found at "/usr/local/bin" by default. In addition, some options specific to managing KAT dependencies located in non-standard locations are:
 
-- ```--with-jellyfish``` - for specifying a custom jellyfish directory
 - ```--with-boost``` - for specifying a custom boost directory (boost is only required for unit testing)
 - ```--with-doxygen``` - for specifying a custom doxygen directory (doxygen is only required for generating code documention.
 
 Type ```./configure --help``` for full details.
 
```
```diff
@@ -54,39 +51,21 @@ KAT also come with a python script called "dist_analysis.py", which allows the u
 After KAT has been installed, the following tools should be available:
 
 - **kat** - a single executable binary file that contains a number of subtools.
 - **kat_comp_reads.sh** - a bash script demonstrating a simple pipeline to compare the K-mers in two read files
 - **dist_analysis.py** - a python script for determining the amount of content in each peak in the K-mer spectra
 
 Running ```kat --help``` will bring up a list of available tools within kat. To get help on any of these subtools simple type: ```kat <tool> --help```. For example: ```kat sect --help``` will show details on how to use the sequence coverage estimator tool.
 
-Specifically, jellyfish must be available for dynamic linking at runtime. In addition, in order to use the plotting tools it is necessary for "gnuplot" to be available in the PATH.
+KAT supports file globbing for input, this is particularly useful when trying to count and analyse kmers for paired end files. For example,
+assuming you had two files: LIB_R1.fastq, LIB_R2.fastq in the current directory then ```kat hist -C -m27 LIB_R?.fastq```, will consume any
+files matching the pattern LIB_R?.fastq as input, i.e. LIB_R1.fastq, LIB_R2.fastq. The same result could be achieved listing the files at
+the command line: ```kat hist -C -m27 LIB_R1.fastq LIB_R2.fastq```
 
+Note, the KAT comp subtool takes 2 or three groups of inputs as positional arguments therefore we need to distinguish between the file groups.
+This is achieved by surrounding any glob patterns or file lists in single quotes. For example, assuming we have LIB1_R1.fastq, LIB1_R2.fastq,
+LIB2_R1.fastq, LIB2_R2.fastq in the current directory, and we want to compare LIB1 against LIB2, instead of catting the files together, we might
+run either: ```kat comp -C -D 'LIB1_R?.fastq' 'LIB2_R?.fastq'```; or ```kat comp -C -D 'LIB1_R1.fastq LIB1_R2.fastq' 'LIB2_R1.fastq LIB2_R2.fastq'.
+Both commands do the same thing.
 
-##Extending KAT:
-
-Developers can extend KAT by adding additional tools, whilst leveraging some of the shared resources that KAT and Jellyfish have made available. In order to add an additional tool to KAT, developers will need a reasonable working knowledge of C++ programming and have GNU auto tools available on their system. The process for adding a new subtool is as follows:
-
-1. Create a new directory with the tools name in the "src" directory
-2. Copy the template_args.hpp file into this directory and rename to whatever you wish. Modify the template file so that it contains details of how to use your tool. Comments have been added to the template to indicate places where you will have to add your custom code. The args template file makes use of getopt.h so developers familiar with this library should have no issues here. For those unfamiliar with this library, please read the getopt documentation: http://www.gnu.org/software/libc/manual/html_node/Getopt.html
-3. Copy the template_main.cc and _main.hpp files into the new directory and write whatever code is necessary for your tool.
-4. Add an include and extend the validMode method in "src/kat.cc" so that your tool is recognised. Also add your tool to the longDescription method.
-5. Update the "src/kat_args.hpp" to extend the KAT help messages.
-6. Update the Makefile.am file to include your _main.cc file.
-7. Run ```aclocal; autoconf; automake``` to generate the actual configure script and initial Makefiles.
-8. Run ```./configure```, with any appropriate options, to make the final Makefiles.
-9. Run ```make``` to compile the new version of KAT with your tools included. The KAT binary will be available in the "./bin" directory.
-10. Run ```sudo make install``` to install the software.
-
-See INSTALL file for more details on configuring steps 8-10.
-
-There are some shared resources available which might aid the generation of a subtool. It is worth browsing the ./src/inc directory to see what is available. There are libraries for:
-
-- Easing generation of gnuplot commands. Code was taken and modified from: http://ndevilla.free.fr/gnuplot/
-- "jellyfish_helper.hpp" provides some convienient functionality for loading an managing jellyfish hashes from a simple file path.
-- Sparse Matrix implementation. In order to avoid loading heavy dependencies such as boost a simple sparse matrix implementation has been added to store matricies in a relatively memory efficient way. The code was originally taken from: http://www.cplusplus.com/forum/general/8352/ and modified for use in KAT. If more functionality is required than is available here, either extend this class or use a dedicated matrix library.
-- string and file utils. Some shortcuts to commonly used string and file operations that would otherwise only be available by adding another library as a dependency to this project.
-
-If you think your subtool is useful and want it available in the official KAT release then please contact [email protected] or [email protected] for discussions on how to harmonise the code. The job will be easier if you maintain a branch from a clone or fork of the KAT repository on github.
-
 
 ##Licensing:
```
```diff
@@ -3,7 +3,7 @@
 ##########################################################################
 
 # Autoconf initialistion. Sets package name version and contact details
-AC_INIT([Kmer Analysis Toolkit (KAT)],[2.0.5],[[email protected] and/or [email protected]],[kat],[http://www.tgac.ac.uk/kat])
+AC_INIT([Kmer Analysis Toolkit (KAT)],[2.0.6],[[email protected] and/or [email protected]],[kat],[http://www.tgac.ac.uk/kat])
 
 # Require autoconf 2.53 or higher
 AC_PREREQ([2.53])
```