HaMStR SetUp
ProtTrace integrates with the targeted ortholog search method HaMStR. HaMStR can optionally be used either to extend pre-existing orthologous groups (the hamstr.pl script) or to compile an orthologous group on the fly for an individual seed protein (the oneSeq.pl script). By default, the HaMStR package ships with the reference proteomes of the Quest for Orthologs consortium. To add further species, follow the quick guidelines below. For a more comprehensive description, please refer to the documentation of the HaMStR package.
Once HaMStR-OneSeq has been successfully configured, you will have the following directory layout in your HaMStR directory:
                      HaMStR
 ______________________|______________________
 |                     |                     |
genome_dir          blast_dir           weight_dir
- genome_dir - This directory stores the gene sets of the species in which you want to search for orthologs.
  - For every species, create a sub-directory within genome_dir using the following naming convention:
    SPECIESNAME@NCBI-Taxonomy-ID@Geneset-Version
  - Copy the gene set file into this sub-directory and give it the same name as the sub-directory, with the ending .fa appended. For example, if your reference taxon is human, the gene set file could be
    genome_dir/HUMAN@9606@1/HUMAN@9606@1.fa
- blast_dir - This directory has the same structure as genome_dir.
  - For each species, create a sub-directory according to the naming convention explained above.
  - Copy the FASTA file from the corresponding sub-directory of genome_dir into this sub-directory.
  - Run makeblastdb in this sub-directory to create the BLAST database:
    makeblastdb -dbtype prot -in HUMAN@9606@1.fa -out HUMAN@9606@1
- weight_dir - This directory contains the feature annotations for every species. You will have to annotate Pfam and SMART domains, low complexity regions and the like using the annotation script provided with the HaMStR package:
    perl /path/to/your/hamstr/bin/fas/annotation.pl -fasta=/path/to/your/hamstr/genome_dir/HUMAN@9606@1/HUMAN@9606@1.fa -path=/path/to/your/hamstr/weight_dir -name=HUMAN@9606@1
  Make sure that all paths passed as parameters are absolute. The annotation takes considerably longer than creating the BLAST database with makeblastdb: annotating a gene set of 5000 sequences takes about one hour.
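The three steps above can be collected into a small shell helper. The following is only a sketch under stated assumptions: the function name stage_species and its arguments are hypothetical and not part of HaMStR, and the HaMStR working directory is assumed to contain genome_dir, blast_dir, weight_dir and bin/fas/annotation.pl as described above. The helper creates the sub-directories and copies the gene set, then prints the makeblastdb and annotation commands rather than running them, so that they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HaMStR): stage one species.
# Usage: stage_species /abs/path/to/hamstr HUMAN@9606@1 /abs/path/to/proteins.fa
stage_species() {
    hamstr_dir=$1   # HaMStR working directory, given as an absolute path
    species=$2      # SPECIESNAME@NCBI-Taxonomy-ID@Geneset-Version
    fasta=$3        # gene set in FASTA format

    # Create the species sub-directory in genome_dir and blast_dir.
    mkdir -p "$hamstr_dir/genome_dir/$species" \
             "$hamstr_dir/blast_dir/$species" || return 1

    # The gene set file is named after the sub-directory, with .fa appended.
    cp "$fasta" "$hamstr_dir/genome_dir/$species/$species.fa" || return 1
    cp "$fasta" "$hamstr_dir/blast_dir/$species/$species.fa" || return 1

    # Print the remaining commands instead of executing them.
    echo "cd $hamstr_dir/blast_dir/$species && makeblastdb -dbtype prot -in $species.fa -out $species"
    echo "perl $hamstr_dir/bin/fas/annotation.pl -fasta=$hamstr_dir/genome_dir/$species/$species.fa -path=$hamstr_dir/weight_dir -name=$species"
}
```

Calling stage_species with the HaMStR directory, a species name such as HUMAN@9606@1 and the path to a FASTA file prints the two commands for that species. Printing them keeps the long-running annotation step under manual control.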
To check if your manually added species is integrated into the HaMStR framework, please run
perl bin/oneSeq.pl -showTaxa
This command prints a list of all species available for the HaMStR analysis.
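If a species does not show up in that list, a quick look at the directory layout can help narrow down the cause. The sketch below is an assumption-laden convenience check, not part of HaMStR: the function name check_layout is hypothetical, and it only verifies the naming convention and the presence of the .fa files in genome_dir and blast_dir as described above. It does not inspect the BLAST databases or the annotations in weight_dir.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HaMStR): check species directories.
# Usage: check_layout /abs/path/to/hamstr
check_layout() {
    hamstr_dir=$1
    status=0
    for dir in "$hamstr_dir"/genome_dir/*/; do
        [ -d "$dir" ] || continue          # no species directories at all
        species=$(basename "$dir")

        # Expect SPECIESNAME@NCBI-Taxonomy-ID@Geneset-Version.
        case $species in
            *@*@*) ;;
            *) echo "bad name: $species"; status=1; continue ;;
        esac

        # Extract the middle field and require it to be numeric.
        taxid=${species#*@}
        taxid=${taxid%%@*}
        case $taxid in
            ''|*[!0-9]*) echo "non-numeric taxonomy ID: $species"; status=1 ;;
        esac

        # The gene set must exist under the same name in both directories.
        for sub in genome_dir blast_dir; do
            [ -f "$hamstr_dir/$sub/$species/$species.fa" ] ||
                { echo "missing: $sub/$species/$species.fa"; status=1; }
        done
    done
    return $status
}
```

The function prints one line per problem and returns a non-zero status if it finds any, so it can be used in a script before starting a longer analysis.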
Once everything is set up, you can update the ProtTrace config file by providing the path to the HaMStR working directory.