HaMStR SetUp

Preparation of the HaMStR environment

Preparing files / directories for HaMStR / HaMStR-OneSeq

ProtTrace integrates with the targeted ortholog search method HaMStR. HaMStR can optionally be used either to extend pre-existing orthologous groups (the hamstr.pl script) or to compile an orthologous group on the fly for an individual seed protein (the oneSeq.pl script). By default, the HaMStR package ships with the reference proteomes of the Quest for Orthologs consortium. To add further species, follow the quick guidelines below. For a more comprehensive description, please refer to the documentation of the HaMStR package.

Once HaMStR-OneSeq has been successfully configured, you will have the following directory layout in your HaMStR directory:

                       HaMStR 
    ______________________|_____________________ 
   |                      |                     |
genome_dir            blast_dir            weight_dir

genome_dir - This directory stores the gene sets of the species in which you want to search for orthologs.
- For every species, create a sub-directory within genome_dir using the following naming convention:

SPECIESNAME@NCBI-Taxonomy-ID@Geneset-Version

Copy the gene set file into this directory under the same name as the sub-directory, with the extension .fa appended. For example, if your reference taxon is human, the gene set path can look like

genome_dir/HUMAN@9606@1/HUMAN@9606@1.fa
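The genome_dir step can be scripted. The following is only a minimal sketch, assuming a bash shell; the function name add_gene_set, its argument order, and the example paths are hypothetical and not part of the HaMStR package.

```shell
#!/usr/bin/env bash
# Sketch: place a gene set under genome_dir following the
# SPECIESNAME@NCBI-Taxonomy-ID@Geneset-Version naming convention.
# add_gene_set is a hypothetical helper, not a HaMStR command.
add_gene_set() {
    local hamstr_dir="$1"   # HaMStR directory
    local species="$2"      # species name, e.g. HUMAN
    local taxid="$3"        # NCBI taxonomy ID, e.g. 9606
    local version="$4"      # gene set version, e.g. 1
    local fasta="$5"        # FASTA file holding the gene set

    local name="${species}@${taxid}@${version}"
    local target="${hamstr_dir}/genome_dir/${name}"

    mkdir -p "$target" || return 1
    # The file name must equal the sub-directory name plus .fa
    cp "$fasta" "${target}/${name}.fa" || return 1
    echo "${target}/${name}.fa"
}

# Example call (placeholder paths):
# add_gene_set /path/to/your/hamstr HUMAN 9606 1 /path/to/human_proteins.fa
```

The function prints the path of the copied file, so the result can be checked or passed on to later steps.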

blast_dir - This directory has the same structure as the genome_dir.
- For each species, create a sub-directory according to the naming convention explained above.
- Copy the FASTA file from the corresponding sub-directory of genome_dir into this sub-directory.
- Run makeblastdb inside the sub-directory to create the BLAST database:

makeblastdb -dbtype prot -in HUMAN@9606@1.fa -out HUMAN@9606@1
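The three blast_dir steps can be combined into one helper. This is a sketch under assumptions: prepare_blast_db is a hypothetical name, the gene set is expected to already exist in genome_dir, and makeblastdb (BLAST+) must be on your PATH.

```shell
#!/usr/bin/env bash
# Sketch: mirror one species from genome_dir into blast_dir and build
# its BLAST database. prepare_blast_db is a hypothetical helper.
prepare_blast_db() {
    local hamstr_dir="$1"   # HaMStR directory
    local name="$2"         # e.g. HUMAN@9606@1
    local src="${hamstr_dir}/genome_dir/${name}/${name}.fa"
    local dest="${hamstr_dir}/blast_dir/${name}"

    # Refuse to continue if the gene set is not in genome_dir yet
    [ -f "$src" ] || { echo "missing gene set: $src" >&2; return 1; }

    mkdir -p "$dest" || return 1
    cp "$src" "$dest/" || return 1

    # Run in a subshell so the caller's working directory is unchanged
    ( cd "$dest" && makeblastdb -dbtype prot -in "${name}.fa" -out "$name" )
}

# Example call (placeholder path):
# prepare_blast_db /path/to/your/hamstr HUMAN@9606@1
```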

weight_dir - This directory contains the feature annotations for every species. You will have to annotate Pfam and SMART domains, low complexity regions and the like using the annotation script provided with the HaMStR package:

perl /path/to/your/hamstr/bin/fas/annotation.pl -fasta=/path/to/your/hamstr/genome_dir/HUMAN@9606@1/HUMAN@9606@1.fa -path=/path/to/your/hamstr/weight_dir -name=HUMAN@9606@1

Please make sure that all paths passed as parameters are absolute. The annotation takes considerably longer than creating the BLAST database with makeblastdb: annotating a gene set of 5000 sequences takes about one hour.
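Since the annotation script expects absolute paths, it can help to resolve the HaMStR directory before building the command. The sketch below assumes a bash shell; to_abs and annotate_species are hypothetical helper names, and only the annotation.pl parameters shown above (-fasta, -path, -name) are used.

```shell
#!/usr/bin/env bash
# to_abs: print the absolute form of an existing file or directory path
# (hypothetical helper; avoids depending on realpath).
to_abs() {
    if [ -d "$1" ]; then
        ( cd "$1" && pwd )
    else
        ( cd "$(dirname "$1")" && printf '%s/%s\n' "$(pwd)" "$(basename "$1")" )
    fi
}

# annotate_species: run the HaMStR annotation script for one species,
# passing every path in absolute form (hypothetical helper).
annotate_species() {
    local hamstr_dir name
    hamstr_dir="$(to_abs "$1")" || return 1   # HaMStR directory, may be relative
    name="$2"                                 # e.g. HUMAN@9606@1

    perl "${hamstr_dir}/bin/fas/annotation.pl" \
        -fasta="${hamstr_dir}/genome_dir/${name}/${name}.fa" \
        -path="${hamstr_dir}/weight_dir" \
        -name="${name}"
}

# Example call from inside the HaMStR directory:
# annotate_species . HUMAN@9606@1
```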

To check if your manually added species is integrated into the HaMStR framework, please run

perl bin/oneSeq.pl -showTaxa

This command prints a list of all species available for the HaMStR analysis.
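For longer species lists, the output can be filtered for the newly added species. This is a sketch only: has_taxon is a hypothetical helper, and it assumes that the identifier (for example HUMAN@9606@1) appears verbatim in the -showTaxa output, which may not hold for every HaMStR version.

```shell
#!/usr/bin/env bash
# has_taxon: read a species list on stdin and succeed if the given
# identifier occurs in it as a whole token (hypothetical helper).
# -F treats the identifier as a fixed string, -w avoids matching
# HUMAN@9606@1 inside HUMAN@9606@10.
has_taxon() {
    grep -qwF -- "$1"
}

# Example call from inside the HaMStR directory:
# perl bin/oneSeq.pl -showTaxa | has_taxon "HUMAN@9606@1" && echo "species found"
```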

Once everything is set, you can update the ProtTrace config file with the path to the HaMStR working directory.
