forked from tw7649116/MultiplexSSR
-
Notifications
You must be signed in to change notification settings - Fork 0
License
ferrolad/MultiplexSSR
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Manual of multiplexPCR (Email:[email protected]) 1. About multiplexPCR The aim of the pipeline is to develop multiplex SSR-PCRs with resequencing data. This pipeline includes two scripts, MultiplexSSR.pl and random.pl. MultiplexSSR.pl takes resequencing data as input to develop multiplex SSR-PCRs. random.pl takes SSRs in vcf format as input to assess the saturation of allele number, range, maximum position and minimum position with the increase of individual number for each SSR. 2. MultiplexSSR.pl packages six programs, including: 1) Tandem Repeats Finder http://tandem.bu.edu/trf/trf.html 2) Lobstr http://lobstr.teamerlich.org/ 3) BWA http://bio-bwa.sourceforge.net/ 4) SAMtools http://samtools.sourceforge.net/ 5) ePCR https://launchpad.net/ubuntu/+source/epcr/ 6) Multiplx http://bioinfo.ut.ee/?page_id=167 The directory structure ├── bin │ ├── ePCR │ │ ├── e-PCR │ │ ├── fahash │ │ ├── famap │ │ └── re-PCR │ ├── lobSTR │ │ ├── bin │ │ └── share │ ├── Multiplx │ │ ├── cmultiplx │ │ ├── primers-100.txt │ │ ├── thermodynamics.txt │ │ └── thermodynamics.txt.primer3 │ ├── primer3-2.4.0 │ │ ├── cmp_settings.pl │ │ ├── create_test_folders.pl │ │ ├── example │ │ ├── kmer_lists │ │ ├── LICENSE │ │ ├── README.md │ │ ├── settings_files │ │ ├── src │ │ └── test │ └── trf409.linux64 ├── MultiplexSSR.pl └── random.pl The script MultiplexSSR.pl can automaticly obtain the path of BWA and SAMtools. Other four programs can be installed independently and the pathes need be changed by manually from line 43 to 47. 3. MultiplexSSR.pl includes six steps: 1) The tandem repeats are first detected using Tandem Repeats Finder. 2) The SNPs and Indels are called with BWA-MEM and SAMtools, and SSRs are called with Lobstr. 3) The flanking sequences of 500 bp in each side of SSRs are extracted. 4) The primers are designed with Primer3 with SNPs, Indels and tandem repeats in reference are masked. 5) The primers are filtered out with ePCR that match more than two locations in reference. 6) The primer pairs are grouped with Multiplx and subgrouped according to the position and range. The designed primers are subgrouped for dye label and could be directly used. 4. random.pl is dependent on three R pacakges 1) psych 2) PMCMR 3) ggplot2 5. random.pl includes two steps: 1) The alleles for each SSR are counted for number, range, maximum size and minimum size. The maximum individual number is automatically detected and the minimum individual number is set to five. 2) The relationship between number, range, maximum size and minimum size with individual number is tested and showed as graphs. 6. System requirement The pipeline was tested on Linux version 4.13.0-36-generic (buildd@lgw01-amd64-033) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0- 6ubuntu1~16.04.9)) 7. Data requirement. 1) reference sequence in format of fasta 2) resequencing data in the format “fq/fastq” without compression. 8. Create a list file: input.txt less input.txt test1.1.fq test1.2.fq test2.1.fq test2.2.fq 9. Run the pipeline The script MultiplexSSR.pl can be run in two models, the beginning-to-end model and skip model. 9.1 Run the pipeline in beginning-to-end model. Under this model, the script is run from the first step to the sixth step. MultiplexSSR.pl –t 20 9.2 Run the pipeline in skip model. This model skip the first two steps with two advantages. The step of genotyping is skipped, which is not necessary for parameter adjustment and time wasting. The other sourced genotypes of SSRs and SNPs could be conveniently used. This model is only depend on ePCR and needs reference.fa.2.7.7.80.10.50.500.mask, reference.fa, SSR.vcf and SNP.vcf as input. MultiplexSSR.pl –s T –d 5 10. The results of MultiplexSSR.pl 1) Multiplex.summary.txt: the primers for each SSR 2) Multiplex.primer.txt: the SSR groups Intermediate results 1) SNP.vcf: the called SNPs 2) SSR.vcf: the called SSRs 3) reference.fa.2.7.7.80.10.50.500.mask: The masked reference by Tandem Repeats Finder 4) primer.f.txt: the designed primers 5) primer.ff.txt: the specific primers after filtered with ePCR 11. Check the saturation with individual number random.pl –i SSR.vcf 12. The results of random.pl 1) random.pairs.panels.tiff 2) random.AlleleNumber_IndividualNumber.tiff 3) random.Range_IndividualNumber.tiff 4) random.Maximum_IndividualNumber.tiff 5) random.Minimum_IndividualNumber.tiff 6) random.text.txt 13. The SSR count in the reference ###under the current path ###need the file reference.fa.2.7.7.80.10.50.500.dat ###format change perl ./dat_bed.pl ###count the SSR by length perl ./stastic.misa.length100.pl ###count the SSR by repeat number of motif perl ./stastic.misa.motif.pl
About
No description, website, or topics provided.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published
Languages
- C 71.8%
- HTML 19.4%
- Perl 4.1%
- Python 2.8%
- Other 1.9%