You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I want to use Mmseqs to generate MSAs against uniprot and use that with the default alphafold2-multimer pipeline. However, when I run the following script, the header in the sto file is just the uniprot ID without species info. The default alphafold2 with jackhmmer gives sto with header that is parsed by the af2 pipeline to pair sequences for multimer. How can I accomplish the same behavior with mmseqs??
Thank you.
Jackhmmer sample output in sto format:
#=GF ID chain_B-i1
I want to use Mmseqs to generate MSAs against uniprot and use that with the default alphafold2-multimer pipeline. However, when I run the following script, the header in the sto file is just the uniprot ID without species info. The default alphafold2 with jackhmmer gives sto with header that is parsed by the af2 pipeline to pair sequences for multimer. How can I accomplish the same behavior with mmseqs??
Thank you.
Jackhmmer sample output in sto format:
#=GF ID chain_B-i1
#=GS tr|A0A2J8M6R0|A0A2J8M6R0_PANTR/1-79 DE [subseq from] COA6 isoform 2 OS=Pan troglodytes OX=9598 GN=CK820_G0023150 PE=4 SV=1
#=GS tr|A0A2J8UE17|A0A2J8UE17_PONAB/1-79 DE [subseq from] COA6 isoform 2 OS=Pongo abelii OX=9601 GN=CR201_G0028559 PE=4 SV=1
#=GS sp|Q5JTJ3|COA6_HUMAN/47-125 DE [subseq from] Cytochrome c oxidase assembly factor 6 homolog OS=Homo sapiens OX=9606 GN=COA6 PE=1 SV=1
#=GS tr|A0A2I3SJ14|A0A2I3SJ14_PANTR/47-125 DE [subseq from] Cytochrome c oxidase assembly factor 6 OS=Pan troglodytes OX=9598 GN=COA6 PE=4 SV=1
#=GS tr|A0A6D2WTQ0|A0A6D2WTQ0_PANTR/47-125 DE [subseq from] COA6 isoform 3 OS=Pan troglodytes OX=9598 GN=CK820_G0023150 PE=4 SV=1
Mmseqs sample output in sto format:
Q5JTJ3 MAAPSMKERQVCWGARDEYWKCLDENLEDASQCKKLRSSFESSCPQQWIKYFDKRRDYLKFKEKFEAGQFEPSETTAKS
A0A2J8M6R0 MAAPSMKERQVCWGARDEYWKCLDENLEDASQCKKLRSSFESSCPQQWIKYFDKRRDYLKFKEKFEAGQFEPSETTAKS
A0A2I3SJ14 MAAPSMKERQVCWGARDEYWKCLDENLEDASQCKKLRSSFESSCPQQWIKYFDKRRDYLKFKEKFEAGQFEPSETTAKS
A0A6D2WTQ0 MAAPSMKERQVCWGARDEYWKCLDENLEDASQCKKLRSSFESSCPQQWIKYFDKRRDYLKFKEKFEAGQFEPSETTAKS
Script I ran for Mmseqs:
`MMSEQS="mmseqs" # Path to MMseqs2 binary
QUERY=<path_to_query> # Query file
BASE="./output" # Output directory
UNIPROT_DB=<path_to_uniprot> # UniProt database
EXPAND_EVAL=inf
ALIGN_EVAL=10
QSC=-20.0
MAX_ACCEPT=1000000
SEARCH_PARAM="--num-iterations 3 --db-load-mode 2 -a --k-score 'seq:96,prof:80' -e 0.1 --max-seqs 10000"
FILTER_PARAM="--filter-min-enable 1000 --diff 3000 --qid 0.0,0.2,0.4,0.6,0.8,1.0 --qsc 0 --max-seq-id 0.95"
mkdir -p "${BASE}"
"${MMSEQS}" createdb "${QUERY}" "${BASE}/qdb" --dbtype 1
"${MMSEQS}" search "${BASE}/qdb" "${UNIPROT_DB}" "${BASE}/res" "${BASE}/tmp" $SEARCH_PARAM
"${MMSEQS}" align "${BASE}/qdb" "${UNIPROT_DB}" "${BASE}/res" "${BASE}/res_aln" --db-load-mode 2 -e ${ALIGN_EVAL} --max-accept ${MAX_ACCEPT} --alt-ali 10 -a
"${MMSEQS}" filterresult "${BASE}/qdb" "${UNIPROT_DB}" "${BASE}/res_aln" "${BASE}/res_filtered" --db-load-mode 2 --qid 0 --qsc $QSC --diff 0 --max-seq-id 1.0 --filter-min-enable 100
"${MMSEQS}" result2msa "${BASE}/qdb" "${UNIPROT_DB}" "${BASE}/res_filtered" "${BASE}/uniprot.sto" --msa-format-mode 4 --db-load-mode 2`
The text was updated successfully, but these errors were encountered: