Why Pairwise alignment (--allpairs_global) only support positive strand? #576

billzt · 2024-10-15T01:31:09Z

I have a set of sequences that are definitely homologous. However, some of them are on the negative strand, and I don't know which ones. I therefore cannot use --allpairs_global directly, since it aligns sequences on their plus strand only. Are there any good alternatives?

torognes · 2024-10-15T10:37:44Z

Perhaps the --orient command may be useful to orient all sequences in the same direction before aligning them with the --allpairs_global command?

If all the sequences are fairly similar, I don't think you'll need many sequences in the database for the orient command to work fine.

billzt · 2024-10-15T11:09:11Z

Thank you, I will try --orient. I still hope that --allpairs_global could support --strand both in the future, as --cluster_fast does.

torognes · 2024-10-15T12:29:11Z

We'll consider adding --strand both in the future.

frederic-mahe · 2024-10-18T10:28:43Z

hello @billzt

here is a possible workaround, using a tiny dataset for demonstration purposes:

>s1
AAAA
>s2
AAAT
>s3
TTTT

s1 has 75% similarity with s2,
s1 has 100% similarity with s3 (if s3 is reverse-complemented)

s1 AAAA
   |||
s2 AAAT

s1 AAAA
   ||||
s3 AAAA (reverse-complement)

use --fastx_revcomp to reverse-complement the dataset,
concatenate both normal and reverse-complement datasets,
use --allpairs_global to find matching pairs,
(optional: filter out the results)

FASTA_FILE=$(mktemp)
printf ">s1\nAAAA\n>s2\nAAAT\n>s3\nTTTT\n" > "${FASTA_FILE}"

(
    cat "${FASTA_FILE}"
    vsearch \
        --fastx_revcomp "${FASTA_FILE}" \
        --quiet \
        --label_suffix "_rv" \
        --fastaout -
) | \
    vsearch \
        --allpairs_global - \
        --id 0.75 \
        --iddef 1 \
        --quiet \
        --blast6out -

rm "${FASTA_FILE}"

We obtain the expected results, equivalent to a search on both strands:

s1	s3_rv	100.0	4	0	0	1	4	1	4	-1	0
s1	s2	75.0	4	1	0	1	4	1	4	-1	0
s3	s1_rv	100.0	4	0	0	1	4	1	4	-1	0
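
The optional filtering step is not spelled out above. Here is one possible sketch, assuming that the original labels never end in "_rv" and that reporting each pair once per relative orientation is enough. The rows s1/s3_rv and s3/s1_rv describe the same alignment seen from both strands, so only the first one is kept. The awk script and the RESULTS and FILTERED variable names are illustrative, they are not part of vsearch:

```shell
# Sample rows copied from the blast6out output shown above (tab-separated).
RESULTS=$(mktemp)
printf 's1\ts3_rv\t100.0\t4\t0\t0\t1\t4\t1\t4\t-1\t0\n'  > "${RESULTS}"
printf 's1\ts2\t75.0\t4\t1\t0\t1\t4\t1\t4\t-1\t0\n'     >> "${RESULTS}"
printf 's3\ts1_rv\t100.0\t4\t0\t0\t1\t4\t1\t4\t-1\t0\n' >> "${RESULTS}"

FILTERED=$(
    awk -F '\t' '
        {
            q = $1 ; t = $2
            # strip the "_rv" suffix and count how many labels carried it
            n = sub(/_rv$/, "", q) + sub(/_rv$/, "", t)
            # exactly one reverse-complemented label: opposite strands
            strand = (n == 1) ? "-" : "+"
            # order-independent key for the pair, plus relative strand
            key = (q < t) ? q SUBSEP t : t SUBSEP q
            key = key SUBSEP strand
            # keep only the first row seen for each key
            if (! (key in seen)) { seen[key] = 1 ; print }
        }' "${RESULTS}"
)

printf '%s\n' "${FILTERED}"
rm "${RESULTS}"
```

With real data, the same awk script could read the output of --allpairs_global directly from a pipe instead of from a temporary file.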

Warning: since --allpairs_global compares every sequence with every other sequence, doubling the size of a dataset roughly quadruples computation time.
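
To see where the quadrupling comes from: an all-against-all comparison of n sequences involves n(n-1)/2 pairs, assuming each unordered pair is aligned only once. A quick check in the shell, using a hypothetical helper named count_pairs:

```shell
# number of unordered pairs among n sequences: n * (n - 1) / 2
count_pairs() {
    echo $(( $1 * ($1 - 1) / 2 ))
}

count_pairs 3     # toy dataset
count_pairs 6     # toy dataset + reverse-complements
count_pairs 1000  # a larger dataset
count_pairs 2000  # the same dataset, doubled
```

For the toy dataset the count goes from 3 to 15 pairs. For larger datasets the ratio gets very close to 4 (499500 vs. 1999000 pairs in the last two calls).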

torognes added the question label Oct 15, 2024

torognes added the enhancement label Oct 15, 2024
