Add uchime2_denovo to close #92 #100

colinbrislawn · 2024-09-29T18:53:56Z

WIP to close #92
Add internal functions and tests to use 'uchime', 'uchime2', or 'uchime3'.
Open questions:

Externally, do we want to present all these as new methods/functions or new settings?

qiime vsearch uchime-denovo
qiime vsearch uchime2-denovo
qiime vsearch uchime3-denovo
# or
qiime vsearch uchime-denovo --p-method 'uchime2'

Internally, do we want to combine some of these? They are very similar, especially uchime2 and 3

The only difference [in --uchime3_denovo] from --uchime2_denovo is that the default minimum abundance
skew (--abskew) is set to 16.0 rather than 2.0.

Do we want to expose --abskew for some or all of these?
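
For illustration, the "one command, new setting" option could be sketched as a single lookup from the method name to the vsearch flag. This is a minimal sketch, not the plugin's actual code: `build_chimera_cmd` and `_METHOD_FLAGS` are hypothetical names, and only the three vsearch flags and `--uchimeout` are taken from this thread.

```python
# Hypothetical helper; names are illustrative, not taken from q2_vsearch.
_METHOD_FLAGS = {
    'uchime': '--uchime_denovo',
    'uchime2': '--uchime2_denovo',
    'uchime3': '--uchime3_denovo',
}


def build_chimera_cmd(method, fasta_fp, uchimeout_fp):
    """Return the vsearch argument list for the chosen de novo method."""
    try:
        flag = _METHOD_FLAGS[method]
    except KeyError:
        raise ValueError(
            f"Unknown method {method!r}; expected one of "
            f"{sorted(_METHOD_FLAGS)}") from None
    return ['vsearch', flag, fasta_fp, '--uchimeout', uchimeout_fp]


print(build_chimera_cmd('uchime2', 'PR2_short.fsa', 'nonchi_v2.uc'))
```

With this shape, the three variants share one code path and differ only in the flag passed to vsearch.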

hagenjp · 2024-10-03T17:23:10Z

Hi @colinbrislawn,
1./2. We all agree that new parameters would be best vs. new methods. Thank you!
3. We do not have strong feelings about --abskew, but it would be best for its default to be none, so that each method falls back to its own algorithm's default.

colinbrislawn · 2024-10-03T19:18:40Z

new parameters would be best

Cool!

How does this look for the CLI? (CLI docs for the existing function)

qiime vsearch uchime-denovo \
  --p-method 'uchime2' \
  --p-mindiffs 99 # ignored when running uchime2 and uchime3
  --p-mindiv 0.8 # ignored when running uchime2 and uchime3
  --p-minh 0.99 # ignored when running uchime2 and uchime3
  ...

What should we do if someone passes settings that are not used by uchime2 and uchime3?
vsearch simply ignores them silently, which I don't love for our API
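
One possible answer is to reject such settings up front instead of passing them through. The sketch below is only an illustration of that option: `check_unused_params` is a hypothetical name, the list of ignored parameters is the one given in this comment, and treating `None` as "not set by the user" is an assumption.

```python
# Parameters this thread describes as ignored by uchime2 and uchime3.
IGNORED_BY_UCHIME2_AND_3 = ('mindiffs', 'mindiv', 'minh')


def check_unused_params(method, **params):
    """Raise if a caller sets a parameter the chosen method would ignore.

    Assumes None means "not set by the user" (an illustrative convention).
    """
    if method not in ('uchime2', 'uchime3'):
        return
    unused = [name for name in IGNORED_BY_UCHIME2_AND_3
              if params.get(name) is not None]
    if unused:
        raise ValueError(
            f"{', '.join(unused)} not used by method {method!r}")


# Passes silently: the original uchime method uses these settings.
check_unused_params('uchime', mindiffs=3, mindiv=0.8, minh=0.28)
```

A warning instead of an error would be a softer alternative; either way the user learns that the setting had no effect.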

colinbrislawn · 2024-10-03T19:58:17Z

Let's add support for --abskew in a different PR, to keep things tidy ✨ 🧹

q2_vsearch/_chimera.py

colinbrislawn · 2024-10-03T21:20:32Z

Is --p-method a good name? I could go back to --p-algorithm or something new.

The methods are tested with 100% coverage, but the results are not, as I'm not sure how they should work with non-trivial examples.

If we want to build a test that shows the difference between these methods, here's commentary on vsearch's implementation:
torognes/vsearch#283
torognes/vsearch#503

@nbokulich, if you have the time and interest, I would appreciate your review!

q2_vsearch/_chimera.py

q2_vsearch/plugin_setup.py

colinvwood · 2024-12-13T17:37:52Z

q2_vsearch/tests/test_chimera.py

What are your thoughts on adding tests for these new algorithm versions (beyond testing the command string)?

I think there's a trivial test that shows both working.

I don't have an example in which these methods differ.... Would you like me to try and find one?

It's true that there is a test to which we pass "uchime3" as the method; however, without behavior that differs between methods, we technically can't be sure that the underlying software is actually running that method.

I understand if it's too difficult to contrive input data that shows different expected behavior for the different algorithm methods, but if it is reasonably easy to do so it would be best.

colinbrislawn · 2024-12-14T19:05:38Z

q2_vsearch/plugin_setup.py

@@ -404,12 +406,17 @@
                  'abundances).'),
    },
    parameter_descriptions={
+        'method': ('Denovo chimera detection based on uchime (Edgar 2011), '


@colinvwood While trying to keep this short and sweet, I've added a little more detail.

How does this look?

I think we should cite Rob's papers too. Let me work on adding those two papers to the .bib file...

This is good 👍🏻

Did you want me to add these citations to the .bib and link them up, or are we good to go?

If you're referring to each of the citations for each of the algorithm iterations, I think that yes we should do that. If you're referring to "Rob's papers" I'm unsure which ones those are exactly.

colinvwood · 2024-12-16T18:49:59Z

Hey @colinbrislawn, not sure if you're waiting on any more input from me but everything looks good to me except for the outstanding tests discussion.

q2_vsearch/plugin_setup.py

colinbrislawn · 2024-12-19T19:59:30Z

I'm working to build some good tests, starting from the PR2 sequences in the test data.

Note: if I knew these algorithms better, I might be able to make mock tests from first principles. But I don't!

I may need to put this on hold while I work on other projects.

# shorten names
vsearch --fastx_uniques PR2-18S-rRNA-V4.derep.fsa --sizein --sizeout --relabel derep_ --fastaout PR2_short.fsa

# run all 3 methods
vsearch --uchime_denovo  PR2_short.fsa --chimeras chi_v1.fasta --uchimeout nonchi_v1.uc &
vsearch --uchime2_denovo PR2_short.fsa --chimeras chi_v2.fasta --uchimeout nonchi_v2.uc &
vsearch --uchime3_denovo PR2_short.fsa --chimeras chi_v3.fasta --uchimeout nonchi_v3.uc &

# inspect for differences
git diff --no-index --word-diff -U0 nonchi_v1.uc nonchi_v2.uc
git diff --no-index --word-diff -U0 nonchi_v1.uc nonchi_v3.uc
git diff --no-index --word-diff -U0 nonchi_v2.uc nonchi_v3.uc

grep 'derep_33;size=86\s' nonchi_v1.uc | head -n 1
grep 'derep_33;size=86\s' nonchi_v2.uc | head -n 1
grep 'derep_33;size=86\s' nonchi_v3.uc | head -n 1

Possible reads from PR2 to use. Note the short names!

query	parent1	parent2	uchime1	uchime2	uchime3
derep_33;size=86	derep_2;size=485	derep_5;size=315	N	Y	N
derep_34;size=85	derep_2;size=485	derep_5;size=315	N	Y	N

Testing uchime2 vs uchime3 should be easy.

The only difference from --uchime2_denovo is that the default minimum abundance skew (--abskew) is set to 16.0 rather than 2.0.

This higher default --abskew threshold should lead to fewer called chimeras, which is what I see!
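
A quick check of the arithmetic for derep_33, as a sketch only. It assumes the rule is that each candidate parent must be at least `--abskew` times as abundant as the query; that reading of vsearch's behavior is an assumption and is not confirmed in this thread. `passes_abskew` is a hypothetical helper name.

```python
def passes_abskew(query_size, parent_sizes, abskew):
    """True if every parent is at least `abskew` times as abundant as the query.

    This encodes an assumed reading of the --abskew rule, for illustration.
    """
    return all(size >= abskew * query_size for size in parent_sizes)


query = 86            # derep_33;size=86
parents = (485, 315)  # derep_2;size=485 and derep_5;size=315

for name, abskew in (('uchime2', 2.0), ('uchime3', 16.0)):
    ratios = [round(p / query, 2) for p in parents]
    print(name, abskew, ratios, passes_abskew(query, parents, abskew))
```

Under that assumption, both parents clear the 2.0 threshold but neither clears 16.0, which matches uchime3 reporting no parents for this read.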

colinbrislawn · 2024-12-19T20:05:49Z

Would this work as a minimal test?

>uchime
0.0239  derep_33;size=86        derep_2;size=485        derep_5;size=315        derep_5;size=315        100.0   98.1    99.4    97.5    99.4    1       0       0       3       0       0       0.6     N
>uchime2
0.0239  derep_33;size=86        derep_2;size=485        derep_5;size=315        derep_5;size=315        100.0   98.1    99.4    97.5    99.4    1       0       0       3       0       0       0.6     Y
>uchime3
0.0000  derep_33;size=86        *       *       *       *       *       *       *       *       0       0       0       0       0       0       *       N

uchime and uchime2 report the same score but reach different final decisions,
and uchime3 reports no parents at all.

Let me check if this works with only three reads!

colinbrislawn · 2025-01-17T10:14:05Z

vsearch --fastx_uniques PR2-18S-rRNA-V4.derep.fsa --sizein --sizeout --relabel derep_ --fastaout PR2_short.fsa
vsearch --uchime_denovo  PR2_short.fsa --uchimeout nonchi_v1.uc &
vsearch --uchime2_denovo PR2_short.fsa --uchimeout nonchi_v2.uc &
vsearch --uchime3_denovo PR2_short.fsa --uchimeout nonchi_v3.uc &

git diff --no-index --word-diff -U0 nonchi_v1.uc nonchi_v2.uc
git diff --no-index --word-diff -U0 nonchi_v1.uc nonchi_v3.uc
git diff --no-index --word-diff -U0 nonchi_v2.uc nonchi_v3.uc

grep 'derep_16\;' nonchi_v1.uc | head -n 1
grep 'derep_16\;' nonchi_v2.uc | head -n 1
grep 'derep_16\;' nonchi_v3.uc | head -n 1

# Pull derep_2, derep_5, and derep_16 into a new file. Then:
vsearch --uchime_denovo  PR2_pull.fsa --quiet --uchimeout - | grep 'derep_16'
vsearch --uchime2_denovo  PR2_pull.fsa --quiet --uchimeout - | grep 'derep_16'
vsearch --uchime3_denovo  PR2_pull.fsa --quiet --uchimeout - | grep 'derep_16'
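
The "pull three reads into a new file" step could be done by hand or with a small script. The following is a minimal sketch using only the standard library; `pull_records` is a hypothetical helper, and it assumes the read ID is the header text before the first `;` (as in `derep_16;size=146`).

```python
def pull_records(fasta_text, wanted_ids):
    """Yield (header, sequence) for records whose ID is in wanted_ids.

    The ID is taken as the header text before the first ';'.
    Multi-line sequences are joined into one string.
    """
    header, seq = None, []
    for line in fasta_text.splitlines():
        if line.startswith('>'):
            if header is not None and header.split(';')[0] in wanted_ids:
                yield header, ''.join(seq)
            header, seq = line[1:].strip(), []
        elif header is not None:
            seq.append(line.strip())
    if header is not None and header.split(';')[0] in wanted_ids:
        yield header, ''.join(seq)


# Tiny made-up input, only to show the call shape.
demo = ">derep_1;size=9\nACGT\nACGT\n>derep_2;size=5\nTTTT\n"
for header, seq in pull_records(demo, {'derep_2'}):
    print(f">{header}\n{seq}")
```

Matching on the exact ID avoids the prefix problem where searching for `derep_1` would also catch `derep_16`.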

colinbrislawn · 2025-01-17T10:52:34Z

Here's a toy fasta file, which I've named PR2_pull.fsa

>derep_2;size=485
AGCTCCAATAGCGTATATTAAAGTTGTTGTGGTTAAAAAGCTCGTAGTTGAACCTTGGGCCTGGCTGGCCGGTCCGCCTC
ACCGCGTGCACTGGTCCGGCCGGGCCTTTCCCTCTGTGGAACCCCATACCCTTCACTGGGCGTGGCGGGGAAACAGGACA
TTTACTTTGAAAAAATTAGAGTGCTCCAGGCAGGCCTATGCTCGAATACATTAGCATGGAATAATAAAATAGGACGCGCG
GTTCTATTTTGTTGGTTTATAGGACCGCCGTAATGATTAATAGGGACAGTCGGGGGCATCAGTATTCAACTGTCAGAGGT
GAAATTCTTGGATCAGTTGAAGACTAACTACTGCGAAAGCATTTGCCAAGGATGTTTTCA
>derep_5;size=315
AGCTCCAATAGCGTATATTAAAGTTGTTGTGGTTAAAAAGCTCGTAGTTGAACCTTGGGCCTGGCTGGCCGGTCCGCCTC
ACCGCGTGTACTGGTCCGGCCGGTGAAATTCTTGGATTTATTGAAGACTAACTACTGCGAAAGCATTTGCCAAGGATGTT
TTCA
>derep_16;size=146
AGCTCCAATAGCGTATATTAAAGTTGTTGTGGTTAAAAAGCTCGTAGTTGAACCTTGGGCCTGGCTGGCCGGTCCGCCTC
ACCGCGTGCACTGGTCCGGCCGGTGAAATTCTTGGATTTATTGAAGACTAACTACTGCGAAAGCATTTGCCAAGGATGTT
TTCA

Running this file with uchime*_denovo variants produces different results.

vsearch --uchime_denovo  PR2_pull.fsa --quiet --uchimeout -
vsearch --uchime2_denovo  PR2_pull.fsa --quiet --uchimeout -
vsearch --uchime3_denovo  PR2_pull.fsa --quiet --uchimeout -

program	key difference
uchime_denovo	first column score `0.0239`, final vote `N`
uchime2_denovo	first column score `0.0239`, final vote `Y`
uchime3_denovo	first column score `0.0000`, no parents reported (`*`), final vote `N`

vsearch --uchime_denovo  PR2_pull.fsa --quiet --uchimeout -
0.0000  derep_2;size=485        *       *       *       *       *       *       *       *       0       0       0       0       0       0       *       N
0.0000  derep_5;size=315        *       *       *       *       *       *       *       *       0       0       0       0       0       0       *       N
0.0239  derep_16;size=146       derep_2;size=485        derep_5;size=315        derep_5;size=315        100.0   98.1    99.4    97.5    99.4    1       0       0       3       0       0       0.6     N

vsearch --uchime2_denovo  PR2_pull.fsa --quiet --uchimeout -
0.0000  derep_2;size=485        *       *       *       *       *       *       *       *       0       0       0       0       0       0       *       N
0.0000  derep_5;size=315        *       *       *       *       *       *       *       *       0       0       0       0       0       0       *       N
0.0239  derep_16;size=146       derep_2;size=485        derep_5;size=315        derep_5;size=315        100.0   98.1    99.4    97.5    99.4    1       0       0       3       0       0       0.6     Y

vsearch --uchime3_denovo  PR2_pull.fsa --quiet --uchimeout -
0.0000  derep_2;size=485        *       *       *       *       *       *       *       *       0       0       0       0       0       0       *       N
0.0000  derep_5;size=315        *       *       *       *       *       *       *       *       0       0       0       0       0       0       *       N
0.0000  derep_16;size=146       *       *       *       *       *       *       *       *       0       0       0       0       0       0       *       N
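
A test could compare these outputs by reading just the score, the query label, and the final vote from each line. The sketch below is one hedged way to do that; `parse_uchimeout` is a hypothetical helper, and it assumes whitespace-separated fields with the score first, the query label second, and the Y/N vote last, as in the lines above.

```python
def parse_uchimeout(text):
    """Map query label -> (score, vote) from --uchimeout text.

    Assumes whitespace-separated fields: score first, query label second,
    and the final Y/N vote last.
    """
    results = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        results[fields[1]] = (float(fields[0]), fields[-1])
    return results


# The derep_16 lines reported in this thread for uchime and uchime2.
tail = ("derep_16;size=146 derep_2;size=485 derep_5;size=315 "
        "derep_5;size=315 100.0 98.1 99.4 97.5 99.4 1 0 0 3 0 0 0.6")
v1 = parse_uchimeout(f"0.0239 {tail} N")
v2 = parse_uchimeout(f"0.0239 {tail} Y")

query = 'derep_16;size=146'
print(v1[query], v2[query])
```

An assertion that the vote differs between methods for `derep_16` would show that the requested method is really the one being run.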

I've taken a look at the testing framework, but I can't make heads or tails of it.
It looks like the input fasta files are built from feature tables (??)

@colinvwood, if you are willing to build the tests for this, I would very much appreciate it!

colinbrislawn added 3 commits September 29, 2024 14:17

Add uchime2_denovo

31dcbc3

lint

82135a8

Add CLI

f24f45f

Refactor to method

690f18e

colinbrislawn marked this pull request as ready for review October 3, 2024 21:18

Merge branch 'qiime2:dev' into uchime23

4f9a4da

ebolyen assigned colinvwood Dec 12, 2024

colinvwood reviewed Dec 13, 2024

colinvwood and others added 3 commits December 13, 2024 10:44

rerun CI

b8cd1e0

remove abskew from this PR

0f149b5

List methods and brief citations

3da9218

Add three citations

5e7c7fc

colinbrislawn added 2 commits December 18, 2024 14:37

fix lint

38338e0

fix citations dict

88c1e8c

Merge branch 'qiime2:dev' into uchime23

d3b3b0e
