-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add uchime2_denovo to close #92 #100
base: dev
Are you sure you want to change the base?
Conversation
Hi @colinbrislawn, |
Cool! How does this look for the CLI? (CLI docs for the existing function) qiime vsearch uchime-denovo \
--p-method 'uchime2' \
--p-mindiffs 99 # ignored when running uchime2 and uchime3
--p-mindiv 0.8 # ignored when running uchime2 and uchime3
--p-minh 0.99 # ignored when running uchime2 and uchime3
... What should we do if someone passes settings that are not used by uchime2 and uchime3? |
Let's add support for |
Is The methods are tested with 100% coverage, but the results are not, as I'm not sure how they should work with non-trivial examples. If we want to build a test that shows the difference between these methods, here's commentary on vsearch's implementation: @nbokulich, if you have the time and interest, I would appreciate your review! |
q2_vsearch/tests/test_chimera.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on adding tests for these new algorithm versions (beyond testing the command string)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a trivial test that shows both working.
I don't have an example in which these methods differ.... Would you like me to try and find one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that there is a test to which we pass "uchime3" as the method; however we technically can't be sure that this method is being implemented by the underlying software without differentiating behavior.
I understand if it's too difficult to contrive input data that shows different expected behavior for the different algorithm methods, but if it is reasonably easy to do so it would be best.
@@ -404,12 +406,17 @@ | |||
'abundances).'), | |||
}, | |||
parameter_descriptions={ | |||
'method': ('Denovo chimera detection based on uchime (Edgar 2011), ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colinvwood While trying to keep this short and sweet, I've added a little more detail.
How does this look?
I think we should cite Rob's papers too. Let me work on adding those two papers to the .bib file...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want me to add these citations to the .bib and link them up, or are we good to go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're referring to each of the citations for each of the algorithm iterations, I think that yes we should do that. If you're referring to "Rob's papers" I'm unsure which ones those are exactly.
Hey @colinbrislawn, not sure if you're waiting on any more input from me but everything looks good to me except for the outstanding tests discussion. |
I'm working to build some good tests, starting from PR2 from the test data. Note: if I knew these algorithms better, I may be able to make mock tests from first principles. But I don't! I may need to put this on hold while I work on other projects. # shorten names
vsearch --fastx_uniques PR2-18S-rRNA-V4.derep.fsa --sizein --sizeout --relabel derep_ --fastaout PR2_short.fsa
# run all 3 methods
vsearch --uchime_denovo PR2_short.fsa --chimeras chi_v1.fasta --uchimeout nonchi_v1.uc &
vsearch --uchime2_denovo PR2_short.fsa --chimeras chi_v2.fasta --uchimeout nonchi_v2.uc &
vsearch --uchime3_denovo PR2_short.fsa --chimeras chi_v3.fasta --uchimeout nonchi_v3.uc &
# inspect for differences
git diff --no-index --word-diff -U0 nonchi_v1.uc nonchi_v2.uc
git diff --no-index --word-diff -U0 nonchi_v1.uc nonchi_v3.uc
git diff --no-index --word-diff -U0 nonchi_v2.uc nonchi_v3.uc
grep 'derep_33;size=86\s' nonchi_v1.uc | head -n 1
grep 'derep_33;size=86\s' nonchi_v2.uc | head -n 1
grep 'derep_33;size=86\s' nonchi_v3.uc | head -n 1 Possible reads from PR2 to use. Note the short names!
Testing
This higher default --abskew threshold should lead to fewer called chimeras, which is what I see! |
Would this work as a minimal test? >uchime
0.0239 derep_33;size=86 derep_2;size=485 derep_5;size=315 derep_5;size=315 100.0 98.1 99.4 97.5 99.4 1 00 3 0 0 0.6 N
>uchime2
0.0239 derep_33;size=86 derep_2;size=485 derep_5;size=315 derep_5;size=315 100.0 98.1 99.4 97.5 99.4 1 00 3 0 0 0.6 Y
>ucmime3
0.0000 derep_33;size=86 * * * * * * * * 0 0 0 0 0 0 * N 1 and 2 report the same score, but different final decision Let me check if this works with only three reads! |
vsearch --fastx_uniques PR2-18S-rRNA-V4.derep.fsa --sizein --sizeout --relabel derep_ --fastaout PR2_short.fsa
vsearch --uchime_denovo PR2_short.fsa --uchimeout nonchi_v1.uc &
vsearch --uchime2_denovo PR2_short.fsa --uchimeout nonchi_v2.uc &
vsearch --uchime3_denovo PR2_short.fsa --uchimeout nonchi_v3.uc &
git diff --no-index --word-diff -U0 nonchi_v1.uc nonchi_v2.uc
git diff --no-index --word-diff -U0 nonchi_v1.uc nonchi_v3.uc
git diff --no-index --word-diff -U0 nonchi_v2.uc nonchi_v3.uc
grep 'derep_16\;' nonchi_v1.uc | head -n 1
grep 'derep_16\;' nonchi_v2.uc | head -n 1
grep 'derep_16\;' nonchi_v3.uc | head -n 1
# Pull derep_2, derep_5, and derep_16 into a new file. Then:
vsearch --uchime_denovo PR2_pull.fsa --quiet --uchimeout - | grep 'derep_16'
vsearch --uchime2_denovo PR2_pull.fsa --quiet --uchimeout - | grep 'derep_16'
vsearch --uchime3_denovo PR2_pull.fsa --quiet --uchimeout - | grep 'derep_16' |
Here's a toy fasta file, which I've named >derep_2;size=485
AGCTCCAATAGCGTATATTAAAGTTGTTGTGGTTAAAAAGCTCGTAGTTGAACCTTGGGCCTGGCTGGCCGGTCCGCCTC
ACCGCGTGCACTGGTCCGGCCGGGCCTTTCCCTCTGTGGAACCCCATACCCTTCACTGGGCGTGGCGGGGAAACAGGACA
TTTACTTTGAAAAAATTAGAGTGCTCCAGGCAGGCCTATGCTCGAATACATTAGCATGGAATAATAAAATAGGACGCGCG
GTTCTATTTTGTTGGTTTATAGGACCGCCGTAATGATTAATAGGGACAGTCGGGGGCATCAGTATTCAACTGTCAGAGGT
GAAATTCTTGGATCAGTTGAAGACTAACTACTGCGAAAGCATTTGCCAAGGATGTTTTCA
>derep_5;size=315
AGCTCCAATAGCGTATATTAAAGTTGTTGTGGTTAAAAAGCTCGTAGTTGAACCTTGGGCCTGGCTGGCCGGTCCGCCTC
ACCGCGTGTACTGGTCCGGCCGGTGAAATTCTTGGATTTATTGAAGACTAACTACTGCGAAAGCATTTGCCAAGGATGTT
TTCA
>derep_16;size=146
AGCTCCAATAGCGTATATTAAAGTTGTTGTGGTTAAAAAGCTCGTAGTTGAACCTTGGGCCTGGCTGGCCGGTCCGCCTC
ACCGCGTGCACTGGTCCGGCCGGTGAAATTCTTGGATTTATTGAAGACTAACTACTGCGAAAGCATTTGCCAAGGATGTT
TTCA Running this file with uchime*_denovo variants produces different results. vsearch --uchime_denovo PR2_pull.fsa --quiet --uchimeout -
vsearch --uchime2_denovo PR2_pull.fsa --quiet --uchimeout -
vsearch --uchime3_denovo PR2_pull.fsa --quiet --uchimeout -
vsearch --uchime_denovo PR2_pull.fsa --quiet --uchimeout -
0.0000 derep_2;size=485 * * * * * * * * 0 0 0 0 0 0* N
0.0000 derep_5;size=315 * * * * * * * * 0 0 0 0 0 0* N
0.0239 derep_16;size=146 derep_2;size=485 derep_5;size=315 derep_5;size=315 100.0 98.1 99.4 97.5 99.4 1 0 0 3 0 0 0.6 N
vsearch --uchime2_denovo PR2_pull.fsa --quiet --uchimeout -
0.0000 derep_2;size=485 * * * * * * * * 0 0 0 0 0 0* N
0.0000 derep_5;size=315 * * * * * * * * 0 0 0 0 0 0* N
0.0239 derep_16;size=146 derep_2;size=485 derep_5;size=315 derep_5;size=315 100.0 98.1 99.4 97.5 99.4 1 0 0 3 0 0 0.6 Y
vsearch --uchime3_denovo PR2_pull.fsa --quiet --uchimeout -
0.0000 derep_2;size=485 * * * * * * * * 0 0 0 0 0 0* N
0.0000 derep_5;size=315 * * * * * * * * 0 0 0 0 0 0* N
0.0000 derep_16;size=146 * * * * * * * * 0 0 0 0 0 0* N I've taken a look at the testing framework, but I can't make head or tails of it. @colinvwood, if you are willing to build that tests for this, I would very much appreciate it! |
WIP to close #92
Add internal functions and tests to use 'uchime' 'uchime2' or 'uchime3'
Open questions:
--abskew
for some or all of these?