MAINT: Wraps up transition from observed_otus to observed_features (#482)
ChrisKeefe authored Aug 20, 2020
1 parent f9708f7 commit 0b1ae30
Showing 5 changed files with 7 additions and 11 deletions.
2 changes: 1 addition & 1 deletion source/citation.rst
@@ -52,7 +52,7 @@ A good methods section
``````````````````````
This methods section was adapted from `Pearson et al. 2019`_ (and shortened in the interest of brevity; see the original publication for a full methods description). Note that each step of analysis is described, including non-default parameter settings, the plugins performing each operation are mentioned, and individual plugins, underlying software, and methods/metrics are cited as appropriate. This paragraph describes most of the steps that are performed in a basic QIIME 2 analysis (e.g., following the :doc:`moving pictures tutorial <tutorials/moving-pictures>`), plus some additional steps; it may be used as a template methods section for similar workflows.

-Microbiome bioinformatics were performed with QIIME 2 2017.4 (Bolyen et al. 2019). Raw sequence data were demultiplexed and quality filtered using the q2‐demux plugin followed by denoising with DADA2 (Callahan et al. 2016) (via q2‐dada2). All amplicon sequence variants (ASVs) were aligned with mafft (Katoh et al. 2002) (via q2‐alignment) and used to construct a phylogeny with fasttree2 (Price et al. 2010) (via q2‐phylogeny). Alpha‐diversity metrics (observed OTUs and Faith's Phylogenetic Diversity (Faith 1992)), beta diversity metrics (weighted UniFrac (Lozupone et al. 2007), unweighted UniFrac (Lozupone et al. 2005), Jaccard distance, and Bray‐Curtis dissimilarity), and Principle Coordinate Analysis (PCoA) were estimated using q2‐diversity after samples were rarefied (subsampled without replacement) to 900 sequences per sample. Taxonomy was assigned to ASVs using the q2‐feature‐classifier (Bokulich et al. 2018a) classify‐sklearn naïve Bayes taxonomy classifier against the Greengenes 13_8 99% OTUs reference sequences (McDonald et al. 2012). We computed the change in direction and magnitude in the first principal co-ordinate axis (PC1) for each subject between their pretreatment and posttreatment samples using q2‐longitudinal (Bokulich et al. 2018b). The average change in PC1 for each treatment group, overall and stratified by sex, was tested for difference from zero using a one‐sample t test with Benjamini‐Hochberg false discovery rate (FDR) correction (Benjamini and Hochberg 1995).
+Microbiome bioinformatics were performed with QIIME 2 2017.4 (Bolyen et al. 2019). Raw sequence data were demultiplexed and quality filtered using the q2‐demux plugin followed by denoising with DADA2 (Callahan et al. 2016) (via q2‐dada2). All amplicon sequence variants (ASVs) were aligned with mafft (Katoh et al. 2002) (via q2‐alignment) and used to construct a phylogeny with fasttree2 (Price et al. 2010) (via q2‐phylogeny). Alpha‐diversity metrics (observed features and Faith's Phylogenetic Diversity (Faith 1992)), beta diversity metrics (weighted UniFrac (Lozupone et al. 2007), unweighted UniFrac (Lozupone et al. 2005), Jaccard distance, and Bray‐Curtis dissimilarity), and Principle Coordinate Analysis (PCoA) were estimated using q2‐diversity after samples were rarefied (subsampled without replacement) to 900 sequences per sample. Taxonomy was assigned to ASVs using the q2‐feature‐classifier (Bokulich et al. 2018a) classify‐sklearn naïve Bayes taxonomy classifier against the Greengenes 13_8 99% OTUs reference sequences (McDonald et al. 2012). We computed the change in direction and magnitude in the first principal co-ordinate axis (PC1) for each subject between their pretreatment and posttreatment samples using q2‐longitudinal (Bokulich et al. 2018b). The average change in PC1 for each treatment group, overall and stratified by sex, was tested for difference from zero using a one‐sample t test with Benjamini‐Hochberg false discovery rate (FDR) correction (Benjamini and Hochberg 1995).


* Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol. 1995;57:289‐300.
6 changes: 2 additions & 4 deletions source/tutorials/moving-pictures.rst
@@ -255,8 +255,6 @@ QIIME 2's diversity analyses are available through the ``q2-diversity`` plugin,
* unweighted UniFrac distance (a qualitative measure of community dissimilarity that incorporates phylogenetic relationships between the features)
* weighted UniFrac distance (a quantitative measure of community dissimilarity that incorporates phylogenetic relationships between the features)

-.. note:: 🏗👷 Some descriptions are changing in QIIME 2's ``diversity`` tools. The phrase **"observed otus" is being replaced with "observed features"**, because "features" better describes the different ways in which users work with non-taxonomic features. This will affect both documentation and (in places) the names of command arguments. You will see both phrases, but these measures of diversity are identical `under the hood <http://scikit-bio.org/docs/latest/generated/skbio.diversity.alpha.observed_otus.html#skbio.diversity.alpha.observed_otus>`_.
-
An important parameter that needs to be provided to this script is ``--p-sampling-depth``, which is the even sampling (i.e. rarefaction) depth. Because most diversity metrics are sensitive to different sampling depths across different samples, this script will randomly subsample the counts from each sample to the value provided for this parameter. For example, if you provide ``--p-sampling-depth 500``, this step will subsample the counts in each sample without replacement so that each sample in the resulting table has a total count of 500. If the total count for any sample(s) are smaller than this value, those samples will be dropped from the diversity analysis. Choosing this value is tricky. We recommend making your choice by reviewing the information presented in the ``table.qzv`` file that was created above. Choose a value that is as high as possible (so you retain more sequences per sample) while excluding as few samples as possible.
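
The rarefaction step described above can be illustrated with a minimal sketch. This is not QIIME 2's implementation; the ``rarefy`` helper, the toy table, and the sample and feature names are all hypothetical, and the sketch only assumes that each sample is a mapping from feature ID to count.

```python
import random
from collections import Counter


def rarefy(table, depth, seed=0):
    """Subsample every sample to `depth` counts without replacement.

    `table` maps sample ID -> {feature ID: count}. Samples whose total
    count is below `depth` are dropped. Hypothetical helper, for
    illustration only.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample_id, counts in table.items():
        # Expand the counts into one entry per observed sequence.
        pool = [feature for feature, n in counts.items() for _ in range(n)]
        if len(pool) < depth:
            continue  # too few sequences: the sample is excluded
        # random.sample draws without replacement.
        rarefied[sample_id] = dict(Counter(rng.sample(pool, depth)))
    return rarefied


# Toy feature table (made-up counts).
toy_table = {
    "sample-a": {"f1": 400, "f2": 300, "f3": 50},
    "sample-b": {"f1": 20, "f2": 5},
    "sample-c": {"f2": 500},
}
rarefied = rarefy(toy_table, depth=500)
```

With a depth of 500, ``sample-b`` (25 sequences in total) is dropped, while the two remaining samples each end up with exactly 500 counts. Raising the depth keeps more sequences per sample but excludes more samples, which is the trade-off discussed above.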

.. question::
@@ -368,10 +366,10 @@ The bottom plot in this visualization is important when grouping samples by meta
The value that you provide for ``--p-max-depth`` should be determined by reviewing the "Frequency per sample" information presented in the ``table.qzv`` file that was created above. In general, choosing a value that is somewhere around the median frequency seems to work well, but you may want to increase that value if the lines in the resulting rarefaction plot don't appear to be leveling out, or decrease that value if you seem to be losing many of your samples due to low total frequencies closer to the minimum sampling depth than the maximum sampling depth.

.. question::
-When grouping samples by "body-site" and viewing the alpha rarefaction plot for the "observed_otus" metric, which body sites (if any) appear to exhibit sufficient diversity coverage (i.e., their rarefaction curves level off)? How many sequence variants appear to be present in those body sites?
+When grouping samples by "body-site" and viewing the alpha rarefaction plot for the "observed_features" metric, which body sites (if any) appear to exhibit sufficient diversity coverage (i.e., their rarefaction curves level off)? How many sequence variants appear to be present in those body sites?

.. question::
-When grouping samples by "body-site" and viewing the alpha rarefaction plot for the "observed_otus" metric, the line for the "right palm" samples appears to level out at about 40, but then jumps to about 140. What do you think is happening here? (Hint: be sure to look at both the top and bottom plots.)
+When grouping samples by "body-site" and viewing the alpha rarefaction plot for the "observed_features" metric, the line for the "right palm" samples appears to level out at about 40, but then jumps to about 140. What do you think is happening here? (Hint: be sure to look at both the top and bottom plots.)


.. _`moving pics taxonomy`:
4 changes: 1 addition & 3 deletions source/tutorials/pd-mice.rst
@@ -309,7 +309,7 @@ By default, 10 rarefied tables are calculated at each sampling depth to provide
--p-min-depth 10 \
--p-max-depth 4250

-The visualization file will display two plots. The upper plot will display the alpha diversity (observed OTUs or shannon) as a function of the sampling depth. This is used to determine whether the richness or evenness has saturated based on the sampling depth. The rarefaction curve should “level out” as you approach the maximum sampling depth. Failure to do so, especially with a diversity-only metric such as observed OTUs or Faith's PD diversity, may indicate that the richness in the samples has not been fully saturated.
+The visualization file will display two plots. The upper plot will display the alpha diversity (observed features or shannon) as a function of the sampling depth. This is used to determine whether the richness or evenness has saturated based on the sampling depth. The rarefaction curve should “level out” as you approach the maximum sampling depth. Failure to do so, especially with a diversity-only metric such as observed features or Faith's PD diversity, may indicate that the richness in the samples has not been fully saturated.

The second plot shows the number of samples in each metadata category group at each sampling depth. This is useful to determine the sampling depth where samples are lost, and whether this may be biased by metadata column group values. Remember that rarefaction is a two-step process and samples that do not meet the rarefaction depth are filtered out of the table. We can use the curves to look at the number of samples by different metadata columns.
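
As a rough illustration of what the two plots summarize, the sketch below computes the mean number of observed features per sample at several depths, along with the number of samples retained at each depth. It is a simplified stand-in, not the ``alpha-rarefaction`` implementation; ``rarefaction_curve`` and the toy data are hypothetical, and the 10 iterations mirror the default number of rarefied tables mentioned above.

```python
import random


def rarefaction_curve(table, depths, iterations=10, seed=0):
    """Mean observed features per sample at each sampling depth.

    `table` maps sample ID -> {feature ID: count}. A sample is skipped at
    any depth greater than its total count. Hypothetical helper, for
    illustration only.
    """
    rng = random.Random(seed)
    curve = {}
    for depth in depths:
        per_sample = {}
        for sample_id, counts in table.items():
            pool = [f for f, n in counts.items() for _ in range(n)]
            if len(pool) < depth:
                continue  # sample is lost at this depth
            # Observed features = distinct features in the subsample.
            richness = [
                len(set(rng.sample(pool, depth))) for _ in range(iterations)
            ]
            per_sample[sample_id] = sum(richness) / iterations
        curve[depth] = per_sample
    return curve


# Toy data (made-up counts).
toy_table = {
    "gut": {"f1": 60, "f2": 30, "f3": 10},
    "palm": {"f1": 5, "f4": 5},
}
curve = rarefaction_curve(toy_table, depths=[5, 10, 50, 100])
# Counterpart of the lower plot: samples remaining at each depth.
samples_retained = {depth: len(vals) for depth, vals in curve.items()}
```

Here ``curve`` corresponds to the upper plot and ``samples_retained`` to the lower one: the ``palm`` sample has only 10 sequences, so it disappears from every depth above 10, and any group average beyond that point is computed from fewer samples.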

@@ -365,8 +365,6 @@ We'll start by using the ``qiime diversity core-metrics-phylogenetic`` method, w
- Unweighted UniFrac distance
- Weighted UniFrac distance

-.. note:: 🏗👷 Some descriptions are changing in QIIME 2's ``diversity`` tools. The phrase **"observed otus" is being replaced with "observed features"**, because "features" better describes the different ways in which users work with non-taxonomic features. This will affect both documentation and (in places) the names of command arguments. You will see both phrases, but these measures of diversity are identical `under the hood <http://scikit-bio.org/docs/latest/generated/skbio.diversity.alpha.observed_otus.html#skbio.diversity.alpha.observed_otus>`_.
-
There is a very good discussion of diversity metrics and their meanings in a `forum post by Stephanie Orchanian`_.

The ``qiime diversity core-metrics-phylogenetic`` method wraps several other methods, and it's worthwhile to note that the steps can also be executed independently.
4 changes: 2 additions & 2 deletions source/tutorials/phylogeny.rst
@@ -12,7 +12,7 @@ Inferring phylogenies
=====================
Several downstream diversity metrics, available within QIIME 2, require that a
phylogenetic tree be constructed using the Operational Taxonomic Units
-(`OTUs`_) or Exact Sequence Variants (`ESVs`_) being investigated.
+(`OTUs`_) or Amplicon Sequence Variants (`ASVs`_) being investigated.

*But how do we proceed to construct a phylogeny from our sequence data?*

@@ -563,7 +563,7 @@ This can all be accomplished by simply running the following:
**Congratulations! You now know how to construct a phylogeny in QIIME 2!**

.. _OTUs: https://en.wikipedia.org/wiki/Operational_taxonomic_unit
-.. _ESVs: https://doi.org/10.1038/ismej.2019.119
+.. _ASVs: https://doi.org/10.1128%2FmSystems.00191-16
.. _fragment insertion: https://doi.org/10.1128/mSystems.00021-18
.. _fragment insertion examples: https://library.qiime2.org/plugins/q2-fragment-insertion/16/
.. _phylogeny: https://simple.wikipedia.org/wiki/Phylogeny
@@ -146,7 +146,7 @@ The :ref:`overview tutorial <Denoising>` provides more in-depth discussion of th
Regardless of how you group your sequences, the grouping methods will output:

1. A list of representative sequences for each of your OTUs and/or ASVs (QIIME 2 data format ``FeatureData[Sequence]``), and
-2. A feature table which indicates how many reads of each OTU/sequence variants were observed in each sample. (QIIME 2 data format ``FeatureTable[Frequency]``)
+2. A feature table which indicates how many reads of each OTU/sequence variant were observed in each sample. (QIIME 2 data format ``FeatureTable[Frequency]``)

DADA2 and deblur will also produce a stats summary file with useful information regarding the filtering and denoising.
