Finding differentially abundant features between metagenomic and metatranscriptomic datasets #80

johnne · 2025-02-01T06:31:54Z

Thanks for a great package and several useful papers on compositional data analysis!

I'm working with a time series of samples for which metagenomic (mg) and metatranscriptomic (mt) data have been generated using Illumina sequencing. Reads from all samples have been mapped to the same features: a collection of genes predicted in 806 Metagenome Assembled Genomes (MAGs). There are 84 samples in total, taken at the same location on different dates over three years, and for 19 sampling dates there is both mg and mt data.

In one part of the project I'm trying to identify MAGs that have a significantly different proportion in one dataset compared to the other, that is, MAGs whose transcriptional activity is significantly higher or lower relative to their metagenomic abundance. I read the Quinn et al. field guide paper and found the section on vertical data integration highly interesting, as it sounds like what I'm trying to do:

use ALDEx2 to find features where mRNA abundance changes more than protein abundance, relative to a common reference (and vice versa).

What I've done so far is to use the ALDEx2 package with this setup:

1. Subset both datasets to the 19 dates with paired omics data.
2. Sum raw counts for each MAG in each sample.
3. Create a condition vector with the omics type ('mg' and 'mt').
4. Run a modular ALDEx2 analysis on the 806 x 38 count matrix, including CLR transformation, calculation of paired t-test statistics and effect size estimation.
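
To make the steps above concrete, here is a minimal NumPy sketch of the same logic. It is not ALDEx2: ALDEx2 works from Monte Carlo Dirichlet instances, whereas this sketch uses a fixed pseudocount (an assumption made only to keep the example short). The count matrix is simulated, and the names `counts`, `conditions`, `clr` and `paired_t` are hypothetical. The pairing assumes the mg and mt columns are in the same date order.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mags, n_dates = 806, 19

# Simulated stand-in for the real table: counts already summed per MAG,
# with 19 mg columns followed by the 19 mt columns from the same dates.
counts = rng.poisson(lam=50, size=(n_mags, 2 * n_dates))
conditions = np.array(["mg"] * n_dates + ["mt"] * n_dates)

def clr(counts, pseudocount=0.5):
    """Per-sample CLR: log2 counts minus the mean log2 count of that sample."""
    logx = np.log2(counts + pseudocount)
    return logx - logx.mean(axis=0, keepdims=True)

def paired_t(clr_values, conditions):
    """Paired t statistic per MAG on the mt - mg CLR differences."""
    diff = clr_values[:, conditions == "mt"] - clr_values[:, conditions == "mg"]
    n = diff.shape[1]
    mean_diff = diff.mean(axis=1)
    t_stat = mean_diff / (diff.std(axis=1, ddof=1) / np.sqrt(n))
    return mean_diff, t_stat

clr_values = clr(counts)
mean_diff, t_stat = paired_t(clr_values, conditions)
```

Here `mean_diff` is the per-MAG mean difference in CLR values (mt minus mg) on the log2 scale. Multiple-testing correction and effect-size estimation are left out of the sketch.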

Using this approach I've managed to identify a number of MAGs that appear to be differentially abundant between datasets (see attached figure), using criteria for log2 fold change, q-values and 95% confidence intervals.

My question is: Is this a valid method, or are there modifications that should be made? Specifically, I'm wondering whether I should perform the CLR transformation differently, since the transformed values are currently computed with respect to the full dataset (mg + mt). Should I instead CLR transform each dataset separately before running ALDEx2?
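
One way to probe this question is the small check below. It assumes the CLR denominator is the geometric mean over all features within each sample, in which case the transform is computed column by column and does not depend on which other samples are in the matrix. Denominators chosen using information across samples or groups may behave differently. The data are simulated and the `clr` helper with its pseudocount is hypothetical, not ALDEx2's implementation.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Per-sample CLR using all features of that sample as the reference."""
    logx = np.log2(counts + pseudocount)
    return logx - logx.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
mg = rng.poisson(50, size=(806, 19))  # simulated metagenomic counts
mt = rng.poisson(20, size=(806, 19))  # simulated metatranscriptomic counts

# CLR on the combined 806 x 38 matrix versus CLR on each dataset separately.
joint = clr(np.hstack([mg, mt]))
separate = np.hstack([clr(mg), clr(mt)])

same = np.allclose(joint, separate)
```

Under that assumption the two routes give the same values, so `same` is true and any difference between the two approaches would have to come from other steps in the analysis.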

I understand that answering questions related to specific projects is asking a lot, but any insights or suggestions that you feel you have the time to offer would be greatly appreciated.

All the best,
John
