
betta() and betta_random() p-value between each combination of discrete variable #198

Open · AlexaBennett opened this issue Nov 13, 2024 · 5 comments

Comments

@AlexaBennett

I am hoping you can point me in the right direction. I was following the diversity-hypothesis-testing vignette, and wanted to know the mathematically sound way to test for statistically significant differences between all pairs of groups. In the case of the tutorial, this would be the diversity at day 0 vs day 12, day 0 vs day 82, and day 12 vs day 82.

@adw96
Owner

adw96 commented Nov 13, 2024

Thanks @AlexaBennett, great question. Before I dive into checking for you, do you have any other adjustment variables (in addition to day)? That would help me give you a better answer. Thanks!!

@AlexaBennett
Author

AlexaBennett commented Nov 13, 2024

Right now, I am using a single adjustment variable. I plan to add another discrete variable.

Regarding the single variable, here is what I have tried.

  1. My first attempt was to subset the data so that the previous intercept group was removed, and then use the p.adjust() function from the stats package to correct the p-values. This strategy was meant to avoid re-analyzing the relationship between two groups that had already been compared. However, I noticed that subsetting can change the estimates, errors, and p-values of the remaining groups, and in my data this could change which results are significant.
  2. Do not subset, but instead set the factor levels so that a different group is forced into the intercept (a minimal sketch of what I mean is below). This produces more stable estimates, which in my data begin to vary only around the 1e-06 position, and relatively stable errors for the groups that are not the intercept. However, the intercept's error tends to either increase or decrease. The p-values between two groups (no data removed, only factor levels re-ordered on the discrete variable) can also differ markedly. For example, the p-value for the A-B comparison with A as the intercept was 0.0000002, while for B-A with B as the intercept it was 0.3866848. Striking that example out, since I noticed one of my groups had lost too many members for a valid statistical comparison; with that corrected, the p-value for the A-B comparison with A as the intercept was 0, while for B-A with B as the intercept it was 0.0001250.
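
Here is a minimal sketch of attempt 2, assuming a tibble laid out like combined_richness in the vignette (estimate and error columns plus a discrete Group column; the names here are placeholders for my real data):

library(breakaway)
library(dplyr)

# Force group "B" into the intercept by re-ordering the factor levels;
# no samples are removed, only the reference level changes.
releveled <- combined_richness %>%
  mutate(Group = relevel(factor(Group), ref = "B"))
bt_releveled <- betta(formula = estimate ~ Group,
                      ses = error, data = releveled)
bt_releveled$table  # estimates, standard errors, and p-values relative to group B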

@AlexaBennett
Author

Given the currently available functions and the stability of results for each intercept, this is my planned approach:

  1. Calculate the estimates with each group as the intercept
  2. Plot the estimates and errors using the median value obtained for each group from all iterations of the above
  3. Take the max p-value obtained for each A-B comparison
  4. Apply p-value correction

I am open to a more statistically sound methodology as my background is in Biology and not Statistics.
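
Roughly, I am imagining something like the following sketch (placeholders again: a combined_richness tibble with estimate and error columns and a discrete Group column stored as character; the row and column names of $table, such as "GroupB" and "p-values", are assumed from vignette-style output and would need adjusting for real data):

library(breakaway)
library(dplyr)
library(purrr)

groups <- unique(combined_richness$Group)

# Step 1: refit betta() once with each group forced into the intercept
# and keep the coefficient table from each parameterization.
tables <- map(groups, function(ref) {
  dat <- combined_richness %>%
    mutate(Group = relevel(factor(Group), ref = ref))
  betta(formula = estimate ~ Group, ses = error, data = dat)$table
})
names(tables) <- groups

# Step 3: for each pair A-B, the "GroupB" row of the fit with A as the
# reference tests A vs B; keep the larger of the two p-values obtained
# from the two parameterizations. (Step 2, plotting the median estimates
# and errors across parameterizations, is omitted here.)
pair_pvals <- combn(groups, 2, simplify = FALSE) %>%
  map_dbl(function(pair) {
    p_ab <- tables[[pair[1]]][paste0("Group", pair[2]), "p-values"]
    p_ba <- tables[[pair[2]]][paste0("Group", pair[1]), "p-values"]
    max(p_ab, p_ba)
  })

# Step 4: multiplicity correction across all pairwise comparisons.
p.adjust(pair_pvals, method = "BH")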

@adw96
Owner

adw96 commented Nov 14, 2024

These are great questions and thanks for circling back, @AlexaBennett. Fortunately, there is an easier way than iterating through various model parameterizations. As you suggest, let's suppose we want to test that the average richness is equal for all days, i.e.,
average richness (Day 0) = average richness (Day 12) = average richness (Day 82).

In this case, we can look at the output of global as follows:

library(tidyverse)
library(breakaway)
library(corncob)
library(phyloseq)
data("soil_phylo")
soil_phylo %>% sample_data %>% head
subset_soil <- soil_phylo %>%
  subset_samples(Amdmt == 1) # only biochar
richness_soil <- subset_soil %>% breakaway
meta <- subset_soil %>%
  sample_data %>%
  as_tibble %>%
  mutate("sample_names" = subset_soil %>% sample_names )
combined_richness <- meta %>%
  left_join(summary(richness_soil),
            by = "sample_names")
bt_day_fixed <- betta(formula = estimate ~ Day, 
                      ses = error, data = combined_richness)
bt_day_fixed$table
bt_day_fixed$global[2] ## p-value for strong null

and in this case we would reject the null (p < 0.001).

This approach will test your desired hypothesis as long as you have no other variables in the model. It will take a bit of work on the backend for us to build that for you -- but it's work worth doing if you need it! Before we do that, can you confirm that you want to test "average richness is equal across all days with the same amendment" using something like the following:

subset_soil2 <- soil_phylo %>%
  subset_samples(Amdmt %in% c(0, 1)) # biochar and unmodified
richness_soil2 <- subset_soil2 %>% breakaway
meta2 <- subset_soil2 %>%
  sample_data %>%
  as_tibble %>%
  mutate("sample_names" = subset_soil2 %>% sample_names )
combined_richness2 <- meta2 %>%
  left_join(summary(richness_soil2),
            by = "sample_names")
bt_day_fixed2 <- betta(formula = estimate ~ Day + Amdmt, 
                      ses = error, data = combined_richness2)

If so, I'll see what we can do.

@AlexaBennett
Author

@adw96 This is what I am looking for, with the added ability to see which days are statistically different from one another within each amendment.

While not the original point of this issue, the instability of the standard error of the estimate for the intercept is a higher priority for me at this time. Should I open another issue?

With your example, I was setting factor levels for "Day" to get the final p-value for the relationship between day 12 and day 82. That was when I realized, in my real-life data, that the feature used as the intercept in $table was susceptible to its error decreasing by up to 75% in some cases.
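
As a rough illustration of what I mean (reusing the combined_richness object from your example, and assuming the intercept is the first row of $table):

# Refit with each Day level forced into the intercept and print the
# intercept row (estimate, standard error, p-value) from each fit.
for (ref in unique(combined_richness$Day)) {
  dat <- combined_richness
  dat$Day <- relevel(factor(dat$Day), ref = ref)
  fit <- betta(formula = estimate ~ Day, ses = error, data = dat)
  cat("Reference level:", ref, "\n")
  print(fit$table[1, ])
}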
