# Common Mistakes in Statistics
#~ by Dr. Wolfgang Forstmeier
#~ 2022 09 27
# General Introduction ----
#~ Statistical testing is used to distinguish "significant" patterns from
#~ random noise. P-values are supposed to tell us how often a pattern of a
#~ certain strength or magnitude (i.e. effect size for a given N) arises by
#~ chance alone. However, in practice, there are many mistakes one can make
#~ during data analysis, and many of them result in anti-conservative
#~ p-values. Hence, we often believe that we have discovered a real effect
#~ when in fact it was just a matter of chance (i.e. a false-positive
#~ finding). In my experience, anti-conservative p-values (or, likewise,
#~ confidence intervals that are too narrow) are a major reason why many
#~ research findings cannot be replicated between studies. Accordingly,
#~ meta-analyses show high levels of heterogeneity in effect sizes between
#~ studies. Here we will work through the most common and most serious
#~ mistakes. If you learn to avoid these mistakes, you will less often be
#~ fooled by the data and less often waste your time on follow-up studies of
#~ phenomena that do not exist. If you also learn to recognize these mistakes
#~ in published studies, you will less often be misled by the mistakes of
#~ others. More details can be found in Forstmeier et al. (2017).
# 0. Set-up -----
#~ Before we start, we need to install and load some R packages.
#~ If the packages are not yet installed, uncomment the 'install.packages'
#~ line below (remove the leading #) and run it (Run button in RStudio).
# install.packages(c("lme4", "lmerTest", "ggplot2", "pedigreemm", "data.table", "magrittr"))
library(lme4)
library(lmerTest)
library(ggplot2)
library(pedigreemm)
library(data.table)
library(magrittr)
# 1. The issue of multiple testing -----
#~ Introduction to multiple testing ----
#~ Multiple hypothesis testing (and failing to realize that one did numerous
#~ tests before finding something significant) is a major source of false-
#~ positive findings. If you conduct 20 tests of true null hypotheses, on
#~ average one of them will come out significant at p<0.05 by chance alone.
#~ Hence, it is essential to keep track of how many tests were
#~ conducted, which includes many variants of "giving the data a second chance",
#~ such as adding more data, adding covariates, categorizing variables,
#~ transforming variables, removing outliers, removing a treatment category etc.
#~ Such "playing with data until reaching significance" is known as p-hacking,
#~ and this is so well-known (and frowned upon) that we do not need to cover
#~ this here.
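#~ A quick illustration of the scale of the problem (my own addition): with 20
#~ independent tests of true null effects, the chance that at least one comes
#~ out at p<0.05 is 1 - 0.95^20, i.e. about 64%.
1 - 0.95^20
#~ A small simulation of batches of 20 two-sample t-tests on pure noise
#~ confirms this:
set.seed(1)
mean(replicate(2000, any(replicate(20, t.test(rnorm(10), rnorm(10))$p.value) < 0.05)))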
#~ Yet what is sometimes overlooked is what we have called "cryptic multiple
#~ hypothesis testing" during model selection (Forstmeier & Schielzeth 2011).
#~ Below, you will experience a drastic example, which works so well because the
#~ initial full model is massively over-fitted (i.e. too many predictors for
#~ just a few data points). Such over-fitting can be considered a statistical
#~ crime. Yet it may be "the perfect crime", one that leaves no traces if you
#~ do not honestly report how you arrived at your minimal model. Often, people
#~ have simply presented the minimal model and claimed that its results
#~ confirm their initial hypotheses.
#~ Simulations of model selection ----
#~ Let us assume we have 30 observations and 6 predictors
N_OBS = 30
N_MAIN_EFFECTS = 6
# First, we set a seed to make randomly generated data reproducible
set.seed(7)
#~ Now we randomly generate a data.table with N_OBS rows and N_MAIN_EFFECTS + 1
#~ columns. All values are from a normal distribution.
mydata = matrix(data = rnorm(N_OBS * (N_MAIN_EFFECTS + 1)),
                nrow = N_OBS, ncol = N_MAIN_EFFECTS + 1) %>% data.table()
#~ The first column is the dependent variable Y, the remaining columns are the
#~ predictors A, B, C...
colnames (mydata) = c("Y", LETTERS[1:N_MAIN_EFFECTS])
mydata
#~ Now we fit a full model that tries to explain Y by the 6 main effects (A-F)
#~ and by the 15 (two-way) interactions.
#~ We then automatically simplify this "full model" by always removing the least
#~ significant term until we get a "minimal model"
source("Fun_Model_Simplifier.R")
Model_Simplifier()
#~ The high significance of the minimal model (p=0.005551) and of its terms
#~ (down to p=0.000613) is remarkable, given that all variables were pure
#~ random noise.
#~ Let's try this again with different seeds
Model_Simplifier(N_OBS = 30, N_MAIN_EFFECTS = 6, MYSEED = 8)
Model_Simplifier(N_OBS = 30, N_MAIN_EFFECTS = 6, MYSEED = 9)
Model_Simplifier(N_OBS = 30, N_MAIN_EFFECTS = 6, MYSEED = 10)
Model_Simplifier(N_OBS = 30, N_MAIN_EFFECTS = 6, MYSEED = 11)
Model_Simplifier(N_OBS = 30, N_MAIN_EFFECTS = 6, MYSEED = 12)
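#~ The actual simplification routine lives in 'Fun_Model_Simplifier.R' (not
#~ shown here). As an illustration only, here is a minimal sketch of what such
#~ a backward-elimination routine might look like; this is my reconstruction,
#~ not the original function (it uses drop1(), which respects marginality,
#~ whereas the original may drop terms by their summary() p-values):
Model_Simplifier_sketch <- function(N_OBS = 30, N_MAIN_EFFECTS = 6, MYSEED = 7) {
  set.seed(MYSEED)
  # simulate pure noise: one response Y plus N_MAIN_EFFECTS predictors
  dat <- data.frame(matrix(rnorm(N_OBS * (N_MAIN_EFFECTS + 1)), nrow = N_OBS))
  colnames(dat) <- c("Y", LETTERS[1:N_MAIN_EFFECTS])
  fit <- lm(Y ~ .^2, data = dat) # all main effects and all two-way interactions
  repeat {
    d1 <- drop1(fit, test = "F")             # candidate terms to remove
    ok <- rownames(d1) != "<none>"
    pv <- d1[["Pr(>F)"]][ok]
    if (length(pv) == 0 || max(pv, na.rm = TRUE) < 0.05) break
    worst <- rownames(d1)[ok][which.max(pv)] # least significant term
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  summary(fit) # the "minimal model"
}
Model_Simplifier_sketch(MYSEED = 7)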
#~ Lesson 1 ------
#~ Automatic model simplification comes with a considerable burden of multiple
#~ testing, especially if the initial full model was over-fitted (N < 3k, where
#~ k is the number of estimated parameters). Minimal models often look
#~ convincing, but the problematic history of getting there is easily forgotten.
#~ Exploratory testing of many interaction terms is generally discouraged.
#~ Realistically, one can argue that interactions require 16 times the sample
#~ size needed for main effects
#~ (https://statmodeling.stat.columbia.edu/2018/03/15/need-16-times-sample-size-estimate-interaction-estimate-main-effect/).
#~ If N is small (e.g. N < 30), it is better to test each predictor singly
#~ (illustrated below).
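#~ For illustration (my own addition), this is what testing each predictor
#~ singly could look like, using the seed-7 noise data 'mydata' from above;
#~ the Holm correction at the end accounts for running 6 separate tests:
single_p <- sapply(LETTERS[1:N_MAIN_EFFECTS], function(x)
  summary(lm(reformulate(x, response = "Y"), data = mydata))$coefficients[2, 4])
round(single_p, 3)
p.adjust(single_p, method = "holm")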
# 2. Pseudoreplication -----
#~ Introduction to pseudoreplication ----
#~ In my opinion, pseudoreplication is the most underestimated problem in the
#~ reproducibility crisis. P-values are calculated under the assumption that
#~ the data points are independent of each other (unless you specify existing
#~ dependencies in your model). Yet, in reality there are many sources of non-
#~ independence: repeated measures from the same individual, individuals that
#~ are genetically related, samples that are geographically structured,
#~ observations that are related to each other in time. Failing to account for
#~ such dependencies means overestimating the effective sample size and hence
#~ the confidence in the findings.
# 2A. Getting the basic idea -----
#~ Let's create two samples (groups A and B) of 10 human height measurements
#~ that are drawn from the same (!) normal distribution of mean 165 and sd 10
set.seed(20220721)
sample1 <- data.frame(
Group = rep("A", 10),
IndID = 1:10,
Height = rnorm(n = 10, mean = 165, sd = 10))
sample1
sample2 <- data.frame(
Group = rep("B", 10),
IndID = 11:20,
Height = rnorm(n = 10, mean = 165, sd = 10))
sample2
dataset <- rbind(sample1, sample2)
dataset
boxplot(dataset$Height ~ dataset$Group)
#~ Due to sampling noise, the two groups A and B will never be identical.
#~ Let's test whether the average height differs significantly between A and B
model1 <- lm(Height ~ Group, data = dataset)
summary(model1)
#~ In the model output, the intercept represents Group A (on average 168.13 cm)
#~ and group B differs from that by -4.565 cm (being shorter).
#~ Yet, the difference is clearly non-significant (p=0.357).
#~ Now, the researchers decide that there is too much noise in the data, which
#~ prevents them from seeing the "truth" that group B is in fact shorter.
#~ To increase precision, the researchers hence measure each person 10 times.
#~ For now, let's assume that the 10 repeated measures turn out exactly the same.
#~ So we essentially copy the same data 10 times into a large table that
#~ contains all 20 x 10 = 200 measurements.
dataset10 <- rbind(dataset, dataset, dataset, dataset, dataset, dataset, dataset, dataset, dataset, dataset)
### a cleaner way to write this: dataset10 <- do.call("rbind", replicate(10, dataset, simplify = FALSE))
#~ Now we test again whether the average height is different between the groups,
#~ with each individual measured 10 times (identical values!).
model2 <- lm(Height ~ Group, data = dataset10)
summary(model2)
#~ Note that the parameter estimates are still the same (168.13 and -4.565)
#~ but the Standard Errors have become much smaller (higher confidence)
#~ and hence the same difference of 4.565 cm is judged as highly significant
#~ (p=0.002). This is because the model assumes that there are N=100 independent
#~ samples in each group, and with such a high N, a difference of 4.565 cm
#~ would not easily arise by chance.
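#~ A quick check (added for illustration) makes the mechanics visible: copying
#~ each value 10 times shrinks the Standard Error of the group difference by
#~ roughly sqrt(10) = 3.16 (not exactly, because the residual degrees of
#~ freedom change too), while the estimate itself stays the same:
summary(model1)$coefficients["GroupB", "Std. Error"] /
  summary(model2)$coefficients["GroupB", "Std. Error"]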
#~ Lesson 2A -----
#~ This is the meaning of the term "pseudoreplication": repeated measurements
#~ are treated as if they were truly independent samples.
#~ The assumption of independence of data points is violated, and
#~ p-values become simply incorrect:
#~ SEs (and CIs) are too small, so confidence in the difference is too high.
# 2B. A realistic example with measurement error (replicates are not identical) -----
#~ This time we add a little bit of measurement error to each measurement
#~ (with a mean of 0 and SD of 0.5 cm)
dataset10$SmallNoise <- rnorm(200, 0, 0.5)
dataset10$HeightWithSmallNoise <- dataset10$Height + dataset10$SmallNoise
View(dataset10)
#~ Again we test whether the average height differs between the groups,
#~ where each individual was measured 10 times with some error
model3 <- lm(HeightWithSmallNoise ~ Group, data = dataset10)
summary(model3)
#~ Note that the estimates have changed slightly (compared to model 2) due to
#~ the added noise, but the conclusions are the same (p=0.00199)
#~ Now, in such realistic data sets, we can start specifying how the data points
#~ are non-independent. We specify individual identity (IndID) as a random
#~ effect in a mixed effect model, thereby allowing each individual to have its
#~ own average phenotype ("random intercepts")
model4 <- lmer(HeightWithSmallNoise ~ Group + (1|IndID), data = dataset10)
summary(model4)
#~ Note that this gets us back to the correct p-value (p=0.357),
#~ very close to the one we had before adding replicates.
#~ Note from the random-effects output that nearly all variance is explained
#~ by IndID (116.7704) compared to the Residual (0.2427), which is the
#~ measurement error that we added to each value of height.
#~ Lesson 2B -----
#~ Fitting individual ID as a random intercept takes care of the
#~ repeated measurements per individual and restores the correct p-value
# 2C. A real example: Is egg mass affected by treatment? -----
#~ This is about fitting random intercepts vs. random slopes. -----
#~ For this we load and inspect some data on egg mass in relation to treatment
d45 <- read.table("dataFig45.txt", sep='\t', header=T)
#~ Female ID and Trt are numeric values that should be considered as factors
d45$Female_ID <- as.factor(d45$Female_ID)
d45$Trt <- as.factor(d45$Trt)
ggplot(data=d45, aes(x=Female_ID, y=Egg_mass, group=Trt, colour=Trt)) +
geom_point(size=2)+
labs(x="Female ID", y="Egg mass")+
theme_classic() +
theme(panel.border = element_blank(),
axis.line.x = element_line(size = 0.5, linetype = "solid", colour = "black"),
axis.line.y = element_line(size = 0.5, linetype = "solid", colour = "black")) +
theme(axis.text = element_text (size=10), axis.title = element_text (face="bold",size=13)) +
scale_colour_manual(values = c("1" = "blue","2" = "orange"),labels=c("Reduced", "Enhanced"), name="Treatment")
View(d45)
#~ The plot illustrates that we have 5 eggs per female and 6 females in each
#~ of the two treatment groups.
#~ We want to test whether the treatment had an effect on egg mass.
#~ If we forget that the data is pseudoreplicated, and that we do not have 30
#~ independent eggs of each type, but rather 6 females of each type, the
#~ following incorrect model shows a significant treatment effect
mod_egg1 <- lm(Egg_mass ~ Trt, data = d45)
summary(mod_egg1)
#~ When we specify Female_ID as a random intercept effect, the significance is
#~ gone.
mod_egg2 <- lmer(Egg_mass ~ Trt + (1|Female_ID), data = d45)
summary(mod_egg2)
#~ Note the random effects structure of the model. About half of the variance
#~ is explained by female identity
#~ Maybe we failed to see a treatment effect on egg mass because we ignored
#~ information about the order in which the 5 eggs of each female were laid.
#~ So let's plot egg size in relation to laying order.
ggplot(data=d45, aes(x = Laying_order, y = Egg_mass, group = Trt, colour = Trt)) +
geom_point(size=2)+
geom_smooth(size = 2, method='lm', se = FALSE) +
xlab("Laying order")+ ylab("Egg mass")+
theme_classic() +
theme(panel.border = element_blank(),
axis.line.x = element_line(size = 0.5, linetype = "solid", colour = "black"),
axis.line.y = element_line(size = 0.5, linetype = "solid", colour = "black")) +
theme(axis.title = element_text (face="bold",size=13)) +
theme(axis.text = element_text (size=10)) +
scale_colour_manual(values = c("1" = "blue","2" = "orange"),labels=c("Reduced", "Enhanced"), name="Treatment")
#~ The plot suggests that egg mass was fairly constant over the laying sequence
#~ in the "Reduced" treatment group, while egg mass increased over the sequence
#~ in the "Enhanced" group. Let's test whether the interaction between treatment
#~ and laying sequence is significant!
mod_egg3 <- lmer(Egg_mass ~ Trt*Laying_order + (1|Female_ID), data = d45)
summary(mod_egg3)
#~ The model suggests that the slopes are significantly different (p=0.0479).
#~ Note the random effects structure: we have accounted for the fact that
#~ females have different intercepts (i.e. lay eggs of different mean size), but
#~ is this sufficient, if our interest lies in a slope of change of egg mass
#~ over the laying sequence?
#~ Let's plot what each of the 12 females did in this respect!
ggplot(data=d45, aes(x = Laying_order, y = Egg_mass, group = Trt, colour = Trt)) +
geom_point(size=2)+
geom_smooth(aes(group = Female_ID), method = "lm", se = FALSE) +
geom_smooth(size = 3, method='lm', se = FALSE) +
xlab("Laying order")+ ylab("Egg mass")+
theme_classic() +
theme(panel.border = element_blank(),
axis.line.x = element_line(size = 0.5, linetype = "solid", colour = "black"),
axis.line.y = element_line(size = 0.5, linetype = "solid", colour = "black")) +
theme(axis.title = element_text (face="bold",size=13)) +
theme(axis.text = element_text (size=10)) +
scale_colour_manual(values = c("1" = "blue","2" = "orange"),labels=c("Reduced", "Enhanced"), name="Treatment")
#~ Seeing it this way, you may wonder whether the 6 slopes in orange are in fact
#~ steeper than the 6 slopes in blue.
#~ Hence, let's extract the 12 regression slopes manually.
#~ We create a data set with 1 row per female, with their appropriate treatment
d45_F <- unique(d45[,c('Female_ID', 'Trt')]) # one line per female
d45_F$slope <- NA
d45_F
#~ We then calculate the slope for female 1
mod_F1 <- lm(Egg_mass ~ Laying_order, data = d45[d45$Female_ID == "1",])
summary(mod_F1)
summary(mod_F1)$coeff[2,1]
#~ And we add her value to the data set
d45_F$slope[1] <- summary(mod_F1)$coeff[2,1]
d45_F
#~ Using a for-loop, we now do this for each female
for (i in 1:12){
j <- as.character(i)
mod_F <- lm(Egg_mass ~ Laying_order, data = d45[d45$Female_ID == j,])
d45_F$slope[i] <- summary(mod_F)$coeff[2,1]
}
d45_F
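#~ As an aside (my addition): lme4's lmList() fits the same per-female
#~ regressions in one call and should reproduce the slopes obtained above:
coef(lmList(Egg_mass ~ Laying_order | Female_ID, data = d45))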
#~ Now, let's compare the female slopes between treatment groups
summary(lm(slope ~ Trt, data = d45_F))
#~ This is like doing a t-test of 6 vs 6 slopes, and it is clearly non-
#~ significant (p=0.248) compared to the p=0.0479 we had before.
#~ What was wrong with that random intercept model?
mod_egg3 <- lmer(Egg_mass ~ Trt*Laying_order + (1|Female_ID), data = d45)
summary(mod_egg3)
#~ Note that our interest lies in egg mass slopes over the laying sequence
#~ (whether they increase or decrease), and this is not the same as variation
#~ in intercepts (generally large or small eggs). Given this interest in
#~ whether the treatment affected those slopes (interaction term), we need to
#~ allow each female to have a random slope of change (as this may be a
#~ physiological peculiarity that just varies widely between females and that
#~ is not affected by the treatment).
#~ So let's specify a random slope model!
#~ While (1|Female_ID) gives each female a random intercept,
#~ (Laying_order|Female_ID) gives each female a random intercept and a random
#~ slope over the laying order.
mod_egg4 <- lmer(Egg_mass ~ Trt*Laying_order + (Laying_order|Female_ID), data = d45)
summary(mod_egg4)
#~ This model shows that the above t-test (p=0.248) was correct, and that the
#~ treatment did not affect the female slopes.
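#~ As a further check (my addition, not part of the original analysis), the
#~ random-intercept and random-slope models can be compared formally; anova()
#~ refits both with ML and performs a likelihood-ratio test:
anova(mod_egg3, mod_egg4)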
#~ Lesson 2C: Sometimes pseudoreplication lies in slopes rather than intercepts,
#~ i.e. individuals differ not in their mean value of the dependent variable (y)
#~ but rather in how the dependent variable (y) changes in response to another
#~ variable (x). This variable x may be continuous or just consist of two
#~ conditions (environment 1 vs. 2). A random slope model accounts for the fact
#~ that individuals differ in their response to x, and this may become necessary
#~ as soon as you have 3 or more values per individual. More generally, failing
#~ to see the true structure of dependencies yields a lot of spurious
#~ significance. There is a large body of literature reporting significant
#~ treatment-by-laying-order interactions - most likely a huge collection of
#~ false-positive findings.
# 2D. Genetic relatedness -----
#~ Here is an example that shows that controlling for pseudoreplication can be
#~ quite a challenge.
#~ Let's begin by reading and plotting the data
d7 <- read.table("dataFig7.txt", sep='\t', header=T)
head(d7)
tail(d7)
ggplot(d7, aes(x=ESR_10, y=Courtship_rate)) +
geom_jitter(width=0.25, col="dodgerblue4") +
xlab("Number of copies of allele ESR1_10")+ ylab("Residual courtship rate")+
theme_classic() +
theme(panel.border = element_blank(),
axis.line.x = element_line(size = 0.5, linetype = "solid", colour = "black"),
axis.line.y = element_line(size = 0.5, linetype = "solid", colour = "black")) +
theme(axis.title = element_text (face="bold",size=13)) + theme(axis.text = element_text (size=10)) +
geom_smooth(method='lm',color = "red") +
scale_x_continuous(breaks=c(0,1,2), labels=c("0\n\n", "1\nheterozygous", "2\nhomozygous"))
#~ Here we look at an association between genotype (at the locus of the estrogen
#~ receptor) and phenotype (courtship behaviour) of male zebra finches (N=1556).
#~ Males that have one or two copies of a certain allele sing at a higher rate.
#~ Given the large number of males that have been measured, the effect appears
#~ highly significant.
mod_song1 <- glm(Courtship_rate ~ ESR_10, data = d7) # gaussian by default, equivalent to lm()
summary(mod_song1)
#~ However, all these males come from the same captive population (comprising 8
#~ generations), and families in which the ESR1_10 allele is more common may
#~ just happen to have (other) genes that lead to a high courtship rate.
#~ Let's see whether adding Family_ID as a random effect makes a difference.
mod_song2 <- lmer (Courtship_rate ~ ESR_10 + (1|Family_ID), data = d7)
summary(mod_song2)
#~ In this model, Family_ID is defined as the joint identities of the parents.
#~ This random effect has 632 levels, and hence it groups together on average
#~ 2-3 brothers in a family, and indeed this explains some variance (0.7041).
#~ Accordingly, the effect is now estimated to be somewhat smaller and with
#~ larger uncertainty (Std. Error), leading to a larger p-value than before.
#~ Still, this does not take into account the resemblance between father and
#~ son, or other forms of relatedness. To get this right, we need to fit the
#~ entire 8-generations pedigree of this population (comprising 3404 birds).
ped <- read.table("data_ped3404_Seewiesen.txt", header=TRUE, sep="\t", na.strings="NA")
tail(ped)
#~ Now we use the pedigreemm package to link the pedigree information to the
#~ data table
ped_mm <- pedigree(sire = as.character(ped$MID), dam = as.character(ped$FID), label = as.character(ped$Animal))
all.data <- merge(d7, ped, by.x="Animal", by.y="Animal")
all.data$Animal <- factor(all.data$Animal)
#~ Now we can run the pedigreemm model
#~ Depending on your computer, this might take some time (3 min on my laptop)
mod_song3 = pedigreemm(Courtship_rate ~ ESR_10 + (1|Animal), pedigree=list(Animal=ped_mm)
, data=all.data, verbose=TRUE, na.action=na.exclude
, control=lmerControl(check.nobs.vs.nlev="ignore"
, check.nobs.vs.nRE="ignore", check.nobs.vs.rankZ = "ignore"))
summary(mod_song3)
#~ You can see that the estimated effect (per copy of ESR1_10) has decreased
#~ further (0.25852, previously 0.32662, initially 0.39830), suggesting that
#~ more of the variance is attributed to chance differences between families,
#~ and less variance to the ESR1 gene. The standard Error did not increase
#~ further since the previous model, but the t-value has decreased (2.896,
#~ previously 3.64, initially 4.703), and now corresponds to a p-value of
2*(1-pnorm(2.896))
#~ about 0.004, which is far less compelling than initially.
#~ Note the useful 'pnorm'-function that allows you to calculate p-values from
#~ t-values (assuming infinite degrees of freedom, which is somewhat anti-
#~ conservative). If your N is small, you should instead use something like this
2*(pt(abs(2.896),lower.tail=FALSE,df=20))
#~ A cool side effect of pedigreemm is that you can estimate the heritability
#~ of the trait (courtship rate).
varcorsmod = c(VarCorr(mod_song3)$Animal, attr(VarCorr(mod_song3), "sc")^2)
h2.mod = varcorsmod/sum(varcorsmod)
h2.mod
#~ Hence, we estimate heritability from how much of the total variance can be
#~ explained by relatedness in the pedigree. Here, this is 36.4%; the remaining
#~ 63.6% is residual variance.
#~ Lesson 2D: Non-independence of data points may sometimes be hard to account
#~ for completely. Besides relatedness of individuals (causing non-independence
#~ in heritable traits), there is often spatial or temporal autocorrelation in
#~ the data. All these dependencies can be modeled, but this is challenging and
#~ rarely done perfectly. Note that, in the wild, we normally would not have
#~ such pedigree information. Hence p-values often remain anti-conservative, and
#~ this explains in part the difficulties in replicating findings.
# 3. Dealing with non-Gaussian data -----
#~ Finally, one should know about the risks of modelling count data.
#~ Good news first: if your data is not normally distributed you can still
#~ continue as usual (assuming normally distributed, "Gaussian", errors) and be
#~ pretty safe (see Knief & Forstmeier 2021).
#~ However, if you instead try to model count data with a Poisson error
#~ distribution you should be aware that this is very risky.
#~ Again we start by reading and plotting some data
d8 <- read.table("dataFig8.txt", sep='\t', header=T)
head(d8)
ggplot(data=d8, aes(x=Exploration_score, y=Latency_min)) +
geom_point(size=2, col="dodgerblue4")+ ylim(0,600)+ scale_x_continuous(breaks=seq(-1.5,1.5, 0.5))+
geom_smooth(method='lm', color='red')+
xlab("Exploration score")+ ylab("Latency to return (min)")+
theme_classic() +
theme(panel.border = element_blank(),
axis.line.x = element_line(size = 0.5, linetype = "solid", colour = "black"),
axis.line.y = element_line(size = 0.5, linetype = "solid", colour = "black")) +
theme(axis.title = element_text (face="bold",size=13)) +
theme(axis.text = element_text (size=10))
#~ In this example a researcher measured the latency of birds to return to their
#~ nest after a disturbance, in relation to some measure of their exploratory
#~ behaviour (i.e. a measure of how they behave in some standardized test where
#~ they explore a new environment). Although the regression line looks very
#~ flat, the relationship was said to be highly significant. A look into the
#~ Methods revealed that the latency was modeled as a count of minutes, assuming
#~ a Poisson error distribution.
mod_latency1 <- glm (Latency_min ~ Exploration_score, data = d8, family = "poisson")
summary(mod_latency1)
#~ Indeed, the negative slope of the regression line appears to be 10.83 SEs
#~ away from zero (z = -10.83 = -0.16112 / 0.01488)!
#~ This goes against all intuition about what significant effects look like.
#~ But wait, I can get this even more significant!
#~ Let's model this as a count of seconds by multiplying the minutes by 60
d8$Latency_seconds <- d8$Latency_min * 60
head(d8)
tail(d8)
mod_latency1s <- glm (Latency_seconds ~ Exploration_score, data = d8, family = "poisson")
summary(mod_latency1s)
#~ Now we are even 83.86 SEs away from zero! What a great tool for producing
#~ significant results!
#~ But let's try to be realistic. We log-transform the dependent variable to
#~ approach a normal distribution and then use a model with Gaussian errors.
hist(log(d8$Latency_min))
mod_latency2 <- glm (log(Latency_min) ~ Exploration_score, data = d8, family = "gaussian")
summary(mod_latency2)
#~ The p-value of 0.257 is much more in line with the shallow regression line.
#~ The problem with count data is that they usually do not arise from a true
#~ Poisson process. Instead, counts are typically over-dispersed (i.e. with
#~ more extreme values than expected). Under a Poisson distribution, with a mean
#~ of about 100 min, the data should range from about 75 to 130 min. Cases like
#~ 559 min (top left of the plot) should not occur, and hence are given an
#~ extreme weight (in terms of evidence), and this gets worse if we take the
#~ even larger counts in seconds.
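#~ You can quantify this directly (a check added here for illustration): for a
#~ well-behaved Poisson model, the sum of squared Pearson residuals divided by
#~ the residual degrees of freedom should be close to 1; values far above 1
#~ indicate over-dispersion.
sum(residuals(mod_latency1, type = "pearson")^2) / df.residual(mod_latency1)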
#~ To account for the over-dispersion in the count data, there are two ways.
#~ If you have no random effects (hence a glm) you can just write "quasipoisson"
#~ instead of "poisson".
mod_latency3 <- glm (Latency_min ~ Exploration_score, data = d8, family = "quasipoisson")
summary(mod_latency3)
#~ This gives essentially the same p-value (p=0.292) as the Gaussian model above
#~ If you have random effects (hence using glmer) you can just add another
#~ random effect that has as many levels as you have rows in your data sheet.
#~ This is called an "observation-level random effect" (OLRE).
#~ So we add such a column to our data sheet.
d8$ObsvID <- 1:nrow(d8)
head(d8)
tail(d8)
#~ And then we fit this column (ObsvID) as a random effect in our poisson model
mod_latency4 <- glmer (Latency_min ~ Exploration_score + (1|ObsvID), data = d8, family = "poisson")
summary(mod_latency4)
#~ Again, we get a reasonable p-value (p=0.232)
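#~ A third common option (mentioned here as an addition to the two above) is a
#~ negative binomial model, which estimates the over-dispersion explicitly,
#~ e.g. via MASS::glm.nb (MASS ships with R); it should likewise yield an
#~ unremarkable p-value here:
mod_latency5 <- MASS::glm.nb(Latency_min ~ Exploration_score, data = d8)
summary(mod_latency5)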
#~ Lesson 3: The use of non-Gaussian models (especially Poisson models) should
#~ always ring an alarm bell! Has the author accounted for over-dispersion?
#~ Does a Gaussian model (which is fail-safe and equally powerful; see Knief and
#~ Forstmeier 2021) yield the same conclusion?
#~ If not, better don't trust the Poisson model!
# References -----
#~ Knief, U. and Forstmeier, W. (2021) Violating the normality assumption may be the lesser of two evils. Behavior Research Methods 53: 2576-2590.
#~ Forstmeier, W., Wagenmakers, E.-J. and Parker, T. H. (2017) Detecting and avoiding likely false-positive findings – A practical guide. Biological Reviews 92, 1941-1968.
#~ Forstmeier, W. and Schielzeth, H. (2011) Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner's curse. Behavioral Ecology and Sociobiology 65, 47-55.
#~ Schielzeth, H. and Forstmeier, W. (2009) Conclusions beyond support: Overconfident estimates in mixed models. Behavioral Ecology 20, 416-420.