forked from GreshamLab/flow
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathflow_single_timepoint_data_extraction.Rmd
611 lines (490 loc) · 25.3 KB
/
flow_single_timepoint_data_extraction.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
---
title: "Gresham Lab Floww Cytometry Single Timepoint Analysis"
author: 'G. Avecilla, N. Brandt, S. Lauer, F. Abdul-Rahman, D. Gresham'
date: '`r Sys.Date()`'
output: html_notebook
---
This notebook contains the code necessary to analysis flow cytometry data in the Gresham Lab.
To analyze flow cytometry data, you MUST use the latest version of this code, available on the [Gresham Lab github](https://github.com/GreshamLab/flow).
**Experimental overview**
Write a detailed description of your experiment here including the goal of the analysis and your interpretation of the results.
If you still see this text it means that you have not described the experiment and whatever follows is meaningless.
*This code is designed for use with the Accuri flow cytometer, which is equiped with the following lasers and filters*
* Blue laser (488 nm)
* FL1 filter = 514/20nm GFP
* FL3 filter = 575/25nm YFP
* Yellow/green laser (552 nm)
* FL2 filter = 610/20nm mCherry, dtomato
* FL4 filter = 586/15nm DsRed
**Requirements**
In order to run this code you need:
* to predefine your gates using the gating.R script
* the gates.Rdata workspace, which contains the gates to be used in this script
* a .csv sample sheet in with the first XXXX columns set up as in the example below
* user defined variables, see below and chunk 2
**Output**
This script generates quality control plots in the notebook and a file(s) with a summary of results.
The user can generate the output file(s) in three different styles:
1. As a dataframe converted from fcs with all or some of the data.
2. As a dataframe with summary statistics (e.g. median FL1 per sample)
3. As a new .fcs file with additional information (e.g. phenotype or sample information) appended
**Libraries**
```{r Libraries, eval=TRUE}
# This is a function that just makes sure you have a package, or installs it for you without prompting
requireInstall <- function(packageName,isBioconductor=F) {
if ( !try(require(packageName,character.only=T)) ) {
print(paste0("You don't have ",packageName," accessible, ",
"I'm gonna install it"))
if (isBioconductor) {
source("http://bioconductor.org/biocLite.R")
biocLite(packageName)
} else {
install.packages("packageName", repos = "http://cran.us.r-project.org")
}
}
return(1)
}
#Load libraries
requireInstall("openCyto",isBioconductor=T)
requireInstall("ggcyto",isBioconductor=T)
requireInstall("tidyverse")
requireInstall("ggridges")
```
**Variables and Output Settings**
Variables for the user to set can be found in chunk one. These include both required variables (e.g., working directory), and optional variables (e.g., style of ouput). There are some defaults for these variables.
```{r User Defined Variables}
#working directory
dir = '.'
#file location of fcs and sampleshhet
path.data = paste(getwd(),"/",sep="")
#fcs run sample name
name = "Test"
#name of the gate file to use
gate.name <- NA
#are you checking ploidy with PI stain?
#if you are, you must define the ploidy (of at least the controls) in your sample sheet
pi <- TRUE
#fcs data to extract
extract.FSC_A <- "Yes"
extract.SSC_A <- "Yes"
extract.FL1_A <- "Yes"
extract.FL2_A <- "No"
extract.FL3_A <- "No"
extract.FL4_A <- "No"
extract.FSC_H <- "No"
extract.SSC_H <- "No"
extract.FL1_H <- "No"
extract.FL2_H <- "No"
extract.FL3_H <- "No"
extract.FL4_H <- "No"
extract.Width <- "No"
extract.Time <- "No"
#samplesheet parameters
sample.param <- data.frame(SAMPLE=NA, WELL=NA, STRAIN=NA, GENOTYPE=NA, PLOIDY=NA, MEDIA=NA, EXPERIMENT=NA)
#style of output
#Proportional Data
save.prop <- "Yes"
#FlowSet
save.flowset <- "No"
#filename
folder.flowset <- paste(name, 'flowdata', Sys.Date(), sep = '_')
#DataFrame - Individual data points
save.df <- "Yes"
file.df <- paste(name, 'df', Sys.Date(), sep = '_')
#DataFrame - Experiment Statistics
save.stats <- "Yes"
file.stats <- paste(name, 'stats', Sys.Date(), sep = '_')
#DataFrame - Proportions in each gate
save.prop <- "Yes"
file.prop <- paste(name, 'prop', Sys.Date(), sep = '_')
```
**Read in Data**
*.fcs files must be in a folder with a unique name and have an accompanying .csv samplesheet
The samplesheet name format must be samplesheet_"unique name".csv
It must contain the following columns, in addition you may add additioanl columns depending on your needs
You must also define the parameters under the pData entry
#Example Set of Samplesheet Parameters
Need to define names
* column1 = Well
* column2 = Strain
* column3 = Genotype
* column4 = Ploidy
* column5 = Media
* column6 = Experiment
* column7 = Userdefined
```{r}
#Reads in samplesheet
sample.sheet <- read.csv(paste(path.data,"samplesheet_",name,".csv", sep=""))
#get/set samplesheet parameters, overwrites default
#!!!! REMEMBER TO ADJUST THE CODE SO THAT THE PARAMETERS MATCH THOSE THAT YOU ADD TO THE PDATA, TO THE EXTRATCED DATA, AND TO THE DATA
#reads in FCS files in order outliined in samplesheet
files <- paste(path.data,name,"/",sort(factor(list.files(paste(path.data,name,"/", sep=""),full.names=FALSE), levels = paste(sample.sheet$Well,".fcs",sep="" ), ordered=TRUE)),sep="")
flowData <- read.ncdfFlowSet(files=files, pattern=".fcs", alter.names = TRUE)
#Ensures that the number of entries in the sample sheet match the number of flowFrames in the flowSet
sample.ind <- which(paste(sample.sheet$Well,".fcs", sep="") %in% sampleNames(flowData))
sample.sheet <- sample.sheet[sample.ind,]
sample.sheet <- sample.sheet[order(sample.sheet$Well),]
#Adds a sample sheet data to the pData of the flowset
#Defines the sample name in the two places needed for the flowframe
#sampleNames must be unique and is used by flowframe to differenate between flowframes
#pData()$name is used as a defualt value in many of the functions used to display FCS data
sampleNames(flowData) <- paste(gsub(" ","_",sample.sheet$Strain),"_",sub(" ","_",sample.sheet$Well), sep="")
pData(flowData)$name <- sampleNames(flowData)
#Additional samplesheet parameters added to the the FlowSet
#!!!! REMEMBER THAT YOU NEED TO MAKE SURE THESE MATCH YOUR SAMPLESHEET PARAMETERS
pData(flowData)$Well <- sample.sheet$Well
pData(flowData)$Strain <- sample.sheet$Strain
pData(flowData)$Genotype <- sample.sheet$Genotype
pData(flowData)$Ploidy <- sample.sheet$Ploidy
pData(flowData)$Media <- sample.sheet$Media
pData(flowData)$Experiment <- sample.sheet$Experiment
#Load Gates
load(file = gate.name)
```
** Base Summary **
Just checks to make sure you collected the amount of data you thought you did, as well as gathering the actual number of counts in each sample and the number of samples to be used in later chunks.
```{r flowSet summaries}
#Check how many cells were counted in each fcs file
total <- fsApply(flowData, each_col, length)[1:length(flowData)] #total counts per sample
print(total)
#Print the summary of data values for each sample
summary(flowData)
samples.num <- length(flowData)
print(samples.num)
```
** Gating **
Apply all the gates you created to your data set, if you have more then the 4 generic gates, you'll need to create a new segment of gating code
EXAMPLE CODE
START
GATEDDATA <- Subset(flowData, GATE)
GATEDDATACOUNTS <- fsApply(GATEDDATA, each_col, length)[1:samples.num]
print(GATEDDATACOUNTS)
END
```{r Application of Gates}
##Subset the data by applying sequential gates##
#apply doublet gate to ALL SAMPLES
flowData.singlets <- Subset(flowData, pg.singlets)
singlets <- fsApply(flowData.singlets, each_col, length)[1:samples.num]
print(singlets)
#apply debris gates
filteredData <- Subset(flowData.singlets, pg.nondebris)
non.debris <- fsApply(filteredData, each_col, length)[1:samples.num]
print(non.debris)
#this gate defines nonFL1 cells
flone.neg <- Subset(filteredData, gate.neg)
flone.neg.cells <- fsApply(flone.neg, each_col, length)[1:samples.num]
print(flone.neg.cells)
#this gate defines flouresecent cells
flone.pos <- Subset(filteredData, gate.pos)
flone.pos.cells <- fsApply(flone.pos, each_col, length)[1:samples.num]
print(flone.pos.cells)
```
** Quality control **
#Note that you may need to adjust the axis, the bin size, and the number cols and rows, and the number of fluorescent gates in order to produce the best visulatization for your data
##Gates
```{r}
#Singlets gate
ggcyto(flowData, aes(x = `FSC.H`, y = `FSC.A`)) + geom_hex(bins = 512) + xlim(0,3e6) + ylim(0,3e6) + geom_gate(pg.singlets) + facet_wrap(~name, ncol = 8, nrow = 4) + ggtitle("First flowset - singlets gate")
#Debris gate
ggcyto(flowData, aes(x = `FSC.A`, y = `SSC.A`)) + geom_hex(bins = 512) + xlim(0,2e6) + ylim(0,5e5) + geom_gate(pg.nondebris) + facet_wrap(~name, ncol = 8, nrow = 4) + ggtitle("First flowset - nondebris gate")
#Non-fluorescent population gate
ggcyto(flowData, aes(x = `FSC.A`, y = `FL1.A`)) + geom_hex(bins = 512) + xlim(0,1e6) + ylim(0,3e5) + geom_gate(gate.neg) + facet_wrap(~name, ncol = 8, nrow = 4) + ggtitle("First flowset - non GFP gate")
# Fluorescent population gate
ggcyto(flowData, aes(x = `FSC.A`, y = `FL1.A`)) + geom_hex(bins = 512) + xlim(0,1e6) + ylim(0,3e5) + geom_gate(gate.pos) + facet_wrap(~name, ncol = 8, nrow = 4) + ggtitle("First flowset - GFP gate")
#Hi fluorescing gate
ggcyto(flowData, aes(x = `FSC.A`, y = `FL1.A`)) + geom_hex(bins = 512) + xlim(0,1e6) + ylim(0,3e5) + geom_gate(gate.hi) + facet_wrap(~name, ncol = 8, nrow = 4) + ggtitle("First flowset - high GFP gate")
```
##Ploidy check
You may have to modify x limits based on your data
Make sure you have haploid and diploid controls for this
```{r Ploidy}
p = ggcyto(filteredData, aes(x = `FSC.A`))
p + geom_density_ridges(aes(y = name)) +
facet_null() +
scale_x_continuous(expand=c(0,0)) +
ggtitle('Ploidy check by forward scatter')
if(pi == TRUE) {
p = ggcyto(filteredData, aes(x = `FL2.A`))
p + geom_density_ridges(aes(y = name, fill = Ploidy)) +
facet_null() +
scale_x_continuous(expand=c(0,0), limits = c(0,7.5e3)) +
ggtitle('Ploidy check by PI stain')
}
```
##Data transformation for visualization
```{r}
#In order to look at QC plots the data is transformed using the logicle transform, which is a log transform for high values that transitions to a linear transformation near zero values
#This is simply for visualization purposes
lgcl <- logicleTransform(w = 0.5, t= 10000, m=4.5) #the parameters w,t, and m define the transformation parameters
#Dataset 1 tranformation applied to every channel except width and time
dataLGCLTransform <- transform(filteredData, 'FSC.A' = lgcl(`FSC.A`), 'SSC.A' =lgcl(`SSC.A`), 'FL1.A' = lgcl(`FL1.A`), 'FL2.A' = lgcl(`FL2.A`), 'FL3.A' = lgcl(`FL3.A`), 'FL4.A' = lgcl(`FL4.A`),'FSC.H' = lgcl(`FSC.H`),'SSC.H' = lgcl(`SSC.H`),'FL1.H' = lgcl(`FL1.H`),'FL2.H' = lgcl(`FL2.H`),'FL3.H' = lgcl(`FL3.H`),'FL4.H' = lgcl(`FL4.H`))
```
##Effect of time
```{r}
#The effect of time on signal (of which there shouldn't be any)
ggcyto(dataLGCLTransform, aes(x = `Time`, y = `FL1.A`)) + geom_hex(bins = 128) + xlim(150,250) + ylim(0,8)+facet_wrap(~name, ncol = 8, nrow = 4)
```
##Plots of FL1 versus FSC
```{r}
ggcyto(dataLGCLTransform, aes(x = `FSC.A`, y = `FL1.A`)) + geom_hex(bins = 512) + xlim(4,8) + ylim(2,6) + facet_wrap(~name, ncol = 8, nrow = 4)
```
##Plots of FSC versus SSC
```{r}
ggcyto(dataLGCLTransform, aes(x = `FSC.A`, y = `SSC.A`)) + geom_hex(bins = 512) + xlim(4,8) + ylim(4,8)+facet_wrap(~name, ncol = 8, nrow = 4)
```
**Data Extraction**
Extracts the data you want based on your selections in the user defined variables
Also remember you need to make sure your samplesheet paramters match with the parameters you wanted in your extracted data file
```{r Data extraction}
#Move filtered data into a dataframe
#Create the empty data frame
filtered.data <- sample.param
if(extract.FSC_A == "Yes"){filtered.data<-cbind(filtered.data, FSC.A=NA)}
if(extract.SSC_A == "Yes"){filtered.data<-cbind(filtered.data, SSC.A=NA)}
if(extract.FL1_A == "Yes"){filtered.data<-cbind(filtered.data, FL1.A=NA)}
if(extract.FL2_A == "Yes"){filtered.data<-cbind(filtered.data, FL2.A=NA)}
if(extract.FL3_A == "Yes"){filtered.data<-cbind(filtered.data, FL3.A=NA)}
if(extract.FL4_A == "Yes"){filtered.data<-cbind(filtered.data, FL4.A=NA)}
if(extract.FSC_H == "Yes"){filtered.data<-cbind(filtered.data, FSC.H=NA)}
if(extract.SSC_H == "Yes"){filtered.data<-cbind(filtered.data, SSC.H=NA)}
if(extract.FL1_H == "Yes"){filtered.data<-cbind(filtered.data, FL1.H=NA)}
if(extract.FL2_H == "Yes"){filtered.data<-cbind(filtered.data, FL2.H=NA)}
if(extract.FL3_H == "Yes"){filtered.data<-cbind(filtered.data, FL3.H=NA)}
if(extract.FL4_H == "Yes"){filtered.data<-cbind(filtered.data, FL4.H=NA)}
if(extract.Width == "Yes"){filtered.data<-cbind(filtered.data, Width=NA)}
if(extract.Time == "Yes"){filtered.data<-cbind(filtered.data, Time=NA)}
#Fill the data frame with the data you want
#!!!! REMEMBER THAT YOU NEED TO MAKE SURE YOUR SAMPLESHEET PARAMATERS ARE INCLUDED IN THIS CODE
for(i in 1:length(filteredData)){
sample <- sampleNames(filteredData)[i]
well <- as.character(pData(filteredData)$Well[i])
strain <- as.character(pData(filteredData)$Strain[i])
genotype <- as.character(pData(filteredData)$Genotype[i])
ploidy <- as.character(pData(filteredData)$Ploidy[i])
media <- as.character(pData(filteredData)$Media[i])
exp <- as.character(pData(filteredData)$Experiment[i])
if(extract.FSC_A == "Yes"){fsc.a <- exprs(filteredData[[i,1]])}
if(extract.SSC_A == "Yes"){ssc.a <- exprs(filteredData[[i,2]])}
if(extract.FL1_A == "Yes"){fl1.a <- exprs(filteredData[[i,3]])}
if(extract.FL2_A == "Yes"){fl2.a <- exprs(filteredData[[i,4]])}
if(extract.FL3_A == "Yes"){fl3.a <- exprs(filteredData[[i,5]])}
if(extract.FL4_A == "Yes"){fl4.a <- exprs(filteredData[[i,6]])}
if(extract.FSC_H == "Yes"){fsc.h <- exprs(filteredData[[i,7]])}
if(extract.SSC_H == "Yes"){ssc.h <- exprs(filteredData[[i,8]])}
if(extract.FL1_H == "Yes"){fl1.h <- exprs(filteredData[[i,9]])}
if(extract.FL2_H == "Yes"){fl2.h <- exprs(filteredData[[i,10]])}
if(extract.FL3_H == "Yes"){fl3.h <- exprs(filteredData[[i,11]])}
if(extract.FL4_H == "Yes"){fl4.h <- exprs(filteredData[[i,12]])}
if(extract.Width == "Yes"){width <- exprs(filteredData[[i,11]])}
if(extract.Time == "Yes"){time <- exprs(filteredData[[i,12]])}
#!!!! REMEMBER THAT YOU NEED TO MAKE SURE YOUR SAMPLESHEET PARAMATERS ARE INCLUDED IN THIS CODE
filtered.data <- rbind(filtered.data, cbind(SAMPLE=sample,WELL=well,STRAIN=strain,GENOTYPE=genotype,PLOIDY=ploidy,MEDIA=media,EXPERIMENT=exp,
if(extract.FSC_A == "Yes"){FSC.A=fsc.a},
if(extract.SSC_A == "Yes"){SSC.A=ssc.a},
if(extract.FL1_A == "Yes"){FL1.A=fl1.a},
if(extract.FL2_A == "Yes"){FL2.A=fl2.a},
if(extract.FL3_A == "Yes"){FL3.A=fl3.a},
if(extract.FL4_A == "Yes"){FL4.A=fl4.a},
if(extract.FSC_H == "Yes"){FSC.H=fsc.h},
if(extract.SSC_H == "Yes"){SSC.H=ssc.h},
if(extract.FL1_H == "Yes"){FL1.H=fl1.h},
if(extract.FL2_H == "Yes"){FL2.H=fl2.h},
if(extract.FL3_H == "Yes"){FL3.H=fl3.h},
if(extract.FL4_H == "Yes"){FL4.H=fl4.h},
if(extract.Width == "Yes"){Width=width},
if(extract.Time == "Yes"){Time=time}))
}
#Cleans up DataFrames
filtered.data<-filtered.data[2:nrow(filtered.data),]
if(extract.FSC_A == "Yes"){filtered.data$FSC.A<-as.numeric(filtered.data$FSC.A)}
if(extract.SSC_A == "Yes"){filtered.data$SSC.A<-as.numeric(filtered.data$SSC.A)}
if(extract.FL1_A == "Yes"){filtered.data$FL1.A<-as.numeric(filtered.data$FL1.A)}
if(extract.FL2_A == "Yes"){filtered.data$FL2.A<-as.numeric(filtered.data$FL2.A)}
if(extract.FL3_A == "Yes"){filtered.data$FL3.A<-as.numeric(filtered.data$FL3.A)}
if(extract.FL4_A == "Yes"){filtered.data$FL4.A<-as.numeric(filtered.data$FL4.A)}
if(extract.FSC_H == "Yes"){filtered.data$FSC.A<-as.numeric(filtered.data$FSC.A)}
if(extract.SSC_H == "Yes"){filtered.data$SSC.A<-as.numeric(filtered.data$SSC.A)}
if(extract.FL1_H == "Yes"){filtered.data$FL1.A<-as.numeric(filtered.data$FL1.A)}
if(extract.FL2_H == "Yes"){filtered.data$FL2.A<-as.numeric(filtered.data$FL2.A)}
if(extract.FL3_H == "Yes"){filtered.data$FL3.A<-as.numeric(filtered.data$FL3.A)}
if(extract.FL4_H == "Yes"){filtered.data$FL4.A<-as.numeric(filtered.data$FL4.A)}
if(extract.Width == "Yes"){filtered.data$Width<-as.numeric(filtered.data$Width)}
if(extract.Time == "Yes"){filtered.data$Time<-as.numeric(filtered.data$Time)}
```
**Summary Statistics**
Creates a set od statistics that summarize your data. You will need to make sure to adjust the code depending on the data and statistics you want summarized
Also remember you need to make sure your samplesheet paramters match with the parameters you wanted in your extracted data file
```{r Summary Statistics}
#Create the empty data frame
# !!!! REMEMBER YOU MAY NEED TO ADD OR REMOVE STATISTIC COLUMNS BASED ON WHAT YOUR NEEDS ARE !!!!
stats.data <- cbind(sample.param,COUNT=NA,FSC_MEDIAN=NA,FSC_MEAN=NA,FSC_SD=NA,FL1_MEDIAN=NA,FL1_MEAN=NA,FL1_SD=NA,NORMALIZED_GFP_MEDIAN=NA,NORMALIZED_GFP_MEAN=NA,NORMALIZED_GFP_SD=NA)
#Fill the data frame with the data you want
#!!!! REMEMBER THAT YOU NEED TO MAKE SURE YOUR SAMPLESHEET PARAMATERS ARE INCLUDED IN THIS CODE AND THAT YOU MAY NEED TO ADD OR REMOVE STATISTIC COLUMNS BASED ON WHAT YOUR NEEDS ARE !!!!
for(i in 1:length(filteredData)){
sample <- sampleNames(filteredData)[i]
well <- as.character(pData(filteredData)$Well[i])
strain <- as.character(pData(filteredData)$Strain[i])
genotype <- as.character(pData(filteredData)$Genotype[i])
ploidy <- as.character(pData(filteredData)$Ploidy[i])
media <- as.character(pData(filteredData)$Media[i])
exp <- as.character(pData(filteredData)$Experiment[i])
if(extract.FSC_A == "Yes"){fsc.a <- exprs(filteredData[[i,1]])}
if(extract.SSC_A == "Yes"){ssc.a <- exprs(filteredData[[i,2]])}
if(extract.FL1_A == "Yes"){fl1.a <- exprs(filteredData[[i,3]])}
if(extract.FL2_A == "Yes"){fl2.a <- exprs(filteredData[[i,4]])}
if(extract.FL3_A == "Yes"){fl3.a <- exprs(filteredData[[i,5]])}
if(extract.FL4_A == "Yes"){fl4.a <- exprs(filteredData[[i,6]])}
if(extract.FSC_H == "Yes"){fsc.h <- exprs(filteredData[[i,7]])}
if(extract.SSC_H == "Yes"){ssc.h <- exprs(filteredData[[i,8]])}
if(extract.FL1_H == "Yes"){fl1.h <- exprs(filteredData[[i,9]])}
if(extract.FL2_H == "Yes"){fl2.h <- exprs(filteredData[[i,10]])}
if(extract.FL3_H == "Yes"){fl3.h <- exprs(filteredData[[i,11]])}
if(extract.FL4_H == "Yes"){fl4.h <- exprs(filteredData[[i,12]])}
if(extract.Width == "Yes"){width <- exprs(filteredData[[i,11]])}
if(extract.Time == "Yes"){time <- exprs(filteredData[[i,12]])}
#!!!! REMEMBER THAT YOU NEED TO MAKE SURE YOUR SAMPLESHEET PARAMATERS ARE INCLUDED IN THIS CODE AND THAT YOU MAY NEED TO ADD OR REMOVE STATISTIC COLUMNS BASED ON WHAT YOUR NEEDS ARE !!!!
stats.data<-(rbind(stats.data,cbind(SAMPLE=sample,WELL=well,STRAIN=strain,GENOTYPE=genotype,PLOIDY=ploidy,MEDIA=media,EXPERIMENT=exp,COUNT=length(fsc.a),FSC_MEDIAN=median(fsc.a),FSC_MEAN=mean(fsc.a),FSC_SD=sd(fsc.a),FL1_MEDIAN=median(fl1.a),FL1_MEAN=mean(fl1.a),FL1_SD=sd(fl1.a),NORMALIZED_GFP_MEDIAN=median(fl1.a/fsc.a),NORMALIZED_GFP_MEAN=mean(fl1.a/fsc.a),NORMALIZED_GFP_SD=sd(fl1.a/fsc.a))))
}
#Cleans up DataFrames
stats.data<-stats.data[2:nrow(stats.data),]
stats.data$COUNT<-as.numeric(stats.data$COUNT)
stats.data$FSC_MEDIAN<-as.numeric(stats.data$FSC_MEDIAN)
stats.data$FSC_MEAN<-as.numeric(stats.data$FSC_MEAN)
stats.data$FSC_SD<-as.numeric(stats.data$FSC_SD)
stats.data$FL1_MEDIAN<-as.numeric(stats.data$FL1_MEDIAN)
stats.data$FL1_MEAN<-as.numeric(stats.data$FL1_MEAN)
stats.data$FL1_SD<-as.numeric(stats.data$FL1_SD)
stats.data$NORMALIZED_GFP_MEDIAN<-as.numeric(stats.data$NORMALIZED_GFP_MEDIAN)
stats.data$NORMALIZED_GFP_MEAN<-as.numeric(stats.data$NORMALIZED_GFP_MEAN)
stats.data$NORMALIZED_GFP_SD<-as.numeric(stats.data$NORMALIZED_GFP_SD)
```
#Plots
```{r Desnity Plots}
ggplot(filtered.data)+
geom_density(aes(x = FSC.A), colour = "black")+
theme()+
labs(y = "Counts", x = "FSC.A") +
scale_x_log10()+
facet_wrap('SAMPLE')
ggplot(filtered.data )+
geom_density(aes(x = FL1.A), colour = "green")+
theme()+
labs(y = "Counts", x = "FL1.A") +
scale_x_log10()+
facet_wrap('SAMPLE')
ggplot(filtered.data )+
geom_density(aes(x = FL1.A/FSC.A), colour = "blue")+
theme()+
labs(y = "Counts",x = "FL1.A/FSC.A") +
scale_x_log10()+
facet_wrap('SAMPLE')
```
```{r Data Distribution Plots}
ggplot(filtered.data, aes(SAMPLE,FSC.A)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,vjust=0.5,hjust = 1, size = 10)) +
stat_boxplot(geom ='errorbar', colour = "grey") +
geom_boxplot(outlier.shape = NA, colour = "grey") +
#geom_hline(yintercept=haploid.fsc, lty=2, colour = "black") +
#geom_hline(yintercept=diploid.fsc, lty=2, colour = "blue") +
scale_y_log10()+
xlab("Sample")
ggplot(filtered.data, aes(SAMPLE,FL1.A)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,vjust=0.5,hjust = 1, size = 10)) +
stat_boxplot(geom ='errorbar', colour = "lightgreen") +
geom_boxplot(outlier.shape = NA, colour = "lightgreen") +
#geom_hline(yintercept=gfp.bg, lty=2, colour = "yellow") +
#geom_hline(yintercept=gfp.wt, lty=2, colour = "green") +
labs(title= paste("FL1.A", sep="")) +
scale_y_log10()+
xlab("Sample")
ggplot(filtered.data, aes(SAMPLE,FL1.A/FSC.A)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,vjust=0.5,hjust = 1, size = 10)) +
stat_boxplot(geom ='errorbar', colour = "darkgreen") +
geom_boxplot(outlier.shape = NA, colour = "darkgreen") +
#geom_hline(yintercept=gfp.norm, lty=2, colour = "blue") +
scale_y_log10()+
xlab("Sample")
```
r
```{r Quantitation of Signal}
###FL1###
baseline.FL1 <- stats.data$FL1_MEDIAN[] #MUST REPLACE WITH YOUR FL1 ChannelCONTROL!!
ggplot(stats.data,aes(x=SAMPLE, y = FL1_MEDIAN/baseline.FL1)) +
geom_col() +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,vjust=0.5,hjust = 1, size = 6)) +
scale_x_discrete(labels=stats.data$SAMPLE)+
ylab("Relative FL1 Median Expression")
###FSC###
baseline.FSC <- stats.data$FSC_MEDIAN[1] #MUST REPLACE WITH YOUR FSC Channel non-flourescense CONTROL!!
ggplot(stats.data,aes(x=SAMPLE, y = FSC_MEDIAN/baseline.FL1)) +
geom_col() +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,vjust=0.5,hjust = 1, size = 6)) +
scale_x_discrete(labels=stats.data$SAMPLE)+
ylab("Relative Median FSC")
```
#Repupose Barplots into normal plot ourput flow
```{r}
ggplot(stats.data,aes(x=SAMPLE, y = singlets/total)) +
geom_col() +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,vjust=0.5,hjust = 1, size = 6)) +
scale_x_discrete(labels=stats.data$SAMPLE)+
ylab("Proportion Singlet Cells")
ggplot(stats.data,aes(x=SAMPLE, y = non.debris/total)) +
geom_col() +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,vjust=0.5,hjust = 1, size = 6)) +
scale_x_discrete(labels=stats.data$SAMPLE)+
ylab("Proportion Non-debris Cells")
ggplot(stats.data,aes(x=SAMPLE, y = flone.neg.cells/non.debris)) +
geom_col() +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,vjust=0.5,hjust = 1, size = 6)) +
scale_x_discrete(labels=stats.data$SAMPLE)+
ylab("Proportion Cells with no Flourescense")
ggplot(stats.data,aes(x=SAMPLE, y = flone.pos.cells/non.debris)) +
geom_col() +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,vjust=0.5,hjust = 1, size = 6)) +
scale_x_discrete(labels=stats.data$SAMPLE)+
ylab("Proportion cells with Flourescense")
```
##Population Composition
#Currently Designed for LTEE experiments
#You may need to remove or modify this based on your experiments needs
```{r Population composition}
PopProportion_zeroGFP <- flone.neg.cells/non.debris
PopProportion_oneGFP <- flone.one.cells/non.debris
PopProportion_twoGFP <- flone.two.cells/non.debris
PopProportion_threeGFP <- flone.three.cells/non.debris
prop.data <- cbind(stats.data, PopProportion_zeroGFP, PopProportion_oneGFP, PopProportion_twoGFP, PopProportion_threeGFP, non.debris)
colnames(prop.data) <- c(colnames(stats.data), "PopProp_0copy", "PopProp_1copy", "PopProp_2copy", "PopProp_3plus", "CELL COUNT")
prop.data %>%
select(SAMPLE,PopProp_3plus,PopProp_0copy,PopProp_1copy,PopProp_2copy) %>%
gather(key="GFP_NUMBER",value="SIGNAL",-SAMPLE) %>%
ggplot(aes(fill=factor(GFP_NUMBER, levels = c("PopProp_3plus", "PopProp_2copy", "PopProp_1copy", "PopProp_0copy")), y=SIGNAL, x=SAMPLE)) +
geom_bar(stat="identity") +
scale_fill_manual(values=c("dark green","light green","grey","black")) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90,vjust=0.5,hjust = 1, size = 6)) +
ylab("Proportion cells with Flourescense") +
theme(legend.title=element_blank())
```
** Saving Data **
```{r Saves Data}
#Resave the flowset, this will now include the data from your samplesheet added into the pDATA
#You can also choose to save only gated data by adjusted what flowset you save
#Default is ungated samples
if(save.flowset == "Yes"){write.flowSet(flowData, folder.flowset)}
#The data extracted from the flowset
if(save.df == "Yes"){save(filtered.data, file=paste(file.df,".Rdata",sep=""))}
#The summarized data extracted from the flowset
if(save.stats== "Yes"){write.csv(stats.data, file= paste(file.stats,".csv",sep=""), row.names=TRUE, quote=F)}
#The population proportion data from your flowset
if(save.prop == "Yes"){write.csv(prop.data, file=paste(file.prop,".csv",sep=""), row.names=TRUE, quote=F)}
```