Additional files
This page lists the optional additional files that can be provided to the pipeline.
1. The markfile
This file contains all the genes you want to plot on the UMAP.
It is an Excel file (.xlsx) with 2 columns: the first contains the gene names and the second the pathway or signature each gene belongs to.
Genes | Signatures |
---|---|
IL7R | Naive_CD4_T |
CCR7 | Naive_CD4_T |
CD14 | CD14_Mono |
LYZ | CD14_Mono |
IL7R | Memory_CD4 |
S100A4 | Memory_CD4 |
MS4A1 | B |
CD8A | CD8_T |
FCGR3A | FCGR3A_Mono |
MS4A7 | FCGR3A_Mono |
GNLY | NK |
NKG7 | NK |
FCER1A | DC |
CST3 | DC |
PPBP | Platelet |
You can list as many genes, and provide as many files, as you want.
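A minimal base-R sketch of how the two columns of a markfile can be grouped into one gene set per signature, for example to check the file before running the pipeline. It assumes the sheet has already been read into a data frame (for instance with `readxl::read_excel()`); `markfile_to_sets()` is a hypothetical helper, not a function of the pipeline.

```r
# Hypothetical helper (not part of the pipeline): turn the two-column markfile
# (genes, signatures) into a named list of gene vectors, one per signature.
markfile_to_sets <- function(markfile) {
  stopifnot(ncol(markfile) >= 2)
  genes <- trimws(as.character(markfile[[1]]))
  signatures <- trimws(as.character(markfile[[2]]))
  # drop rows with an empty gene name or an empty signature
  keep <- !is.na(genes) & genes != "" & !is.na(signatures) & signatures != ""
  # split() groups genes by signature; a gene may appear in several signatures
  split(genes[keep], signatures[keep])
}

# Example built from the first rows of the table above; in practice the data
# frame could come from readxl::read_excel("markfile.xlsx") (assumed file name).
markfile <- data.frame(
  Genes = c("IL7R", "CCR7", "CD14", "LYZ", "IL7R", "S100A4"),
  Signatures = c("Naive_CD4_T", "Naive_CD4_T", "CD14_Mono", "CD14_Mono",
                 "Memory_CD4", "Memory_CD4"),
  stringsAsFactors = FALSE
)
sets <- markfile_to_sets(markfile)
print(sets$Memory_CD4)
```

IL7R ends up in both the Naive_CD4_T and Memory_CD4 sets, which matches the table above where the same gene is listed under two signatures.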
2. The custom references for the Clust_Markers_Annot_GE, Int_Clust_Markers_Annot_GE and Grp_Clust_Markers_Annot_GE steps
You can build your own references to annotate cells and clusters.
Two kinds of references are accepted:
- by gene expression:
This kind of reference is used by SingleR and ClustifyR. The format is a SingleCellExperiment object saved in an .rda file, with the cell types stored in the `labels` column of its colData (accessible as `sce$labels`).
```r
##############################################################################
### create our single-cell reference
##############################################################################
library(SingleCellExperiment)

## Reading data
# logcounts of the reference: columns = cells/bulk/microarray samples; rows = genes; with column names and row names
# check.names = FALSE keeps barcodes such as "AAACATACAACCAC-1" unchanged (by default "-" is replaced by ".")
expr_ref <- read.table("./logcount.txt", sep = ";", header = TRUE, row.names = 1, check.names = FALSE)
expr_ref[1:5, 1:5]
#>               AAAGCAATCTTGCAAG AAAGTAGAGGAGCGAG AAAGTAGAGTCATCCA AAAGTAGCACAACGTT AACCATGCAATCGGTT
#> RP11-34P13.7                 0                0                0                0                0
#> CICP27                       0                0                0                0                0
#> RP11-34P13.15                0                0                0                0                0
#> RP11-34P13.13                0                0                0                0                0
#> FO538757.1                   0                0                0                0                0

# labels of the reference (cell types of the columns of the logcount table, in the same order):
# a single column without a header, one line = one cell/bulk/microarray label
labels_ref <- read.table("./labels.csv", header = FALSE)
head(labels_ref, 5)
#>                 V1
#> 1     B_Lymphocyte
#> 2 CD8_T_Lymphocyte
#> 3 CD4_T_Lymphocyte
#> 4       Macrophage
#> 5         Monocyte

## Make the sce object (expression values stored as a numeric matrix)
custom_ref <- SingleCellExperiment(assays = list(logcounts = as.matrix(expr_ref)))
# if the values are raw counts rather than normalized logcounts, use instead:
# custom_ref <- SingleCellExperiment(assays = list(counts = as.matrix(expr_ref)))
# custom_ref <- scater::logNormCounts(custom_ref)
custom_ref$labels <- as.character(labels_ref$V1)

## Save sce object
save(custom_ref, file = "our_ref.rda")
```
- by marker genes (signatures):
This kind of reference is used by Cell-ID. Like the markfile, it is an Excel file (.xlsx) with 2 columns: the first contains the gene names and the second the cell type or cell state.
Genes | Cell Types |
---|---|
CTRB1 | Acinar cells (Endoderm, Pancreas) |
KLK1 | Acinar cells (Endoderm, Pancreas) |
RBPJL | Acinar cells (Endoderm, Pancreas) |
FGF10 | Adipocyte progenitor cells (Mesoderm, Connective tissue) |
WT1 | Adipocyte progenitor cells (Mesoderm, Connective tissue) |
SCARA5 | Adipocyte progenitor cells (Mesoderm, Connective tissue) |
Cell-ID requires at least 10 genes per cell type, and a maximum of 100 genes per cell type is recommended.
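Both kinds of custom reference can be sanity-checked before running the pipeline. The base-R sketch below is an illustration only: `check_expression_ref()` and `check_cellid_signatures()` are hypothetical helpers, not part of the pipeline. The first checks that there is one label per column of the expression table; the second counts distinct genes per cell type against the 10 (required) and 100 (recommended) limits above.

```r
# Hypothetical pre-flight checks for the two kinds of custom references.
# Neither function is part of the pipeline; both use base R only.

# Expression reference: values must be numeric without NA, and there must be
# one label per column (cell/bulk/microarray sample) of the expression table.
check_expression_ref <- function(expr_ref, labels) {
  expr_ref <- as.matrix(expr_ref)
  stopifnot(is.numeric(expr_ref), !anyNA(expr_ref))
  if (length(labels) != ncol(expr_ref)) {
    stop(sprintf("%d labels for %d columns", length(labels), ncol(expr_ref)))
  }
  invisible(TRUE)
}

# Cell-ID signatures: count distinct genes per cell type and flag cell types
# below the required minimum or above the recommended maximum.
check_cellid_signatures <- function(signatures, min_genes = 10, max_genes = 100) {
  genes <- as.character(signatures[[1]])
  cell_types <- as.character(signatures[[2]])
  n_genes <- tapply(genes, cell_types, function(g) length(unique(g)))
  list(
    too_few = names(n_genes)[n_genes < min_genes],
    too_many = names(n_genes)[n_genes > max_genes]
  )
}

# Toy expression reference: 3 genes x 2 cells, with 2 labels
expr <- matrix(c(0, 1.2, 0, 3.4, 2.1, 0), nrow = 3,
               dimnames = list(c("CD3E", "MS4A1", "LYZ"), c("cell1", "cell2")))
check_expression_ref(expr, c("T_cell", "B_cell"))  # passes silently

# Excerpt of the signature table above (only 3 genes per cell type)
sig <- data.frame(
  Genes = c("CTRB1", "KLK1", "RBPJL", "FGF10", "WT1", "SCARA5"),
  CellTypes = c(rep("Acinar cells (Endoderm, Pancreas)", 3),
                rep("Adipocyte progenitor cells (Mesoderm, Connective tissue)", 3)),
  stringsAsFactors = FALSE
)
res <- check_cellid_signatures(sig)
print(res$too_few)
```

Both cell types are flagged here because the excerpt keeps only 3 genes each; a complete signature file should list at least 10.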
3. The metadata file
This file contains all the metadata you want to add to the analysis. They are added to the Seurat object, then to the Cerebro object. This is useful for regressing out biases, running custom differential analyses, integrating data, or simply viewing them on the UMAP in Cerebro.
Two kinds of semicolon-separated .csv files are accepted:
- by cellular barcodes:
The first column must be named "barcodes".
```
barcodes;Lab_orig;Days
AAACATACAACCAC-1;MOA;1
AAACATTGAGCTAC-1;ILIO5A;2
AAACATTGATCAGC-1;ILIO5A;2
AAACCGTGCTTCCG-1;MOA;3
AAACCGTGTATGCG-1;MOA;1
AAACGCACTGGTAC-1;ILIO5A;2
AAACGCTGACCAGT-1;MOA;3
AAACGCTGGTTCTT-1;MOA;1
...
```
- by orig.ident (i.e. sample name):
The first column must be named "orig.ident".
```
orig.ident;tech;conditions
sample1_GE;10X;CTRL
sample2_GE;10X;COND
```
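A minimal base-R sketch for reading one of these semicolon-separated files and telling the two kinds apart from the name of the first column. `read_metadata()` is a hypothetical helper; the duplicate check is an added assumption (one row per barcode or sample), not a documented pipeline rule.

```r
# Hypothetical helper (not part of the pipeline): read a semicolon-separated
# metadata file and report which kind it is, based on its first column name.
read_metadata <- function(path) {
  meta <- read.table(path, sep = ";", header = TRUE, check.names = FALSE,
                     stringsAsFactors = FALSE, comment.char = "")
  key <- colnames(meta)[1]
  if (!key %in% c("barcodes", "orig.ident")) {
    stop("first column must be named 'barcodes' or 'orig.ident', not '", key, "'")
  }
  # assumption: each barcode / sample name appears only once
  if (anyDuplicated(meta[[1]]) > 0) {
    stop("duplicated values in column '", key, "'")
  }
  attr(meta, "key") <- key
  meta
}

# Example: write the orig.ident example above to a temporary file and read it back
path <- tempfile(fileext = ".csv")
writeLines(c("orig.ident;tech;conditions",
             "sample1_GE;10X;CTRL",
             "sample2_GE;10X;COND"), path)
meta <- read_metadata(path)
print(attr(meta, "key"))
```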
You can provide as many files as you want; they are processed in the order in which they are listed.
Warning: if a column is present in several files, it is overwritten by the file processed last.
Warning: the metadata file must contain all the cellular barcodes (or orig.ident values) present in the analyzed samples; it may also contain information about samples that are not analyzed.
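The two warnings above can be illustrated with a short base-R sketch: barcode-level tables are applied in order, a later table overwrites a column it shares with an earlier one, extra rows for non-analyzed barcodes are ignored, and a missing analyzed barcode is an error. `add_metadata()` is hypothetical and only mirrors the rules stated here; it is not the pipeline's own code.

```r
# Illustration only (not the pipeline's code): apply several barcode-level
# metadata tables in the order listed.
add_metadata <- function(cell_barcodes, meta_list) {
  result <- data.frame(barcodes = cell_barcodes, stringsAsFactors = FALSE)
  for (meta in meta_list) {
    # every analyzed barcode must be present in each table
    missing <- setdiff(cell_barcodes, meta$barcodes)
    if (length(missing) > 0) {
      stop(length(missing), " analyzed barcode(s) missing from a metadata table")
    }
    # match() ignores rows for barcodes that are not analyzed
    rows <- match(cell_barcodes, meta$barcodes)
    for (col in setdiff(colnames(meta), "barcodes")) {
      result[[col]] <- meta[[col]][rows]  # overwrites a column from an earlier table
    }
  }
  result
}

cells <- c("AAACATACAACCAC-1", "AAACATTGAGCTAC-1")
meta1 <- data.frame(barcodes = c(cells, "AAACCGTGCTTCCG-1"),  # one extra barcode
                    Lab_orig = c("MOA", "ILIO5A", "MOA"), Days = c(1, 2, 3),
                    stringsAsFactors = FALSE)
meta2 <- data.frame(barcodes = cells, Days = c(10, 20), stringsAsFactors = FALSE)
res <- add_metadata(cells, list(meta1, meta2))
print(res$Days)  # taken from meta2, the table processed last
```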
4. The ADT index
You need the list of protein names and the sequence of each ADT tag (example with 3 proteins):
INDEX_ADT/ADT.fa
```
>CD3
CTCATTGTAACTCCT
>CD19
CTGGGCAATTACTCG
>CD45RA
TCAATCCTTCCGCTT
```
INDEX_ADT/ADT_tr2gs.txt
```
transcript gene gene_name
CD3 CD3 CD3
CD19 CD19 CD19
CD45RA CD45RA CD45RA
```
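The index is then built with kallisto; the k-mer size is set to 15, the length of the ADT tags in this example:
```
kallisto index 'ADT.fa' -i 'ADT.kidx' --kmer-size=15
```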
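If the ADT panel is kept in a table, both input files can be generated from it. The base-R sketch below is hypothetical (`write_adt_files()` is not part of the pipeline) and assumes the tr2gs file is tab-separated with the header shown above; the original example only shows whitespace between columns.

```r
# Hypothetical helper (not part of the pipeline): write ADT.fa and
# ADT_tr2gs.txt from a table with columns "protein" and "sequence".
write_adt_files <- function(adt, out_dir = "INDEX_ADT") {
  stopifnot(all(grepl("^[ACGT]+$", adt$sequence)), anyDuplicated(adt$protein) == 0)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  # FASTA: a ">name" header line followed by the tag sequence, for each protein
  fasta <- as.vector(rbind(paste0(">", adt$protein), adt$sequence))
  writeLines(fasta, file.path(out_dir, "ADT.fa"))
  # transcript-to-gene table: the protein name fills all three columns
  # (tab separator is an assumption)
  tr2gs <- data.frame(transcript = adt$protein, gene = adt$protein,
                      gene_name = adt$protein, stringsAsFactors = FALSE)
  write.table(tr2gs, file.path(out_dir, "ADT_tr2gs.txt"), sep = "\t",
              quote = FALSE, row.names = FALSE)
  invisible(file.path(out_dir, c("ADT.fa", "ADT_tr2gs.txt")))
}

adt <- data.frame(protein = c("CD3", "CD19", "CD45RA"),
                  sequence = c("CTCATTGTAACTCCT", "CTGGGCAATTACTCG", "TCAATCCTTCCGCTT"),
                  stringsAsFactors = FALSE)
files <- write_adt_files(adt, out_dir = file.path(tempdir(), "INDEX_ADT"))
cat(readLines(files[1]), sep = "\n")
```

Checking that all tags have the same length is also worthwhile, since the `--kmer-size` passed to `kallisto index` is chosen from the tag length.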