Additional files

The optional additional files for the pipeline are listed here.

1. The "markfile" for the Clust_Markers_Annot_GE step

This file contains all the genes you want to plot on the UMAP.
It is an Excel file (.xlsx) composed of 2 columns: the first contains the gene names and the second the pathway or signature each gene belongs to.

Genes	Signatures
IL7R	Naive_CD4_T
CCR7	Naive_CD4_T
CD14	CD14_Mono
LYZ	CD14_Mono
IL7R	Memory_CD4
S100A4	Memory_CD4
MS4A1	B
CD8A	CD8_T
FCGR3A	FCGR3A_Mono
MS4A7	FCGR3A_Mono
GNLY	NK
NKG7	NK
FCER1A	DC
CST3	DC
PPBP	Platelet

You can list as many genes, and provide as many files, as you want.
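As an illustration, a markfile could be prepared in R along the following lines. This is only a sketch: it assumes the writexl package as the Excel writer and uses a hypothetical file name (markfile.xlsx); neither is prescribed by the pipeline, and any tool that produces a two-column .xlsx file should do.

```r
# Build the two-column table (a subset of the example above).
markers <- data.frame(
  Genes      = c("IL7R", "CCR7", "CD14", "LYZ", "MS4A1"),
  Signatures = c("Naive_CD4_T", "Naive_CD4_T", "CD14_Mono", "CD14_Mono", "B"),
  stringsAsFactors = FALSE
)

# Basic sanity checks: no missing or empty cells, no duplicated gene/signature pair.
stopifnot(
  !any(is.na(markers)),
  all(nzchar(markers$Genes)),
  all(nzchar(markers$Signatures)),
  anyDuplicated(markers) == 0
)

# Write the Excel file only if writexl is installed (assumption: writexl is
# used here for convenience; "markfile.xlsx" is a hypothetical file name).
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(markers, path = "markfile.xlsx")
}
```

The same gene may appear on several rows with different signatures (as IL7R does in the table above); only fully identical rows are rejected by the duplicate check.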

2. The custom references for the Clust_Markers_Annot_GE, Int_Clust_Markers_Annot_GE and Grp_Clust_Markers_Annot_GE steps

You can make your own references to annotate cells and clusters.

Two kinds of references are accepted:

by gene expression:
This kind of reference is used by SingleR and ClustifyR. The format is a SingleCellExperiment object saved in an .rda file, with the cell types stored in sce$labels.

##############################################################################
### create our single-cell reference
##############################################################################
>library(SingleCellExperiment)

## Reading data
#logcounts of the reference: columns = cells/bulk/microarray samples; rows = genes; with column names and row names
>expr_ref <- read.table("./logcount.txt", sep = ";", header = TRUE, row.names = 1)
>expr_ref[1:5,1:5]
              AAAGCAATCTTGCAAG AAAGTAGAGGAGCGAG AAAGTAGAGTCATCCA
RP11-34P13.7                 0                0                0
CICP27                       0                0                0
RP11-34P13.15                0                0                0
RP11-34P13.13                0                0                0
FO538757.1                   0                0                0
              AAAGTAGCACAACGTT AACCATGCAATCGGTT
RP11-34P13.7                 0                0
CICP27                       0                0
RP11-34P13.15                0                0
RP11-34P13.13                0                0
FO538757.1                   0                0

#labels of the reference (cell types, in the same order as the columns of the logcount table): a single column of cell types, no column name, 1 line = 1 cell/bulk/microarray label
>labels_ref <- read.table("./labels.csv", header = FALSE)
>head(labels_ref, 5)
                V1
1     B_Lymphocyte
2 CD8_T_Lymphocyte
3 CD4_T_Lymphocyte
4       Macrophage
5         Monocyte

## Make the sce object
#if the expression values are raw counts rather than normalized logcounts, use instead:
#  custom_ref <- SingleCellExperiment(assays = list(counts = as.matrix(expr_ref)))
#  custom_ref <- scater::logNormCounts(custom_ref)
>custom_ref <- SingleCellExperiment(assays = list(logcounts = as.matrix(expr_ref)))
>custom_ref$labels <- as.character(labels_ref$V1)

## Save sce object
>save(custom_ref,file="our_ref.rda")
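Because the labels are attached to the cells by position, it may be worth checking the two inputs against each other before building the object. The sketch below uses base R only; check_reference_inputs is a hypothetical helper (not part of the pipeline or of SingleCellExperiment), and the toy matrix and labels stand in for logcount.txt and labels.csv.

```r
# Hypothetical helper: stops with a message if the inputs look inconsistent.
check_reference_inputs <- function(expr, labels) {
  expr <- as.matrix(expr)
  labels <- as.character(labels)
  if (!is.numeric(expr)) {
    stop("the expression table must be numeric")
  }
  if (is.null(rownames(expr)) || is.null(colnames(expr))) {
    stop("the expression table needs row names (genes) and column names (cells)")
  }
  if (ncol(expr) != length(labels)) {
    stop("expected one label per column: ", ncol(expr), " columns, ",
         length(labels), " labels")
  }
  if (any(is.na(labels)) || any(!nzchar(labels))) {
    stop("labels must not be missing or empty")
  }
  invisible(TRUE)
}

# Toy data standing in for the real expression table and label list.
toy_expr <- matrix(
  0, nrow = 3, ncol = 2,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2"))
)
toy_labels <- c("B_Lymphocyte", "Monocyte")

check_reference_inputs(toy_expr, toy_labels)
```

With the real data, the call would be check_reference_inputs(expr_ref, labels_ref$V1), run just before creating custom_ref.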

by marker genes (signature):
This kind of reference is used by Cell-ID. Like the markfile, it is an Excel file (.xlsx) composed of 2 columns: the first contains the gene names and the second the corresponding cell type or cell state.

Genes	Cell Types
CTRB1	Acinar cells (Endoderm, Pancreas)
KLK1	Acinar cells (Endoderm, Pancreas)
RBPJL	Acinar cells (Endoderm, Pancreas)
FGF10	Adipocyte progenitor cells (Mesoderm, Connective tissue)
WT1	Adipocyte progenitor cells (Mesoderm, Connective tissue)
SCARA5	Adipocyte progenitor cells (Mesoderm, Connective tissue)

At least 10 genes per cell type are required for Cell-ID (and a maximum of 100 genes is recommended).
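The gene-count rule above can be checked before running the pipeline. The following is a minimal sketch in base R on a toy table; the gene and cell type names are placeholders, and in practice the table would first be read from the .xlsx file (for instance with readxl::read_excel, which is an assumption about tooling rather than something the pipeline requires).

```r
# Toy signature table with the same two columns as the example above
# (gene names and cell types are placeholders, not real markers).
signatures <- data.frame(
  Genes = c(paste0("GeneA", 1:12), paste0("GeneB", 1:3)),
  `Cell Types` = c(rep("Type A", 12), rep("Type B", 3)),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Number of distinct genes per cell type.
genes_per_type <- tapply(
  signatures$Genes,
  signatures[["Cell Types"]],
  function(g) length(unique(g))
)

# Cell types below the required minimum or above the recommended maximum.
too_few  <- names(genes_per_type)[genes_per_type < 10]
too_many <- names(genes_per_type)[genes_per_type > 100]

print(genes_per_type)
if (length(too_few) > 0) {
  message("Fewer than 10 genes: ", paste(too_few, collapse = ", "))
}
if (length(too_many) > 0) {
  message("More than 100 genes: ", paste(too_many, collapse = ", "))
}
```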

3. The "metadata" for all Gene Expression ("_GE") steps

This file contains all the metadata you want to add to the analysis. They are added to the Seurat object, then to the Cerebro object. This can be useful for regressing out biases, running custom differential analyses, integrating data, or simply viewing them on the UMAP in Cerebro.

Two kinds of .csv files are accepted:

by cellular barcodes:
The first column must be named "barcodes".

barcodes;Lab_orig;Days
AAACATACAACCAC-1;MOA;1
AAACATTGAGCTAC-1;ILIO5A;2
AAACATTGATCAGC-1;ILIO5A;2
AAACCGTGCTTCCG-1;MOA;3
AAACCGTGTATGCG-1;MOA;1
AAACGCACTGGTAC-1;ILIO5A;2
AAACGCTGACCAGT-1;MOA;3
AAACGCTGGTTCTT-1;MOA;1
...

by orig.ident (i.e. sample name):
The first column must be named "orig.ident".

orig.ident;tech;conditions
sample1_GE;10X;CTRL
sample2_GE;10X;COND

You can provide as many files as you want. They will be processed in the order listed.
Warning: if a column is present in several files, it will be overwritten.
Warning: the metadata file must contain every cellular barcode (or orig.ident) present in the analyzed samples; it may also contain information on samples that are not analyzed.
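The constraints above (name of the first column, full coverage of the analyzed samples) can be verified with a few lines of base R. This is a sketch only: the file content is the orig.ident example inlined as text, and the vector of analyzed samples is hypothetical.

```r
# Example content copied from the orig.ident table above.
csv_text <- "orig.ident;tech;conditions
sample1_GE;10X;CTRL
sample2_GE;10X;COND"

# The example files use ";" as the field separator.
meta <- read.table(
  text = csv_text, sep = ";", header = TRUE,
  stringsAsFactors = FALSE, check.names = FALSE
)

# The name of the first column decides how rows are matched to cells.
key <- colnames(meta)[1]
stopifnot(key %in% c("barcodes", "orig.ident"))

# Hypothetical list of the samples being analyzed.
analyzed <- c("sample1_GE")

# Every analyzed sample must be described; extra entries are allowed.
missing_keys <- setdiff(analyzed, meta[[key]])
extra_keys   <- setdiff(meta[[key]], analyzed)
if (length(missing_keys) > 0) {
  stop("Missing from the metadata file: ", paste(missing_keys, collapse = ", "))
}
```

For a real file, replace the text argument with the path to the .csv file; for a barcode-based file, analyzed would hold the cellular barcodes instead of the sample names.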

4. Make the ADT reference index

You need the list of protein names and the sequence of each ADT tag (the example below uses 3 proteins).

1. Make the fasta file

INDEX_ADT/ADT.fa

>CD3
CTCATTGTAACTCCT
>CD19
CTGGGCAATTACTCG
>CD45RA
TCAATCCTTCCGCTT

2. Make the tr2gs file

INDEX_ADT/ADT_tr2gs.txt

transcript gene gene_name
CD3 CD3 CD3
CD19 CD19 CD19
CD45RA CD45RA CD45RA

3. Make the index

kallisto index 'ADT.fa' -i 'ADT.kidx' --kmer-size=15
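With more than a handful of proteins, the two files from steps 1 and 2 can be generated from a single table rather than typed by hand. The sketch below does this in base R with the 3 proteins of the example; the output directory (a temporary folder here) and the variable names are assumptions made for illustration.

```r
# Protein names and ADT tag sequences from the example above.
adt <- data.frame(
  name = c("CD3", "CD19", "CD45RA"),
  tag  = c("CTCATTGTAACTCCT", "CTGGGCAATTACTCG", "TCAATCCTTCCGCTT"),
  stringsAsFactors = FALSE
)

# Tags should contain only A/C/G/T, be at least as long as the k-mer size (15),
# and names should be unique.
stopifnot(
  all(grepl("^[ACGT]+$", adt$tag)),
  all(nchar(adt$tag) >= 15),
  anyDuplicated(adt$name) == 0
)

# Hypothetical output location; use your own INDEX_ADT directory in practice.
out_dir <- file.path(tempdir(), "INDEX_ADT")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# FASTA: a ">name" header line followed by the tag sequence, for each protein.
fasta_lines <- as.vector(rbind(paste0(">", adt$name), adt$tag))
writeLines(fasta_lines, file.path(out_dir, "ADT.fa"))

# tr2gs: header line, then the protein name repeated in the three columns.
tr2gs_lines <- c(
  "transcript gene gene_name",
  paste(adt$name, adt$name, adt$name)
)
writeLines(tr2gs_lines, file.path(out_dir, "ADT_tr2gs.txt"))
```

The kallisto index command from step 3 is then run on the resulting ADT.fa.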
