Additional files

mAGLAVE edited this page Sep 10, 2021 · 11 revisions

This page lists the optional additional files that can be supplied to the pipeline.

1. The "markfile" for Clust_Markers_Annot_GE step

This file lists all the genes you want to plot on the UMAP.
It is an Excel file (.xlsx) with 2 columns: the first contains the gene names and the second contains the pathway or signature each gene belongs to.

Genes Signatures
IL7R Naive_CD4_T
CCR7 Naive_CD4_T
CD14 CD14_Mono
LYZ CD14_Mono
IL7R Memory_CD4
S100A4 Memory_CD4
MS4A1 B
CD8A CD8_T
FCGR3A FCGR3A_Mono
MS4A7 FCGR3A_Mono
GNLY NK
NKG7 NK
FCER1A DC
CST3 DC
PPBP Platelet

You can list as many genes, and provide as many markfiles, as you want.

2. The custom references for Clust_Markers_Annot_GE, Int_Clust_Markers_Annot_GE and Grp_Clust_Markers_Annot_GE steps

You can make your own references to annotate cells and clusters.

Two kinds of references are accepted:

  • by gene expression:
    This kind of reference is used by SingleR and ClustifyR. The format is a SingleCellExperiment object, saved in an .rda file, with the cell types stored in the sce$labels slot.
##############################################################################
### Create our single-cell reference
##############################################################################
library(SingleCellExperiment)

## Read the data
# Log-counts of the reference: columns = cells/bulk/microarray samples; rows = genes; with column names and row names
expr_ref <- read.table("./logcount.txt", sep = ";", header = TRUE, row.names = 1)
expr_ref[1:5, 1:5]
#               AAAGCAATCTTGCAAG AAAGTAGAGGAGCGAG AAAGTAGAGTCATCCA
# RP11-34P13.7                 0                0                0
# CICP27                       0                0                0
# RP11-34P13.15                0                0                0
# RP11-34P13.13                0                0                0
# FO538757.1                   0                0                0
#               AAAGTAGCACAACGTT AACCATGCAATCGGTT
# RP11-34P13.7                 0                0
# CICP27                       0                0
# RP11-34P13.15                0                0
# RP11-34P13.13                0                0
# FO538757.1                   0                0

# Labels of the reference (cell types matching the columns of the log-count table, in the same order):
# a single column of cell types, no column name, 1 line = 1 cell/bulk/microarray label
labels_ref <- read.table("./labels.csv", header = FALSE)
head(labels_ref, 5)
#                 V1
# 1     B_Lymphocyte
# 2 CD8_T_Lymphocyte
# 3 CD4_T_Lymphocyte
# 4       Macrophage
# 5         Monocyte

# There must be exactly one label per column of the expression table
stopifnot(nrow(labels_ref) == ncol(expr_ref))

## Make the sce object
# If the expression values are raw counts rather than normalized log-counts, use instead:
#   custom_ref <- SingleCellExperiment(assays = list(counts = as.matrix(expr_ref)))
#   custom_ref <- scater::logNormCounts(custom_ref)
custom_ref <- SingleCellExperiment(assays = list(logcounts = as.matrix(expr_ref)))
custom_ref$labels <- as.character(labels_ref$V1)

## Save the sce object
save(custom_ref, file = "our_ref.rda")
  • by marker genes (signature):
    This kind of reference is used by Cell-ID. Like the markfile, it is an Excel file (.xlsx) with 2 columns: the first contains the gene names and the second contains the corresponding cell type or cell state.
Genes Cell Types
CTRB1 Acinar cells (Endoderm, Pancreas)
KLK1 Acinar cells (Endoderm, Pancreas)
RBPJL Acinar cells (Endoderm, Pancreas)
FGF10 Adipocyte progenitor cells (Mesoderm, Connective tissue)
WT1 Adipocyte progenitor cells (Mesoderm, Connective tissue)
SCARA5 Adipocyte progenitor cells (Mesoderm, Connective tissue)

At least 10 genes per cell type are required for Cell-ID (and a maximum of 100 genes is recommended).
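
The gene-count rule above can be checked before the sheet is handed to the pipeline. The following is a minimal sketch, assuming the two columns have first been exported from the Excel file to a tab-separated text file with one header line; the function name `count_genes_per_type` is hypothetical and is not part of the pipeline.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): report the number of genes
# per cell type and flag cell types outside the 10..100 range.
# Assumes a tab-separated export of the sheet: gene <TAB> cell type, one header line.
count_genes_per_type() {
    awk -F'\t' '
        NR > 1 && $1 != "" { n[$2]++ }               # skip the header, count genes per cell type
        END {
            for (t in n) {
                status = "ok"
                if (n[t] < 10)  status = "too_few"   # fewer than the 10 genes required
                if (n[t] > 100) status = "too_many"  # above the recommended maximum
                printf "%s\t%d\t%s\n", t, n[t], status
            }
        }' "$1" | sort
}
```

Calling `count_genes_per_type celltype_markers.tsv` (a hypothetical file name) prints one line per cell type with its gene count and a status, so any cell type marked `too_few` can be completed before running Cell-ID.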

3. The "metadata" for all Gene Expression ("_GE") steps

This file contains all the metadata you want to add to the analysis. They are added to the Seurat object, then to the Cerebro object. This can be useful for regressing out biases, running custom differential analyses, integrating data, or simply viewing them on the UMAP in Cerebro.

Two kinds of .csv files are accepted:

  • by cellular barcodes:
    The first column must be named "barcodes".
barcodes;Lab_orig;Days
AAACATACAACCAC-1;MOA;1
AAACATTGAGCTAC-1;ILIO5A;2
AAACATTGATCAGC-1;ILIO5A;2
AAACCGTGCTTCCG-1;MOA;3
AAACCGTGTATGCG-1;MOA;1
AAACGCACTGGTAC-1;ILIO5A;2
AAACGCTGACCAGT-1;MOA;3
AAACGCTGGTTCTT-1;MOA;1
...
  • by orig.ident (i.e. sample name):
    The first column must be named "orig.ident".
orig.ident;tech;conditions
sample1_GE;10X;CTRL
sample2_GE;10X;COND

You can provide as many files as you want. They are processed in the order listed.
Warning: if a column is present in several files, it will be overwritten.
Warning: a metadata file must cover every cellular barcode (or orig.ident) present in the analyzed samples; it may also contain entries for samples that are not analyzed.
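
The two constraints above (name of the first column, and full coverage of the analyzed barcodes or sample names) can be checked before launching the pipeline. This is a minimal sketch and not the pipeline's own validation: the function names `metadata_key` and `missing_keys`, and the idea of a plain-text list with one analyzed barcode (or orig.ident) per line, are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the pipeline) for semicolon-separated metadata files.

# Print the key column name ("barcodes" or "orig.ident"); fail on anything else.
metadata_key() {
    key=$(head -n 1 "$1" | cut -d';' -f1 | tr -d '\r')
    case "$key" in
        barcodes|orig.ident) printf '%s\n' "$key" ;;
        *) echo "first column must be 'barcodes' or 'orig.ident', found '$key'" >&2
           return 1 ;;
    esac
}

# Print every key listed in $2 (one per line) that is missing from metadata file $1.
missing_keys() {
    awk -F';' '
        NR == FNR { if (FNR > 1) seen[$1] = 1; next }  # first file: metadata keys, header skipped
        !($1 in seen) { print $1 }                     # second file: keys of the analyzed samples
    ' "$1" "$2"
}
```

For example, `metadata_key metadata.csv && missing_keys metadata.csv analyzed_barcodes.txt` (both file names hypothetical) prints the key column name and then nothing else when the metadata file covers every analyzed barcode.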

4. Make the ADT reference index

You need the list of protein names and the sequence of each ADT tag (the example below uses 3 proteins).

1. Make the fasta file

INDEX_ADT/ADT.fa
>CD3
CTCATTGTAACTCCT
>CD19
CTGGGCAATTACTCG
>CD45RA
TCAATCCTTCCGCTT

2. Make the tr2gs file

INDEX_ADT/ADT_tr2gs.txt
transcript gene gene_name
CD3 CD3 CD3
CD19 CD19 CD19
CD45RA CD45RA CD45RA

3. Make the index

kallisto index 'ADT.fa' -i 'ADT.kidx' --kmer-size=15
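
Steps 1 and 2 above can be scripted from a single list of tags. The following is a minimal sketch, assuming a hypothetical tab-separated input file (protein name, then tag sequence, no header) and assuming the tr2gs file is tab-separated with the header line shown in the example; `make_adt_files` is an illustrative name, not a pipeline command.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline).
# $1: tab-separated list of tags (name <TAB> sequence, no header); $2: output directory.
make_adt_files() {
    mkdir -p "$2" || return 1
    # FASTA: one ">name" header line followed by the tag sequence.
    awk -F'\t' 'NF >= 2 { printf ">%s\n%s\n", $1, $2 }' "$1" > "$2/ADT.fa"
    # tr2gs: the protein name repeated as transcript, gene and gene_name.
    # Assumption: tab-separated, with the header line used in the example above.
    awk -F'\t' 'BEGIN { print "transcript\tgene\tgene_name" }
                NF >= 2 { printf "%s\t%s\t%s\n", $1, $1, $1 }' "$1" > "$2/ADT_tr2gs.txt"
    # Warn about tags shorter than 15 nt, the k-mer size used for the index:
    # such a tag contains no 15-mer at all.
    awk -F'\t' 'NF >= 2 && length($2) < 15 {
                    printf "%s: tag shorter than 15 nt (%d)\n", $1, length($2)
                }' "$1" >&2
}
```

With a hypothetical `adt_tags.tsv`, running `make_adt_files adt_tags.tsv INDEX_ADT` writes both files, after which the `kallisto index` command of step 3 can be run from inside `INDEX_ADT`.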

