Example of results of Integrated or Grouped Analysis for a school case
mAGLAVE edited this page Jul 6, 2021
These results were generated on the Flamingo computing cluster of GR.
For this example, I need at least 2 single-cell RNA-seq datasets:
- I reuse the sc5p_v2_hs_PBMC_1k_5 sample from 10XGenomics (found here: https://support.10xgenomics.com/single-cell-vdj/datasets/4.0.0/sc5p_v2_hs_PBMC_1k), which is also used in this wiki's individual analysis example.
- I use the sc5p_v2_hs_PBMC_10k_5 sample from 10XGenomics (found here: https://support.10xgenomics.com/single-cell-vdj/datasets/5.0.0/sc5p_v2_hs_PBMC_10k).
These 2 datasets are human PBMC samples from healthy donors with:
- Gene Expression,
- BCR,
- TCR,
- ADT (called Feature Barcoding by 10XGenomics).
These samples contain on average 1k and 10k cells, respectively, and were made with v2 chemistry.
- I reuse the individual analysis of the sc5p_v2_hs_PBMC_1k_5 sample from this wiki's individual analysis example.
- I performed the individual analysis of the sc5p_v2_hs_PBMC_10k_5 sample. I will not detail it here, but you can see the parameter file Params_indiv.yaml and the launcher file launcher_indiv.sh.
I reuse the ADT reference index built for the individual analysis of samples (see Make the ADT reference index).
I reuse the Markfile from the individual analysis of samples (see Make the Markfile).
As in the individual analysis, and as explained in the Pipeline philosophy, single-cell data analysis cannot be done through a one-shot pipeline. So the rest of the explanation repeats several phases:
- Make the Configuration Parameter file
- Launch of the analysis
- Interpretation of Results (and Choice of Next Parameters)
- See Arborescence of all results to understand the structure of the results tree.
- The .rda files are snapshots of the data at key points in the analysis.
- The .rda files and gene expression matrices are not present in the GitHub results because their sizes exceed GitHub's 50 MB threshold.
- Here I present the integrated analysis and the grouped analysis separately, but both types of analysis could have been done with a single configuration file.
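Because some outputs (.rda snapshots, gene expression matrices) are too large for GitHub, it can help to list them before pushing a results folder. The sketch below is a minimal, hypothetical helper (`files_over_limit` is not part of the pipeline) that walks a results directory and reports every file above 50 MB, assuming the 50 MB threshold mentioned above; the directory you pass is up to you.

```python
from pathlib import Path

# Threshold cited in this page for GitHub (50 MB), expressed in bytes.
SIZE_LIMIT_BYTES = 50 * 1024 * 1024


def files_over_limit(results_dir, limit=SIZE_LIMIT_BYTES):
    """Hypothetical helper: return (relative path, size in MB) for every
    file under results_dir whose size is strictly greater than limit."""
    root = Path(results_dir)
    oversized = []
    # rglob("*") walks all sub-directories; sorting makes the output stable.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                oversized.append(
                    (str(path.relative_to(root)), round(size / (1024 * 1024), 1))
                )
    return oversized
```

For example, `files_over_limit("results/")` (a hypothetical path) would list the .rda files and matrices that have to stay out of the GitHub repository.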