Question about Reproducing a Plot in the Paper Introducing the SCTransform (Hafemeister, C., & Satija, R. #1999) #136

Open
mcap91 opened this issue Apr 26, 2022 · 2 comments

mcap91 commented Apr 26, 2022

I am copying the issue that sainadfensi submitted to the Seurat repo on Aug 19, 2019, regarding the x and y axes in Figures 1C, 1D, and 3A of Hafemeister, C., & Satija, R.

I cannot find an answer or any information on this topic, and I would also like to reproduce these figures.

Thank you

Dear Team,

I am learning single-cell RNA-seq analysis with the Seurat package, and I greatly appreciate how comprehensive it is, with interpretable visualizations and easy operation. Thank you for building this package with such care and precision. I also value the newly released Seurat v3 and its novel statistical approach to normalization, sctransform.

The sctransform paper [1] uses multiple statistical methods and many plots to demonstrate the method's effectiveness. I am particularly interested in the figures that show the relationship between gene expression and cell sequencing depth (Fig 1C, 1D, and Fig 3A in the paper). However, when I try to reproduce these plots myself, the result is different. I must be misunderstanding some of the plotting steps. Could you please explain in detail how the trends in the data before and after normalization are plotted (Methods 4.4 in the paper)?

The method described in the paper is:
> 4.4 Trends in the data before and after normalization [1]
We grouped genes into six bins based on log10-transformed mean UMI count, using bins of equal width. To show the overall trends in the data, for every gene we fit the expression (UMI counts, scaled log-normalized expression, scaled Pearson residuals) as a function of log10-transformed mean UMI count using kernel regression (ksmooth function) with normal kernel and large bandwidth (20 times the size suggested by R function bw.SJ). For visualization, we only used the central 90% of cells based on total UMI. For every gene group, we show the expression range after smoothing from the first to third quartile at 200 equidistant cell UMI values
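
The quoted steps can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code, and it rests on several assumptions that the paper excerpt does not confirm: the x values are the log10 total UMI count per cell, the gene bins and x values come from the raw UMI matrix even when normalized values are being smoothed, and the `bandwidth` argument is a placeholder the caller must supply (in R it would presumably be something like `20 * bw.SJ(x)`; `bw.SJ` has no standard-library Python equivalent). The helper names `bin_genes`, `gaussian_ksmooth`, and `trend_bands` are hypothetical. The smoother scales the bandwidth by 0.3706506 to mimic the convention of R's `ksmooth` with `kernel = "normal"`, but it does not truncate the kernel.

```python
import math
import statistics


def bin_genes(gene_means, n_bins=6):
    """Assign each gene to one of n_bins equal-width bins of log10(mean UMI)."""
    logs = [math.log10(m) for m in gene_means]  # assumes every mean is > 0
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(logs)
    # The gene with the largest mean falls on the upper edge; clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in logs]


def gaussian_ksmooth(x, y, bandwidth, x_out):
    """Nadaraya-Watson kernel regression with a normal kernel (untruncated)."""
    sd = 0.3706506 * bandwidth  # kernel quartiles at +/- 0.25 * bandwidth
    fitted = []
    for x0 in x_out:
        weights = [math.exp(-0.5 * ((xi - x0) / sd) ** 2) for xi in x]
        total = sum(weights)
        if total == 0:
            fitted.append(float("nan"))
        else:
            fitted.append(sum(w * yi for w, yi in zip(weights, y)) / total)
    return fitted


def trend_bands(umi, values, bandwidth, n_bins=6, n_grid=200):
    """Return the x grid and, per gene bin, the (Q1, Q3) band of smoothed curves.

    umi:    genes x cells raw UMI counts (list of lists); defines bins and x.
    values: genes x cells matrix to smooth (counts, scaled log-normalized
            expression, or scaled Pearson residuals).
    """
    n_cells = len(umi[0])
    totals = [sum(row[c] for row in umi) for c in range(n_cells)]

    # Keep the central 90% of cells by total UMI (5th to 95th percentile).
    cuts = statistics.quantiles(totals, n=20)
    keep = [c for c in range(n_cells) if cuts[0] <= totals[c] <= cuts[-1]]
    x = [math.log10(totals[c]) for c in keep]

    # n_grid equidistant x values spanning the retained cells.
    step = (max(x) - min(x)) / (n_grid - 1)
    grid = [min(x) + i * step for i in range(n_grid)]

    bins = bin_genes([sum(row) / n_cells for row in umi], n_bins)
    bands = {}
    for b in sorted(set(bins)):
        curves = [
            gaussian_ksmooth(x, [values[g][c] for c in keep], bandwidth, grid)
            for g in range(len(umi))
            if bins[g] == b
        ]
        if len(curves) < 2:
            continue  # quartiles need at least two genes in the bin
        quartiles = [statistics.quantiles(col, n=4) for col in zip(*curves)]
        bands[b] = ([q[0] for q in quartiles], [q[2] for q in quartiles])
    return grid, bands
```

Under these assumptions, each entry of `bands` holds the lower and upper edge of one colored region, which could be drawn as a filled ribbon over `grid` rather than with a second round of smoothing at plot time.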

Figure 1. Figures showing the relationship between gene expression and cell sequencing depth (Fig 1C, 1D, and Fig 3A in the paper [1]).

Questions:

  1. If I understand correctly, genes are grouped into six bins based on log10-transformed mean UMI count. Do we need to regroup the genes after log-normalization or sctransform?
  2. Are the x values for the kernel regression the log10-transformed total UMI count of each cell, in other words, x = log10(PBMC$nCount_RNA)?
  3. When running the kernel regression on different types of data, such as log-normalized data and Pearson residuals, do they all use the same x values (the original total UMI count of each cell), or new totals computed from the transformed data, for example PBMC$nCount_SCT for Pearson residuals?
  4. For plotting, should I use geom_smooth from ggplot2 to draw the colored region, or simply use the quartile values to mark the boundaries of the region and fill it with a chosen color?
  5. Will the shape (trend) of the curve change after z-scoring?
  6. I also noticed that a small number of features are removed after sctransform, and I cannot find an explanation for this online. Is there any automatic filtering in sctransform?

ChristophH (Collaborator) commented Apr 27, 2022 via email

mcap91 (Author) commented Oct 11, 2022 via email
