Question about Reproducing a Plot in the Paper Introducing the SCTransform (Hafemeister, C., & Satija, R. #1999) #136
Comments
Hi Michael,
Re: Fig. 1, please have a look at my comment at #34 (comment). Even if you are
not familiar with R code, the comments might already help. If you still have
specific questions, I'll be happy to answer them.
On Tue, Apr 26, 2022 at 9:21 PM Michael D Caponegro <***@***.***> wrote:
I am copying the issue that sainadfensi <https://github.com/sainadfensi>
submitted on Aug 19, 2019
<https://github.com/satijalab/seurat/issues/1999#issue-482593314>
to the Seurat repo <https://github.com/satijalab/seurat>
regarding the x and y axes in Figures 1C, D and Fig 3A from
Hafemeister, C., & Satija, R.
I cannot find an answer or any information pertaining to the topic, and I
would also like to reproduce these figures.
Thank you
Dear Team,
I am learning single-cell RNA-seq analysis with the Seurat package, and I
greatly appreciate how comprehensive it is, with interpretable visualizations
and easy operation. Thank you for building this package with such care and
precision. I also value the newly released Seurat v3 with its novel
statistical approach to normalization, sctransform.
In the paper[1] on sctransform, multiple statistical methods are applied and
many plots demonstrate its effectiveness. In particular, I am interested in
the figures that show the relationship between gene expression and cell
sequencing depth (Fig 1C, D and Fig 3A in the paper). However, when I try to
reproduce the plots myself, the result turns out to be different. I must be
misunderstanding some of the plotting steps. I would therefore be grateful if
you could go into detail on the steps for plotting the trends in the data
before and after normalization (Methods 4.4 in the paper).
The method described in the paper is:
> 4.4 Trends in the data before and after normalization [1]
> We grouped genes into six bins based on log10-transformed mean UMI count,
> using bins of equal width. To show the overall trends in the data, for every
> gene we fit the expression (UMI counts, scaled log-normalized expression,
> scaled Pearson residuals) as a function of log10-transformed mean UMI count
> using kernel regression (ksmooth function) with normal kernel and large
> bandwidth (20 times the size suggested by R function bw.SJ). For
> visualization, we only used the central 90% of cells based on total UMI. For
> every gene group, we show the expression range after smoothing from the
> first to third quartile at 200 equidistant cell UMI values.
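The steps quoted from Methods 4.4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the thread works in R (ksmooth, bw.SJ), while the sketch is plain Python with only the standard library. The function names `bin_genes`, `ksmooth_normal` and `quartile_band` are hypothetical, and so is the toy data. The bandwidth is passed in by the caller because bw.SJ has no standard-library equivalent. The kernel scaling follows the convention documented for R's ksmooth (kernel quartiles at ±0.25 × bandwidth); other implementation details of ksmooth are not reproduced.

```python
import math
import statistics

def bin_genes(gene_means, n_bins=6):
    """Assign each gene to one of n_bins equal-width bins of log10(mean UMI)."""
    logs = [math.log10(m) for m in gene_means]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins
    # The gene with the largest mean sits on the upper edge; clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in logs]

def ksmooth_normal(x, y, bandwidth, x_points):
    """Kernel regression (weighted average of y) with a Gaussian kernel.

    Assumption: kernel quartiles at +/- 0.25 * bandwidth, as documented for
    R's ksmooth, so sd = 0.25 * bandwidth / qnorm(0.75).
    """
    sd = 0.25 * bandwidth / 0.6744897501960817
    fitted = []
    for x0 in x_points:
        weights = [math.exp(-0.5 * ((xi - x0) / sd) ** 2) for xi in x]
        fitted.append(sum(w * yi for w, yi in zip(weights, y)) / sum(weights))
    return fitted

def quartile_band(curves):
    """Pointwise first and third quartile across the smoothed curves of one gene bin."""
    lower, upper = [], []
    for values in zip(*curves):
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        lower.append(q1)
        upper.append(q3)
    return lower, upper

# Toy data (made up): 5 cells, 3 genes that fall into the same bin.
log_umi = [3.0, 3.25, 3.5, 3.75, 4.0]          # log10 total UMI per cell
genes = [[0, 1, 1, 2, 3], [1, 1, 2, 3, 4], [0, 0, 1, 1, 2]]
grid = [3.0 + i / 199 for i in range(200)]     # 200 equidistant x values
curves = [ksmooth_normal(log_umi, g, bandwidth=2.0, x_points=grid) for g in genes]
lower, upper = quartile_band(curves)           # boundaries of the shaded region
```

In this sketch the shaded region for a gene bin is simply the area between `lower` and `upper` along `grid`. Whether the 200 grid points should be equidistant on the log10 or the linear UMI scale is not settled by the quoted text, so the log10 grid used here is an assumption.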
[image: 63309022-89438800-c338-11e9-9ceb-2780659aa435]
<https://user-images.githubusercontent.com/36866996/165375942-fc77ad26-388c-43d4-8a9f-427457af104f.png>
[image: 63309026-8c3e7880-c338-11e9-99ff-7cb5f004b1cb]
<https://user-images.githubusercontent.com/36866996/165375956-22c6a567-4d0b-48e8-aaee-3959765d9ec5.png>
Figure 1. Figures that show the relationship between gene expression and
the cell sequencing depth (Fig 1C, D and Fig 3A in the paper[1]).
Questions:
1. If I understand correctly, genes are grouped into six bins based on
log10-transformed mean UMI count. Do we need to regroup the genes after
log-normalization or sctransform?
2. Are the x values for the kernel regression the log10-transformed total
UMI count of each cell, in other words, x = log10(PBMC$nCount_RNA)?
3. When running kernel regression on the different data types, such as
log-normalized data and Pearson residuals, is the same x used throughout
(the original total UMI count of each cell), or new totals computed from the
transformed data, for example PBMC$nCount_SCT for Pearson residuals?
4. For plotting, should I use geom_smooth from ggplot2 to draw the colored
region, or simply use the quartile values as the boundaries of the region and
fill it with a chosen color?
5. Will the shape (trend) of the curve change after z-scoring?
6. I also noticed that a small number of features are removed after
sctransform, and I cannot find an explanation online. Is there any automatic
filtering in sctransform?
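On question 5, one property can be checked independently of the paper: a kernel regression of this kind is a weighted average of y, so it commutes with any per-gene shift and positive rescaling. The sketch below illustrates this in plain Python rather than R. The helper names `gaussian_smooth` and `zscore` and the toy values are hypothetical, and it assumes z-scoring is a plain (y - mean) / sd with no clipping of extreme values.

```python
import math
import statistics

def gaussian_smooth(x, y, sd, x_points):
    """Weighted average of y around each x0, with Gaussian weights of width sd."""
    fitted = []
    for x0 in x_points:
        weights = [math.exp(-0.5 * ((xi - x0) / sd) ** 2) for xi in x]
        fitted.append(sum(w * yi for w, yi in zip(weights, y)) / sum(weights))
    return fitted

def zscore(values):
    """Plain z-score: subtract the mean, divide by the standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

# Toy data (made up): one gene measured in 11 cells.
x = [3.0 + 0.1 * i for i in range(11)]      # log10 total UMI per cell
y = [0, 1, 0, 2, 1, 3, 2, 4, 3, 5, 6]       # expression of the gene
grid = [3.2, 3.5, 3.8]

raw_fit = gaussian_smooth(x, y, 0.3, grid)
z_fit = gaussian_smooth(x, zscore(y), 0.3, grid)

# Smoothing the z-scored values equals z-scoring the smoothed raw curve
# (using the gene's own mean and sd), up to floating-point error.
mu, sigma = statistics.fmean(y), statistics.pstdev(y)
rescaled = [(f - mu) / sigma for f in raw_fit]
max_diff = max(abs(a - b) for a, b in zip(z_fit, rescaled))
```

So for a single gene the curve keeps its shape and is only shifted and rescaled. The quartile band across a gene bin can still look different after scaling, because every gene is rescaled by its own mean and standard deviation, and the equivalence no longer holds exactly if scaled values are clipped.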
Thank you
On Wed, Apr 27, 2022 at 12:35 AM Christoph Hafemeister <***@***.***> wrote:
> Hi Michael,
> Re: Fig1, please have a look at my comment at #34 (comment) -
> even if you are not familiar with R code, the comments might already help.
> If there are still specific questions that you have, I'll be happy to
> answer them.