Addressing Batch Effects in Datasets with SUPPA2 #182
Dear Xi Xu,
We have recently handled batch effects using a linear model with cofactors
(see https://pubmed.ncbi.nlm.nih.gov/36518527/).
In this case, rather than performing a test between conditions, we fit a
linear model that relates the event PSI values to the conditions. To that
model, you can add a list of cofactors, each described as a vector with one
component per patient. The cofactors could be numerical values (e.g. age),
nominal values (e.g. sex), another experimental variable (source,
post-mortem, …), or even values obtained from other methods that estimate
batch effects (e.g. SVA). This model will give you the events that best
correlate with the conditions after accounting for all those sources of
batch effect.
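A minimal sketch of the per-event model described above, written in Python with NumPy rather than R. It illustrates ordinary least squares with cofactors and is not the code used in the cited study; `design_matrix` and `fit_event` are hypothetical names, and the sketch assumes one PSI value per sample with no missing values (entries such as `nan` would need to be filtered first). The t-statistic of the `condition` coefficient measures the association with the condition after adjusting for the cofactors; a two-sided p-value would come from a t distribution with n - p degrees of freedom (for example `2 * scipy.stats.t.sf(abs(t), n - p)`), followed by multiple-testing correction across events.

```python
import numbers

import numpy as np


def design_matrix(condition, cofactors):
    """Build the design matrix: intercept, condition indicator, then cofactor columns.

    condition: 0/1 label per sample (e.g. 0 = control, 1 = case).
    cofactors: dict mapping a cofactor name to one value per sample. Numeric
    cofactors (e.g. age) are used as-is; text cofactors (e.g. sex, batch) are
    one-hot encoded, with the alphabetically first level as the reference.
    """
    n = len(condition)
    columns = [np.ones(n), np.asarray(condition, dtype=float)]
    names = ["intercept", "condition"]
    for name, values in cofactors.items():
        if len(values) != n:
            raise ValueError(f"cofactor {name!r} has {len(values)} values, expected {n}")
        if all(isinstance(v, numbers.Real) for v in values):
            columns.append(np.asarray(values, dtype=float))
            names.append(name)
        else:
            # One indicator column per non-reference level.
            for level in sorted(set(values))[1:]:
                columns.append(np.array([1.0 if v == level else 0.0 for v in values]))
                names.append(f"{name}={level}")
    return np.column_stack(columns), names


def fit_event(psi, X):
    """Ordinary least squares fit of one event's PSI values on the design matrix X.

    Returns (coefficients, t_statistics), one entry per column of X.
    """
    psi = np.asarray(psi, dtype=float)
    n, p = X.shape
    beta, _, rank, _ = np.linalg.lstsq(X, psi, rcond=None)
    if rank < p or n <= p:
        raise ValueError("design matrix is rank-deficient or has too few samples")
    residuals = psi - X @ beta
    sigma2 = residuals @ residuals / (n - p)  # residual variance
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / std_err
```

For example, `design_matrix(condition, {"batch": batch_labels, "age": ages})` adds a `batch=<level>` indicator for every non-reference batch plus an `age` column, and `fit_event` is then called once per event with that event's PSI values.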
An alternative might be to correct the read counts / TPM values for these
batch effects and then run SUPPA. We have not tried this, so I do not know
whether it is effective.
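One assumed way to try this untested alternative is to regress batch out of log-transformed TPM values while keeping the condition term, similar in spirit to limma's `removeBatchEffect` with a `design` argument, and then feed the corrected values to SUPPA. This is not a procedure from the thread; `remove_batch_from_tpm` and the pseudocount of 1 are hypothetical choices.

```python
import numpy as np


def remove_batch_from_tpm(tpm, condition, batch, pseudocount=1.0):
    """Regress batch out of log2(TPM + pseudocount) while keeping the condition effect.

    tpm: array-like of shape (n_transcripts, n_samples).
    condition: 0/1 label per sample; batch: batch label per sample.
    Returns an array of the same shape, clipped at zero so it stays a valid TPM-like input.
    """
    tpm = np.asarray(tpm, dtype=float)
    levels = sorted(set(batch))
    if len(levels) < 2:
        return tpm.copy()  # a single batch: nothing to remove
    n_samples = tpm.shape[1]
    # One indicator column per non-reference batch (the first level is the reference).
    B = np.column_stack([[1.0 if b == lvl else 0.0 for b in batch] for lvl in levels[1:]])
    X = np.column_stack([np.ones(n_samples), np.asarray(condition, dtype=float), B])
    Y = np.log2(tpm + pseudocount).T               # samples x transcripts
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # one coefficient column per transcript
    Y_corrected = Y - B @ coef[2:]                 # subtract only the batch terms
    return np.clip(2.0 ** Y_corrected.T - pseudocount, 0.0, None)
```

The corrected values no longer sum to one million per sample, and whether PSI computed from them is reliable is exactly the open question raised above.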
I hope this helps.
Please do not hesitate to write back with more questions.
Thanks a lot for using SUPPA.
Best
Eduardo
On Thu, 8 Feb 2024 at 03:35, XXuxi wrote:
Dear SUPPA2 Development Team,
I am in the process of using SUPPA2 for differential splicing analysis and
have encountered an issue with batch effects in my dataset, which includes
sequencing samples from different batches. Upon analyzing TPM values
through PCA, it became evident that batch effects are present.
My query is: Should batch effect correction be performed on the TPM values
before running SUPPA2?
Thanks for your assistance.
Best regards,
Xi Xu
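The question mentions detecting the batch effect with PCA on TPM values. A minimal NumPy version of that check, assuming a transcripts-by-samples TPM matrix and a log2(TPM + 1) transform (neither is specified in the question); `pca_scores` and `batch_means` are hypothetical helpers. If samples separate by batch along the leading components, the batch variable may be worth including as a cofactor.

```python
import numpy as np


def pca_scores(tpm, n_components=2, pseudocount=1.0):
    """Principal-component scores of samples from a (transcripts x samples) TPM matrix."""
    X = np.log2(np.asarray(tpm, dtype=float) + pseudocount).T  # samples x transcripts
    X = X - X.mean(axis=0)                                      # centre each transcript
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]               # samples x components


def batch_means(scores, batch):
    """Mean PC score per batch label; well-separated means suggest a batch effect."""
    means = {}
    for label in sorted(set(batch)):
        idx = [i for i, b in enumerate(batch) if b == label]
        means[label] = scores[idx].mean(axis=0)
    return means
```

The sign of each component is arbitrary, so compare the distance between batch means rather than their signs.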
Dear Eduardo, following your response, I looked into the paper you linked, and it does mention a batch regression approach, but I can't find the code. Could you point me to where the code for this batch regression technique was deposited? Alternatively, could you explain exactly how the PSI values were re-modeled using linear regression? Also, does this regression approach still produce PSI values that range from 0 to 1? Thanks in advance! -Jay
Hi,
sorry for the delayed reply. It is the standard lm() function in R,
correcting with covariates. We used the covariates that we observed had the
strongest confounding effect. There should be plenty of tutorials available,
or coding co-pilots, that could help you work out the syntax. We'll try to
make the code available on the SUPPA page.
Thanks
E.
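On whether regression-based PSI values stay between 0 and 1: a plain linear model places no such constraint on fitted or batch-adjusted values. The sketch below (hypothetical `batch_adjusted_psi`, toy numbers only, not the cited study's procedure) removes the fitted batch shift from one event's PSI values while keeping the condition term, and can return a value outside [0, 1].

```python
import numpy as np


def batch_adjusted_psi(psi, condition, batch):
    """Remove the fitted batch effect from one event's PSI values (OLS, condition kept).

    The alphabetically first batch level is the reference; samples from other
    batches are shifted by their fitted coefficients. Nothing keeps the result in [0, 1].
    """
    psi = np.asarray(psi, dtype=float)
    levels = sorted(set(batch))
    # Indicator matrix for non-reference batches (shape n x 0 if there is only one batch).
    B = np.array([[1.0 if b == lvl else 0.0 for lvl in levels[1:]] for b in batch])
    X = np.column_stack([np.ones(len(psi)), np.asarray(condition, dtype=float), B])
    coef, *_ = np.linalg.lstsq(X, psi, rcond=None)
    return psi - B @ coef[2:]


# Toy event: batch "B" inflates PSI by about 0.2, so one adjusted value drops below 0.
print(batch_adjusted_psi([0.06, 0.18, 0.08, 0.36], [0, 0, 1, 1], ["A", "B", "A", "B"]))
```

If adjusted values must be read as PSI, they would need clipping or modelling on a bounded scale (for example a logit transform), which is a separate modelling choice.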
(In reply to jdee3, Mon, 11 Mar 2024 at 14:15)
I also have a question about batch effects. I have RNA-seq data from brain tissue samples of epilepsy patients, but no RNA-seq data from normal control individuals, and I haven't been able to find any case-control RNA-seq study of epilepsy brain tissue. I would like to use external normal controls, such as those from GTEx, to identify differential splicing events. PSI is a ratio. Suppose a batch effect influences the expression of a gene: it will shift the quantification of each intron in the same direction. Could taking the ratio between junction reads and exon reads then effectively cancel out the batch effect? I don't know whether batch effects have a significant impact on alternative splicing in real data. Thanks in advance!
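The cancellation argument in the question can be written out. With I and E the inclusion and exclusion evidence for an event, a batch factor k that scales both equally leaves PSI unchanged, whereas factors k_I and k_E that act differently on the two (for example, read length changing how many reads span junctions, or positional coverage bias) do not cancel:

$$
\mathrm{PSI} = \frac{I}{I + E}, \qquad
\frac{kI}{kI + kE} = \frac{I}{I + E}, \qquad
\frac{k_I I}{k_I I + k_E E} = \frac{I}{I + E} \iff k_I = k_E \quad (I, E > 0).
$$

A ratio also does not remove the extra sampling noise in low-coverage samples.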
Thanks for the question.
This is a recurrent issue when dealing with patient data.
In this case the condition (epilepsy vs control) would be confounded by
potential batch effects (data source, technology platform, etc.).
Any non-biological effect will probably come from different sequencing
rates, coverage, etc.
One simple approach could be to start with genes that have similar
coverage, gene-expression values, etc. between conditions, and sufficient
coverage across samples within each condition, so that the PSI estimates in
each condition are reliable. That should give you a good set to test between
conditions.
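A sketch of that gene-selection step, with thresholds that are purely illustrative assumptions (`min_tpm`, `min_fraction`, `max_fold` and the function name are hypothetical): keep genes that are expressed in most samples of every condition and whose median expression is similar between conditions.

```python
from statistics import median


def select_comparable_genes(gene_tpm, condition, min_tpm=1.0, min_fraction=0.8, max_fold=2.0):
    """Return IDs of genes passing both coverage filters.

    gene_tpm: dict gene_id -> list of TPM values, one per sample.
    condition: list of condition labels, one per sample, in the same order.
    """
    groups = sorted(set(condition))
    keep = []
    for gene, values in gene_tpm.items():
        per_group = {g: [v for v, c in zip(values, condition) if c == g] for g in groups}
        # 1) Sufficient expression in most samples of every condition.
        covered = all(
            sum(v >= min_tpm for v in vals) >= min_fraction * len(vals)
            for vals in per_group.values()
        )
        if not covered:
            continue
        # 2) Similar expression level between conditions (ratio of group medians).
        medians = [median(vals) for vals in per_group.values()]
        if max(medians) <= max_fold * min(medians):
            keep.append(gene)
    return keep
```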
If you expect patient-to-patient variability, you could test each case
against the distribution in controls, as we did here:
https://www.cell.com/cell-reports/fulltext/S2211-1247(17)31104-X
This will give you a patient-based estimate of the splicing alterations
with respect to normal tissue.
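A minimal way to compare one patient against the control distribution for a single event is an empirical tail probability around the control median. This is a generic sketch under that assumption (`case_vs_controls` is hypothetical), not necessarily the method of the linked Cell Reports study.

```python
from statistics import median


def case_vs_controls(case_psi, control_psi):
    """Compare one patient's PSI for an event with the PSI distribution in controls.

    Returns (delta_psi, empirical_p): delta_psi is the case PSI minus the control
    median; empirical_p is the (+1 smoothed) fraction of controls whose absolute
    deviation from the control median is at least as large as the case's.
    """
    if not control_psi:
        raise ValueError("need at least one control PSI value")
    centre = median(control_psi)
    delta = case_psi - centre
    extreme = sum(abs(c - centre) >= abs(delta) for c in control_psi)
    return delta, (1 + extreme) / (1 + len(control_psi))
```

With n controls the smallest attainable value is 1/(n + 1), so a large control cohort such as GTEx helps; the values still need multiple-testing correction across events.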
Controlling read coverage at the junction level to address this issue was
implemented in MOCCASIN
(https://www.nature.com/articles/s41467-021-23608-9).
I haven't tried it myself, but it could help control possible batch effects
in this case.
Please let me know how it goes.
Eduardo
(In reply to zhangpicb, Fri, 7 Feb 2025 at 20:03)