Addressing Batch Effects in Datasets with SUPPA2 #182

XXuxi · 2024-02-07T16:35:47Z

Dear SUPPA2 Development Team,

I am in the process of using SUPPA2 for differential splicing analysis and have encountered an issue with batch effects in my dataset, which includes sequencing samples from different batches. Upon analyzing TPM values through PCA, it became evident that batch effects are present.

My query is: Should batch effect correction be performed on the TPM values before running SUPPA2?

Thanks for your assistance.

Best regards,
Xi Xu

EduEyras · 2024-02-14T10:41:27Z

Dear Xi Xu,

We have recently handled batch effects using a linear model with cofactors (see https://pubmed.ncbi.nlm.nih.gov/36518527/). In this case, rather than performing a test between conditions, we try to fit a linear model between the conditions. To that model, you can add a list of cofactors, each described as a vector with one component per patient. The cofactors could be numerical values (e.g. age), nominal values (e.g. sex), another experimental variable (source, post-mortem, …), or even values obtained from other methods that estimate batch effects (e.g. SVA). This model will give you the events that best correlate with the conditions while accounting for all those sources of batch effect.

An alternative might be to correct the read counts / TPM values for these batch effects and then run SUPPA. We have not tried this, so I do not know whether it is effective.

I hope this helps. Please do not hesitate to write back with more questions.

Thanks a lot for using SUPPA.

Best,
Eduardo
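To make the cofactor model above concrete, here is a minimal sketch in Python of fitting one event's PSI against the condition plus cofactors by ordinary least squares. It is an illustration only, not the code used in the linked paper: the function name `fit_event`, the toy data and the choice of NumPy are all assumptions, and a real analysis would also need standard errors and multiple-testing correction across events.

```python
import numpy as np

def fit_event(psi, condition, cofactors):
    """Fit psi ~ intercept + condition + cofactors by ordinary least squares.

    psi       : PSI of one event, one value per sample
    condition : 0/1 indicator per sample (e.g. control/case)
    cofactors : dict of name -> numeric vector with one value per sample
                (batch coded as 0/1, age, ...); hypothetical inputs
    Returns a dict mapping each coefficient name to its estimate.
    """
    names = ["intercept", "condition"] + list(cofactors)
    columns = [np.ones(len(psi)), np.asarray(condition, dtype=float)]
    columns += [np.asarray(v, dtype=float) for v in cofactors.values()]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, np.asarray(psi, dtype=float), rcond=None)
    return dict(zip(names, beta))

# Toy data: 8 samples, 2 batches, condition balanced across batches.
condition = [0, 0, 1, 1, 0, 0, 1, 1]
batch     = [0, 0, 0, 0, 1, 1, 1, 1]
age       = [30, 45, 38, 52, 41, 29, 60, 47]
# PSI constructed as 0.30 + 0.20*condition + 0.10*batch, without noise.
psi = [0.30 + 0.20 * c + 0.10 * b for c, b in zip(condition, batch)]

coef = fit_event(psi, condition, {"batch": batch, "age": age})
print(coef)
```

The condition coefficient is the batch-adjusted effect for that event. The same fit would be repeated per event, and the adjustment only works when condition and batch are not perfectly confounded.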


jdee3 · 2024-03-11T03:15:31Z

Dear Eduardo,

Following your response, I looked into the paper you linked. It does indeed mention a batch regression approach, but I can't seem to find the code. Can you please point me to where the code for this batch regression technique was deposited? Or can you please explain exactly how the PSI values were re-modeled using linear regression?

Also, does this regression approach still produce PSI values that range from 0 to 1? Thanks in advance!

-Jay

EduEyras · 2024-03-19T12:30:42Z

Hi, sorry for the delayed reply.

It is a standard lm() call, correcting with covariables. We used those that we observed had the strongest confounding effect. lm() is a fairly standard function in R, and there should be enough tutorials or coding co-pilots available to help you with the syntax. We'll try to make the code available on the SUPPA page.

Thanks,
E.
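On the question of range: ordinary least squares on raw PSI places no bound on fitted or adjusted values, so they can fall outside 0 to 1. One possible workaround, sketched below, is to fit the model on logit-transformed PSI, subtract the estimated batch term, and transform back. This is an assumption for illustration and not the procedure from the paper; `remove_batch`, the `eps` clipping value and the toy data are hypothetical.

```python
import numpy as np

def logit(p, eps=1e-3):
    # Clip away exact 0 and 1 so the transform stays finite.
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return np.log(p / (1 - p))

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def remove_batch(psi, condition, batch):
    """Subtract the fitted batch term on the logit scale, then map back."""
    y = logit(psi)
    batch = np.asarray(batch, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(condition, dtype=float), batch])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    adjusted = y - beta[2] * batch  # remove only the batch effect
    return inv_logit(adjusted)

condition = np.array([0, 0, 1, 1, 0, 0, 1, 1])
batch     = np.array([0, 0, 0, 0, 1, 1, 1, 1])
psi       = np.array([0.05, 0.08, 0.40, 0.45, 0.15, 0.20, 0.65, 0.70])

adjusted = remove_batch(psi, condition, batch)
print(adjusted)
```

Because the inverse logit maps any real number into the open interval (0, 1), the adjusted values remain valid PSI values. Samples in the reference batch (coded 0) are left unchanged.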


zhangpicb · 2025-02-07T09:03:15Z

Hi @EduEyras @jdee3

I also have a question about the batch effects.

I have RNA-seq data from brain tissue samples of epilepsy patients, but I don't have RNA-seq data from normal control individuals. I also haven't been able to find any case-control RNA-seq studies on epilepsy brain tissue. I would like to use external normal controls, such as those from GTEx, to identify differential splicing events.

The PSI value is a ratio. Suppose a batch effect influences the expression of a gene. Such a batch effect will influence the quantification of each intron in the same direction. Might taking the ratio between junction reads and exon reads effectively cancel out the batch effect?
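The arithmetic behind this question can be checked with a small Python sketch. It assumes a simplified PSI defined as inclusion / (inclusion + exclusion) on made-up read counts; the function name and numbers are hypothetical. Note that SUPPA2 itself computes PSI from transcript abundances (TPM) rather than from junction reads.

```python
def psi(inclusion, exclusion):
    """Simplified PSI: share of the inclusion form among both forms."""
    return inclusion / (inclusion + exclusion)

base = psi(60, 40)

# A batch effect that scales both forms by the same factor cancels out.
scaled = psi(60 * 3, 40 * 3)

# A batch effect that scales the two forms differently does not cancel.
skewed = psi(60 * 3, 40 * 1.5)

print(base, scaled, skewed)
```

So the ratio removes effects that act uniformly on a gene, but any effect that acts unevenly on the isoforms or junctions (for example through coverage bias or read length) still shifts PSI.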

I don't know if the batch effect has a significant impact on alternative splicing in the real data.

Thanks in advance!

EduEyras · 2025-02-10T00:39:58Z

Thanks for the question. This is a recurrent issue when dealing with patient data.

In this case the condition (epilepsy vs control) would be confounded with potential batch effects (data source, technology platform, etc.). Any non-biological effect will probably come from differences in sequencing rate, coverage, and so on. One simple approach could be to start with genes that have similar coverage and gene-expression values between conditions, and sufficient coverage across samples within the same condition, so that the PSI estimates in each condition are reliable. That should give you a good set to test between conditions.

If you expect patient-to-patient variability, you could test each case against the distribution in controls, as we did here: https://www.cell.com/cell-reports/fulltext/S2211-1247(17)31104-X. This will give you a patient-based estimate of the splicing alterations with respect to normal tissues.

Control of read coverage at the level of junctions to address this issue was implemented in MOCCASIN (https://www.nature.com/articles/s41467-021-23608-9). I haven't tried it myself, but it could help with controlling possible batch effects in this case.

Please let me know how it goes.

Eduardo
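As a rough illustration of testing one case against the distribution in controls, the Python sketch below compares a patient's PSI for a single event with the control PSIs for that event. It is not the procedure from the linked paper: the function `case_vs_controls`, the `min_controls` threshold, the empirical tail estimate and the toy values are all assumptions.

```python
from statistics import mean, stdev

def case_vs_controls(case_psi, control_psis, min_controls=5):
    """Compare one patient's PSI for an event with the control distribution.

    Returns None when there are too few controls to describe the distribution.
    """
    if len(control_psis) < min_controls:
        return None
    mu = mean(control_psis)
    sd = stdev(control_psis)
    delta = case_psi - mu
    # Controls lying at least as far from the control mean as the case does.
    extreme = sum(abs(c - mu) >= abs(delta) for c in control_psis)
    return {
        "delta_psi": delta,
        "z": delta / sd if sd > 0 else float("nan"),
        # Add-one estimate so the empirical p-value is never exactly zero.
        "empirical_p": (extreme + 1) / (len(control_psis) + 1),
    }

controls = [0.20, 0.22, 0.18, 0.25, 0.21, 0.19, 0.23, 0.20]
result = case_vs_controls(0.55, controls)
too_few = case_vs_controls(0.50, [0.2, 0.3])
print(result, too_few)
```

Restricting this to events with reliable PSI in both cohorts, as suggested above, would happen before the comparison. The smallest achievable empirical p-value is limited by the number of controls.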

