Applying CellRank to multiple samples (with possible batch effects) #1141
Replies: 5 comments
-
Hi @YitengDang, thanks for your question. Could you please let me know which CellRank 2 kernel you are planning to apply? That influences how to approach data integration in this context.
-
Thanks for the reply! I'm not 100% sure which kernel will work best, but for now my plan is to try the RealTimeKernel and the VelocityKernel. I have libraries across 3 developmental time points. For each time point I have 2 conditions (with multiple replicates per time point and condition), but there are strong batch effects between these conditions. Preliminary RNA velocity results were confusing: the inferred directions do not match biological knowledge, although this could be because the streamlines are not captured well in a low-dimensional representation. In any case, if the results seem inconsistent, I will use the PseudotimeKernel, either on its own or combined with the other kernels.
-
Hi @YitengDang, if you have a time-series dataset, then I think the RealTimeKernel, combined with either the ConnectivityKernel or the PseudotimeKernel to include within-time-point dynamics, might work best for you. It would also make the integration problem easier, as you won't have to worry about integrating spliced/unspliced counts. There would be two parts to this challenge:
Alternatively, you could compute one cross-time-point trajectory for each condition, i.e. not integrate the data, and then make comparisons between conditions at the level of these trajectories. Hope this helps!
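To make the idea of combining kernels concrete, here is a minimal sketch in plain Python. It is not CellRank code: it only illustrates that a weighted sum of two row-stochastic transition matrices (one for across-time transitions, one for within-time-point transitions) is again row-stochastic. The function name `combine_kernels`, the toy matrices, and the 0.8/0.2 weights are assumptions chosen for illustration.

```python
def combine_kernels(matrices, weights):
    """Return the weighted average of row-stochastic transition matrices.

    Hypothetical helper for illustration only. Weights are normalized to
    sum to 1, so the result stays row-stochastic.
    """
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per matrix")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]

    n = len(matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for matrix, weight in zip(matrices, normalized):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * matrix[i][j]
    return combined


# Toy data: cells 0 and 1 are from the early time point, cell 2 from the late one.
# Across-time transitions: every cell moves to (or stays at) the late cell.
T_time = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
# Within-time-point transitions: cells only move among cells of the same time point.
T_conn = [
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]

T = combine_kernels([T_time, T_conn], [0.8, 0.2])
for row in T:
    print([round(value, 3) for value in row])
```

The weights control how much within-time-point dynamics contribute relative to the across-time coupling. Which weighting is appropriate depends on the dataset.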
-
Hi @Marius1311,
-
I think that depends on whether transitions within these libraries would make sense biologically. If these are really just replicates that describe essentially the same biology, I would allow transitions across replicates, i.e. first combine the replicates, then apply the kernel.
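As a rough sketch of "first combine the replicates, then apply the kernel", the snippet below pools cells by time point and condition while ignoring the replicate label. It uses plain Python dictionaries. The field names (`cell_id`, `time_point`, `condition`, `replicate`) and the helper `pool_replicates` are hypothetical and stand in for whatever annotation columns the real dataset has.

```python
from collections import defaultdict


def pool_replicates(cells):
    """Group cell IDs by (time_point, condition), ignoring the replicate label.

    Hypothetical helper: each pooled group would then be treated as one
    population when computing transitions.
    """
    groups = defaultdict(list)
    for cell in cells:
        key = (cell["time_point"], cell["condition"])
        groups[key].append(cell["cell_id"])
    return dict(groups)


# Toy annotation table with two replicates per time point and condition.
cells = [
    {"cell_id": "c1", "time_point": "t1", "condition": "ctrl", "replicate": "r1"},
    {"cell_id": "c2", "time_point": "t1", "condition": "ctrl", "replicate": "r2"},
    {"cell_id": "c3", "time_point": "t1", "condition": "pert", "replicate": "r1"},
    {"cell_id": "c4", "time_point": "t2", "condition": "ctrl", "replicate": "r1"},
    {"cell_id": "c5", "time_point": "t2", "condition": "ctrl", "replicate": "r2"},
]

pooled = pool_replicates(cells)
for key, cell_ids in sorted(pooled.items()):
    print(key, cell_ids)
```

If the replicates turn out not to describe the same biology, the grouping key could include the replicate label, which keeps them as separate populations.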
-
I have a large scRNAseq data set with multiple libraries for different experimental conditions (time, tissue, perturbation), which I have integrated using Scanorama.
Now I would like to do trajectory inference on this combined data set with CellRank, but I am wondering what the best approach is.
1. Apply trajectory inference to separate samples, then merge the results. The problem is that the inferred time from one sample might not be comparable to that of another sample.
2. Apply trajectory inference to the integrated data. This only makes sense to me if the data integration directly transforms the counts (e.g. with scVI). Otherwise, two biologically similar cell populations from different libraries might still be assigned very different pseudotimes if the data integration happens only at the dimensionality reduction level (e.g. with Scanorama).
I found one paper discussing a method for dealing with exactly this problem, but I’m not sure how well it works: https://www.biorxiv.org/content/10.1101/2021.03.09.433671v1
If you have any thoughts or suggestions on how to approach this, let me know!