Expectations for UMAP of embeddings #10

Open
Thapeachydude opened this issue Dec 4, 2023 · 2 comments


@Thapeachydude

Hi,

interesting tool (and the use of the SingleCellExperiment class is much appreciated). We have single-cell RNA-seq data from multiple samples under various conditions. So far, I've seen the most sensible results by creating pseudo-bulks per cell type, sample, and condition and fitting a linear model ~ sample + Condition using edgeR.

So, I was very curious when I saw your tool. So far, however, I'm not sure I understand the output.
I ran:

fit <- lemur(sce, design = ~ sampleID + Treatment, n_embedding = 20)

set.seed(100)
fit <- runUMAP(fit, dimred = "embedding", n_neighbors = 15, min_dist = 0.25, name = "UMAP_embedding", BPPARAM = mcparam)

to get an overview of the embeddings. I would expect at least some separation based on the conditions (since for some of them the pseudo-bulk results are quite strong, and we can even appreciate them in a UMAP of PCA loadings). But I see a big blob of cells, and some very small individual groups. But no "Treatment-shifts".

Is this what you would expect?

Btw, is there a way to limit the memory of the lemur function call? It is very fast but super memory intensive. Ideally, I wouldn't always need to run it on an HPC.

Best,
M

@const-ae
Owner

const-ae commented Dec 4, 2023

Hi M,

thanks for your interest and for reaching out.

> I would expect at least some separation based on the conditions (since for some of them the pseudo-bulk results are quite strong, and we can even appreciate them in a UMAP of PCA loadings)

LEMUR tries to absorb as much of the variation associated with the known covariates as possible into $R(x)$ and $S(x)$, so that the embedding ($Z$) shows you the residual variance (i.e., everything that varies for reasons other than sampleID or Treatment). This would typically be different cell states.
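To make that intuition concrete, here is a toy analogy in base R. It is not the lemur implementation: it simply regresses a known covariate out of simulated data with `lm()` and runs PCA on the residuals. All names and effect sizes (`treatment`, `cell_state`, `treat_eff`, `state_eff`, `explained()`) are made up for illustration.

```r
# Toy analogy only: plain linear regression + PCA, not what lemur does internally.
set.seed(1)
n_cells <- 200
n_genes <- 50
treatment  <- factor(rep(c("ctrl", "treated"), each = n_cells / 2))
cell_state <- factor(rep(c("A", "B"), times = n_cells / 2))

# Hypothetical per-gene shifts for treatment and for cell state
treat_eff <- rnorm(n_genes, sd = 2)
state_eff <- rnorm(n_genes, sd = 2)

# Simulated cells x genes matrix: treatment shift + state shift + noise
Y <- outer(as.numeric(treatment == "treated"), treat_eff) +
  outer(as.numeric(cell_state == "B"), state_eff) +
  matrix(rnorm(n_cells * n_genes, sd = 0.5), n_cells, n_genes)

# Embedding of the raw data vs. embedding after removing the known covariate
pca_raw <- prcomp(Y)$x[, 1:2]
pca_res <- prcomp(residuals(lm(Y ~ treatment)))$x[, 1:2]

# Fraction of the variance in a 2-D embedding that a grouping explains
explained <- function(pcs, group) {
  fitted_vals <- fitted(lm(pcs ~ group))
  sum(scale(fitted_vals, scale = FALSE)^2) / sum(scale(pcs, scale = FALSE)^2)
}

frac_treat_raw <- explained(pca_raw, treatment)   # treatment visible in raw PCA
frac_treat_res <- explained(pca_res, treatment)   # treatment gone after regression
frac_state_res <- explained(pca_res, cell_state)  # cell state still visible
```

In this toy setting the treatment separates cells in the raw PCA but not in the residual PCA, while the cell-state structure survives. That is the same qualitative picture as an embedding where conditions are intermixed.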

> But I see a big blob of cells, and some very small individual groups. But no "Treatment-shifts".

Depending on your data, this might be a reasonable outcome. If there is not much latent heterogeneity (is your data from a cell line, for example?), you would expect to see one big blob with cells from all conditions intermixed.

Best,
Constantin

@const-ae
Owner

const-ae commented Dec 4, 2023

> Btw, is there a way to limit the memory of the lemur function call? It is very fast but super memory intensive. Ideally, I wouldn't always need to run it on an HPC.

There currently is no easy way to limit the memory requirements beyond subsampling your cells and subsetting to a reasonable set of highly variable genes.
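A minimal base R sketch of those two steps, run on a simulated genes x cells matrix. The sizes (`n_hvg`, `cells_per_sample`), the `sample_id` vector, and the simple variance ranking are assumptions for illustration; dedicated highly-variable-gene methods may be preferable on real data.

```r
# Sketch: reduce a genes x cells matrix before fitting (simulated data).
set.seed(1)
counts <- matrix(
  rpois(2000 * 500, lambda = 2), nrow = 2000,
  dimnames = list(paste0("gene", 1:2000), paste0("cell", 1:500))
)
logcounts <- log1p(counts)

# Hypothetical sample labels, one per cell
sample_id <- rep(paste0("s", 1:4), length.out = ncol(logcounts))

n_hvg <- 500            # assumed number of genes to keep
cells_per_sample <- 50  # assumed cap on cells per sample

# Rank genes by variance of the log counts and keep the top n_hvg
gene_var <- apply(logcounts, 1, var)
hvg <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_hvg)]

# Subsample within each sample so that every sample stays represented
keep_cells <- unlist(
  lapply(split(colnames(logcounts), sample_id), function(cells) {
    sample(cells, min(length(cells), cells_per_sample))
  }),
  use.names = FALSE
)

small <- logcounts[hvg, keep_cells]
```

A SingleCellExperiment can be subset with the same `[genes, cells]` indexing (e.g. `sce[hvg, keep_cells]`), so the reduced object can then be passed to `lemur()` as before.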
