Dealing with multiple conditions and details on model parameters #17

Open
Thapeachydude opened this issue Aug 12, 2023 · 1 comment


Thapeachydude commented Aug 12, 2023

Good morning,

I'd like to try your tool. I don't have much experience with scVI and mainly work with the SingleCellExperiment class in Bioconductor, so it would be great to have your insights on how best to run it.

1). Multiple conditions:
The data set I have is not unlike the MIX-Seq one, but I'm dealing with multiple conditions, not two. Would you run them all together or rather separately pairwise (control vs treatment 1, then control vs treatment 2 etc)?

2). Raw or normalized counts:
Does the tool require raw or normalized counts? (assuming raw, but wanted to make sure).

3). Model parameters:
How do parameters such as the number of layers (n_layers), n_hidden, and batch size affect the output? I tried running my dataset with the parameters below (all conditions vs. DMSO). The model finished training, but in the UMAP of the salient representation everything was one big blob.

## Python: initialize raw counts and train the model
import numpy as np
# ContrastiveVI is assumed to be imported already from the contrastiveVI package

adata.raw = adata
adata.layers["counts"] = adata.X.toarray()  # keep raw counts for scdef (assumes adata.X is sparse)
model = ContrastiveVI(
    adata,
    n_batch=0,  # no batch correction
    n_layers=1,
    n_hidden=128,
    n_salient_latent=10,
    n_background_latent=10,
    use_observed_lib_size=False,
)
# DMSO cells are the background (control); all other cells are the target
background_indices = np.where(adata.obs["Treatment"] == "DMSO")[0]
target_indices = np.where(adata.obs["Treatment"] != "DMSO")[0]
model.train(
    check_val_every_n_epoch=1,
    train_size=0.8,  # fraction of cells used for training (0 to 1)
    background_indices=background_indices,
    target_indices=target_indices,
    use_gpu=True,
    early_stopping=True,
    max_epochs=1000,
    batch_size=512,
)

## R: convert things back to continue with the SCE class
reducedDim(sce, "salient") <- scd$obsm[["salient_rep"]]  # assign as dimensional reduction to sce
set.seed(100)
sce <- runUMAP(sce, dimred = "salient", n_neighbors = 5, name = "UMAP_salient", BPPARAM = mcparam)  # assuming Euclidean distance works as a metric

Many thanks : )
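
On question 2, a quick way to see what a matrix actually holds before training is to test whether every entry is a non-negative whole number. The sketch below is only an illustration: `looks_like_raw_counts` is a hypothetical helper, not part of contrastiveVI or scvi-tools, and it assumes the values can be iterated row by row.

```python
def looks_like_raw_counts(rows, tol=1e-8):
    """Return True if every entry is a non-negative whole number (within tol)."""
    return all(
        v >= 0 and abs(v - round(v)) <= tol
        for row in rows
        for v in row
    )


# Toy matrices (hypothetical values, not from any real dataset).
raw_like = [[0, 3, 12], [1, 0, 0]]
normalized_like = [[0.0, 1.38], [2.1, 0.0]]

print(looks_like_raw_counts(raw_like))         # whole numbers only
print(looks_like_raw_counts(normalized_like))  # fractional entries present
```

With the snippet above, something like `looks_like_raw_counts(adata.layers["counts"][:200].tolist())` would check the first 200 cells. Log-normalized data would typically fail the check because it contains fractional values.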

@ethanweinberger
Member

Hey @Thapeachydude. To answer your questions:

  1. I think the answer really depends on your goal in analyzing the data. If you want to zoom in on heterogeneity in the response to a single treatment, it may be easier to work with one treatment at a time (as in the MIX-Seq example in our manuscript). If instead you want insight into the relationships between treatments, and any similarities or differences in the responses they induce, then using data from all treatments together makes more sense. For example, for the CRISPR dataset in our manuscript we trained contrastiveVI on data from all perturbations at once and found that they grouped into previously identified categories of perturbation effects.
  2. Like scVI, contrastiveVI operates on raw counts.
  3. In our initial experiments we varied the number of hidden layers, the hidden layer dimension, and the batch size, but didn't see large differences, so we stuck with the scvi-tools defaults for the experiments in the manuscript. As for your specific dataset: do you have any sanity checks for expected separation in the salient space based on prior biological knowledge? My first guess from your description is that the treatments induce a strong effect, but that the effect is fairly homogeneous across treatments. In that case we'd expect the single blob you're observing. Also, did you inspect the shared latent space to see whether the results make sense there? I'm happy to take a quick look if you can share an AnnData file with me.
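
The two options in answer 1 differ only in which cells are passed as background and target. A minimal sketch of building both sets of indices from a list of treatment labels follows. `contrast_indices` is a hypothetical helper written for illustration, and the label values are made up. It does not decide whether the AnnData object should also be subset for each pairwise run.

```python
def contrast_indices(treatments, control="DMSO", pairwise=True):
    """Build (background, target) index lists.

    pairwise=False: one pooled run, control vs. all treated cells.
    pairwise=True:  one run per treatment, each against the same control cells.
    Returns a dict mapping a run name to (background_indices, target_indices).
    """
    background = [i for i, t in enumerate(treatments) if t == control]
    if not background:
        raise ValueError(f"no cells labelled {control!r}")
    if not pairwise:
        target = [i for i, t in enumerate(treatments) if t != control]
        return {f"all_vs_{control}": (background, target)}
    runs = {}
    for i, t in enumerate(treatments):
        if t != control:
            # Each run gets its own copy of the control indices.
            runs.setdefault(f"{t}_vs_{control}", (list(background), []))[1].append(i)
    return runs


# Hypothetical labels, e.g. from adata.obs["Treatment"].tolist()
labels = ["DMSO", "drugA", "drugB", "DMSO", "drugA"]
print(contrast_indices(labels, pairwise=False))
print(contrast_indices(labels, pairwise=True))
```

Each `(background, target)` pair could then be passed as `background_indices` and `target_indices` to `model.train`, as in the snippet in the original question.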

Best,
Ethan
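
One way to put a number on the "one big blob" impression, and to compare the salient and shared spaces as suggested in answer 3, is to ask how often a cell's nearest neighbours carry the same treatment label. The sketch below is a generic diagnostic, not something taken from the contrastiveVI manuscript. `knn_label_purity` and `chance_purity` are hypothetical helpers, the coordinates are toy values, and the brute-force search is O(n^2), so real data would need subsampling.

```python
import math
from collections import Counter


def knn_label_purity(embedding, labels, k=5):
    """Mean fraction of each point's k nearest neighbours that share its label."""
    n = len(embedding)
    if n != len(labels):
        raise ValueError("embedding and labels must have the same length")
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    total = 0.0
    for i in range(n):
        # Euclidean distances from cell i to every other cell (brute force).
        dists = sorted(
            (math.dist(embedding[i], embedding[j]), j) for j in range(n) if j != i
        )
        same = sum(labels[j] == labels[i] for _, j in dists[:k])
        total += same / k
    return total / n


def chance_purity(labels):
    """Approximate purity expected if neighbours were drawn at random."""
    n = len(labels)
    return sum((count / n) ** 2 for count in Counter(labels).values())


# Toy 2-D embedding with two well-separated groups (made-up coordinates).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
treatment = ["A", "A", "A", "B", "B", "B"]
print(knn_label_purity(points, treatment, k=2), chance_purity(treatment))
```

A purity close to the chance level in the salient space would be consistent with a homogeneous response across treatments, while a clearly higher value would suggest structure that the UMAP is hiding. For larger datasets, scikit-learn's `silhouette_score` is a vectorized alternative measure of separation.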
