Dealing with multiple conditions and details on model parameters #17

Thapeachydude · 2023-08-12T09:08:46Z

Good morning,

I'd like to try your tool. I don't have much experience with scVI and mainly work with the SingleCellExperiment class in Bioconductor, so it would be great to have your insights on how best to run it.

1). Multiple conditions:
The data set I have is not unlike the MIX-Seq one, but I'm dealing with multiple conditions, not two. Would you run them all together or rather separately pairwise (control vs treatment 1, then control vs treatment 2 etc)?

2). Raw or normalized counts:
Does the tool require raw or normalized counts? (assuming raw, but wanted to make sure).

3). Model parameters:
How do parameters such as the number of layers, n_hidden and batch size affect the output? I tried running my dataset using the parameters below (using all conditions vs DMSO). The model finished, but the UMAP of the salient space was one big blob.

## Initialize raw counts
adata.raw = adata
adata.layers["counts"] = adata.X.toarray() # keep raw counts for scdef

model = ContrastiveVI(
    adata,
    n_batch = 0, # no batch correction
    n_layers = 1, 
    n_hidden = 128,
    n_salient_latent=10,
    n_background_latent=10,
    use_observed_lib_size=False
)

background_indices = np.where(adata.obs["Treatment"] == "DMSO")[0]
target_indices = np.where(adata.obs["Treatment"] != "DMSO")[0]

model.train(
    check_val_every_n_epoch = 1,
    train_size = 0.8, # 0 to 1
    background_indices = background_indices,
    target_indices = target_indices,
    use_gpu = True,
    early_stopping = True,
    max_epochs = 1000,
    batch_size = 512
)

## Convert things back into R to continue with SCE class
reducedDim(sce, "salient") <- scd$obsm[["salient_rep"]] # assign as dimensional reduction to sce
set.seed(100)
sce <- runUMAP(sce, dimred = "salient", n_neighbors = 5, name = paste0("UMAP_salient"), BPPARAM = mcparam) # assuming euclidean distance works as a metric

Many thanks : )


ethanweinberger · 2023-08-28T18:05:31Z

Hey @Thapeachydude. To answer your questions:

1). I think the answer here really depends on what your exact goal is when analyzing your data. For example, if you want to zoom in on heterogeneity in response to a single treatment, then it might be easier to work with a single treatment at a time (as in the MIX-seq example in our manuscript). On the other hand, if you want more insight into the relationships between different treatments and any similarities/differences in their induced responses, then using data from all treatments together would make more sense. For example, with the CRISPR dataset considered in our manuscript we trained contrastiveVI with data from all the perturbations at once, and found that they grouped into previously identified categories of perturbation effects.
2). Similar to scVI, contrastiveVI operates on raw counts.
3). During our initial experiments we tried varying the number of hidden layers, the hidden layer dimensions and the batch size but didn't see huge differences (and thus we stuck with the scvi-tools defaults for the experiments presented in the manuscript). As for your specific dataset: do you happen to have any sanity checks for expected separation in the salient space based on prior biological knowledge? My first idea based on your description is that the treatments could induce a strong effect, but that these effects are pretty homogeneous across the different treatments. In such a case we'd expect to see the single blob that you're observing. Also, did you try inspecting the shared latent space to see if the results make sense there? I'm happy to take a quick look if you're able to share an anndata file with me.
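
The two setups described in the first point (all treatments pooled vs. one treatment at a time) differ only in how the background and target index arrays are built. Below is a minimal sketch, assuming a "Treatment" column with "DMSO" as the control as in the question. The helper name `contrast_indices` and the toy labels are hypothetical and not part of contrastiveVI. The resulting arrays have the same form as the `background_indices` and `target_indices` passed to `model.train` above.

```python
import numpy as np

# Toy stand-in for adata.obs["Treatment"].to_numpy(); the labels are made up.
treatment = np.array(["DMSO", "drugA", "drugB", "DMSO", "drugA", "drugC"])


def contrast_indices(labels, control="DMSO", target=None):
    """Return (background_indices, target_indices) as integer position arrays.

    With target=None every non-control cell is a target (pooled setup);
    with a label, only cells of that one treatment are targets (pairwise setup).
    """
    labels = np.asarray(labels)
    background = np.where(labels == control)[0]
    if target is None:
        target_mask = labels != control
    else:
        target_mask = labels == target
    return background, np.where(target_mask)[0]


# Pooled: control vs. all treatments at once.
bg_all, tg_all = contrast_indices(treatment)

# Pairwise: one (background, target) pair per treatment, each for its own model.
pairwise = {
    name: contrast_indices(treatment, target=name)
    for name in sorted(set(treatment.tolist()) - {"DMSO"})
}
```

In the pairwise case each entry of `pairwise` would be used to train a separate model, so the salient spaces of different treatments are not directly comparable with each other; the pooled case gives one salient space shared by all treatments.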

Best,
Ethan
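
One way to put a number on the "single blob" impression, rather than judging it from a UMAP, is to ask how often a cell's nearest neighbours in the salient space carry the same treatment label. The sketch below is an illustration under stated assumptions, not something from the contrastiveVI package: `neighbor_label_agreement` is a hypothetical helper, the data are synthetic, and in practice the embedding would be the salient representation of the target cells (what the question stores as `salient_rep`).

```python
import numpy as np


def neighbor_label_agreement(embedding, labels, k=5):
    """Mean fraction of each cell's k nearest neighbours (Euclidean) sharing its label."""
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    # Dense pairwise squared distances: fine for a few thousand cells,
    # subsample first for larger datasets.
    sq = (embedding ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * embedding @ embedding.T
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]
    return float((labels[nn] == labels[:, None]).mean())


rng = np.random.default_rng(0)
labels = np.repeat(["drugA", "drugB"], 100)

# Synthetic "blob": both treatments drawn from the same distribution.
blob = rng.normal(size=(200, 10))

# Synthetic separated case: drugB cells shifted away from drugA cells.
shift = np.where(labels == "drugA", 0.0, 6.0)[:, None]
separated = blob + shift

blob_score = neighbor_label_agreement(blob, labels)
separated_score = neighbor_label_agreement(separated, labels)
```

With two equally sized groups, a score near 0.5 means treatment labels are mixed at chance level, which would be consistent with homogeneous effects across treatments, while a score near 1 means the treatments occupy distinct regions. Running the same check on the background representation gives a reference point for how much structure is unrelated to treatment.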
