Skip to content

deepomicslab/StrInt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

43 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

StrInt

The software implementation of the method in Deciphering more accurate cell-cell interactions by modeling cells and their interactions.

StrInt-Main

Pre-requirements

  • numpy, pandas==1.5.2
  • scipy, scanpy, umap
  • loess
  • smurf-imputation

Installation

pip install pyStrint

Input file format

1. DataFrame format

Expand section
  • Spatial Transcriptomics (ST) Count Data

    • st_exp dataframe with spots as rows and genes as columns
  • Spatial coordinates

    • st_coord dataframe with spot as rows, axis x and y as columns
  • Cell-type deconvoluted spatial matrix

    • st_decon dataframe with spot as rows and cell-type as columns
  • Single-cell RNA-seq Count Data

    • sc_exp dataframe with cells as rows and genes as columns
  • Single-cell RNA-seq Metadata

    • sc_meta dataframe with cells as rows and cell types as columns
    • cell_type_key column name of the celltype identity in sc_meta
  • Single-cell RNA-seq distribution Data

    • sc_distribution dataframe with cells as rows and genes as columns
  • Ligand and Receptor Data (optional)

    • lr_df user provided dataframe with ligand-receptor pairs as rows, ligand, receptor and its weight as columns

Convert to adata format

sc_adata, st_adata, sc_distribution, lr_df = pp.prep_adata(sc_exp = sc_exp, st_exp = st_exp, sc_distribution = sc_smurf, 
                            sc_meta = sc_meta, st_coord = st_coord, SP = species)

2. Adata format

Expand section
  • Spatial Transcriptomics (ST) Count Data

    • st_adata adata.X with spots as rows and genes as columns
    • st_adata.obs dataframe with spot as rows, spot coordinates x and y as columns
  • Cell-type deconvoluted spatial matrix

    • st_decon dataframe with spot as rows and cell-type as columns
  • Single-cell RNA-seq Count Data

    • sc_adata adata.X dataframe with cells as rows and genes as columns
    • sc_adata.obs dataframe with cells as rows and cell types as columns
  • Single-cell RNA-seq distribution Data

    • sc_distribution dataframe with cells as rows and genes as columns

Usages

Prep object

Expand section
obj = spamint.spaMint(save_path = outDir, st_adata = st_adata, weight = st_decon, 
                 sc_distribution = sc_distribution, sc_adata = sc_adata, cell_type_key = 'celltype', 
                 st_tp = st_tp)
obj.prep()

Parameters

  • save_path Output Dir to save results

  • st_adata adata.X Spatial Transcriptomics (ST) Count Data with spots as rows and genes as columns

    • st_adata.obs dataframe with spot as rows, spot coordinates x and y as columns
  • weight Cell-type deconvoluted spatial dataframe with spot as rows and cell-type as columns

  • sc_distribution Single-cell RNA-seq distribution dataframe with cells as rows and genes as columns

  • sc_adata adata.X Single-cell RNA-seq Count dataframe with cells as rows and genes as columns

    • sc_adata.obs dataframe with cells as rows and cell types as columns
  • cell_type_key cell type colname in sc_adata.obs

  • st_tp ST sequencing platform choose from st (ST legacy), visium (10X Visium), or slide-seq (Any single-cell resolution data)

Initial process (Cell selection)

Expand section
sc_agg_meta = select_cells(self, p = 0.1, mean_num_per_spot = 10,  max_rep = 3, repeat_penalty = 10)
  • p percentage of the interface similarity during cell selection
  • mean_num_per_spot Average number of cells per spot.
  • max_rep Maximum number of repetitions for cell selection.
  • repeat_penalty When one cell has been picked for [THIS] many times, its probability of being picked again decreases by half. Recommanded to be near (st_exp.shape[0]*num_per_spot/sc_exp.shape[0]) * 10

Refinement process (Gradient descent)

Expand section
refine_sc_exp, sc_agg_meta = gradient_descent(self, alpha = 1, beta = 0.001, gamma = 0.001, 
                delta = 0.1, eta = 0.0005, 
               init_sc_embed = None,
               iteration = 20, k = 2, W_HVG = 2,
               left_range = 0, right_range = 8, steps = 1, dim = 2)
  • alpha, beta, gamma, delta Hyperparameters for the loss function.

    alpha: the weight of the term that maintains the expression similarity between cells and their respective gamma distribution models, default: 1.

    beta: the weight of adjusting cell locations based on cell-cell affinity.

    gamma: the weight of optimizing interface profile similarity between pseudo-spots and their corresponding ST spots, default: 0.001.

    delta: the weight of the regularization term.

  • eta float, default: 0.0005

    Learning rate for gradient descent.

  • init_sc_embed DataFrame, optional, default: None

    Initial embedding for single-cell data.

  • iteration int, optional, default: 20

    The number of iterations for optimization.

  • k int, optional, default: 2

    The number of neighbors in each adjacent spot.

  • W_HVG int, optional, default: 2

    Weight for highly variable genes.

  • left_range int, optional, default: 0

  • right_range int, optional, default: 8

    The index range for the neighbor number in the embedding process, the actual neighbor number is (i+1)*10

  • steps int, optional, default: 1

    The iteration number for each neighbor

  • dim int, optional, default: 2

    The embedding dimension of the reconstruction

More details in demo_tutorial.ipynb
tutorial file can be downloaded at: https://drive.google.com/drive/folders/1FYa4hzg3vVo6y2BOzlJbXhPTmdEcjD4O?usp=sharing

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published