`steml` is a Python module with tools to load, transform, and analyze spatial transcriptomics data. The spatial transcriptomics technology currently used is 10x Genomics Visium.
The module must first be installed by cloning the repo, initializing and installing the conda environment provided by `env.yml`, and finally installing the module itself with `pip install .` at the root of the repo. Updates to the codebase require the module to be reinstalled with `pip install .`.
The repo is divided into several subdirectories to organize code, scripts, and experiments:
- Useful scripts which call external software are contained within `scripts` (e.g. running `spaceranger count` to process the raw FASTQ data).
- The machine learning pipeline source code is contained within the `steml` subdirectory. This is the actual Python module, which is divided into submodules:
  - The `recipes` submodule contains the code for functions which should be called directly by a user of the module. These include data preprocessing and model training recipes.
  - The `data` submodule contains the code for the data loading tools. Notably, the data preprocessing tools are not contained here, but rather in the `recipes` submodule. Further data preprocessing tools are Jupyter notebooks in the `notebooks` subdirectory at the root of the repo.
  - The `models` submodule contains the code for instantiating machine learning models. The model architecture currently implemented is ResNet18, with the capability to predict categorical outputs or regress continuous values.
  - The `plots` submodule contains the code for some limited plotting functionality for modeling results. Many more plots are currently generated by notebooks in the `notebooks` subdirectory at the root of the repo.
- The `notebooks` subdirectory contains several useful Jupyter notebooks which preprocess data, perform axis alignment, and plot results from the machine learning pipeline.
- `experiments` contains the Python scripts which call the `steml` pipeline. Each script is titled with its experiment ID. Exact details for each experiment are recorded in a table at `experiments.csv`.
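
Since each experiment script is titled with its experiment ID, the matching row of `experiments.csv` can be looked up from the script's file name. The snippet below is a minimal sketch of that idea and is not part of the `steml` module: the helper `find_experiment` and the column name `id` are hypothetical, so adjust `id_column` to whatever the table actually uses.

```python
import csv
from pathlib import Path


def find_experiment(csv_path, script_path, id_column="id"):
    """Return the experiments-table row that matches a script's file name.

    Assumption (not confirmed by the repo): the table has a column, here
    hypothetically named ``id``, whose values equal the script file names
    without the ``.py`` extension.
    """
    # The experiment ID is taken to be the file name without its extension.
    experiment_id = Path(script_path).stem
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            if row.get(id_column) == experiment_id:
                return row
    raise KeyError(
        f"no experiment with {id_column}={experiment_id!r} in {csv_path}"
    )
```

Calling `find_experiment("experiments.csv", "experiments/<experiment-id>.py")` returns the row as a dictionary keyed by the table's column headers, or raises `KeyError` if the ID is not listed.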
This repo depends on code and data from the following sources: