Find structured datasets within the Sequence Read Archive

mbernste/hypothesis-driven-SRA-queries

Latest commit: Matthew Bernstein, 09ae56b · Jul 23, 2020 (36 commits)

Jupyter notebook-based tools for building structured datasets from the Sequence Read Archive

This repository implements a pair of Jupyter notebook-based tools that use the MetaSRA to build structured datasets from the SRA, facilitating secondary analyses of the SRA’s human RNA-seq data:

  • Case-Control Finder: Finds suitable case and control samples for a given disease or condition where the cases and controls are matched by tissue or cell type.
  • Series Finder: Finds ordered sets of samples for the purpose of addressing biological questions pertaining to changes over a numerical property such as time.

For a more in-depth description of these tools, please see our publication:

Bernstein, M.N. et al. (2020). Jupyter notebook-based tools for building structured datasets from the Sequence Read Archive. F1000Research, 9:376

A few things to note:

Running the notebooks on Google Colab

The notebooks can be executed in the cloud via Google Colab:

Running the notebooks locally

Setup

The dependencies for these notebooks are described in requirements.txt. To install these dependencies, please run

pip install -r requirements.txt

Furthermore, before running the notebooks, you must unpack the static metadata files from data.tar.gz. To do so, run the following command:

tar -zxf data.tar.gz
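If the unpacking step fails, it can help to confirm that the archive is present and readable before extracting it. The following is a minimal sketch, assuming a POSIX-like shell with a tar that supports gzip. The helper name `unpack_archive` and its optional destination argument are hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): unpack a gzip tarball
# only after checking that it exists and can be read.
unpack_archive() {
    local archive="${1:-data.tar.gz}"   # archive to unpack
    local dest="${2:-.}"                # directory to extract into

    if [ ! -f "$archive" ]; then
        echo "error: $archive not found" >&2
        return 1
    fi

    # -t lists the contents without extracting; it fails on an unreadable archive.
    if ! tar -tzf "$archive" > /dev/null 2>&1; then
        echo "error: $archive is not a readable gzip tarball" >&2
        return 1
    fi

    mkdir -p "$dest"
    # -x extracts, -z handles gzip, -f names the archive, -C sets the target directory.
    tar -xzf "$archive" -C "$dest"
}
```

From the repository root, calling `unpack_archive data.tar.gz` would extract into the current directory. The key flag is `-x` (extract); `-c` would instead attempt to create a new archive.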

Running the notebooks

To run the Case-Control Finder, run:

jupyter notebook case_control_finder.ipynb

To run the Series Finder, run:

jupyter notebook series_finder.ipynb

Contributors

  • Matthew Bernstein
  • Emily Clough
  • Ariella Gladstein
  • Khun Zaw Latt
  • Ben Busby
  • Allissa Dillman