Commit c22e48a (0 parents): 37 changed files with 9,157 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# Scanpy | ||
data/ | ||
figs/ | ||
*/figs/ | ||
write/ | ||
|
||
# always-ignore extensions | ||
*~ | ||
|
||
# Python / Byte-compiled / optimized / DLL | ||
__pycache__/ | ||
*.py[cod] | ||
|
||
# OS or Editor files and folders | ||
.DS_Store | ||
Thumbs.db | ||
.ipynb_checkpoints/ | ||
.directory | ||
/.idea/ | ||
|
||
# always-ignore directories | ||
/dist/ | ||
/build/ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,274 @@ | ||
[Example Use](#examples) | | ||
[Tools](#tools) | | ||
[Installation](#install) | | ||
[References](#references) | ||
|
||
# Scanpy - Single-Cell Analysis in Python | ||
|
||
Tools for analyzing and simulating single-cell data that aim at an understanding | ||
of dynamic biological processes from snapshots of transcriptome or | ||
proteome. | ||
|
||
* [preprocess](scanpy/tools/preprocess.py) - Filter, subsample, normalize data, | ||
and identify highly variable genes. | ||
|
||
* [pca](scanpy/tools/pca.py) - Visualize data using PCA. | ||
|
||
* [diffmap](scanpy/tools/diffmap.py) - Visualize data using Diffusion Maps | ||
([Coifman *et al.*, 2005](#ref_coifman05); [Haghverdi *et al.*, | ||
2015](#ref_haghverdi15)). | ||
|
||
* [tsne](scanpy/tools/tsne.py) - Visualize data using t-SNE ([van | ||
der Maaten & Hinton, 2008](#ref_vandermaaten08); [Amir *et al.*, 2013](#ref_amir13)). | ||
|
||
* [dpt](scanpy/tools/dpt.py) - Infer progression of cells, identify *branching* | ||
subgroups ([Haghverdi *et al.*, 2016](#ref_haghverdi16)). | ||
|
||
* [difftest](scanpy/tools/difftest.py) - Test for differential gene | ||
expression. | ||
|
||
* [sim](scanpy/tools/sim.py) - Simulate dynamic gene expression data ([Wittmann | ||
*et al.*, 2009](#ref_wittmann09)). | ||
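
As a rough illustration of the first two tools in the list, the following sketch filters genes, normalizes and log-transforms counts, keeps the most variable genes and projects the result onto two principal components. It uses only NumPy on synthetic Poisson counts. The thresholds (3 cells, 20 genes) and all variable names are assumptions made for this example; they are not taken from `preprocess.py` or `pca.py`.

```python
import numpy as np

# Synthetic count matrix: 100 cells (rows) x 50 genes (columns).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(100, 50)).astype(float)

# Filter: keep genes detected in at least 3 cells (assumed threshold).
keep = (counts > 0).sum(axis=0) >= 3
counts = counts[:, keep]

# Normalize every cell to the median total count, then log-transform.
totals = counts.sum(axis=1, keepdims=True)
norm = np.log1p(counts / totals * np.median(totals))

# Keep the 20 genes with the highest variance (assumed selection rule).
top = np.argsort(norm.var(axis=0))[::-1][:20]
hv = norm[:, top]

# PCA via SVD of the mean-centered matrix; keep two components.
centered = hv - hv.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
pcs = U[:, :2] * s[:2]

print(pcs.shape)
```

Plotting `pcs[:, 0]` against `pcs[:, 1]` gives the kind of two-dimensional view that a PCA visualization shows.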
|
||
Scanpy features a large number of use cases that can be called directly | ||
from the command line. | ||
|
||
* [examples](scanpy/exs/builtin.py) - Summarizes example data and use | ||
cases. Allows calling data reading, annotating and preprocessing steps as well | ||
as tool parameters using a single key. | ||
|
||
The draft [Wolf & Theis (2017)](http://falexwolf.de/docs/scanpy.pdf) explains | ||
conceptual ideas and usage as a library. Potential coauthors who would like to | ||
work on software and manuscript are welcome! Contributors who merely want to | ||
add their own example or tool are welcome, too! Any comments are appreciated! | ||
|
||
## Example Use <a id="examples"></a> | ||
|
||
The following command-line examples call the wrapper | ||
[scripts/scanpy.py](scripts/scanpy.py), which works **without** | ||
[installation](#install). To get all required packages (all default in | ||
[Anaconda](https://www.continuum.io/downloads)), install | ||
[miniconda](http://conda.pydata.org/miniconda.html) and run `conda install scipy | ||
matplotlib h5py pandas xlrd`. | ||
|
||
Download or clone the repository - green button on top of the page - and `cd` | ||
into its root directory. | ||
|
||
#### Data of [Moignard *et al.* (2015)](#ref_moignard15) <a id="moignard15"></a> | ||
|
||
Early mesoderm cells in mouse differentiate through three subsequent stages (PS, | ||
NP, HF) and then branch into erythrocytes (4SG) and endothelial cells (4SFG). | ||
```shell | ||
python scripts/scanpy.py moignard15 pca | ||
python scripts/scanpy.py moignard15 tsne | ||
python scripts/scanpy.py moignard15 diffmap | ||
``` | ||
<img src="http://falexwolf.de/scanpy/figs/moignard15_pca.png" height="175"> | ||
<img src="http://falexwolf.de/scanpy/figs/moignard15_tsne.png" height="175"> | ||
<img src="http://falexwolf.de/scanpy/figs/moignard15_diffmap.png" height="175"> | ||
|
||
Diffusion Pseudotime (DPT) analysis reveals differentiation and branching. It | ||
detects the *trunk* of progenitor cells (segment 0) and the *branches* of endothelial | ||
cells (segment 1/2) and erythrocytes (segment 3). The inferred *pseudotime* | ||
traces the degree of cells' progression in the differentiation process. | ||
```shell | ||
python scripts/scanpy.py moignard15 dpt | ||
``` | ||
<img src="http://falexwolf.de/scanpy/figs/moignard15_dpt_diffmap.png" height="175"> | ||
<img src="http://falexwolf.de/scanpy/figs/moignard15_dpt_segpt.png" height="175"> | ||
|
||
This orders cells by segment and, within each segment, by pseudotime, and outputs the following heatmap: | ||
|
||
<img src="http://falexwolf.de/scanpy/figs/moignard15_dpt_heatmap.png" height="250"> | ||
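
The ordering used for such a heatmap can be sketched with NumPy. The arrays `segments` and `pseudotime` below are made-up per-cell annotations, not Scanpy output, and the names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical annotations for six cells: a segment label and a pseudotime.
segments = np.array([1, 0, 1, 0, 2, 2])
pseudotime = np.array([0.9, 0.1, 0.4, 0.3, 0.8, 0.6])

# np.lexsort treats the LAST key as the primary one, so cells are sorted
# by segment first and by pseudotime within each segment.
order = np.lexsort((pseudotime, segments))

print(order.tolist())  # [1, 3, 2, 0, 5, 4]
```

Indexing an expression matrix with `order` along the cell axis then yields rows arranged segment by segment, each sorted by increasing pseudotime.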
|
||
With this, we reproduced Fig. 1 from [Haghverdi *et al.* | ||
(2016)](#ref_haghverdi16). See this [notebook](examples/moignard15.ipynb) for | ||
more information. | ||
|
||
#### Get help | ||
|
||
Show available example data and example use cases, respectively: | ||
```shell | ||
python scripts/scanpy.py exdata | ||
python scripts/scanpy.py examples | ||
``` | ||
|
||
Get general and tool-specific help, respectively: | ||
```shell | ||
python scripts/scanpy.py --help | ||
python scripts/scanpy.py dpt --help | ||
``` | ||
|
||
#### Add your own examples | ||
|
||
To add your own examples, simply modify the content of the function | ||
`myexample()` in [scanpy/exs/user.py](scanpy/exs/user.py). Consider using copy | ||
and paste from [scanpy/exs/builtin.py](scanpy/exs/builtin.py). Call your example | ||
using | ||
```shell | ||
python scripts/scanpy.py pca myexample | ||
``` | ||
|
||
Also, it'd be awesome if you provide your working example for the public Scanpy | ||
examples: copy your example from [scanpy/exs/user.py](scanpy/exs/user.py) to | ||
[scanpy/exs/builtin.py](scanpy/exs/builtin.py) with a link to the public data and | ||
make a pull request. If you have questions or prefer sending your script by | ||
email, contact Alex. | ||
|
||
#### Data of [Paul *et al.* (2015)](#ref_paul15) | ||
|
||
Diffusion Pseudotime (DPT) analysis detects the branch of granulocyte/macrophage | ||
progenitors (GMP), and the branch of megakaryocyte/erythrocyte progenitors | ||
(MEP). There are two small further subgroups (*segments* 0 and 2). | ||
```shell | ||
python scripts/scanpy.py paul15 dpt | ||
``` | ||
<img src="http://falexwolf.de/scanpy/figs/paul15_dpt_diffmap.png" height="175"> | ||
<img src="http://falexwolf.de/scanpy/figs/paul15_dpt_segpt.png" height="175"> | ||
|
||
We can now test for differential gene expression. | ||
```shell | ||
python scripts/scanpy.py paul15 difftest | ||
``` | ||
<img src="http://falexwolf.de/scanpy/figs/paul15_difftest.png" height="175"> | ||
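
The README does not state which statistic `difftest` uses. As a minimal sketch of the general idea, the following assumes Welch's t-statistic, computed per gene between two groups of cells and used to rank genes. The data are synthetic and the function name `welch_t` is hypothetical.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t-statistic per gene (columns) for two groups of cells (rows)."""
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    # Standard error without assuming equal variances in the two groups.
    se = np.sqrt(a.var(axis=0, ddof=1) / a.shape[0]
                 + b.var(axis=0, ddof=1) / b.shape[0])
    return mean_diff / se

rng = np.random.default_rng(0)
group_a = rng.normal(size=(200, 5))   # 200 cells, 5 genes
group_b = rng.normal(size=(150, 5))   # 150 cells, 5 genes
group_b[:, 2] += 3.0                  # gene 2 is shifted upwards in group b

t_stat = welch_t(group_a, group_b)
# Rank genes by the absolute value of the statistic, strongest first.
ranking = np.argsort(-np.abs(t_stat))

print(ranking[0])  # index of the gene with the strongest difference
```

In practice the groups would be DPT segments rather than random draws, and p-values would need a multiple-testing correction.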
|
||
See the [notebook](examples/paul15.ipynb) for more information. | ||
|
||
#### Simulated myeloid progenitor data ([Krumsiek *et al.*, 2011](#ref_krumsiek11)) <a id="krumsiek11"></a> | ||
|
||
Here, we are going to simulate some data using a literature-curated boolean gene | ||
regulatory network, which is believed to describe myeloid differentiation | ||
([Krumsiek *et al.*, 2011](#ref_krumsiek11)). Using [sim.py](scanpy/tools/sim.py), the | ||
[boolean model](models/krumsiek11.txt) is translated into a stochastic | ||
differential equation ([Wittmann *et al.*, 2009](#ref_wittmann09)). Simulations result | ||
in branching time series of gene expression, where each branch corresponds to a | ||
certain cell fate of common myeloid progenitors (megakaryocytes, erythrocytes, | ||
granulocytes and monocytes). | ||
```shell | ||
python scripts/scanpy.py krumsiek11 sim | ||
``` | ||
<img src="http://falexwolf.de/scanpy/figs/krumsiek11_sim.png" height="175"> | ||
<img src="http://falexwolf.de/scanpy/figs/krumsiek11_sim_shuffled.png" height="175"> | ||
|
||
If the order is shuffled, as in a snapshot, the same data looks as shown on the right. | ||
Let us reconstruct the process using DPT and obtain the branching lineages: | ||
```shell | ||
python scripts/scanpy.py krumsiek11 dpt --plotparams layout 3d | ||
``` | ||
<img src="http://falexwolf.de/scanpy/figs/krumsiek11_dpt_vsorder.png" height="175"> | ||
<img src="http://falexwolf.de/scanpy/figs/krumsiek11_dpt_segpt.png" height="175"> | ||
|
||
The left panel illustrates how the data is organized according to a *pseudotime* | ||
and different *segments*. Pseudotime 'estimates geodesic distance on the | ||
manifold' from a root cell. Segments are discrete partitions of the data. | ||
|
||
<img src="http://falexwolf.de/scanpy/figs/krumsiek11_dpt_diffmap.png" height="175"> | ||
|
||
See the [notebook](examples/krumsiek11.ipynb) for more. | ||
|
||
|
||
## Tools <a id="tools"></a> | ||
|
||
### diffmap <a id="diffmap"></a> | ||
|
||
[diffmap](scanpy/tools/diffmap.py) implements *diffusion maps* ([Coifman *et al.*, | ||
2005](#ref_coifman05)), which were proposed for visualizing single-cell data by | ||
[Haghverdi *et al.* (2015)](#ref_haghverdi15). Also, [diffmap](scanpy/tools/diffmap.py) | ||
accounts for modifications to the original algorithm proposed by [Haghverdi *et | ||
al.* (2016)](#ref_haghverdi16). | ||
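
A minimal sketch of a diffusion map, assuming a Gaussian kernel with a fixed width `sigma` and a density normalization of the kernel, is given below. It is not the code in `diffmap.py`; the function name, the parameters and the half-circle test data are assumptions for illustration.

```python
import numpy as np

def diffusion_map(X, sigma=1.0, n_comps=3):
    """Return the leading non-trivial eigenvalues and diffusion components."""
    # Squared pairwise Euclidean distances between cells.
    sq = np.sum(X**2, axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None)
    K = np.exp(-d2 / (2.0 * sigma**2))
    # Density normalization: divide out the sampling density estimate.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)
    # Symmetric conjugate S of the row-stochastic matrix T = D^-1 K.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    # Keep the n_comps + 1 largest eigenpairs, sorted in descending order.
    idx = np.argsort(evals)[::-1][:n_comps + 1]
    evals, evecs = evals[idx], evecs[:, idx]
    # Right eigenvectors of T are D^-1/2 times the eigenvectors of S.
    psi = evecs / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector with eigenvalue 1.
    return evals[1:], psi[:, 1:]

# Cells placed along a half circle, parametrized by t in [0, 1].
t = np.linspace(0.0, 1.0, 60)
X = np.column_stack([np.cos(np.pi * t), np.sin(np.pi * t)])
evals, psi = diffusion_map(X, sigma=0.2, n_comps=2)

print(psi.shape)
```

For data lying on a one-dimensional curve like this, the first diffusion component varies monotonically along the curve, which is why it is useful for visualizing progression.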
|
||
### dpt <a id="dpt"></a> | ||
|
||
[dpt](scanpy/tools/dpt.py) implements Diffusion Pseudotime as introduced by [Haghverdi *et | ||
al.* (2016)](#ref_haghverdi16). | ||
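
The sketch below illustrates how a pseudotime can be derived from a diffusion operator. It assumes that the DPT distance is the Euclidean distance between rows of the accumulated transition matrix, that is, the sum over all powers of the transition matrix with the stationary component removed, here evaluated through an eigendecomposition of the symmetrized operator. It is not the code in `dpt.py`, it omits branching detection, and the kernel, `sigma` and the function name are assumptions.

```python
import numpy as np

def dpt_from_root(X, root, sigma=1.0):
    """Pseudotime of every cell as a diffusion-based distance from a root cell."""
    # Gaussian kernel on squared pairwise distances.
    sq = np.sum(X**2, axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None)
    K = np.exp(-d2 / (2.0 * sigma**2))
    # Density normalization, then symmetric normalization.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    # eigh returns eigenvalues in ascending order; the last one equals 1
    # and belongs to the stationary state, so it is dropped.
    evals, evecs = np.linalg.eigh(S)
    evals, evecs = evals[:-1], evecs[:, :-1]
    # Summing all powers of the operator gives a geometric series per
    # eigenvalue: lambda + lambda^2 + ... = lambda / (1 - lambda).
    weights = evals / (1.0 - evals)
    M = evecs * weights
    # Distance between the root's row and every other row.
    return np.linalg.norm(M - M[root], axis=1)

# Cells placed along a half circle; the first cell is taken as root.
t = np.linspace(0.0, 1.0, 60)
X = np.column_stack([np.cos(np.pi * t), np.sin(np.pi * t)])
pseudotime = dpt_from_root(X, root=0, sigma=0.2)

print(pseudotime[0])  # 0.0 for the root cell
```

On this toy curve the pseudotime grows with the position along the curve, so sorting cells by `pseudotime` recovers their order.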
|
||
The functionality of diffmap and dpt is comparable to that of the R package | ||
[destiny](http://bioconductor.org/packages/release/bioc/html/destiny.html) of | ||
[Angerer *et al.* (2015)](#ref_angerer15). | ||
|
||
### sim <a id="sim"></a> | ||
|
||
[sim](scanpy/tools/sim.py) provides a way to sample from continuous stochastic | ||
differential equations that correspond to simple, qualitative, | ||
literature-curated boolean gene regulatory networks, as suggested by [Wittmann | ||
*et al.* (2009)](#ref_wittmann09). | ||
|
||
The tool is comparable to the Matlab tool *Odefy* of [Krumsiek *et al.* | ||
(2010)](#ref_krumsiek10). | ||
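
To illustrate the idea of turning a boolean network into a stochastic differential equation, the sketch below uses a toy network of two mutually repressing genes (`A = not B`, `B = not A`). Each boolean rule is replaced by a Hill function and the resulting equation is integrated with the Euler-Maruyama scheme. This is not the Krumsiek network and not the code of the sim tool; the Hill parameters, the noise level and the function names are assumptions, and the Hill function is a simplified stand-in for the transformation described by Wittmann *et al.* (2009).

```python
import numpy as np

def hill(x, k=0.5, n=4):
    """Smooth replacement for a boolean input: close to 0 below k, close to 1 above."""
    x = np.clip(x, 0.0, None)
    return x**n / (x**n + k**n)

def simulate(x0, n_steps=1000, dt=0.01, noise=0.02, seed=0):
    """Euler-Maruyama integration of the two-gene toy network."""
    rng = np.random.default_rng(seed)
    x = np.empty((n_steps + 1, 2))
    x[0] = x0
    for step in range(n_steps):
        a, b = x[step]
        # Boolean rules A = not B and B = not A, made continuous, with decay.
        drift = np.array([(1.0 - hill(b)) - a,
                          (1.0 - hill(a)) - b])
        x[step + 1] = x[step] + drift * dt + noise * np.sqrt(dt) * rng.normal(size=2)
    return x

# Two slightly different initial states end up in different "fates".
traj_a = simulate([0.6, 0.4], seed=0)
traj_b = simulate([0.4, 0.6], seed=1)

print(traj_a[-1].round(2), traj_b[-1].round(2))
```

The two trajectories leave the neighborhood of the symmetric state in opposite directions, which is a small-scale analogue of the branching time series described above.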
|
||
## Installation <a id="install"></a> | ||
|
||
Command-line use through the wrapper [scripts/scanpy.py](scripts/scanpy.py) | ||
works **without** installation. To get all required packages (all default | ||
in [Anaconda](https://www.continuum.io/downloads)), install | ||
[miniconda](http://conda.pydata.org/miniconda.html) and run `conda install scipy | ||
matplotlib h5py pandas xlrd`. | ||
|
||
You can install Scanpy, and hence `import scanpy` from anywhere on your system, via | ||
```shell | ||
pip install . | ||
``` | ||
Then, `scanpy` becomes a top-level command available from anywhere in the | ||
system; call `scanpy --help` for usage. The package is | ||
[registered](https://pypi.python.org/pypi/scanpy/0.1) in the [Python Packaging | ||
Index](https://pypi.python.org/pypi), but versioning has not started yet. In the | ||
future, installation will be possible without reference to GitHub via | ||
`pip install scanpy`. | ||
|
||
## References <a id="references"></a> | ||
|
||
<a id="ref_amir13"></a> | ||
Amir *et al.* (2013), | ||
*viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia*, | ||
[Nature Biotechnology 31, 545](http://dx.doi.org/10.1038/nbt.2594). | ||
|
||
<a id="ref_angerer15"></a> | ||
Angerer *et al.* (2015), *destiny - diffusion maps for large-scale single-cell | ||
data in R*, [Bioinformatics 32, 1241](http://dx.doi.org/10.1093/bioinformatics/btv715). | ||
|
||
<a id="ref_coifman05"></a> | ||
Coifman *et al.* (2005), *Geometric diffusions as a tool for harmonic analysis and | ||
structure definition of data: Diffusion maps*, [PNAS 102, 7426]( | ||
http://dx.doi.org/10.1073/pnas.0500334102). | ||
|
||
<a id="ref_haghverdi15"></a> | ||
Haghverdi *et al.* (2015), *Diffusion maps for high-dimensional single-cell | ||
analysis of differentiation data*, [Bioinformatics 31, 2989]( | ||
http://dx.doi.org/10.1093/bioinformatics/btv325). | ||
|
||
<a id="ref_haghverdi16"></a> | ||
Haghverdi *et al.* (2016), *Diffusion pseudotime robustly | ||
reconstructs branching cellular lineages*, [Nature Methods 13, 845]( | ||
http://dx.doi.org/10.1038/nmeth.3971). | ||
|
||
<a id="ref_krumsiek10"></a> | ||
Krumsiek *et al.* (2010), | ||
*Odefy - From discrete to continuous models*, | ||
[BMC Bioinformatics 11, 233](http://dx.doi.org/10.1186/1471-2105-11-233). | ||
|
||
<a id="ref_krumsiek11"></a> | ||
Krumsiek *et al.* (2011), | ||
*Hierarchical Differentiation of Myeloid Progenitors Is Encoded in the Transcription Factor Network*, | ||
[PLoS ONE 6, e22649](http://dx.doi.org/10.1371/journal.pone.0022649). | ||
|
||
<a id="ref_moignard15"></a> | ||
Moignard *et al.* (2015), *Decoding the regulatory network of early blood | ||
development from single-cell gene expression measurements*, [Nature Biotechnology 33, | ||
269]( | ||
http://dx.doi.org/10.1038/nbt.3154). | ||
|
||
<a id="ref_paul15"></a> | ||
Paul *et al.* (2015), *Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors*, [Cell 163, | ||
1663]( | ||
http://dx.doi.org/10.1016/j.cell.2015.11.013). | ||
|
||
<a id="ref_vandermaaten08"></a> | ||
van der Maaten & Hinton (2008), *Visualizing data using t-SNE*, [JMLR 9, 2579]( | ||
http://www.jmlr.org/papers/v9/vandermaaten08a.html). | ||
|
||
<a id="ref_wittmann09"></a> | ||
Wittmann *et al.* (2009), *Transforming Boolean models to | ||
continuous models: methodology and application to T-cell receptor signaling*, | ||
[BMC Systems Biology 3, 98]( | ||
http://dx.doi.org/10.1186/1752-0509-3-98). | ||
|