Sphinx documentation (broadinstitute#20)
* added license
* moved Dockerfile to a docker directory
* basic Sphinx doc structure
* basic documentation
* added argparse to sphinx doc
* some refactoring of command_line.py
mbabadi committed Jul 21, 2019
1 parent d562279 commit e606931
Showing 23 changed files with 462 additions and 102 deletions.
13 changes: 13 additions & 0 deletions .gitignore
@@ -0,0 +1,13 @@
.DS_Store
*.pyc
cellbender.egg-info/
*.ipynb_checkpoints
.coverage
.coverage.*
.mypy_cache/
.pytest_cache/
build/
dist/
.venv/
.eggs/
.idea/
26 changes: 26 additions & 0 deletions LICENSE
@@ -0,0 +1,26 @@
Copyright (c) 2019, Broad Institute, Inc. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name Broad Institute, Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
11 changes: 11 additions & 0 deletions REQUIREMENTS.txt
@@ -0,0 +1,11 @@
numpy
scipy
tables
pandas
pyro-ppl>=0.3.2
scikit-learn
matplotlib
sphinx-rtd-theme
sphinx-autodoc-typehints
sphinx-argparse
sphinxcontrib-programoutput
57 changes: 35 additions & 22 deletions cellbender/command_line.py
@@ -7,11 +7,12 @@
import sys
import argparse
from abc import ABC, abstractmethod
from typing import Dict
import importlib


# New tools should be added to this list.
TOOL_LIST = ['remove-background']
TOOL_NAME_LIST = ['remove-background']


class AbstractCLI(ABC):
@@ -29,12 +30,12 @@ def get_name(self) -> str:
pass

@abstractmethod
def add_subparser_args(self, parser: argparse) -> argparse:
def add_subparser_args(self, parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
"""Add tool-specific arguments, returning a parser."""
pass

@abstractmethod
def validate_args(self, parser: argparse):
def validate_args(self, parser: argparse.ArgumentParser):
"""Do tool-specific argument validation, returning args."""
pass

@@ -44,14 +45,25 @@ def run(self, args):
pass


def main():
"""Parse command-line arguments and run specified tool.
def generate_cli_dictionary() -> Dict[str, AbstractCLI]:
# Add the tool-specific arguments using sub-parsers.
cli_dict = dict(keys=TOOL_NAME_LIST)
for tool_name in TOOL_NAME_LIST:
# Note: tool name contains a dash, while folder name uses an underscore.
# Generate the name of the module that contains the tool.
module_str_list = ["cellbender", tool_name.replace("-", "_"), "command_line"]

Note: Does not take explicit input arguments, but uses sys.argv inputs
from the command line.
# Import the module.
module = importlib.import_module('.'.join(module_str_list))

# Note: the module must have a file named command_line.py in the main
# directory, containing a class named CLI, which implements AbstractCLI.
cli_dict[tool_name] = module.CLI()

return cli_dict

"""

def get_populated_argparser() -> argparse.ArgumentParser:
# Set up argument parser.
parser = argparse.ArgumentParser(
prog="cellbender",
@@ -64,32 +76,33 @@ def main():
description="valid cellbender commands",
dest="tool")

# Add the tool-specific arguments using sub-parsers.
cli = dict(keys=TOOL_LIST)
for tool in TOOL_LIST:
cli_dict = generate_cli_dictionary()
for tool_name in TOOL_NAME_LIST:
subparsers = cli_dict[tool_name].add_subparser_args(subparsers)

# Note: tool name contains a dash, while folder name uses an underscore.
return parser

# Generate the name of the module that contains the tool.
module_str_list = ["cellbender", tool.replace("-", "_"), "command_line"]

# Import the module.
module = importlib.import_module('.'.join(module_str_list))
def main():
"""Parse command-line arguments and run specified tool.
# Note: the module must have a file named command_line.py in the main
# directory, containing a class named CLI, which implements AbstractCLI.
cli[tool] = module.CLI()
subparsers = cli[tool].add_subparser_args(subparsers)
Note: Does not take explicit input arguments, but uses sys.argv inputs
from the command line.
"""

parser = get_populated_argparser()
cli_dict = generate_cli_dictionary()

# Parse arguments.
if len(sys.argv) > 1:
args = parser.parse_args(sys.argv[1:])

# Validate arguments.
args = cli[args.tool].validate_args(args)
args = cli_dict[args.tool].validate_args(args)

# Run the tool.
cli[args.tool].run(args)
cli_dict[args.tool].run(args)

else:

File renamed without changes.
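For readers following the `cellbender/command_line.py` refactor above, the sketch below shows the same dispatch flow in a self-contained form. It is a minimal illustration under stated assumptions, not the repository's code: `DemoCLI`, `module_name_for`, and the `argv` parameter of `main` are hypothetical names introduced here, and the tool classes are built in-process instead of being loaded with `importlib.import_module`. Note that `dict(keys=TOOL_NAME_LIST)`, as written in the diff, produces a dictionary holding a single `'keys'` entry, so the sketch uses a dict comprehension instead.

```python
import argparse
from abc import ABC, abstractmethod
from typing import Dict, List

TOOL_NAME_LIST = ['remove-background']


def module_name_for(tool_name: str) -> str:
    """Map a dashed tool name to the dotted module path used in the diff."""
    return '.'.join(["cellbender", tool_name.replace("-", "_"), "command_line"])


class AbstractCLI(ABC):
    """Same four hooks as the abstract base class in the diff."""

    @abstractmethod
    def get_name(self) -> str: ...

    @abstractmethod
    def add_subparser_args(self, subparsers): ...

    @abstractmethod
    def validate_args(self, args): ...

    @abstractmethod
    def run(self, args): ...


class DemoCLI(AbstractCLI):
    """Hypothetical stand-in for a tool's CLI class (not CellBender code)."""

    def get_name(self) -> str:
        return 'remove-background'

    def add_subparser_args(self, subparsers):
        sub = subparsers.add_parser(self.get_name(), help="demo tool")
        sub.add_argument("--input", required=True)
        return subparsers

    def validate_args(self, args):
        return args

    def run(self, args):
        return f"ran {args.tool} on {args.input}"


def generate_cli_dictionary() -> Dict[str, AbstractCLI]:
    # The real code imports module_name_for(name) and instantiates module.CLI();
    # here a demo class is used so the sketch runs without CellBender installed.
    return {name: DemoCLI() for name in TOOL_NAME_LIST}


def get_populated_argparser(cli_dict: Dict[str, AbstractCLI]) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cellbender")
    subparsers = parser.add_subparsers(title="sub-commands", dest="tool")
    for name in TOOL_NAME_LIST:
        subparsers = cli_dict[name].add_subparser_args(subparsers)
    return parser


def main(argv: List[str]):
    cli_dict = generate_cli_dictionary()
    parser = get_populated_argparser(cli_dict)
    args = parser.parse_args(argv)
    args = cli_dict[args.tool].validate_args(args)
    return cli_dict[args.tool].run(args)


print(main(["remove-background", "--input", "demo"]))
```

Splitting parser construction from `main` is what lets a documentation tool such as `sphinx-argparse` obtain the populated parser without executing a tool.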
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
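Both `docs/Makefile` and `docs/make.bat` above forward the requested target to `sphinx-build -M <target> source build`. As a rough sketch of that forwarding, the helper below assembles the equivalent command line; the function name `sphinx_make_mode_command` and its parameters are hypothetical and exist only for illustration.

```python
import shlex
from typing import List, Sequence


def sphinx_make_mode_command(
    target: str,
    sphinxbuild: str = "sphinx-build",
    sourcedir: str = "source",
    builddir: str = "build",
    sphinxopts: Sequence[str] = (),
) -> List[str]:
    """Build the argument list that the Makefile's catch-all rule would run."""
    return [sphinxbuild, "-M", target, sourcedir, builddir, *sphinxopts]


# Running `make html` inside docs/ corresponds to this command:
print(shlex.join(sphinx_make_mode_command("html")))
```

The resulting list can be passed to `subprocess.run` from the `docs/` directory when `make` is not available.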
Binary file added docs/source/_static/design/favicon.ico
Binary file not shown.
Binary file added docs/source/_static/design/logo_1024_1024.png
8 changes: 8 additions & 0 deletions docs/source/citation/index.rst
@@ -0,0 +1,8 @@
.. _citation:

Citation
========

If you use CellBender in your research (and we hope you will!), please consider citing our papers. The relevant
papers to cite will be published here soon.

65 changes: 65 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,65 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# http://www.sphinx-doc.org/en/master/config

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
dir_, _ = os.path.split(__file__)
root_dir = os.path.abspath(os.path.join(dir_, '..', '..'))
sys.path.insert(0, root_dir)

# -- Project information -----------------------------------------------------

project = 'CellBender'
copyright = '2019, Data Sciences Platform (DSP), Broad Institute'
author = 'Stephen Fleming, Mehrtash Babadi'

# The full version, including alpha/beta/rc tags
version = ''
release = ''

# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',
    'sphinx.ext.mathjax',
    'sphinx.ext.viewcode',
    'sphinxarg.ext',
    'sphinx_autodoc_typehints',
    'sphinxcontrib.programoutput',
    'sphinx.ext.intersphinx'
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []

# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_favicon = '_static/design/favicon.ico'
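The path setup at the top of `conf.py` is what lets `autodoc` import the `cellbender` package: it walks two directories up from `docs/source/` and prepends the result to `sys.path`. The snippet below replays that computation on a made-up path; `repo_root_from_conf` and the example location are assumptions for illustration only.

```python
import os


def repo_root_from_conf(conf_path: str) -> str:
    """Mirror the conf.py logic: go two levels up from the file's directory."""
    dir_, _ = os.path.split(conf_path)
    return os.path.abspath(os.path.join(dir_, '..', '..'))


# Hypothetical checkout location, used only to show the resolution.
example = os.path.join(os.sep, "work", "CellBender", "docs", "source", "conf.py")
print(repo_root_from_conf(example))
```

If `__file__` is a bare relative name such as `conf.py`, `os.path.split` returns an empty directory and `os.path.abspath` resolves `../..` against the current working directory.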

12 changes: 12 additions & 0 deletions docs/source/contributing/index.rst
@@ -0,0 +1,12 @@
.. _contributing:

Contributing
============

We aspire to make CellBender an easy-to-use, robust, and accurate software package for the bioinformatics community.
While we test and improve CellBender together with our research collaborators, your feedback is
invaluable to us and allows us to steer CellBender in the direction that you find most useful in your research. If you
have an interesting idea or suggestion, please do not hesitate to reach out to us.

If you encounter a bug, please file a detailed GitHub `issue <https://github.com/broadinstitute/CellBender/issues>`_
and we will get back to you as soon as possible.
11 changes: 11 additions & 0 deletions docs/source/getting_started/index.rst
@@ -0,0 +1,11 @@
.. _getting started:

Getting Started
===============

This section contains quick start tutorials for different CellBender modules.

.. toctree::
   :maxdepth: 1

   remove_background/index
53 changes: 53 additions & 0 deletions docs/source/getting_started/remove_background/index.rst
@@ -0,0 +1,53 @@
.. _remove background tutorial:

remove-background
=================

In this tutorial, we will run ``remove-background`` on a small dataset derived from the 10x Genomics
``pbmc4k`` scRNA-seq dataset (v2 Chemistry, CellRanger 2.1.0).

As a first step, we download the full dataset and generate a smaller `trimmed` copy by selecting 500 barcodes
with high UMI count (likely non-empty) and an additional 50,000 barcodes with low UMI count (likely empty). Note
that the trimming step is performed to allow us to go through this tutorial in a matter of minutes on a
typical personal computer. Processing the full untrimmed dataset requires a CUDA-enabled GPU (e.g. NVIDIA Tesla K80)
and takes about 30 minutes to finish.

We have created a Python script to download and trim the dataset. Navigate to ``examples/remove_background/``
under your CellBender installation root directory and run the following command in the console:

.. code-block:: bash

   $ python generate_tiny_10x_pbmc.py

After successful completion of the script, you should have a new directory named ``tiny_raw_gene_bc_matrices``
containing ``GRCh38/matrix.mtx``, ``GRCh38/genes.tsv``, and ``GRCh38/barcodes.tsv``.

We proceed to run ``remove-background`` on the trimmed dataset using the following command:

.. code-block:: bash

   $ cellbender remove-background \
        --input ./tiny_raw_gene_bc_matrices/GRCh38 \
        --output ./tiny_10x_pbmc.h5 \
        --expected-cells 500 \
        --total-droplets-included 5000

The computation will finish within a minute or two (after ~150 epochs). The tool outputs the following files:

* ``tiny_10x_pbmc.h5``: An HDF5 file containing a detailed output of the inference procedure, including the
  normalized abundance of ambient transcripts, contamination fraction of each droplet, a low-dimensional
  embedding of the background-corrected gene expression, and the background-corrected counts matrix (in CSC sparse
  format). Please refer to the full documentation for a detailed description of these and other fields.

* ``tiny_10x_pbmc_filtered.h5``: Same as above, but including only droplets with a posterior cell probability
  exceeding 0.5.

* ``tiny_10x_pbmc_cell_barcodes.csv``: The list of barcodes with a posterior cell probability exceeding 0.5.

* ``tiny_10x_pbmc.pdf``: A PDF summary of the results showing (1) the evolution of the loss function during training,
  (2) a rank-ordered total UMI plot along with posterior cell probabilities, and (3) a two-dimensional PCA
  scatter plot of the latent embedding of the expressions in cell-containing droplets. Notice the rapid drop in
  the cell probability after UMI rank ~ 500.

Finally, try running the tool with ``--expected-cells 100`` and ``--expected-cells 1000``. You should find that
the output remains virtually the same.
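To compare runs with different ``--expected-cells`` values, one simple check is to count the barcodes written to ``tiny_10x_pbmc_cell_barcodes.csv``. The sketch below assumes the file holds one barcode per row in the first column with no header line; that layout is an assumption and is not confirmed by the tutorial text, and `read_cell_barcodes` is a hypothetical helper.

```python
import csv
import os
from typing import List


def read_cell_barcodes(csv_path: str) -> List[str]:
    """Return the first column of every non-empty row (assumed: no header)."""
    with open(csv_path, newline="") as handle:
        return [row[0] for row in csv.reader(handle) if row]


# Only runs when the tutorial output is present in the working directory.
path = "tiny_10x_pbmc_cell_barcodes.csv"
if os.path.exists(path):
    print(f"{len(read_cell_barcodes(path))} barcodes called as cells")
```

Comparing the sets returned for two runs (for example with `set(a) ^ set(b)`) shows which barcodes, if any, changed their call.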
10 changes: 10 additions & 0 deletions docs/source/help_and_reference/index.rst
@@ -0,0 +1,10 @@
.. _help and reference:

Help & Reference
================

.. toctree::
   :maxdepth: 1

   remove_background/index
   license/index
7 changes: 7 additions & 0 deletions docs/source/help_and_reference/license/index.rst
@@ -0,0 +1,7 @@
.. _license:

License
=======

.. literalinclude:: ../../../../LICENSE
   :language: text