Merge pull request #53 from rgommers/sphinx-site

Add content for a Sphinx site specifically for the protocol
data-apis · Sep 2, 2021 · 27b8e1c
2 parents 8498cf1 + 070b9cf
commit 27b8e1c

Showing 9 changed files with 636 additions and 174 deletions.
diff --git a/protocol/API.md b/protocol/API.md
@@ -0,0 +1,72 @@
+# API of the `__dataframe__` protocol
+
+Specification for objects to be accessed, for the purpose of dataframe
+interchange between libraries, via the `__dataframe__` method on a library's
+data frame object.
+
+For guiding requirements, see {ref}`design-requirements`.
+
+
+## Concepts in this design
+
+1. A `Buffer` class. A *buffer* is a contiguous block of memory - this is the
+   only thing that actually maps to a 1-D array in the sense that it could be
+   converted to NumPy, CuPy, et al.
+2. A `Column` class. A *column* has a single dtype. It can consist
+   of multiple *chunks*. A single chunk of a column (which may be the whole
+   column if ``num_chunks == 1``) is modeled as again a `Column` instance, and
+   contains one data *buffer* and (optionally) one *mask* for missing data.
+3. A `DataFrame` class. A *data frame* is an ordered collection of *columns*,
+   which are identified with names that are unique strings.  All the data
+   frame's rows are the same length. It can consist of multiple *chunks*. A
+   single chunk of a data frame is modeled as again a `DataFrame` instance.
+4. A *mask* concept. A *mask* of a single-chunk column is a *buffer*.
+5. A *chunk* concept. A *chunk* is a sub-dividing element that can be applied
+   to a *data frame* or a *column*.
+
+Note that the only way to access these objects is through a call to
+`__dataframe__` on a data frame object. This is NOT meant as public API;
+instances of the classes described here should only be thought of as describing
+the API of the object returned by `__dataframe__`. They are the concepts needed
+to capture the memory layout and data access of a data frame.
+
+
+## Design decisions
+
+1. Use a separate column abstraction in addition to a dataframe interface.
+
+   Rationales:
+
+   - This is how it works in R, Julia and Apache Arrow.
+   - Semantically, most existing applications and users treat a column like a 1-D array.
+   - We should be able to connect a column to the array data interchange mechanism(s).
+
+   Note that this does not imply a library must have such a public user-facing
+   abstraction (e.g. ``pandas.Series``); it can only be accessed via
+   ``__dataframe__``.
+
+2. Use methods and properties on an opaque object rather than returning
+   hierarchical dictionaries describing memory.
+
+   This is better for implementations that may rely on, for example, lazy
+   computation.
+
+3. No row names. If a library uses row names, use a regular column for them.
+
+   See discussion at
+   [wesm/dataframe-protocol/pull/1](https://github.com/wesm/dataframe-protocol/pull/1/files#r394316241).
+   Optional row names are not a good idea, because people will assume they're
+   present (see the cuDF experience: it was forced to add them because pandas has them).
+   Requiring row names seems worse than leaving them out. Note that row labels
+   could be added in the future; right now there are no clear requirements for
+   more complex row labels that cannot be represented by a single column. These
+   do exist, for example Modin has table and tree-based row labels.
+
+## Interface
+
+
+
+```{literalinclude} dataframe_protocol.py
+---
+language: python
+---
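The concepts listed in `API.md` above (buffer, column, data frame, mask, chunk) can be made concrete with a toy, pure-Python model. This is a minimal sketch under stated assumptions, not the protocol's interface (which is defined in `dataframe_protocol.py`): the classes `ToyBuffer`, `ToyColumn` and `ToyDataFrame` and all of their methods other than `num_chunks` are hypothetical names, a buffer is assumed to be backed by `array.array`, and a mask is assumed to hold one byte per element with 1 meaning "missing".

```python
from array import array
from typing import Dict, List, Optional

# Hypothetical toy classes illustrating the concepts; NOT the protocol's API.


class ToyBuffer:
    """A contiguous block of memory (here backed by array.array)."""

    def __init__(self, data: array) -> None:
        self._data = data

    def as_memoryview(self) -> memoryview:
        # Contiguous 1-D view; NumPy etc. can consume it via the buffer protocol.
        return memoryview(self._data)


class ToyColumn:
    """A single-dtype column, optionally split into chunks."""

    def __init__(self, chunks: List[array],
                 masks: Optional[List[Optional[array]]] = None) -> None:
        self._chunks = chunks
        # Assumption: a mask holds one byte per element, 1 meaning "missing".
        self._masks = masks if masks is not None else [None] * len(chunks)

    def num_chunks(self) -> int:
        return len(self._chunks)

    def get_chunks(self) -> List["ToyColumn"]:
        # Every chunk is modeled as a single-chunk ToyColumn again.
        return [ToyColumn([c], [m]) for c, m in zip(self._chunks, self._masks)]

    def get_data_buffer(self) -> ToyBuffer:
        # A single-chunk column has exactly one data buffer.
        if self.num_chunks() != 1:
            raise ValueError("iterate over get_chunks() first")
        return ToyBuffer(self._chunks[0])

    def get_mask(self) -> Optional[ToyBuffer]:
        # ... and at most one mask buffer for missing data.
        if self.num_chunks() != 1:
            raise ValueError("iterate over get_chunks() first")
        mask = self._masks[0]
        return None if mask is None else ToyBuffer(mask)


class ToyDataFrame:
    """An ordered collection of uniquely named, equal-length columns."""

    def __init__(self, columns: Dict[str, ToyColumn]) -> None:
        self._columns = dict(columns)  # dict keys: unique and insertion-ordered

    def column_names(self) -> List[str]:
        return list(self._columns)

    def get_column_by_name(self, name: str) -> ToyColumn:
        return self._columns[name]

    def num_chunks(self) -> int:
        # Assumption: at least one column, and all columns chunked identically.
        return next(iter(self._columns.values())).num_chunks()

    def get_chunks(self) -> List["ToyDataFrame"]:
        # Every chunk of a data frame is again a ToyDataFrame.
        split = {name: col.get_chunks() for name, col in self._columns.items()}
        return [ToyDataFrame({name: parts[i] for name, parts in split.items()})
                for i in range(self.num_chunks())]


# Usage: column "x" stored as two chunks; the mask flags its second value as missing.
col = ToyColumn([array("d", [1.0, 2.0]), array("d", [3.0])],
                masks=[array("b", [0, 1]), None])
df = ToyDataFrame({"x": col})
first = df.get_chunks()[0].get_column_by_name("x")
print(first.get_data_buffer().as_memoryview().tolist())  # [1.0, 2.0]
print(first.get_mask().as_memoryview().tolist())          # [0, 1]
```

The recursive modeling (a chunk of a column is a column, a chunk of a data frame is a data frame) means a consumer can use the same code path for chunked and unchunked data: iterate over `get_chunks()` and handle each single-chunk piece.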
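`API.md` states that the interchange objects are reachable only through a call to `__dataframe__` on a library's data frame, and are not public API. A minimal producer/consumer sketch of that split follows. All names (`ProducerFrame`, `_InterchangeFrame`, `column_names`, `get_column_values`, `consume`) are hypothetical, and this `__dataframe__` takes no arguments; the actual method signature is the one specified in `dataframe_protocol.py`.

```python
from typing import Dict, List

# Hypothetical sketch of the producer/consumer split; names are illustrative.


class _InterchangeFrame:
    """Interchange object; only reachable through __dataframe__."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = data

    def column_names(self) -> List[str]:
        return list(self._data)

    def get_column_values(self, name: str) -> List[float]:
        # A real implementation would hand out buffers, not Python lists.
        return self._data[name]


class ProducerFrame:
    """Stand-in for a producing library's public data frame class."""

    def __init__(self, **columns: List[float]) -> None:
        self._columns = columns

    def __dataframe__(self) -> _InterchangeFrame:
        # The single entry point; _InterchangeFrame itself is not public API.
        return _InterchangeFrame(self._columns)


def consume(obj: object) -> Dict[str, List[float]]:
    """Consumer side: accept any object that implements __dataframe__."""
    if not hasattr(obj, "__dataframe__"):
        raise TypeError(f"{type(obj).__name__} does not support the protocol")
    frame = obj.__dataframe__()
    # Copy into the consumer's own native representation (a dict of lists).
    return {name: list(frame.get_column_values(name))
            for name in frame.column_names()}


print(consume(ProducerFrame(a=[1.0, 2.0], b=[3.0, 4.0])))
# {'a': [1.0, 2.0], 'b': [3.0, 4.0]}
```

The consumer never imports anything from the producing library; it relies only on the presence of `__dataframe__`, which is what lets unrelated libraries exchange data.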
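Design decision 2 says methods on an opaque object suit lazy implementations better than hierarchical dictionaries describing memory. A minimal sketch of why, assuming a producer that defers its work until asked: the dictionary must be fully populated when it is returned, while a method can trigger (and cache) the computation on first use. The names `LazyColumn`, `size`, `null_count` and `eager_description` are hypothetical, and `None` is assumed to mark a missing value.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical sketch: class and method names are illustrative only.


class LazyColumn:
    """Opaque column whose values are produced only when a method needs them."""

    def __init__(self, compute: Callable[[], List[Optional[float]]]) -> None:
        self._compute = compute  # deferred work, e.g. a pending query
        self._values: Optional[List[Optional[float]]] = None
        self.evaluations = 0     # counts how often the deferred work ran

    def _materialize(self) -> List[Optional[float]]:
        # Run the deferred computation once and cache the result.
        if self._values is None:
            self.evaluations += 1
            self._values = self._compute()
        return self._values

    def size(self) -> int:
        return len(self._materialize())

    def null_count(self) -> int:
        # None marks a missing value in this toy representation.
        return sum(v is None for v in self._materialize())


def eager_description(values: List[Optional[float]]) -> Dict[str, int]:
    # A dictionary-based description must have every field filled in up front,
    # which forces a lazy producer to compute everything immediately.
    return {"size": len(values), "null_count": sum(v is None for v in values)}


col = LazyColumn(lambda: [1.0, None, 3.0])
print(col.evaluations)               # 0: nothing has been computed yet
print(col.null_count(), col.size())  # 1 3
print(col.evaluations)               # 1: computed once, then cached
```

`eager_description` cannot even be called until the values exist, whereas `LazyColumn` postpones the work until a consumer actually asks for `size()` or `null_count()`.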
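Design decision 3 asks libraries that use row names to put them in a regular column. A minimal sketch of that conversion on a plain dict of columns follows, similar in spirit to what pandas' `DataFrame.reset_index()` does. The function `row_labels_to_column` and the default column name `"index"` are hypothetical choices; how a consumer learns which column holds the labels is outside the scope of this example.

```python
from typing import Dict, List, Sequence

# Hypothetical helper: row labels become an ordinary, first column.


def row_labels_to_column(columns: Dict[str, List], row_labels: Sequence,
                         label_column: str = "index") -> Dict[str, List]:
    """Return a new column mapping with the row labels stored as a column."""
    if label_column in columns:
        raise ValueError(f"column name {label_column!r} already in use")
    n_rows = len(row_labels)
    if any(len(values) != n_rows for values in columns.values()):
        raise ValueError("all columns must have one value per row label")
    # Labels first, then the original columns in their original order.
    return {label_column: list(row_labels), **columns}


print(row_labels_to_column({"price": [9.5, 7.25]}, ["apple", "pear"]))
# {'index': ['apple', 'pear'], 'price': [9.5, 7.25]}
```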
diff --git a/protocol/Makefile b/protocol/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = .
+BUILDDIR      = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
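Because of the catch-all `%:` rule, every `make <target>` in the Makefile above becomes a single `sphinx-build -M <target>` call (plain `make` runs the first rule, `help`, which issues the same kind of command). The Python sketch below reconstructs that command line from the Makefile's variables; `sphinx_command` is a hypothetical helper, not part of Sphinx, and it only builds the argument list without running anything.

```python
import shlex
from typing import List

# Defaults from the Makefile. SPHINXOPTS and SPHINXBUILD are assigned with ?=,
# so they can be set from the environment; any variable can be set on the
# make command line.
SOURCEDIR = "."
BUILDDIR = "_build"


def sphinx_command(target: str, sphinxbuild: str = "sphinx-build",
                   sphinxopts: str = "", o: str = "") -> List[str]:
    """Return the argv that `make <target>` runs for this Makefile (hypothetical)."""
    # Plain `make` runs the first rule, `help`; any other target falls through
    # to the catch-all `%:` rule, which forwards it to sphinx-build's -M mode.
    target = target or "help"
    # The recipe leaves $(SPHINXBUILD), $(SPHINXOPTS) and $(O) unquoted, so the
    # shell splits them into words; shlex.split approximates that splitting.
    return [*shlex.split(sphinxbuild), "-M", target, SOURCEDIR, BUILDDIR,
            *shlex.split(sphinxopts), *shlex.split(o)]


print(sphinx_command("html"))
# ['sphinx-build', '-M', 'html', '.', '_build']
print(sphinx_command("html", sphinxopts="-W --keep-going"))
# ['sphinx-build', '-M', 'html', '.', '_build', '-W', '--keep-going']
```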
diff --git a/...col/images/dataframe_conceptual_model.png → ...tic/images/dataframe_conceptual_model.png b/...col/images/dataframe_conceptual_model.png → ...tic/images/dataframe_conceptual_model.png
diff --git a/protocol/conf.py b/protocol/conf.py
@@ -0,0 +1,146 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+
+import sphinx_material
+
+# -- Project information -----------------------------------------------------
+
+project = 'Python dataframe interchange protocol'
+copyright = '2021, Consortium for Python Data API Standards'
+author = 'Consortium for Python Data API Standards'
+
+# The full version, including alpha/beta/rc tags
+release = '2021-DRAFT'
+
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+    'myst_parser',
+    'sphinx.ext.extlinks',
+    'sphinx.ext.intersphinx',
+    'sphinx.ext.todo',
+    'sphinx_markdown_tables',
+    'sphinx_copybutton',
+]
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
+
+# MyST options
+myst_heading_anchors = 3
+myst_enable_extensions = ["colon_fence"]
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+#
+extensions.append("sphinx_material")
+html_theme_path = sphinx_material.html_theme_path()
+html_context = sphinx_material.get_html_context()
+html_theme = 'sphinx_material'
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+
+# -- Material theme options (see theme.conf for more information) ------------
+html_show_sourcelink = False
+html_sidebars = {
+    "**": ["logo-text.html", "globaltoc.html", "localtoc.html", "searchbox.html"]
+}
+
+html_theme_options = {
+
+    # Set the name of the project to appear in the navigation.
+    'nav_title': 'Python dataframe interchange protocol',
+
+    # Set your GA account ID to enable tracking
+    #'google_analytics_account': 'UA-XXXXX',
+
+    # Specify a base_url used to generate sitemap.xml. If not
+    # specified, then no sitemap will be built.
+    #'base_url': 'https://project.github.io/project',
+
+    # Set the color and the accent color (see
+    # https://material.io/design/color/the-color-system.html)
+    'color_primary': 'indigo',
+    'color_accent': 'green',
+
+    # Set the repo location to get a badge with stats
+    #'repo_url': 'https://github.com/project/project/',
+    #'repo_name': 'Project',
+
+    "html_minify": False,
+    "html_prettify": True,
+    "css_minify": True,
+    "logo_icon": "&#xe869",
+    "repo_type": "github",
+    "touch_icon": "images/apple-icon-152x152.png",
+    "theme_color": "#2196f3",
+    "master_doc": False,
+
+    # Visible levels of the global TOC; -1 means unlimited
+    'globaltoc_depth': 2,
+    # If False, expand all TOC entries
+    'globaltoc_collapse': True,
+    # If True, show hidden TOC entries
+    'globaltoc_includehidden': True,
+
+    "nav_links": [
+        {"href": "index", "internal": True, "title": "Dataframe interchange protocol"},
+        {
+            "href": "https://data-apis.org",
+            "internal": False,
+            "title": "Consortium for Python Data API Standards",
+        },
+    ],
+    "heroes": {
+        "index": "A protocol for zero-copy data interchange between Python dataframe libraries",
+        #"customization": "Configuration options to personalize your site.",
+    },
+
+    #"version_dropdown": True,
+    #"version_json": "_static/versions.json",
+    "table_classes": ["plain"],
+}
+
+
+todo_include_todos = True
+#html_favicon = "images/favicon.ico"
+
+html_use_index = True
+html_domain_indices = True
+
+extlinks = {
+    "duref": (
+        "http://docutils.sourceforge.net/docs/ref/rst/" "restructuredtext.html#%s",
+        "",
+    ),
+    "durole": ("http://docutils.sourceforge.net/docs/ref/rst/" "roles.html#%s", ""),
+    "dudir": ("http://docutils.sourceforge.net/docs/ref/rst/" "directives.html#%s", ""),
+}
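With `sphinx.ext.extlinks`, a role such as `` :dudir:`code` `` produces a link whose URL is the role's pattern with the role target substituted for `%s`. Note that the two adjacent string literals in each pattern in `conf.py` are concatenated by Python into a single URL pattern. The sketch below reproduces only that substitution step; `expand_extlink` is a hypothetical helper, and it deliberately ignores the caption element (whose interpretation differs between Sphinx versions) and explicit-title syntax.

```python
from typing import Dict, Tuple

# Same mapping as in conf.py: role name -> (URL pattern, caption).
EXTLINKS: Dict[str, Tuple[str, str]] = {
    "duref": ("http://docutils.sourceforge.net/docs/ref/rst/" "restructuredtext.html#%s", ""),
    "durole": ("http://docutils.sourceforge.net/docs/ref/rst/" "roles.html#%s", ""),
    "dudir": ("http://docutils.sourceforge.net/docs/ref/rst/" "directives.html#%s", ""),
}


def expand_extlink(role: str, target: str) -> str:
    """Return the URL a role like :dudir:`code` points to (hypothetical helper)."""
    pattern, _caption = EXTLINKS[role]  # caption ignored in this sketch
    return pattern % target             # substitute the target for %s


print(expand_extlink("dudir", "code"))
# http://docutils.sourceforge.net/docs/ref/rst/directives.html#code
```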
diff --git a/protocol/dataframe_protocol.py b/protocol/dataframe_protocol.py
@@ -1,70 +1,3 @@
-"""
-Specification for objects to be accessed, for the purpose of dataframe
-interchange between libraries, via the ``__dataframe__`` method on a libraries'
-data frame object.
-
-For guiding requirements, see https://github.com/data-apis/dataframe-api/pull/35
-
-
-Concepts in this design
------------------------
-
-1. A `Buffer` class. A *buffer* is a contiguous block of memory - this is the
-  only thing that actually maps to a 1-D array in a sense that it could be
-  converted to NumPy, CuPy, et al.
-2. A `Column` class. A *column* has a single dtype. It can consist
-   of multiple *chunks*. A single chunk of a column (which may be the whole
-   column if ``num_chunks == 1``) is modeled as again a `Column` instance, and
-   contains 1 data *buffer* and (optionally) one *mask* for missing data.
-3. A `DataFrame` class. A *data frame* is an ordered collection of *columns*,
-   which are identified with names that are unique strings.  All the data
-   frame's rows are the same length. It can consist of multiple *chunks*. A
-   single chunk of a data frame is modeled as again a `DataFrame` instance.
-4. A *mask* concept. A *mask* of a single-chunk column is a *buffer*.
-5. A *chunk* concept. A *chunk* is a sub-dividing element that can be applied
-   to a *data frame* or a *column*.
-
-Note that the only way to access these objects is through a call to
-``__dataframe__`` on a data frame object. This is NOT meant as public API;
-only think of instances of the different classes here to describe the API of
-what is returned by a call to ``__dataframe__``. They are the concepts needed
-to capture the memory layout and data access of a data frame.
-
-
-Design decisions
-----------------
-
-**1. Use a separate column abstraction in addition to a dataframe interface.**
-
-Rationales:
-- This is how it works in R, Julia and Apache Arrow.
-- Semantically most existing applications and users treat a column similar to a 1-D array
-- We should be able to connect a column to the array data interchange mechanism(s)
-
-Note that this does not imply a library must have such a public user-facing
-abstraction (ex. ``pandas.Series``) - it can only be accessed via ``__dataframe__``.
-
-**2. Use methods and properties on an opaque object rather than returning
-hierarchical dictionaries describing memory**
-
-This is better for implementations that may rely on, for example, lazy
-computation.
-
-**3. No row names. If a library uses row names, use a regular column for them.**
-
-See discussion at https://github.com/wesm/dataframe-protocol/pull/1/files#r394316241
-Optional row names are not a good idea, because people will assume they're present
-(see cuDF experience, forced to add because pandas has them).
-Requiring row names seems worse than leaving them out.
-
-Note that row labels could be added in the future - right now there's no clear
-requirements for more complex row labels that cannot be represented by a single
-column. These do exist, for example Modin has has table and tree-based row
-labels.
-
-"""
-
-
 class Buffer:
     """
     Data in the buffer is guaranteed to be contiguous in memory.