Merge pull request #53 from rgommers/sphinx-site
Add content for a Sphinx site specifically for the protocol
Showing 9 changed files with 636 additions and 174 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
# API of the `__dataframe__` protocol | ||
|
||
Specification for objects to be accessed, for the purpose of dataframe | ||
interchange between libraries, via the `__dataframe__` method on a library's | ||
data frame object. | ||
|
||
For guiding requirements, see {ref}`design-requirements`. | ||
|
||
|
||
## Concepts in this design | ||
|
||
1. A `Buffer` class. A *buffer* is a contiguous block of memory - this is the | ||
only thing that actually maps to a 1-D array in the sense that it could be | ||
converted to NumPy, CuPy, et al. | ||
2. A `Column` class. A *column* has a single dtype. It can consist | ||
of multiple *chunks*. A single chunk of a column (which may be the whole | ||
column if ``num_chunks == 1``) is again modeled as a `Column` instance, and | ||
contains one data *buffer* and (optionally) one *mask* for missing data. | ||
3. A `DataFrame` class. A *data frame* is an ordered collection of *columns*, | ||
which are identified with names that are unique strings. All the data | ||
frame's rows are the same length. It can consist of multiple *chunks*. A | ||
single chunk of a data frame is again modeled as a `DataFrame` instance. | ||
4. A *mask* concept. A *mask* of a single-chunk column is a *buffer*. | ||
5. A *chunk* concept. A *chunk* is a sub-dividing element that can be applied | ||
to a *data frame* or a *column*. | ||
|
||
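The relationships between these concepts can be sketched in plain Python. This is a minimal illustration only, not the protocol's interface: the field names (`data`, `mask`, `chunks`, `length`), the `get_chunks` and `column_names` helpers, and the use of `bytes` as backing memory are all assumptions made for the example. Only `num_chunks` is taken from the text above. The equal-length check is one reading of the "same length" requirement.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Buffer:
    """A contiguous block of memory (hypothetically backed by bytes here)."""
    data: bytes


@dataclass
class Column:
    """A single-dtype column; a chunk of a column is itself a Column."""
    dtype: str
    length: int
    data: Optional[Buffer] = None   # the one data buffer of a single chunk
    mask: Optional[Buffer] = None   # optional mask buffer for missing data
    chunks: List["Column"] = field(default_factory=list)

    @property
    def num_chunks(self) -> int:
        # A column that was never subdivided counts as its own single chunk.
        return len(self.chunks) or 1

    def get_chunks(self) -> List["Column"]:
        # Hypothetical helper: return the sub-columns, or the column itself.
        return self.chunks or [self]


@dataclass
class DataFrame:
    """An ordered collection of columns keyed by unique string names."""
    columns: Dict[str, Column]      # dict keys are unique and keep insertion order

    def __post_init__(self) -> None:
        # Assumed interpretation: every column holds the same number of rows.
        lengths = {col.length for col in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of rows")

    def column_names(self) -> List[str]:
        return list(self.columns)


# One column split into two chunks, each chunk holding its own data buffer.
first = Column("int8", 2, data=Buffer(bytes([1, 2])))
second = Column("int8", 1, data=Buffer(bytes([3])))
whole = Column("int8", 3, chunks=[first, second])
frame = DataFrame({"a": whole})
```

Here `whole.num_chunks` is 2, while each chunk reports a single chunk and returns itself from `get_chunks()`, mirroring the statement that a chunk of a column is again a column.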
Note that the only way to access these objects is through a call to | ||
`__dataframe__` on a data frame object. The classes are NOT meant as public | ||
API; they serve only to describe the API of what is returned by a call to | ||
`__dataframe__`. They are the concepts needed to capture the memory layout | ||
and data access of a data frame. | ||
|
||
|
||
## Design decisions | ||
|
||
1. Use a separate column abstraction in addition to a dataframe interface. | ||
|
||
Rationales: | ||
|
||
- This is how it works in R, Julia and Apache Arrow. | ||
- Semantically, most existing applications and users treat a column similarly to a 1-D array. | ||
- We should be able to connect a column to the array data interchange mechanism(s). | ||
|
||
Note that this does not imply a library must have such a public user-facing | ||
abstraction (e.g. ``pandas.Series``) - the column abstraction may be | ||
accessible only via ``__dataframe__``. | ||
|
||
2. Use methods and properties on an opaque object rather than returning | ||
hierarchical dictionaries describing memory. | ||
|
||
This is better for implementations that may rely on, for example, lazy | ||
computation. | ||
|
||
3. No row names. If a library uses row names, use a regular column for them. | ||
|
||
See discussion at | ||
[wesm/dataframe-protocol/pull/1](https://github.com/wesm/dataframe-protocol/pull/1/files#r394316241). | ||
Optional row names are not a good idea, because people will assume they're | ||
present (see the cuDF experience: it was forced to add them because pandas | ||
has them). Requiring row names seems worse than leaving them out. Note that | ||
row labels could be added in the future - right now there are no clear | ||
requirements for more complex row labels that cannot be represented by a | ||
single column. These do exist; for example, Modin has table and tree-based row labels. | ||
|
||
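Design decision 2 can be illustrated with a small sketch. It assumes a hypothetical `LazyColumn` class (not part of the protocol) whose metadata is available through a property while the values are produced only when a method is called. A dictionary describing memory, by contrast, has to contain concrete data at the moment it is built.

```python
from typing import Callable, Dict, List


class LazyColumn:
    """Hypothetical opaque column: values are computed only on request."""

    def __init__(self, name: str, compute: Callable[[], List[int]]) -> None:
        self.name = name
        self._compute = compute
        self.materialized = False

    @property
    def dtype(self) -> str:
        # Metadata can be answered without touching the data.
        return "int64"

    def get_values(self) -> List[int]:
        # The (possibly expensive) computation is deferred until this call.
        self.materialized = True
        return self._compute()


column = LazyColumn("squares", lambda: [i * i for i in range(5)])
kind = column.dtype                  # no computation triggered yet
was_lazy = not column.materialized   # still unevaluated at this point
values = column.get_values()         # the computation happens here

# The dictionary alternative must hold concrete data up front.
as_dict: Dict[str, object] = {"name": "squares", "dtype": "int64", "data": values}
```

With methods and properties, a consumer that only inspects `dtype` never forces the producer to materialize anything.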
## Interface | ||
|
||
|
||
|
||
```{literalinclude} dataframe_protocol.py | ||
--- | ||
language: python | ||
--- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Minimal makefile for Sphinx documentation | ||
# | ||
|
||
# You can set these variables from the command line, and also | ||
# from the environment for the first two. | ||
SPHINXOPTS ?= | ||
SPHINXBUILD ?= sphinx-build | ||
SOURCEDIR = . | ||
BUILDDIR = _build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
File renamed without changes
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,146 @@ | ||
# Configuration file for the Sphinx documentation builder. | ||
# | ||
# This file only contains a selection of the most common options. For a full | ||
# list see the documentation: | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html | ||
|
||
# -- Path setup -------------------------------------------------------------- | ||
|
||
# If extensions (or modules to document with autodoc) are in another directory, | ||
# add these directories to sys.path here. If the directory is relative to the | ||
# documentation root, use os.path.abspath to make it absolute, as shown here. | ||
# | ||
# import os | ||
# import sys | ||
# sys.path.insert(0, os.path.abspath('.')) | ||
|
||
import sphinx_material | ||
|
||
# -- Project information ----------------------------------------------------- | ||
|
||
project = 'Python dataframe interchange protocol' | ||
copyright = '2021, Consortium for Python Data API Standards' | ||
author = 'Consortium for Python Data API Standards' | ||
|
||
# The full version, including alpha/beta/rc tags | ||
release = '2021-DRAFT' | ||
|
||
|
||
# -- General configuration --------------------------------------------------- | ||
|
||
# Add any Sphinx extension module names here, as strings. They can be | ||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom | ||
# ones. | ||
extensions = [ | ||
'myst_parser', | ||
'sphinx.ext.extlinks', | ||
'sphinx.ext.intersphinx', | ||
'sphinx.ext.todo', | ||
'sphinx_markdown_tables', | ||
'sphinx_copybutton', | ||
] | ||
|
||
# Add any paths that contain templates here, relative to this directory. | ||
templates_path = ['_templates'] | ||
|
||
# List of patterns, relative to source directory, that match files and | ||
# directories to ignore when looking for source files. | ||
# This pattern also affects html_static_path and html_extra_path. | ||
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] | ||
|
||
# MyST options | ||
myst_heading_anchors = 3 | ||
myst_enable_extensions = ["colon_fence"] | ||
|
||
# -- Options for HTML output ------------------------------------------------- | ||
|
||
# The theme to use for HTML and HTML Help pages. See the documentation for | ||
# a list of builtin themes. | ||
# | ||
extensions.append("sphinx_material") | ||
html_theme_path = sphinx_material.html_theme_path() | ||
html_context = sphinx_material.get_html_context() | ||
html_theme = 'sphinx_material' | ||
|
||
# Add any paths that contain custom static files (such as style sheets) here, | ||
# relative to this directory. They are copied after the builtin static files, | ||
# so a file named "default.css" will overwrite the builtin "default.css". | ||
html_static_path = ['_static'] | ||
|
||
|
||
# -- Material theme options (see theme.conf for more information) ------------ | ||
html_show_sourcelink = False | ||
html_sidebars = { | ||
"**": ["logo-text.html", "globaltoc.html", "localtoc.html", "searchbox.html"] | ||
} | ||
|
||
html_theme_options = { | ||
|
||
# Set the name of the project to appear in the navigation. | ||
'nav_title': 'Python dataframe interchange protocol', | ||
|
||
# Set your GA account ID to enable tracking | ||
#'google_analytics_account': 'UA-XXXXX', | ||
|
||
# Specify a base_url used to generate sitemap.xml. If not | ||
# specified, then no sitemap will be built. | ||
#'base_url': 'https://project.github.io/project', | ||
|
||
# Set the color and the accent color (see | ||
# https://material.io/design/color/the-color-system.html) | ||
'color_primary': 'indigo', | ||
'color_accent': 'green', | ||
|
||
# Set the repo location to get a badge with stats | ||
#'repo_url': 'https://github.com/project/project/', | ||
#'repo_name': 'Project', | ||
|
||
"html_minify": False, | ||
"html_prettify": True, | ||
"css_minify": True, | ||
"logo_icon": "", | ||
"repo_type": "github", | ||
"touch_icon": "images/apple-icon-152x152.png", | ||
"theme_color": "#2196f3", | ||
"master_doc": False, | ||
|
||
# Visible levels of the global TOC; -1 means unlimited | ||
'globaltoc_depth': 2, | ||
# If False, expand all TOC entries | ||
'globaltoc_collapse': True, | ||
# If True, show hidden TOC entries | ||
'globaltoc_includehidden': True, | ||
|
||
"nav_links": [ | ||
{"href": "index", "internal": True, "title": "Dataframe interchange protocol"}, | ||
{ | ||
"href": "https://data-apis.org", | ||
"internal": False, | ||
"title": "Consortium for Python Data API Standards", | ||
}, | ||
], | ||
"heroes": { | ||
"index": "A protocol for zero-copy data interchange between Python dataframe libraries", | ||
#"customization": "Configuration options to personalize your site.", | ||
}, | ||
|
||
#"version_dropdown": True, | ||
#"version_json": "_static/versions.json", | ||
"table_classes": ["plain"], | ||
} | ||
|
||
|
||
todo_include_todos = True | ||
#html_favicon = "images/favicon.ico" | ||
|
||
html_use_index = True | ||
html_domain_indices = True | ||
|
||
extlinks = { | ||
"duref": ( | ||
"http://docutils.sourceforge.net/docs/ref/rst/" "restructuredtext.html#%s", | ||
"", | ||
), | ||
"durole": ("http://docutils.sourceforge.net/docs/ref/rst/" "roles.html#%s", ""), | ||
"dudir": ("http://docutils.sourceforge.net/docs/ref/rst/" "directives.html#%s", ""), | ||
} |
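As a rough illustration of what the `extlinks` entries above encode: `sphinx.ext.extlinks` substitutes the text given to a role for the `%s` in the base URL, and the adjacent string literals in each tuple are joined by Python before Sphinx sees them. The `expand` helper below is hypothetical and only mimics that substitution. It is not a Sphinx function, and the target `hyperlink-targets` is an arbitrary example anchor.

```python
# A copy of one entry from the configuration above.
extlinks = {
    "duref": (
        "http://docutils.sourceforge.net/docs/ref/rst/" "restructuredtext.html#%s",
        "",
    ),
}


def expand(role: str, target: str) -> str:
    """Hypothetical helper: fill the role's target into the base URL."""
    base_url, _caption = extlinks[role]
    return base_url % target


url = expand("duref", "hyperlink-targets")
# url == "http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#hyperlink-targets"
```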