sentier_data_tools, in combination with the common vocabularies at https://vocab.sentier.dev/, forms a proof of concept for the next generation of sustainability assessment that Départ de Sentier is building.
- Instead of a linear system with fixed single coefficients, we have computational models which gather data vectors from a data store
- Instead of a custom taxonomy, we build on existing standards in an open and collaborative fashion
sentier_data_tools provides:
- A local data store backend for storing LCA, MFA, industrial ecology, and other data
- A set of classes to interact with the Sentier online vocabulary, including graph traversal and unit conversion
- A base class for models which will be deployed to the sentier.dev platform
- A complete example with electrolysis; see {TBD}
You can install sentier_data_tools via pip from PyPI:
$ pip install sentier_data_tools
It is also available on Anaconda.
sentier_data_tools assumes that all data is stored in pandas DataFrames. Because we store additional metadata about column units, comments, and assemblies using pandas_flavor, alternative DataFrame libraries probably won't work.
There are several kinds of dataframes which can be stored in the local data store:
All column labels are IRIs in https://vocab.sentier.dev/
This is an absolute requirement of sentier_data_tools: we need to know what each column means, and that meaning must be expressed in our common vocabulary. An IRI is like a long URL; for example, our IRI for "Ferro-silico-magnesium" is http://data.europa.eu/xsp/cn2024/720299300080. This looks unwieldy, and you certainly don't want to type it by hand. But this resource identifier allows us to learn a lot about this product, including its label in many languages, links to IRIs for the same concept in other taxonomies, and its tariff codes. And we're just getting started; here's the kind of data we aspire to.
To make working with these concepts easier, sentier_data_tools allows you to specify aliases in your models.
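As a rough illustration of the idea, an alias table can be thought of as a mapping from short names to IRIs. This is only a sketch of the concept, not the alias mechanism of sentier_data_tools itself; the names `ALIASES`, `to_iri`, and `relabel` and the alias `fesimg` are hypothetical, and only the IRI comes from the text above.

```python
# Hypothetical sketch: aliases as a plain mapping from short names to IRIs.
# This is NOT the sentier_data_tools API, only an illustration of the concept.
ALIASES = {
    # Ferro-silico-magnesium (IRI taken from the README text)
    "fesimg": "http://data.europa.eu/xsp/cn2024/720299300080",
}


def to_iri(label: str, aliases: dict[str, str] = ALIASES) -> str:
    """Return the IRI for an alias; pass through labels that already are IRIs."""
    if label.startswith(("http://", "https://")):
        return label
    try:
        return aliases[label]
    except KeyError:
        raise KeyError(f"No alias defined for {label!r}") from None


def relabel(columns: list[str], aliases: dict[str, str] = ALIASES) -> list[str]:
    """Translate column labels so that every label is an IRI."""
    return [to_iri(col, aliases) for col in columns]


columns = relabel(["fesimg"])
```

With pandas, a mapping from aliases to IRIs could be passed to `DataFrame.rename(columns=...)` to achieve the same relabelling on a real dataframe.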
This is the most common kind of dataframe. It provides technical performance specifications or other input values needed to run functions. These dataframes are often wide, with 20 or more columns, and their columns will commonly refer to modelling terms instead of products or processes.
Here are some examples:
- Model-specific electrolyzer performance values such as output pressure, electrical efficiency, maintenance period, and required input water quality
- Waste treatment stage material recovery fractions by material, treatment stage, and treatment plant
- The maximum power generation, net annual generation, and beneficial owner of every coal-fired power plant on Earth
PARAMETERS are not intended to be used in proportion to another number. They do require specifying a spatial and temporal validity context (the default is Earth and a window from five years ago to five years ahead), as well as a product. Because the sentier.dev vocabulary is hierarchical, this product can be general or very specific. We use this product value to find the data we need for a given model.
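The defaults and the hierarchical lookup described above could be sketched as follows. Everything here is an assumption made for illustration: `ValidityContext`, `shift_years`, `matches`, the `BROADER` table, and the plain-string product names are hypothetical (real data would use vocabulary IRIs), and the matching direction shown, where data stored for a broader product satisfies a request for a narrower one, is one plausible reading rather than the library's documented behaviour.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

EARTH = "Earth"  # placeholder label; a real model would use a vocabulary IRI


def shift_years(day: date, years: int) -> date:
    """Move a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return day.replace(year=day.year + years, day=28)


@dataclass
class ValidityContext:
    """Hypothetical container for the context a PARAMETERS dataframe needs."""

    product: str
    location: str = EARTH
    begin: date = field(default_factory=lambda: shift_years(date.today(), -5))
    end: date = field(default_factory=lambda: shift_years(date.today(), 5))


# Hypothetical hierarchy, child -> broader parent
BROADER = {
    "PEM electrolyzer": "electrolyzer",
    "electrolyzer": "chemical equipment",
}


def matches(requested: str, stored: str, broader: dict[str, str] = BROADER) -> bool:
    """True if `stored` is the requested product or one of its broader concepts."""
    current: Optional[str] = requested
    while current is not None:
        if current == stored:
            return True
        current = broader.get(current)  # walk one level up; None at the root
    return False


context = ValidityContext(product="PEM electrolyzer")
```

Under these assumptions, `matches("PEM electrolyzer", "electrolyzer")` is true, while `matches("electrolyzer", "PEM electrolyzer")` is false.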
Broad data about society and economy, including future scenarios. Can include installed capacities or other data which can be used directly for creating virtual markets. Can include prices.
Not intended to be used in proportion to another number. Requires a spatial and temporal context, but neither requires nor precludes a product.
Bill of materials (including energy and services) to produce a good or assembly. Always given in relation to a fixed output value, such as 2 kilograms of steel per one bicycle frame.
Requires a spatial and temporal context and a product.
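Because a bill of materials is always relative to a fixed output, using it means scaling every input by the ratio of demand to that reference output. The sketch below uses the bicycle frame figure from above; the dictionary layout and the function `scale_bom` are hypothetical and do not reflect how sentier_data_tools stores or processes this data.

```python
# Hypothetical bill of materials: inputs required per ONE bicycle frame.
BOM_PER_FRAME = {"steel": (2.0, "kg")}


def scale_bom(
    bom: dict[str, tuple[float, str]], reference_output: float, demand: float
) -> dict[str, tuple[float, str]]:
    """Scale each input amount from the reference output to the demanded output."""
    if reference_output <= 0:
        raise ValueError("reference_output must be positive")
    factor = demand / reference_output
    return {name: (amount * factor, unit) for name, (amount, unit) in bom.items()}


# Ten frames at 2 kg of steel per frame require 20 kg of steel.
needs = scale_bom(BOM_PER_FRAME, reference_output=1, demand=10)
```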
Specific data on the composition of goods and wastes. These differ from model parameters in that they can be individual elements (e.g. for MFA), but are related to a given output (product or economic sector). Not completely developed yet.
sentier.dev models should inherit from sentier_data_tools.SentierModel for the time being.
Making good choices when building models is hard! We want power and simplicity at the same time.
Contributions are very welcome. To learn more, see the Contributor Guide.
Distributed under the terms of the MIT license, sentier_data_tools is free and open source software.
If you encounter any problems, please file an issue along with a detailed description.
You can build the documentation locally by installing the documentation Conda environment:
conda env create -f docs/environment.yaml
activating the environment
conda activate sphinx_sentier_data_tools
and running the build command:
sphinx-build docs _build/html --builder=html --jobs=auto --write-all; open _build/html/index.html