LINDT units of measure #129

VladimirAlexiev · 2020-11-03T11:57:50Z

Why?

It's hard to work with quantities (value + UoM) in RDF and SPARQL.

There are about 10 UoM ontologies:

the worst of them are just lists of units
the better ones also add dimensional analysis (eg length is L^1, area is L^2) and conversion factors (eg from cm to m, from degF to degC)

Working with units in SPARQL is quite hard. Comparing compatible units or doing arithmetic on quantities is possible, but difficult, if you are working with one of the better ontologies. You have to fetch the dimension vectors and conversion factors and work with them, and the queries become very complex.

SHACL's modest arithmetic capabilities (eg minInclusive to compare to constant, lessThan to compare two props) borrow from SPARQL, so it's impossible to state "temperature should be between 0 and 10 degC", see https://lists.w3.org/Archives/Public/public-shacl/2020Nov/0001.html

But there is one approach that solves these problems.

Previous work

The Unified Code for Units of Measure (UCUM) http://unitsofmeasure.org/ucum.html codifies all kinds of units and guarantees unambiguous interpretation (no unit label conflicts).
The Java UCUM library implements this system
Linked Data Types (LINDT) uses this and SPARQL datatype handlers to implement it in SPARQL

LINDT is unique in that it encodes both value and unit in one literal, eg "1 m"^^cdt:ucum, "100 cm"^^cdt:ucum. This is economical, but more importantly you can compare such quantities, and you can also do arithmetic operations on quantities.
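To illustrate why a single literal carrying both value and unit is convenient to compare, here is a minimal Python sketch. It is not LINDT and not the Java UCUM library: the `TO_METRE` table and the helper names are made up for this example and cover only three length units.

```python
from decimal import Decimal

# Hypothetical, tiny conversion table (factor to metres). A real UCUM
# library derives factors from the full UCUM unit definitions.
TO_METRE = {"m": Decimal("1"), "cm": Decimal("0.01"), "km": Decimal("1000")}


def parse_quantity(lexical):
    """Split a lexical form such as '100 cm' into (value, unit code)."""
    value, unit = lexical.strip().split(maxsplit=1)
    return Decimal(value), unit


def in_metres(lexical):
    """Normalise a length literal to metres using the toy table above."""
    value, unit = parse_quantity(lexical)
    return value * TO_METRE[unit]


def same_length(a, b):
    """Compare two length literals after normalisation."""
    return in_metres(a) == in_metres(b)
```

With this, `same_length("1 m", "100 cm")` holds even though the two lexical forms differ, which is the behaviour the overloaded `=` is meant to give.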

home: https://ci.mines-stetienne.fr/lindt/index.html
playground: https://ci.mines-stetienne.fr/lindt/playground.html
spec: https://ci.mines-stetienne.fr/lindt/v2/custom_datatypes.html, https://ci.mines-stetienne.fr/lindt/spec.html
implemented in a Jena fork: https://github.com/thesmartenergy/jena, https://github.com/OpenSensingCity/jena-ucum

This would be very useful for any sort of application in engineering, smart cities, semantic sensor networks, WoT, etc.

@maximelefrancois86 tells me it even supports complex numbers, which are important in some electricity applications. Is that correct? Can you give an example?

Features https://ci.mines-stetienne.fr/lindt/v2/custom_datatypes.html#on-apache-jena

Overload of SPARQL operators (=, <, etc.) to compare measurement literals;
Overload of algebraic functions (+, -, *, /) to manipulate measurement literals:
- Add two commensurable measurement literals
- Subtract a measurement literal from a commensurable one
- Multiply two measurement literals, or a measurement literal and a scalar (xsd:int, xsd:decimal, xsd:float, xsd:double)
- Divide a measurement literal by a measurement literal, a measurement literal by a scalar, or a scalar by a measurement literal
Custom SPARQL function lindt:sameDimension(arg1,arg2) to check if two measurement literals are commensurable (returns an xsd:boolean).
Cast to XSD numeric datatypes
dynamic loading of new datatypes/units
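The operator overloading listed above can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the Jena implementation: the `Quantity` class, the `UNITS` table and the three-component dimension vector (length, mass, time) are all hypothetical, and only a few cases are covered (for instance, scalar divided by quantity is left out).

```python
from dataclasses import dataclass
from fractions import Fraction

# Assumed toy unit table: unit code -> (factor to base unit, dimension vector).
# Dimension vectors hold exponents of (length, mass, time).
UNITS = {
    "m": (Fraction(1), (1, 0, 0)),
    "cm": (Fraction(1, 100), (1, 0, 0)),
    "kg": (Fraction(1), (0, 1, 0)),
    "s": (Fraction(1), (0, 0, 1)),
}


@dataclass(frozen=True)
class Quantity:
    value: Fraction  # magnitude expressed in base units
    dims: tuple      # exponents of (length, mass, time)

    @classmethod
    def of(cls, value, unit):
        factor, dims = UNITS[unit]
        return cls(Fraction(value) * factor, dims)

    def _require_commensurable(self, other):
        if self.dims != other.dims:
            raise TypeError("quantities are not commensurable")

    def __add__(self, other):
        self._require_commensurable(other)
        return Quantity(self.value + other.value, self.dims)

    def __sub__(self, other):
        self._require_commensurable(other)
        return Quantity(self.value - other.value, self.dims)

    def __lt__(self, other):
        self._require_commensurable(other)
        return self.value < other.value

    def __mul__(self, other):
        if isinstance(other, Quantity):
            dims = tuple(a + b for a, b in zip(self.dims, other.dims))
            return Quantity(self.value * other.value, dims)
        return Quantity(self.value * Fraction(other), self.dims)

    def __truediv__(self, other):
        if isinstance(other, Quantity):
            dims = tuple(a - b for a, b in zip(self.dims, other.dims))
            return Quantity(self.value / other.value, dims)
        return Quantity(self.value / Fraction(other), self.dims)


def same_dimension(a, b):
    """Rough analogue of lindt:sameDimension for this toy model."""
    return a.dims == b.dims
```

Addition, subtraction and comparison refuse quantities of different dimension, while multiplication and division combine the dimension vectors, so `Quantity.of(10, "m") / Quantity.of(2, "s")` yields a quantity with dimension vector `(1, 0, -1)`.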

LINDT is very ingenious and it's a pity that it hasn't found a wider following.

It's implemented as a Jena branch but hasn't been merged into trunk: "This branch is 14 commits ahead, 5325 commits behind apache:master."
We have been looking for a pretext (i.e. client) to implement it in rdf4j.
It's adopted in some ontologies, but these are very few:
- CoCoOn: Cloud Computing Ontology for IaaS Price and Performance Comparison
- VSSo: Vehicle Signal and Attribute Ontology
- others?

Proposed solution

Adopt LINDT as a best practice for representing units.
Work with other communities (WoT, semantic sensors) to also adopt it.

Considerations for backward compatibility

No direct consequences because it uses custom datatype handlers to do its work. I.e. if you don't use the CDT datatypes (cdt:ucum, cdt:length, etc) you'll see no difference.

However, guidance and solution templates for migrating from other systems for representing units should be provided

JervenBolleman · 2020-11-03T15:55:28Z

I have been thinking about this exact thing. However, I was thinking that all of UCUM might be too much work to demand, plus it is currently awkward licensing-wise (section 7 of the license). So I was wondering if the 7 base units of the International System of Units plus the 21 coherent derived named units might be a sufficient lower bound for implementation.

unit	proposed datatype	example of idea
second	unit:s	60^^unit:s
meter	unit:m	1.99^^unit:m
kilogram	unit:kg	88^^unit:kg
Ampere	unit:A
Kelvin	unit:K	273.1^^unit:K
mol	unit:mol
candela	unit:cd
hertz	unit:Hz
radian	unit:rad
steradian	unit:sr
newton	unit:N
pascal	unit:Pa
joule	unit:J
watt	unit:W
coulomb	unit:C
volt	unit:V
farad	unit:F
ohm	unit:Ω
siemens	unit:S
weber	unit:Wb
tesla	unit:T
henry	unit:H
lumen	unit:lm
lux	unit:lx
becquerel	unit:Bq
gray	unit:Gy
sievert	unit:Sv
katal	unit:kat

The implementation advantage is that these are simple numeric values: as long as the datatype is the same, cast to decimal and compare.

There are no prefixes, all must be converted to the base unit. e.g. 600km should be stored as "600000"^^unit:m.
Derived units must always be in coherent form for this to work (i.e. also not have prefixes).
Advantage of scaling down to base units is simpler comparison functions, and easier to generate indexes.

The list misses Celsius, which is trivial to add, but would need to be comparable to Kelvin.
Also ohm symbol Ω is outside of ascii so unit:Ohm could be an option.

Basically, the cost is in converting to base units at storage. However, consistent storage makes it easier to build indexes and ensure query results are correct.
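A minimal sketch of that storage-time conversion, in Python, might look as follows. The prefix table, the set of base units and the function name are assumptions made for the example; `unit:` is the hypothetical datatype namespace from the table above, and kilogram (a base unit that already carries a prefix) is deliberately left out to keep the sketch short.

```python
from decimal import Decimal

# Assumed subset of SI prefixes and base-unit symbols for this illustration.
PREFIXES = {"k": Decimal("1000"), "c": Decimal("0.01"), "m": Decimal("0.001")}
BASE_UNITS = {"m", "s", "K", "A", "mol"}


def to_base_literal(value, unit):
    """Return a Turtle-style literal scaled to the unprefixed base unit."""
    if unit in BASE_UNITS:
        factor, base = Decimal(1), unit
    elif unit[:1] in PREFIXES and unit[1:] in BASE_UNITS:
        factor, base = PREFIXES[unit[0]], unit[1:]
    else:
        raise ValueError(f"unsupported unit: {unit}")
    scaled = (Decimal(value) * factor).normalize()
    # Format without exponent notation so the lexical form stays a plain decimal.
    return f'"{scaled:f}"^^unit:{base}'
```

For example, `to_base_literal("600", "km")` gives `"600000"^^unit:m`, after which comparison between stored values is a plain decimal comparison.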

This of course leaves out all the derived units, e.g. kg/m, m/s^2 etc., which I think would be very worthwhile to have, but it might be too much work to implement all the commonly used ones, unless they follow a straightforward pattern in IRI encoding that can be decoded and generated by stores on the fly. E.g. optionally, division "1"^^unit:kg / "1"^^unit:m => "1"^^unit:kg-per-m and "2"^^unit:m * "2"^^unit:m => "4"^^unit:m2
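One way such a pattern could be generated on the fly is sketched below in Python. The naming convention (numerator parts joined by `-`, then `-per-`, then denominator parts, with exponents appended as digits) is only an assumption extrapolated from the two examples above, and the function names are hypothetical.

```python
def multiply(a, b):
    """Combine two units given as {base unit: exponent} dictionaries."""
    out = dict(a)
    for unit, power in b.items():
        out[unit] = out.get(unit, 0) + power
    # Drop units whose exponents cancelled out.
    return {unit: power for unit, power in out.items() if power != 0}


def divide(a, b):
    return multiply(a, {unit: -power for unit, power in b.items()})


def local_name(exponents):
    """Encode exponents such as {'kg': 1, 'm': -1} as 'kg-per-m'."""
    def part(unit, power):
        return unit if power == 1 else f"{unit}{power}"

    numerator = [part(u, p) for u, p in sorted(exponents.items()) if p > 0]
    denominator = [part(u, -p) for u, p in sorted(exponents.items()) if p < 0]
    name = "-".join(numerator) or "1"
    if denominator:
        name += "-per-" + "-".join(denominator)
    return name
```

Here `local_name(divide({"kg": 1}, {"m": 1}))` yields `kg-per-m` and `local_name(multiply({"m": 1}, {"m": 1}))` yields `m2`; decoding would need the reverse parse, which is not shown.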

Also, the use of datatypes avoids one of the issues with UCUM: conflicts in coding exist, and the coding of customary units is not straightforward. See Fluid Ounce, where a datatype pointing to the fluid ounce definition would be a clearer option.
Specifically, there are several redefinitions of the fluid ounce for legal reasons. It's 30 ml or 23 1/3 grams of pure alcohol in some US food standards (TODO: find again an article showing the many redefinitions of US fluid ounce). UCUM is widely specified in clinical settings but in reality not always used (even where it was specified).

Side notes

all unit values should inherit from xsd:decimal.
we should have power and square root operators

VladimirAlexiev · 2020-11-03T23:09:46Z

@JervenBolleman what licensing problems do you see?

UCUM has customary units that are important in many disciplines. If you ask users to always store angstroms and light-years as meters, you are shifting burden to them.

And even "decorative" units like 123 {rbc} which is a count (dimensionless) but of "red blood cells".

LINDT is implemented in Jena, and @jeenbroekstra says it won't be too hard to port to rdf4j. So "downgrading" to your proposal will be more work for Java devs.

So that leaves other languages. The question is whether there are UCUM libraries in other languages.

HolgerKnublauch · 2020-11-03T23:10:43Z

I much prefer using explicit datatypes to encoding the unit into the string. E.g. "1"^^unit:m is better than "1 m"^^ucum:unit.

But a more general solution might be to offer a declarative extension point so that anyone can define custom datatypes and those datatypes actually can be used consistently. This could work similar to user-defined SHACL constraint components or SHACL-AF functions. Just some very quick thoughts, a datatype might need to be able to respond to questions "can I compare my value to another datatype" (e.g. yes for mm to m comparison), and then a normalize function that would bring all datatypes from a group to a common base unit, e.g. meter. Then things like < comparison in SPARQL can be automated. The actual business logic can probably be covered declaratively through a couple of properties that are attached to the units as done (comprehensively) in the QUDT vocabulary.

The advantage here is that a SPARQL 1.2 would only need to implement a few generic building blocks while the details of the specific datatypes are irrelevant, and we don't even need to discuss the specific catalog of datatypes that need to be implemented.

VladimirAlexiev · 2020-11-03T23:16:53Z

Connecting UCUM unit symbols to ontologies like QUDT or OM is important and useful because they expose as triples info that is within the UCUM library (eg the dimension vector of Newton and the conversion factors of Fahrenheit). Plus extra info, eg that Inch is an imperial unit, grouping of units by discipline, etc. I believe QUDT or OM already has UCUM codes, so that should not be hard.

So we could spec custom functions to parse out a unit from a quantity, and connect to a structured unit node in such an ontology.

VladimirAlexiev · 2020-11-05T14:01:01Z

@HolgerKnublauch

"1"^^unit:m is better than "1 m"^^ucum:unit

In addition to the generic cdt:ucum, LINDT also has datatypes cdt:length, cdt:mass etc that represent quantities with fixed/known dimension.

But I see some problems with having distinct datatypes for each unit:

there are just too many. It's nearly a combinatorial explosion. Eg "barrels per day", "US barrels per hour", etc etc.
- Consider that just for dimensionless units, there are many variations such as percent, per mille, ppm (parts per million), etc.
- There are also annotations (advisory customary pieces), eg {rbc} (red blood cells), {pair} or {pairs} for socks, {packs} vs {masterboxes} for cigarettes, s {0..100 km/h} for car acceleration expressed as time to reach that speed, etc: see https://ucum.org/ucum.html#para-6
units use special symbols that will be unwieldy in URL local names or will become unreadable if you URL-encode them. Eg what datatype URLs would you translate the following units to? (they happen to express the same unit):

"km.h-1"^^cdt:ucumunit
"km/h"^^cdt:ucumunit
"(1000m)/(60min)"^^cdt:ucumunit

offer a declarative extension point

LINDT does that:

see https://ci.mines-stetienne.fr/lindt/spec.html and in particular https://ci.mines-stetienne.fr/lindt/spec.html#the-application-programming-interface
see https://ci.mines-stetienne.fr/lindt/v1/custom_datatypes.ttl for a declaration of a datatype cdt:length, and https://ci.mines-stetienne.fr/lindt/v1/custom_datatypes.js for an implementation in JS

BTW @maximelefrancois86 there are broken links at https://ci.mines-stetienne.fr/lindt/v1/custom_datatypes:

http://w3id.org/lindt/custom_datatypes.ttl redirects to https://ci.mines-stetienne.fr/lindt/v3/custom_datatypes.ttl, which does not exist
http://w3id.org/lindt/custom_datatypes.js redirects to https://ci.mines-stetienne.fr/lindt/v3/custom_datatypes.js, which does not exist

In contrast, both of https://ci.mines-stetienne.fr/lindt/v1/custom_datatypes and https://ci.mines-stetienne.fr/lindt/v3/custom_datatypes exist.

namedgraph · 2020-11-05T14:10:54Z

"(1000m)/(60min)"^^cdt:ucumunit -- this is simply not a structured way of describing units, which goes against the RDF practice.

VladimirAlexiev · 2020-11-05T14:30:27Z

@namedgraph This unit is well structured and well defined according to https://ucum.org/ucum.html (and implemented in Java UCUM and consequently in LINDT).
It's just not structured in RDF.

NOT everything needs or should be structured in RDF. Eg are you against GeoSPARQL literals (WKT and GML)?

maximelefrancois86 · 2020-11-05T14:32:44Z

Dear all,

@namedgraph, there is sometimes good rationale to encode complex values using literals instead of relying on RDF structures and basic datatypes. The OGC GeoSPARQL datatype geo:WKTLiteral is a great example.

"<http://www.opengis.net/def/crs/OGC/1.3/CRS84> Polygon((-83.6 34.1, -83.6 34.5, -83.2 34.5, -83.2 34.1, -83.6 34.1))"^^geo:WKTLiteral

I back up the thoughts of @VladimirAlexiev :

I think having a single datatype cdt:ucum would be the simplest choice. SPARQL engines would only need to recognise one additional datatype IRI, and could defer to the UCUM specification for the list of base units and for how compound units can be formed. There exist implementations of UCUM in common programming languages.
In our implementation on apache Jena, we also included:
- overload of SPARQL operators (=, <, etc.) to compare measurement literals;
- overload of algebraic functions (+, -, *, /) to manipulate measurement literals;
- a custom SPARQL function with IRI: http://w3id.org/lindt/custom_datatypes#sameDimension(arg1, arg2) to check if two measurement literals are commensurable (returns a xsd:boolean).
- cast to XSD numeric datatypes

namedgraph · 2020-11-05T14:53:23Z

OK fine, WKT literals is a counter-example. But GeoSPARQL is an additional standard, not part of the SPARQL spec. And you seem to have done the same with units. So what's the problem? Why does it need to be in SPARQL 1.2 proper?

maximelefrancois86 · 2020-11-05T14:56:13Z

I am neutral on this. I also think it would be fine to have such a datatype specified in a separate document.

VladimirAlexiev · 2020-11-05T14:59:26Z

I don't see this becoming part of SPARQL 1.2. As I said "Adopt LINDT as a best practice" and work with other communities to adopt it.

@maximelefrancois86

can you give an example of complex numbers used for electrical quantities?
see "Broken links" above

maximelefrancois86 · 2020-11-05T15:07:14Z

Thank you @VladimirAlexiev , we are on the same page.

For complex numbers: we just had a very good first-year Master student who worked on this during her three-month internship this year: Yana Soares de Paula. https://www.linkedin.com/in/yanaspaula/ She did an excellent job in just three months, but more work would be needed to augment cdt:ucum with complex numbers. She would probably be happy to share her report with you if you wish.

About the broken links in lindt v1, I'll create an issue and check asap. Thanks for the notice.

VladimirAlexiev · 2020-11-05T15:13:36Z

There exist implementations of UCUM in common programming languages

Eg there seem to be 2 for JS:

It appears UCUM is the dominant UoM system in life sciences.

https://ucum.nlm.nih.gov/ says "UCUM has been adopted internationally by many organizations such as IEEE, DICOM, LOINC, and HL7, and is also in the ISO 11240:2012 standard" (and in addition to the JS implementation it has more resources like validation and autocomplete).
https://www.hl7.org/fhir/ucum.html
https://danielvreeman.com/units-of-measure-conversion-validation/

maximelefrancois86 · 2020-11-05T15:23:16Z

See also

Python https://pypi.org/project/pyucum/
Java https://github.com/unitsofmeasurement/uom-systems/
Java https://github.com/FHIR/Ucum-java
C# https://github.com/mnisl/OD
Rust https://github.com/agrian-inc/wise_units

Maybe there are more

dr-shorthair · 2020-11-06T00:49:47Z

Connecting UCUM unit symbols to ontologies like QUDT

@VladimirAlexiev Most QUDT units now have UCUM codes in their description, so correlation of these two systems is already available. (I did this work in the last few months.) The ones that are missing do not have equivalent UCUM codes, so there is nothing on that side to correlate with.

QUDT gives you explicit dimension-vectors, and conversion factors (and offsets, where appropriate).

I'm also on the UCUM Advisory Board, and the licensing issue is high on the agenda.
Though the current UCUM Terms of Use look a bit fierce at first glance, I have been assured that the kind of usage that is envisaged here is totally fine, and the intention is to make this more clear in the license.

dr-shorthair · 2020-11-06T00:56:29Z

On the matter of style: I vote with @HolgerKnublauch in favour of

273^^ucum:K

compared with

"273 K"^^cdt:ucum

It does not require a string to be parsed, so basic SPARQL queries can be used, detecting the datatype, but without regexing strings.

dr-shorthair · 2020-11-06T01:00:22Z

@JervenBolleman I think your table matches the UCUM codes, except for Ohm for Ω (note case). That is no surprise as UCUM was designed to use the common codes as far as possible.

This XML representation is the reference for the UCUM terminals.

sa-bpelakh · 2020-11-06T03:35:41Z

This does seem like the best way to adhere to standard, industry-wide usage while avoiding mixing domain-specific aspects into the generic SPARQL standard: this should be an auxiliary standard, like GeoSPARQL. The complex cdt:ucum literals are quite similar to WKT in this respect.

dr-shorthair · 2020-11-06T03:59:02Z

Scaling factors for scalar quantities are not domain specific at all.

In fact I'd argue it is a notable failure of almost all computer languages that this is not built-in.
There are very few pure 'floating point' numbers, or 'decimals' that can be understood without knowing the unit-of-measure.

I'm totally fine with embedding coordinate sequences in a microformat, since they have no meaning considered independently. I was in the team that standardized GeoSPARQL and am very comfortable with the design choice. But scalar quantities are a very different matter, and much more simple.

HolgerKnublauch · 2020-11-06T05:54:24Z

I also think that units are a different topic than GeoSPARQL. Users should expect to perform comparisons using the built-in < and > operators, and possibly to do arithmetics such as + and * on unit'ed (is this a word?) values. This might of course just become a matter of enough implementations agreeing on a de-facto standard, but it shouldn't be too hard to agree on a mechanism at least for the most common units in a SPARQL 1.2. Once it's in SPARQL then related standards such as SHACL would automatically "inherit" these features, e.g. for sh:minInclusive.

JervenBolleman · 2020-11-06T16:19:22Z

@dr-shorthair I expanded my comment, changed to unit:Ohm, whose casing is inconsistent over standards.

@HolgerKnublauch I also think easier support by stores for custom datatypes would be very nice. And would make implementing this feature cheaper for everyone. Let's open a separate issue for easier custom datatypes. (Also easier sharing of custom function definitions).

@VladimirAlexiev I would love full UCUM support for some projects I am involved in. I am just worried that it would be too large a code base for independent smaller SPARQL communities to implement. Also, I think we end up with a downstream licensing issue with UCUM until their license is changed, which might take a long time.

kasei · 2020-11-06T16:26:29Z

@JervenBolleman I could see standardization of service description vocabulary terms for describing which custom datatypes are supported. Beyond that, though, wouldn't "easier custom datatypes" be an issue for individual implementations (and not something the spec can/should concern itself with)? What would spec involvement in this area look like?

VladimirAlexiev · 2020-11-06T18:08:42Z

@dr-shorthair where do you have mapping tables QUDT-UCUM showing in particular the gaps on either side?

As I wrote above, it's useful to have in RDF (QUDT) what UCUM libraries provide in code.

Please comment on how you would represent the variety of UCUM strings (including annotations in curlies) as datatype URLs. You picked the easiest case K.

@sa-bpelakh Agreed! As I said above, this can only be a recommended best practice, can't be part of the SPARQL spec.

@HolgerKnublauch

perform comparisons using the built-in < and > operators, and possibly to do arithmetics such as + and * on unit'ed (is this a word?)

For comparison, + and - you need Commensurate quantities (having same dimensionality).
You can apply * and / to any quantities, and also between quantities and simple numbers.

LINDT does all that.

sh:minInclusive

Yes! And sh:lessThan

@JervenBolleman

too large a code base for independent smaller SPARQL communities to implement

UCUM has implementations in many languages, they should leverage such implementations.
LINDT uses UCUM Java and hooks up into Jena datatype handlers to override SPARQL operators.

kasei · 2020-11-06T18:16:58Z

too large a code base for independent smaller SPARQL communities to implement

UCUM has implementations in many languages, they should leverage such implementations.
LINDT uses UCUM Java and hooks up into Jena datatype handlers to override SPARQL operators.

As an implementor of several SPARQL systems in less popular languages, I join @JervenBolleman in concern at the implementation burden. Just because something like this has implementations in several languages does not mean there wouldn't be a real cost added to many existing (and possibly future!) systems.

ashleysommer · 2020-11-06T21:54:24Z

I'm a maintainer of the Python RDFLib (including its SPARQL executor) and developer of PySHACL.

I agree with @VladimirAlexiev on this one. After reading the UCUM Spec I don't see how individual 273^^ucum:K could work for all of the possible combinations of units of measurement allowed by UCUM.

You'd need a string representation like "273 K"^^cdt:ucum, a simple example is 10 millimeters of mercury (for pressure measurement) "10 mm[Hg]"^^cdt:ucum and for a more extreme example "ventricular stroke work" in "gramforce-meter per heartbeat per square meter" "4 gf.m/({hb}.m2)"^^cdt:ucum.

While the set of units defined in UCUM is closed, the microformat is created in such a way that adding new units (in a subsequent version) is easy and predictable. If every unit in the current spec was pulled out into a discrete datatype in the ucum ontology, that would need to be updated whenever a new unit is added to UCUM.

VladimirAlexiev · 2020-11-06T22:34:13Z

Here is the offending license clause:

Subject to Section 1 and the other restrictions hereof, users may incorporate portions of the UCUM table and definitions into another master term dictionary (e.g. laboratory test definition database), or software program for distribution outside of the user's corporation or organization, provided that any such master term dictionary or software program includes the following fields reproduced in their entirety from the UCUM table: UCUM code, definition value and unit. Every copy of the UCUM table incorporated into or distributed in conjunction with another database or software program must include the following notice:

“This product includes all or a portion of the UCUM table, UCUM codes, and UCUM definitions or is derived from it, subject to a license from Regenstrief Institute, Inc. and The UCUM Organization. Your use of the UCUM table, UCUM codes, UCUM definitions also is subject to this license, a copy of which is available at http://unitsofmeasure.org. The current complete UCUM table, UCUM Specification are available for download at http://unitsofmeasure.org. The UCUM table and UCUM codes are copyright © 1995-2013, Regenstrief Institute, Inc. and the Unified Codes for Units of Measures (UCUM) Organization. All rights reserved.

THE UCUM TABLE (IN ALL FORMATS), UCUM DEFINITIONS, AND SPECIFICATION ARE PROVIDED "AS IS." ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.”

If the master term dictionary or software program containing the UCUM table, UCUM definitions and/or UCUM specification is distributed with a printed license, this statement must appear in the printed license. Where the master term dictionary or software program containing the UCUM table, UCUM definitions, and/or UCUM specification is distributed on a fixed storage medium, a text file containing this information also must be stored on the storage medium in a file called "UCUM_short_license.txt". Where the master term dictionary or software program containing the UCUM table, UCUM definitions, and/or UCUM specification is distributed via the Internet, this information must be accessible on the same Internet page from which the product is available for download.

HOWEVER, see the comment above that the UCUM committee says it's ok to use UCUM as described in this issue. I.e. not to get too hung up on this legalese (which is indeed pretty bad, compared to modern open licences)

dr-shorthair · 2020-11-06T22:50:24Z

@VladimirAlexiev

where do you have mapping tables QUDT-UCUM showing in particular the gaps on either side?

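PREFIX qudt: <http://qudt.org/schema/qudt/>

# List every non-currency QUDT unit with its UCUM code, if it has one.
# Rows where ?u is unbound are units without a UCUM code.
SELECT *
WHERE {
  ?p a qudt:Unit .
  MINUS { ?p a qudt:CurrencyUnit . }
  OPTIONAL { ?p qudt:ucumCode ?u . }
}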

I don't think there are any gaps on the QUDT side relative to the UCUM terminals, but since UCUM does not define a closed set (novel combinations are always possible) there will not be a member in the QUDT catalogue for every arbitrary UCUM code.

dr-shorthair · 2020-11-06T22:57:22Z

Regarding the license, remember that (notwithstanding the use of XML for the reference data) UCUM was developed in a pre-linked-data, and pre-CC world. And since UCUM is widely built in to medical and clinical software, there was a concern to ensure that there not be any muddling of libraries and conversions. That would be a real problem. As I am in contact with both the QUDT and UCUM maintainers, clarifying the license is a priority to clear up any issues about the appearance of UCUM codes alongside a separately derived set of conversion factors. But this should definitely NOT impede mentioning UCUM codes in RDF and SPARQL.

Note that the US National Library of Medicine is now providing the main support for UCUM, including an API and a Javascript library.

ericprud · 2020-11-20T10:38:47Z

@VladimirAlexiev

@JervenBolleman

we don't support FILTER("2+2"^^xsd:integer = 4) and that is natural

But we support FILTER("2"^^xsd:integer + 2 = 4).
What is unnatural is that we don't support "1 m"^^cdt:ucum + "100 cm"^^cdt:ucum or "1"^^ucum:m + "100"^^ucum:cm.

I think a more apt analogy for "1"^^ucum:m + "100"^^ucum:cm would be FILTER(2 + 2.0 = 4). The fact that "2"^^xsd:integer parses to the same internal representation as 2 is just a feature of the parser semantics. The ability to add a double and an integer and compare the result to an integer (in fact, the comparison substitutes the double 4.0) is orchestrated by XPath's numeric type promotion and type substitution. Extrapolating that to apply to units would give us that same functionality and some nice unit analysis as a side benefit. I can see a couple of ways to do that:

Canonical units

For every dimension we specify (length, charge, mass...), pick a canonical unit. (MKS would be practical and would add another attractor tugging the US forward to the 18th century.) Enumerate all of the compatible units with linear functions mapping them to the canonical:
ucum:m -> +0, *1 ucum:m
ucum:in -> +0, *.0254 ucum:m
ucum:f -> -32, /1.8 ucum:c

Any evaluation requiring the promotion of the left column to the right column applies the transformation and leaves you with the canonical units. Where the current operator table has entries like

Operator	Type(A)	Type(B)	Function	Result type
A + B	numeric	numeric	op:numeric-add(A, B)	numeric

we could add entries for the dimensions:

Operator	Type(A)	Type(B)	Function	Result type
A + B	length	length	op:numeric-add(A, B)	length

This is cool because the operator table prevents us from adding a length to a time. It's a little funny because everything gets metrified, e.g. (BIND "1"^^ucum:ft + "1"^^ucum:in AS ?x) will give you ".3302"^^ucum:m.
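The promotion step can be sketched in Python as below. This is an illustration under assumptions, not a proposal for the actual operator table: the `CANONICAL` table, its unit keys (`degF`, `degC`) and the function names are hypothetical, each mapping is applied as `(value + offset) * factor`, and Fahrenheit uses offset -32 with factor 5/9.

```python
from fractions import Fraction

# unit -> (dimension, canonical unit, offset, factor)
# canonical value = (value + offset) * factor
CANONICAL = {
    "m": ("length", "m", Fraction(0), Fraction(1)),
    "in": ("length", "m", Fraction(0), Fraction("0.0254")),
    "ft": ("length", "m", Fraction(0), Fraction("0.3048")),
    "s": ("time", "s", Fraction(0), Fraction(1)),
    "degC": ("temperature", "degC", Fraction(0), Fraction(1)),
    "degF": ("temperature", "degC", Fraction(-32), Fraction(5, 9)),
}


def promote(value, unit):
    """Map a (value, unit) pair to (dimension, canonical unit, canonical value)."""
    dimension, canonical, offset, factor = CANONICAL[unit]
    return dimension, canonical, (Fraction(value) + offset) * factor


def add(a, b):
    """Add two (value, unit) pairs; only defined within one dimension."""
    dim_a, unit_a, val_a = promote(*a)
    dim_b, _, val_b = promote(*b)
    if dim_a != dim_b:
        raise TypeError(f"cannot add {dim_a} to {dim_b}")
    return val_a + val_b, unit_a
```

`add(("1", "ft"), ("1", "in"))` returns 0.3302 metres as an exact fraction, and adding a length to a time raises a `TypeError`, mirroring the missing row in the operator table.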

Unit ladder

We could ameliorate that a bit by grouping entries in the type promotion hierarchy so that known imperial units stay imperial and get promoted to the smallest imperial unit, so (BIND "1"^^ucum:ft + "1"^^ucum:in AS ?x) will give you "13"^^ucum:in. Things that don't fit into one of those groups would still get metrified (yes, I made that word up), e.g. (BIND "1"^^ucum:lightyear + "1"^^ucum:parsec AS ?x) will give you "4.0318165349E16"^^ucum:m.
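A rough Python sketch of such a ladder follows. The `LADDERS` grouping, the `IN_METRES` table and the function names are assumptions for illustration only; the rule implemented is "if both operands sit on the same ladder, use the smaller of the two units, otherwise fall back to metres".

```python
from fractions import Fraction

# Each ladder lists units of one group from largest to smallest (assumed grouping).
LADDERS = [["ft", "in"]]

# Size of each unit in metres (exact values for ft and in).
IN_METRES = {
    "ft": Fraction("0.3048"),
    "in": Fraction("0.0254"),
    "m": Fraction(1),
    "km": Fraction(1000),
}


def result_unit(unit_a, unit_b):
    """Pick the smaller unit if both are on one ladder, else metres."""
    for ladder in LADDERS:
        if unit_a in ladder and unit_b in ladder:
            return ladder[max(ladder.index(unit_a), ladder.index(unit_b))]
    return "m"


def add(a, b):
    """Add two (value, unit) pairs and express the sum in the ladder's unit."""
    (val_a, unit_a), (val_b, unit_b) = a, b
    target = result_unit(unit_a, unit_b)
    total = Fraction(val_a) * IN_METRES[unit_a] + Fraction(val_b) * IN_METRES[unit_b]
    return total / IN_METRES[target], target
```

With this table, `add((1, "ft"), (1, "in"))` stays imperial and returns 13 inches, while `add((1, "ft"), (1, "km"))` falls back to metres.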

P.S.

It would be lovely to extend the grammar so we could write 1ft instead of "1"^^ucum:foot (which, as a parser feature, is orthogonal to the "1"^^ucum:foot vs. "1ft"^^ucum:length debate). I guess feasibility comes down to how crazy the lexical strings for the units are.

sa-bpelakh · 2020-11-20T13:01:36Z

@VladimirAlexiev

Canonical units

I like the design for canonical units, and the implementation is well defined. I definitely prefer "1"^^ucum:foot instead of "1ft"^^ucum:length, because the unit implies the dimension, and avoids a micro-grammar in the literal value.

I think the complexity of the unit ladder could be avoided if you allow casting conversions, e.g. bind(ucum:foot(?a + ?b + ?c) as ?length_in_feet), to guarantee a specific unit (and do dimension checking in the process).

kasei · 2020-11-20T16:22:03Z

@ericprud

It's a little funny because everything gets metrified, e.g. (BIND "1"^^ucum:ft + "1"^^ucum:in AS ?x) will give you ".3302"^^ucum:m.

I would think this could be handled just like the XPath constructor functions:

ucum:in("1"^^ucum:ft + "1"^^ucum:in) => "13"^^ucum:in

(Though there might be some funny floating point error issues to consider.)
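The floating-point concern can be made concrete with a short Python sketch. It assumes the cast works by converting the sum to metres and then dividing by the size of an inch; the table and function name are hypothetical, and the same computation is run once with binary floats and once with exact rationals.

```python
from fractions import Fraction

# Sizes in metres, kept as strings so each number type parses them itself.
METRES_PER_UNIT = {"ft": "0.3048", "in": "0.0254"}


def one_foot_plus_one_inch(number_type):
    """Evaluate ucum:in("1"^^ucum:ft + "1"^^ucum:in) by going through metres."""
    foot = number_type(METRES_PER_UNIT["ft"])
    inch = number_type(METRES_PER_UNIT["in"])
    total_metres = 1 * foot + 1 * inch
    return total_metres / inch


exact = one_foot_plus_one_inch(Fraction)   # rational arithmetic, no rounding
approx = one_foot_plus_one_inch(float)     # binary floating point
```

The rational result is exactly 13. The float result is only guaranteed to be very close to 13, so an implementation comparing it with `=` against `"13"^^ucum:in` would need decimal or rational arithmetic, or a tolerance.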

ericprud · 2020-11-20T23:49:33Z

@kasei , that makes sense to me. I think your example converts 1ft and 1in both to meters and then back to inches. If you knew the types (e.g. they weren't plucked from some heterogeneous attribute in the data), you could avoid that by narrowing the scope of the cast:

ucum:in("1"^^ucum:ft) + "1"^^ucum:in => "13"^^ucum:in

I mention this because an alternative would be that casting functions override the type promotion of their arguments by applying the cast to each of the contained atoms. This sounds terribly contrived but useful enough to merit a moment of collective consideration.

dr-shorthair · 2020-11-21T01:30:30Z

All the libraries or catalogues that I've looked at record conversion factors to SI.
So any comparison of non-SI scaled quantities would necessarily trip through a conversion to SI.

ericprud · 2020-11-25T16:24:28Z

i think we can drop this notion of having a clairvoyant cast function that operates over the operands of any nested operators.

VladimirAlexiev · 2020-11-28T12:35:11Z

@ericprud and @sa-bpelakh

For clarity: LINDT has datatypes cdt:ucum and per-kind datatypes like cdt:length, but not per-unit datatypes like ucum:m
LINDT does arithmetic operations and comparisons, and no other library does that
feasibility comes down to how crazy the lexical strings for the units are: some are "crazy" indeed!

Several people have proposed to use per-unit datatypes like "1"^^ucum:m instead of LINDT's approach, eg "1 m"^^cdt:ucum. But nobody has yet proposed how to handle the variety of "crazy" units.

UCUM defines a countably infinite list of units. Any RDF approach is necessarily finite.
UCUM is more dynamic. Eg "fuel flow" in USAF/NASA is sometimes measured in pounds per hour. With UCUM I can write "10 [lb_av]/h" right away, whereas with QUDT I have to propose a new unit: unit:LB-PER-HR (qudt/qudt-public-repo#285). But I can live with that.
However, I cannot live with having to use URL escapes in datatype URLs
And I think the medical community needs their "unit annotations", eg {rbc}

dr-shorthair · 2020-11-29T00:31:32Z

I understand your concern about URL escapes. The following 'reserved' characters may appear in UCUM codes:

* ' ( ) + / [ ]

Of these, [ ] are commonly used and can't be easily worked around.
' appear in the codes for minutes and seconds, and in some qualified units like [in_i'H2O].
Parentheses ( ) can be used to group codes 'under the solidus' /, both of which can be avoided by using dots and negative exponents. + is only necessary for some power-of-ten factors.

I don't believe { } are reserved.
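To see what percent-encoding does to such codes, here is a small Python check using the standard library's `urllib.parse.quote`. The `BASE` namespace and the `datatype_iri` helper are hypothetical; nobody in this thread has proposed these exact IRIs. Note that `quote` also escapes curly braces, since they are not allowed unescaped in URIs even though they are not in the "reserved" set.

```python
from urllib.parse import quote

BASE = "http://example.org/unit/"  # hypothetical datatype namespace


def datatype_iri(ucum_code):
    """Build a datatype IRI by percent-encoding the whole UCUM code."""
    return BASE + quote(ucum_code, safe="")


# UCUM codes taken from earlier comments in this thread.
CODES = ["km.h-1", "km/h", "(1000m)/(60min)", "mm[Hg]", "{rbc}"]
ENCODED = {code: quote(code, safe="") for code in CODES}
```

Only `km.h-1` survives unchanged; `km/h` becomes `km%2Fh`, and `(1000m)/(60min)` becomes `%281000m%29%2F%2860min%29`, which illustrates the readability concern.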

I think QUDT is a separate issue at this point. Yes, it may be useful as it provides an RDF-based model for describing units. But I would expect that it would be invoked through a call like
give me the QUDT description of the UOM with the UCUM symbol AAAaaaAAA
or similar. The UCUM symbol is the key.

HolgerKnublauch · 2020-11-29T01:25:51Z

Vladimir, I still don't have a strong opinion against string-encoding. And I do agree that this flexible string encoding has some advantages, because it is more open-ended than having URIs and, as you point out, URL escapes can be ugly.

I do wonder though whether those complex compound units are important enough and whether they should dictate how the rest of the solution should work. Arguably the vast majority of use cases will be covered by a static set of predictable and well-established URIs for the commonly used units. Much will be gained if there is at least a solution for those. As long as there is a generic machinery to get from a Unit URI to the base units, conversion factors etc, even a URI mechanism would cover the more unusual cases. If units are URIs then these resources can hold additional metadata to this effect.

maximelefrancois86 · 2020-12-14T14:32:50Z

Dear all,

The BIPM (Bureau International des Poids et des Mesures - the intergovernmental organization through which Member States act together on matters related to measurement science and measurement standards) is organizing an on-line workshop Feb. 22-26 2021: The International System of Units (SI) in FAIR digital data

https://www.bipm.org/en/conference-centre/bipm-workshops/digital-si/

See a Draft - Grand Vision: Transforming the International System of Units for a Digital World

I was invited to present there. I aim to summarize the different approaches that have been discussed in the W3C groups I was involved in, and other approaches I am aware of in the SemWeb community, with the identified pros/cons.

You are welcome to attend this workshop too, the pre-registration form is here: https://form.jotform.com/BIPM/Workshop-SI-2021

VladimirAlexiev · 2020-12-16T12:20:26Z

@HolgerKnublauch and @maximelefrancois86 and @dr-shorthair I think we need both belt and suspenders:

LINDT for the speed, convenience and infinite on-demand extensibility
QUDT for the metadata: descriptions, dimensionality, scientific disciplines, and cross-links to other ontologies

BTW I'm now dealing with IEC and eClass units

eg see IEC quantity "mass density" and its units: https://cdd.iec.ch/cdd/iec61360/cdddev.nsf/ListsOfUnitsAllVersions/0112-2---62720%23UAD106?opendocument
It's a huge list but has no dimensionality, conversion factors, etc.
QUDT has links to some of these
Would be an interesting exercise to see how much of this is covered by UCUM. Eg degree Baume (US light) sounds like density of crude oil: no way UCUM has that

dr-shorthair · 2020-12-19T07:20:56Z

QUDT has links to some of these

Note that QUDT is now quite responsive to requests and bug reports, information supplementation etc.
Log an issue here - https://github.com/qudt/qudt-public-repo/issues
Better still: fork and make a PR.

maximelefrancois86 · 2021-02-18T15:24:14Z

As a matter of fact, according to the UCUM specification at https://ucum.org/ucum.html:

From 2.1§3■1 UCUM atom characters are in the ASCII range 33-126, minus a few characters. The following UCUM atom characters are forbidden in IRIs: <>|^`\ or need to be escaped in IRI local names: ~!$&'*,;?#@%_

From 2.1§6■1 UCUM characters for annotation { } are forbidden characters for IRIs

From 2.1§7■1 characters for operators . / need to be escaped in IRI local names

So encoding UCUM units in datatype IRIs, one would end up:

forbidding UCUM unit annotations
escaping many characters in IRI local names


nichtich · 2024-08-01T08:42:03Z

I can assure you that real-world RDF data with units of measure comes in at least these forms (with varying namespaces and ontologies):

@prefix cdt: <https://w3id.org/cdt/> .
@prefix om: <http://www.ontology-of-units-of-measure.org/resource/om-2/> .
@prefix my: <http://example.org/my/> . # placeholder namespace, not given in the original comment

# 1. Custom plain string without any reference (most common, good luck)
_:x my:weight "10 KiloGram" .

# 2. Reference to a standard notation such as UCUM (better)
_:x my:weight "10 kg"^^cdt:ucum .

# 3. Value and data type from some standard vocabulary, e.g. OM (UCUM in RDF)
_:x my:weight "10"^^om:kilogram .

# 4. Measurement node with some custom or standard vocabulary
_:x my:weight [
  my:value 10 ;
  my:unit om:kilogram
] .

The last form has several variants. Here is an actual example from practice using the CIDOC CRM ontology (slightly simplified; the real data is even more complex!):

@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .

_:m a crm:E16_Measurement ;
  crm:P39_measured _:x ;
  crm:P40_observed_dimension [
    a crm:E54_Dimension ;
    crm:P90_has_value 2.8 ;
    crm:P91_has_unit [ # this would map to an existing unit URI such as om:centimetre
      a crm:E58_Measurement_Unit ;
      crm:P3_has_note "cm"
    ] ;
    crm:P2_has_type [ # this would need another vocabulary with a definition of "height"
      crm:P3_has_note "Höhe"
    ]
  ] .

To handle and clean up these ways of modelling data with units of measure, I'd stick to:

a standard to write down measures in string form and a corresponding RDF data type: UCUM and cdt:ucum look good!
URIs for units of measurement such as kg, cm...: some have already been proposed and people will not stop creating new URIs and their own ontologies and lists for units with their own use cases. Any approach to collect all units in one single ontology is futile.
An ontology to link units of measurement, e.g. to state that a unit my:RomanMile is 5000 times another unit my:RomanFeet. SPARQL does not need to know about actual units, just about how to process their conversion factors.
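
The last point can be sketched in Python: units are treated as opaque identifiers, and the processor only knows a table of "1 X = factor Y" links. The my:RomanMile / my:RomanFeet link and the factor 5000 come from the list above; the LINKS table layout and the conversion_factor function are hypothetical, and a real solution would read the links from RDF rather than from a dict.

```python
from collections import deque
from fractions import Fraction

# Each entry reads: 1 <first unit> = <factor> <second unit>.
LINKS = {
    ("my:RomanMile", "my:RomanFeet"): Fraction(5000),
}


def conversion_factor(src: str, dst: str) -> Fraction:
    """Return f such that 1 src = f dst, following links in either direction."""
    # Build an undirected graph; a reverse edge carries the reciprocal factor.
    edges = {}
    for (a, b), f in LINKS.items():
        edges.setdefault(a, []).append((b, f))
        edges.setdefault(b, []).append((a, 1 / f))

    # Breadth-first search, multiplying factors along the path.
    queue = deque([(src, Fraction(1))])
    seen = {src}
    while queue:
        unit, factor = queue.popleft()
        if unit == dst:
            return factor
        for nxt, f in edges.get(unit, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, factor * f))
    raise ValueError(f"no conversion path from {src} to {dst}")


print(conversion_factor("my:RomanMile", "my:RomanFeet"))         # 5000
print(2500 * conversion_factor("my:RomanFeet", "my:RomanMile"))  # 1/2
```

Nothing in the function knows what a mile or a foot is. Adding further links to the table lets it chain conversions across several units, and two units with no path between them are reported as not convertible.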

dr-shorthair · 2024-08-02T01:16:04Z

IMAO we should encourage use of pattern 3. as it provides the required information in the most usable form

Value and data type from some standard vocabulary, e.g. OM (UCUM in RDF)
_:x my:weight "10"^^om:kilogram .

Unlike patterns 1. and 2. this does not use a microformat in which a literal must be parsed and broken up into multiple items. Pattern 3. can be processed by un-modified and unsupplemented RDF libraries.

And unlike pattern 4. it does not bury a scalar inside a data structure.

Yes, pattern 3. hands off interpretation of the scale to another service, but all the proposed options appear to do that anyway.
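
For comparison, the parsing step that patterns 1. and 2. require can be sketched in a few lines of Python. This is not LINDT's actual parser: the function name split_quantity is hypothetical, the sketch assumes that value and unit are separated by whitespace, and it does not validate the unit against UCUM.

```python
from decimal import Decimal, InvalidOperation


def split_quantity(lexical: str) -> tuple[Decimal, str]:
    """Split 'VALUE UNIT' at the first run of whitespace (assumed separator)."""
    parts = lexical.strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"expected a value and a unit, got {lexical!r}")
    value, unit = parts
    try:
        return Decimal(value), unit
    except InvalidOperation as exc:
        raise ValueError(f"not a decimal number: {value!r}") from exc


print(split_quantity("10 kg"))         # (Decimal('10'), 'kg')
print(split_quantity("10 [lb_av]/h"))  # (Decimal('10'), '[lb_av]/h')
print(split_quantity("10 KiloGram"))   # (Decimal('10'), 'KiloGram')
```

The split itself is cheap. As the last call shows, the harder part for pattern 1. is that the unit string that comes out is not tied to any standard.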

kasei · 2024-08-03T04:13:24Z

IMAO we should encourage use of pattern 3. as it provides the required information in the most usable form

I think "most usable" is going to be use-case dependent here. The CRM modeling is the way it is for reasons important to cultural heritage use-cases. The very verbose modeling here stems mostly from using an upper ontology that can be used to address diverse use-cases (e.g. the units and/or type of value such as "weight of 10kg" are not fixed or prescribed by the ontology), and allows metadata to be added to almost any part of the data (e.g. provenance data that preserves the exact lexical form of the value that might differ from a normalized numeric value; or adding a citation to exactly where a dimension value came from). FWIW, RDF 1.2 (RDF-star) may provide some new options to address these modeling needs.

Additionally, the CRM modeling has the advantage that it actually uses numeric values that will sort naturally in SPARQL (and use optimized storage and retrieval in many systems) without any runtime casting or conversion. Encouraging best practices can be good, but to maintain these benefits you'd have to go beyond best practices and ensure LINDT datatypes were officially supported by SPARQL and underlying stores.

TallTed · 2024-08-05T18:26:28Z

Additionally, the CRM modeling has the advantage that it actually uses numeric values that will sort naturally in SPARQL (and use optimized storage and retrieval in many systems) without any runtime casting or conversion.

Of course, numeric values for mass of 1 kg, mass of 0.997 kg, and mass of 999 g, all of which are valid, will not sort as desired, unless all mass values are converted (or forced) to kg or g.
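
A small Python sketch of this point, using the three masses above. The TO_KG table and the variable names are illustrative only; the factor 0.001 is the exact SI relation between gram and kilogram.

```python
from decimal import Decimal

# Conversion factors to kilograms.
TO_KG = {"kg": Decimal("1"), "g": Decimal("0.001")}

measurements = [
    (Decimal("1"), "kg"),
    (Decimal("0.997"), "kg"),
    (Decimal("999"), "g"),
]

# Sorting on the bare number ignores the unit and puts 999 g last.
by_number = sorted(measurements, key=lambda m: m[0])

# Sorting on the value normalized to kg gives the physical order.
by_mass = sorted(measurements, key=lambda m: m[0] * TO_KG[m[1]])

print(by_number)  # 0.997 kg, 1 kg, 999 g
print(by_mass)    # 0.997 kg, 999 g, 1 kg
```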

(quibble: 10kg is a measure of mass, not weight, and is the same for the same object whether it's measured on Earth or the Moon. 10lbs is a measure of weight, not mass, and differs for the same object depending on whether it's measured on Earth or the Moon.)

ericprud · 2024-08-05T18:54:41Z

UCUM defines a countably infinite list of units. Any RDF approach is necessarily finite.

I don't know that it does have to be finite. What happens if we take UCUM verbatim and simply accept that there can be an infinite expression of datatypes, just as there can be an infinite expression of values that they describe?

As a thought experiment, a more self-describing "datatype namespace" could define something like (borrowing from @TallTed's quibble):

# for some reason lbf is tied to Avoirdupois. whatever
"10"^^kind_n_type:massXdistanceYtimeYtime_lbf-av

kasei · 2024-08-05T18:56:47Z

Of course, numeric values for mass of 1 kg, mass of 0.997 kg, and mass of 999 g, all of which are valid, will not sort as desired, unless all mass values are converted (or forced) to kg or g.

Right. In the CIDOC case, you'd likely be restricting the query to a specific unit in the graph pattern, or be casting values with arbitrary units to a known unit via a SPARQL extension function (or client-side, which has its own set of challenges). I think that's somewhat orthogonal to the storage-level advantages of having real numeric types, but again this might be use-case dependent. FWIW, I think the Wikidata modeling has some similarities here, in that you can restrict to known units in the graph pattern by using the psn predicates for normalized values, and then on to a real quantityAmount numeric value.

nichtich · 2024-08-06T05:28:26Z

@kasei thanks for mentioning Wikidata. Its model of units of measure is documented with SPARQL queries here. The list of supported quantities is configured in a table, but this table could be given in RDF with a (hopefully simpler) subset of the QUDT Units Vocabulary.

ashleysommer mentioned this issue Nov 5, 2020

Add support in RDFLib for UCUM and LINDT for arithmetical operations on units of measurement RDFLib/rdflib#1198

Closed

JervenBolleman mentioned this issue Nov 6, 2020

Easier addition of support for custom datatypes to SPARQL endpoints #130

Open

VladimirAlexiev mentioned this issue Dec 18, 2020

comparable units should be reduced on division unitsofmeasurement/uom-systems#182

Closed

This was referenced Feb 2, 2021

let's use github discussions #136

Closed

comparable units should be reduced on division FHIR/Ucum-java#19

Open

dr-shorthair mentioned this issue Mar 2, 2021

What is hasSpatialResolution range opengeospatial/ogc-geosparql#98

Open

VladimirAlexiev mentioned this issue Mar 31, 2021

Could QUDT vocab entries be generated dynamically? qudt/qudt-public-repo#123

Open

Fak3 mentioned this issue Jul 16, 2021

Fixed units for weight, length, temperature and duration edi3/edi3-json-ld-ndr#14

Open

VladimirAlexiev mentioned this issue Sep 28, 2022

YAML-LD datatypes (and tags for datatypes) json-ld/yaml-ld#17

Open

VladimirAlexiev mentioned this issue Mar 8, 2024

enable the use of custom datatypes (for RDF) admin-shell-io/aas-specs#380

Open

VladimirAlexiev mentioned this issue Sep 18, 2024

Consider LINDT for adding Units of Measure numerateweb/numerateweb-rdf4j#1

Open

nichtich mentioned this issue Oct 22, 2024

E58 Measurement Unit nfdi4objects/crm-rdf-ap#3

Open
