Could QUDT vocab entries be generated dynamically? #123

Open
dr-shorthair opened this issue May 18, 2020 · 9 comments

Comments

@dr-shorthair
Contributor

I see a lot of careful work going on to clean up the current cache of QUDT individuals, particularly in the /vocab/unit/ tree. This is good and enhances the credibility of the QUDT service. However, it is ultimately a never-ending task, as the complete set of individual units of measure is essentially infinite when you consider all the potential combinations of all the terminals and their variants in the different systems. Perhaps another approach is warranted: generate new individuals algorithmically.

For example, a query could specify the dimension and system of units, or the UCUM symbol*, and the QUDT service could then return the QUDT representation and a URI for it. This might come from a static cache (which is what you are currently constructing), but if it is not found there it could be built on the fly.

*I mention the UCUM symbol since for the main tree of derived units, the UCUM symbol is both unique and its structure actually defines the production of the uom.
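
A minimal sketch of that lookup-then-generate flow, under loud assumptions: the `example.org` base URI, the `Unit` record, the URI naming rule, and the tiny UCUM subset handled here (terms joined by `.` or `/`, each with an optional integer exponent) are all hypothetical and are not how QUDT or UCUM tooling actually works.

```python
import re
from dataclasses import dataclass

# Hypothetical base URI, for illustration only.
BASE = "http://example.org/vocab/unit/"

@dataclass(frozen=True)
class Unit:
    uri: str
    ucum: str
    generated: bool  # False for curated entries, True for on-the-fly ones

# Stand-in for the curated static vocabulary, keyed by UCUM symbol.
STATIC_CACHE = {
    "m": Unit(BASE + "M", "m", False),
    "s": Unit(BASE + "SEC", "s", False),
}

TERM = re.compile(r"^([A-Za-z]+)(-?\d+)?$")

def parse_ucum(symbol):
    """Parse a tiny UCUM subset into (terminal, exponent) pairs."""
    terms = []
    sign = 1
    for token in re.split(r"([./])", symbol):
        if token == ".":
            sign = 1       # next term is multiplied
        elif token == "/":
            sign = -1      # next term is divided
        else:
            match = TERM.match(token)
            if match is None:
                raise ValueError(f"unsupported UCUM term: {token!r}")
            terms.append((match.group(1), sign * int(match.group(2) or 1)))
    return terms

def resolve(symbol, cache=STATIC_CACHE):
    """Return the cached unit, or build one deterministically and cache it."""
    if symbol in cache:
        return cache[symbol]
    # Sorting the terms makes equivalent spellings map to the same URI.
    terms = sorted(parse_ucum(symbol))
    local_name = "_".join(f"{name}{exp}" for name, exp in terms)
    unit = Unit(BASE + local_name, symbol, True)
    cache[symbol] = unit
    return unit
```

With this sketch, `resolve("m")` returns the curated entry, while `resolve("m/s2")` and `resolve("s-2.m")` both produce a generated unit with the same URI, because the name is derived only from the parsed terms.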

@stuchalk what do you think?

@steveraysteveray
Collaborator

Simon,
Could you expand on your suggestion? Are you:
a) Suggesting using an algorithmic way to augment our statically coded vocabularies, or
b) Suggesting that we don't worry about maintaining a static vocabulary, and just create instances as needed?

Option a) would be nice, although we have found that there is a lot of judgment needed when creating new entries, not to mention properties like conversion multipliers, choices to be made when handling dimensionless units, and more.
Option b) sounds risky to me, for all the above reasons, plus the risk of things changing over time where one would not be as certain about what the URI was at some earlier time.

We (with a collaborator) took a run at a) for the IEC units, which is why our unit count popped up from <1000 to >1500 in the past couple of months.

@dr-shorthair
Contributor Author

I suggest that the static vocabulary should be considered a cache, but that an API could generate new units algorithmically. The UCUM website has some Java code to do this; see https://unitsofmeasure.org/trac#ImplementationSupport
Now you have explained the IEC project it is clear that you are also considering this approach.
I'm certainly not proposing to discard the existing vocabulary. But routine new units, which are merely combinations of the existing terminals, can be done, as demonstrated in UCUM.

I've grabbed the main UCUM materials and stashed them away in the /community/ branch, just in case. See #124

@steveraysteveray
Collaborator

The idea of generating routine combinations from existing terminals sounds intriguing. An obvious example is the generation of prefixed units (Giga, Mega, etc.). Is there a specific project you know of in the ImplementationSupport link you provided?
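
As a rough illustration of the prefixed-unit case, here is a sketch that derives prefixed variants of a base unit together with a conversion multiplier. The prefix table is a small subset of the SI prefixes, and the dictionary keys (`label`, `symbol`, `conversionMultiplier`) and the function name are assumptions chosen for illustration, not QUDT's actual generation code.

```python
from decimal import Decimal

# Small subset of SI prefixes: name -> (symbol, factor).
SI_PREFIXES = {
    "Giga": ("G", Decimal("1e9")),
    "Mega": ("M", Decimal("1e6")),
    "Kilo": ("k", Decimal("1e3")),
    "Milli": ("m", Decimal("1e-3")),
}

def prefixed_units(base_label, base_symbol, base_multiplier=Decimal(1)):
    """Yield one record per prefix, scaled from the base unit's multiplier."""
    for name, (symbol, factor) in SI_PREFIXES.items():
        yield {
            "label": name + base_label.lower(),
            "symbol": symbol + base_symbol,
            # Decimal avoids binary floating-point rounding in the multiplier.
            "conversionMultiplier": factor * base_multiplier,
        }

watt_variants = list(prefixed_units("Watt", "W"))
```

This covers only the mechanical part; judgment calls such as which prefixed units are worth publishing would still sit outside the function.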

@stuchalk
Contributor

I think there is definitely a need to have units developed on the fly. I am going to work on building some prototype stuff for the CIPM Digital-SI and will keep this in mind as a very general (but important) use case.

@dr-shorthair
Contributor Author

Looks like there is active code development going on here: https://github.com/lhncbc/ucum-lhc

And this service provides an insight into what the required functionality should be: https://ucum.nlm.nih.gov/ucum-service.html

There is an older project here: https://code.google.com/archive/p/unitsofmeasure/source/default/source

@steveraysteveray
Collaborator

Thanks. The NIH work could be very useful for some automated validation of QUDT... I'm putting this issue on our project page.

@VladimirAlexiev
Contributor

@dr-shorthair +1, an excellent idea, though it initially sounded a bit AI-ish to me.
That will add to QUDT a major benefit of UCUM/LINDT: infinite on-demand extensibility (w3c/sparql-dev#129 (comment))

@steveraysteveray could you elaborate on your IEC project? E.g. https://cdd.iec.ch/cdd/iec61360/cdddev.nsf/ListsOfUnitsAllVersions/0112-2---62720%23UAD106?opendocument is the IEC quantity "mass density" and its units. It's a huge list but has no dimensionality, conversion factors, etc. Did you use some internal source that's not exposed on the web?

@steveraysteveray
Collaborator

I believe we used publicly available sources for the IEC61360 codes. I will defer to @jhodgesatmb for the specifics of what was done, as he reviewed the work of our collaborator.

@jhodgesatmb
Collaborator

jhodgesatmb commented Mar 31, 2021 via email


5 participants