Metadata utilities #473
Comments
Do you have in mind that the classes you suggest would also be used as domain types for workflows written with Sciline? Or are they generally not specific enough?
They could be used as domain types. That would make it easy for the output providers to request them. (Possibly from params, because some metadata cannot be extracted from the input, e.g., Author.)
I like the idea, and we need more brainstorming.
Do you mean 'load with the data'? We already do this to an extent with NeXus. We will likely have metadata providers for Sciline that extract metadata from the input (NeXus, SciCat).
Yes, that is the idea! Basically, these types would be a sort of intermediate representation that can be constructed from different sources (SciCat, NeXus, user input, ...) and can be converted to specific representations (SciCat, ORSO, CIF, ...). So we can mix and match parsers and representers without implementing each combination separately.
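To make the mix-and-match idea concrete, here is a minimal sketch using only standard-library dataclasses (not the Pydantic models proposed in the issue). Everything in it is an assumption for illustration: the `Person` class, its fields, the function names, and the dictionary keys and output tag are placeholders and do not reflect the actual SciCat, ORSO, or CIF schemas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical intermediate representation of a person."""

    name: str
    email: Optional[str] = None


# Parsers: one per source, each producing the intermediate representation.
def person_from_user_input(name: str, email: Optional[str] = None) -> Person:
    return Person(name=name, email=email)


def person_from_scicat_like(record: dict) -> Person:
    # 'owner' and 'contactEmail' are placeholder keys, not a schema claim.
    return Person(name=record["owner"], email=record.get("contactEmail"))


# Representers: one per target format, each consuming the intermediate
# representation. The output layouts are placeholders as well.
def person_to_orso_like(person: Person) -> dict:
    return {"name": person.name, "contact": person.email}


def person_to_cif_like(person: Person) -> str:
    return f"_audit_author_name '{person.name}'"


# Any parser can be combined with any representer.
author = person_from_scicat_like({"owner": "Jane Doe"})
orso_entry = person_to_orso_like(author)
cif_line = person_to_cif_like(author)
```

With N sources and M target formats, this layout needs N parsers and M representers instead of N × M direct converters.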
Judging by scipp/essreflectometry#27 (comment), we may also want some general input file tracking tools. Given that the sources for input data can vary (local file, pooch file, SciCat), it makes sense to have a way of getting common info such as
This probably factors into selecting inputs as well. See, e.g., scipp/essreflectometry#25
Made an overview of metadata in the file formats we use: Common.Metadata.Schemas.md
Overview
We need to handle metadata in a few places, and some of that metadata is common to many techniques and workflows. In particular, we will likely have SciCat metadata in many workflows. But there are also technique-specific file formats that encode metadata, e.g., NXCanSAS, ORSO, CIF. Each uses different names and layouts even though they have many fields in common.
To provide a common ground and unify metadata handling (at least to some extent), I propose adding common classes and maybe functions that
ScippNeutron seems to me like a good place to put these utilities, as it is used by all ess* packages that implement the actual workflows. We could of course also use a separate package, but that seems like overkill.
Suggested implementation
We can use Pydantic models to encode and validate metadata. For example, the following are commonly used:
And maybe
The Beamline model could of course be extended to include technical info about the beamline / instrument such as flight path length, choppers, sample holder, etc. But the fields above are a minimum set of strictly metadata that should be common to all large scale facilities.
Example