Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add enum checking for attributes #557

Closed
wants to merge 10 commits into from
Closed

Conversation

rettigl
Copy link
Collaborator

@rettigl rettigl commented Feb 18, 2025

Possible fix for #556

@rettigl
Copy link
Collaborator Author

rettigl commented Feb 18, 2025

@sherjeelshabih please use this as a starting point.

@rettigl
Copy link
Collaborator Author

rettigl commented Feb 18, 2025

@lukaspie Please check the warnings for NXxps, they are probably relevant, similar as for NXmpes_arpes.

@lukaspie
Copy link
Collaborator

@rettigl thanks for the pointer. Indeed, the @vector attributes in NXxps are also wrongly assigned as NX_CHAR. I get warnings like WARNING: The value at /ENTRY[Survey]/SAMPLE[sample]/transformations/sample_normal_tilt_azimuth_angle/@vector should be on of the following strings: ['[0, 0, -1]'], whichmakes sense.

I started making them NX_NUMBER in my refactoring branch: FAIRmat-NFDI/nexus_definitions#329. I will test again if after updating the definitions in pynxtools, I also get errors for these attributes. If I understand #555 correctly, we should continue getting errors.

@sherjeelshabih
Copy link
Collaborator

I added a small check for discussion. I now add list types (based on our convention) as enumeration objects in the validation tree. I tested with passing a list [0, 0, 1] and that works as expected now. If there are more examples, I can have a look.

@lukaspie
Copy link
Collaborator

lukaspie commented Feb 24, 2025

@sherjeelshabih Thanks for the update!

With this branch now, I got new errors because I was using values for enumerated attributes that were not in the list. I updated pynxtools-xps, so now the tests are passing again.

However, I was expecting that I still get a similar warning as before (see above) since the thee @vector attributes in NXxps are still wrongly defined as NX_CHAR and I write a list of ints like '[0, 0, -1]' into the file. The idea for such lists would be that each individual item in the list is checked against the datatypes (which should lead to an error in my case) AND that the whole list is checked against the complete list in the enum. Is that done here already or is this planned in a separate PR? Does this come with #554?

@rettigl
Copy link
Collaborator Author

rettigl commented Feb 24, 2025

However, I was expecting that I still get a similar warning as before (see above) since the thee @vector attributes in NXxps are still wrongly defined as NX_CHAR and I write a list of ints like '[0, 0, -1]' into the file. The idea for such lists would be that each individual item in the list is checked against the datatypes (which should lead to an error in my case) AND that the whole list is checked against the complete list in the enum. Is that done here already or is this planned in a separate PR? Does this come with #554?

I checked the combination of this PR and #554. This seems to work more or less. The vectors are correctly checked for the enums, and a warning is given if the dtype is wrong (ints passed for NX_CHAR).

However, the checking of list of strings seems to not work anymore now, as example the axes attribute in data. I pass a tuple with ("angular0", "angular1", "energy"), and it produces these errors:

WARNING: The value at /ENTRY[entry]/data/@axes should be one of: (<class 'str'>, <class 'numpy.str_'>, <class 'numpy.bytes_'>), as defined in the NXDL as NX_CHAR.
WARNING: Field /ENTRY[entry]/data/@axes written without documentation.

@RubelMozumder This might be due to that tuples are not considered in #554, and also maybe not here?

Copy link
Collaborator Author

@rettigl rettigl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few issues with this after the last commit, see comments. For me, the combination of this branch before the last commit, and #554 seems to work fine.

Comment on lines 659 to 661
if isinstance(value, list):
return all(is_valid_data_field(v, nxdl_type, path) for v in value), output_value

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This produces one warning per element in a list, if the datatype is not correct. @RubelMozumder had a different solution for this in #554.

"ISO8601": (str),
"NX_BINARY": (bytes, bytearray, np.byte, np.ubyte, np.ndarray),
"NX_BOOLEAN": (bool, np.ndarray, np.bool_),
"NX_CHAR": (str, np.ndarray, np.chararray),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still allows to write numeric arrays to an NX_CHAR, which is not correct. This issue is also solved by #554

Comment on lines -703 to -713
try:
if accepted_types[0] is bool and isinstance(value, str):
value = convert_str_to_bool_safe(value)
if value is None:
raise ValueError
output_value = accepted_types[0](value)
except ValueError:
collector.collect_and_log(
path, ValidationProblem.InvalidType, accepted_types, nxdl_type
)
return False, value
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep the boolean string to bool conversion, this certainly makes sense and We could even think about doing this for all datatypes with safe literal conversion, as you did for the enums.

@rettigl
Copy link
Collaborator Author

rettigl commented Feb 26, 2025

@sanbrock This here adds lists as possible elements of enumerations for the dataconverter. However, read_nexus apparently still does not support this. In the debug log, I don't get any warning or so about a wrong enum:

DEBUG: ===== ATTRS (//entry/sample/transformations/sample_azimuth@vector)
DEBUG: value: [0 0 2] 
DEBUG: classpath: ['NXentry', 'NXsample', 'NXtransformations', 'NX_NUMBER']
DEBUG: classes:
NXmpes_arpes.nxdl.xml:/ENTRY/SAMPLE/transformations/sample_azimuth
NXtransformations.nxdl.xml:/AXISNAME
DEBUG: NXmpes_arpes.nxdl.xml:/ENTRY/SAMPLE/transformations/sample_azimuth@vector - [NX_NUMBER]
DEBUG: <<REQUIRED>>
DEBUG: enumeration (NXmpes_arpes.nxdl.xml:/ENTRY/SAMPLE/transformations/sample_azimuth/vector):
DEBUG: -> [0, 0, 1]

Nomad, on the other hand, reports a warning when parsing the file:


WARNING  nomad.processing     2025-02-26T16:09:43 error while setting attribute sample_tilt___vector in pynxtools.nomad.schema.Mpes_arpes__ENTRY__SAMPLE__transformations:Section as pynxtools.nomad.schema.Mpes_arpes__ENTRY__SAMPLE__transformations.sample_tilt___vector:Quantity
  - exception: Traceback (most recent call last):
      File "/home/femtolab_admin/nomad/nomad-distro-dev/packages/pynxtools/src/pynxtools/nomad/parser.py", line 245, in _populate_data
        current.m_set(metainfo_def, attribute)
      File "/home/femtolab_admin/nomad/nomad-distro-dev/packages/nomad-FAIR/nomad/metainfo/metainfo.py", line 1251, in m_set
        return definition.__set__(self, value, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/femtolab_admin/nomad/nomad-distro-dev/packages/nomad-FAIR/nomad/metainfo/metainfo.py", line 720, in wrapper
        return method(self, obj, value, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/femtolab_admin/nomad/nomad-distro-dev/packages/nomad-FAIR/nomad/metainfo/metainfo.py", line 3319, in __set__
        m_quantity.value = self.type.normalize(
                           ^^^^^^^^^^^^^^^^^^^^
      File "/home/femtolab_admin/nomad/nomad-distro-dev/packages/nomad-FAIR/nomad/metainfo/data_type.py", line 787, in normalize
        return self._normalize_impl(value, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/femtolab_admin/nomad/nomad-distro-dev/packages/nomad-FAIR/nomad/metainfo/data_type.py", line 1072, in _normalize_impl
        return self._validate_value(value)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/femtolab_admin/nomad/nomad-distro-dev/packages/nomad-FAIR/nomad/metainfo/data_type.py", line 1057, in _validate_value
        raise ValueError(f'{value} is not a value of this enumeration.')
    ValueError: [1 0 0] is not a value of this enumeration.
  - exception_hash: NT_2ZeFxcIb-dn2KEP5pzWvm0WKG
  - nomad.commit:
  - nomad.deployment: oasis
  - nomad.entry_id: THOgExlK-anJLuKyUpwvjeM4JdlY
  - nomad.mainfile: Scan1496.nxs
  - nomad.processing.logger: nomad.processing
  - nomad.processing.parser: pynxtools.nomad.entrypoints:nexus_parser
  - nomad.processing.proc: Entry
  - nomad.processing.process: process_entry
  - nomad.processing.process_status: RUNNING
  - nomad.processing.process_worker_id: iBGbnDdVRq2UUr2BfJj26w
  - nomad.processing.step: pynxtools.nomad.entrypoints:nexus_parser
  - nomad.processing.target_name: vector
  - nomad.service: worker
  - nomad.upload_id: cpbp-fGeRVq7NfIbM2rMVA
  - nomad.version: 1.3.14rc2
  - taskName: None

Do you know where to fix this?

@rettigl rettigl mentioned this pull request Mar 4, 2025
@rettigl
Copy link
Collaborator Author

rettigl commented Mar 5, 2025

Closed in favor of #554

@rettigl rettigl closed this Mar 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants