-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the nx_char type for numpy to and . #554
base: master
Are you sure you want to change the base?
Conversation
What is this solving? The linked issue is about integers, this is about strings/bytes. Why don't we allow chararrays anymore? |
Please see #555 and the discussed way for solving this. We should remove arrays generally, and check the array dtype instead. |
chararry is a data structure like |
Can you remember which issue was handled in the modification https://github.com/FAIRmat-NFDI/pynxtools/blame/nx_char_type/src/pynxtools/dataconverter/helpers.py#L729 ? Then I can also test my code for that issue as well. |
This change was addressing the following issue: #393 The example from this issue should do: nexus file with one of the datasets with type NX_FLOAT, filled with integer = 0 (such as mpes example upload file, with sp.binned.attrs["metadata"]["energy_calibration"]["tof"] = 0 instead of 0.0). The is_valid_data_field() function in helpers.py successfully converted it to float(0.0), but this result was later interpreted as False (as in, failure to convert) by whatever was calling is_valid_data_field(). I can send you the files I used for testing (to big to be attached here). |
Thanks, I see it is not from your code. Can you help me to reproduce this error? |
In the version of pynxtools that we had back then, I used dataconverter in the following way:
The first one worked fine, no errors; the second gave something like "Field /ENTRY[entry]/PROCESS_MPES[process]/energy_calibration/original_axis written without documentation." (aside from usual messages). The resulting file was missing corresponding field (or its value, I am not sure anymore) (output.nxs/entry/process/energy_calibration/original_axis) The issue was present before this PR was merged: |
Currently, verification of NX data type does not convert the data to other data type anymore rather pop up an warning for any data type inconsistency. For example data Such conversion creates issues in some cases such as for numbers |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the fix. Left some small comments.
Let's wait with the merge until we have the correct definitions here and the plugin tests pass.
# Not to be confused with `np.byte` and `np.ubyte`, these store | ||
# and integer of `8bit` and `unsigned 8bit` respectively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Not to be confused with `np.byte` and `np.ubyte`, these store | |
# and integer of `8bit` and `unsigned 8bit` respectively. | |
# Not to be confused with `np.byte` and `np.ubyte`, these store | |
# integers of `8bit` and `unsigned 8bit` respectively. |
What exactly is not to be confused? Maybe write "np.xxx is not to be confused..."
return False | ||
def check_all_children_for_callable( | ||
objects: Union[list, np.ndarray], | ||
checker: Optional[Callable] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to rename this to callable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or test_function
This function also converts bool value comes in str format. In case, it fails to | ||
convert, it raises an Exception. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function also converts bool value comes in str format. In case, it fails to | |
convert, it raises an Exception. | |
This function also converts boolean value that are given as strings (i.e., "True" to True). |
It doesn't really raise an Exception, but just a ValidationProblem.InvalidDatetime
warning. What were you trying to say here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see the InvalidDateTime warning anywhere.
collector.collect_and_log( | ||
path, ValidationProblem.InvalidType, accepted_types, nxdl_type | ||
) | ||
return False, value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This return value False here is interpreted as "undocumented field" in the calling function. So we get an additional wrong warning if the dtype does not match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that it is just for internal logic (please see line 750-766 in validation.py
). In is_docmented
it checks the datatype, node type (is field or group). But the real type of the errors are collect and print according to the validation error type in and from collect
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you are trying to say here. My point is that is_documented returns False if the datatype does not match, which I would say is wrong, because it is documented, but just has the wrong data type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior was btw. introduced in #522, I believe. Maybe @GinzburgLev or @lukaspie can comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to explain the code a bit, to clarify that in that code, no different type validation errors are collected, as you are assuming undocumented field
warning will pop up.
By the way, I removed the second value from return statement as it is not being used anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #565
Returns two values: | ||
boolean (True if the the value corresponds to nxdl_type, False otherwise) | ||
converted_value bool value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add typing annotation. The calling function expects a single bool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return handling is correct, nevertheless I suggest to add typing everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc is updated. You might be talking about the return is_valid_data_field(mapping[key], node.dtype, key)[0]
in `validation.py. It returns a single first value. Extra variable is added.
tmp_arr = None | ||
if isinstance(objects, list): | ||
# Handles list and list of list | ||
tmp_arr = np.array(objects) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work of the dtypes within a list are not the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will work. Because, numpy will check if all the elments have the same size of bit atleast converable to a common datatype, Otherwise it fails. If there is heterogeneous datatype (like string, nemeric) numpy will choose unicode U
/ python object O
or something like that.
@RubelMozumder @sherjeelshabih I suggest to meet and discuss how to handle this PR and #557 . It does not make sense to independently develop different solutions for the same problem. |
Agreed. I already plan to merge what Rubel has here with some changes I want to keep in the other branch. |
is_documented_flag = is_valid_data_field(mapping[key], node.dtype, key) | ||
return is_documented_flag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would still tread a field with wrong datatype as undocumented. Is that really what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I am confused what is exact suggestion here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make a suggestions in another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are getting two warnings (as you mentioned above), one is for ... without documentation
(which is wrong) and another one is related NX data type (expected).
But I see different behavior in our test warning,
tests/dataconverter/test_helpers.py:714: AssertionError
----------------------------- Captured stderr call -----------------------------
WARNING: The value at /ENTRY[my_entry]/NXODD_name[nxodd_name]/int_value should be one of: (<class 'int'>, <class 'numpy.integer'>), as defined in the NXDL as NX_INT.
------------------------------ Captured log call -------------------------------
WARNING pynxtools:helpers.py:98 WARNING: The value at /ENTRY[my_entry]/NXODD_name[nxodd_name]/int_value should be one of: (<class 'int'>, <class 'numpy.integer'>), as defined in the NXDL as NX_INT.
There is no ... without documentation warning
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though @axes
is defined in NXmpes_arpes.nxdl.xml
(https://github.com/FAIRmat-NFDI/nexus_definitions/blob/fairmat/contributed_definitions/NXmpes_arpes.nxdl.xml#L344), but the validation check can not pick that attribute from app def. So, it raise the without documentation warning
.
This is not an issue with is_documented
function but might be an issue with algorithm how it resolves the nexus tree and node.
You can continue to comment in this PR or another PR. I just did a small research how the undocumention
is coming only for a few quantities e.g. axes
for others where we have false datatype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will merge these to #565 to have a working version there
…e with documented keys
Fix for checking of undocumented keys
Add more tests
@lukaspie, you can also check this issue with here.
Fix NX data type validation errors:
true
->True).