filter away empty fields/subfields after input #165
Comments
... in other words, do we want to support MARC21 records containing "empty fields" such as:
and "empty subfields" such as:
or do we want to always remove these empty fields/subfields? CC @aw-bib @martinkoehler @fjorba @jma @basaglia CC @inveniosoftware/triagers
Just cross-checked with our librarians to be sure not to miss esoteric cases:
As for TIND's comment: our librarians confirmed that e.g. Aleph allows loading empty fields/subfields on ingestion of external data. (I.e.
OK. Given the above and:
Then we should have a specific
Such as the general one we are using in INSPIRE? https://github.com/inspirehep/inspire-next/blob/master/inspirehep/dojson/utils/__init__.py#L245
Yes, I think we can close this RFC by saying that empty values in fields/subfields should be "tolerated" on the input upload side, but that we can delete them internally as soon as we spot them.
Problem
Currently, utils.filter_values() filters away keys and their corresponding values from dictionaries where value is None. Concretely, e.g. in the context of MARC21 conversion to JSON, this means that subfields with empty strings would be preserved, and datafields with no subfields would be preserved as well.
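To make the problem concrete, here is a minimal sketch of the behaviour described above. The function drop_none_values and the sample dictionary are hypothetical stand-ins written for illustration; this is not the actual code of utils.filter_values().

```python
def drop_none_values(mapping):
    """Hypothetical stand-in: drop only the keys whose value is None."""
    return {key: value for key, value in mapping.items() if value is not None}


# Hypothetical converted datafield: "a" is an empty string, "b" is None,
# "c" is an empty nested structure, "d" carries real content.
field = {"a": "", "b": None, "c": {}, "d": "Some title"}

filtered = drop_none_values(field)
print(filtered)
```

Only the None entry disappears in this sketch; the empty string and the empty dictionary survive, which is the situation this issue is about.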
Proposal
If we assume that an empty string in the bibliographic metadata context doesn't carry any valuable information, it is proposed that filter_values actually filters away any key whose value evaluates to False, with the exception of the False value itself (thus representing a flag set to false) or the number 0.
Use cases
According to TIND, @Kennethhole reports:
Related to INSPIRE, I can confirm that we have no use for empty values. Internally we went further and implemented a function that recursively visits the whole record and also strips away the empty lists and empty dicts that result from having filtered values.
https://github.com/inspirehep/inspire-next/blob/master/inspirehep/dojson/utils/__init__.py#L206
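As a rough illustration of both ideas discussed in this issue (filtering empty values while keeping False and 0, and recursively removing the empty lists and dicts left behind), here is a minimal sketch. The names is_empty, filter_empty_values and strip_empty are hypothetical; this is not the INSPIRE implementation linked above, and it assumes records are built only from plain dicts, lists and scalars.

```python
def is_empty(value):
    """Hypothetical rule: None, "", [] and {} are empty; booleans and numbers never are."""
    if isinstance(value, (bool, int, float)):
        # Keep False (a flag set to false) and 0 (a meaningful number).
        return False
    return not value


def filter_empty_values(mapping):
    """Flat variant: drop top-level keys whose value is empty."""
    return {key: value for key, value in mapping.items() if not is_empty(value)}


def strip_empty(data):
    """Recursive variant: clean children first, then drop whatever became empty."""
    if isinstance(data, dict):
        cleaned = {key: strip_empty(value) for key, value in data.items()}
        return {key: value for key, value in cleaned.items() if not is_empty(value)}
    if isinstance(data, list):
        cleaned = [strip_empty(item) for item in data]
        return [item for item in cleaned if not is_empty(item)]
    return data


# Hypothetical record, made up for this example.
record = {
    "title": {"a": "Title", "b": ""},
    "notes": [{"a": ""}, {"a": None}],
    "peer_reviewed": False,
    "pages": 0,
    "abstract": "",
}

flat = filter_empty_values(record)
deep = strip_empty(record)
print(deep)
```

The flat variant only removes the top-level empty string, while the recursive variant also removes the empty subfield inside "title" and the whole "notes" list once its items have become empty. In both cases False and 0 are kept.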
See also: