-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metadata correction #2177
base: master
Are you sure you want to change the base?
Metadata correction #2177
Conversation
Provide solution to correct the corruption of `Affiliation` JSON objects documented in dandi/dandi-schema#276
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple of questions to start off with-- I'll have another look tomorrow.
ver.save() # TODO: should I use `save()` in this case? It has some custom logic. | ||
|
||
for ver, _ in changes: | ||
write_manifest_files.delay(ver.id) # TODO: Please check if this is needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if any of the file writes fail or time out?
dandi-archive/dandiapi/api/tasks/__init__.py
Lines 58 to 62 in 7728016
write_dandiset_yaml(version) | |
write_assets_yaml(version) | |
write_dandiset_jsonld(version) | |
write_assets_jsonld(version) | |
write_collection_jsonld(version) |
Maybe we should at least keep the AsyncResults and monitor that they go as planned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't know if I need to call write_manifest_files.delay(ver.id)
at all. I was trying to mimic
dandi-archive/dandiapi/api/management/commands/migrate_published_version_metadata.py
Line 52 in 7728016
write_manifest_files.delay(version.id) |
write_manifest_files()
at all.
Maybe we should at least keep the AsyncResults and monitor that they go as planned?
I think the answer is related to how important that statement gets executed successfully. In the example above, its execution status is not checked. If a success execution is important, I guess I can't interact with the DB the current way.
@dandi/dandiarchive
Note: This function corrects the corruptions described in | ||
https://github.com/dandi/dandi-schema/issues/276 | ||
""" | ||
unwanted_fields = ['contactPoint', 'includeInCitation', 'roleName'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the original intent of the change that led to this problem was to remove all fields but the allowed ones-- would it be beneficial to also have a list of allowed_fields to confirm that they are the only ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In concept, you are right. I considered do that as well. However, I want to be careful for now. I want to remove only the ones that I know are causing problems. I don't want to remove anything that we haven't analyzed previously.
This PR provides a solution to correct the corruptions in metadata documented in dandi/dandi-schema#276.
The solution is implemented in two parts.
correct_metadata
, to correct dandiset metadata.Extensive tests are provided for the helper function and its supporting function. However, because I am not familiar with this repo and Django in general, I am not able to provide tests for the command which interacts with the database. Advice and additional tests are very much appreciated.
The command can be run on a targeted dandiset at a particular version and run on all versions of all dandisets. Running on all dandiset versions will only correct the corrupted dandisets. If running only on targeted dandiset version is preferable, please let me know, and I will provide the list of corrupted dandiset versions. (Additionally, would changing the interface of the command to a file consisting of corrupted dandiset versions be better?)