Keep local copies of files in a separate mets:FLocat #1079
# Conflicts: # ocrd_models/ocrd_models/ocrd_file.py
When the files are downloaded, the output lists one relative file path per line. When I undid the download, it returned just
Good point, I hadn't thought about that. Should be fixed in 0f26809. For the kant_aufklaerung_1784 test asset:
and the reverse:
Yes, that output is more convenient. Great!
One early decision that has haunted us for years now is that we have been using a single `mets:FLocat` for both the original URL of a `mets:file` and the local copy in the workspace we use for processing.

This PR tries to solve #323 by changing `OcrdFile` and the download logic in `Resolver` and `Workspace`:

- `OcrdFile.url` remains in `mets:FLocat[@LOCTYPE="URL"]/xlink:href`.
- `OcrdFile.local_filename` will now be written to an additional `mets:FLocat[@LOCTYPE="OTHER"][@OTHERLOCTYPE="FILE"]/xlink:href`.
- After calling `Workspace.download_file` with an `OcrdFile` `f`, `f` will have a `local_filename` attribute, and that is what processors should use rather than the `url`.
- `Resolver.download_to_directory` and `Workspace.download_file` have been adapted accordingly.

The goal here is to make the OCR-D processing non-invasive. Currently, once you do `ocrd workspace find --download`, the original URL will be gone. With this PR, `ocrd workspace find --download` will add an additional `mets:FLocat`, which can then be removed after processing is finished (to be compliant with the DFG Viewer METS profile) with `ocrd workspace find --undo-download`.
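
To illustrate the resulting METS layout, here is a minimal sketch using only the Python standard library. It is not the code from this PR: the `mets:file` fragment, its ID, URL and path are made up, and the helper names `get_url`, `get_local_filename` and `undo_download` are hypothetical stand-ins for what `OcrdFile.url`, `OcrdFile.local_filename` and `--undo-download` do conceptually.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical mets:file after a download: one FLocat keeps the original URL,
# a second one points at the local copy. All values are invented.
METS_FILE = """
<mets:file xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           ID="FILE_0001" MIMETYPE="image/tiff">
  <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/images/0001.tif"/>
  <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="FILE" xlink:href="OCR-D-IMG/FILE_0001.tif"/>
</mets:file>
"""

URL_PATH = "mets:FLocat[@LOCTYPE='URL']"
LOCAL_PATH = "mets:FLocat[@LOCTYPE='OTHER'][@OTHERLOCTYPE='FILE']"


def get_url(file_el):
    """Return the original URL, or None if there is no URL FLocat."""
    el = file_el.find(URL_PATH, NS)
    return None if el is None else el.get(XLINK_HREF)


def get_local_filename(file_el):
    """Return the path of the local copy, or None if not downloaded."""
    el = file_el.find(LOCAL_PATH, NS)
    return None if el is None else el.get(XLINK_HREF)


def undo_download(file_el):
    """Drop the local-copy FLocat; the URL FLocat is left untouched."""
    for el in file_el.findall(LOCAL_PATH, NS):
        file_el.remove(el)


file_el = ET.fromstring(METS_FILE.strip())
print(get_url(file_el))             # https://example.org/images/0001.tif
print(get_local_filename(file_el))  # OCR-D-IMG/FILE_0001.tif

undo_download(file_el)
print(get_local_filename(file_el))  # None
print(get_url(file_el))             # https://example.org/images/0001.tif
```

The point of the sketch is that removing the local copy never touches the `LOCTYPE="URL"` entry, which is what makes the round trip of download and undo non-invasive.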