Description
Problem
Directories for job outputs/results are defined as follows:
weaver/docs/source/processes.rst
Lines 1940 to 1942 in c121f8d
With potentially a nested rebase using the X-WPS-Output-Context:
weaver/docs/source/processes.rst
Lines 1951 to 1955 in c121f8d
With the integration of result transforms (#548), there would also be (where alt is any extension mapped by ext transforms):
With added provenance (#673), there is also a need for the following, with each possible original/transformed result providing PROV metadata in various representations:
Another edge case is for an output application/directory, where the contents are:
With Zip (#726 or other container) transforms from an application/directory, there is no way to generate "output.zip" or the various {...}.prov.[json|xml|rdf] without potentially conflicting with whatever contents the output directory holds. Furthermore, subsequent provenance or transform requests could end up nesting the prov/alt files under an archive (eg: output.tar.gz or output.prov.json zipped within an output.zip because they were requested before).
Possible Solutions
1. Using nested directories
One way to address all of the above would be to refactor the directory structure as follows:
With hard-coded status, outputs, transforms, prov sub-directories, we achieve multiple advantages:
allows extending contents related to results with any future capability/representation, simply by adding a new sub-directory as needed
reduces listing of the WPS-output directories that currently duplicate 3 entries each time: {JOB_UUID}.xml, {JOB_UUID}.log, {JOB_UUID}/, since they will all be under a single {JOB_UUID}/ directory
separating outputs from transforms makes it clear which one is the original vs the generated alternate contents
Considerations
This change will cause any existing job to be unable to dynamically generate an alternate transform representation, because the nested /outputs/ would not be resolved.
Generating the base /outputs/ location will have to consider the various output storages (eg: PyWPS local dir store vs AWS S3 store) that have different ways to indicate the prefix location.
Job preparation would have to set up the /status/ directory for the XML and logs. This might impact how the PyWPS workers are configured, and how this information is chained across the execution pipeline.
Workflows that assume certain nested dir locations with {outputID} would have to dynamically resolve the paths.
2. Adjusting the dir result
Another approach would be to preserve the current directory structure, but only adjust results using application/directory such that they are nested within another /dir/ sub-directory:
This would also allow other metadata to be represented as:
Considerations
Only application/directory outputs would need to be adjusted to consider the hard-coded /dir/.
Workflows would need to be updated, but this can be easily addressed since many parts of the code already have special handling for application/directory.
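To compare the two proposals, here is a minimal sketch of how result paths could be built under each of them. The function names, the RESERVED tuple and the exact nesting order ({JOB_UUID}/{section}/{outputID}/... for the first proposal, {JOB_UUID}/{outputID}/dir/... for the second) are assumptions made for illustration only; they are not taken from the Weaver code base, and the actual layout may differ.

```python
from pathlib import PurePosixPath

# Hypothetical constant: the fixed sub-directories named in proposal 1.
RESERVED = ("status", "outputs", "transforms", "prov")


def nested_path(job_uuid, section, output_id, filename):
    """Proposal 1 (assumed layout): {JOB_UUID}/{section}/{outputID}/{filename}."""
    if section not in RESERVED:
        raise ValueError(f"unknown section: {section}")
    return PurePosixPath(job_uuid) / section / output_id / filename


def dir_content_path(job_uuid, output_id, relative_file):
    """Proposal 2 (assumed layout): directory contents nested under /dir/."""
    return PurePosixPath(job_uuid) / output_id / "dir" / relative_file


def dir_sibling_path(job_uuid, output_id, filename):
    """Proposal 2 (assumed layout): generated files placed next to /dir/."""
    return PurePosixPath(job_uuid) / output_id / filename


# A process writes its own "output.zip" inside a directory output,
# and a Zip transform of that same output is requested afterwards.
job = "JOB_UUID"

original = nested_path(job, "outputs", "output", "output.zip")
generated = nested_path(job, "transforms", "output", "output.zip")
print(original, generated, original != generated)

original = dir_content_path(job, "output", "output.zip")
generated = dir_sibling_path(job, "output", "output.zip")
print(original, generated, original != generated)
```

In both cases the generated archive and the file produced by the process end up in different directories, so neither overwrites the other and a later transform of the original contents does not pick up previously generated files.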
References
cwltool for OGC API - Processes IPT #673
(static directory output represented by another /stac/ subdir that nests the original results?)