-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-45988: Support storing init-outputs in central repo #256
Draft
kfindeisen
wants to merge
15
commits into
main
Choose a base branch
from
tickets/DM-45988
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
538262e
to
1a8b660
Compare
c11bc55
to
aa4247d
Compare
The reference to daf_butler.cli.cliLog.CliLog works fine in some contexts, but breaks in others.
The output was annotated as an iterable, specced as a sequence with "undefined order" (which defeats the main benefit of using a sequence), and implemented as a list with possible duplicates. I've changed the docs to self-consistently say that it's a collection of arbitrary type, and implemented as a set internally (though the spec still does not guarantee deduplication).
In PipelinesConfig, "all_pipelines" is used to mean "any pipeline that might be run for any visit". Since the MiddlewareInterface method merely returns all pipelines that are applicable for a specific visit (whether preprocessing or primary), _get_combined_pipeline_files is less confusing.
This method returns _all_ pipelines that could possibly be run under this configuration, which is useful for visit-independent preparatory work.
The script takes the same environment variables (with the same defaults) as the activator, but does not run as a service and does not use any visit-specific information.
Running Gunicorn already implicitly adds `/app` to the Python path, but making it explicit will make the container's behavior more consistent and reliable.
The action now takes an optional input, dockerDir, which allows builds of containers other than ${PROMPT_PROCESSING_DIR}/Dockerfile.
Unlike the service container, the initializer container runs a single script and then exits, so it is best suited for Jobs or similar objects.
The chain needs to be defined at least once (instead of exactly once like the outputs themselves), but doing it in Prompt Processing leads to race conditions. The script already needs to define all the runs that would go into the chain.
Output chains are now handled more reliably by the init-output job.
We consistently use the same pattern to handle mocks initialized in setUp; create a subclass of TestCase that does this on demand.
I'd also like to count DatasetRefs by other properties, and the code for doing so is almost completely identical.
Now that init-outputs are created once in the central repo, we should use those for all processing to ensure consistency.
aa4247d
to
f797d8f
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR adds a new program for one-time creation of pipeline init-outputs as a Kubernetes job.