New Epic: Idempotence #768
@jmcook1186 one thing I've thought about re: group. Perhaps it is a re-group rather than a group. Imagine you've run it once, so a grouping has already been applied and the inputs are already "grouped". Then you change the grouping in that output static manifest file and re-run. What happens? The inputs are still grouped the old way. I think the right approach is, at the start of the group phase, to turn everything into a 1-D time series of observations (ordered by time?), then run group again on this 1-D time series and pass the result to the compute phase. That's quite useful: if you were given a static manifest grouped one way, you could re-group it in different ways, re-run, and voilà.
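A minimal sketch of the flatten-then-regroup idea. It assumes a grouped manifest can be modelled as a tree whose leaves hold `inputs` arrays of observations with an ISO `timestamp`; the types and the `flatten`, `group` and `regroup` helpers are hypothetical, not IF's actual API.

```typescript
// Hypothetical shapes for illustration; IF's real manifest types differ.
type Observation = { timestamp: string; [key: string]: string | number };
type GroupNode = { inputs?: Observation[]; children?: Record<string, GroupNode> };

// Collapse whatever grouping the manifest already has into one time-ordered series.
function flatten(node: GroupNode): Observation[] {
  let all: Observation[] = node.inputs ? node.inputs.slice() : [];
  const children = node.children ?? {};
  for (const key of Object.keys(children)) {
    all = all.concat(flatten(children[key]));
  }
  return all.sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}

// Apply a fresh grouping: one tree level per key in `groupBy`; leaves keep the observations.
function group(observations: Observation[], groupBy: string[]): GroupNode {
  if (groupBy.length === 0) return { inputs: observations };
  const [key, ...rest] = groupBy;
  const buckets: Record<string, Observation[]> = {};
  for (const obs of observations) {
    const value = String(obs[key]);
    if (!buckets[value]) buckets[value] = [];
    buckets[value].push(obs);
  }
  const children: Record<string, GroupNode> = {};
  for (const value of Object.keys(buckets)) {
    children[value] = group(buckets[value], rest);
  }
  return { children };
}

// Re-grouping: flatten the old grouping, then group the 1-D series again.
function regroup(tree: GroupNode, groupBy: string[]): GroupNode {
  return group(flatten(tree), groupBy);
}
```

Because `regroup` always starts from the flattened series, the result does not depend on how the static manifest was grouped before.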
Yes! Sold :) This sounds much more "green by design". A question about the 3 phases: is the idea that a manifest would be restricted to running through just a single instance of each phase? Or could we plausibly allow for multiple combinations/repetitions, e.g. group, compute, group, compute, group, compute? (asking because @josh-swerdlow and I will need to rethink how the
Does it make sense to add an 'exhaust'/'terminal' phase as well? One idea Andrew and I threw around for the hackathon was a plugin that would only be placed at the end of the pipeline and would analyze what had happened in the pipeline to generate a human-readable audit. While discussing the idea, we broke down the phases of work that could occur and came up with phases similar to those suggested above, plus a 'terminal' phase. This is where outputs would be created, validation checks run, graphs computed, or human-readable audits written. Terminal steps are always guaranteed to run at the end.
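To make the ordering guarantee concrete, here is a small sketch that treats 'terminal' as a fourth phase, always schedules it last, and includes a toy audit step. The `Step` shape, `schedule` and `audit` are hypothetical illustrations, not an existing IF plugin interface.

```typescript
// Phase names from the epic, plus the proposed terminal phase (hypothetical).
type Phase = "observe" | "group" | "compute" | "terminal";
const PHASE_ORDER: Phase[] = ["observe", "group", "compute", "terminal"];

interface Step {
  name: string;
  phase: Phase;
}

// Order steps by phase so terminal steps always run last. Array.prototype.sort
// is stable (ES2019+), so declared order is kept within a phase.
function schedule(steps: Step[]): Step[] {
  return steps
    .slice()
    .sort((a, b) => PHASE_ORDER.indexOf(a.phase) - PHASE_ORDER.indexOf(b.phase));
}

// A terminal step only reads what earlier steps did; here it renders a plain-text audit.
function audit(executed: Step[]): string {
  return executed
    .filter((s) => s.phase !== "terminal")
    .map((s, i) => `${i + 1}. ${s.name} (${s.phase})`)
    .join("\n");
}
```

Even if a terminal step is declared first, `schedule` moves it behind every observe, group and compute step.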
Here's a new epic we want to work on in the next few weeks. We're laying out the early thinking here so you can all comment and help us refine it before it gets worked up into tickets in our dev sprints.
Background and problem statement
A command is idempotent if the result is identical regardless of how many times the command is executed. In the context of IF, we want to make it so that a manifest is idempotent so that re-executing the manifest always generates the same result. We can't always guarantee this today, because many manifests have importer plugins as the first element in their pipelines, and we don't control the servers that serve the APIs called by the importers, so we can't be sure that the same request will always yield the same response and by extension we can't guarantee that the same manifest will always give the same output. We anticipate many (most?) manifests using importers in the future. We also anticipate sharing and re-executing manifests being one of the main use cases for IF in the future, and today these things are somewhat incompatible due to the lack of idempotence.
However, we can fix this issue from IF's side by separating out the IF execution into distinct phases and enabling intermediates to be exported at the end of each phase. The first phase can be the data import, which generates a static file. This static file can then be shared, archived, or passed into the next phase of execution.
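A toy illustration of why a static snapshot restores idempotence. The importer is simulated by a function whose result changes on every call, standing in for an external API; `importObservations`, `computeTotal`, `liveRun` and `staticRun` are hypothetical names. Re-running the live version gives a different total each time, while re-running against the frozen snapshot always gives the same one.

```typescript
// Hypothetical stand-in for an importer plugin: an external API can return
// different data on each call, simulated here with a call counter.
let apiCalls = 0;
function importObservations(): number[] {
  apiCalls += 1;
  return [10, 20, 30 + apiCalls];
}

// Stand-in for the rest of the pipeline: a pure function of its inputs.
function computeTotal(observations: readonly number[]): number {
  return observations.reduce((sum, x) => sum + x, 0);
}

// Not idempotent: every run repeats the import.
function liveRun(): number {
  return computeTotal(importObservations());
}

// Idempotent: the import ran once and its result was frozen as a static snapshot.
const snapshot: readonly number[] = Object.freeze(importObservations());
function staticRun(): number {
  return computeTotal(snapshot);
}
```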
Separating out distinct execution phases also helps with features such as time-sync and group-by. In our current monolithic design, we have to invoke these features at the right point in the pipeline for them to execute correctly. While the right sequence is obvious for simple manifests, this isn't necessarily the case for more complex ones, which can lead to confusion and over-complicated manifest files.
It is also inefficient to have to re-execute an entire manifest, including the requests to external APIs in the importer plugins, just to change one element later in the execution pipeline. It would be much more efficient if we could capture static files at various points in the execution flow that can then be used to re-execute specific parts of the pipeline.
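A minimal sketch of resuming from a captured intermediate, assuming each phase can be treated as a function from one manifest state to the next. The `State` shape, `runFrom` and the in-memory `snapshots` record (a stand-in for static YAML files on disk) are all hypothetical.

```typescript
type PhaseName = "observe" | "group" | "compute";
const ORDER: PhaseName[] = ["observe", "group", "compute"];

// Hypothetical manifest-like state; a real manifest is a tree, not a flat list.
interface State {
  inputs: number[];
  outputs?: number[];
}
type PhaseFn = (state: State) => State;

// Run from `from` to the end, saving a deep copy after each phase (a stand-in for
// writing a static file) so a later run can restart from that point.
function runFrom(
  from: PhaseName,
  start: State,
  phases: Record<PhaseName, PhaseFn>,
  snapshots: Partial<Record<PhaseName, State>>,
): State {
  let current = start;
  for (const name of ORDER.slice(ORDER.indexOf(from))) {
    current = phases[name](current);
    snapshots[name] = JSON.parse(JSON.stringify(current)) as State;
  }
  return current;
}
```

After one full run from `observe`, calling `runFrom` with `"compute"` and `snapshots.group` as the starting state re-executes only the compute phase, so the importers are not called again.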
Solution
- `observe`: the pipeline to generate, import, or gather observations. The output of this pipeline should be a 1-D array of observations.
- `group`: groups the 1-D array of observations into a structure that makes sense for the compute step that follows (and for aggregation, export, etc.), similar to the `group-by` builtin.
- `compute`: given a set of observations, this is the pipeline that calculates impacts.

We first traverse the tree and run all the plugins in the `observe` pipeline. Their results are added to the `inputs` (in additive run mode) or replace the `inputs` (in replace mode). We have the option to capture the state of the manifest after these `observe` operations have been applied and save it to a YAML file.

We then traverse the tree and run the grouping logic on the `inputs`. Again, we have the option to capture the state of the manifest after these `group` operations have been applied and save it to a YAML file.

Finally, we traverse the tree and run the `compute` pipeline.
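The `observe` traversal and the two run modes can be sketched as follows. This assumes observe plugins only act on leaf nodes and that an observe plugin is a function from the current inputs to new observations; `TreeNode`, `ObservePlugin` and `observePhase` are hypothetical names. Because it returns a new tree, the result could then be serialised (for example with `JSON.stringify`, or a YAML library) as the optional captured state.

```typescript
// Hypothetical shapes; IF's real manifest nodes carry config, pipelines, etc.
type Obs = Record<string, string | number>;
type TreeNode = { inputs?: Obs[]; children?: Record<string, TreeNode> };
type ObservePlugin = (inputs: Obs[]) => Obs[];
type RunMode = "additive" | "replace";

// Traverse the tree and run every observe plugin on each leaf's inputs,
// returning a new tree so the pre-observe manifest is left intact.
function observePhase(node: TreeNode, plugins: ObservePlugin[], mode: RunMode): TreeNode {
  const kids = node.children;
  if (kids) {
    const children: Record<string, TreeNode> = {};
    for (const key of Object.keys(kids)) {
      children[key] = observePhase(kids[key], plugins, mode);
    }
    return { children };
  }
  let inputs = node.inputs ?? [];
  for (const plugin of plugins) {
    const produced = plugin(inputs);
    // additive: append the new observations; replace: they become the inputs
    inputs = mode === "additive" ? inputs.concat(produced) : produced;
  }
  return { inputs };
}
```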
For the `observe` and `group` phases, only the `inputs` change. `compute` doesn't change the `inputs`; it only generates `outputs`.
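A sketch of the inputs/outputs rule above, assuming a leaf holds `inputs` and `outputs` as arrays of numeric rows. The `energy` and `carbon` fields, the `intensity` multiplier and both helpers are illustrative assumptions. The guard compares JSON serialisations, which is a cheap check but is sensitive to key order.

```typescript
// Hypothetical leaf shape; `energy` and `carbon` are illustrative parameter names.
type Row = Record<string, number>;
interface Leaf {
  inputs: Row[];
  outputs?: Row[];
}

// A compute step derives outputs from inputs and never writes to inputs.
function computeLeaf(leaf: Leaf, intensity: number): Leaf {
  const outputs = leaf.inputs.map((row) => ({ ...row, carbon: (row["energy"] ?? 0) * intensity }));
  return { inputs: leaf.inputs, outputs };
}

// Guard for the rule above: fail if a compute step changed the inputs.
function assertInputsUnchanged(before: Leaf, after: Leaf): void {
  if (JSON.stringify(before.inputs) !== JSON.stringify(after.inputs)) {
    throw new Error("compute phase must not modify inputs");
  }
}
```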
These phases should all run in sequence when you run the `ie` command; this is just the normal behaviour we have today. However, you should also be able to run each phase independently using the `--observe`, `--group` and `--compute` flags. If you just wanted to gather observations and not run the rest of the pipelines, you might run it with just the `--observe` flag, like so:

`ie --observe -m manifest.yml --output static-manifest.yml`
Then you might run the resulting file with just `--group` and end up with something like so:

`ie --group -m static-manifest.yml -o regrouped-manifest.yml`
Again, this is also a static manifest file, which you can then run with just the `--compute` flag to execute only the `compute` pipeline, which generates `outputs`:

`ie --compute -m regrouped-manifest.yml -o outputs.yml`
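One way the proposed flags could map to phases, as a hedged sketch: with no phase flag everything runs in sequence, otherwise only the flagged phases run, in pipeline order. `selectPhases` is a hypothetical helper, not the real `ie` argument parser, and it ignores `-m`/`-o` handling.

```typescript
type PhaseName = "observe" | "group" | "compute";
const ALL_PHASES: PhaseName[] = ["observe", "group", "compute"];

// No phase flag: run every phase in sequence (today's behaviour).
// Otherwise run only the flagged phases, still in pipeline order.
function selectPhases(argv: string[]): PhaseName[] {
  const requested = ALL_PHASES.filter((phase) => argv.indexOf(`--${phase}`) !== -1);
  return requested.length > 0 ? requested : ALL_PHASES.slice();
}
```

In a Node entry point it might be called as `selectPhases(process.argv.slice(2))`.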
Tasks
How you can help
You can read through this post and give feedback in the comments, especially if you are a plugin developer who currently relies on node-level config. Later, when the specific tasks are available as tickets on our issue board, you can let us know if you want to work on one. There may be some that are reserved for core developers, but in general we are keen to open up IF development to the community.
@jawache @zanete @narekhovhannisyan @MariamKhalatova @manushak