-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow 'cylc reload' to reload global configuration #6509
base: master
Are you sure you want to change the base?
Conversation
44554e6
to
2ae8bb3
Compare
We could do with documenting what can and cannot be reloaded. The configuration docs are autobuilt from the cylc-flow/cylc/flow/cfgspec/globalcfg.py Lines 739 to 741 in 9ff50f8
We define some common messages as module constants and template them in as needed, e.g: cylc-flow/cylc/flow/cfgspec/globalcfg.py Lines 341 to 347 in 9ff50f8
That sounds pretty bad! Presumably caused by platform definitions lingering from the original config? Or is it more subtle? |
Say a job starts with platform definition
While the job is running, the global configuration is changed to point to a different host
The existing job will continue running, and once it finishes it should report back to the Cylc server fine, however if the Cylc server tries to poll or kill the task any messages will get sent to the new platform host I don't think doing this should be blocked, it's one of the original use cases in #3762, but its worth highlighting in documentation. |
If If these hosts belong to two different platforms with two discrete queues, then failure is correct. This isn't a reasonable global config change, they are replacing one platform with another but using the same name for both, errors are expected. |
How does this functionality play with environment variables defined in the See #5418 (comment) |
Any environment variables that the cylc-rose plugin sets is kept constant through the reload |
It's not clear to me why the functional test is failing. Could be a timing thing, the workflow file is trying to do a cylc pause but that fails as it's using the wrong variable name, should be
|
Yes, that environment variable doesn't look right, but how was the test passing before? @wxtim, could you take a look at that functional test when you get a chance. |
That test is dubious (but probably not fatal since it's testing reload behaviour), but well spotted. I'll fix it on master. However, it is failing correctly in this case: I've opened a PR against your PR which should fix the problem and explains a little more: ScottWales#11 |
by both Cylc VR and Reload. Mirrors practice for VALIDATE, INSTALL, PLAY and RESTART
Add a store for CLI variables for reload, which can then be accessed
Thanks Tim, that's fixed the tests |
Reload has been a notoriously difficult feature to maintain and a major source of bugs, so we need to flesh out exactly what does and does not get reloaded and ensure we pin down our desired behavior with tests. Result of a quick skim of the global config (recommend a more detailed inspection):
|
For Added a functional test for Added a integration test for platform group updates. |
reload_global = Boolean( | ||
default_value=False, | ||
description="Also reload global config") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -361,6 +362,25 @@ async def reload_workflow(schd: 'Scheduler'): | |||
# give commands time to complete | |||
sleep(1) # give any remove-init's time to complete | |||
|
|||
if reload_global: | |||
# Reload global config if requested | |||
LOG.info("Reloading the global configuration.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will update the workflow status message (shown in the GUI toolbar) to reflect what Cylc is doing:
LOG.info("Reloading the global configuration.") | |
LOG.info("Reloading the global configuration.") | |
schd.reload_pending = 'reloading the global configuration' | |
schd.update_data_store() # update workflow status msg | |
await schd.update_data_structure() |
It's useful info incase the operation gets stuck with a filesystem issue, or incase loading the config takes a long time for whatever reason.
' workflow.' | ||
'\nOtherwise, fix the configuration and attempt to reload' | ||
' again.' | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way this is implemented, if the global config fails to reload, we log the error, but continue and reload the workflow config.
I see cylc reload --global
as a transaction that reloads both the global config and the workflow config. Ideally, if either of these operations fail, we would rollback the transaction. I.e, either both operations happen or neither happen, the workflow is always left in an intended state.
So I would expect it to bail at this point. I.E, If the global reload fails, I would expect the logic to skip straight to this block of logic at the end of the routine:
# resume the workflow if previously paused
schd.reload_pending = False
schd.update_data_store() # update workflow status msg
schd._update_workflow_state()
if not was_paused_before_reload:
schd.resume_workflow()
schd.process_workflow_db_queue() # see #5593
That does leave us with the situation where the global config does reload, but the workflow config doesn't (e.g. a validation error). Currently, the global config change will not roll back in this situation, but we could get it to do that something along the lines of:
if reload_global:
...
try:
cfg = glbl_cfg(cached=False)
except ...:
...
...
# reload the workflow definition
...
glbl_cfg().reload(cfg)
(would require a reload method on the GlobalConfig
object)
@@ -0,0 +1 @@ | |||
Added --global flag to 'cylc reload' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added --global flag to 'cylc reload' | |
Added --global flag to 'cylc reload' which also reloads the Cylc global config. |
|
||
The majority of scheduler settings affect the server and cannot be reloaded | ||
with ``cylc reload --global``, the server must be stopped and restarted for | ||
changes to take effect except for the sections [mail] and [events] which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Little bit of Cylc Sphinx magic, use single backquotes (or the log role :cylc:conf
) to reference configuration sections:
changes to take effect except for the sections [mail] and [events] which | |
changes to take effect except for the sections `[mail]` and `[events]` which |
This generates links, but also runs the references through the internal consistency checker to make sure they stay valid.
@@ -108,6 +108,13 @@ | |||
|
|||
Not to be confused with :cylc:conf:`flow.cylc[scheduling]`. | |||
|
|||
.. note:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: This note will only appear in the workflow config reference, not the global config reference due to the default_for function which only includes the first paragraph.
@@ -1919,6 +1919,10 @@ class Meta: | |||
class Arguments: | |||
workflows = graphene.List(WorkflowID, required=True) | |||
|
|||
reload_global = Boolean( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this is a breaking change that will cause errors when cylc reload
is run against workflows using a different version both on the CLI and in the GUI.
Somewhat surprisingly, we don't yet have a working pattern for adding new fields to mutations. It's one of the things on the TODO list that we haven't found time for yet!
- If the new field appears in the mutation and the scheduler doesn't know about it -> error.
- If the new field doesn't appear in the mutation but the scheduler is expecting it -> error.
I think we can get around both issues by marking this field as optional (required=False
) and omitting the new field from the cylc reload
mutation unless the --global
option is used.
This way the mutation should remain forward and backward compatible so long as the --global
option isn't used. And should only fail for older schedulers where the --global
option is used (which is correct).
Sadly, we don't have any automated inter-version testing as yet (another one for the TODO list) so this will need to be tested manually for now.
) { | ||
reload ( | ||
workflows: $wFlows | ||
reloadGlobal: $reloadGlobal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[In relation to above comment]
I think we can avoid inter-version interoperability issues providing we omit the new reloadGlobal
field when --global
was not specified.
Add a
--global
flag tocylc reload
that will reload the server's global configuration when set. This can be used to update platform definitions while a workflow is running.Note that:
[scheduler]
section (ports, process pool size, etc.) require a server restart to take effect.Closes #3762
Check List
CONTRIBUTING.md
and added my name as a Code Contributor.setup.cfg
(andconda-environment.yml
if present).?.?.x
branch.