host-select: fix compatibility with force-condemned hosts #6643

Open
wants to merge 1 commit into
base: 8.4.x
4 changes: 4 additions & 0 deletions changes.d/6623.fix.md
@@ -0,0 +1,4 @@
Auto restart: The "force condemn" option (that tells workflows running on a
server to shutdown as opposed to migrate) hasn't worked with the host-selection
mechanism since Cylc 8.0.0. This has now been fixed and the "force condemn"
option has been restored in the documentation.
46 changes: 41 additions & 5 deletions cylc/flow/cfgspec/globalcfg.py
@@ -826,16 +826,52 @@ def default_for(
range.
''')
Conf('condemned', VDR.V_ABSOLUTE_HOST_LIST, desc=f'''
These hosts will not be used to run jobs.
List run hosts that workflows should *not* run on.

If workflows are already running on
condemned hosts, Cylc will shut them down and
restart them on different hosts.
These hosts will be subtracted from the
`available <global.cylc[scheduler][run hosts]>` hosts
preventing new workflows from starting on the "condemned" host.

Any workflows running on these hosts will either migrate
to another host, or shutdown according to
:py:mod:`the configuration <cylc.flow.main_loop.auto_restart>`.

This feature requires ``auto restart`` to be listed
in `global.cylc[scheduler][main loop]plugins`.

For more information, see the
:py:mod:`auto restart <cylc.flow.main_loop.auto_restart>`
plugin.

.. rubric:: Example:

.. code-block:: cylc

    [scheduler]
        [[main loop]]
            # activate the "auto restart" plugin
            plugins = auto restart
        [[run hosts]]
            # there are three hosts in the "pool"
            available = host1, host2, host3

            # however two have been taken out:
            # * workflows running on "host1" will attempt to
            #   restart on "host3"
            # * workflows running on "host2" will shutdown
            condemned = host1, host2!

.. seealso::

:py:mod:`cylc.flow.main_loop.auto_restart`
:ref:`auto-stop-restart`

.. versionchanged:: 8.4.2

The force-condemn ("!") option caused issues at workflow
startup for Cylc versions between 8.0.0 and 8.4.1
inclusive.

.. versionchanged:: 8.0.0

{REPLACES}``[suite servers]condemned hosts``.
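The semantics described in the documentation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cylc's implementation: the function name `plan_hosts` and the action labels `'migrate'` and `'shutdown'` are hypothetical, chosen to mirror the example configuration.

```python
from typing import Dict, Iterable, List, Tuple


def plan_hosts(
    available: Iterable[str],
    condemned: Iterable[str],
) -> Tuple[List[str], Dict[str, str]]:
    """Split a host pool into usable hosts and per-host actions.

    Hypothetical helper for illustration only. A condemned entry ending
    in "!" is treated as force-condemned (workflows shut down); any
    other condemned entry means workflows try to migrate elsewhere.
    """
    actions: Dict[str, str] = {}
    for entry in condemned:
        if entry.endswith('!'):
            # force mode: strip the marker to recover the host name
            actions[entry[:-1]] = 'shutdown'
        else:
            actions[entry] = 'migrate'
    # condemned hosts are subtracted from the available pool
    usable = [host for host in available if host not in actions]
    return usable, actions


usable, actions = plan_hosts(
    ['host1', 'host2', 'host3'],
    ['host1', 'host2!'],
)
print(usable)   # ['host3']
print(actions)  # {'host1': 'migrate', 'host2': 'shutdown'}
```

With the example configuration, only `host3` remains eligible for new workflows, which matches the comments in the documented example.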
@@ -1336,7 +1372,7 @@ def default_for(
The means by which task progress messages are reported back to
the running workflow.

..rubric:: Options:
.. rubric:: Options:

zmq
Direct client-server TCP communication via network ports
11 changes: 8 additions & 3 deletions cylc/flow/host_select.py
@@ -128,6 +128,13 @@
    # be returned with the up-to-date configuration.
    global_config = glbl_cfg(cached=cached)

    # condemned hosts may be suffixed with an "!" to activate "force mode"
    blacklist = []
    for host in global_config.get(['scheduler', 'run hosts', 'condemned'], []):
        if host.endswith('!'):
            host = host[:-1]

Codecov / codecov/patch: added line #L135 in cylc/flow/host_select.py was not covered by tests.

Review comment from the PR author (Member) on lines +134 to +135:

Note: this line is covered by the test amended in this PR (confirm by running it on master). However, that test is not run in CI due to its shared filesystem setup.

        blacklist.append(host)

    return select_host(
        # list of workflow hosts
        global_config.get([
@@ -138,9 +145,7 @@
            'scheduler', 'run hosts', 'ranking'
        ]),
        # list of condemned hosts
        blacklist=global_config.get(
            ['scheduler', 'run hosts', 'condemned']
        ),
        blacklist=blacklist,
        blacklist_name='condemned host'
    )
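The normalisation added in this hunk can be tried in isolation. The sketch below assumes nothing about Cylc beyond the loop shown in the diff. The function name `strip_force_marker` is hypothetical and is not part of `cylc.flow.host_select`.

```python
from typing import Iterable, List


def strip_force_marker(condemned: Iterable[str]) -> List[str]:
    """Return host names with any trailing "!" removed.

    Hypothetical stand-alone version of the loop in the diff: the "!"
    suffix marks a force-condemned host and is not part of the host
    name, so it is stripped before the list is used as a blacklist.
    """
    blacklist = []
    for host in condemned:
        if host.endswith('!'):
            host = host[:-1]
        blacklist.append(host)
    return blacklist


print(strip_force_marker(['host1', 'host2!']))  # ['host1', 'host2']
```

Passing the raw list would leave `host2!` in the blacklist, which does not match any real host name. This is consistent with the startup problem the changelog entry describes.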

@@ -50,7 +50,10 @@ create_test_global_config '' "
${BASE_GLOBAL_CONFIG}
[scheduler]
[[run hosts]]
available = ${CYLC_TEST_HOST_1}
available = ${CYLC_TEST_HOST_1}, ${CYLC_TEST_HOST_2}
# ensure the workflow can start if a host is force-condemned
# see #6623
condemned = ${CYLC_TEST_HOST_2}!
"

set_test_number 8