hard-wired scheduler host self-identification broken #6575
Comments
I can see how this behavior is unfortunate for the given use case, however, as you have noted above, host self-identification is not a job platform setting, it is a scheduler setting. It is working correctly as documented:
There is no suggestion here that this configuration would be applied inconsistently across the distributed system. This is an unsatisfied use case, not a bug (i.e. hardwired address mode might still be working fine for other use cases). If I understand correctly (please correct if not!), this issue is about supporting systems where:
Possible solutions:
This feature goes right back to Cylc 5 - see #85
I've followed up on the forum to say that it is currently working as advertised in the current docs. [However, the docs on this lost some information in the transition to Cylc 8, and evidently it worked differently in Cylc 7].
Correction: as I recall, this feature was specifically intended to handle scheduler host identity as seen from job hosts, for task messaging, and I think we've since let other bits of the system crash that party. [Alex R on the forum has since confirmed that it worked as he expected with Cylc 7.] So I suspect that in Cylc 7 the setting was only used in the job environment, which would work for the use case reported on the forum. Earlier docs confirm this setting was for job communications. E.g. from 7.9.9 (also see the "todo" below):
Actually current docs still hint at this:
Install target would probably be sufficient, but in principle network settings are aligned to hosts not filesystems, right?
Ref: https://cylc.discourse.group/t/cylc-vr-cannot-determine-whether-workflow-is-running-on-host/1099/7
Description
We can override the value of `CYLC_WORKFLOW_HOST` in the `.service/contact` file in cases where job platform hosts do not see the scheduler host via the same network settings.
Unfortunately this ends up in the local `contact` file as well as on the job platform (and the two locations might see the exact same file in any case), and "local" (non job) commands such as `cylc vr` also use it, e.g. they try to reach the overridden host (`other-name` here) to:
- see if the scheduler is still running
- issue the scheduler command
This will almost certainly fail: if the job platform sees the scheduler host by a different name, there's no reason to think that name will be valid on the local network.
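
To make the failure mode concrete, here is a minimal Python sketch. It is not Cylc's actual code: it assumes the contact file is a list of `KEY=VALUE` lines, and `parse_contact`, the port value, and the two lookup helpers are hypothetical names used only for illustration. The point is that both kinds of caller read the same key, so both receive the overridden name.

```python
# Illustrative sketch only; not Cylc's real implementation.
# Assumption: the contact file holds KEY=VALUE lines.
CONTACT_TEXT = """\
CYLC_WORKFLOW_HOST=other-name
CYLC_WORKFLOW_PORT=43001
"""


def parse_contact(text):
    """Parse KEY=VALUE lines into a dict, skipping lines without '='."""
    contact = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        contact[key.strip()] = value.strip()
    return contact


def host_seen_by_job(contact):
    """Host a job would send task messages to (hypothetical helper)."""
    return contact["CYLC_WORKFLOW_HOST"]


def host_seen_by_local_command(contact):
    """Host a local command would contact (hypothetical helper)."""
    return contact["CYLC_WORKFLOW_HOST"]


contact = parse_contact(CONTACT_TEXT)
# Both callers get the overridden name, although only the job
# platform is expected to be able to resolve it.
assert host_seen_by_job(contact) == host_seen_by_local_command(contact)
```
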
Reproducible Example
I don't have a job platform that requires this setting. I think I did a very long time ago, but in light of this bug report I'm wondering how it ever worked (maybe the code used to figure out whether or not we needed to use the self-identifier name for particular commands, rather than automatically using it?)
Anyhow, to see the problem, run a simple workflow and make sure its `contact` file contains a bad host name.
Then do, e.g., `cylc vr --yes` - it will fail trying to ssh to the bad host (to see if the scheduler is still running).
Expected Behaviour
Hardwired scheduler host self-identification should be a job platform setting, and only used for communications from the right job platform.
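
One way to picture the expected behaviour is a per-platform lookup, sketched below in Python. This is a hypothetical illustration, not a proposed patch: `scheduler_host_for`, the `overrides` mapping, and the platform and host names are all assumptions. Local commands (no job platform) always get the real host name, and a hardwired name applies only to the platform it was configured for.

```python
from typing import Mapping, Optional


def scheduler_host_for(
    caller_platform: Optional[str],
    real_host: str,
    overrides: Mapping[str, str],
) -> str:
    """Return the scheduler host name a caller should use.

    caller_platform: job platform name, or None for a local command.
    real_host: the scheduler host as seen on the local network.
    overrides: hypothetical per-platform hardwired host names.
    """
    if caller_platform is None:
        # Local commands such as `cylc vr` never see the override.
        return real_host
    # Jobs use the hardwired name only if their platform defines one.
    return overrides.get(caller_platform, real_host)


overrides = {"remote-hpc": "other-name"}  # hypothetical platform name
assert scheduler_host_for(None, "sched-host", overrides) == "sched-host"
assert scheduler_host_for("remote-hpc", "sched-host", overrides) == "other-name"
assert scheduler_host_for("localhost", "sched-host", overrides) == "sched-host"
```
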