Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add FLUX_ENCLOSING_ID to initial program environment for instances with a jobid broker attribute #6558

Merged
merged 3 commits into from
Jan 17, 2025

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Jan 17, 2025

This PR adds a FLUX_ENCLOSING_ID environment variable as an alternative to the broker jobid attribute. If an instance does not have a jobid attribute, then FLUX_ENCLOSING_ID is unset.

This environment variable can be used to refer to the jobid of the enclosing batch or alloc instance from the initial program or a job within that instance (since it will be inherited by jobs).

Problem: It is inconvenient to get the jobid of the closest enclosing
instance (e.g. the ID of a batch or alloc job in the parent instance)
because the broker drops FLUX_JOB_ID in favor of a jobid attribute.
However, a case has been made that an environment variable would be
more convenient since it can be accessed without use of system(3)
or the Flux API.

Introduce FLUX_ENCLOSING_ID, which is set by the broker whenever the
jobid attribute is set (i.e. when the broker is started as a job in
a Flux instance). This will be available in the initial program as
well as being inherited by jobs run within the instance.

Add the variable to the env_blocklist so that it is unset when the
current instance is not a job.

Fixes flux-framework#6474
Problem: No tests in the testsuite ensure FLUX_ENCLOSING_ID is set
(and unset) where applicable.

Add a couple tests to t0014-runlevel.t.
Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a suggestion on expanding the description a bit.

Comment on lines 28 to 43
.. envvar:: FLUX_ENCLOSING_ID

The jobid of the closest enclosing Flux instance if it was run as a
job. This is set as an alternative to :envvar:`FLUX_JOB_ID` because
it applies only to the parent instance, not the current instance.

See :ref:`INITIAL PROGRAM ENVIRONMENT <initial_program_environment>` below
for more details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful to spell this out a bit more for casual Flux user. I took a stab below but feel free to ignore or edit.

The job ID of the enclosing Flux instance, if it has one. The enclosing Flux instance is the one that ran FLUX_JOB_ID. Depending on how the enclosing Flux instance was started, it may or may not have a job ID. If it was not launched by Flux, FLUX_ENCLOSING_ID is not set.

Example 1: a batch job that runs one MPI job is submitted to a Flux system instance. In the environment of the MPI job, FLUX_ENCLOSING_ID refers to the batch job ID in the system instance.

Example 2: an MPI job is submitted directly to a Flux system instance. Since the Flux system instance was not launched by Flux, FLUX_ENCLOSING_ID is not set in the environment of the MPI job.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jim beat me to it, I agree fleshing this area out a bit more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestions. I'll take @garlick's text as is besides fixing a few nits (the rest of the document uses "jobid" not "job ID", and style guide says the word after a colon should be capitalized if it is the beginning of a sentence.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, pushed the update. I also dropped the reference to the INITIAL PROGRAM section as now it seemed superfluous since most if the detail in the JOB ENVIRONMENT section instead.

Comment on lines 183 to 192
Additionally, :envvar:`FLUX_ENCLOSING_ID` is set by the broker if it was
launched as part of a Flux job, and contains the jobid of the instance
in F58 form.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just "launched as a Flux job", not sure "part of" is necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, a single broker is not "a Flux job" unless the job has only one node.. but either way gets the point across if you feel strongly about it.

Copy link
Member

@chu11 chu11 Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhh i see your context on it now. Perhaps just "is set", no need to say "set by the broker"? Dunno if the latter is a detail the user needs to know. but no biggie either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, that will probably make it clearer to the intended audience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified to duplicate @garlick's text from above:

Additionally, :envvar:FLUX_ENCLOSING_ID is set to the jobid of the enclosing instance, if it has one.

How's that?

Copy link
Member

@chu11 chu11 Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup i think that works. I guess you could add back the F58 form, but I don't think that matters.

Problem: The FLUX_ENCLOSING_ID environment variable is not documented.

Document it in flux-environment(7).
@grondo
Copy link
Contributor Author

grondo commented Jan 17, 2025

Thanks! Will set MWP now.

@mergify mergify bot merged commit 122bd11 into flux-framework:master Jan 17, 2025
35 checks passed
Copy link

codecov bot commented Jan 17, 2025

Codecov Report

Attention: Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 79.44%. Comparing base (8259a8f) to head (a12f5e5).
Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
src/broker/runat.c 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6558      +/-   ##
==========================================
- Coverage   79.45%   79.44%   -0.02%     
==========================================
  Files         531      531              
  Lines       88240    88247       +7     
==========================================
- Hits        70109    70104       -5     
- Misses      18131    18143      +12     
Files with missing lines Coverage Δ
src/broker/broker.c 77.19% <100.00%> (+0.08%) ⬆️
src/broker/runat.c 72.34% <87.50%> (+0.02%) ⬆️

... and 9 files with indirect coverage changes

@grondo grondo deleted the issue#6474 branch January 17, 2025 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants