ServiceX Year 5 #430

BenGalewsky · 2022-08-24T14:06:52Z

The ServiceX goals for the last year of IRIS-HEP are to:

Increase Reliability
Improve usability
Increase its physics reach

We believe that with the requested staffing for year 5 we should be able to achieve these goals.

More specifically:

1. Increase Reliability

a. Support multiple code generator backends with a single ServiceX Instance
b. Archive old transform results to manage object store space usage
c. Make releases easier and more frequent by reducing their complexity. Issue #431 puts all of the services into a single repo so they can be released from a single branch.

There are several smaller issues in the backlog to meet this goal

2. Improve Usability

a. Make ServiceX transform requests durable, so that the results can be regenerated as needed
c. Add a synchronous interface to the ServiceX frontend and integrate it with the existing Coffea executors
d. Improve the error reporting workflow to ensure timely and actionable error reports are returned to the user (e.g. #408, #332, #317)
e. Integrate with CERN JWT so users can bring their own credentials to runs
f. Implement columnar cache to allow users to share transformed results

3. Increase Physics Reach

a. Return systematic variations in the transform result #71 #429
b. Create transformer for CMS MiniAOD
c. Create transformer that can extract data from ATLAS open data zip files

msneubauer · 2022-08-25T18:19:38Z

Isn't 1b more appropriate to allow users to share transformed results? I guess it depends on what you mean by "old". I would think 2f is more about performance on subsequent column queries.

BenGalewsky · 2022-08-25T19:10:11Z

> Isn't 1b more appropriate to allow users to share transformed results? I guess it depends on what you mean by "old". I would think 2f is more about performance on subsequent column queries.

So 1b is entirely about managing our resources. Right now transforms just sit in the object store and it eventually fills up. 2a is a bit more about sharing transforms. I can give you my transform ID and even if it got cleaned up last month you are guaranteed to be able to regenerate it.

I would certainly appreciate more strategic thinking around 2f. I guess it could be mostly useful for a single analyzer noodling around with their columnar data while they perfect their analysis. I thought part of the "I" in "IDDS" was about making transformed results more reusable for different analyzers.
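To make the difference between 1b and 2a concrete, here is a rough sketch of how the two could fit together. It is not ServiceX code: `DurableTransformStore`, `TransformRecord`, and the `run_transform` callable are hypothetical names, an in-memory dict stands in for the object store, and "archiving" is simplified to dropping the result bytes (a real deployment might move them to cheaper storage instead). The idea is that the small request record is kept indefinitely while the bulky result can be removed and rebuilt from the transform ID on demand.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass
class TransformRecord:
    """Small, durable description of a request (hypothetical schema)."""
    transform_id: str
    request: dict
    finished_at: datetime


class DurableTransformStore:
    """Keeps request records forever; results may be dropped and rebuilt."""

    def __init__(self, run_transform: Callable[[dict], bytes]):
        # run_transform is an assumed stand-in for the real transformer.
        self._run = run_transform
        self._records: Dict[str, TransformRecord] = {}
        self._objects: Dict[str, bytes] = {}  # stand-in for the object store

    def submit(self, transform_id: str, request: dict, now: datetime) -> None:
        """Record the request and store its freshly computed result."""
        self._records[transform_id] = TransformRecord(transform_id, request, now)
        self._objects[transform_id] = self._run(request)

    def archive_older_than(self, max_age: timedelta, now: datetime) -> List[str]:
        """1b: free space by dropping results (not records) older than max_age."""
        stale = [
            tid for tid, rec in self._records.items()
            if tid in self._objects and now - rec.finished_at > max_age
        ]
        for tid in stale:
            del self._objects[tid]
        return stale

    def fetch(self, transform_id: str, now: datetime) -> bytes:
        """2a: return the result, regenerating it if it was cleaned up."""
        if transform_id not in self._objects:
            rec = self._records[transform_id]  # KeyError if never submitted
            self._objects[transform_id] = self._run(rec.request)
            rec.finished_at = now
        return self._objects[transform_id]
```

With something like this, a transform ID shared with a colleague stays valid after cleanup, provided the transform itself is deterministic for a given request.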

BenGalewsky added the Epic label Aug 24, 2022

alexander-held mentioned this issue Oct 5, 2022

User experience and performance improvements for pipeline demonstrator iris-hep/analysis-grand-challenge#64

Open

36 tasks
