ServiceX Year 5 #430

Open
BenGalewsky opened this issue Aug 24, 2022 · 2 comments
@BenGalewsky

BenGalewsky commented Aug 24, 2022

The ServiceX goals for the last year of IRIS-HEP are to:

  1. Increase reliability
  2. Improve usability
  3. Increase its physics reach

We believe that with the requested staffing for year 5 we should be able to achieve these goals.

More specifically:

1. Increase Reliability

a. Support multiple code generator backends with a single ServiceX Instance
b. Archive old transform results to manage object store space usage
c. Make releases easier and more frequent by reducing their complexity. Issue #431 puts all of the services into a single repo so they can be released from a single branch

There are several smaller issues in the backlog that also contribute to this goal.

2. Improve Usability

a. Make ServiceX transform requests durable, so that results can be regenerated as needed
c. Add a synchronous interface to the ServiceX frontend and integrate it with the existing Coffea executors
d. Improve the error reporting workflow to ensure timely and actionable error reports are returned to the user (see #408, #332, #317)
e. Integrate with CERN JWT so users can bring their own credentials to runs
f. Implement a columnar cache to allow users to share transformed results

3. Increase Physics Reach

a. Return systematic variations in the transform result (#71, #429)
b. Create a transformer for CMS MiniAOD
c. Create a transformer that can extract data from ATLAS open data zip files

@msneubauer

Isn't 1b more appropriate to allow users to share transformed results? I guess it depends on what you mean by "old". I would think 2f is more about performance on subsequent column queries.

@BenGalewsky

> Isn't 1b more appropriate to allow users to share transformed results? I guess what you mean by "old". I would think 2f is more about performance on subsequent column queries

So 1b is entirely about managing our resources. Right now transforms just sit in the object store, and it eventually fills up. 2a is a bit more about sharing transforms: I can give you my transform ID, and even if it got cleaned up last month you are guaranteed to be able to regenerate it.

I would certainly appreciate more strategic thinking around 2f. I guess it could be mostly useful for a single analyzer noodling around with their columnar data while they perfect their analysis. I thought part of the "I" in "IDDS" was about making transformed results more reusable for different analyzers.
