Implementing JetID (logical expressions) in correctionlib #268

Open
patinkaew opened this issue Dec 10, 2024 · 3 comments


Dear experts,

For the latest NanoAODv15, JME has decided to drop the JetID branches because the JetID criteria are updated more frequently than NanoAOD is produced. Hence, we're trying to implement JetID in a JSON format that can be used with correctionlib.

However, I have tried several approaches and none of them works as expected. I'm currently out of ideas, so I would like to ask for some advice.
The issue is that the JetID criteria [1] are logical expressions, i.e. a logical AND of comparison statements (e.g. NHF<x).

Here is what I tried:
(1) Using a Formula node with TFormula for all criteria. This doesn't work because TFormula recognises only a limited number of variable names (e.g. x, y, z), and the JetID criteria involve more variables than that.

(2) Using Formula node with TFormula for each comparison statement separately and chaining with CorrectionSet. The problem I found is that I cannot chain Compound CorrectionSet again. For (NHF<x) & (NEF<y) & (MUF<z), I can chain these with * (effectively logical-and) operations, but cannot chain them again with + (effectively logical-or) for different eta ranges.

(3) Using Binning nodes to implement the comparisons and composing them. The problem is that a Binning node uses strict [x, y) intervals (closed at the lower bound and open at the upper bound). This causes misidentification in edge cases, for example when the criterion is <=.
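
To make the edge case in (3) concrete, here is a minimal sketch that emulates half-open [x, y) bin lookup in plain Python. It does not use correctionlib itself; the helper names and the 0.9 threshold are placeholders chosen for illustration, not real JetID values.

```python
from bisect import bisect_right

def binning_index(edges, x):
    """Return the index of the half-open bin [edges[i], edges[i+1]) holding x,
    or None if x is outside the binned range."""
    i = bisect_right(edges, x) - 1
    if i < 0 or i >= len(edges) - 1:
        return None
    return i

# Hypothetical cut "value <= 0.9" approximated with two bins:
# [0.0, 0.9) -> pass (1.0), [0.9, 1.0) -> fail (0.0)
edges = [0.0, 0.9, 1.0]
content = [1.0, 0.0]

def passes_via_binning(x):
    i = binning_index(edges, x)
    return None if i is None else content[i]

def passes_via_comparison(x):
    # The cut as actually intended, with an inclusive upper bound
    return 1.0 if x <= 0.9 else 0.0

for x in (0.5, 0.9):
    print(x, passes_via_binning(x), passes_via_comparison(x))
```

A value sitting exactly on the threshold lands in the upper bin, so the binned lookup disagrees with the intended `<=` comparison at that single point.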

Do you have any other ideas I can try?

Best,
Patin

[1] https://twiki.cern.ch/twiki/bin/view/CMS/JetID13p6TeV

hqucms commented Jan 13, 2025

Any suggestions on this, @nsmith- ?

Collaborator

nsmith- commented Jan 13, 2025

Thanks for the ping!

My first reaction is that it is a bit unfortunate that we are not able to produce NanoAOD frequently enough to keep the JetID variable up to date. It was the intention of NanoAOD to be sufficiently quick to produce that it could always have (at least for data) the latest jet energy corrections. I would hope that JetID is no more frequently changing than the JECs themselves? Even if so, certainly it is worth making a new NanoAOD version. The JetID branch is 1 boolean per event, while if every analysis has to re-compute it, they will need to read and decompress 8+ floating-point branches, significantly amplifying the I/O and slowing down the event processing.

The above point aside, I would indeed say there is no elegant way to describe a selection or a series of cuts in correctionlib. We didn't expect selections to be in scope for this utility, though I can see how it might be used to that effect in some cases. On the one hand, it is better to let analysis tools like RDataFrame and Coffea be aware of the selections so they can potentially perform internal optimizations; on the other hand, it would be nice for users to have a guaranteed-correct implementation of a given selection. We could consider adding a new node type, filter, something like:

class Selection(Model):
    variable: str
    cmp: Literal[">", "<", ">=", "<=", "=="]
    value: float

class Filter(Model):
    nodetype: Literal["filter"]
    inputs: List[str]
    selections: List[Selection]
    true_value: Content
    false_value: Content
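
As a rough illustration of how such a node might be evaluated, here is a standalone sketch in plain Python. It is not correctionlib code: `Content` is simplified to a plain float, the `passes` and `evaluate` methods are hypothetical, and the thresholds are placeholders rather than real JetID values.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List

# Map the proposed comparison strings to Python operators
_OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}

@dataclass
class Selection:
    variable: str
    cmp: str
    value: float

    def passes(self, inputs: Dict[str, float]) -> bool:
        return _OPS[self.cmp](inputs[self.variable], self.value)

@dataclass
class Filter:
    selections: List[Selection]
    true_value: float   # simplification: the proposal allows any Content here
    false_value: float

    def evaluate(self, inputs: Dict[str, float]) -> float:
        # All selections must pass (logical AND)
        ok = all(s.passes(inputs) for s in self.selections)
        return self.true_value if ok else self.false_value

# Placeholder thresholds, for illustration only
node = Filter(
    selections=[Selection("NHF", "<", 0.99), Selection("NEF", "<=", 0.9)],
    true_value=1.0,
    false_value=0.0,
)
print(node.evaluate({"NHF": 0.5, "NEF": 0.9}))   # inclusive bound is honoured
print(node.evaluate({"NHF": 0.99, "NEF": 0.1}))  # strict bound fails at equality
```

Because each comparison operator is explicit, inclusive and strict bounds are handled exactly, which is the case that Binning nodes cannot express.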

@patinkaew
Author

Hi @nsmith-,

Thank you for your reply, and sorry for the delayed response. I think the new node type proposal sounds good. I would suggest the name "Comparison" instead, to be a bit more consistent with the CMSSW string parser, but that is not essential.

For the Filter node, maybe we can add an operator option to take either the logical AND or the logical OR of the Comparison nodes. Since both true_value and false_value can be given, I think the name "Switch" might fit the node better.

class Comparison(Model):
    variable: str
    cmp: Literal[">", "<", ">=", "<=", "==", "!="]
    value: float

class Switch(Model):
    nodetype: Literal["filter"]
    inputs: List[str]
    selections: List[Comparison]
    operator: Literal["and", "or"]
    true_value: Content
    false_value: Content

Do you think you could implement these? Otherwise, I can take a look and try to implement them myself.
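
To show how the operator option and nested true_value/false_value could cover different eta ranges, here is a small standalone sketch in plain Python. It is not correctionlib code: the `evaluate` function, the `abs_eta` input name, and all thresholds are hypothetical placeholders, and Content is reduced to either a float or another Switch.

```python
import operator as op
from dataclasses import dataclass
from typing import Dict, List, Union

_OPS = {">": op.gt, "<": op.lt, ">=": op.ge, "<=": op.le, "==": op.eq, "!=": op.ne}

@dataclass
class Comparison:
    variable: str
    cmp: str
    value: float

    def passes(self, inputs: Dict[str, float]) -> bool:
        return _OPS[self.cmp](inputs[self.variable], self.value)

@dataclass
class Switch:
    selections: List[Comparison]
    operator: str  # "and" or "or"
    true_value: Union[float, "Switch"]
    false_value: Union[float, "Switch"]

def evaluate(node: Union[float, Switch], inputs: Dict[str, float]) -> float:
    # A plain number is a leaf; a Switch picks one branch and recurses into it
    if not isinstance(node, Switch):
        return node
    results = (c.passes(inputs) for c in node.selections)
    ok = all(results) if node.operator == "and" else any(results)
    return evaluate(node.true_value if ok else node.false_value, inputs)

# Placeholder criteria per eta range, for illustration only
central = Switch([Comparison("NHF", "<", 0.99), Comparison("NEF", "<", 0.9)], "and", 1.0, 0.0)
forward = Switch([Comparison("NEF", "<", 0.4)], "and", 1.0, 0.0)
jet_id = Switch([Comparison("abs_eta", "<=", 2.6)], "and", central, forward)

print(evaluate(jet_id, {"abs_eta": 1.0, "NHF": 0.5, "NEF": 0.5}))
print(evaluate(jet_id, {"abs_eta": 3.0, "NHF": 0.999, "NEF": 0.3}))
```

Nesting one Switch inside another selects the eta range first and then applies that range's cuts, which avoids having to chain compound corrections with + as in approach (2).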
