Implementing JetID (logical expressions) in correctionlib #268

patinkaew · 2024-12-10T11:35:20Z

Dear experts,

For the latest NanoAODv15, JME has decided to drop the JetID branches because the JetID criteria are updated more frequently than NanoAOD is produced. Hence, we are trying to implement JetID in a JSON format that is compatible with, and can be used with, correctionlib.

However, I have tried several approaches and none works as expected. I'm currently out of ideas, so I would like to ask for some advice.
The issue is that the JetID criteria [1] are logical expressions, i.e. a logical AND of comparison statements (e.g. NHF<x).

Here is what I tried:
(1) Using a Formula node with TFormula for all criteria. This doesn't work because TFormula recognises only a limited number of variable names (e.g. x, y, z), and the JetID criteria involve more variables than that.

(2) Using a Formula node with TFormula for each comparison statement separately and chaining them in a CorrectionSet. The problem I found is that I cannot chain a compound correction again. For (NHF<x) & (NEF<y) & (MUF<z), I can chain the comparisons with * (effectively a logical AND), but I cannot then chain the results with + (effectively a logical OR) for different eta ranges.

(3) Using a Binning node to implement each comparison and composing them. The problem is that the Binning node uses strict [x, y) intervals (closed on the lower bound and open on the upper bound). This causes misidentification in edge cases, for example when the criterion is <=.
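
The edge case in (3) can be reproduced without correctionlib. The sketch below is plain Python, not the correctionlib API: binned_pass is a hypothetical helper that mimics a two-bin lookup with half-open [low, high) bins, and the 0.9 threshold is only a placeholder, not an actual JetID value.

```python
from bisect import bisect_right

THRESHOLD = 0.9  # placeholder cut value, not a real JetID threshold


def binned_pass(x, edges=(0.0, THRESHOLD, 1.0), content=(True, False)):
    """Mimic a binned lookup with half-open [low, high) bins.

    Bin 0 = [0.0, 0.9) -> pass, bin 1 = [0.9, 1.0) -> fail.
    """
    if not (edges[0] <= x < edges[-1]):
        raise ValueError("x is outside the binning range")
    # bisect_right puts a value equal to an edge into the bin above it.
    return content[bisect_right(edges, x) - 1]


def cut_less_than(x):
    return x < THRESHOLD


def cut_less_equal(x):
    return x <= THRESHOLD


# Compare the binned lookup with the two possible cut definitions.
for x in (0.5, THRESHOLD, 0.95):
    print(x, binned_pass(x), cut_less_than(x), cut_less_equal(x))
```

The binned lookup agrees with a < cut everywhere, but it disagrees with a <= cut for a value sitting exactly on the bin edge.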

Do you have any other ideas I can try?

Best,
Patin

[1] https://twiki.cern.ch/twiki/bin/view/CMS/JetID13p6TeV

hqucms · 2025-01-13T11:03:28Z

Any suggestions on this, @nsmith- ?

nsmith- · 2025-01-13T17:01:53Z

Thanks for the ping!

My first reaction is that it is a bit unfortunate that we are not able to produce NanoAOD frequently enough to keep the JetID variable up to date. The intention was for NanoAOD to be quick enough to produce that it could always have (at least for data) the latest jet energy corrections. I would hope that JetID changes no more frequently than the JECs themselves? Even if it does, it is certainly worth making a new NanoAOD version. The JetID branch is one boolean per event, while if every analysis has to recompute it, they will need to read and decompress 8+ floating-point branches, significantly amplifying the I/O and slowing down the event processing.

The above point aside, I would indeed say there is no elegant way to describe a selection or a series of cuts in correctionlib. We didn't expect selections to be in scope for this utility, though I can see how it might be used to that effect in some cases. On the one hand, it is better to let analysis tools like RDataFrame and Coffea be aware of the selections so they can potentially perform internal optimizations, but on the other hand it would be nice for users to have a guaranteed correct implementation of a given selection. We could consider adding a new node type, filter, something like:

class Selection(Model):
    variable: str
    cmp: Literal[">", "<", ">=", "<=", "=="]
    value: float

class Filter(Model):
    nodetype: Literal["filter"]
    inputs: List[str]
    selections: List[Selection]
    true_value: Content
    false_value: Content
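
To make the intended behaviour concrete, here is a minimal sketch of how such a filter node might be evaluated. It uses plain dataclasses rather than correctionlib's Model and Content, simplifies true_value and false_value to floats, and assumes that all selections must pass (a logical AND), which the proposal above does not spell out. The thresholds are placeholders, not real JetID values.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List

# Map the proposed `cmp` strings to Python comparison functions.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass
class Selection:
    variable: str
    cmp: str
    value: float

    def passes(self, inputs: Dict[str, float]) -> bool:
        # Look up the named input and compare it with the threshold.
        return COMPARATORS[self.cmp](inputs[self.variable], self.value)


@dataclass
class Filter:
    selections: List[Selection]
    true_value: float  # simplified: the proposal allows any Content node
    false_value: float

    def evaluate(self, inputs: Dict[str, float]) -> float:
        # Assumption: every selection must pass (logical AND).
        ok = all(sel.passes(inputs) for sel in self.selections)
        return self.true_value if ok else self.false_value


# Placeholder thresholds, for illustration only.
node = Filter(
    selections=[Selection("NHF", "<", 0.9), Selection("NEF", "<=", 0.9)],
    true_value=1.0,
    false_value=0.0,
)
print(node.evaluate({"NHF": 0.2, "NEF": 0.9}))   # both selections pass
print(node.evaluate({"NHF": 0.95, "NEF": 0.1}))  # NHF selection fails
```

Because the comparison operator is explicit, a <= cut includes the boundary value, which is the case the half-open Binning node cannot express.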

patinkaew · 2025-01-17T16:45:51Z

Hi @nsmith-,

Thank you for your reply, and sorry for the delayed response. I think the new node type proposal sounds good. I would suggest the name "Comparison" instead of "Selection", to be a bit more consistent with the CMSSW string parser, but that is not essential.

For the Filter node, maybe we can add an operator option to choose between the logical AND and the logical OR of the Comparison nodes. Since both true_value and false_value can be given, I think the name Switch might be better for this node.

class Comparison(Model):
    variable: str
    cmp: Literal[">", "<", ">=", "<=", "==", "!="]
    value: float

class Switch(Model):
    nodetype: Literal["filter"]
    inputs: List[str]
    selections: List[Comparison]
    operator: Literal["and", "or"]
    true_value: Content
    false_value: Content
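
As a rough illustration of how nesting could cover the different eta ranges mentioned in the original post, the sketch below evaluates Switch nodes recursively. It is plain Python with dataclasses, not correctionlib code: Content is simplified to "a float or another Switch", the evaluate function is hypothetical, and the variable names and thresholds are placeholders rather than real JetID criteria.

```python
import operator as op
from dataclasses import dataclass
from typing import Dict, List, Union

COMPARATORS = {
    ">": op.gt, "<": op.lt, ">=": op.ge,
    "<=": op.le, "==": op.eq, "!=": op.ne,
}


@dataclass
class Comparison:
    variable: str
    cmp: str
    value: float


@dataclass
class Switch:
    selections: List[Comparison]
    operator: str  # "and" or "or"
    true_value: "Content"
    false_value: "Content"


# Simplified stand-in for correctionlib's Content union.
Content = Union[float, Switch]


def evaluate(node: Content, inputs: Dict[str, float]) -> float:
    """Walk nested Switch nodes until a plain value is reached."""
    if not isinstance(node, Switch):
        return node
    results = (
        COMPARATORS[c.cmp](inputs[c.variable], c.value) for c in node.selections
    )
    combine = all if node.operator == "and" else any
    branch = node.true_value if combine(results) else node.false_value
    return evaluate(branch, inputs)


# Placeholder cuts for two eta regions, for illustration only.
central = Switch(
    [Comparison("NHF", "<", 0.9), Comparison("NEF", "<", 0.9)], "and", 1.0, 0.0
)
forward = Switch([Comparison("NEF", "<", 0.4)], "and", 1.0, 0.0)
# The outer node picks the set of cuts that applies to the jet's |eta|.
jet_id = Switch([Comparison("abseta", "<=", 2.6)], "and", central, forward)

print(evaluate(jet_id, {"abseta": 1.0, "NHF": 0.1, "NEF": 0.1}))
print(evaluate(jet_id, {"abseta": 3.0, "NHF": 0.99, "NEF": 0.1}))
```

With true_value and false_value allowed to hold further nodes, the eta-dependent "or" is expressed by branching on the eta range rather than by summing per-region results.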

Do you think you could implement these? Otherwise, I can take a look and try to implement them myself.
