Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ntuple] Distinguish primary and auxiliary ntuples in RNTupleProcessor::CreateJoin #17881

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

enirolf
Copy link
Contributor

@enirolf enirolf commented Mar 4, 2025

Grouping them all in one collection doesn't really make sense from a semantic point of view. This separation should make the roles of the different ntuples more clear.

A next consideration is the representation of the model(s). Ideally (IMO) there should just be one model passed to this factory that already represents the schema after the join. However, this would put a lot of burden on the user to create a "correct" model, so this requires a bit more thought and will be followed up on later.

@enirolf enirolf self-assigned this Mar 4, 2025
@enirolf enirolf requested review from jblomer and couet as code owners March 4, 2025 15:32
@enirolf enirolf force-pushed the ntuple-processor-join-factory branch from aa7e674 to 113e0e4 Compare March 4, 2025 16:23
Copy link

github-actions bot commented Mar 4, 2025

Test Results

    18 files      18 suites   4d 11h 49m 45s ⏱️
 2 724 tests  2 723 ✅ 0 💤 1 ❌
47 341 runs  47 340 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit 0ce1f45.

♻️ This comment has been updated with latest results.

enirolf added 3 commits March 5, 2025 08:51
... in the `RNTupleProcessor::CreateJoin` factory. This should make the
roles of the different ntuples more clear.
@enirolf enirolf force-pushed the ntuple-processor-join-factory branch from 113e0e4 to 0ce1f45 Compare March 5, 2025 08:00
This, way the `RNTupleJoinConstructor` itself creates a complete
processor, rather than having the `RNTupleProcessor::CreateJoin` factory
be responsible for this.
@enirolf enirolf requested a review from pcanal March 5, 2025 14:45
Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I left some comments inline for discussion.

/// \param[in] primaryNTuple The name and location of the primary RNTuple, which will drive the processor iteration
/// loop.
/// \param[in] auxNTuples The names and locations of the RNTuples to join the primary RNTuple with. Which entries of
/// these auxiliary RNTuples are read is determined by the primary RNTuple, which does not necessarily happens in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a small typo, but I actually wanted to ask a clarification regarding this sentence. What does not necessarily happen in sequential order?

Suggested change
/// these auxiliary RNTuples are read is determined by the primary RNTuple, which does not necessarily happens in
/// these auxiliary RNTuples are read is determined by the primary RNTuple, which does not necessarily happen in

/// \param[in] auxNTUples The source specifications (name and storage location) of the auxiliary RNTuples.
/// \param[in] joinFields The names of the fields on which to join, in case the specified RNTuples are unaligned.
/// The join is made based on the combined join field values, and therefore each field has to be present in each
/// specified RNTuple. If an empty list is provided, it is assumed that the specified ntuple are fully aligned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// specified RNTuple. If an empty list is provided, it is assumed that the specified ntuple are fully aligned.
/// specified RNTuple. If an empty list is provided, it is assumed that the specified ntuples are fully aligned.

RNTupleJoinProcessor(const RNTupleOpenSpec &mainNTuple, std::string_view processorName,
std::unique_ptr<RNTupleModel> model = nullptr);
/// \param[in] models The models that specify which fields should be read by the processor, ordered according to
/// {mainNTuple, auxNTuple[0], ...} The pointer returned by RNTupleModel::MakeField can be used to access a field's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// {mainNTuple, auxNTuple[0], ...} The pointer returned by RNTupleModel::MakeField can be used to access a field's
/// {mainNTuple, auxNTuple[0], ...}. The pointer returned by RNTupleModel::MakeField can be used to access a field's

Comment on lines +533 to +534
/// \param[in] models The models that specify which fields should be read by the processor, ordered according to
/// {mainNTuple, auxNTuple[0], ...} The pointer returned by RNTupleModel::MakeField can be used to access a field's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description of the models parameter is slightly different from other descriptions of the same parameter in other places. Would it be better to homogenise them perhaps?

@@ -19,13 +19,13 @@

namespace {
using ROOT::Experimental::RNTupleOpenSpec;
void EnsureUniqueNTupleNames(const std::vector<RNTupleOpenSpec> &ntuples)
void EnsureUniqueNTupleNames(const RNTupleOpenSpec primaryNTuple, const std::vector<RNTupleOpenSpec> &auxNTuples)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void EnsureUniqueNTupleNames(const RNTupleOpenSpec primaryNTuple, const std::vector<RNTupleOpenSpec> &auxNTuples)
void EnsureUniqueNTupleNames(const RNTupleOpenSpec &primaryNTuple, const std::vector<RNTupleOpenSpec> &auxNTuples)

Comment on lines +169 to +172
innerProcs.push_back(
RNTupleProcessor::CreateJoin({fNTupleNames[0], fFileNames[0]}, {{fNTupleNames[1], fFileNames[1]}}, {}));
innerProcs.push_back(
RNTupleProcessor::CreateJoin({fNTupleNames[0], fFileNames[0]}, {{fNTupleNames[1], fFileNames[1]}}, {"i"}));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this practically chaining together a join composition made of two files that are aligned with another join composition with unaligned events?


auto proc = RNTupleProcessor::CreateJoin(ntuples, {});
// Primary ntuple only
auto proc = RNTupleProcessor::CreateJoin({fNTupleNames[0], fFileNames[0]}, {}, {});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this test was already here before, but looking at it again made me wonder. Is this just a sanity check or do we expect this syntax to be used? On the surface it looks weird that a user calls CreateJoin if there is only one RNTuple.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants