-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ntuple] Distinguish primary and auxiliary ntuples in RNTupleProcessor::CreateJoin
#17881
base: master
Are you sure you want to change the base?
[ntuple] Distinguish primary and auxiliary ntuples in RNTupleProcessor::CreateJoin
#17881
Conversation
aa7e674
to
113e0e4
Compare
Test Results 18 files 18 suites 4d 11h 49m 45s ⏱️ For more details on these failures, see this check. Results for commit 0ce1f45. ♻️ This comment has been updated with latest results. |
... in the `RNTupleProcessor::CreateJoin` factory. This should make the roles of the different ntuples more clear.
113e0e4
to
0ce1f45
Compare
This, way the `RNTupleJoinConstructor` itself creates a complete processor, rather than having the `RNTupleProcessor::CreateJoin` factory be responsible for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I left some comments inline for discussion.
/// \param[in] primaryNTuple The name and location of the primary RNTuple, which will drive the processor iteration | ||
/// loop. | ||
/// \param[in] auxNTuples The names and locations of the RNTuples to join the primary RNTuple with. Which entries of | ||
/// these auxiliary RNTuples are read is determined by the primary RNTuple, which does not necessarily happens in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a small typo, but I actually wanted to ask a clarification regarding this sentence. What does not necessarily happen in sequential order?
/// these auxiliary RNTuples are read is determined by the primary RNTuple, which does not necessarily happens in | |
/// these auxiliary RNTuples are read is determined by the primary RNTuple, which does not necessarily happen in |
/// \param[in] auxNTUples The source specifications (name and storage location) of the auxiliary RNTuples. | ||
/// \param[in] joinFields The names of the fields on which to join, in case the specified RNTuples are unaligned. | ||
/// The join is made based on the combined join field values, and therefore each field has to be present in each | ||
/// specified RNTuple. If an empty list is provided, it is assumed that the specified ntuple are fully aligned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// specified RNTuple. If an empty list is provided, it is assumed that the specified ntuple are fully aligned. | |
/// specified RNTuple. If an empty list is provided, it is assumed that the specified ntuples are fully aligned. |
RNTupleJoinProcessor(const RNTupleOpenSpec &mainNTuple, std::string_view processorName, | ||
std::unique_ptr<RNTupleModel> model = nullptr); | ||
/// \param[in] models The models that specify which fields should be read by the processor, ordered according to | ||
/// {mainNTuple, auxNTuple[0], ...} The pointer returned by RNTupleModel::MakeField can be used to access a field's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// {mainNTuple, auxNTuple[0], ...} The pointer returned by RNTupleModel::MakeField can be used to access a field's | |
/// {mainNTuple, auxNTuple[0], ...}. The pointer returned by RNTupleModel::MakeField can be used to access a field's |
/// \param[in] models The models that specify which fields should be read by the processor, ordered according to | ||
/// {mainNTuple, auxNTuple[0], ...} The pointer returned by RNTupleModel::MakeField can be used to access a field's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description of the models
parameter is slightly different from other descriptions of the same parameter in other places. Would it be better to homogenise them perhaps?
@@ -19,13 +19,13 @@ | |||
|
|||
namespace { | |||
using ROOT::Experimental::RNTupleOpenSpec; | |||
void EnsureUniqueNTupleNames(const std::vector<RNTupleOpenSpec> &ntuples) | |||
void EnsureUniqueNTupleNames(const RNTupleOpenSpec primaryNTuple, const std::vector<RNTupleOpenSpec> &auxNTuples) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void EnsureUniqueNTupleNames(const RNTupleOpenSpec primaryNTuple, const std::vector<RNTupleOpenSpec> &auxNTuples) | |
void EnsureUniqueNTupleNames(const RNTupleOpenSpec &primaryNTuple, const std::vector<RNTupleOpenSpec> &auxNTuples) |
innerProcs.push_back( | ||
RNTupleProcessor::CreateJoin({fNTupleNames[0], fFileNames[0]}, {{fNTupleNames[1], fFileNames[1]}}, {})); | ||
innerProcs.push_back( | ||
RNTupleProcessor::CreateJoin({fNTupleNames[0], fFileNames[0]}, {{fNTupleNames[1], fFileNames[1]}}, {"i"})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this practically chaining together a join composition made of two files that are aligned with another join composition with unaligned events?
|
||
auto proc = RNTupleProcessor::CreateJoin(ntuples, {}); | ||
// Primary ntuple only | ||
auto proc = RNTupleProcessor::CreateJoin({fNTupleNames[0], fFileNames[0]}, {}, {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this test was already here before, but looking at it again made me wonder. Is this just a sanity check or do we expect this syntax to be used? On the surface it looks weird that a user calls CreateJoin
if there is only one RNTuple.
Grouping them all in one collection doesn't really make sense from a semantic point of view. This separation should make the roles of the different ntuples more clear.
A next consideration is the representation of the model(s). Ideally (IMO) there should just be one model passed to this factory that already represents the schema after the join. However, this would put a lot of burden on the user to create a "correct" model, so this requires a bit more thought and will be followed up on later.