Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prov/efa: Distinguish max RMA size from max message transfer size #10082

Merged
merged 6 commits into from
Jun 19, 2024

Conversation

darrylabbate
Copy link
Member

This also adds getopt/setopt logic for the new supported endpoint options, as well as some validation logic for the len arguments to applicable operations.

@darrylabbate darrylabbate requested a review from a team June 12, 2024 23:29
Copy link
Contributor

@shijin-aws shijin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, I am not sure whether we have to validate the size every time during transmission calls.

@darrylabbate
Copy link
Member Author

I am not sure whether we have to validate the size every time during transmission calls.

I'll take another look at this - could be superfluous

@darrylabbate darrylabbate force-pushed the efa/ep-size-opts branch 2 times, most recently from d756592 to 01fb587 Compare June 13, 2024 22:57
@darrylabbate darrylabbate requested a review from shijin-aws June 13, 2024 23:05
if (len > efa_rdm_ep->inject_size) {
EFA_WARN(FI_LOG_CQ, "invalid message size %ld for inject.\n", len);
return -FI_EINVAL;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need an assert here right?

Copy link
Member Author

@darrylabbate darrylabbate Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I omitted these since the documentation for len in fi_tagged(3) doesn't have the same qualifier re: valid lengths, as in fi_msg(3) and fi_rma(3). This could easily be an oversight since the documentation was written ~10 years ago and never changed. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's clearly an oversight in the documentation (cc @j-xiong ). You can add assert and update the man page

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

if (len > efa_rdm_ep->inject_size) {
EFA_WARN(FI_LOG_CQ, "invalid message size %ld for inject.\n", len);
return -FI_EINVAL;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need an assert

@@ -978,6 +966,7 @@ ssize_t efa_rdm_msg_recv(struct fid_ep *ep_fid, void *buf, size_t len,
struct efa_rdm_ep *ep;

ep = container_of(ep_fid, struct efa_rdm_ep, base_ep.util_ep.ep_fid.fid);
assert(len <= ep->max_msg_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to apply max_msg_size for recv. Though the documentation doesn't say that, I think it's totally fine that user post a bigger recv buffer than the incoming msg size, i.e. the max_msg_size only applies to transmission.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise for RMA read?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RMA read is still a transmission, so the limit applies.

@shijin-aws
Copy link
Contributor

shijin-aws commented Jun 14, 2024

I think we also need to revert 642fc3a as part of this PR

@darrylabbate darrylabbate force-pushed the efa/ep-size-opts branch 2 times, most recently from d0f5652 to 1282b93 Compare June 14, 2024 18:58
@darrylabbate
Copy link
Member Author

Unit test failure after (under) reverting 642fc3a. Likely due to other unit tests utilizing the ability to specify max_msg_size via hints in 2be8604.

@shijin-aws Are we still happy with the revert of 642fc3a? Reading through #9388, it seems like it was not hastily merged. If so, I can probably alter the unit test logic to use fi_setopt() instead.

@shijin-aws
Copy link
Contributor

shijin-aws commented Jun 15, 2024

Ah yes, I think it is caused by these tests https://github.com/ofiwg/libfabric/blob/main/prov/efa/test/efa_unit_test_ep.c#L646-L709. In order to fix it, you need to make it use fi_setopt to set max_msg_size.

New opts: FI_OPT_MAX_MSG_SIZE, FI_OPT_INJECT_MSG_SIZE

Signed-off-by: Darryl Abbate <[email protected]>
This only affects unit tests

Signed-off-by: Darryl Abbate <[email protected]>
@shijin-aws shijin-aws merged commit ad85f72 into ofiwg:main Jun 19, 2024
13 checks passed
@darrylabbate darrylabbate deleted the efa/ep-size-opts branch June 19, 2024 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants