prov/efa: Make runt_size aligned #9626

Merged
merged 1 commit into from
Dec 6, 2023

shijin-aws
Contributor

Currently, txe->runt_size can be a non-multiple of the memory alignment, which causes the following issues:

  1. The size of the data copied on the receiver side is not a multiple of the memory alignment. This not only makes the data copy (gdrcopy or local read) slow, but also breaks the LL128 protocol for send/recv, which requires the copied data size to be a multiple of 128 (the memory alignment in this case).

  2. The single_pkt_entry_data_size variable in efa_rdm_ope_prepare_to_post_send() becomes 0 after the alignment trim.

This patch always aligns the runt size before we decide whether to use the runting read protocol. If the aligned runt size is 0, we won't do a runting read.

Also added a series of unit tests to validate this change.
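
As a rough illustration of the change described above, here is a minimal sketch of an aligned runt size computation. The helper names (sketch_align_down, sketch_runt_size) and their parameters are hypothetical and are not the efa provider's actual functions; the sketch also assumes the alignment is a power of two.

```c
#include <stddef.h>

/* Round size down to a multiple of alignment (assumed to be a power of two). */
static size_t sketch_align_down(size_t size, size_t alignment)
{
	return size & ~(alignment - 1);
}

/*
 * Hypothetical stand-in for the runt size decision: take the runt budget
 * still available for the peer, cap it by the message length, then trim
 * it to the memory alignment. A result of 0 means "do not use the
 * runting read protocol".
 */
static size_t sketch_runt_size(size_t total_runt_size, size_t bytes_in_flight,
			       size_t total_len, size_t alignment)
{
	size_t budget = total_runt_size > bytes_in_flight ?
			total_runt_size - bytes_in_flight : 0;
	size_t runt = budget < total_len ? budget : total_len;

	return sketch_align_down(runt, alignment);
}
```

With a 64-byte alignment, sketch_runt_size(16384, 10000, 12000, 64) gives 6336. With an 8-byte alignment, sketch_runt_size(1004, 1000, 12000, 8) gives 0, so no runting read would be attempted.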

@shijin-aws shijin-aws requested a review from a team December 6, 2023 03:13

size_t efa_rdm_ep_get_memory_alignment(struct efa_rdm_ep *ep, enum fi_hmem_iface iface)
{
size_t memory_alignment = 8;
Contributor

Can we make the 8 also a macro EFA_RDM_DEFAULT_MEMORY_ALIGNMENT similar to the other ones?

Contributor Author

Done.
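
A sketch of what this suggestion might look like. EFA_RDM_DEFAULT_MEMORY_ALIGNMENT is the name proposed in the comment above and 8 is the value from the diff; the second macro, the enum, and the function are made up for illustration (the 128 value comes from the LL128 requirement in the PR description) and do not reflect the provider's real definitions.

```c
#include <stddef.h>

#define EFA_RDM_DEFAULT_MEMORY_ALIGNMENT 8	/* name proposed in review */
#define SKETCH_LL128_MEMORY_ALIGNMENT 128	/* hypothetical macro name */

/* Hypothetical stand-in for the memory interface type. */
enum sketch_iface {
	SKETCH_IFACE_HOST,
	SKETCH_IFACE_LL128,	/* device memory using the LL128 protocol */
};

/* Pick the alignment for an interface, falling back to the default. */
static size_t sketch_get_memory_alignment(enum sketch_iface iface)
{
	size_t memory_alignment = EFA_RDM_DEFAULT_MEMORY_ALIGNMENT;

	if (iface == SKETCH_IFACE_LL128)
		memory_alignment = SKETCH_LL128_MEMORY_ALIGNMENT;

	return memory_alignment;
}
```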

mock_txe.desc[0] = &mock_mr;

runt_size = efa_rdm_peer_get_runt_size(peer, efa_rdm_ep, &mock_txe);
printf("runt_size: %lu, expected size: %lu\n", runt_size, expected_runt_size);
Contributor

Forgot to delete

Contributor Author

@shijin-aws shijin-aws Dec 6, 2023

Good catch! I told myself 100 times to delete this line but I still forgot.

/* each packet must be aligned */
/*
* Each packet must be aligned.
* single_pkt_entry_data_size & ~(memory_alignment - 1) is 0 when total_pkt_entry_data_size
Contributor

single_pkt_entry_data_size & ~(memory_alignment - 1) is 0 when total_pkt_entry_data_size is smaller than memory_alignment

I think this is a typo; it should read: single_pkt_entry_data_size & ~(memory_alignment - 1) is 0 when single_pkt_entry_data_size is smaller than memory_alignment

Then this comment does not help... I think the assertion on L542 should be sufficient.

Contributor Author

Thanks, removed the unnecessary comment
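
To make the corrected statement concrete, here is a small sketch of the alignment trim. The helper name is hypothetical, and memory_alignment is assumed to be a power of two.

```c
#include <stddef.h>

/* Trim a per-packet data size down to a multiple of memory_alignment. */
static size_t sketch_trim_to_alignment(size_t single_pkt_entry_data_size,
				       size_t memory_alignment)
{
	return single_pkt_entry_data_size & ~(memory_alignment - 1);
}
```

With memory_alignment = 128, a size of 100 trims to 0, a size of 128 stays 128, and a size of 300 trims to 256. Only sizes below the alignment collapse to 0, which is the case the assertion guards against.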

mock_txe.addr = addr;
mock_txe.iov_count = 1;
mock_txe.iov[0].iov_base = NULL;
mock_txe.iov[0].iov_len = 9000;
Contributor

mock_txe.iov[0].iov_len = total_len or simply skip initializing mock_txe.iov

Same in test_efa_rdm_peer_select_readbase_rtm_impl

Contributor Author

Done

msg_length = 12000;
peer_num_runt_bytes_in_flight = 10000;
total_runt_size = 16384;
/* 16384 - 10000 is smaller than 12000 (total_len), runt size must be (16384 - 10000) // 64 * 64 = 11968 */
Contributor

(16384 - 10000) // 64 * 64 = 6336

Contributor Author

Thank you! Fixed.

txe->bytes_runt = MIN(hmem_info->runt_size - peer->num_runt_bytes_in_flight, txe->total_len);
txe->bytes_runt = efa_rdm_peer_get_runt_size(peer, ep, txe);

assert(txe->bytes_runt);
Contributor

You have an assert on txe->bytes_runt but the value could be zero

Contributor

Oh never mind. If efa_rdm_peer_get_runt_size returns 0, then we go to long read. There cannot be a race condition because of the srx lock.

Contributor Author

Yes, if the runt size is 0, we won't reach this function, so this assert is valid.
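
The control flow discussed in this thread can be sketched roughly as follows. The enum, the function names, and the shape of the selection logic are assumptions for illustration, not the provider's actual code. The point is only that the runting path is entered after the aligned runt size has been checked to be nonzero, which is why the assert holds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol choices for a large message. */
enum sketch_protocol {
	SKETCH_LONGREAD,
	SKETCH_RUNTREAD,
};

/* Choose the protocol: a zero aligned runt size falls back to long read. */
static enum sketch_protocol sketch_select_protocol(size_t aligned_runt_size)
{
	return aligned_runt_size > 0 ? SKETCH_RUNTREAD : SKETCH_LONGREAD;
}

/* Only meant to be called on the runting path, so bytes_runt is nonzero. */
static size_t sketch_prepare_runt(size_t aligned_runt_size)
{
	size_t bytes_runt = aligned_runt_size;

	assert(bytes_runt);
	return bytes_runt;
}
```

A caller would invoke sketch_prepare_runt() only when sketch_select_protocol() returned SKETCH_RUNTREAD, so the assert never sees a zero value.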

msg_length = 12000;
peer_num_runt_bytes_in_flight = 1000;
total_runt_size = 1004;
/* 1048 - 1000 is smaller than host memory alignment, runt size must be 0 */
Contributor

Comment says 1048 but runt size is 1004

Contributor Author

Thank you! Fixed

Signed-off-by: Shi Jin <[email protected]>
@shijin-aws shijin-aws merged commit 1604521 into ofiwg:main Dec 6, 2023
8 checks passed
3 participants