prov/efa: Make runt_size aligned #9626
Conversation
prov/efa/src/rdm/efa_rdm_ep_utils.c
Outdated
size_t efa_rdm_ep_get_memory_alignment(struct efa_rdm_ep *ep, enum fi_hmem_iface iface)
{
	size_t memory_alignment = 8;
Can we make the 8 a macro as well, e.g. EFA_RDM_DEFAULT_MEMORY_ALIGNMENT, similar to the other ones?
Done.
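For readers following the thread, the suggestion amounts to something like the sketch below. Only EFA_RDM_DEFAULT_MEMORY_ALIGNMENT and the value 8 come from this review; the other macro names, the enum, and the function signature are hypothetical stand-ins, and the 64 and 128 values are assumed from the unit-test arithmetic and the LL128 requirement mentioned in the PR description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Name proposed in the review; the value 8 is the literal from the diff. */
#define EFA_RDM_DEFAULT_MEMORY_ALIGNMENT 8
/* Hypothetical names; values assumed from the test comments and LL128. */
#define SKETCH_DEVICE_MEMORY_ALIGNMENT   64
#define SKETCH_IN_ORDER_ALIGNMENT        128

/* Hypothetical stand-in for enum fi_hmem_iface. */
enum sketch_iface { SKETCH_IFACE_HOST, SKETCH_IFACE_DEVICE };

/* Pick the alignment: start from the default, then override for special cases. */
static size_t sketch_get_memory_alignment(bool in_order_128, enum sketch_iface iface)
{
	size_t memory_alignment = EFA_RDM_DEFAULT_MEMORY_ALIGNMENT;

	if (in_order_128)
		memory_alignment = SKETCH_IN_ORDER_ALIGNMENT;
	else if (iface == SKETCH_IFACE_DEVICE)
		memory_alignment = SKETCH_DEVICE_MEMORY_ALIGNMENT;

	return memory_alignment;
}
```

With the literal replaced by a named macro, every alignment value is defined in one place and the function body contains no magic numbers.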
prov/efa/test/efa_unit_test_runt.c
Outdated
mock_txe.desc[0] = &mock_mr;

runt_size = efa_rdm_peer_get_runt_size(peer, efa_rdm_ep, &mock_txe);
printf("runt_size: %lu, expected size: %lu\n", runt_size, expected_runt_size);
Forgot to delete
Good catch! I told myself 100 times to delete this line but I still forgot.
prov/efa/src/rdm/efa_rdm_ope.c
Outdated
/* each packet must be aligned */
/*
 * Each packet must be aligned.
 * single_pkt_entry_data_size & ~(memory_alignment - 1) is 0 when total_pkt_entry_data_size
The comment reads: "single_pkt_entry_data_size & ~(memory_alignment - 1) is 0 when total_pkt_entry_data_size is smaller than memory_alignment".
I think this is a typo; it should be: "single_pkt_entry_data_size & ~(memory_alignment - 1) is 0 when single_pkt_entry_data_size is smaller than memory_alignment".
But then this comment does not help. I think the assertion on L542 should be sufficient.
Thanks, removed the unnecessary comment
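As a side note on the expression discussed in this thread, here is a minimal sketch of the mask-based trim. The helper name is hypothetical, and it assumes the alignment is a power of two, which holds for the values 8, 64 and 128 that appear in this PR.

```c
#include <stddef.h>

/*
 * Round size down to a multiple of alignment.
 * Assumes alignment is a power of two, so (alignment - 1) is a mask
 * of the low bits and clearing them yields the aligned value.
 */
static size_t sketch_align_down(size_t size, size_t alignment)
{
	return size & ~(alignment - 1);
}
```

Any size smaller than the alignment has only low bits set, so the result is 0. That is the case the removed comment tried to describe, and the case the assertion guards against.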
prov/efa/test/efa_unit_test_runt.c
Outdated
mock_txe.addr = addr;
mock_txe.iov_count = 1;
mock_txe.iov[0].iov_base = NULL;
mock_txe.iov[0].iov_len = 9000;
Use mock_txe.iov[0].iov_len = total_len, or simply skip initializing mock_txe.iov.
Same in test_efa_rdm_peer_select_readbase_rtm_impl.
Done
prov/efa/test/efa_unit_test_runt.c
Outdated
msg_length = 12000;
peer_num_runt_bytes_in_flight = 10000;
total_runt_size = 16384;
/* 16384 - 10000 is smaller than 12000 (total_len), runt size must be (16384 - 10000) // 64 * 64 = 11968 */
(16384 - 10000) // 64 * 64 = 6336
Thank you! Fixed.
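The corrected arithmetic can be checked with a small sketch of the computation the test exercises. This is an illustration, not the provider's efa_rdm_peer_get_runt_size(): the function name and flat parameter list are hypothetical, and the formula is assumed from the MIN(...) expression in the diff plus the align-down step described in the test comment.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the aligned runt size computation:
 * take the remaining runt budget, cap it at the message length,
 * then round down to a multiple of the memory alignment.
 */
static size_t sketch_get_runt_size(size_t total_runt_size, size_t bytes_in_flight,
				   size_t total_len, size_t alignment)
{
	size_t budget, runt;

	/* Assumed guard: no budget left means no runt bytes. */
	if (bytes_in_flight >= total_runt_size)
		return 0;

	budget = total_runt_size - bytes_in_flight;
	runt = budget < total_len ? budget : total_len;

	return runt / alignment * alignment;
}
```

With total_runt_size = 16384, 10000 bytes in flight, a 12000-byte message and an alignment of 64, the budget is 6384, which rounds down to 99 * 64 = 6336.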
txe->bytes_runt = MIN(hmem_info->runt_size - peer->num_runt_bytes_in_flight, txe->total_len);
txe->bytes_runt = efa_rdm_peer_get_runt_size(peer, ep, txe);

assert(txe->bytes_runt);
You have an assert on txe->bytes_runt but the value could be zero
Oh never mind. If efa_rdm_peer_get_runt_size returns 0, then we go to long read. There cannot be a race condition because of the srx lock.
Yes, if the runt size is 0, we won't reach this function, so this assert is valid.
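The reasoning in this thread can be sketched as follows. All names are hypothetical; the sketch only assumes what the thread states, namely that a zero aligned runt size selects long read, so the runting-read path never sees zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol tags for illustration only. */
enum sketch_protocol { SKETCH_LONG_READ, SKETCH_RUNTING_READ };

/* Protocol selection: an aligned runt size of 0 falls back to long read. */
static enum sketch_protocol sketch_select_protocol(size_t aligned_runt_size)
{
	return aligned_runt_size == 0 ? SKETCH_LONG_READ : SKETCH_RUNTING_READ;
}

/*
 * Reached only after runting read was selected, so the assert holds
 * as long as the runt size cannot change between selection and use
 * (the thread attributes that guarantee to the srx lock).
 */
static size_t sketch_prepare_runting_read(size_t aligned_runt_size)
{
	assert(aligned_runt_size);
	return aligned_runt_size;
}
```

The assert documents an invariant established by the caller; it does not handle a runtime condition.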
prov/efa/test/efa_unit_test_runt.c
Outdated
msg_length = 12000;
peer_num_runt_bytes_in_flight = 1000;
total_runt_size = 1004;
/* 1048 - 1000 is smaller than host memory alignment, runt size must be 0 */
The comment says 1048, but total_runt_size is 1004
Thank you! Fixed
Currently, txe->runt_size can be a non-multiple of the memory alignment, which causes the following issues:
1. The data size copied on the receiver side is not a multiple of the memory alignment. This not only makes the data copy (gdrcopy or local read) less performant, but also breaks the LL128 protocol for send/recv, which requires the copied data size to be a multiple of 128 (the memory alignment in this case).
2. The single_pkt_entry_data_size variable in efa_rdm_ope_prepare_to_post_send() becomes 0 after the alignment trim.
This patch makes the runt size always aligned before we decide whether to use the runting read protocol. If the aligned runt size is 0, we won't do runting read.
Also added a series of unit tests to validate this change.
Signed-off-by: Shi Jin <[email protected]>