Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prov/efa: Add new efa-direct fi_info objects #10735

Merged
merged 3 commits into from
Feb 7, 2025

Conversation

sunkuamzn
Copy link
Contributor

efa-direct is a new provider that provides direct access to the EFA
device with minimal overhead and features. It does not support SAS
ordering, tagged messaging, atomics. It does not have copy based
protocols and cannot support sending/receiving large message sizes
and HMEM devices that don't have p2p support.

The functionality provided by the efa-direct provider is a subset of the
efa rdm provider. When applications do not require capabilities only
supported by the efa rdm provider, it is more performant for them to use
the efa-direct provider. Accordingly, the efa-direct fi_info objects are
returned before the efa rdm fi_info objects in the fi_getinfo call.

@sunkuamzn sunkuamzn requested a review from a team January 28, 2025 00:48
Copy link
Contributor

@shijin-aws shijin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My high level comment is that I don't think efa direct should be a separate ep type. I understand you want to use this to simplify your branching. But the branching should be on prov_name instead of ep type. I thought our plan is to make efa direct cover both dgram and rdm ep type. But that will break existing dgram user which use efa as prov_name. I don't know if this user actually exists, but to avoid the risk I am ok to make dgram ep type stay in efa and make efa-direct only cover rdm.

prov/efa/src/efa_base_ep.h Outdated Show resolved Hide resolved
prov/efa/src/efa_base_ep.h Outdated Show resolved Hide resolved
prov/efa/src/efa_device.h Outdated Show resolved Hide resolved
prov/efa/src/efa_domain.c Outdated Show resolved Hide resolved
prov/efa/src/efa_domain.h Outdated Show resolved Hide resolved
prov/efa/src/efa_prov_info.c Outdated Show resolved Hide resolved
prov/efa/src/efa_prov_info.c Outdated Show resolved Hide resolved
prov/efa/src/efa_prov_info.c Outdated Show resolved Hide resolved
prov/efa/src/efa_prov_info.c Outdated Show resolved Hide resolved
prov/efa/src/efa_user_info.c Show resolved Hide resolved
@sunkuamzn sunkuamzn force-pushed the efa-direct-prov branch 3 times, most recently from 2437ba1 to 470cd6d Compare January 28, 2025 17:35
prov/efa/src/efa_prov_info.c Outdated Show resolved Hide resolved
prov/efa/src/efa_prov_info.c Outdated Show resolved Hide resolved
prov/efa/src/efa_prov.c Outdated Show resolved Hide resolved
prov/efa/src/efa_prov.c Outdated Show resolved Hide resolved
prov/efa/src/efa_prov.c Outdated Show resolved Hide resolved
prov/efa/src/efa_fabric.c Outdated Show resolved Hide resolved
Comment on lines 45 to 46
#define EFA_INFO_TYPE_IS_RDM(_info) \
(_info && _info->ep_attr && (_info->ep_attr->type == FI_EP_RDM) && !strcasecmp(_info->fabric_attr->name, EFA_PROV_NAME))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the linear string comparisons necessary? These are already pretty gnarly chained conditionals.

If the fi_info.fabric_attr.name is truly the only way to disambiguate efa-rdm from efa-direct (I.e. we can't add some other enum field elsewhere), I'd like to explore employing constant-time comparison schemes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is strncmp with the length of the longer string good enough? These macros are only used in the fi_getinfo and fi_domain calls, so they are not performance critical.

prov/efa/src/efa_base_ep.h Outdated Show resolved Hide resolved
prov/efa/src/efa_fabric.c Outdated Show resolved Hide resolved
prov/efa/src/efa_prov_info.c Outdated Show resolved Hide resolved
prov/efa/src/efa_user_info.c Outdated Show resolved Hide resolved
prov/efa/src/efa_base_ep.h Outdated Show resolved Hide resolved
@sunkuamzn sunkuamzn force-pushed the efa-direct-prov branch 5 times, most recently from 41129e0 to 0d73582 Compare January 30, 2025 17:34
@sunkuamzn sunkuamzn changed the title core, prov/efa: Add new efa-direct provider prov/efa: Add new efa-direct provider Jan 30, 2025
@sunkuamzn sunkuamzn changed the title prov/efa: Add new efa-direct provider prov/efa: Add new efa-direct fi_info objects Jan 30, 2025
shijin-aws
shijin-aws previously approved these changes Jan 30, 2025
Copy link
Contributor

@shijin-aws shijin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, thank you! Only a few nits.

prov/efa/src/efa_base_ep.h Outdated Show resolved Hide resolved
prov/efa/src/efa_prov_info.c Outdated Show resolved Hide resolved
prov/efa/test/efa_unit_test_common.c Outdated Show resolved Hide resolved
@shijin-aws
Copy link
Contributor

AWS CI failure seems to still to be some tests are not using -f efa

@shijin-aws
Copy link
Contributor

I still hit bad qpn error when I only test rdm_pingpong with 1K... which means there is something wrong. Need to address this before can merge the PR

libfabric/fabtests$ FI_LOG_LEVEL=warn fi_rma_bw -p efa -f efa-direct -E -S 1024 lo
calhost
libfabric:1266354:1738371468::efa:cq:efa_cq_handle_error():100<warn> err: 107, message: Bad queue pair (QP) number (QP does not exist or is in error state) My EFA addr: fi_addr_efa://[fe80::86f:e0ff:fe87:ae7b]:1:574304512 My host id: i-0be65d40744bbb17b Peer EFA addr: fi_addr_efa://[fe80::86f:e0ff:fe87:ae7b]:0:0 Peer host id: N/A (9)
libfabric:1266354:1738371468::efa:cq:efa_show_help():100<warn> This error is detected remotely; typically encountered when the peer process is no longer present
[error] fabtests:common/shared.c:3043: cq_readerr 107 (Transport endpoint is not connected), provider errno: 9 (Bad queue pair (QP) number (QP does not exist or is in error state) My EFA addr: fi_addr_efa://[fe80::86f:e0ff:fe87:ae7b]:1:574304512 My host id: i-0be65d40744bbb17b Peer EFA addr: fi_addr_efa://[fe80::86f:e0ff:fe87:ae7b]:0:0 Peer host id: N/A)

@shijin-aws
Copy link
Contributor

shijin-aws commented Feb 1, 2025

I would carefully review int efa_prov_info_alloc_for_rdm(struct fi_info **prov_info_rdm_ptr, struct efa_device *device) to see whether we miss any configuration that should be supported for efa-direct as well.
Or just have another efa_prov_info_alloc_for_direct for similar usage.

prov/efa/src/efa_av.c Outdated Show resolved Hide resolved
prov/efa/src/efa_prov.c Outdated Show resolved Hide resolved
prov/efa/src/efa_prov_info.c Outdated Show resolved Hide resolved
prov/efa/src/efa_prov_info.c Show resolved Hide resolved
prov/efa/src/efa_prov_info.c Show resolved Hide resolved
prov/efa/src/efa_prov_info.c Show resolved Hide resolved
prov/efa/src/efa_user_info.c Outdated Show resolved Hide resolved
prov/efa/src/efa_prov_info.c Show resolved Hide resolved
@sunkuamzn sunkuamzn force-pushed the efa-direct-prov branch 4 times, most recently from 46210fe to 300556d Compare February 5, 2025 23:15
}

/* efa-direct path requires FI_CONTEXT2 */
prov_info->tx_attr->mode |= FI_CONTEXT2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both dgram and direct requires FI_CONTEXT2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

shijin-aws
shijin-aws previously approved these changes Feb 5, 2025
@@ -251,15 +217,29 @@ void efa_prov_info_set_tx_rx_attr(struct fi_info *prov_info,
struct efa_device *device,
enum fi_ep_type ep_type)
{
/* efa-direct and DGRAM paths require FI_CONTEXT2 */
prov_info->tx_attr->mode |= FI_CONTEXT2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mode will be washed by the change in L225....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to move to the place after the if block

@shijin-aws shijin-aws self-requested a review February 6, 2025 02:48
efa-direct is a new path that provides direct access to the EFA
device with minimal overhead and features. It does not support SAS
ordering, tagged messaging, atomics. It does not have  copy based
protocols and cannot support sending/receiving large message sizes
and HMEM devices that don't have p2p support.

The functionality provided by the efa-direct path is a subset of the
efa rdm path. When applications do not require  capabilities only
supported by the efa rdm path, it is more performant for them to use
the efa-direct path. Accordingly, the efa-direct fi_info objects are
returned before the efa rdm fi_info objects in the fi_getinfo call.

Signed-off-by: Sai Sunku <[email protected]>
Previously, device->rdm_info->ep_attr->max_msg_size would always be the
max message size for FI_MSG operations (corresponding to the device MTU
size). With efa direct and related changes, max_msg_size can be the
maximum RMA size supported by the device which is 1GB.

So use device->ibv_port_attr.max_msg_sz instead when MTU size is
required.

Signed-off-by: Sai Sunku <[email protected]>
Remove temporary hacks added previously to test the efa-direct code path
and add new tests

Signed-off-by: Sai Sunku <[email protected]>
@shijin-aws
Copy link
Contributor

bot:aws:retest

return;
}
}
enable_hmem = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this variable enable_hmem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. To catch the case when HAVE_CUDA is defined but ofi_hmem_is_initialized(FI_HMEM_CUDA) is false. That happens when you compile with CUDA but run without it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems better to do an early return forif !ofi_hmem_is_initialized, then you do not need this enable_hmem.

return;
}

EFA_HMEM_IFACE_FOREACH(i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check for system here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No but FI_HMEM_SYSTEM will never fail the check below, so I don't think it's worth skipping FI_HMEM_SYSTEM explicitly

info->initialized = true;
if (iface == FI_HMEM_SYNAPSEAI || iface == FI_HMEM_SYSTEM) {
info->p2p_supported_by_device = true;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a EFA_HMEM_IFACE_FOREACH_NON_SYSTEM

@sunkuamzn
Copy link
Contributor Author

bot:aws:retest

@shijin-aws shijin-aws merged commit af4a3eb into ofiwg:main Feb 7, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants