
Usage Guide

tim dettmar edited this page Sep 9, 2021 · 1 revision

DPU-MPI Usage Guide

Using DPU-MPI is relatively straightforward.
At a minimum, you will need to include the following:

#include "dpulib.h" // Always required
#include "get_ip.h" // IP utilities

After initializing MPI as usual, you’ll need to initialize the DPU connection.
The example below reads the DPU IP address offset from the first command-line argument. If you already know the offset, you can pass it directly as the second argument to offset_addr instead. The first argument to offset_addr is your interface name, such as ib0_mlx5.
The port is currently fixed at 9999. The third argument to DPU_Init, 32, is the maximum number of simultaneous pending operations to support.

char *ip = offset_addr("ib0_mlx5", atoi(argv[1]));
char *port = "9999";

DPUContext *ctx = DPU_Init(ip, port, 32);
if (!ctx)
{
    // Some error occurred
    return 1;
}
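
The snippet above passes argv[1] straight to atoi, which cannot report a missing or malformed argument. A minimal sketch of a stricter parser follows. The helper name parse_offset is hypothetical and not part of DPU-MPI, and it assumes the offset is a non-negative integer.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of DPU-MPI): parse a non-negative
 * integer offset from a command-line string.
 * Returns 0 and stores the value in *out on success, -1 on any error. */
static int parse_offset(const char *s, int *out)
{
    char *end = NULL;
    long v;

    if (s == NULL || *s == '\0')
        return -1;              /* missing or empty argument */

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;              /* overflow or trailing non-digit characters */
    if (v < 0 || v > INT_MAX)
        return -1;              /* assumption: offsets are non-negative ints */

    *out = (int)v;
    return 0;
}
```

In main, you would check argc first, call parse_offset(argv[1], &offset), print a usage message and return on failure, and only then pass offset to offset_addr.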

Then, you can call the DPU_MPI_Ialltoall function almost like MPI_Ialltoall.
Memory registration and the transfer itself happen inside this function, so there is no need to register any memory regions (MRs) before calling it.

int index = DPU_MPI_Ialltoall(
    ctx, sndbuf, 1, MPI_UINT32_T,
    rcvbuf, 1, MPI_UINT32_T, worldsize );
if (index < 0)
{
    fprintf(stderr, "Bad response.\n");
    return 1;
}
int cookie1 = get_cookie(ctx, index);

ctx is the DPU context created earlier.
sndbuf and rcvbuf are the send and receive buffers.
MPI_UINT32_T is the datatype, and the two 1s are the send and receive counts, in that order.
worldsize should be set to the communicator size reported by MPI_Comm_size for MPI_COMM_WORLD.
The return value is an index into the queue that can be used to check your job status; the example treats a negative value as an error. Pass the index to get_cookie() to obtain a cookie, which can then be used to check the job status just like an MPI_Request object.
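
The call above does not show how sndbuf and rcvbuf are sized. With MPI_Ialltoall, each buffer holds count elements per rank, so worldsize * count elements in total. The sketch below assumes DPU_MPI_Ialltoall follows the same layout, which this guide does not confirm. The helper names alloc_alltoall_buf and fill_sndbuf are hypothetical and not part of DPU-MPI.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zeroed all-to-all buffer holding
 * `count` uint32_t elements for each of `worldsize` ranks.
 * Returns NULL on bad arguments or allocation failure. */
static uint32_t *alloc_alltoall_buf(int worldsize, int count)
{
    if (worldsize <= 0 || count <= 0)
        return NULL;
    return calloc((size_t)worldsize * (size_t)count, sizeof(uint32_t));
}

/* Hypothetical helper: fill the send buffer so that the block destined
 * for rank `dst` carries a value encoding (sender rank, destination). */
static void fill_sndbuf(uint32_t *sndbuf, int rank, int worldsize, int count)
{
    for (int dst = 0; dst < worldsize; dst++)
        for (int k = 0; k < count; k++)
            sndbuf[(size_t)dst * (size_t)count + (size_t)k] =
                (uint32_t)(rank * worldsize + dst);
}
```

With count set to 1, as in the call above, each buffer holds exactly worldsize elements. Remember to free both buffers once the operation has completed.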

The library offers a few ways to check the job status:
DPU_MPI_Wait, which is analogous to MPI_Wait,
DPU_MPI_Test, which is analogous to MPI_Test,
and the lower-level DPU_MPI_Poll and DPU_MPI_Longpoll, which poll the InfiniBand completion queue (CQ) using ibv_poll_cq and ibv_req_notify_cq respectively.
For most use cases, simply use DPU_MPI_Wait with the cookie. If you would rather check from time to time while doing other work, use DPU_MPI_Test. The polling routines are called internally, so there is normally no need to call them directly.

int ret = DPU_MPI_Wait(ctx, cookie1);
if (ret)
{
    fprintf(stderr, "DPU Poll Failed\n");
    return 1;
}

Just like MPI_Wait, DPU_MPI_Wait blocks until the operation associated with the cookie has completed. That is essentially all that is needed to convert an MPI_Ialltoall call to use the DPU. Finally, make sure you exit gracefully when done by calling DPU_Exit, which terminates the server job.

DPU_Exit(ctx);
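
For the DPU_MPI_Test route mentioned earlier, the usual pattern is to alternate a nonblocking completion check with useful work. This guide does not show the signature of DPU_MPI_Test, so the sketch below uses a hypothetical stand-in, fake_test, that reports completion through an output flag. Treat it as an illustration of the loop shape only, not of the library's API.

```c
/* Hypothetical job state used only by this sketch. */
struct fake_job {
    int polls_left;   /* number of checks before the job "completes" */
};

/* Hypothetical stand-in for a nonblocking completion test.
 * Assumption: returns 0 on success and sets *done to 1 once the job
 * has completed. The real DPU_MPI_Test may look different. */
static int fake_test(struct fake_job *job, int *done)
{
    if (job == 0 || done == 0)
        return -1;
    if (job->polls_left > 0) {
        job->polls_left--;
        *done = 0;
    } else {
        *done = 1;
    }
    return 0;
}

/* Alternate completion checks with work until the job finishes.
 * Returns 0 on success, -1 if a check fails. */
static int poll_with_work(struct fake_job *job, long *work_units)
{
    int done = 0;
    while (!done) {
        if (fake_test(job, &done) != 0)
            return -1;
        if (!done)
            (*work_units)++;   /* placeholder for real computation */
    }
    return 0;
}
```

In a real program, the fake_test call would be replaced by DPU_MPI_Test with the cookie, and the counter increment by the computation you want to overlap with the transfer.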

Server Side Options

There are some server-side options you can adjust.
LAZY_UNPINNING is on by default. When enabled, the server deregisters MRs only when absolutely necessary, at the cost of increased memory usage.
MAX_QUEUE specifies the maximum number of simultaneous requests the server supports. It should match or exceed the number passed as the third argument to DPU_Init from your client.

// Lazy MR unpinning
#define LAZY_UNPINNING 1
// Maximum number of simultaneous jobs
#define MAX_QUEUE 32