Feature/multiprocess #19

jishminor · 2023-05-30T19:02:13Z

This is a large overhaul of the backend: instead of running entirely inside the tritonserver process, the backend now spawns and manages a child process per tflite model instance. This PR closes #5, as there will now be a copy of the ACL scheduler singleton instance per model.

Tensorpipe manages the transport of input tensor data to the respective model instance processes. As written, only one memcpy into a shared memory channel is needed to do this, so the overhead is minimal.
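The single-copy hand-off described above can be sketched in a few lines. This is a minimal illustration in Python rather than the backend's C++, and it does not use tensorpipe: an anonymous shared `mmap` region stands in for the shared memory channel, a `multiprocessing` pipe stands in for the control messages, and `ModelInstanceProcess`, `model_instance_worker`, and `infer` are hypothetical names. It assumes a platform where the `fork` start method is available (for example Linux).

```python
import mmap
import multiprocessing as mp
import struct


def model_instance_worker(conn, region):
    """Child process: a stand-in for one model instance (hypothetical)."""
    while True:
        count = conn.recv()          # number of float32 values, or None to stop
        if count is None:
            break
        # Read the input tensor straight out of the shared region.
        values = struct.unpack_from(f"{count}f", region, 0)
        conn.send(sum(values))       # placeholder for real model execution


class ModelInstanceProcess:
    """Parent-side handle for one child process (hypothetical)."""

    def __init__(self, capacity_bytes):
        ctx = mp.get_context("fork")  # assumption: fork is available
        # Anonymous mapping; on Unix it is shared with forked children.
        self.region = mmap.mmap(-1, capacity_bytes)
        self.capacity = capacity_bytes
        self.conn, child_conn = ctx.Pipe()
        self.child = ctx.Process(
            target=model_instance_worker, args=(child_conn, self.region)
        )
        self.child.start()
        child_conn.close()            # the parent keeps only its own end

    def infer(self, values):
        fmt = f"{len(values)}f"
        if struct.calcsize(fmt) > self.capacity:
            raise ValueError("input tensor does not fit in the shared region")
        # The only copy of the tensor data: write it into shared memory.
        struct.pack_into(fmt, self.region, 0, *values)
        self.conn.send(len(values))   # small control message, no tensor data
        return self.conn.recv()

    def close(self):
        self.conn.send(None)
        self.child.join(timeout=5)    # bounded wait for a clean exit
        if self.child.is_alive():
            self.child.terminate()    # fall back to explicit termination
            self.child.join()
        self.conn.close()
        self.region.close()
```

Only the element count travels over the pipe; the tensor bytes are written once into the shared region and read in place by the child, which is the property the paragraph above relies on.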

Signed-off-by: Josh Minor <[email protected]>

jishminor · 2023-05-31T14:19:52Z

One issue with the tensorpipe shared memory channel is that both sides of the pipe busy-wait and repeatedly call sched_yield, as mentioned here. The flame graph for one side of the pipe shows the TP_SHM_Reactor eating an entire core busy-waiting.

Signed-off-by: Josh Minor <[email protected]>
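To make the cost of the busy-wait pattern concrete, the sketch below contrasts a loop that polls a flag and calls `sched_yield` between checks with a wait that blocks until it is signalled. It is an illustration in Python, not tensorpipe's reactor code; `wait_spinning`, `wait_blocking`, and `measure` are hypothetical names, and `os.sched_yield` is assumed to be available (Unix-like systems).

```python
import os
import threading
import time


def wait_spinning(flag):
    """Poll the flag, yielding the CPU between checks (busy-wait)."""
    spins = 0
    while not flag.is_set():
        os.sched_yield()   # gives up the time slice, but the thread stays runnable
        spins += 1
    return spins


def wait_blocking(flag):
    """Sleep until the flag is set; no polling loop at all."""
    flag.wait()
    return 0


def measure(wait_fn, delay_s=0.1):
    """Return (cpu_seconds, spins) spent waiting for a flag set after delay_s."""
    flag = threading.Event()
    timer = threading.Timer(delay_s, flag.set)   # sets the flag from another thread
    start = time.process_time()                  # CPU time used by this process
    timer.start()
    spins = wait_fn(flag)
    cpu_seconds = time.process_time() - start
    timer.join()
    return cpu_seconds, spins


if __name__ == "__main__":
    print("spinning:", measure(wait_spinning))
    print("blocking:", measure(wait_blocking))
```

On an otherwise idle machine the spinning variant would be expected to report CPU time close to the full delay, while the blocking variant reports close to none. The spinning loop trades that CPU for lower wake-up latency.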

jishminor added 5 commits May 25, 2023 17:22

Working multiprocess solution

d17efa7

Signed-off-by: Josh Minor <[email protected]>

Support papi op profiling per model

31384e5

Signed-off-by: Josh Minor <[email protected]>

Use cma channel first, and improve error handling

5ac8615

Signed-off-by: Josh Minor <[email protected]>

Make parent process handle the connection listen

6344cfd

Signed-off-by: Josh Minor <[email protected]>

Fix build.yml

53adc1a

Signed-off-by: Josh Minor <[email protected]>

jishminor added 19 commits May 31, 2023 10:10

Fix cleanup of child process

64f6c81

Signed-off-by: Josh Minor <[email protected]>

Handle model execution failure correctly

81eb50a

Signed-off-by: Josh Minor <[email protected]>

Improve error handling

c1c81fd

Signed-off-by: Josh Minor <[email protected]>

Fix issue with MMap input tensors in models

a94b986

Signed-off-by: Josh Minor <[email protected]>

Improve efficiency of model instance message passing

7a6ab0c

Signed-off-by: Josh Minor <[email protected]>

Fix bug in model instance tensor allocation

c7c5e59

Signed-off-by: Josh Minor <[email protected]>

Use papi low level api

20eeff4

Signed-off-by: Josh Minor <[email protected]>

Fix csv generation script

35fb954

Signed-off-by: Josh Minor <[email protected]>

Append utc time to csv file name

06ec744

Signed-off-by: Josh Minor <[email protected]>

Wait for model load message before claiming model is ready

0c87cbb

Signed-off-by: Josh Minor <[email protected]>

Don't worry about papi hl

53acc26

Signed-off-by: Josh Minor <[email protected]>

Add support for model NUMA policies

8b0e28e

Signed-off-by: Josh Minor <[email protected]>

Fix non numa build

adf2c0b

Signed-off-by: Josh Minor <[email protected]>

Add time to profiling data by default

40c32d1

Signed-off-by: Josh Minor <[email protected]>

Explicitly call terminate on child process

c81b65c

Signed-off-by: Josh Minor <[email protected]>

Put timeout on cleanup

a939d81

Signed-off-by: Josh Minor <[email protected]>

Fix exit handling in child process

d90d739

Signed-off-by: Josh Minor <[email protected]>

Add support for uncore papi events

c08cd3b

Signed-off-by: Josh Minor <[email protected]>

Simplify perf counter infra

7bfade4

Signed-off-by: Josh Minor <[email protected]>

jishminor force-pushed the feature/multiprocess branch 2 times, most recently from 631e840 to 9efcdfe Compare July 28, 2023 21:19

Add sample id to csv file, and fix csv gen

0ba5134

Signed-off-by: Josh Minor <[email protected]>

jishminor force-pushed the feature/multiprocess branch from 9efcdfe to 0ba5134 Compare August 2, 2023 19:12

Only keep one copy of op timings in csv

9737271

Signed-off-by: Josh Minor <[email protected]>

jishminor added 7 commits August 17, 2023 13:56

Add function to get list of avail cpus per socket

9096e59

Signed-off-by: Josh Minor <[email protected]>

Fix bug in validating arbitrary batch sizes

3caac11

Signed-off-by: Josh Minor <[email protected]>

Implement thread pinning feature

aba3fa3

Signed-off-by: Josh Minor <[email protected]>

Give back threads to avail cpus

7367f6b

Signed-off-by: Josh Minor <[email protected]>

Fix issue with model unloading

0b47640

Signed-off-by: Josh Minor <[email protected]>

Add flag to control thread pinning

5c7ff58

Signed-off-by: Josh Minor <[email protected]>

Fix thread pinning strat

6e875e4

Signed-off-by: Josh Minor <[email protected]>
