Efficiency
We benchmark Euler's peak performance on Alibaba's search advertising data using the standard GraphSAGE algorithm. Each node has several dozen sparse features and no dense features. We use one neighbor-aggregation layer and sample 10 neighbors per node. The test graph contains about 200 million nodes and 4 billion edges. The machine learning framework is TensorFlow.
To measure Euler's serving capacity precisely, we use a heterogeneous deployment: Euler runs on separate machines, physically isolated from the TensorFlow processes.
We test Euler's serving capacity with one, two, and five instances. Each instance runs on a machine with an Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz, 96 cores, and 512 GB of memory. All TF workers run independently on the same type of hardware. We use Docker to allocate 16 CPU cores and 50 GB of memory to each worker.
For each of the three deployments, we vary the number of TF workers and measure QPS (samples trained per second). The results are shown below; "w" denotes 10,000, so 112w is 1.12 million samples per second.
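To make the sampling configuration concrete, here is a minimal sketch of one-layer neighbor sampling with a fan-out of 10, written in plain Python. It does not use Euler's API; the toy graph, the function names, and the choice to sample with replacement are all assumptions made for illustration.

```python
import random

def sample_neighbors(adjacency, node, fanout, rng):
    """Draw `fanout` neighbors of `node` with replacement (assumed strategy)."""
    neighbors = adjacency.get(node, [])
    if not neighbors:
        return []  # isolated node: nothing to aggregate
    return [rng.choice(neighbors) for _ in range(fanout)]

def one_layer_batch(adjacency, batch, fanout=10, seed=0):
    """Sample one aggregation layer for every root node in `batch`."""
    rng = random.Random(seed)
    return {node: sample_neighbors(adjacency, node, fanout, rng) for node in batch}

# Hypothetical toy graph stored as an adjacency list.
toy_graph = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
sampled = one_layer_batch(toy_graph, batch=[0, 1])

# Each root contributes itself plus its sampled neighbors.
nodes_touched = sum(1 + len(neigh) for neigh in sampled.values())
```

Under these assumptions each training sample touches 1 + 10 = 11 nodes, so the number of feature lookups served by the graph engine is roughly an order of magnitude larger than the training QPS.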
Number of TF workers | One Euler machine | Two Euler machines | Five Euler machines |
---|---|---|---|
100 | 112w | 112w | 112w |
200 | 145w | 220w | 220w |
300 | 148w | 285w | 298w |
400 | 151w | 290w | 410w |
500 | 152w | 300w | 505w |
600 | 155w | 310w | 596w |
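The table can be checked with a short script. The sketch below copies the measured values, assumes that "w" is shorthand for 10,000, and computes the throughput per TF worker and the peak throughput per Euler machine; the variable and function names are illustrative only.

```python
W = 10_000  # assumption: "w" in the table stands for 10,000

workers = [100, 200, 300, 400, 500, 600]
# Measured QPS in units of "w", keyed by the number of Euler machines.
qps_w = {
    1: [112, 145, 148, 151, 152, 155],
    2: [112, 220, 285, 290, 300, 310],
    5: [112, 220, 298, 410, 505, 596],
}

def per_worker_qps(machines):
    """Samples per second contributed by each TF worker, per table row."""
    return [q * W / n for q, n in zip(qps_w[machines], workers)]

def peak_per_machine(machines):
    """Highest measured QPS divided by the number of Euler machines."""
    return max(qps_w[machines]) * W / machines

for m in qps_w:
    print(m, [round(x) for x in per_worker_qps(m)], round(peak_per_machine(m)))
```

With five Euler machines the per-worker throughput stays close to 10,000 samples per second across all rows, which suggests that deployment is not yet saturated at 600 workers. With one and two machines the peak per machine works out to the same 155w, consistent with linear scaling of serving capacity.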
From the above results, we can see that:
- As more machines are allocated to Euler, its peak serving capacity scales linearly.
- As the number of TF workers increases, QPS grows roughly linearly until it reaches the limit of the hardware resources used by Euler.
- Euler deployed on five machines can serve 600 TF workers and deliver about 600w training QPS (596w measured). The resources required to deploy Euler are only about 5% of the total compute resources.
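The 5% figure can be reproduced approximately from the hardware numbers given earlier. The sketch below assumes that "compute resources" are measured in CPU cores only; the source does not state how the percentage was derived, so this is one plausible reading rather than the authors' calculation.

```python
EULER_MACHINES = 5
CORES_PER_EULER_MACHINE = 96   # from the machine specification above
TF_WORKERS = 600
CORES_PER_WORKER = 16          # per-worker Docker allocation

euler_cores = EULER_MACHINES * CORES_PER_EULER_MACHINE   # 480
worker_cores = TF_WORKERS * CORES_PER_WORKER             # 9600

# Two possible readings of "5% of the total compute resources".
share_of_total = euler_cores / (euler_cores + worker_cores)
ratio_to_workers = euler_cores / worker_cores

print(f"Euler share of all cores: {share_of_total:.1%}")
print(f"Euler cores relative to worker cores: {ratio_to_workers:.1%}")
```

Either reading lands at roughly 5%: 480 Euler cores against 9,600 worker cores.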