
Merge branch 'master' into patch-11
mrmhodak authored Dec 18, 2024
2 parents dbf1483 + 647f9f8 commit 233c1eb
Showing 29 changed files with 467 additions and 942 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -2,3 +2,4 @@ loadgen/build/
libmlperf_loadgen.a
__pycache__/
generated/
*.swp
39 changes: 39 additions & 0 deletions docs/benchmarks/graph/get-rgat-data.md
@@ -0,0 +1,39 @@
---
hide:
- toc
---

# Graph Neural Network using R-GAT

## Dataset

The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. If you want to download only the datasets, you can use the commands below.

=== "Full Dataset"
R-GAT validation run uses the IGBH dataset consisting of 547,306,935 nodes and 5,812,005,639 edges.

### Get Full Dataset
```
cm run script --tags=get,dataset,igbh,_full -j
```

=== "Debug Dataset"
R-GAT debug run uses the IGBH debug dataset (tiny).

### Get Debug Dataset
```
cm run script --tags=get,dataset,igbh,_debug -j
```

## Model
The benchmark implementation run command will automatically download the required model and do the necessary conversions. If you want to download only the official model, you can use the command below.

Get the Official MLPerf R-GAT Model

=== "PyTorch"

### PyTorch
```
cm run script --tags=get,ml-model,rgat -j
```

13 changes: 13 additions & 0 deletions docs/benchmarks/graph/rgat.md
@@ -0,0 +1,13 @@
---
hide:
- toc
---


# Graph Neural Network using R-GAT


=== "MLCommons-Python"
## MLPerf Reference Implementation in Python

{{ mlperf_inference_implementation_readme (4, "rgat", "reference", devices = ["CPU", "CUDA"]) }}
19 changes: 15 additions & 4 deletions docs/index.md
@@ -1,7 +1,7 @@
# MLPerf Inference Benchmarks

## Overview
The currently valid [MLPerf Inference Benchmarks](index_gh.md) as of MLPerf inference v4.0 round are listed below, categorized by tasks. Under each model you can find its details like the dataset used, reference accuracy, server latency constraints etc.
The currently valid [MLPerf Inference Benchmarks](index_gh.md) as of MLPerf inference v5.0 round are listed below, categorized by tasks. Under each model you can find its details like the dataset used, reference accuracy, server latency constraints etc.

---

@@ -80,7 +80,7 @@ The currently valid [MLPerf Inference Benchmarks](index_gh.md) as of MLPerf infe
- **Server Scenario Latency Constraint**: 130ms
- **Equal Issue mode**: False
- **High accuracy variant**: yes
- **Submission Category**: Datacenter, Edge
- **Submission Category**: Edge

#### [LLAMA2-70B](benchmarks/language/llama2-70b.md)
- **Dataset**: OpenORCA (GPT-4 split, max_seq_len=1024)
@@ -157,11 +157,22 @@ The currently valid [MLPerf Inference Benchmarks](index_gh.md) as of MLPerf infe
- **High accuracy variant**: Yes
- **Submission Category**: Datacenter

## Graph Neural Networks
### [R-GAT](benchmarks/graph/rgat.md)
- **Dataset**: Illinois Graph Benchmark Heterogeneous validation dataset
- **Dataset Size**: 788,379
- **QSL Size**: 788,379
- **Number of Parameters**:
- **Reference Model Accuracy**: ACC = ?
- **Server Scenario Latency Constraint**: N/A
- **Equal Issue mode**: True
- **High accuracy variant**: No
- **Submission Category**: Datacenter
---

## Submission Categories
- **Datacenter Category**: All the current inference benchmarks are applicable to the datacenter category.
- **Edge Category**: All benchmarks except DLRMv2, LLAMA2-70B, and Mixtral-8x7B are applicable to the edge category.
- **Datacenter Category**: All benchmarks except bert are applicable to the datacenter category for inference v5.0.
- **Edge Category**: All benchmarks except DLRMv2, LLAMA2-70B, Mixtral-8x7B and R-GAT are applicable to the edge category for v5.0.

## High Accuracy Variants
- **Benchmarks**: `bert`, `llama2-70b`, `gpt-j`, `dlrm_v2`, and `3d-unet` have a normal accuracy variant as well as a high accuracy variant.
160 changes: 85 additions & 75 deletions docs/submission/index.md
@@ -13,13 +13,15 @@ hide:

Click [here](https://youtu.be/eI1Hoecc3ho) to view the recording of the workshop: Streamlining your MLPerf Inference results using CM.

=== "CM based benchmark"
Click [here](https://docs.google.com/presentation/d/1cmbpZUpVr78EIrhzyMBnnWnjJrD-mZ2vmSb-yETkTA8/edit?usp=sharing) to view the proposal slide for Common Automation for MLPerf Inference Submission Generation through CM.

=== "CM based results"
If you have followed the `cm run` commands under the individual model pages in the [benchmarks](../index.md) directory, all the valid results will be aggregated in the `cm cache` folder. The following command can be used to browse the structure of the inference results folder generated by CM.
### Get results folder structure
```bash
cm find cache --tags=get,mlperf,inference,results,dir | xargs tree
```
=== "Non CM based benchmark"
=== "Non CM based results"
If you have not followed the `cm run` commands under the individual model pages in the [benchmarks](../index.md) directory, please make sure that the result directory is structured in the following way.
```
└── System description ID(SUT Name)
@@ -35,18 +37,20 @@ Click [here](https://youtu.be/eI1Hoecc3ho) to view the recording of the workshop
| ├── mlperf_log_detail.txt
| ├── mlperf_log_accuracy.json
| └── accuracy.txt
└── Compliance_Test_ID
├── Performance
| └── run_x/#1 run for all scenarios
| ├── mlperf_log_summary.txt
| └── mlperf_log_detail.txt
├── Accuracy
| ├── baseline_accuracy.txt
| ├── compliance_accuracy.txt
| ├── mlperf_log_accuracy.json
| └── accuracy.txt
├── verify_performance.txt
└── verify_accuracy.txt #for TEST01 only
|── Compliance_Test_ID
| ├── Performance
| | └── run_x/#1 run for all scenarios
| | ├── mlperf_log_summary.txt
| | └── mlperf_log_detail.txt
| ├── Accuracy
| | ├── baseline_accuracy.txt
| | ├── compliance_accuracy.txt
| | ├── mlperf_log_accuracy.json
| | └── accuracy.txt
| ├── verify_performance.txt
| └── verify_accuracy.txt #for TEST01 only
|── user.conf
└── measurements.json
```

<details>
@@ -67,67 +71,69 @@ Once all the results across all the models are ready you can use the following c

## Generate actual submission tree

=== "Closed Edge"
### Closed Edge Submission
```bash
cm run script --tags=generate,inference,submission \
--clean \
--preprocess_submission=yes \
--run-checker \
--submitter=MLCommons \
--tar=yes \
--env.CM_TAR_OUTFILE=submission.tar.gz \
--division=closed \
--category=edge \
--env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
--quiet
```

=== "Closed Datacenter"
### Closed Datacenter Submission
```bash
cm run script --tags=generate,inference,submission \
--clean \
--preprocess_submission=yes \
--run-checker \
--submitter=MLCommons \
--tar=yes \
--env.CM_TAR_OUTFILE=submission.tar.gz \
--division=closed \
--category=datacenter \
--env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
--quiet
```
=== "Open Edge"
### Open Edge Submission
```bash
cm run script --tags=generate,inference,submission \
--clean \
--preprocess_submission=yes \
--run-checker \
--submitter=MLCommons \
--tar=yes \
--env.CM_TAR_OUTFILE=submission.tar.gz \
--division=open \
--category=edge \
--env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
--quiet
```
=== "Open Datacenter"
### Open Datacenter Submission
```bash
cm run script --tags=generate,inference,submission \
--clean \
--preprocess_submission=yes \
--run-checker \
--submitter=MLCommons \
--tar=yes \
--env.CM_TAR_OUTFILE=submission.tar.gz \
--division=open \
--category=datacenter \
--env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
--quiet
```
=== "Docker run"
### Docker run
=== "Closed"
### Closed Submission
```bash
cm docker script --tags=generate,inference,submission \
--clean \
--preprocess_submission=yes \
--run-checker \
--submitter=MLCommons \
--tar=yes \
--env.CM_TAR_OUTFILE=submission.tar.gz \
--division=closed \
--env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
--quiet
```

=== "Open"
### Open Submission
```bash
cm docker script --tags=generate,inference,submission \
--clean \
--preprocess_submission=yes \
--run-checker \
--submitter=MLCommons \
--tar=yes \
--env.CM_TAR_OUTFILE=submission.tar.gz \
--division=open \
--env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
--quiet
```

=== "Native run"
### Native run
=== "Closed"
### Closed Submission
```bash
cm run script --tags=generate,inference,submission \
--clean \
--preprocess_submission=yes \
--run-checker \
--submitter=MLCommons \
--tar=yes \
--env.CM_TAR_OUTFILE=submission.tar.gz \
--division=closed \
--env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
--quiet
```

=== "Open"
### Open Submission
```bash
cm run script --tags=generate,inference,submission \
--clean \
--preprocess_submission=yes \
--run-checker \
--submitter=MLCommons \
--tar=yes \
--env.CM_TAR_OUTFILE=submission.tar.gz \
--division=open \
--env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
--quiet
```

* Use `--hw_name="My system name"` to give a meaningful system name. Examples can be seen [here](https://github.com/mlcommons/inference_results_v3.0/tree/main/open/cTuning/systems)

@@ -137,6 +143,10 @@ Once all the results across all the models are ready you can use the following c

* Use `--results_dir` option to specify the results folder for Non CM based benchmarks

* Use `--category` option to specify the category for which the submission is generated (datacenter/edge). By default, the category is taken from the `system_meta.json` file located in the SUT root directory.

* Use `--submission_base_dir` to specify the directory to which the outputs of the preprocess submission script and the final submission are written. There is no need to provide `--submission_dir` along with this option. For `docker run`, use `--submission_base_dir` instead of `--submission_dir`; see the combined example after this list.
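
For illustration, the options above can be combined into a single submission-generation command. The system name and directory paths below are hypothetical placeholders, not values from this repository:

```bash
# Hypothetical example: adjust system name, division, category, and paths for your SUT
cm run script --tags=generate,inference,submission \
   --clean \
   --preprocess_submission=yes \
   --run-checker \
   --submitter=MLCommons \
   --division=closed \
   --category=datacenter \
   --hw_name="My system name" \
   --results_dir=$HOME/mlperf_results \
   --submission_base_dir=$HOME/mlperf_submission \
   --tar=yes \
   --env.CM_TAR_OUTFILE=submission.tar.gz \
   --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
   --quiet
```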

The above command should generate `submission.tar.gz` if there are no submission checker issues, and you can upload it to the [MLCommons Submission UI](https://submissions-ui.mlcommons.org/submission).

## Aggregate Results in GitHub
50 changes: 50 additions & 0 deletions docs/system_requirements.yml
@@ -0,0 +1,50 @@
# All memory requirements in GB
resnet:
reference:
fp32:
system_memory: 8
accelerator_memory: 4
disk_storage: 25
nvidia:
int8:
system_memory: 8
accelerator_memory: 4
disk_storage: 100
intel:
int8:
system_memory: 8
accelerator_memory: 0
disk_storage: 50
qualcomm:
int8:
system_memory: 8
accelerator_memory: 8
disk_storage: 50
retinanet:
reference:
fp32:
system_memory: 8
accelerator_memory: 8
disk_storage: 200
nvidia:
int8:
system_memory: 8
accelerator_memory: 8
disk_storage: 200
intel:
int8:
system_memory: 8
accelerator_memory: 0
disk_storage: 200
qualcomm:
int8:
system_memory: 8
accelerator_memory: 8
disk_storage: 200
rgat:
reference:
fp32:
system_memory: 768
accelerator_memory: 8
disk_storage: 2300

7 changes: 5 additions & 2 deletions graph/R-GAT/README.md
@@ -232,9 +232,12 @@ docker build . -f dockerfile.gpu -t rgat-gpu
```
Run docker container:
```bash
docker run --rm -it -v $(pwd):/root --gpus all rgat-gpu
docker run --rm -it -v $(pwd):/workspace/root --gpus all rgat-gpu
```
Run benchmark inside the docker container:
Go inside the `root` folder and run the benchmark inside the docker container:
```bash
cd root
python3 main.py --dataset igbh-dgl --dataset-path igbh/ --profile rgat-dgl-full --device gpu [--model-path <path_to_ckpt>] [--in-memory] [--dtype <fp16 or fp32>] [--scenario <SingleStream, MultiStream, Server or Offline>]
```

**NOTE:** For official submissions, this benchmark is required to run in equal issue mode. Please make sure that the flag `rgat.*.sample_concatenate_permutation` is set to one in the [mlperf.conf](../../loadgen/mlperf.conf) file when loadgen is built.
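
For reference, the corresponding entry in `mlperf.conf` would be expected to look like the line below (a minimal sketch; the actual file contains many other benchmark settings):

```
# enable equal issue mode for official R-GAT submissions
rgat.*.sample_concatenate_permutation = 1
```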
6 changes: 2 additions & 4 deletions graph/R-GAT/dockerfile.gpu
@@ -26,6 +26,8 @@ RUN apt install -y --no-install-recommends rsync
# Upgrade pip
RUN python3 -m pip install --upgrade pip

RUN pip install torch-geometric torch-scatter torch-sparse -f https://pytorch-geometric.com/whl/torch-2.1.0+cu121.html
RUN pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu121/repo.html

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
@@ -35,10 +37,6 @@ RUN cd /tmp && \
pip install pybind11 && \
CFLAGS="-std=c++14" python3 setup.py install

RUN export TORCH_VERSION=$(python -c "import torch; print(torch.__version__)")
RUN pip install torch-geometric torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-${TORCH_VERSION}.html
RUN pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu121/repo.html

# Clean up
RUN rm -rf mlperf \
rm requirements.txt
