From 25c7d322d0547e778216ed94ea301f2da60a39c9 Mon Sep 17 00:00:00 2001
From: Arjun Suresh <arjun@gateoverflow.com>
Date: Mon, 4 Nov 2024 15:46:28 +0000
Subject: [PATCH 1/8] Create new_benchmark_checklist.md

---
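Note on section 4 of the checklist added below: the Server requirement that 99% of samples finish within the latency threshold can be checked mechanically. The following is only a minimal sketch under stated assumptions. The function name `meets_latency_threshold`, its parameters, and the sample values are hypothetical and are not part of LoadGen or the submission checker.

```python
def meets_latency_threshold(latencies_ms, threshold_ms, required_fraction=0.99):
    """Return True if at least required_fraction of samples finish within threshold_ms.

    Hypothetical helper for illustration only; it is not a LoadGen API.
    """
    if not latencies_ms:
        raise ValueError("no latency samples provided")
    # Count the samples whose latency is at or below the threshold.
    within = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return within / len(latencies_ms) >= required_fraction


if __name__ == "__main__":
    # Illustrative data: 99 fast samples and 1 slow outlier against a 15 ms threshold.
    sample_latencies = [10.0] * 99 + [250.0]
    print(meets_latency_threshold(sample_latencies, threshold_ms=15.0))
```

With these illustrative values exactly 99 of 100 samples are within the threshold, so the check passes; a second slow sample would make it fail.
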
 new_benchmark_checklist.md | 69 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 new_benchmark_checklist.md

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
new file mode 100644
index 0000000..7848e34
--- /dev/null
+++ b/new_benchmark_checklist.md
@@ -0,0 +1,69 @@
+# MLPerf Inference New Benchmark Checklist Documentation
+
+This document provides guidelines and requirements for setting up and validating a new MLPerf Inference benchmark implementation.
+
+---
+
+## 1. Applicable Categories
+Specify whether the benchmark applies to:
+- **Edge**
+- **Datacenter**
+- **Both**
+
+## 2. Applicable Scenarios for Each Category
+List each scenario applicable to the selected categories. Examples of scenarios include:
+- **Single-stream**
+- **Multi-stream**
+- **Server**
+- **Offline**
+
+## 3. Applicable Compliance Tests
+Identify the compliance tests required for each applicable category and scenario. Note:
+- **TEST04** is **not applicable** if processing times vary significantly for different inputs.
+
+## 4. Latency Threshold for Server Scenarios
+If the **Server** scenario is applicable:
+- Document the latency threshold. For example, **99% of the samples must be processed within the specified latency threshold**.
+
+## 5. Validation Dataset: Unique Samples
+Specify the number of **unique samples** in the validation dataset. Note that:
+- These samples will be repeated as necessary to meet the minimum required duration for the inference run.
+
+## 6. Equal Issue Mode Applicability
+Document whether **Equal Issue Mode** is applicable:
+- This is relevant if the time required to process a sample is not consistent across all inputs.
+
+## 7. Expected `accuracy.txt` Contents
+Detail the expected contents of the `accuracy.txt` file after running the reference accuracy script. This file should reflect the accuracy performance based on the validation dataset and reference model.
+
+## 8. Reference Implementation Dataset Coverage
+Ensure the reference implementation:
+- Processes the entire validation dataset during **performance**, **accuracy**, and applicable **compliance** runs.
+
+## 9. Test Runs with Smaller Input Sets
+Verify that the reference implementation can perform test runs with a smaller subset of inputs for **performance** and **accuracy** runs.
+
+## 10. Dataset and Reference Model Instructions
+Provide clear instructions on:
+- **Downloading** the dataset and reference model.
+- **Using** the dataset and model for the benchmark.
+
+## 11. CPU-Only and Minimum GPU Requirements
+Document:
+- Whether the reference implementation can run on **CPUs only**.
+- The **minimum number** of GPUs and **required memory** if GPU usage is necessary.
+
+## 12. System Memory and Storage Requirements
+Specify the minimum system requirements to run the reference implementation:
+- **System RAM**: Units of 256 GB RAM.
+- **Storage**: Units of 500 GB storage.
+
+## 13. Submission Checker Modifications
+Ensure all necessary changes are made to the **submission checker** to validate the benchmark correctly.
+
+## 14. Sample Log Files
+Include sample logs for all applicable scenario runs:
+- `mlperf_log_summary.txt`
+- `mlperf_log_detail.txt`
+  
+These files should successfully pass the submission checker and represent a compliant run.

From fad17be089a9779623f0c813fe53e1ff52e1f42f Mon Sep 17 00:00:00 2001
From: Arjun Suresh <arjun@gateoverflow.com>
Date: Tue, 5 Nov 2024 15:21:07 +0000
Subject: [PATCH 2/8] Update new_benchmark_checklist.md

---
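Note on section 5 as revised below: the statement that unique samples are repeated to meet the minimum run duration can be illustrated with simple arithmetic. This is a rough sketch, not how LoadGen schedules queries. The function name `dataset_passes`, the throughput figures, and the 600-second duration are assumptions chosen for illustration.

```python
import math


def dataset_passes(unique_samples, samples_per_second, min_duration_s):
    """Estimate how many passes over the unique samples fill min_duration_s.

    Hypothetical helper for illustration only; it is not a LoadGen API.
    """
    if unique_samples <= 0:
        raise ValueError("unique_samples must be positive")
    # Total samples the system would process during the minimum duration.
    samples_needed = math.ceil(samples_per_second * min_duration_s)
    # At least one full pass is always reported.
    return max(1, math.ceil(samples_needed / unique_samples))


if __name__ == "__main__":
    # Assumed values: 5000 unique samples, 100 samples/s, 600 s minimum duration.
    print(dataset_passes(5000, 100, 600))
```

Under these assumed values the system needs 60000 samples, which is 12 passes over a 5000-sample dataset. A dataset larger than the samples needed is traversed only once.
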
 new_benchmark_checklist.md | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index 7848e34..ce19028 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -23,45 +23,48 @@ Identify the compliance tests required for each applicable category and scenario
 
 ## 4. Latency Threshold for Server Scenarios
 If the **Server** scenario is applicable:
-- Document the latency threshold. For example, **99% of the samples must be processed within the specified latency threshold**.
+- Document the latency threshold. **(99% of the samples must be processed within the specified latency threshold)**.
 
 ## 5. Validation Dataset: Unique Samples
-Specify the number of **unique samples** in the validation dataset. Note that:
-- These samples will be repeated as necessary to meet the minimum required duration for the inference run.
+Specify the number of **unique samples** in the validation dataset and the **QSL size**. Note that:
+- The unique samples will be repeated as necessary to meet the minimum required duration for the inference run.
+- QSL size determines the number of inputs which are loaded to the memory at a time - typically large enough to overflow the system cache. 
 
 ## 6. Equal Issue Mode Applicability
 Document whether **Equal Issue Mode** is applicable:
 - This is relevant if the time required to process a sample is not consistent across all inputs.
 
-## 7. Expected `accuracy.txt` Contents
-Detail the expected contents of the `accuracy.txt` file after running the reference accuracy script. This file should reflect the accuracy performance based on the validation dataset and reference model.
+## 7. Expected accuracy and `accuracy.txt` Contents
+Detail the expected contents of the `accuracy.txt` file after running the reference accuracy script. This file should reflect the accuracy performance based on the validation dataset and reference model. 
 
-## 8. Reference Implementation Dataset Coverage
-Ensure the reference implementation:
-- Processes the entire validation dataset during **performance**, **accuracy**, and applicable **compliance** runs.
+## 8. Reference Model details
+Number of Parameters of the model, FLOPs and the data type used for determining the reference accuracy. **For example, Number of Parameters: 25.6 million, FLOPs: 3.8 billion, Datatype: fp16**
 
-## 9. Test Runs with Smaller Input Sets
+## 9. Reference Implementation Dataset Coverage
+Ensure the reference implementation can successfully processes the entire validation dataset during **performance**, **accuracy**, and applicable **compliance** runs and generate valid log files.
+
+## 10. Test Runs with Smaller Input Sets
 Verify that the reference implementation can perform test runs with a smaller subset of inputs for **performance** and **accuracy** runs.
 
-## 10. Dataset and Reference Model Instructions
+## 11. Dataset and Reference Model Instructions
 Provide clear instructions on:
 - **Downloading** the dataset and reference model.
 - **Using** the dataset and model for the benchmark.
 
-## 11. CPU-Only and Minimum GPU Requirements
+## 12. CPU-Only and Minimum GPU Requirements
 Document:
 - Whether the reference implementation can run on **CPUs only**.
 - The **minimum number** of GPUs and **required memory** if GPU usage is necessary.
 
-## 12. System Memory and Storage Requirements
+## 13. System Memory and Storage Requirements
 Specify the minimum system requirements to run the reference implementation:
 - **System RAM**: Units of 256 GB RAM.
 - **Storage**: Units of 500 GB storage.
 
-## 13. Submission Checker Modifications
+## 14. Submission Checker Modifications
 Ensure all necessary changes are made to the **submission checker** to validate the benchmark correctly.
 
-## 14. Sample Log Files
+## 15. Sample Log Files
 Include sample logs for all applicable scenario runs:
 - `mlperf_log_summary.txt`
 - `mlperf_log_detail.txt`

From df8907e2494eef8439b356d02683dc9d963dbe87 Mon Sep 17 00:00:00 2001
From: Arjun Suresh <arjun@gateoverflow.com>
Date: Tue, 10 Dec 2024 13:48:20 +0000
Subject: [PATCH 3/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index ce19028..601c9e9 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -51,13 +51,13 @@ Provide clear instructions on:
 - **Downloading** the dataset and reference model.
 - **Using** the dataset and model for the benchmark.
 
-## 12. CPU-Only and Minimum GPU Requirements
+## 12. CPU-Only and Recommended GPU Requirements
 Document:
 - Whether the reference implementation can run on **CPUs only**.
 - The **minimum number** of GPUs and **required memory** if GPU usage is necessary.
 
 ## 13. System Memory and Storage Requirements
-Specify the minimum system requirements to run the reference implementation:
+Specify the recommended system requirements to run the reference implementation:
 - **System RAM**: Units of 256 GB RAM.
 - **Storage**: Units of 500 GB storage.
 

From 827d7062a531ff2734065a94c1f1427991db6e42 Mon Sep 17 00:00:00 2001
From: Arjun Suresh <arjun@gateoverflow.com>
Date: Mon, 16 Dec 2024 05:00:04 +0000
Subject: [PATCH 4/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index 601c9e9..2584c40 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -26,19 +26,19 @@ If the **Server** scenario is applicable:
 - Document the latency threshold. **(99% of the samples must be processed within the specified latency threshold)**.
 
 ## 5. Validation Dataset: Unique Samples
-Specify the number of **unique samples** in the validation dataset and the **QSL size**. Note that:
+Specify the number of **unique samples** in the validation dataset and the **QSL size**, both in the [inference policies benchmark section](https://github.com/mlcommons/inference_policies/blob/master/inference_rules.adoc#41-benchmarks) and in [mlperf.conf](https://github.com/mlcommons/inference/blob/master/loadgen/mlperf.conf). Note that:
 - The unique samples will be repeated as necessary to meet the minimum required duration for the inference run.
 - QSL size determines the number of inputs which are loaded to the memory at a time - typically large enough to overflow the system cache. 
 
 ## 6. Equal Issue Mode Applicability
-Document whether **Equal Issue Mode** is applicable:
+- Document whether **Equal Issue Mode** is applicable in [mlperf.conf](https://github.com/mlcommons/inference/blob/master/loadgen/mlperf.conf#L42).
 - This is relevant if the time required to process a sample is not consistent across all inputs.
 
 ## 7. Expected accuracy and `accuracy.txt` Contents
 Detail the expected contents of the `accuracy.txt` file after running the reference accuracy script. This file should reflect the accuracy performance based on the validation dataset and reference model. 
 
 ## 8. Reference Model details
-Number of Parameters of the model, FLOPs and the data type used for determining the reference accuracy. **For example, Number of Parameters: 25.6 million, FLOPs: 3.8 billion, Datatype: fp16**
+Document the details of the reference model like number of parameters, FLOPs and the data type in the [docs page](https://github.com/mlcommons/inference/blob/docs/docs/index.md). **For example, Number of Parameters: 25.6 million, FLOPs: 3.8 billion, Datatype: fp16**
 
 ## 9. Reference Implementation Dataset Coverage
 Ensure the reference implementation can successfully processes the entire validation dataset during **performance**, **accuracy**, and applicable **compliance** runs and generate valid log files.

From 11fe2cc1e63ffad11a1b816b2fb8ed5927c3d54e Mon Sep 17 00:00:00 2001
From: Arjun Suresh <arjun@gateoverflow.com>
Date: Mon, 16 Dec 2024 05:47:42 +0000
Subject: [PATCH 5/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index 2584c40..d78cf98 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -41,7 +41,7 @@ Detail the expected contents of the `accuracy.txt` file after running the refere
 Document the details of the reference model like number of parameters, FLOPs and the data type in the [docs page](https://github.com/mlcommons/inference/blob/docs/docs/index.md). **For example, Number of Parameters: 25.6 million, FLOPs: 3.8 billion, Datatype: fp16**
 
 ## 9. Reference Implementation Dataset Coverage
-Ensure the reference implementation can successfully processes the entire validation dataset during **performance**, **accuracy**, and applicable **compliance** runs and generate valid log files.
+Ensure the reference implementation can successfully process the entire validation dataset during **performance**, **accuracy**, and applicable **compliance** runs and generate valid log files that pass the submission checker.
 
 ## 10. Test Runs with Smaller Input Sets
 Verify that the reference implementation can perform test runs with a smaller subset of inputs for **performance** and **accuracy** runs.

From 78d635c4a2bde67bf153f13fdb9b272bde8b4dad Mon Sep 17 00:00:00 2001
From: Arjun Suresh <arjun@gateoverflow.com>
Date: Mon, 16 Dec 2024 05:53:20 +0000
Subject: [PATCH 6/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index d78cf98..2d72e12 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -58,8 +58,8 @@ Document:
 
 ## 13. System Memory and Storage Requirements
 Specify the recommended system requirements to run the reference implementation:
-- **System RAM**: Units of 256 GB RAM.
-- **Storage**: Units of 500 GB storage.
+- **System RAM**: System RAM required to run the reference implementation.
+- **Storage**: Secondary storage required to run the reference implementation.
 
 ## 14. Submission Checker Modifications
 Ensure all necessary changes are made to the **submission checker** to validate the benchmark correctly.

From 1d302d2809e8caa4ad69780cf7c072a06a4a5eab Mon Sep 17 00:00:00 2001
From: Arjun Suresh <arjun@gateoverflow.com>
Date: Mon, 16 Dec 2024 06:30:16 +0000
Subject: [PATCH 7/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index 2d72e12..1e31bfe 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -51,20 +51,17 @@ Provide clear instructions on:
 - **Downloading** the dataset and reference model.
 - **Using** the dataset and model for the benchmark.
 
-## 12. CPU-Only and Recommended GPU Requirements
+## 12. Recommended System Requirements to run the reference implementation
 Document:
 - Whether the reference implementation can run on **CPUs only**.
 - The **minimum number** of GPUs and **required memory** if GPU usage is necessary.
-
-## 13. System Memory and Storage Requirements
-Specify the recommended system requirements to run the reference implementation:
-- **System RAM**: System RAM required to run the reference implementation.
+- **System RAM**: System memory required to run the reference implementation.
 - **Storage**: Secondary storage required to run the reference implementation.
 
-## 14. Submission Checker Modifications
+## 13. Submission Checker Modifications
 Ensure all necessary changes are made to the **submission checker** to validate the benchmark correctly.
 
-## 15. Sample Log Files
+## 14. Sample Log Files
 Include sample logs for all applicable scenario runs:
 - `mlperf_log_summary.txt`
 - `mlperf_log_detail.txt`

From fbfb42ac6dfca36bc6bb16d56da4fd21987ec1d9 Mon Sep 17 00:00:00 2001
From: Arjun Suresh <arjun@gateoverflow.com>
Date: Mon, 16 Dec 2024 06:32:15 +0000
Subject: [PATCH 8/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index 1e31bfe..67f33ca 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -54,7 +54,7 @@ Provide clear instructions on:
 ## 12. Recommended System Requirements to run the reference implementation
 Document:
 - Whether the reference implementation can run on **CPUs only**.
-- The **minimum number** of GPUs and **required memory** if GPU usage is necessary.
+- The **required GPU memory** if GPU usage is necessary.
 - **System RAM**: System memory required to run the reference implementation.
 - **Storage**: Secondary storage required to run the reference implementation.