From 25c7d322d0547e778216ed94ea301f2da60a39c9 Mon Sep 17 00:00:00 2001
From: Arjun Suresh
Date: Mon, 4 Nov 2024 15:46:28 +0000
Subject: [PATCH 1/8] Create new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 69 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 new_benchmark_checklist.md

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
new file mode 100644
index 0000000..7848e34
--- /dev/null
+++ b/new_benchmark_checklist.md
@@ -0,0 +1,69 @@
+# MLPerf Inference New Benchmark Checklist Documentation
+
+This document provides guidelines and requirements for setting up and validating a new MLPerf Inference benchmark implementation.
+
+---
+
+## 1. Applicable Categories
+Specify whether the benchmark applies to:
+- **Edge**
+- **Datacenter**
+- **Both**
+
+## 2. Applicable Scenarios for Each Category
+List each scenario applicable to the selected categories. Examples of scenarios include:
+- **Single-stream**
+- **Multi-stream**
+- **Server**
+- **Offline**
+
+## 3. Applicable Compliance Tests
+Identify the compliance tests required for each applicable category and scenario. Note:
+- **TEST04** is **not applicable** if processing times vary significantly for different inputs.
+
+## 4. Latency Threshold for Server Scenarios
+If the **Server** scenario is applicable:
+- Document the latency threshold. For example, **99% of the samples must be processed within the specified latency threshold**.
+
+## 5. Validation Dataset: Unique Samples
+Specify the number of **unique samples** in the validation dataset. Note that:
+- These samples will be repeated as necessary to meet the minimum required duration for the inference run.
+
+## 6. Equal Issue Mode Applicability
+Document whether **Equal Issue Mode** is applicable:
+- This is relevant if the time required to process a sample is not consistent across all inputs.
+
+## 7. Expected `accuracy.txt` Contents
+Detail the expected contents of the `accuracy.txt` file after running the reference accuracy script. This file should reflect the accuracy performance based on the validation dataset and reference model.
+
+## 8. Reference Implementation Dataset Coverage
+Ensure the reference implementation:
+- Processes the entire validation dataset during **performance**, **accuracy**, and applicable **compliance** runs.
+
+## 9. Test Runs with Smaller Input Sets
+Verify that the reference implementation can perform test runs with a smaller subset of inputs for **performance** and **accuracy** runs.
+
+## 10. Dataset and Reference Model Instructions
+Provide clear instructions on:
+- **Downloading** the dataset and reference model.
+- **Using** the dataset and model for the benchmark.
+
+## 11. CPU-Only and Minimum GPU Requirements
+Document:
+- Whether the reference implementation can run on **CPUs only**.
+- The **minimum number** of GPUs and **required memory** if GPU usage is necessary.
+
+## 12. System Memory and Storage Requirements
+Specify the minimum system requirements to run the reference implementation:
+- **System RAM**: Units of 256 GB RAM.
+- **Storage**: Units of 500 GB storage.
+
+## 13. Submission Checker Modifications
+Ensure all necessary changes are made to the **submission checker** to validate the benchmark correctly.
+
+## 14. Sample Log Files
+Include sample logs for all applicable scenario runs:
+- `mlperf_log_summary.txt`
+- `mlperf_log_detail.txt`
+
+These files should successfully pass the submission checker and represent a compliant run.
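
Item 4's rule (99% of Server samples processed within the latency threshold) amounts to a percentile check on per-sample latencies. The Python sketch here is illustrative only: LoadGen computes and reports this percentile itself, and the helper names and the nearest-rank percentile definition are assumptions, not LoadGen's exact algorithm.

```python
import math


def percentile_latency(latencies_ns, percentile=99.0):
    """Nearest-rank percentile: smallest latency not exceeded by at least `percentile`% of samples."""
    if not latencies_ns:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies_ns)
    # Multiplying before dividing keeps the rank exact for integer percentiles such as 99.
    rank = math.ceil(percentile * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


def meets_server_threshold(latencies_ns, threshold_ns, percentile=99.0):
    """True if at least `percentile`% of samples completed within `threshold_ns` (hypothetical helper)."""
    return percentile_latency(latencies_ns, percentile) <= threshold_ns
```

For example, `meets_server_threshold([10_000_000] * 99 + [500_000_000], 15_000_000)` returns `True`, because the single slow outlier falls in the top 1% of latencies.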
From fad17be089a9779623f0c813fe53e1ff52e1f42f Mon Sep 17 00:00:00 2001
From: Arjun Suresh
Date: Tue, 5 Nov 2024 15:21:07 +0000
Subject: [PATCH 2/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index 7848e34..ce19028 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -23,45 +23,48 @@ Identify the compliance tests required for each applicable category and scenario
 
 ## 4. Latency Threshold for Server Scenarios
 If the **Server** scenario is applicable:
-- Document the latency threshold. For example, **99% of the samples must be processed within the specified latency threshold**.
+- Document the latency threshold. **(99% of the samples must be processed within the specified latency threshold)**.
 
 ## 5. Validation Dataset: Unique Samples
-Specify the number of **unique samples** in the validation dataset. Note that:
-- These samples will be repeated as necessary to meet the minimum required duration for the inference run.
+Specify the number of **unique samples** in the validation dataset and the **QSL size**. Note that:
+- The unique samples will be repeated as necessary to meet the minimum required duration for the inference run.
+- QSL size determines the number of inputs which are loaded to the memory at a time - typically large enough to overflow the system cache.
 
 ## 6. Equal Issue Mode Applicability
 Document whether **Equal Issue Mode** is applicable:
 - This is relevant if the time required to process a sample is not consistent across all inputs.
 
-## 7. Expected `accuracy.txt` Contents
-Detail the expected contents of the `accuracy.txt` file after running the reference accuracy script. This file should reflect the accuracy performance based on the validation dataset and reference model.
+## 7. Expected accuracy and `accuracy.txt` Contents
+Detail the expected contents of the `accuracy.txt` file after running the reference accuracy script. This file should reflect the accuracy performance based on the validation dataset and reference model.
 
-## 8. Reference Implementation Dataset Coverage
-Ensure the reference implementation:
-- Processes the entire validation dataset during **performance**, **accuracy**, and applicable **compliance** runs.
+## 8. Reference Model details
+Number of Parameters of the model, FLOPs and the data type used for determining the reference accuracy. **For example, Number of Parameters: 25.6 million, FLOPs: 3.8 billion, Datatype: fp16**
 
-## 9. Test Runs with Smaller Input Sets
+## 9. Reference Implementation Dataset Coverage
+Ensure the reference implementation can successfully processes the entire validation dataset during **performance**, **accuracy**, and applicable **compliance** runs and generate valid log files.
+
+## 10. Test Runs with Smaller Input Sets
 Verify that the reference implementation can perform test runs with a smaller subset of inputs for **performance** and **accuracy** runs.
 
-## 10. Dataset and Reference Model Instructions
+## 11. Dataset and Reference Model Instructions
 Provide clear instructions on:
 - **Downloading** the dataset and reference model.
 - **Using** the dataset and model for the benchmark.
 
-## 11. CPU-Only and Minimum GPU Requirements
+## 12. CPU-Only and Minimum GPU Requirements
 Document:
 - Whether the reference implementation can run on **CPUs only**.
 - The **minimum number** of GPUs and **required memory** if GPU usage is necessary.
 
-## 12. System Memory and Storage Requirements
+## 13. System Memory and Storage Requirements
 Specify the minimum system requirements to run the reference implementation:
 - **System RAM**: Units of 256 GB RAM.
 - **Storage**: Units of 500 GB storage.
 
-## 13. Submission Checker Modifications
+## 14. Submission Checker Modifications
 Ensure all necessary changes are made to the **submission checker** to validate the benchmark correctly.
 
-## 14. Sample Log Files
+## 15. Sample Log Files
 Include sample logs for all applicable scenario runs:
 - `mlperf_log_summary.txt`
 - `mlperf_log_detail.txt`

From df8907e2494eef8439b356d02683dc9d963dbe87 Mon Sep 17 00:00:00 2001
From: Arjun Suresh
Date: Tue, 10 Dec 2024 13:48:20 +0000
Subject: [PATCH 3/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index ce19028..601c9e9 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -51,13 +51,13 @@ Provide clear instructions on:
 - **Downloading** the dataset and reference model.
 - **Using** the dataset and model for the benchmark.
 
-## 12. CPU-Only and Minimum GPU Requirements
+## 12. CPU-Only and Recommended GPU Requirements
 Document:
 - Whether the reference implementation can run on **CPUs only**.
 - The **minimum number** of GPUs and **required memory** if GPU usage is necessary.
 
 ## 13. System Memory and Storage Requirements
-Specify the minimum system requirements to run the reference implementation:
+Specify the recommended system requirements to run the reference implementation:
 - **System RAM**: Units of 256 GB RAM.
 - **Storage**: Units of 500 GB storage.
 

From 827d7062a531ff2734065a94c1f1427991db6e42 Mon Sep 17 00:00:00 2001
From: Arjun Suresh
Date: Mon, 16 Dec 2024 05:00:04 +0000
Subject: [PATCH 4/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index 601c9e9..2584c40 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -26,19 +26,19 @@ If the **Server** scenario is applicable:
 - Document the latency threshold. **(99% of the samples must be processed within the specified latency threshold)**.
 
 ## 5. Validation Dataset: Unique Samples
-Specify the number of **unique samples** in the validation dataset and the **QSL size**. Note that:
+Specify the number of **unique samples** in the validation dataset and the **QSL size** in the [inference policies benchmark section](https://github.com/mlcommons/inference_policies/blob/master/inference_rules.adoc#41-benchmarks) and also in the [mlperf.conf](https://github.com/mlcommons/inference/blob/master/loadgen/mlperf.conf). Note that:
 - The unique samples will be repeated as necessary to meet the minimum required duration for the inference run.
 - QSL size determines the number of inputs which are loaded to the memory at a time - typically large enough to overflow the system cache.
 
 ## 6. Equal Issue Mode Applicability
-Document whether **Equal Issue Mode** is applicable:
+- Documented whether **Equal Issue Mode** is applicable in the [mlperf.conf](https://github.com/mlcommons/inference/blob/master/loadgen/mlperf.conf#L42)
 - This is relevant if the time required to process a sample is not consistent across all inputs.
 
 ## 7. Expected accuracy and `accuracy.txt` Contents
 Detail the expected contents of the `accuracy.txt` file after running the reference accuracy script. This file should reflect the accuracy performance based on the validation dataset and reference model.
 
 ## 8. Reference Model details
-Number of Parameters of the model, FLOPs and the data type used for determining the reference accuracy. **For example, Number of Parameters: 25.6 million, FLOPs: 3.8 billion, Datatype: fp16**
+Document the details of the reference model like number of parameters, FLOPs and the data type in the [docs page](https://github.com/mlcommons/inference/blob/docs/docs/index.md). **For example, Number of Parameters: 25.6 million, FLOPs: 3.8 billion, Datatype: fp16**
 
 ## 9. Reference Implementation Dataset Coverage
 Ensure the reference implementation can successfully processes the entire validation dataset during **performance**, **accuracy**, and applicable **compliance** runs and generate valid log files.

From 11fe2cc1e63ffad11a1b816b2fb8ed5927c3d54e Mon Sep 17 00:00:00 2001
From: Arjun Suresh
Date: Mon, 16 Dec 2024 05:47:42 +0000
Subject: [PATCH 5/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index 2584c40..d78cf98 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -41,7 +41,7 @@ Detail the expected contents of the `accuracy.txt` file after running the refere
 Document the details of the reference model like number of parameters, FLOPs and the data type in the [docs page](https://github.com/mlcommons/inference/blob/docs/docs/index.md). **For example, Number of Parameters: 25.6 million, FLOPs: 3.8 billion, Datatype: fp16**
 
 ## 9. Reference Implementation Dataset Coverage
-Ensure the reference implementation can successfully processes the entire validation dataset during **performance**, **accuracy**, and applicable **compliance** runs and generate valid log files.
+Ensure the reference implementation can successfully processes the entire validation dataset during **performance**, **accuracy**, and applicable **compliance** runs and generate valid log files passing the submission checker.
 
 ## 10. Test Runs with Smaller Input Sets
 Verify that the reference implementation can perform test runs with a smaller subset of inputs for **performance** and **accuracy** runs.
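
Item 9 now asks that the generated logs pass the submission checker. As a quick pre-check before running the full checker, one could scan each `mlperf_log_summary.txt` for its result line. This sketch assumes the summary contains a line of the form `Result is : VALID` (as LoadGen summaries typically do); the helper names are hypothetical, and the check is no substitute for the submission checker, which validates far more than this one line.

```python
import re
from pathlib import Path

# Assumed format of the LoadGen summary result line, e.g. "Result is : VALID".
_RESULT_LINE = re.compile(r"^Result is\s*:\s*(VALID|INVALID)\s*$", re.MULTILINE)


def summary_result(text):
    """Return "VALID" or "INVALID" from summary text, or None if no result line exists."""
    match = _RESULT_LINE.search(text)
    return match.group(1) if match else None


def summary_is_valid(path):
    """Read an mlperf_log_summary.txt file and report whether the run was VALID (hypothetical helper)."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return summary_result(text) == "VALID"
```

A missing result line yields `None` rather than `False`, so a truncated or non-LoadGen file can be reported separately from a run that LoadGen itself marked `INVALID`.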
From 78d635c4a2bde67bf153f13fdb9b272bde8b4dad Mon Sep 17 00:00:00 2001
From: Arjun Suresh
Date: Mon, 16 Dec 2024 05:53:20 +0000
Subject: [PATCH 6/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index d78cf98..2d72e12 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -58,8 +58,8 @@ Document:
 
 ## 13. System Memory and Storage Requirements
 Specify the recommended system requirements to run the reference implementation:
-- **System RAM**: Units of 256 GB RAM.
-- **Storage**: Units of 500 GB storage.
+- **System RAM**: System RAM required to run the reference implementation.
+- **Storage**: Secondary storage required to run the reference implementation.
 
 ## 14. Submission Checker Modifications
 Ensure all necessary changes are made to the **submission checker** to validate the benchmark correctly.

From 1d302d2809e8caa4ad69780cf7c072a06a4a5eab Mon Sep 17 00:00:00 2001
From: Arjun Suresh
Date: Mon, 16 Dec 2024 06:30:16 +0000
Subject: [PATCH 7/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index 2d72e12..1e31bfe 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -51,20 +51,17 @@ Provide clear instructions on:
 - **Downloading** the dataset and reference model.
 - **Using** the dataset and model for the benchmark.
 
-## 12. CPU-Only and Recommended GPU Requirements
+## 12. Recommended System Requirements to run the reference implementation
 Document:
 - Whether the reference implementation can run on **CPUs only**.
 - The **minimum number** of GPUs and **required memory** if GPU usage is necessary.
-
-## 13. System Memory and Storage Requirements
-Specify the recommended system requirements to run the reference implementation:
-- **System RAM**: System RAM required to run the reference implementation.
+- **System RAM**: System memory required to run the reference implementation.
 - **Storage**: Secondary storage required to run the reference implementation.
 
-## 14. Submission Checker Modifications
+## 13. Submission Checker Modifications
 Ensure all necessary changes are made to the **submission checker** to validate the benchmark correctly.
 
-## 15. Sample Log Files
+## 14. Sample Log Files
 Include sample logs for all applicable scenario runs:
 - `mlperf_log_summary.txt`
 - `mlperf_log_detail.txt`

From fbfb42ac6dfca36bc6bb16d56da4fd21987ec1d9 Mon Sep 17 00:00:00 2001
From: Arjun Suresh
Date: Mon, 16 Dec 2024 06:32:15 +0000
Subject: [PATCH 8/8] Update new_benchmark_checklist.md

---
 new_benchmark_checklist.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/new_benchmark_checklist.md b/new_benchmark_checklist.md
index 1e31bfe..67f33ca 100644
--- a/new_benchmark_checklist.md
+++ b/new_benchmark_checklist.md
@@ -54,7 +54,7 @@ Provide clear instructions on:
 ## 12. Recommended System Requirements to run the reference implementation
 Document:
 - Whether the reference implementation can run on **CPUs only**.
-- The **minimum number** of GPUs and **required memory** if GPU usage is necessary.
+- The **required GPU memory** if GPU usage is necessary.
 - **System RAM**: System memory required to run the reference implementation.
 - **Storage**: Secondary storage required to run the reference implementation.
 
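
Item 6 of the checklist (Equal Issue Mode) matters when per-sample processing time varies: if some samples are issued more often than others, the measured result depends on which samples happen to be drawn. A simplified sketch of the idea follows. It assumes the mode works by issuing whole shuffled passes over the dataset so that every sample is issued equally often. The function name and exact behaviour are illustrative and are not taken from LoadGen's source.

```python
import random
from collections import Counter


def equal_issue_order(num_samples, min_queries, seed=0):
    """Return a sample-index issue order in which every sample appears equally often.

    The query count is rounded up to a whole multiple of `num_samples`, and the
    order is a concatenation of independently shuffled permutations of the dataset.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    # Integer ceiling division; always issue at least one full pass.
    passes = max(1, -(-min_queries // num_samples))
    rng = random.Random(seed)
    order = []
    for _ in range(passes):
        perm = list(range(num_samples))
        rng.shuffle(perm)
        order.extend(perm)
    return order


if __name__ == "__main__":
    order = equal_issue_order(num_samples=5, min_queries=12)
    print(len(order))                     # 15: 12 rounded up to 3 full passes
    print(set(Counter(order).values()))   # {3}: each sample issued 3 times
```

The design choice being illustrated is the rounding: issuing a partial final pass would reintroduce the imbalance the mode is meant to remove.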