Merge pull request #7 from anandhu-eng/cm_readme_inference_update
Cm readme inference update
anandhu-eng authored Aug 22, 2024
2 parents acf5591 + 9ea1d14 commit 238318a
Showing 12 changed files with 840 additions and 198 deletions.
124 changes: 0 additions & 124 deletions docs/benchmarks/index.md

This file was deleted.

4 changes: 4 additions & 0 deletions docs/benchmarks/language/get-llama2-70b-data.md
@@ -28,4 +28,8 @@ Get the Official MLPerf LLAMA2-70b Model
```
cm run script --tags=get,ml-model,llama2-70b,_pytorch -j
```

!!! tip

    Downloading the llama2-70B model from Hugging Face will prompt you to enter your Hugging Face username and password. Note that the password required is the [**access token**](https://huggingface.co/settings/tokens) generated for your account. Additionally, ensure that your account has access to the [llama2-70B](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) model.
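
If you prefer to authenticate before launching the download, one option is to log in once with the `huggingface_hub` Python library so the token is cached locally. This is a minimal sketch, not part of the CM workflow itself; the token value below is a placeholder:

```python
# Sketch: cache Hugging Face credentials ahead of the model download.
# Requires: pip install huggingface_hub
from huggingface_hub import login

# Paste the access token from https://huggingface.co/settings/tokens.
# "hf_xxx..." is a placeholder, not a real token.
login(token="hf_xxxxxxxxxxxxxxxxxxxxxxxx")
```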

9 changes: 7 additions & 2 deletions docs/benchmarks/medical_imaging/get-3d-unet-data.md
@@ -12,9 +12,14 @@ The benchmark implementation run command will automatically download the validat
=== "Validation"
The 3d-unet validation run uses the KiTS19 dataset, performing the [KiTS 2019](https://kits19.grand-challenge.org/) kidney tumor segmentation task.

### Get Validation Dataset
### Get Validation Dataset (Original)
```
cm run script --tags=get,dataset,kits19,validation -j
cm run script --tags=get,dataset,kits19,_validation -j
```

### Get Validation Dataset (Preprocessed)
```
cm run script --tags=get,dataset,kits19,preprocessed -j
```

## Model
1 change: 0 additions & 1 deletion docs/index.md

This file was deleted.

169 changes: 169 additions & 0 deletions docs/index.md
@@ -0,0 +1,169 @@
# MLPerf Inference Benchmarks

## Overview
This document provides details on various [MLPerf Inference Benchmarks](index_gh.md) categorized by tasks, models, and datasets. Each section lists the models performing similar tasks, with details on datasets, accuracy, and server latency constraints.

---

## 1. Image Classification
### [ResNet50-v1.5](benchmarks/image_classification/resnet50.md)
- **Dataset**: Imagenet-2012 (224x224) Validation
- **Dataset Size**: 50,000
- **QSL Size**: 1,024
- **Number of Parameters**: 25.6 Million
- **FLOPs**: TBD
- **Reference Model Accuracy**: 76.46% ACC
- **Server Scenario Latency Constraint**: 15ms (see the percentile-check sketch after this list)
- **Equal Issue mode**: False
- **Accuracy Variants**: 99% of fp32 reference model accuracy
- **Submission Category**: Data Center, Edge
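
In the server scenario, the latency constraint applies to a high percentile of query latencies (the 99th percentile in the MLPerf inference rules) rather than the mean. A minimal sketch of such a check against the 15ms bound above, assuming per-query latencies have already been collected (the function name is illustrative, not LoadGen's API):

```python
# Sketch: test a server-scenario latency bound at the 99th percentile.
def meets_latency_bound(latencies_ms: list[float], bound_ms: float, pct: float = 0.99) -> bool:
    ordered = sorted(latencies_ms)
    # Index of the pct-quantile sample (simple nearest-rank method).
    idx = min(int(pct * len(ordered)), len(ordered) - 1)
    return ordered[idx] <= bound_ms

# Example: the 99th-percentile latency here is 16.2ms, so the 15ms bound fails.
print(meets_latency_bound([12.0, 13.5, 14.1, 14.9, 16.2], bound_ms=15.0))  # False
```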

---

## 2. Text to Image
### [Stable Diffusion](benchmarks/text_to_image/sdxl.md)
- **Dataset**: Subset of Coco2014
- **Dataset Size**: 5,000
- **QSL Size**: 5,000
- **Number of Parameters**: 3.5 Billion <!-- taken from https://stability.ai/news/stable-diffusion-sdxl-1-announcement -->
- **FLOPs**: TBD
- **Required Accuracy (Closed Division)** (see the range-check sketch after this list):
    - FID: 23.01085758 ≤ FID ≤ 23.95007626
    - CLIP: 31.68631873 ≤ CLIP ≤ 31.81331801
- **Equal Issue mode**: False
- **Accuracy Variants**: 98% of fp32 reference model accuracy
- **Submission Category**: Data Center, Edge
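
Because the closed division accepts results only inside these bands (rather than above a single threshold), a submission check reduces to two range tests. A minimal sketch with the bounds copied from the list above (the function name is illustrative):

```python
# Sketch: SDXL closed-division accuracy is a pair of range tests.
def sdxl_accuracy_ok(fid: float, clip: float) -> bool:
    fid_ok = 23.01085758 <= fid <= 23.95007626
    clip_ok = 31.68631873 <= clip <= 31.81331801
    return fid_ok and clip_ok

print(sdxl_accuracy_ok(fid=23.5, clip=31.75))  # True
print(sdxl_accuracy_ok(fid=22.0, clip=31.75))  # False: FID below the band
```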

---

## 3. Object Detection
### [Retinanet](benchmarks/object_detection/retinanet.md)
- **Dataset**: OpenImages
- **Dataset Size**: 24,781
- **QSL Size**: 64
- **Number of Parameters**: TBD
- **FLOPs**: TBD
- **Reference Model Accuracy**: 0.3755 mAP
- **Server Scenario Latency Constraint**: 100ms
- **Equal Issue mode**: False
- **Accuracy Variants**: 99% of fp32 reference model accuracy
- **Submission Category**: Data Center, Edge

---

## 4. Medical Image Segmentation
### [3d-unet](benchmarks/medical_imaging/3d-unet.md)
- **Dataset**: KiTS2019
- **Dataset Size**: 42
- **QSL Size**: 42
- **Number of Parameters**: 19 Million <!-- taken from https://arxiv.org/pdf/1606.06650 -->
- **FLOPs**: TBD
- **Reference Model Accuracy**: 0.86330 Mean DICE Score
- **Server Scenario**: Not Applicable
- **Equal Issue mode**: True
- **Accuracy Variants**: 99% and 99.9% of fp32 reference model accuracy
- **Submission Category**: Data Center, Edge

---

## 5. Language Tasks

### 5.1. Question Answering

### [Bert-Large](benchmarks/language/bert.md)
- **Dataset**: Squad v1.1 (384 Sequence Length)
- **Dataset Size**: 10,833
- **QSL Size**: 10,833
- **Number of Parameters**: 340 Million <!-- taken from https://huggingface.co/transformers/v2.9.1/pretrained_models.html -->
- **FLOPs**: TBD
- **Reference Model Accuracy**: F1 Score = 90.874%
- **Server Scenario Latency Constraint**: 130ms
- **Equal Issue mode**: False
- **Accuracy Variants**: 99% and 99.9% of fp32 reference model accuracy
- **Submission Category**: Data Center, Edge

### [LLAMA2-70B](benchmarks/language/llama2-70b.md)
- **Dataset**: OpenORCA (GPT-4 split, max_seq_len=1024)
- **Dataset Size**: 24,576
- **QSL Size**: 24,576
- **Number of Parameters**: 70 Billion
- **FLOPs**: TBD
- **Reference Model Accuracy**:
    - Rouge1: 44.4312
    - Rouge2: 22.0352
    - RougeL: 28.6162
    - Tokens_per_sample: 294.45
- **Server Scenario Latency Constraint** (see the sketch after this list):
    - TTFT: 2000ms
    - TPOT: 200ms
- **Equal Issue mode**: True
- **Accuracy Variants**: 99% and 99.9% of fp32 reference model accuracy
- **Submission Category**: Data Center
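
TTFT (time to first token) bounds how long a query waits for its first output token, and TPOT (time per output token) bounds the average gap between subsequent tokens. A minimal sketch of how the two metrics could be derived from per-token timestamps, assuming the harness records them (names are illustrative, not the LoadGen API):

```python
# Sketch: derive TTFT and TPOT for one query from token arrival times (seconds).
def ttft_tpot(query_start: float, token_times: list[float]) -> tuple[float, float]:
    ttft = token_times[0] - query_start  # wait for the first output token
    # TPOT: mean gap between consecutive output tokens after the first.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    tpot = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, tpot

ttft, tpot = ttft_tpot(0.0, [1.2, 1.35, 1.5, 1.68])
print(f"TTFT={ttft * 1000:.0f}ms, TPOT={tpot * 1000:.0f}ms")  # TTFT=1200ms, TPOT=160ms
```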

### 5.2. Text Summarization

### [GPT-J](benchmarks/language/gpt-j.md)
- **Dataset**: CNN Daily Mail v3.0.0
- **Dataset Size**: 13,368
- **QSL Size**: 13,368
- **Number of Parameters**: 6 Billion
- **FLOPs**: TBD
- **Reference Model Accuracy**:
    - Rouge1: 42.9865
    - Rouge2: 20.1235
    - RougeL: 29.9881
    - Gen_len: 4,016,878
- **Server Scenario Latency Constraint**: 20s
- **Equal Issue mode**: True
- **Accuracy Variants**: 99% and 99.9% of fp32 reference model accuracy
- **Submission Category**: Data Center, Edge

### 5.3. Mixed Tasks (Question Answering, Math, and Code Generation)

### [Mixtral-8x7B](benchmarks/language/mixtral-8x7b.md)
- **Datasets**:
    - OpenORCA (5k samples of GPT-4 split, max_seq_len=2048)
    - GSM8K (5k samples of the validation split, max_seq_len=2048)
    - MBXP (5k samples of the validation split, max_seq_len=2048)
- **Dataset Size**: 15,000
- **QSL Size**: 15,000
- **Number of Parameters**: 47 Billion <!-- https://huggingface.co/blog/moe -->
- **FLOPs**: TBD
- **Reference Model Accuracy**:
    - Rouge1: 45.4911
    - Rouge2: 23.2829
    - RougeL: 30.3615
    - GSM8K Accuracy: 73.78%
    - MBXP Accuracy: 60.12%
    - Tokens_per_sample: 294.45
- **Server Scenario Latency Constraint**:
    - TTFT: 2000ms
    - TPOT: 200ms
- **Equal Issue mode**: True
- **Accuracy Variants**: 99% of fp16 reference model accuracy
- **Submission Category**: Data Center

---

## 6. Recommendation
### [DLRMv2](benchmarks/recommendation/dlrm-v2.md)
- **Dataset**: Synthetic Multihot Criteo
- **Dataset Size**: 204,800
- **QSL Size**: 204,800
- **Number of Parameters**: TBD
- **FLOPs**: TBD
- **Reference Model Accuracy**: AUC = 80.31%
- **Server Scenario Latency Constraint**: 60ms
- **Equal Issue mode**: False
- **Accuracy Variants**: 99% and 99.9% of fp32 reference model accuracy
- **Submission Category**: Data Center

---

### Submission Categories
- **Data Center Category**: All benchmarks can participate.
- **Edge Category**: All benchmarks except DLRMv2, LLAMA2-70B, and Mixtral-8x7B can participate.

### High Accuracy Variants
- **Benchmarks**: `bert`, `llama2-70b`, `dlrm_v2`, and `3d-unet`
- **Requirement**: Must achieve at least 99.9% of the reference model accuracy, compared to the default 99% accuracy requirement.
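
As a worked example of what the two bars mean in practice, here is a sketch computing the 99% and 99.9% thresholds from the scalar reference accuracies listed above (the ROUGE-based targets for `llama2-70b` scale the same way):

```python
# Sketch: accuracy thresholds as fractions of the fp32 reference accuracy.
reference_accuracy = {
    "bert (F1)": 90.874,
    "dlrm_v2 (AUC)": 80.31,
    "3d-unet (mean DICE)": 0.86330,
}

for name, ref in reference_accuracy.items():
    print(f"{name}: default >= {ref * 0.99:.5f}, high accuracy >= {ref * 0.999:.5f}")
# bert (F1): default >= 89.96526, high accuracy >= 90.78313
# dlrm_v2 (AUC): default >= 79.50690, high accuracy >= 80.22969
# 3d-unet (mean DICE): default >= 0.85467, high accuracy >= 0.86244
```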
1 change: 1 addition & 0 deletions docs/index_gh.md
2 changes: 1 addition & 1 deletion docs/install/index.md
@@ -28,4 +28,4 @@ CM needs `git`, `python3-pip` and `python3-venv` installed on your system. If an

Here, repo is in the format `githubUsername@githubRepo`.

Now, you are ready to use the `cm` commands to run MLPerf inference as given in the [benchmarks](../benchmarks/index.md) page
Now, you are ready to use the `cm` commands to run MLPerf inference as given in the [benchmarks](../index.md) page