cambricon: update mlu with master #46

Closed · wants to merge 19 commits
42 changes: 42 additions & 0 deletions .github/workflows/python-test.yaml
@@ -0,0 +1,42 @@

# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: flag-gems-test

on:
  push:
    branches: [ "master" ]
  pull_request:
    branches: [ "master" ]

jobs:
  container-unit-test:
    runs-on: [self-hosted, docker]
    container:
      image: localhost:5000/flag-gems-ci:v1.0
      ports:
        - 81
      options: --gpus all --hostname flag-gems_cicd_ut
    steps:
      - name: checkout-code
        uses: actions/checkout@v2

      - name: unit_test-flag-gems
        run: |
          CUDA_VISIBLE_DEVICES=0 pytest -s tests/test_*

  container-model-test:
    runs-on: [self-hosted, docker]
    container:
      image: localhost:5000/flag-gems-ci:v1.0
      ports:
        - 82
      options: --gpus all --hostname flag-gems_cicd_model -v /home/flaggems_cicd/huggingface_cache_bert:/__w/_temp/_github_home/.cache/huggingface
    steps:
      - name: checkout-code
        uses: actions/checkout@v2

      - name: examples-flag-gems
        run: |
          CUDA_VISIBLE_DEVICES=1 pytest -s examples/model_bert_test.py
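
For contributors without access to the self-hosted runner, the two CI jobs reduce to the pytest invocations above. A minimal sketch of reproducing them locally (illustrative only, not part of the workflow; it assumes a checkout of the repository root and at least two visible GPUs):

```python
import subprocess

# Mirror the two CI jobs above: unit tests on GPU 0, the BERT example on GPU 1.
# shell=True keeps the tests/test_* glob expansion identical to the workflow.
commands = [
    "CUDA_VISIBLE_DEVICES=0 pytest -s tests/test_*",
    "CUDA_VISIBLE_DEVICES=1 pytest -s examples/model_bert_test.py",
]

for cmd in commands:
    print(f"running: {cmd}")
    subprocess.run(cmd, shell=True, check=True)  # raises if a test suite fails
```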
8 changes: 3 additions & 5 deletions CONTRIBUTING.md
@@ -24,11 +24,9 @@ FlagGems
│ │ ├──ops: single operators
│ │ ├──fused: fused operators
│ │ ├──__init__.py
├── tests
│ ├──flag_gems
│ │ ├──model_bert_test.py: test for BERT model running with flag_gems
│ │ ├──op_accu_test.py: test for accuracy of operators
│ │ ├──op_perf_test.py: test for performance of operators
├── tests: accuracy test files
├── benchmark: performance test files
├── examples: model test files
├── LICENSE
├── README.md
├── README_cn.md
4 changes: 1 addition & 3 deletions OperatorList.md
@@ -1,7 +1,5 @@
## Operator List

FlagGems will implement the following operators as planned. Version 1.0 will be released within 6 months.

## v1.0
- addmm
- bmm
@@ -32,6 +30,7 @@ FlagGems will implement the following operators as planned. Version 1.0 will be

## v2.0

- mv
- all
- any
- bitwise_and
@@ -41,7 +40,6 @@ FlagGems will implement the following operators as planned. Version 1.0 will be
- eq
- ge
- gt
- is_nonzero
- isinf
- isnan
- le
49 changes: 34 additions & 15 deletions README.md
@@ -14,13 +14,19 @@ By registering with the ATen backend of PyTorch, FlagGems facilitates a seamless
- support pointwise operators: abs, add, div, dropout, exp, gelu, mul, pow, reciprocal, relu, rsqrt, silu, sub, triu
- support reduction operators: cumsum, layernorm, mean, softmax

### v2.0
- support BLAS operator: mv, outer
- support pointwise operators: bitwise_and, bitwise_not, bitwise_or, cos, clamp, eq, ge, gt, isinf, isnan, le, lt, ne, neg, or, sin, tanh, sigmoid
- support reduction operators: all, any, amax, argmax, max, min, prod, sum, var_mean, vector_norm, cross_entropy_loss, group_norm, log_softmax, rms_norm
- support fused operators: skip_rms_norm, skip_layer_norm, gelu_and_mul, silu_and_mul, apply_rotary_position_embedding
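
Because these operators are registered against PyTorch's ATen backend, existing model code picks them up without modification. A minimal usage sketch, assuming the package exposes an `enable()` entry point that installs the registrations (check the installed `flag_gems` package for the exact API):

```python
import torch
import flag_gems

# Assumed entry point: installs the Triton implementations over the ATen ops.
flag_gems.enable()

x = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
y = torch.mm(x, x)                      # dispatched to the FlagGems kernel once enabled
z = torch.nn.functional.gelu(y)
print(z.dtype, z.shape)
```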

## Quick Start

### Requirements

1. Triton >= 2.2.0
2. PyTorch >= 2.1.2
3. Transformers >= 4.31.0
3. Transformers >= 4.40.2

### Installation

@@ -61,37 +67,50 @@ pip install .

### Execute

1. Run Tests
- Operator Accuracy
1. Test Operator Accuracy
- Run reference on cuda
```shell
cd tests
pytest test_xx_ops.py
```
- Run reference on cpu
```shell
cd tests/flag_gems
pytest op_accu_test.py
cd tests
pytest test_xx_ops.py --device cpu
```
- Model Accuracy

2. Test Model Accuracy
```shell
cd examples
pytest model_xx_test.py
```

3. Test Operator Performance
- Test CUDA performance
```shell
cd tests/flag_gems
pytest model_bert_test.py
cd benchmark
pytest test_xx_perf.py -s
```
- Operator Performance
- Test end-to-end performance
```shell
cd tests/flag_gems
python op_perf_test.py
cd benchmark
pytest test_xx_perf.py -s --mode cpu
```

2. Run tests with logging information
4. Run tests with logging information
```shell
pytest program.py --log-cli-level debug
```
This is not recommended when testing performance.

## Supported Operators

Operators will be implemented according to [OperatorList.md](https://github.com/FlagOpen/FlagGems/blob/master/OperatorList.md).

## Supported Models

| Model | float16 | float32 | bfloat16 |
| :---: | :---: | :---: | :---: |
| Bert_base | ✓ | ✓ | ✓ |
- Bert-base-uncased
- Llama-2-7b

## Supported Platforms

42 changes: 30 additions & 12 deletions README_cn.md
@@ -13,13 +13,19 @@ FlagGems通过对PyTorch的后端aten算子进行覆盖重写,实现算子库
- 支持pointwise类算子:abs, add, div, dropout, exp, gelu, mul, pow, reciprocal, relu, rsqrt, silu, sub, triu
- 支持reduction类算子:cumsum, layernorm, mean, softmax

### v2.0
- 支持BLAS类算子: mv, outer
- 支持pointwise类算子: bitwise_and, bitwise_not, bitwise_or, cos, clamp, eq, ge, gt, isinf, isnan, le, lt, ne, neg, or, sin, tanh, sigmoid
- 支持reduction类算子: all, any, amax, argmax, max, min, prod, sum, var_mean, vector_norm, cross_entropy_loss, group_norm, log_softmax, rms_norm
- 支持融合算子: skip_rms_norm, skip_layer_norm, gelu_and_mul, silu_and_mul, apply_rotary_position_embedding

## 快速入门

### 依赖

1. Triton >= 2.2.0
2. PyTorch >= 2.1.2
3. Transformers >= 4.31.0
3. Transformers >= 4.40.2

### 安装

@@ -60,37 +66,49 @@ pip install .

### 执行

1. 运行测试
- 算子正确性测试
1. 算子正确性测试
- 在CUDA上运行参考实现
```shell
cd tests/flag_gems
pytest op_accu_test.py
```
- 模型正确性测试
- 在CPU上运行参考实现
```shell
cd tests/flag_gems
pytest model_bert_test.py
cd tests
pytest test_xx_ops.py --device cpu
```
- 算子性能测试
2. 模型正确性测试
```shell
cd examples
pytest model_xx_test.py
```

3. 算子性能测试
- 测试CUDA性能
```shell
cd tests/flag_gems
python op_perf_test.py
cd benchmark
pytest test_xx_perf.py -s
```
- 测试端到端性能
```shell
cd benchmark
pytest test_xx_perf.py -s --mode cpu
```

2. 运行时打印日志信息
```shell
pytest program.py --log-cli-level debug
```
测试性能时不建议打开。

## 支持算子

算子将按照文档[OperatorList.md](https://github.com/FlagOpen/FlagGems/blob/master/OperatorList.md)的顺序逐步实现。

## 支持模型

| Model | float16 | float32 | bfloat16 |
| :---: | :---: | :---: | :---: |
| Bert_base | ✓ | ✓ | ✓ |
- Bert-base-uncased
- Llama-2-7b

## 支持平台

Empty file added benchmark/__init__.py
Empty file.
15 changes: 15 additions & 0 deletions benchmark/conftest.py
@@ -0,0 +1,15 @@
def pytest_addoption(parser):
    parser.addoption(
        "--mode",
        action="store",
        default="cuda",
        required=False,
        choices=["cuda", "cpu"],
        help="record latency in cuda or cpu",
    )


def pytest_configure(config):
    value = config.getoption("--mode")
    global CPU_MODE
    CPU_MODE = value == "cpu"
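
The `--mode` option only selects how latency is recorded. A hedged sketch of the timing split a benchmark might implement on top of the `CPU_MODE` flag set above — the helper below is illustrative and not the repository's benchmark code:

```python
import time

import torch


def latency_ms(fn, *args, cpu_mode=False, warmup=5, iters=20):
    """Average latency of fn(*args) in milliseconds.

    cpu_mode mirrors the CPU_MODE global set above: host wall-clock timing
    (including asynchronous kernel launches) when True, CUDA-event timing
    on the device otherwise.
    """
    for _ in range(warmup):
        fn(*args)
    if cpu_mode:
        tic = time.perf_counter()
        for _ in range(iters):
            fn(*args)
        return (time.perf_counter() - tic) / iters * 1e3
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters


if __name__ == "__main__":
    a = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    print(f"mm latency: {latency_ms(torch.mm, a, a):.3f} ms")
```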