TensorRT OSS v8.2 Early Access Release
Signed-off-by: Rajeev Rao <[email protected]>
rajeevsrao committed Oct 5, 2021
1 parent 80674b3 commit 2d517d2
Showing 278 changed files with 432,506 additions and 56,929 deletions.
40 changes: 39 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,44 @@
# TensorRT OSS Release Changelog

## [8.2.0 EA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-2-0-EA) - 2021-10-05
### Added
- [Demo applications](demo/HuggingFace) showcasing TensorRT inference of [HuggingFace Transformers](https://huggingface.co/transformers).
- Support currently covers GPT-2 and T5 models.
- Added support for the following ONNX operators (see the import sketch at the end of this section):
- `Einsum`
- `IsNan`
- `GatherND`
- `Scatter`
- `ScatterElements`
- `ScatterND`
- `Sign`
- `Round`
- Added support for building the TensorRT Python API on Windows.
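
A minimal C++ sketch of importing a model that uses these operators, assuming the TensorRT 8.2 ONNX parser; `model.onnx` is a placeholder path, and error handling is trimmed for brevity:

```cpp
// Sketch: import an ONNX model that may use newly supported ops
// (Einsum, ScatterND, Sign, Round, ...) and build a serialized engine.
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <iostream>
#include <memory>

class StderrLogger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cerr << msg << std::endl;
    }
};

int main()
{
    StderrLogger logger;
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    // ONNX import requires an explicit-batch network.
    auto const flags = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(flags));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, logger));

    if (!parser->parseFromFile("model.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING)))
    {
        std::cerr << "failed to parse model.onnx" << std::endl;
        return 1;
    }

    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    auto engine = std::unique_ptr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));
    return engine ? 0 : 1;
}
```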

### Updated
- Notable API updates in the TensorRT 8.2.0.6 EA release. See the [TensorRT Developer Guide](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html) for details.
- Added three new `IExecutionContext` APIs, `getEnqueueEmitsProfile()`, `setEnqueueEmitsProfile()`, and `reportToProfiler()`, which can be used to collect layer profiling information when inference is launched as a CUDA graph (see the profiling sketch after this list).
- Eliminated the global logger; each `Runtime`, `Builder` or `Refitter` now has its own logger.
- Added new operators: `IAssertionLayer`, `IConditionLayer`, `IEinsumLayer`, `IIfConditionalBoundaryLayer`, `IIfConditionalOutputLayer`, `IIfConditionalInputLayer`, and `IScatterLayer` (see the construction sketch after this list).
- Added new `IGatherLayer` modes: `kELEMENT` and `kND`
- Added new `ISliceLayer` modes: `kFILL`, `kCLAMP`, and `kREFLECT`
- Added new `IUnaryLayer` operators: `kSIGN` and `kROUND`
- Added a new runtime class, `IEngineInspector`, that can be used to inspect detailed information about an engine, including the layer parameters, the chosen tactics, the precision used, etc. (see the profiling sketch after this list).
- `ProfilingVerbosity` enums have been updated to show their functionality more explicitly.
- Updated TensorRT OSS container defaults to CUDA 11.4.
- Updated CMake to target C++14 builds.
- Updated the following ONNX operators:
- `Gather` and `GatherElements` implementations to natively support negative indices
- `Pad` layer to support ND padding, along with `edge` and `reflect` padding mode support
- `If` layer with general performance improvements.
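
The profiling sketch referenced above, in hedged form: per-object loggers, the enqueue-emits-profile controls, and `IEngineInspector`. It assumes an engine serialized from a build with `ProfilingVerbosity::kDETAILED`; buffer setup and the CUDA-graph capture itself are elided, and `inspectAndProfile` is an illustrative name:

```cpp
// Sketch of the TensorRT 8.2 profiling additions.
#include <NvInfer.h>
#include <iostream>
#include <memory>
#include <vector>

class StderrLogger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cerr << msg << std::endl;
    }
};

struct PrintProfiler : public nvinfer1::IProfiler
{
    void reportLayerTime(const char* layerName, float ms) noexcept override
    {
        std::cout << layerName << ": " << ms << " ms" << std::endl;
    }
};

void inspectAndProfile(std::vector<char> const& engineData)
{
    // 8.2 removes the global logger: each Runtime/Builder/Refitter owns its own.
    StderrLogger logger;
    auto runtime = std::unique_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(logger));
    auto engine = std::unique_ptr<nvinfer1::ICudaEngine>(
        runtime->deserializeCudaEngine(engineData.data(), engineData.size()));

    // IEngineInspector: layer parameters, chosen tactics, precisions, etc.
    // Full detail requires the engine to have been built with
    // config->setProfilingVerbosity(ProfilingVerbosity::kDETAILED).
    auto inspector = std::unique_ptr<nvinfer1::IEngineInspector>(engine->createEngineInspector());
    std::cout << inspector->getEngineInformation(nvinfer1::LayerInformationFormat::kJSON) << std::endl;

    auto context = std::unique_ptr<nvinfer1::IExecutionContext>(engine->createExecutionContext());
    PrintProfiler profiler;
    context->setProfiler(&profiler);

    // When inference is captured into a CUDA graph, per-layer timings cannot be
    // emitted during enqueue; disable emission and report explicitly after replay.
    context->setEnqueueEmitsProfile(false);
    // ... capture context->enqueueV2(...) into a CUDA graph and replay it ...
    if (!context->getEnqueueEmitsProfile())
    {
        context->reportToProfiler(); // emits layer timings for the last launch
    }
}
```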

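A companion construction sketch for the new builder-API surface (`IEinsumLayer`, the `kND` gather mode, the `kFILL` slice mode, `kSIGN`, `kROUND`, and if-conditionals); all shapes, tensor names, and the einsum equation are illustrative only:

```cpp
// Sketch of builder-API additions in 8.2; dimensions are placeholders.
#include <NvInfer.h>

void addNewLayers(nvinfer1::INetworkDefinition& network)
{
    using namespace nvinfer1;

    ITensor* a = network.addInput("a", DataType::kFLOAT, Dims3{2, 3, 4});
    ITensor* b = network.addInput("b", DataType::kFLOAT, Dims3{2, 4, 5});

    // IEinsumLayer: a batched matmul written as an einsum equation.
    ITensor* einsumInputs[] = {a, b};
    IEinsumLayer* einsum = network.addEinsum(einsumInputs, 2, "bij,bjk->bik");

    // IGatherLayer in kND mode (GatherND semantics).
    ITensor* indices = network.addInput("indices", DataType::kINT32, Dims2{2, 2});
    IGatherLayer* gather = network.addGatherV2(*a, *indices, GatherMode::kND);

    // ISliceLayer in kFILL mode: out-of-bounds reads return a fill value.
    ISliceLayer* slice = network.addSlice(*a, Dims3{0, 0, 0}, Dims3{2, 3, 6}, Dims3{1, 1, 1});
    slice->setMode(SliceMode::kFILL);

    // IUnaryLayer kSIGN.
    IUnaryLayer* sign = network.addUnary(*a, UnaryOperation::kSIGN);

    // IIfConditional: the condition is a 0-D boolean tensor; branch subgraphs
    // are built from the conditional's input layers.
    ITensor* cond = network.addInput("cond", DataType::kBOOL, Dims{0, {}});
    IIfConditional* conditional = network.addIfConditional();
    conditional->setCondition(*cond);
    IIfConditionalInputLayer* branchIn = conditional->addInput(*einsum->getOutput(0));
    ITensor* thenOut = network.addUnary(*branchIn->getOutput(0), UnaryOperation::kROUND)->getOutput(0);
    ITensor* elseOut = branchIn->getOutput(0);
    IIfConditionalOutputLayer* branchOut = conditional->addOutput(*thenOut, *elseOut);

    network.markOutput(*gather->getOutput(0));
    network.markOutput(*slice->getOutput(0));
    network.markOutput(*sign->getOutput(0));
    network.markOutput(*branchOut->getOutput(0));
}
```
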
### Removed
- Removed `sampleMLP`.
- Several trtexec flags have been deprecated:
- `--explicitBatch` flag has been deprecated and has no effect. When the input model is in UFF or in Caffe prototxt format, the implicit batch dimension mode is used automatically; when the input model is in ONNX format, the explicit batch mode is used automatically.
- `--explicitPrecision` flag has been deprecated and has no effect. When the input ONNX model contains Quantization/Dequantization nodes, TensorRT automatically uses explicit precision mode.
- `--nvtxMode=[verbose|default|none]` has been deprecated in favor of `--profilingVerbosity=[detailed|layer_names_only|none]` to show its functionality more explicitly.

## [21.10](https://github.com/NVIDIA/TensorRT/releases/tag/21.10) - 2021-10-05
### Added
- Benchmark script for demoBERT-Megatron
@@ -33,7 +72,6 @@
- Mark BOOL tiles as unsupported
- Remove unnecessary shape tensor checks


### Removed
- N/A

4 changes: 3 additions & 1 deletion CMakeLists.txt
@@ -58,9 +58,11 @@ option(BUILD_PLUGINS "Build TensorRT plugin" ON)
option(BUILD_PARSERS "Build TensorRT parsers" ON)
option(BUILD_SAMPLES "Build TensorRT samples" ON)

-set(CMAKE_CXX_STANDARD 11)
+# C++14
+set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

set(CMAKE_CXX_FLAGS "-Wno-deprecated-declarations ${CMAKE_CXX_FLAGS} -DBUILD_SYSTEM=cmake_oss")

############################################################################################
47 changes: 23 additions & 24 deletions README.md
@@ -15,12 +15,12 @@ This repository contains the Open Source Software (OSS) components of NVIDIA Ten
To build the TensorRT-OSS components, you will first need the following software packages.

**TensorRT GA build**
-* [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) v8.0.3.4
+* [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) v8.2.0.6

**System Packages**
* [CUDA](https://developer.nvidia.com/cuda-toolkit)
* Recommended versions:
-* cuda-11.3.1 + cuDNN-8.2
+* cuda-11.4.x + cuDNN-8.2
* cuda-10.2 + cuDNN-8.2
* [GNU make](https://ftp.gnu.org/gnu/make/) >= v4.1
* [cmake](https://github.com/Kitware/CMake/releases) >= v3.13
@@ -34,16 +34,16 @@ To build the TensorRT-OSS components, you will first need the following software
* [Docker](https://docs.docker.com/install/) >= 19.03
* [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker)
* Toolchains and SDKs
-* (Cross compilation for Jetson platform) [NVIDIA JetPack](https://developer.nvidia.com/embedded/jetpack) >= 4.6 (July 2021)
+* (Cross compilation for Jetson platform) [NVIDIA JetPack](https://developer.nvidia.com/embedded/jetpack) >= 4.6 (current support only for TensorRT 8.0.1)
* (For Windows builds) [Visual Studio](https://visualstudio.microsoft.com/vs/older-downloads/) 2017 Community or Enterprise edition
* (Cross compilation for QNX platform) [QNX Toolchain](https://blackberry.qnx.com/en)
* PyPI packages (for demo applications/tests)
-* [onnx](https://pypi.org/project/onnx/) 1.8.0
+* [onnx](https://pypi.org/project/onnx/) 1.9.0
* [onnxruntime](https://pypi.org/project/onnxruntime/) 1.8.0
-* [tensorflow-gpu](https://pypi.org/project/tensorflow/) >= 2.4.1
-* [Pillow](https://pypi.org/project/Pillow/) >= 8.1.2
-* [pycuda](https://pypi.org/project/pycuda/) < 2020.1
-* [numpy](https://pypi.org/project/numpy/) 1.21.0
+* [tensorflow-gpu](https://pypi.org/project/tensorflow/) >= 2.5.1
+* [Pillow](https://pypi.org/project/Pillow/) >= 8.3.2
+* [pycuda](https://pypi.org/project/pycuda/) < 2021.1
+* [numpy](https://pypi.org/project/numpy/)
* [pytest](https://pypi.org/project/pytest/)
* Code formatting tools (for contributors)
* [Clang-format](https://clang.llvm.org/docs/ClangFormat.html)
@@ -66,27 +66,27 @@ To build the TensorRT-OSS components, you will first need the following software

Otherwise, download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-tensorrt-download).

-**Example: Ubuntu 18.04 on x86-64 with cuda-11.3**
+**Example: Ubuntu 18.04 on x86-64 with cuda-11.4**

```bash
cd ~/Downloads
-tar -xvzf TensorRT-8.0.3.4.Ubuntu-18.04.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-8.0.3.4
+tar -xvzf TensorRT-8.2.0.6.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-8.2.0.6
```

-**Example: Windows on x86-64 with cuda-11.3**
+**Example: Windows on x86-64 with cuda-11.4**

```powershell
cd ~\Downloads
-Expand-Archive .\TensorRT-8.0.3.4.Windows10.x86_64.cuda-11.3.cudnn8.2.zip
-$Env:TRT_LIBPATH = '$(Get-Location)\TensorRT-8.0.3.4'
+Expand-Archive .\TensorRT-8.2.0.6.Windows10.x86_64.cuda-11.4.cudnn8.2.zip
+$Env:TRT_LIBPATH = '$(Get-Location)\TensorRT-8.2.0.6'
$Env:PATH += 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\MSBuild\15.0\Bin\'
```


3. #### (Optional - for Jetson builds only) Download the JetPack SDK
1. Download and launch the JetPack SDK manager. Log in with your NVIDIA developer account.
-2. Select the platform and target OS (example: Jetson AGX Xavier, `Linux Jetpack 4.4`), and click Continue.
+2. Select the platform and target OS (example: Jetson AGX Xavier, `Linux Jetpack 4.6`), and click Continue.
3. Under `Download & Install Options` change the download folder and select `Download now, Install later`. Agree to the license terms and click Continue.
4. Move the extracted files into the `<TensorRT-OSS>/docker/jetpack_files` folder.
@@ -98,13 +98,13 @@ For Linux platforms, we recommend that you generate a docker container for build
1. #### Generate the TensorRT-OSS build container.
The TensorRT-OSS build container can be generated using the supplied Dockerfiles and build script. The build container is configured for building TensorRT OSS out-of-the-box.
-**Example: Ubuntu 18.04 on x86-64 with cuda-11.3**
+**Example: Ubuntu 18.04 on x86-64 with cuda-11.4.2 (default)**
```bash
-./docker/build.sh --file docker/ubuntu-18.04.Dockerfile --tag tensorrt-ubuntu18.04-cuda11.3 --cuda 11.3.1
+./docker/build.sh --file docker/ubuntu-18.04.Dockerfile --tag tensorrt-ubuntu18.04-cuda11.4
```
-**Example: CentOS/RedHat 8 on x86-64 with cuda-10.2**
+**Example: CentOS/RedHat 7 on x86-64 with cuda-10.2**
```bash
-./docker/build.sh --file docker/centos-8.Dockerfile --tag tensorrt-centos8-cuda10.2 --cuda 10.2
+./docker/build.sh --file docker/centos-7.Dockerfile --tag tensorrt-centos7-cuda10.2 --cuda 10.2
```
**Example: Ubuntu 18.04 cross-compile for Jetson (aarch64) with cuda-10.2 (JetPack SDK)**
```bash
@@ -114,7 +114,7 @@
2. #### Launch the TensorRT-OSS build container.
**Example: Ubuntu 18.04 build container**
```bash
-./docker/launch.sh --tag tensorrt-ubuntu18.04-cuda11.3 --gpus all
+./docker/launch.sh --tag tensorrt-ubuntu18.04-cuda11.4 --gpus all
```
> NOTE:
1. Use the `--tag` corresponding to the build container generated in Step 1.
@@ -125,7 +125,7 @@ For Linux platforms, we recommend that you generate a docker container for build
## Building TensorRT-OSS
* Generate Makefiles or VS project (Windows) and build.
-**Example: Linux (x86-64) build with default cuda-11.3**
+**Example: Linux (x86-64) build with default cuda-11.4.2**
```bash
cd $TRT_OSSPATH
mkdir -p build && cd build
@@ -156,21 +156,20 @@ For Linux platforms, we recommend that you generate a docker container for build
msbuild ALL_BUILD.vcxproj
```
> NOTE:
-1. The default CUDA version used by CMake is 11.3.1. To override this, for example to 10.2, append `-DCUDA_VERSION=10.2` to the cmake command.
+1. The default CUDA version used by CMake is 11.4.2. To override this, for example to 10.2, append `-DCUDA_VERSION=10.2` to the cmake command.
2. If samples fail to link on CentOS7, create this symbolic link: `ln -s $TRT_OUT_DIR/libnvinfer_plugin.so $TRT_OUT_DIR/libnvinfer_plugin.so.8`
* Required CMake build arguments are:
- `TRT_LIB_DIR`: Path to the TensorRT installation directory containing libraries.
- `TRT_OUT_DIR`: Output directory where generated build artifacts will be copied.
* Optional CMake build arguments:
- `CMAKE_BUILD_TYPE`: Specify whether the generated binaries are release or debug builds (with debug symbols). Values: [`Release`] | `Debug`
-- `CUDA_VERSION`: The version of CUDA to target, for example [`11.3.1`].
+- `CUDA_VERSION`: The version of CUDA to target, for example [`11.4.2`].
- `CUDNN_VERSION`: The version of cuDNN to target, for example [`8.2`].
- `PROTOBUF_VERSION`: The version of Protobuf to use, for example [`3.0.0`]. Note: changing this will not configure CMake to use a system version of Protobuf; it will configure CMake to download and try building that version.
- `CMAKE_TOOLCHAIN_FILE`: The path to a toolchain file for cross compilation.
- `BUILD_PARSERS`: Specify whether the parsers should be built, for example [`ON`] | `OFF`. If turned OFF, CMake will try to find precompiled versions of the parser libraries to use in compiling samples: first in `${TRT_LIB_DIR}`, then on the system. If the build type is Debug, it will prefer debug builds of the libraries over release versions when available.
- `BUILD_PLUGINS`: Specify whether the plugins should be built, for example [`ON`] | `OFF`. If turned OFF, CMake will try to find a precompiled version of the plugin library to use in compiling samples: first in `${TRT_LIB_DIR}`, then on the system. If the build type is Debug, it will prefer debug builds of the libraries over release versions when available.
- `BUILD_SAMPLES`: Specify if the samples should be built, for example [`ON`] | `OFF`.
- `CUB_VERSION`: The version of CUB to use, for example [`1.8.0`].
- `GPU_ARCHS`: GPU (SM) architectures to target. By default we generate CUDA code for all major SMs. Specific SM versions can be specified here as a quoted, space-separated list to reduce compilation time and binary size. A table of compute capabilities of NVIDIA GPUs can be found [here](https://developer.nvidia.com/cuda-gpus). Examples:
- NVIDIA A100: `-DGPU_ARCHS="80"`
- Tesla T4, GeForce RTX 2080: `-DGPU_ARCHS="75"`
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-8.0.3.4
+8.2.0.6
1 change: 0 additions & 1 deletion cmake/modules/set_ifndef.cmake
@@ -13,7 +13,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
-
function (set_ifndef variable value)
if(NOT DEFINED ${variable})
set(${variable} ${value} PARENT_SCOPE)
8 changes: 2 additions & 6 deletions cmake/toolchains/cmake_aarch64-android.toolchain
@@ -20,8 +20,7 @@ set(CMAKE_SYSTEM_PROCESSOR aarch64)
set(CMAKE_C_COMPILER $ENV{AARCH64_ANDROID_CC})
set(CMAKE_CXX_COMPILER $ENV{AARCH64_ANDROID_CC})

-set(CMAKE_C_FLAGS "$ENV{AARCH64_ANDROID_CFLAGS} -pie -fPIE"
-    CACHE STRING "" FORCE)
+set(CMAKE_C_FLAGS "$ENV{AARCH64_ANDROID_CFLAGS} -pie -fPIE" CACHE STRING "" FORCE)
set(CMAKE_CXX_FLAGS "${CMAKE_C_FLAGS}" CACHE STRING "" FORCE)

set(CMAKE_C_COMPILER_TARGET aarch64-none-linux-android)
@@ -37,11 +36,8 @@ set(CMAKE_CUDA_HOST_COMPILER ${CMAKE_CXX_COMPILER} CACHE STRING "" FORCE)
set(CMAKE_CUDA_FLAGS "-I${CUDA_INCLUDE_DIRS} -Xcompiler=\"-fPIC ${CMAKE_CXX_FLAGS}\"" CACHE STRING "" FORCE)
set(CMAKE_CUDA_COMPILER_FORCED TRUE)


set(CUDA_LIBS -L${CUDA_ROOT}/lib64)

-set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcublas -lcudart -lnvToolsExt -lculibos -lcudadevrt -llog)
+set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcudart -lnvToolsExt -lculibos -lcudadevrt -llog)

set(DISABLE_SWIG TRUE)
set(TRT_PLATFORM_ID "aarch64-android")
25 changes: 15 additions & 10 deletions cmake/toolchains/cmake_aarch64.toolchain
@@ -16,22 +16,29 @@

set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)

set(TRT_PLATFORM_ID "aarch64")
-set(CUDA_PLATFORM_ID "aarch64-linux")

-set(CMAKE_C_COMPILER /usr/bin/aarch64-linux-gnu-gcc)
-set(CMAKE_CXX_COMPILER /usr/bin/aarch64-linux-gnu-g++)
+if("$ENV{ARMSERVER}" AND "${CUDA_VERSION}" VERSION_GREATER_EQUAL 11.0)
+    set(CUDA_PLATFORM_ID "sbsa-linux")
+else()
+    set(CUDA_PLATFORM_ID "aarch64-linux")
+endif()

+set(CMAKE_C_COMPILER $ENV{AARCH64_CC})
+set(CMAKE_CXX_COMPILER $ENV{AARCH64_CC})

-set(CMAKE_C_FLAGS "" CACHE STRING "" FORCE)
-set(CMAKE_CXX_FLAGS "" CACHE STRING "" FORCE)
+set(CMAKE_C_FLAGS "$ENV{AARCH64_CFLAGS}" CACHE STRING "" FORCE)
+set(CMAKE_CXX_FLAGS "$ENV{AARCH64_CFLAGS}" CACHE STRING "" FORCE)

-set(CMAKE_C_COMPILER_TARGET aarch64)
-set(CMAKE_CXX_COMPILER_TARGET aarch64)
+set(CMAKE_C_COMPILER_TARGET aarch64-linux-gnu)
+set(CMAKE_CXX_COMPILER_TARGET aarch64-linux-gnu)

set(CMAKE_C_COMPILER_FORCED TRUE)
set(CMAKE_CXX_COMPILER_FORCED TRUE)

set(CUDA_ROOT /usr/local/cuda-${CUDA_VERSION}/targets/${CUDA_PLATFORM_ID} CACHE STRING "CUDA ROOT dir")

set(CUDNN_ROOT_DIR /pdk_files/cudnn)
set(BUILD_LIBRARY_ONLY 1)

@@ -46,6 +53,4 @@ set(CMAKE_CUDA_COMPILER_FORCED TRUE)

set(CUDA_LIBS -L${CUDA_ROOT}/lib)

-set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcublas -lcudart -lstdc++ -lm)

set(DISABLE_SWIG TRUE)
+set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcudart -lstdc++ -lm)
5 changes: 3 additions & 2 deletions cmake/toolchains/cmake_ppc64le.toolchain
@@ -19,9 +19,10 @@ set(CMAKE_SYSTEM_PROCESSOR ppc64le)

set(CMAKE_C_COMPILER powerpc64le-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER powerpc64le-linux-gnu-g++)
+set(CMAKE_AR /usr/bin/ar CACHE STRING "" FORCE)

-set(CMAKE_C_COMPILER_TARGET ppc64le)
-set(CMAKE_CXX_COMPILER_TARGET ppc64le)
+set(CMAKE_C_COMPILER_TARGET powerpc64le-linux-gnu)
+set(CMAKE_CXX_COMPILER_TARGET powerpc64le-linux-gnu)

set(CMAKE_CUDA_HOST_COMPILER ${CMAKE_CXX_COMPILER} CACHE STRING "" FORCE)
set(CMAKE_CUDA_FLAGS "-I${CUDA_ROOT}/include -Xcompiler=\"-fPIC ${CMAKE_CXX_FLAGS}\"" CACHE STRING "" FORCE)
10 changes: 4 additions & 6 deletions cmake/toolchains/cmake_qnx.toolchain
@@ -14,7 +14,7 @@
# limitations under the License.
#

-set(CMAKE_SYSTEM_NAME qnx)
+set(CMAKE_SYSTEM_NAME QNX)
set(CMAKE_SYSTEM_PROCESSOR aarch64)

if(DEFINED ENV{QNX_BASE})
@@ -39,8 +39,8 @@ message(STATUS "QNX_TARGET = ${QNX_TARGET}")
set(CMAKE_C_COMPILER ${QNX_HOST}/usr/bin/aarch64-unknown-nto-qnx7.0.0-gcc)
set(CMAKE_CXX_COMPILER ${QNX_HOST}/usr/bin/aarch64-unknown-nto-qnx7.0.0-g++)

-set(CMAKE_C_COMPILER_TARGET aarch64)
-set(CMAKE_CXX_COMPILER_TARGET aarch64)
+set(CMAKE_C_COMPILER_TARGET aarch64-unknown-nto-qnx)
+set(CMAKE_CXX_COMPILER_TARGET aarch64-unknown-nto-qnx)

set(CMAKE_C_COMPILER_FORCED TRUE)
set(CMAKE_CXX_COMPILER_FORCED TRUE)
@@ -54,8 +54,6 @@ set(CMAKE_CUDA_COMPILER_FORCED TRUE)

set(CUDA_LIBS -L${CUDA_ROOT}/lib)

-set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcublas -lcudart)
-#...Disable swig
set(DISABLE_SWIG TRUE)
+set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcudart)

set(TRT_PLATFORM_ID "aarch64-qnx")
3 changes: 1 addition & 2 deletions cmake/toolchains/cmake_x64_win.toolchain
@@ -36,13 +36,12 @@ set(W10_LIBRARY_SUFFIXES .lib .dll)
set(W10_CUDA_ROOT ${CUDA_TOOLKIT_ROOT_DIR})
set(W10_LINKER ${MSVC_COMPILER_DIR}/bin/amd64/link)


set(CMAKE_CUDA_HOST_COMPILER ${CMAKE_NVCC_COMPILER} CACHE STRING "" FORCE)

set(ADDITIONAL_PLATFORM_INCL_FLAGS "-I${MSVC_COMPILER_DIR}/include -I${MSVC_COMPILER_DIR}/../ucrt/include")
set(ADDITIONAL_PLATFORM_LIB_FLAGS ${ADDITIONAL_PLATFORM_LIB_FLAGS} "-LIBPATH:${NV_TOOLS}/ddk/wddmv2/official/17134/Lib/10.0.17134.0/um/x64")
set(ADDITIONAL_PLATFORM_LIB_FLAGS ${ADDITIONAL_PLATFORM_LIB_FLAGS} "-LIBPATH:${MSVC_COMPILER_DIR}/lib/amd64" )
set(ADDITIONAL_PLATFORM_LIB_FLAGS ${ADDITIONAL_PLATFORM_LIB_FLAGS} "-LIBPATH:${MSVC_COMPILER_DIR}/../ucrt/lib/x64")
set(ADDITIONAL_PLATFORM_LIB_FLAGS ${ADDITIONAL_PLATFORM_LIB_FLAGS} "-LIBPATH:${W10_CUDA_ROOT}/lib/x64 cudart.lib cublas.lib")
set(ADDITIONAL_PLATFORM_LIB_FLAGS ${ADDITIONAL_PLATFORM_LIB_FLAGS} "-LIBPATH:${W10_CUDA_ROOT}/lib/x64 cudart.lib")

set(TRT_PLATFORM_ID "win10")
@@ -13,10 +13,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
-set(SAMPLE_SOURCES
-    sampleMLP.cpp
-)

-set(SAMPLE_PARSERS "caffe")
+set(CMAKE_SYSTEM_NAME Linux)
+set(CMAKE_SYSTEM_PROCESSOR x86_64)

-include(../CMakeSamplesTemplate.txt)
+set(CMAKE_C_COMPILER /opt/rh/devtoolset-8/root/usr/bin/gcc)
+set(CMAKE_CXX_COMPILER /opt/rh/devtoolset-8/root/usr/bin/g++)

+if(DEFINED CUDA_ROOT)
+    set(CUDA_TOOLKIT_ROOT_DIR ${CUDA_ROOT})
+endif()

+set(CUDA_INCLUDE_DIRS ${CUDA_ROOT}/include)

+set(TRT_PLATFORM_ID "x86_64")
2 changes: 2 additions & 0 deletions demo/HuggingFace/.gitignore
@@ -0,0 +1,2 @@
*.pyc
__pycache__/
Empty file added demo/HuggingFace/GPT2/.gitkeep