Merge pull request #3007 from rapidsai/branch-0.16
raydouglass committed Oct 21, 2020
2 parents 645ead7 + e9bf80c commit 8fa0b6e
Showing 718 changed files with 28,413 additions and 15,611 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -18,6 +18,7 @@ __pycache__
htmlcov
build/
build_prims/
cmake-build*
cuml.egg-info/
dist/
python/cuml/**/*.cpp
11 changes: 9 additions & 2 deletions BUILD.md
@@ -15,10 +15,17 @@ To install cuML from source, ensure the following dependencies are met:
9. NCCL (>=2.4)
10. UCX [optional] (>= 1.7) - enables point-to-point messaging in the cuML standard communicator. This is necessary for many multi-node multi-GPU cuML algorithms to function.

It is recommended to use conda for environment/package management. If doing so, a convenience environment .yml file is located in `conda/environments/cuml_dec_cudax.y.yml` (replace x.y for your CUDA version). This file contains most of the dependencies mentioned above (notable exceptions are `gcc` and `zlib`). To use it, for example to create an environment named `cuml_dev` for CUDA 10.0 and Python 3.7, you can use the follow command:
It is recommended to use conda for environment/package management. If doing so, a convenience environment .yml file is located in `conda/environments/cuml_dec_cudax.y.yml` (replace x.y for your CUDA version). This file contains most of the dependencies mentioned above (notable exceptions are `gcc` and `zlib`). To use it, for example to create an environment named `cuml_dev` for CUDA 10.2 and Python 3.7, you can use the follow command:

```bash
conda create -n cuml_dev python=3.7
conda env update -n cuml_dev --file=conda/environments/cuml_dev_cuda10.2.yml
```
conda env create -n cuml_dev python=3.7 --file=conda/environments/cuml_dev_cuda10.0.yml

These conda environments are based on the general RAPIDS meta packages that install common dependencies for RAPIDS projects. To install different versions of packages contained in those meta packages after creating the environment, it is recommended to remove those meta packages (without removing the actual packages contained in the environment) with the following command (having the environment active):

```bash
conda remove --force rapids-build-env rapids-notebook-env rapids-doc-env
```

## Installing from Source:
107 changes: 107 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,107 @@
# cuML 0.16.0 (21 Oct 2020)

## New Features
- PR #2922: Install RAFT headers with cuML
- PR #2909: Update allgatherv for compatibility with latest RAFT
- PR #2677: Ability to export RF trees as JSON
- PR #2698: Distributed TF-IDF transformer
- PR #2476: Porter Stemmer
- PR #2789: Dask LabelEncoder
- PR #2152: add FIL C++ benchmark
- PR #2638: Improve cython build with custom `build_ext`
- PR #2866: Support XGBoost-style multiclass models (gradient boosted decision trees) in FIL C++
- PR #2874: Issue warning for degraded accuracy with float64 models in Treelite
- PR #2881: Introduces experimental batched backend for random forest
- PR #2916: Add SKLearn multi-class GBDT model support in FIL

## Improvements
- PR #2947: Add more warnings for accuracy degradation with 64-bit models
- PR #2873: Remove empty marker kernel code for NVTX markers
- PR #2796: Remove tokens of length 1 by default for text vectorizers
- PR #2741: Use rapids build packages in conda environments
- PR #2735: Update seed to random_state in random forest and associated tests
- PR #2739: Use cusparse_wrappers.h from RAFT
- PR #2729: Replace `cupy.sparse` with `cupyx.scipy.sparse`
- PR #2749: Correct docs for python version used in cuml_dev conda environment
- PR #2747: Adopting raft::handle_t and raft::comms::comms_t in cuML
- PR #2762: Fix broken links and provide minor edits to docs
- PR #2723: Support and enable convert_dtype in estimator predict
- PR #2758: Match sklearn's default n_components behavior for PCA
- PR #2770: Fix doxygen version during cmake
- PR #2766: Update default RandomForestRegressor score function to use r2
- PR #2775: Enablinbg mg gtests w/ raft mpi comms
- PR #2783: Add pytest that will fail when GPU IDs in Dask cluster are not unique
- PR #2784: Add SparseCumlArray container for sparse index/data arrays
- PR #2785: Add in cuML-specific dev conda dependencies
- PR #2778: Add README for FIL
- PR #2799: Reenable lightgbm test with lower (1%) proba accuracy
- PR #2800: Align cuML's spdlog version with RMM's
- PR #2824: Make data conversions warnings be debug level
- PR #2835: Rng prims, utils, and dependencies in RAFT
- PR #2541: Improve Documentation Examples and Source Linking
- PR #2837: Make the FIL node reorder loop more obvious
- PR #2849: make num_classes significant in FLOAT_SCALAR case
- PR #2792: Project flash (new build process) script changes
- PR #2850: Clean up unused params in paramsPCA
- PR #2871: Add timing function to utils
- PR #2863: in FIL, rename leaf_value_t enums to more descriptive
- PR #2867: improve stability of FIL benchmark measurements
- PR #2798: Add python tests for FIL multiclass classification of lightgbm models
- PR #2892: Update ci/local/README.md
- PR #2910: Adding Support for CuPy 8.x
- PR #2914: Add tests for XGBoost multi-class models in FIL
- PR #2622: Simplify tSNE perplexity search
- PR #2930: Pin libfaiss to <=1.6.3
- PR #2928: Updating Estimators Derived from Base for Consistency
- PR #2942: Adding `cuml.experimental` to the Docs
- PR #3010: Improve gpuCI Scripts

## Bug Fixes
- PR #2973: Allow data imputation for nan values
- PR #2982: Adjust kneighbors classifier test threshold to avoid intermittent failure
- PR #2885: Changing test target for NVTX wrapper test
- PR #2882: Allow import on machines without GPUs
- PR #2875: Bug fix to enable colorful NVTX markers
- PR #2744: Supporting larger number of classes in KNeighborsClassifier
- PR #2769: Remove outdated doxygen options for 1.8.20
- PR #2787: Skip lightgbm test for version 3 and above temporarily
- PR #2805: Retain index in stratified splitting for dataframes
- PR #2781: Use Python print to correctly redirect spdlogs when sys.stdout is changed
- PR #2787: Skip lightgbm test for version 3 and above temporarily
- PR #2813: Fix memory access in generation of non-row-major random blobs
- PR #2810: Update Rf MNMG threshold to prevent sporadic test failure
- PR #2808: Relax Doxygen version required in CMake to coincide with integration repo
- PR #2818: Fix parsing of singlegpu option in build command
- PR #2827: Force use of whole dataset when sample bootstrapping is disabled
- PR #2829: Fixing description for labels in docs and removing row number constraint from PCA xform/inverse_xform
- PR #2832: Updating stress tests that fail with OOM
- PR #2831: Removing repeated capture and parameter in lambda function
- PR #2847: Workaround for TSNE lockup, change caching preference.
- PR #2842: KNN index preprocessors were using incorrect n_samples
- PR #2848: Fix typo in Python docstring for UMAP
- PR #2856: Fix LabelEncoder for filtered input
- PR #2855: Updates for RMM being header only
- PR #2844: Fix for OPG KNN Classifier & Regressor
- PR #2880: Fix bugs in Auto-ARIMA when s==None
- PR #2877: TSNE exception for n_components > 2
- PR #2879: Update unit test for LabelEncoder on filtered input
- PR #2932: Marking KBinsDiscretizer pytests as xfail
- PR #2925: Fixing Owner Bug When Slicing CumlArray Objects
- PR #2931: Fix notebook error handling in gpuCI
- PR #2941: Fixing dask tsvd stress test failure
- PR #2943: Remove unused shuffle_features parameter
- PR #2940: Correcting labels meta dtype for `cuml.dask.make_classification`
- PR #2965: Notebooks update
- PR #2955: Fix for conftest for singlegpu build
- PR #2968: Remove shuffle_features from RF param names
- PR #2957: Fix ols test size for stability
- PR #2972: Upgrade Treelite to 0.93
- PR #2981: Prevent unguarded import of sklearn in SVC
- PR #2984: Fix GPU test scripts gcov error
- PR #2990: Reduce MNMG kneighbors regressor test threshold
- PR #2997: Changing ARIMA `get/set_params` to `get/set_fit_params`
- PR #3038: Require `ucx-proc=*=gpu`

# cuML 0.15.0 (26 Aug 2020)

## New Features
@@ -29,6 +133,7 @@
- PR #2661: CUDA-11 support for single-gpu code
- PR #2322: Sparse FIL forests with 8-byte nodes
- PR #2675: Update conda recipes to support CUDA 11
- PR #2645: Add experimental, sklearn-based preprocessing

## Improvements
- PR #2336: Eliminate `rmm.device_array` usage
@@ -91,6 +196,7 @@
- PR #2623: Fixing kmeans score() API to be compatible with Scikit-learn
- PR #2629: Add naive_bayes api docs
- PR #2643: 'dense' and 'sparse' values of `storage_type` for FIL
- PR #2691: Generic Base class attribute setter
- PR #2666: Update MBSGD documentation to mention that the model is experimental
- PR #2687: Update xgboost version to 1.2.0dev.rapidsai0.15
- PR #2684: CUDA 11 conda development environment yml and faiss patch
@@ -346,6 +452,7 @@
- PR #2305: Fixed race condition in DBScan
- PR #2354: Fix broken links in README
- PR #2619: Explicitly skip raft test folder for pytest 6.0.0
- PR #2788: Set the minimum number of columns that can be sampled to 1 to fix 0 mem allocation error

# cuML 0.13.0 (31 Mar 2020)

9 changes: 3 additions & 6 deletions build.sh
@@ -47,7 +47,7 @@ HELP="$0 [<target> ...] [<flag> ...]
default action (no args) is to build and install 'libcuml', 'cuml', and 'prims' targets only for the detected GPU arch
"
LIBCUML_BUILD_DIR=${REPODIR}/cpp/build
LIBCUML_BUILD_DIR=${LIBCUML_BUILD_DIR:=${REPODIR}/cpp/build}
CUML_BUILD_DIR=${REPODIR}/python/build
PYTHON_DEPS_CLONE=${REPODIR}/python/external_repositories
BUILD_DIRS="${LIBCUML_BUILD_DIR} ${CUML_BUILD_DIR} ${PYTHON_DEPS_CLONE}"
@@ -71,7 +71,6 @@ BUILD_STATIC_FAISS=OFF
# CONDA_PREFIX, but there is no fallback from there!
INSTALL_PREFIX=${INSTALL_PREFIX:=${PREFIX:=${CONDA_PREFIX}}}
PARALLEL_LEVEL=${PARALLEL_LEVEL:=""}
BUILD_ABI=${BUILD_ABI:=ON}

function hasArg {
(( ${NUMARGS} != 0 )) && (echo " ${ARGS} " | grep -q " $1 ")
@@ -171,7 +170,6 @@ if completeBuild || hasArg libcuml || hasArg prims || hasArg bench || hasArg pri
cd ${LIBCUML_BUILD_DIR}

cmake -DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX} \
-DCMAKE_CXX11_ABI=${BUILD_ABI} \
-DBLAS_LIBRARIES=${INSTALL_PREFIX}/lib/libopenblas.so.0 \
${GPU_ARCH} \
-DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
@@ -226,10 +224,9 @@ fi
if completeBuild || hasArg cuml || hasArg pydocs; then
cd ${REPODIR}/python
if [[ ${INSTALL_TARGET} != "" ]]; then
python setup.py build_ext -j${PARALLEL_LEVEL:-1} --inplace ${SINGLEGPU_PYTHON_FLAG}
python setup.py install --single-version-externally-managed --record=record.txt ${SINGLEGPU_PYTHON_FLAG}
python setup.py build_ext -j${PARALLEL_LEVEL:-1} ${SINGLEGPU_PYTHON_FLAG} --library-dir=${LIBCUML_BUILD_DIR} install --single-version-externally-managed --record=record.txt
else
python setup.py build_ext -j${PARALLEL_LEVEL:-1} --inplace --library-dir=${LIBCUML_BUILD_DIR} ${SINGLEGPU_PYTHON_FLAG}
python setup.py build_ext -j${PARALLEL_LEVEL:-1} --library-dir=${LIBCUML_BUILD_DIR} ${SINGLEGPU_PYTHON_FLAG}
fi

if hasArg pydocs; then
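
The `LIBCUML_BUILD_DIR` change in this file appears to rely on bash's `${VAR:=default}` expansion, which lets a caller override the build directory from the environment while keeping `${REPODIR}/cpp/build` as the fallback. A minimal sketch of that behaviour, assuming a hypothetical `REPODIR` and override path that are not taken from the commit:

```bash
#!/bin/bash
# Hypothetical repository root, used only for illustration.
REPODIR=/tmp/cuml-example

# Case 1: the variable is unset, so the default path is assigned.
unset LIBCUML_BUILD_DIR
LIBCUML_BUILD_DIR=${LIBCUML_BUILD_DIR:=${REPODIR}/cpp/build}
default_dir=${LIBCUML_BUILD_DIR}

# Case 2: the caller has already set the variable, so it is kept as-is.
LIBCUML_BUILD_DIR=/tmp/custom-build
LIBCUML_BUILD_DIR=${LIBCUML_BUILD_DIR:=${REPODIR}/cpp/build}
override_dir=${LIBCUML_BUILD_DIR}

echo "default:  ${default_dir}"
echo "override: ${override_dir}"
```

Under this pattern, a caller could presumably export `LIBCUML_BUILD_DIR` before invoking `build.sh` to point the C++ and Python builds at the same out-of-tree directory.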
2 changes: 1 addition & 1 deletion ci/checks/style.sh
@@ -13,7 +13,7 @@ source activate gdf
cd $WORKSPACE
export GIT_DESCRIBE_TAG=`git describe --tags`
export MINOR_VERSION=`echo $GIT_DESCRIBE_TAG | grep -o -E '([0-9]+\.[0-9]+)'`
conda install "ucx-py=${MINOR_VERSION}"
conda install "ucx-py=${MINOR_VERSION}" "ucx-proc=*=gpu"

# Run flake8 and get results/return code
FLAKE=`flake8 --exclude=cpp,thirdparty,__init__.py,versioneer.py && flake8 --config=python/.flake8.cython`
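
The `MINOR_VERSION` line in this script derives a `major.minor` string from the output of `git describe --tags`, and that string then pins the `ucx-py` package. A small sketch of the extraction step, assuming a hypothetical tag value in place of a real `git describe` call:

```bash
#!/bin/bash
# Hypothetical tag; the real script reads it from `git describe --tags`.
GIT_DESCRIBE_TAG="v0.16.0"

# grep -o prints only the part of the line that matches the pattern,
# here the first "digits.digits" run in the tag.
MINOR_VERSION=$(echo "$GIT_DESCRIBE_TAG" | grep -o -E '([0-9]+\.[0-9]+)')

echo "ucx-py=${MINOR_VERSION}"
```

Note that `grep -o` emits one line per match, so a tag containing a second `digits.digits` run would yield several lines; the sketch uses a tag where only one match is possible.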
76 changes: 46 additions & 30 deletions ci/cpu/build.sh
@@ -1,27 +1,24 @@
#!/bin/bash
# Copyright (c) 2018, NVIDIA CORPORATION.
######################################
# cuML CPU conda build script for CI #
######################################
##############################################
# cuML CPU conda build script for CI #
##############################################
set -ex

# Logger function for build status output
function logger() {
echo -e "\n>>>> $@\n"
}

# Set path and build parallel level
export PATH=/conda/bin:/usr/local/cuda/bin:$PATH
export PARALLEL_LEVEL=4

# Set versions of packages needed to be grabbed
export CUDF_VERSION=0.8.*
export NVSTRINGS_VERSION=0.8.*
export RMM_VERSION=0.8.*
export PATH=/opt/conda/bin:/usr/local/cuda/bin:$PATH
export PARALLEL_LEVEL=${PARALLEL_LEVEL:-4}

# Set home to the job's workspace
export HOME=$WORKSPACE

# Determine CUDA release version
export CUDA_REL=${CUDA_VERSION%.*}

# Setup 'gpuci_conda_retry' for build retries (results in 2 total attempts)
export GPUCI_CONDA_RETRY_MAX=1
export GPUCI_CONDA_RETRY_SLEEP=30

# Switch to project root; also root of repo checkout
cd $WORKSPACE
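
Two parameter expansions in the hunk above do most of the setup work: `${PARALLEL_LEVEL:-4}` supplies a default without overriding a value set by the CI job, and `${CUDA_VERSION%.*}` strips the last dot-separated component to get the CUDA release. A minimal sketch, assuming a hypothetical `CUDA_VERSION` value that does not come from the commit:

```bash
#!/bin/bash
# Hypothetical full CUDA version, for illustration only.
CUDA_VERSION=10.2.89

# "%.*" removes the shortest suffix matching ".*", i.e. the patch component.
CUDA_REL=${CUDA_VERSION%.*}

# ":-" substitutes 4 only when PARALLEL_LEVEL is unset or empty.
unset PARALLEL_LEVEL
PARALLEL_LEVEL=${PARALLEL_LEVEL:-4}

echo "CUDA_REL=${CUDA_REL} PARALLEL_LEVEL=${PARALLEL_LEVEL}"
```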

@@ -34,17 +31,22 @@ fi
# SETUP - Check environment
################################################################################

logger "Get env..."
gpuci_logger "Check environment variables"
env

logger "Activate conda env..."
source activate gdf
gpuci_logger "Activate conda env"
. /opt/conda/etc/profile.d/conda.sh
conda activate rapids

logger "Check versions..."
gpuci_logger "Check compiler versions"
python --version
gcc --version
g++ --version
conda list
$CC --version
$CXX --version

gpuci_logger "Check conda environment"
conda info
conda config --show-sources
conda list --show-channel-urls

# FIX Added to deal with Anancoda SSL verification issues during conda builds
conda config --set ssl_verify False
@@ -53,18 +55,32 @@ conda config --set ssl_verify False
# BUILD - Conda package builds (conda deps: libcuml <- cuml)
################################################################################

logger "Build conda pkg for libcuml..."
source ci/cpu/libcuml/build_libcuml.sh
if [[ -z "$PROJECT_FLASH" || "$PROJECT_FLASH" == "0" ]]; then
if [ "$BUILD_LIBCUML" == '1' -o "$BUILD_CUML" == '1' ]; then
gpuci_logger "Build conda pkg for libcuml"
gpuci_conda_retry build conda/recipes/libcuml
fi
else
if [ "$BUILD_LIBCUML" == '1' ]; then
gpuci_logger "PROJECT FLASH: Build conda pkg for libcuml"
gpuci_conda_retry build conda/recipes/libcuml --dirty --no-remove-work-dir
fi
fi

logger "Build conda pkg for cuml..."
source ci/cpu/cuml/build_cuml.sh
if [ "$BUILD_CUML" == '1' ]; then
if [[ -z "$PROJECT_FLASH" || "$PROJECT_FLASH" == "0" ]]; then
gpuci_logger "Build conda pkg for cuml"
gpuci_conda_retry build conda/recipes/cuml --python=${PYTHON}
else
gpuci_logger "PROJECT FLASH: Build conda pkg for cuml"
gpuci_conda_retry build -c ci/artifacts/cuml/cpu/conda-bld/ --dirty --no-remove-work-dir conda/recipes/cuml --python=${PYTHON}
fi
fi

################################################################################
# UPLOAD - Conda packages
################################################################################

logger "Upload conda pkgs for libcuml..."
source ci/cpu/libcuml/upload-anaconda.sh
gpuci_logger "Upload conda pkgs"
source ci/cpu/upload.sh

logger "Upload conda pkg for cuml..."
source ci/cpu/cuml/upload-anaconda.sh
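
The new BUILD section of this script chooses between a standard and a "Project Flash" libcuml build based on `PROJECT_FLASH`, `BUILD_LIBCUML`, and `BUILD_CUML`. The branching can be sketched as a pure function that only reports the decision; `libcuml_build_mode` and its return strings are hypothetical names introduced here, and no conda command is run:

```bash
#!/bin/bash
# Hypothetical helper mirroring the libcuml branch of the diff above.
# Arguments: PROJECT_FLASH, BUILD_LIBCUML, BUILD_CUML.
libcuml_build_mode() {
  local project_flash="$1" build_libcuml="$2" build_cuml="$3"
  if [[ -z "$project_flash" || "$project_flash" == "0" ]]; then
    # Standard path: libcuml is built if either package was requested.
    if [[ "$build_libcuml" == "1" || "$build_cuml" == "1" ]]; then
      echo "standard"
    else
      echo "skip"
    fi
  else
    # Project Flash path: libcuml is built only when requested directly.
    if [[ "$build_libcuml" == "1" ]]; then
      echo "flash"
    else
      echo "skip"
    fi
  fi
}

libcuml_build_mode "" 0 1   # flash disabled, only cuml requested
libcuml_build_mode 1 1 0    # flash enabled, libcuml requested
```

One reading of the asymmetry is that in the Project Flash path the cuml build consumes a previously built libcuml from `ci/artifacts/cuml/cpu/conda-bld/`, so libcuml does not need to be rebuilt when only cuml is requested.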
10 changes: 0 additions & 10 deletions ci/cpu/cuml/build_cuml.sh

This file was deleted.

32 changes: 0 additions & 32 deletions ci/cpu/cuml/upload-anaconda.sh

This file was deleted.

10 changes: 0 additions & 10 deletions ci/cpu/libcuml/build_libcuml.sh

This file was deleted.
