Releases: dmlc/xgboost
Release Candidate of version 1.1.0
R package: xgboost_1.1.0.1.tar.gz
1.0.2 Patch Release
1.0.1 Patch Release
This release is identical to the 1.0.0 release, except that it fixes a small bug that rendered 1.0.0 incompatible with Python 3.5. See #5328.
Release 1.0.0 stable
v1.0.0 (2020.02.19)
This release marks a major milestone for the XGBoost project.
Apache-style governance, contribution policy, and semantic versioning (#4646, #4659)
- Starting with the 1.0.0 release, the XGBoost Project is adopting Apache-style governance. The full community guideline is available on the doc website. Note that we now have a Project Management Committee (PMC) that will steward the project on a long-term basis. The PMC is also entrusted to run and fund the project's continuous integration (CI) infrastructure (https://xgboost-ci.net).
- We also adopt semantic versioning. See our release versioning policy.
Better performance scaling for multi-core CPUs (#4502, #4529, #4716, #4851, #5008, #5107, #5138, #5156)
- Poor performance scaling of the `hist` algorithm for multi-core CPUs has been under investigation (#3810). Previous effort #4529 was replaced with a series of pull requests (#5107, #5138, #5156) aimed at achieving the same performance benefits while keeping the C++ codebase legible. The latest performance benchmark results show up to 5x speedup on Intel CPUs with many cores. Note: #5244, which concludes the effort, will become part of the upcoming release 1.1.0.
Improved installation experience on Mac OSX (#4672, #5074, #5080, #5146, #5240)
- It used to be quite complicated to install XGBoost on Mac OSX. XGBoost uses OpenMP to distribute work among multiple CPU cores, and Mac's default C++ compiler (Apple Clang) does not come with OpenMP. The existing workaround (using another C++ compiler) was complex and prone to fail with cryptic diagnostics (#4933, #4949, #4969).
- Now it only takes two commands to install XGBoost: `brew install libomp` followed by `pip install xgboost`. The installed XGBoost will use all CPU cores.
- Even better, XGBoost is now available from Homebrew: `brew install xgboost`. See Homebrew/homebrew-core#50467.
- Previously, if you installed the XGBoost R package using the command `install.packages('xgboost')`, it could only use a single CPU core and you would experience slow training performance. With the 1.0.0 release, the R package will use all CPU cores out of the box.
Distributed XGBoost now available on Kubernetes (#4621, #4939)
Ruby binding for XGBoost (#4856)
New Native Dask interface for multi-GPU and multi-node scaling (#4473, #4507, #4617, #4819, #4907, #4914, #4941, #4942, #4951, #4973, #5048, #5077, #5144, #5270)
- XGBoost now integrates seamlessly with Dask, a lightweight distributed framework for data processing. Together with the first-class support for cuDF data frames (see below), it is now easier than ever to create an end-to-end data pipeline running on one or more NVIDIA GPUs.
- Multi-GPU training with Dask is now up to 20% faster than the previous release (#4914, #4951).
First-class support for cuDF data frames and cuPy arrays (#4737, #4745, #4794, #4850, #4891, #4902, #4918, #4927, #4928, #5053, #5189, #5194, #5206, #5219, #5225)
- cuDF is a data frame library for loading and processing tabular data on NVIDIA GPUs. It provides a Pandas-like API.
- cuPy implements a NumPy-compatible multi-dimensional array on NVIDIA GPUs.
- Now users can keep the data on the GPU memory throughout the end-to-end data pipeline, obviating the need for copying data between the main memory and GPU memory.
- XGBoost can accept any data structure that exposes the `__array_interface__` signature, opening the way to support other columnar formats that are compatible with Apache Arrow.
Feature interaction constraint is now available with `approx` and `gpu_hist` algorithms (#4534, #4587, #4596, #5034).
Learning to rank is now GPU accelerated (#4873, #5004, #5129)
- Supported ranking objectives: NDCG, MAP, Pairwise.
- Up to 2x improved training performance on GPUs.
Enable `gamma` parameter for GPU training (#4874, #4953)
- The `gamma` parameter specifies the minimum loss reduction required to add a new split in a tree. A larger value for `gamma` has the effect of pre-pruning the tree, by making it harder to add splits.
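The pre-pruning effect of `gamma` can be illustrated with the textbook form of the regularized split gain. This is a minimal sketch, not the library's implementation: the function names are hypothetical, and `g_*` / `h_*` stand for the sums of first- and second-order gradients in each child.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction obtained by splitting one node into two children.

    Hypothetical helper for illustration only. g_* and h_* are the sums of
    first- and second-order gradients of the instances in each child.
    """
    def score(g, h):
        # Structure score of a leaf with L2 regularization reg_lambda.
        return g * g / (h + reg_lambda)

    raw_gain = 0.5 * (
        score(g_left, h_left)
        + score(g_right, h_right)
        - score(g_left + g_right, h_left + h_right)
    )
    # gamma is subtracted as a fixed cost for adding one more split.
    return raw_gain - gamma


def should_split(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """A split is kept only if its gain, net of gamma, is positive."""
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma) > 0.0
```

With the same gradient statistics, a split that is accepted at `gamma=0` can be rejected once `gamma` exceeds its raw gain, which is the pre-pruning behaviour described above.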
External memory for GPU training (#4486, #4526, #4747, #4833, #4879, #5014)
- It is now possible to use NVIDIA GPUs even when the size of training data exceeds the available GPU memory. Note that the external memory support for GPU is still experimental. #5093 will further improve performance and will become part of the upcoming release 1.1.0.
- RFC for enabling external memory with GPU algorithms: #4357
Improve Scikit-Learn interface (#4558, #4842, #4929, #5049, #5151, #5130, #5227)
- Many users of XGBoost enjoy the convenience and breadth of Scikit-Learn ecosystem. In this release, we revise the Scikit-Learn API of XGBoost (`XGBRegressor`, `XGBClassifier`, and `XGBRanker`) to achieve feature parity with the traditional XGBoost interface (`xgboost.train()`).
- Insert check to validate data shapes.
- Produce an error message if `eval_set` is not a tuple. An error message is better than silently crashing.
- Allow using `numpy.RandomState` object.
- Add `n_jobs` as an alias of `nthread`.
- Roadmap: #5152
XGBoost4J-Spark: Redesigning checkpointing mechanism
- RFC is available at #4786
- Clean up checkpoint file after a successful training job (#4754): The current implementation in XGBoost4J-Spark does not clean up the checkpoint file after a successful training job. If the user runs another job with the same checkpointing directory, she will get a wrong model because the second job will re-use the checkpoint file left over from the first job. To prevent this scenario, we propose to always clean up the checkpoint file after every successful training job.
- Avoid Multiple Jobs for Checkpointing (#5082): The current method for checkpointing is to collect the booster produced at the last iteration of each checkpoint interval to the Driver and persist it in HDFS. The major issue with this approach is that it needs to re-perform the data preparation for training if the user did not choose to cache the training dataset. To avoid re-performing data prep, we build external-memory checkpointing in the XGBoost4J layer as well.
- Enable deterministic repartitioning when checkpoint is enabled (#4807): The distributed algorithm for gradient boosting assumes a fixed partition of the training data between multiple iterations. In previous versions, there was no guarantee that the data partition would stay the same, especially when a worker went down and some data had to be recovered from a previous checkpoint. In this release, we make the data partition deterministic by using the hash value of each data row in computing the partition.
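The idea behind hash-based deterministic repartitioning can be sketched in a few lines. This is an illustration under assumptions, not the XGBoost4J-Spark code: the helper names are hypothetical, rows are modelled as plain tuples, and SHA-256 is chosen only because it gives the same value in every process.

```python
import hashlib


def partition_of(row, num_partitions):
    """Map a row to a partition using a stable hash of its contents.

    Hypothetical helper: the row is serialized with repr() and hashed with
    SHA-256, so the result does not depend on arrival order or on the
    process that computes it.
    """
    digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def repartition(rows, num_partitions):
    """Distribute rows into num_partitions buckets by their content hash."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[partition_of(row, num_partitions)].append(row)
    return parts
```

Because the partition depends only on the row's content, a row recovered after a worker failure lands in the same partition as before.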
XGBoost4J-Spark: handle errors thrown by the native code (#4560)
- All core logic of XGBoost is written in C++, so XGBoost4J-Spark internally uses the C++ code via Java Native Interface (JNI). #4560 adds a proper error handling for any errors or exceptions arising from the C++ code, so that the XGBoost Spark application can be torn down in an orderly fashion.
XGBoost4J-Spark: Refine method to count the number of alive cores (#4858)
- The `SparkParallelismTracker` class ensures that sufficient number of executor cores are alive. To that end, it is important to query the number of alive cores reliably.
XGBoost4J: Add `BigDenseMatrix` to store more than `Integer.MAX_VALUE` elements (#4383)
Robust model serialization with JSON (#4632, #4708, #4739, #4868, #4936, #4945, #4974, #5086, #5087, #5089, #5091, #5094, #5110, #5111, #5112, #5120, #5137, #5218, #5222, #5236, #5245, #5248, #5281)
- In this release, we introduce experimental support for using JSON to serialize (save/load) XGBoost models and related hyperparameters for training. We would like to eventually replace the old binary format with JSON, since it is an open format and parsers are available in many programming languages and platforms. See the documentation for model I/O using JSON. #3980 explains why JSON was chosen over other alternatives.
- To maximize interoperability and compatibility of the serialized models, we now split serialization into two parts (#4855):
  - Model, e.g. decision trees and strictly related metadata like `num_features`.
  - Internal configuration, consisting of training parameters and other configurable parameters. For example, `max_delta_step`, `tree_method`, `objective`, `predictor`, `gpu_id`.

  Previously, users often ran into issues where the model file produced by one machine could not load or run on another machine. For example, models trained using a machine with an NVIDIA GPU could not run on another machine without a GPU (#5291, #5234). The reason is that the old binary format saved some internal configuration that was not universally applicable to all machines, e.g. `predictor='gpu_predictor'`.

  Now, the model saving function (`Booster.save_model()` in Python) will save only the model, without internal configuration. This guarantees that your model file can be used anywhere. Internal configuration will be serialized in limited circumstances such as:
  - Multiple nodes in a distributed system exchange model details over the network.
  - Model checkpointing, to recove...
Release Candidate 2 of version 1.0.0
Python package
- Linux 64-bit wheel: xgboost-1.0.0rc2-py3-none-manylinux1_x86_64.whl
- Windows 64-bit wheel: xgboost-1.0.0rc2-py3-none-win_amd64.whl
- Source distribution: xgboost-1.0.0rc2.tar.gz
R package: xgboost_1.0.0.1.tar.gz
JVM packages (Linux 64-bit only)
- XGBoost4J: xgboost4j_2.12-1.0.0-RC2.jar
- XGBoost4J-Spark: xgboost4j-spark_2.12-1.0.0-RC2.jar
- XGBoost4J-Flink: xgboost4j-flink_2.12-1.0.0-RC2.jar
Release Candidate of version 1.0.0
Python package
- Linux 64-bit wheel: xgboost-1.0.0rc1-py2.py3-none-manylinux1_x86_64.whl
- Windows 64-bit wheel: xgboost-1.0.0rc1-py2.py3-none-win_amd64.whl
- Source distribution: xgboost-1.0.0rc1.tar.gz
R package: xgboost_1.0.0.1.tar.gz
JVM packages (Linux 64-bit only)
- XGBoost4J: xgboost4j_2.12-1.0.0-RC1.jar
- XGBoost4J-Spark: xgboost4j-spark_2.12-1.0.0-RC1.jar
- XGBoost4J-Flink: xgboost4j-flink_2.12-1.0.0-RC1.jar
This is a stable release of 0.90 version
XGBoost Python package drops Python 2.x (#4379, #4381)
Python 2.x is reaching its end-of-life at the end of this year. Many scientific Python packages are now moving to drop Python 2.x.
XGBoost4J-Spark now requires Spark 2.4.x (#4377)
- Spark 2.3 is reaching its end-of-life soon. See discussion at #4389.
- Consistent handling of missing values (#4309, #4349, #4411): Many users had reported issue with inconsistent predictions between XGBoost4J-Spark and the Python XGBoost package. The issue was caused by Spark mis-handling non-zero missing values (NaN, -1, 999 etc). We now alert the user whenever Spark doesn't handle missing values correctly (#4309, #4349). See the tutorial for dealing with missing values in XGBoost4J-Spark. This fix also depends on the availability of Spark 2.4.x.
Roadmap: better performance scaling for multi-core CPUs (#4310)
- Poor performance scaling of the `hist` algorithm for multi-core CPUs has been under investigation (#3810). #4310 optimizes quantile sketches and other pre-processing tasks. Special thanks to @SmirnovEgorRu.
Roadmap: Harden distributed training (#4250)
- Make distributed training in XGBoost more robust by hardening Rabit, which implements the AllReduce primitive. In particular, improve test coverage on mechanisms for fault tolerance and recovery. Special thanks to @chenqin.
New feature: Multi-class metric functions for GPUs (#4368)
- Metrics for multi-class classification have been ported to GPU: `merror`, `mlogloss`. Special thanks to @trivialfis.
- With supported metrics, XGBoost will select the correct devices based on your system and `n_gpus` parameter.
New feature: Scikit-learn-like random forest API (#4148, #4255, #4258)
- XGBoost Python package now offers `XGBRFClassifier` and `XGBRFRegressor` API to train random forests. See the tutorial. Special thanks to @canonizer
New feature: use external memory in GPU predictor (#4284, #4396, #4438, #4457)
- It is now possible to make predictions on GPU when the input is read from external memory. This is useful when you want to make predictions with a big dataset that does not fit into the GPU memory. Special thanks to @rongou, @canonizer, @sriramch.

  ```python
  import xgboost

  # `bst` is assumed to be an already-trained xgboost.Booster.
  dtest = xgboost.DMatrix('test_data.libsvm#dtest.cache')
  bst.set_param('predictor', 'gpu_predictor')
  bst.predict(dtest)
  ```
- Coming soon: GPU training (`gpu_hist`) with external memory
New feature: XGBoost can now handle comments in LIBSVM files (#4430)
- Special thanks to @trivialfis and @hcho3
New feature: Embed XGBoost in your C/C++ applications using CMake (#4323, #4333, #4453)
- It is now easier than ever to embed XGBoost in your C/C++ applications. In your CMakeLists.txt, add `xgboost::xgboost` as a linked library:

  ```cmake
  find_package(xgboost REQUIRED)
  add_executable(api-demo c-api-demo.c)
  target_link_libraries(api-demo xgboost::xgboost)
  ```

  XGBoost C API documentation is available. Special thanks to @trivialfis
Performance improvements
- Use feature interaction constraints to narrow split search space (#4341, #4428)
- Additional optimizations for `gpu_hist` (#4248, #4283)
- Reduce OpenMP thread launches in `gpu_hist` (#4343)
(#4343) - Additional optimizations for multi-node multi-GPU random forests. (#4238)
- Allocate unique prediction buffer for each input matrix, to avoid re-sizing GPU array (#4275)
- Remove various synchronisations from CUDA API calls (#4205)
- XGBoost4J-Spark
- Allow the user to control whether to cache partitioned training data, to potentially reduce execution time (#4268)
Bug-fixes
- Fix node reuse in `hist` (#4404)
- Fix GPU histogram allocation (#4347)
- Fix matrix attributes not sliced (#4311)
- Revise AUC and AUCPR metrics so that they work with weighted ranking tasks (#4216, #4436)
- Fix timer invocation for InitDataOnce() in `gpu_hist` (#4206)
- Fix R-devel errors (#4251)
- Make gradient update in GPU linear updater thread-safe (#4259)
- Prevent out-of-range access in column matrix (#4231)
- Don't store DMatrix handle in Python object until it's initialized, to improve exception safety (#4317)
- XGBoost4J-Spark
  - Fix non-deterministic order within a zipped partition on prediction (#4388)
  - Remove race condition on tracker shutdown (#4224)
  - Allow setting the parameter `maxLeaves`. (#4226)
  - Allow partial evaluation of dataframe before prediction (#4407)
  - Automatically set `maximize_evaluation_metrics` if not explicitly given (#4446)
API changes
- Deprecate `reg:linear` in favor of `reg:squarederror`. (#4267, #4427)
- Add attribute getter and setter to the Booster object in XGBoost4J (#4336)
Maintenance: Refactor C++ code for legibility and maintainability
- Fix clang-tidy warnings. (#4149)
- Remove deprecated C APIs. (#4266)
- Use Monitor class to time functions in `hist`. (#4273)
- Retire DVec class in favour of c++20 style span for device memory. (#4293)
- Improve HostDeviceVector exception safety (#4301)
Maintenance: testing, continuous integration, build system
- Major refactor of CMakeLists.txt (#4323, #4333, #4453): adopt modern CMake and export XGBoost as a target
- Major improvement in Jenkins CI pipeline (#4234)
- Support CUDA 10.1 (#4223, #4232, #4265, #4468)
- Python wheels are now built with CUDA 9.0, so that JIT is not required on Volta architecture (#4459)
- Integrate with NVTX CUDA profiler (#4205)
- Add a test for cpu predictor using external memory (#4308)
- Refactor tests to get rid of duplication (#4358)
- Remove test dependency on `craigcitro/r-travis`, since it's deprecated (#4353)
- Add files from local R build to `.gitignore` (#4346)
- Make XGBoost4J compatible with Java 9+ by revising NativeLibLoader (#4351)
- Jenkins build for CUDA 10.0 (#4281)
- Remove remaining `silent` and `debug_verbose` in Python tests (#4299)
- Use all cores to build XGBoost4J lib on linux (#4304)
- Upgrade Jenkins Linux build environment to GCC 5.3.1, CMake 3.6.0 (#4306)
- Make CMakeLists.txt compatible with CMake 3.3 (#4420)
- Add OpenMP option in CMakeLists.txt (#4339)
- Get rid of a few trivial compiler warnings (#4312)
- Add external Docker build cache, to speed up builds on Jenkins CI (#4331, #4334, #4458)
- Fix Windows tests (#4403)
- Fix a broken python test (#4395)
- Use a fixed seed to split data in XGBoost4J-Spark tests, for reproducibility (#4417)
- Add additional Python tests to test training under constraints (#4426)
- Enable building with shared NCCL. (#4447)
Usability Improvements, Documentation
- Document limitation of one-split-at-a-time Greedy tree learning heuristic (#4233)
- Update build doc: PyPI wheel now support multi-GPU (#4219)
- Fix docs for `num_parallel_tree` (#4221)
- Fix document about `colsample_by*` parameter (#4340)
- Make the train and test input with same colnames. (#4329)
- Update R contribute link. (#4236)
- Fix travis R tests (#4277)
- Log version number in crash log in XGBoost4J-Spark (#4271, #4303)
- Allow suppression of Rabit output in Booster::train in XGBoost4J (#4262)
- Add tutorial on handling missing values in XGBoost4J-Spark (#4425)
- Fix typos (#4345, #4393, #4432, #4435)
- Added language classifier in setup.py (#4327)
- Added Travis CI badge (#4344)
- Add BentoML to use case section (#4400)
- Remove subtly sexist remark (#4418)
- Add R vignette about parsing JSON dumps (#4439)
Acknowledgement
Contributors: Nan Zhu (@CodingCat), Adam Pocock (@Craigacp), Daniel Hen (@Daniel8hen), Jiaxiang Li (@JiaxiangBU), Rory Mitchell (@RAMitchell), Egor Smirnov (@SmirnovEgorRu), Andy Adinets (@canonizer), Jonas (@elcombato), Harry Braviner (@harrybraviner), Philip Hyunsu Cho (@hcho3), Tong He (@hetong007), James Lamb (@jameslamb), Jean-Francois Zinque (@jeffzi), Yang Yang (@jokerkeny), Mayank Suman (@mayanksuman), jess (@monkeywithacupcake), Hajime Morrita (@omo), Ravi Kalia (@project-delphi), @ras44, Rong Ou (@rongou), Shaochen Shi (@shishaochen), Xu Xiao (@sperlingxx), @sriramch, Jiaming Yuan (@trivialfis), Christopher Suchanek (@wsuchy), Bozhao (@yubozhao)
Reviewers: Nan Zhu (@CodingCat), Adam Pocock (@Craigacp), Daniel Hen (@Daniel8hen), Jiaxiang Li (@JiaxiangBU), Laurae (@Laurae2), Rory Mitchell (@RAMitchell), Egor Smirnov (@SmirnovEgorRu), @alois-bissuel, Andy Adinets (@canonizer), Chen Qin (@chenqin), Harry Braviner (@harrybraviner), Philip Hyunsu Cho (@hcho3), Tong He (@hetong007), @jakirkham, James Lamb (@jameslamb), Julien Schueller (@jschueller), Mayank Suman (@mayanksuman), Hajime Morrita (@omo), Rong Ou (@rongou), Sara Robinson (@sararob), Shaochen Shi (@shishaochen), Xu Xiao (@sperlingxx), @sriramch, Sean Owen (@srowen), Sergei Lebedev (@superbobry), Yuan (Terry) Tang (@terrytangyuan), Theodore Vasiloudis (@thvasilo), Matthew Tovbin (@tovbinm), Jiaming Yuan (@trivialfis), Xin Yin (@xydrolase)
This is a stable release of 0.82 version
This release is packed with many new features and bug fixes.
Roadmap: better performance scaling for multi-core CPUs (#3957)
- Poor performance scaling of the `hist` algorithm for multi-core CPUs has been under investigation (#3810). #3957 marks an important step toward better performance scaling, by using software pre-fetching and replacing STL vectors with C-style arrays. Special thanks to @Laurae2 and @SmirnovEgorRu.
- See #3810 for latest progress on this roadmap.
New feature: Distributed Fast Histogram Algorithm (`hist`) (#4011, #4102, #4140, #4128)
- It is now possible to run the `hist` algorithm in a distributed setting. Special thanks to @CodingCat. The benefits include:
  - Faster local computation via feature binning
  - Support for monotonic constraints and feature interaction constraints
  - Simpler codebase than `approx`, allowing for future improvement
- Depth-wise tree growing is now performed in a separate code path, so that cross-node synchronization is performed only once per level.
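Feature binning, the first benefit listed above, replaces raw feature values with small integer bin indices so that gradient statistics can be accumulated per bin. The following is a simplified sketch with hypothetical helper names and hand-picked cut points; the real algorithm derives cut points from quantile sketches and also tracks second-order gradients.

```python
from bisect import bisect_right


def bin_index(value, cut_points):
    """Return the index of the bin that `value` falls into.

    cut_points must be sorted in ascending order; k cut points define
    k + 1 bins. Hypothetical helper for illustration only.
    """
    return bisect_right(cut_points, value)


def build_histogram(values, gradients, cut_points):
    """Accumulate the gradient sum of one feature into per-bin buckets."""
    hist = [0.0] * (len(cut_points) + 1)
    for value, grad in zip(values, gradients):
        hist[bin_index(value, cut_points)] += grad
    return hist
```

Split candidates are then evaluated only at bin boundaries, so the cost of a split search depends on the number of bins rather than the number of distinct feature values.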
New feature: Multi-Node, Multi-GPU training (#4095)
- Distributed training is now able to utilize clusters equipped with NVIDIA GPUs. In particular, the rabit AllReduce layer will communicate GPU device information. Special thanks to @mt-jones, @RAMitchell, @rongou, @trivialfis, @canonizer, and @jeffdk.
- Resource management systems will be able to assign a rank for each GPU in the cluster.
- In Dask, users will be able to construct a collection of XGBoost processes over an inhomogeneous device cluster (i.e. workers with different number and/or kinds of GPUs).
New feature: Multiple validation datasets in XGBoost4J-Spark (#3904, #3910)
- You can now track the performance of the model during training with multiple evaluation datasets. By specifying `eval_sets` or calling `setEvalSets` on an `XGBoostClassifier` or `XGBoostRegressor`, you can pass in multiple evaluation datasets typed as a `Map` from `String` to `DataFrame`. Special thanks to @CodingCat.
- See the usage of multiple validation datasets here
New feature: Additional metric functions for GPUs (#3952)
- Element-wise metrics have been ported to GPU: `rmse`, `mae`, `logloss`, `poisson-nloglik`, `gamma-deviance`, `gamma-nloglik`, `error`, `tweedie-nloglik`. Special thanks to @trivialfis and @RAMitchell.
- With supported metrics, XGBoost will select the correct devices based on your system and `n_gpus` parameter.
New feature: Column sampling at individual nodes (splits) (#3971)
- Columns (features) can now be sampled at individual tree nodes, in addition to per-tree and per-level sampling. To enable per-node sampling, set `colsample_bynode` parameter, which represents the fraction of columns sampled at each node. This parameter is set to 1.0 by default (i.e. no sampling per node). Special thanks to @canonizer.
- The `colsample_bynode` parameter works cumulatively with other `colsample_by*` parameters: for example, `{'colsample_bynode':0.5, 'colsample_bytree':0.5}` with 100 columns will give 25 features to choose from at each split.
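The cumulative effect of the `colsample_by*` parameters can be checked with a short calculation. The helper below is hypothetical, and the rounding rule (truncate, keep at least one feature) is an assumption made for this sketch.

```python
def features_per_split(num_columns, colsample_bytree=1.0,
                       colsample_bylevel=1.0, colsample_bynode=1.0):
    """Approximate number of candidate features available at each split.

    Hypothetical helper: each fraction is applied to the columns that
    survived the previous sampling stage (tree, then level, then node).
    Truncating and keeping at least one feature is an assumption.
    """
    remaining = num_columns
    for fraction in (colsample_bytree, colsample_bylevel, colsample_bynode):
        remaining = max(1, int(remaining * fraction))
    return remaining


# The example from the release note: 100 columns, 0.5 per tree, 0.5 per node.
candidates = features_per_split(100, colsample_bytree=0.5, colsample_bynode=0.5)
```

Here `candidates` evaluates to 25, matching the figure quoted above.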
Major API change: consistent logging level via `verbosity` (#3982, #4002, #4138)
- XGBoost now allows fine-grained control over logging. You can set `verbosity` to 0 (silent), 1 (warning), 2 (info), and 3 (debug). This is useful for controlling the amount of logging outputs. Special thanks to @trivialfis.
- Parameters `silent` and `debug_verbose` are now deprecated.
- Note: Sometimes XGBoost tries to change configurations based on heuristics, which is displayed as warning message. If there's unexpected behaviour, please try to increase value of verbosity.
Major bug fix: external memory (#4040, #4193)
- Clarify object ownership in multi-threaded prefetcher, to avoid memory error.
- Correctly merge two column batches (which uses CSC layout).
- Add unit tests for external memory.
- Special thanks to @trivialfis and @hcho3.
Major bug fix: early stopping fixed in XGBoost4J and XGBoost4J-Spark (#3928, #4176)
- Early stopping in XGBoost4J and XGBoost4J-Spark is now consistent with its counterpart in the Python package. Training stops if the current iteration is `earlyStoppingSteps` away from the best iteration. If there are multiple evaluation sets, only the last one is used to determine early stopping.
- See the updated documentation here
- Special thanks to @CodingCat, @yanboliang, and @mingyang.
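The stopping rule described above can be sketched as follows. This is an illustration only, with a hypothetical function name and a `maximize` flag added for metrics where higher is better; the exact off-by-one convention in XGBoost4J may differ.

```python
def early_stopping_round(scores, early_stopping_steps, maximize=False):
    """Return the iteration at which training would stop, or None.

    `scores` holds the evaluation metric of the last evaluation set, one
    value per boosting iteration. Hypothetical helper for illustration.
    """
    best_iter = None
    best_score = None
    for i, score in enumerate(scores):
        if best_score is None:
            improved = True
        elif maximize:
            improved = score > best_score
        else:
            improved = score < best_score

        if improved:
            best_iter, best_score = i, score
        elif i - best_iter >= early_stopping_steps:
            # The current iteration is early_stopping_steps away from the best one.
            return i
    return None
```

For example, with an error metric of `[0.5, 0.4, 0.41, 0.42, 0.43]` and `early_stopping_steps=2`, the best iteration is 1 and the sketch stops at iteration 3.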
Major bug fix: infrequent features should not crash distributed training (#4045)
- For infrequently occurring features, some partitions may not get any instance. This scenario used to crash distributed training due to malformed ranges. The problem has now been fixed.
- In practice, one-hot-encoded categorical variables tend to produce rare features, particularly when the cardinality is high.
- Special thanks to @CodingCat.
Performance improvements
- Faster, more space-efficient radix sorting in `gpu_hist` (#3895)
- Subtraction trick in histogram calculation in `gpu_hist` (#3945)
- More performant re-partition in XGBoost4J-Spark (#4049)
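The subtraction trick listed above rests on a simple identity: the histogram of a parent node equals the sum of its two children's histograms, so only one child needs to be built from data. A minimal CPU sketch with hypothetical helper names (the actual `gpu_hist` code runs on the GPU and also tracks second-order gradients):

```python
def gradient_histogram(bin_indices, gradients, num_bins):
    """Sum gradients per bin for the rows that belong to one tree node."""
    hist = [0.0] * num_bins
    for b, g in zip(bin_indices, gradients):
        hist[b] += g
    return hist


def sibling_histogram(parent_hist, child_hist):
    """Derive the other child's histogram without touching its rows."""
    return [p - c for p, c in zip(parent_hist, child_hist)]


# Parent node holds four rows; the left child takes the first two.
parent = gradient_histogram([0, 1, 1, 2], [1.0, 2.0, 3.0, 4.0], num_bins=3)
left = gradient_histogram([0, 1], [1.0, 2.0], num_bins=3)
right = sibling_histogram(parent, left)
```

Building the smaller child directly and subtracting to get the larger one roughly halves the histogram work at each level.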
Bug-fixes
- Fix semantics of `gpu_id` when running multiple XGBoost processes on a multi-GPU machine (#3851)
- Fix page storage path for external memory on Windows (#3869)
- Fix configuration setup so that DART utilizes GPU (#4024)
- Eliminate NAN values from SHAP prediction (#3943)
- Prevent empty quantile sketches in `hist` (#4155)
- Enable running objectives with 0 GPU (#3878)
- Parameters are no longer dependent on system locale (#3891, #3907)
- Use consistent data type in the GPU coordinate descent code (#3917)
- Remove undefined behavior in the CLI config parser on the ARM platform (#3976)
- Initialize counters in GPU AllReduce (#3987)
- Prevent deadlocks in GPU AllReduce (#4113)
- Load correct values from sliced NumPy arrays (#4147, #4165)
- Fix incorrect GPU device selection (#4161)
- Make feature binning logic in `hist` aware of query groups when running a ranking task (#4115). For ranking task, query groups are weighted, not individual instances.
- Generate correct C++ exception type for `LOG(FATAL)` macro (#4159)
- Python package
  - Python package should run on system without `PATH` environment variable (#3845)
  - Fix `coef_` and `intercept_` signature to be compatible with `sklearn.RFECV` (#3873)
  - Use UTF-8 encoding in Python package README, to support non-English locale (#3867)
  - Add AUC-PR to list of metrics to maximize for early stopping (#3936)
  - Allow loading pickles without `self.booster` attribute, for backward compatibility (#3938, #3944)
  - White-list DART for feature importances (#4073)
  - Update usage of h2oai/datatable (#4123)
- XGBoost4J-Spark
  - Address scalability issue in prediction (#4033)
  - Enforce the use of per-group weights for ranking task (#4118)
  - Fix vector size of `rawPredictionCol` in `XGBoostClassificationModel` (#3932)
  - More robust error handling in Spark tracker (#4046, #4108)
  - Fix return type of `setEvalSets` (#4105)
  - Return correct value of `getMaxLeaves` (#4114)
API changes
- Add experimental parameter `single_precision_histogram` to use single-precision histograms for the `gpu_hist` algorithm (#3965)
- Python package
  - Add option to select type of feature importances in the scikit-learn interface (#3876)
  - Add `trees_to_df()` method to dump decision trees as Pandas data frame (#4153)
  - Add options to control node shapes in the GraphViz plotting function (#3859)
  - Add `xgb_model` option to `XGBClassifier`, to load previously saved model (#4092)
  - Passing lists into `DMatrix` is now deprecated (#3970)
- XGBoost4J
  - Support multiple feature importance features (#3801)
Maintenance: Refactor C++ code for legibility and maintainability
- Refactor `hist` algorithm code and add unit tests (#3836)
- Minor refactoring of split evaluator in `gpu_hist` (#3889)
- Removed unused leaf vector field in the tree model (#3989)
- Simplify the tree representation by combining `TreeModel` and `RegTree` classes (#3995)
- Simplify and harden tree expansion code (#4008, #4015)
- De-duplicate parameter classes in the linear model algorithms (#4013)
- Robust handling of ranges with C++20 span in `gpu_exact` and `gpu_coord_descent` (#4020, #4029)
- Simplify tree training code (#3825). Also use Span class for robust handling of ranges.
Maintenance: testing, continuous integration, build system
- Disallow `std::regex` since it's not supported by GCC 4.8.x (#3870)
- Add multi-GPU tests for coordinate descent algorithm for linear models (#3893, #3974)
- Enforce naming style in Python lint (#3896)
- Refactor Python tests (#3897, #3901): Use pytest exclusively, display full trace upon failure
- Address `DeprecationWarning` when using Python collections (#3909)
- Use correct group for maven site plugin (#3937)
- Jenkins CI is now using on-demand EC2 instances exclusively, due to unreliability of Spot instances (#3948)
- Better GPU performance logging (#3945)
- Fix GPU tests on machines with only 1 GPU (#4053)
- Eliminate CRAN check warnings and notes (#3988)
- Add unit tests for tree serialization (#3989)
- Add unit tests for tree fitting functions in `hist` (#4155)
- Add a unit test for `gpu_exact` algorithm (#4020)
- Correct JVM CMake GPU flag (#4071)
- Fix failing Travis CI on Mac (#4086)
- Speed up Jenkins by not compiling CMake (#4099)
- Analyze C++ and CUDA code using clang-tidy, as part of Jenkins CI pipeline (#4034)
- Fix broken R test: Install Homebrew...
This is a stable release of 0.81 version
New feature: feature interaction constraints
- Users are now able to control which features (independent variables) are allowed to interact by specifying feature interaction constraints (#3466).
- Tutorial is available, as well as R and Python examples.
New feature: learning to rank using scikit-learn interface
- Learning to rank task is now available for the scikit-learn interface of the Python package (#3560, #3848). It is now possible to integrate the XGBoost ranking model into the scikit-learn learning pipeline.
- An example of using the `XGBRanker` class can be found at demo/rank/rank_sklearn.py.
New feature: R interface for SHAP interactions
- SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model. Previously, this feature was only available from the Python package; now it is available from the R package as well (#3636).
New feature: GPU predictor now uses multiple GPUs to predict
- GPU predictor is now able to utilize multiple GPUs at once to accelerate prediction (#3738)
New feature: Scale distributed XGBoost to large-scale clusters
- Fix OS file descriptor limit assertion error on large cluster (#3835, dmlc/rabit#73) by replacing `select()` based AllReduce/Broadcast with `poll()` based implementation.
- Mitigate tracker "thundering herd" issue on large cluster. Add exponential backoff retry when workers connect to tracker.
- With this change, we were able to scale to 1.5k executors on a 12 billion row dataset after some tweaks here and there.
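Exponential backoff, as used to mitigate the "thundering herd" issue above, can be sketched as follows. This is a generic illustration and not the Rabit tracker code: the function name, the default delays, and the random jitter are all assumptions made for the example.

```python
import random
import time


def connect_with_backoff(connect, max_retries=5, base_delay=0.1,
                         max_delay=5.0, sleep=time.sleep):
    """Call `connect()` until it succeeds, doubling the wait after each failure.

    Hypothetical helper: `connect` is any callable that raises
    ConnectionError on failure. `sleep` is injectable so the function can
    be exercised without real waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter (an assumption here) spreads out workers that failed together.
            sleep(delay * random.uniform(0.5, 1.0))
```

Because each worker waits longer after every failed attempt, simultaneous reconnects thin out instead of hitting the tracker in lockstep.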
New feature: Additional objective functions for GPUs
- New objective functions ported to GPU: `hinge`, `multi:softmax`, `multi:softprob`, `count:poisson`, `reg:gamma`, `reg:tweedie`.
- With supported objectives, XGBoost will select the correct devices based on your system and `n_gpus` parameter.
Major bug fix: learning to rank with XGBoost4J-Spark
- Previously, `repartitionForData` would shuffle data and lose the ordering necessary for ranking tasks.
- To fix this issue, data points within each RDD partition are explicitly grouped by their group (query session) IDs (#3654). Empty RDD partitions are also handled carefully (#3750).
Major bug fix: early stopping fixed in XGBoost4J-Spark
- The earlier implementation of early stopping had incorrect semantics and did not let users specify the direction of optimization (maximize / minimize).
- A parameter `maximize_evaluation_metrics` is defined to tell whether a metric should be maximized or minimized as part of the early stopping criteria (#3808). Early stopping now also has correct semantics.
API changes
- Column sampling by level (`colsample_bylevel`) is now functional for `hist` algorithm (#3635, #3862)
- GPU tag `gpu:` for regression objectives is now deprecated. XGBoost will select the correct devices automatically (#3643)
- Add `disable_default_eval_metric` parameter to disable default metric (#3606)
- Experimental AVX support for gradient computation is removed (#3752)
- XGBoost4J-Spark
  - Add `rank:ndcg` and `rank:map` to supported objectives (#3697)
- Python package
  - Add `callbacks` argument to `fit()` function of scikit-learn API (#3682)
  - Add `XGBRanker` to scikit-learn interface (#3560, #3848)
  - Add `validate_features` argument to `predict()` function of scikit-learn API (#3653)
  - Allow scikit-learn grid search over parameters specified as keyword arguments (#3791)
  - Add `coef_` and `intercept_` as properties of scikit-learn wrapper (#3855). Some scikit-learn functions expect these properties.
Performance improvements
- Address very high GPU memory usage for large data (#3635)
- Fix performance regression within `EvaluateSplits()` of `gpu_hist` algorithm. (#3680)
Bug-fixes
- Fix a problem in GPU quantile sketch with tiny instance weights. (#3628)
- Fix copy constructor for `HostDeviceVectorImpl` to prevent dangling pointers (#3657)
- Fix a bug in partitioned file loading (#3673)
- Fix an uninitialized pointer in `gpu_hist` (#3703)
- Reshard data among GPUs when the number of GPUs is changed (#3721)
- Add back `max_delta_step` to split evaluation (#3668)
- Do not round up integer thresholds for integer features in JSON dump (#3717)
- Use `dmlc::TemporaryDirectory` to handle temporaries in a cross-platform way (#3783)
- Fix accuracy problem with `gpu_hist` when `min_child_weight` and `lambda` are set to 0 (#3793)
- Make sure that `tree_method` parameter is recognized and not silently ignored (#3849)
- XGBoost4J-Spark
  - Make sure `thresholds` are considered when executing `predict()` method (#3577)
  - Avoid losing precision when computing probabilities by converting to `Double` early (#3576)
  - `getTreeLimit()` should return `Int` (#3602)
  - Fix checkpoint serialization on HDFS (#3614)
  - Throw `ControlThrowable` instead of `InterruptedException` so that it is properly re-thrown (#3632)
  - Remove extraneous output to stdout (#3665)
  - Allow specification of task type for custom objectives and evaluations (#3646)
  - Fix distributed updater check (#3739)
  - Fix issue when Spark job execution thread cannot return before we execute `first()` (#3758)
- Python package
- R package
Maintenance: testing, continuous integration, build system
- Add sanitizers tests to Travis CI (#3557)
- Add NumPy, Matplotlib, Graphviz as requirements for doc build (#3669)
- Comply with CRAN submission policy (#3660, #3728)
- Remove copy-paste error in JVM test suite (#3692)
- Disable flaky tests in `R-package/tests/testthat/test_update.R` (#3723)
- Make Python tests compatible with scikit-learn 0.20 release (#3731)
- Separate out restricted and unrestricted tasks, so that pull requests don't build downloadable artifacts (#3736)
- Add multi-GPU unit test environment (#3741)
- Allow plug-ins to be built by CMake (#3752)
- Test wheel compatibility on CPU containers for pull requests (#3762)
- Fix broken doc build due to Matplotlib 3.0 release (#3764)
- Produce `xgboost.so` for XGBoost-R on Mac OSX, so that `make install` works (#3767)
- Retry Jenkins CI tests up to 3 times to improve reliability (#3769, #3775, #3776, #3777)
- Add basic unit tests for `gpu_hist` algorithm (#3785)
- Fix Python environment for distributed unit tests (#3806)
- Test wheels on CUDA 10.0 container for compatibility (#3838)
- Fix JVM doc build (#3853)
Maintenance: Refactor C++ code for legibility and maintainability
- Merge generic device helper functions into `GPUSet` class (#3626)
- Refactor column sampling logic into `ColumnSampler` class (#3635, #3637)
- Replace `std::vector` with `HostDeviceVector` in `MetaInfo` and `SparsePage` (#3446)
- Simplify `DMatrix` class (#3395)
- De-duplicate CPU/GPU code using `Transform` class (#3643, #3751)
- Remove obsolete `QuantileHistMaker` class (#3761)
- Remove obsolete `NoConstraint` class (#3792)
Other Features
- C++20-compliant Span class for safe pointer indexing (#3548, #3588)
- Add helper functions to manipulate multiple GPU devices (#3693)
- XGBoost4J-Spark
  - Allow specifying host IP from the `xgboost-tracker.properties` file (#3833). This comes in handy when the `hosts` file doesn't correctly define localhost.
Usability Improvements
- Add reference to GitHub repository in `pom.xml` of JVM packages (#3589)
- Add R demo of multi-class classification (#3695)
- Document JSON dump functionality (#3600, #3603)
- Document CUDA requirement and lack of external memory for GPU algorithms (#3624)
- Document LambdaMART objectives, both pairwise and listwise (#3672)
- Document `aucpr` evaluation metric (#3687)
- Document gblinear parameters: `feature_selector` and `top_k` (#3780)
- Add instructions for using MinGW-built XGBoost with Python. (#3774)
- Removed nonexistent parameter `use_buffer` from documentation (#3610)
- Update Python API doc to include all classes and members (#3619, #3682)
- Fix typos and broken links in documentation (#3618, #3640, #3676, #3713, #3759, #3784, #3843, #3852)
- Binary classification demo should produce LIBSVM with 0-based indexing (#3652)
- Process data once for Python and CLI examples of learning to rank (#3666)
- Include full text of Apache 2.0 license in the repository (#3698)
- Save predictor parameters in model file (#3856)
- JVM packages
- Python package
  - Document that custom objective can't contain colon (:) (#3601)
  - Show a better error message for failed library loading (#3690)
  - Document that feature importance is unavailable for non-tree learners (#3765)
  - Document behavior of `get_fscore()` for zero-importance features (#3763)
  - Recommend pickling as the way to s...
This is a stable release of 0.80 version
- JVM packages received a major upgrade: To consolidate the APIs and improve the user experience, we refactored the design of XGBoost4J-Spark in a significant manner. (#3387)
  - Consolidated APIs: It is now much easier to integrate XGBoost models into a Spark ML pipeline. Users can control behaviors such as outputting leaf prediction results by setting the corresponding column names. Training is now more consistent with other Estimators in Spark MLlib: there is now one single method, `fit()`, to train decision trees.
  - Better user experience: we refactored the parameter-related modules in XGBoost4J-Spark to provide both camel-case (Spark ML style) and underscore (XGBoost style) parameters.
  - A brand-new tutorial is available for XGBoost4J-Spark.
  - Latest API documentation is now hosted at https://xgboost.readthedocs.io/.
- XGBoost documentation now keeps track of multiple versions:
  - Latest master: https://xgboost.readthedocs.io/en/latest
  - 0.80 stable: https://xgboost.readthedocs.io/en/release_0.80
  - 0.72 stable: https://xgboost.readthedocs.io/en/release_0.72
- Support for per-group weights in ranking objective (#3379)
- Fix inaccurate decimal parsing (#3546)
- New functionality
  - Query ID column support in LIBSVM data files (#2749). This is convenient for performing ranking tasks in a distributed setting.
  - Hinge loss for binary classification (`binary:hinge`) (#3477)
  - Ability to specify delimiter and instance weight column for CSV files (#3546)
  - Ability to use 1-based indexing instead of 0-based (#3546)
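To make the query ID column and the indexing option concrete, here is a minimal sketch of a parser for a single LIBSVM-style line of the form `label qid:ID index:value ...`. The function `parse_libsvm_line` and its `one_based` argument are hypothetical names for this example; XGBoost's own parser is implemented in C++ and is not shown here.

```python
def parse_libsvm_line(line, one_based=False):
    """Parse `label [qid:ID] index:value ...` into (label, qid, features).

    With one_based=True, feature indices are shifted down by one so that
    the returned dictionary is always keyed by 0-based indices.
    """
    tokens = line.split()
    label = float(tokens[0])
    qid = None
    features = {}
    for token in tokens[1:]:
        key, value = token.split(":", 1)
        if key == "qid":
            qid = int(value)  # query (group) ID used by ranking tasks
        else:
            index = int(key) - (1 if one_based else 0)
            features[index] = float(value)
    return label, qid, features


label, qid, features = parse_libsvm_line("1 qid:3 1:0.5 4:1.25", one_based=True)
plain = parse_libsvm_line("0 2:1.0")
```

Carrying the query ID on every row means the grouping information travels with the data, which is what makes it practical when rows are split across workers.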
- GPU support
  - Quantile sketch, binning, and index compression are now performed on GPU, eliminating PCIe transfer for `gpu_hist` algorithm (#3319, #3393)
  - Upgrade to NCCL2 for multi-GPU training (#3404).
  - Use shared memory atomics for faster training (#3384).
  - Dynamically allocate GPU memory, to prevent large allocations for deep trees (#3519)
  - Fix memory copy bug for large files (#3472)
- Python package
  - Importing data from Python datatable (#3272)
  - Pre-built binary wheels available for 64-bit Linux and Windows (#3424, #3443)
  - Add new importance measures `total_gain` and `total_cover` (#3498)
  - Sklearn API now supports saving and loading models (#3192)
  - Arbitrary cross validation fold indices (#3353)
  - `predict()` function in Sklearn API uses `best_ntree_limit` if available, to make early stopping easier to use (#3445)
  - Informational messages are now directed to Python's `print()` rather than standard output (#3438). This way, messages appear inside Jupyter notebooks.
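The difference between the averaged and the total importance measures can be sketched as follows. This is an illustration on hand-written split records, assuming that `gain` and `cover` are per-split averages while `total_gain` and `total_cover` are sums; the function `feature_importance` and the `(feature, gain, cover)` tuple layout are hypothetical.

```python
from collections import defaultdict


def feature_importance(splits):
    """Aggregate per-split statistics into per-feature importance scores.

    splits: iterable of (feature, gain, cover) tuples, one per split node.
    """
    count = defaultdict(int)
    total_gain = defaultdict(float)
    total_cover = defaultdict(float)
    for feature, gain, cover in splits:
        count[feature] += 1
        total_gain[feature] += gain
        total_cover[feature] += cover
    return {
        f: {
            "weight": count[f],                   # number of splits on f
            "total_gain": total_gain[f],          # summed over splits
            "gain": total_gain[f] / count[f],     # averaged over splits
            "total_cover": total_cover[f],
            "cover": total_cover[f] / count[f],
        }
        for f in count
    }


scores = feature_importance(
    [("f0", 4.0, 10.0), ("f1", 1.0, 6.0), ("f0", 2.0, 4.0)]
)
```

A feature that is used in many modest splits can rank low on the averaged measure yet high on the total, which is why both views are useful.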
- R package
  - Oracle Solaris support, per CRAN policy (#3372)
- JVM packages
- Refactored C++ DMatrix class for simplicity and de-duplication (#3301)
- Refactored C++ histogram facilities (#3564)
- Refactored constraints / regularization mechanism for split finding (#3335, #3429). Users may specify an elastic net (L2 + L1 regularization) on leaf weights as well as monotonic constraints on test nodes. The refactor will be useful for a future addition of feature interaction constraints.
- Statically link `libstdc++` for MinGW32 (#3430)
- Enable loading from `group`, `base_margin` and `weight` for Python, R, and JVM packages (#3431)
- Fix model saving for `count:poisson` so that `max_delta_step` doesn't get truncated (#3515)
- Fix loading of sparse CSC matrix (#3553)
- Fix incorrect handling of `base_score` parameter for Tweedie regression (#3295)
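For the elastic net regularization on leaf weights mentioned in the list above, the textbook closed form can be sketched as below. It minimizes G*w + 0.5*(H + lambda)*w^2 + alpha*|w| for a leaf with gradient sum G and Hessian sum H. The function names are chosen for this example, and the sketch is not taken from the refactored C++ code.

```python
def soft_threshold(value, alpha):
    """Shrink `value` toward zero by `alpha` (the L1 part of elastic net)."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def leaf_weight(sum_grad, sum_hess, reg_lambda=1.0, reg_alpha=0.0):
    """Optimal leaf weight under L2 (reg_lambda) and L1 (reg_alpha) penalties."""
    return -soft_threshold(sum_grad, reg_alpha) / (sum_hess + reg_lambda)


w_l2_only = leaf_weight(-4.0, 3.0, reg_lambda=1.0, reg_alpha=0.0)
w_elastic = leaf_weight(-4.0, 3.0, reg_lambda=1.0, reg_alpha=2.0)
w_zeroed = leaf_weight(1.0, 3.0, reg_lambda=1.0, reg_alpha=2.0)
```

The L2 term shrinks every leaf weight proportionally, while the L1 term sets weights with a small gradient sum exactly to zero.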