Release 0.80 · dmlc/xgboost
This is a stable release of version 0.80.

JVM packages received a major upgrade: to consolidate the APIs and improve the user experience, we significantly refactored the design of XGBoost4J-Spark (#3387).
- Consolidated APIs: it is now much easier to integrate XGBoost models into a Spark ML pipeline. Users can control behaviors such as outputting leaf prediction results by setting the corresponding column names. Training is also more consistent with other Estimators in Spark MLlib: a single fit() method now trains the decision trees.
- Better user experience: we refactored the parameter-related modules in XGBoost4J-Spark so that parameters can be given in both camel case (Spark ML style) and underscore form (XGBoost style).
- A brand-new tutorial is available for XGBoost4J-Spark.
- Latest API documentation is now hosted at https://xgboost.readthedocs.io/.
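XGBoost4J-Spark itself is written in Scala; the Python sketch below only illustrates the idea of accepting both spellings of a parameter name by normalizing camel-case keys (for example `maxDepth`) to the underscore form (`max_depth`). The helpers `to_underscore` and `normalize_params` are hypothetical and are not part of any XGBoost API.

```python
import re


def to_underscore(name: str) -> str:
    """Convert a camel-case name such as 'maxDepth' to 'max_depth'."""
    # Insert an underscore before each uppercase letter that follows a
    # lowercase letter or digit, then lowercase the whole string.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def normalize_params(params: dict) -> dict:
    """Map every key to underscore style; reject conflicting duplicates."""
    out = {}
    for key, value in params.items():
        norm = to_underscore(key)
        # 'maxDepth' and 'max_depth' name the same parameter, so they
        # must not carry different values.
        if norm in out and out[norm] != value:
            raise ValueError(f"conflicting values for {norm!r}")
        out[norm] = value
    return out
```

Normalizing to a single canonical spelling keeps downstream code simple: it only ever looks up `max_depth`, regardless of which style the user wrote.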
XGBoost documentation now keeps track of multiple versions:
- Latest master: https://xgboost.readthedocs.io/en/latest
- 0.80 stable: https://xgboost.readthedocs.io/en/release_0.80
- 0.72 stable: https://xgboost.readthedocs.io/en/release_0.72
Support for per-group weights in ranking objective (#3379)
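In ranking, instances are organized into query groups, and per-group weighting means one weight is supplied for each group rather than for each instance. A minimal sketch of what that implies, assuming groups are described by consecutive group sizes (the way group information is passed to XGBoost): each group's weight applies to every instance in that group. The function `expand_group_weights` is hypothetical and for illustration only.

```python
from typing import List


def expand_group_weights(group_sizes: List[int], group_weights: List[float]) -> List[float]:
    """Broadcast one weight per query group to every instance in the group."""
    if len(group_sizes) != len(group_weights):
        raise ValueError("need exactly one weight per group")
    per_instance: List[float] = []
    for size, weight in zip(group_sizes, group_weights):
        # All instances belonging to the same query share that query's weight.
        per_instance.extend([weight] * size)
    return per_instance


print(expand_group_weights([3, 2], [1.0, 2.0]))  # [1.0, 1.0, 1.0, 2.0, 2.0]
```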
Fix inaccurate decimal parsing (#3546)
New functionality
- Query ID column support in LIBSVM data files (#2749). This is convenient for performing ranking tasks in a distributed setting.
- Hinge loss for binary classification (binary:hinge) (#3477)
- Ability to specify delimiter and instance weight column for CSV files (#3546)
- Ability to use 1-based indexing instead of 0-based (#3546)
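To make the LIBSVM changes concrete, here is a small parser for lines of the form `label qid:Q index:value ...`, with an option to shift 1-based feature indices down to 0-based ones. It assumes that consecutive rows sharing a `qid` form one query group; `parse_libsvm_line` and `group_sizes` are hypothetical helpers, not XGBoost's own parser.

```python
from itertools import groupby
from typing import Dict, List, Optional, Tuple


def parse_libsvm_line(line: str, one_based: bool = False) -> Tuple[float, Optional[int], Dict[int, float]]:
    """Parse 'label [qid:Q] idx:val ...' into (label, qid, {index: value}).

    With one_based=True, feature indices are shifted down by one so the
    returned indices are always 0-based.
    """
    tokens = line.split()
    label = float(tokens[0])
    qid: Optional[int] = None
    features: Dict[int, float] = {}
    for tok in tokens[1:]:
        key, value = tok.split(":", 1)
        if key == "qid":
            qid = int(value)
        else:
            index = int(key) - (1 if one_based else 0)
            if index < 0:
                raise ValueError(f"invalid feature index in {tok!r}")
            features[index] = float(value)
    return label, qid, features


def group_sizes(qids: List[int]) -> List[int]:
    """Turn a column of query IDs into sizes of consecutive query groups."""
    return [len(list(run)) for _, run in groupby(qids)]
```

The group sizes computed this way are the kind of information a ranking objective needs to know which rows are compared with each other.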
GPU support
- Quantile sketch, binning, and index compression are now performed on the GPU, eliminating PCIe transfers for the 'gpu_hist' algorithm (#3319, #3393)
- Upgrade to NCCL2 for multi-GPU training (#3404).
- Use shared memory atomics for faster training (#3384).
- Dynamically allocate GPU memory to prevent large allocations for deep trees (#3519)
- Fix memory copy bug for large files (#3472)
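The 'gpu_hist' method works on quantized features: each feature's values are summarized by candidate cut points (the quantile sketch), and every value is replaced by the index of the bin it falls into. The CPU-only Python sketch below uses exact quantiles instead of an approximate sketch. The names `quantile_cuts` and `bin_index` are hypothetical; the code only shows what binning means, not how the GPU kernels implement it.

```python
from bisect import bisect_right
from typing import List


def quantile_cuts(values: List[float], max_bins: int) -> List[float]:
    """Return sorted, de-duplicated cut points at evenly spaced ranks."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        return []
    cuts: List[float] = []
    for k in range(1, max_bins):
        # Exact value at rank k / max_bins (a real sketch approximates this).
        cut = data[min(n - 1, (k * n) // max_bins)]
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def bin_index(value: float, cuts: List[float]) -> int:
    """Bin i holds values in [cuts[i-1], cuts[i]); bin 0 lies below cuts[0]."""
    return bisect_right(cuts, value)
```

After binning, each value fits in a small integer, which is what makes compressing the bin indices worthwhile.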
Python package
- Importing data from Python datatable (#3272)
- Pre-built binary wheels available for 64-bit Linux and Windows (#3424, #3443)
- Add new importance measures 'total_gain', 'total_cover' (#3498)
- Sklearn API now supports saving and loading models (#3192)
- Support for arbitrary, user-specified cross-validation fold indices (#3353)
- The predict() function in the Sklearn API now uses best_ntree_limit when it is available, which makes early stopping easier to use (#3445)
- Informational messages are now directed to Python's print() rather than standard output (#3438). This way, messages appear inside Jupyter notebooks.
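The new 'total_gain' and 'total_cover' measures sum the gain (loss reduction) and the cover (sum of hessians of the training instances reaching the split) over every split that uses a feature, whereas 'gain' and 'cover' report the per-split average and 'weight' counts the splits. In the Python package the measure is chosen through the `importance_type` argument of `Booster.get_score()`. The sketch below recomputes all five from a hypothetical list of split records; the `Split` type and `importance` function are illustrative only, not XGBoost's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Split:
    feature: str
    gain: float   # loss reduction achieved by this split
    cover: float  # sum of hessians of the instances reaching this node


def importance(splits: List[Split], importance_type: str) -> Dict[str, float]:
    """Aggregate per-feature importance from a list of splits."""
    count: Dict[str, int] = defaultdict(int)
    total_gain: Dict[str, float] = defaultdict(float)
    total_cover: Dict[str, float] = defaultdict(float)
    for s in splits:
        count[s.feature] += 1
        total_gain[s.feature] += s.gain
        total_cover[s.feature] += s.cover
    if importance_type == "weight":
        return {f: float(c) for f, c in count.items()}
    if importance_type == "total_gain":
        return dict(total_gain)
    if importance_type == "total_cover":
        return dict(total_cover)
    if importance_type == "gain":  # average gain per split
        return {f: total_gain[f] / count[f] for f in count}
    if importance_type == "cover":  # average cover per split
        return {f: total_cover[f] / count[f] for f in count}
    raise ValueError(f"unknown importance_type {importance_type!r}")
```

The totals favor features that are split on often, while the averages favor features whose individual splits are strong; reporting both makes that distinction visible.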
R package
- Oracle Solaris support, per CRAN policy (#3372)
JVM packages
- Single-instance prediction (#3464)
- Pre-built JARs are now available from Maven Central (#3401)
- Add NULL pointer check (#3021)
- Consider spark.task.cpus when controlling parallelism (#3530)
- Handle missing values in prediction (#3529)
- Eliminate outputs of System.out (#3572)
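XGBoost treats any entry equal to the configured missing value as absent, so at prediction time such an entry follows the learned default direction of each split. A minimal sketch of that conversion, written in Python for illustration although the fix itself lands in the JVM packages, assuming a dense row and a `missing` sentinel that defaults to NaN; `to_sparse` is a hypothetical helper.

```python
import math
from typing import Dict, List


def to_sparse(row: List[float], missing: float = float("nan")) -> Dict[int, float]:
    """Keep only the present entries of a dense row, keyed by column index."""

    def is_missing(x: float) -> bool:
        # NaN never compares equal to itself, so it needs an explicit test.
        if math.isnan(missing):
            return math.isnan(x)
        return x == missing

    return {i: x for i, x in enumerate(row) if not is_missing(x)}
```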
Refactored C++ DMatrix class for simplicity and de-duplication (#3301)
Refactored C++ histogram facilities (#3564)
Refactored constraints / regularization mechanism for split finding (#3335, #3429). Users may specify an elastic net (L2 + L1 regularization) on leaf weights as well as monotonic constraints on test nodes. The refactor will be useful for a future addition of feature interaction constraints.
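With an elastic-net penalty, the optimal leaf weight soft-thresholds the gradient sum G by the L1 term alpha before dividing by the hessian sum H plus the L2 term lambda: w = -T(G) / (H + lambda), where T(G) = sign(G) * max(|G| - alpha, 0). The sketch below follows that standard formula; clamping the weight to [lower, upper] is a simplified assumption about how bounds from monotonic constraints are applied, and all function names are hypothetical.

```python
import math


def threshold_l1(g: float, alpha: float) -> float:
    """Soft-threshold the gradient sum: sign(g) * max(|g| - alpha, 0)."""
    if g > alpha:
        return g - alpha
    if g < -alpha:
        return g + alpha
    return 0.0


def leaf_weight(G: float, H: float, lam: float, alpha: float,
                lower: float = -math.inf, upper: float = math.inf) -> float:
    """Elastic-net leaf weight, clamped to bounds (simplified monotonicity).

    G, H: sums of gradients and hessians of the instances in the leaf.
    lam: L2 penalty (lambda); alpha: L1 penalty.
    """
    w = -threshold_l1(G, alpha) / (H + lam)
    return min(max(w, lower), upper)


def leaf_score(G: float, H: float, lam: float, alpha: float) -> float:
    """Unconstrained leaf score used to compare splits, up to a constant factor."""
    t = threshold_l1(G, alpha)
    return t * t / (H + lam)
```

For example, with G = -4, H = 3, lambda = 1, the weight is 1.0 without L1 and 0.5 with alpha = 2; a small gradient sum (|G| <= alpha) yields a zero weight, which is the sparsifying effect of the L1 term.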
Statically link libstdc++ for MinGW32 (#3430)
Enable loading from group, base_margin, and weight for Python, R, and JVM packages (#3431)
Fix model saving for count:poisson so that max_delta_step doesn't get truncated (#3515)
Fix loading of sparse CSC matrix (#3553)
Fix incorrect handling of base_score parameter for Tweedie regression (#3295)
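Tweedie regression uses a log link: the model's raw margin is exponentiated to produce a prediction, so a base_score given on the prediction scale has to become log(base_score) on the margin scale before any trees are added. A minimal sketch of that relationship, assuming the log link; the function names are hypothetical and the code does not reproduce the fix itself.

```python
import math
from typing import List


def base_margin_from_score(base_score: float) -> float:
    """Convert a prediction-scale base score to a log-link margin."""
    if base_score <= 0:
        raise ValueError("base_score must be positive under a log link")
    return math.log(base_score)


def predict(base_score: float, tree_outputs: List[float]) -> float:
    """Add the tree outputs to the initial margin, then apply exp()."""
    margin = base_margin_from_score(base_score) + sum(tree_outputs)
    return math.exp(margin)
```

With no trees, the prediction equals base_score exactly, which is the property a correct conversion should preserve.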
