
This is a stable release of version 0.80.

@hcho3 hcho3 released this 13 Aug 08:41
96826a3
  • JVM packages received a major upgrade: To consolidate the APIs and improve the user experience, we refactored the design of XGBoost4J-Spark in a significant manner. (#3387)
    • Consolidated APIs: It is now much easier to integrate XGBoost models into a Spark ML pipeline. Users can control behaviors such as outputting leaf prediction results by setting the corresponding column names. Training is now more consistent with other Estimators in Spark MLlib: there is now a single method, fit(), to train decision trees.
    • Better user experience: we refactored the parameter-related modules in XGBoost4J-Spark to provide both camel-case (Spark ML style) and underscore (XGBoost style) parameters.
    • A brand-new tutorial is available for XGBoost4J-Spark.
    • Latest API documentation is now hosted at https://xgboost.readthedocs.io/.
  • XGBoost documentation now keeps track of multiple versions.
  • Support for per-group weights in ranking objective (#3379)
  • Fix inaccurate decimal parsing (#3546)
  • New functionality
    • Query ID column support in LIBSVM data files (#2749). This is convenient for performing ranking tasks in a distributed setting.
    • Hinge loss for binary classification (binary:hinge) (#3477)
    • Ability to specify delimiter and instance weight column for CSV files (#3546)
    • Ability to use 1-based indexing instead of 0-based (#3546)
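As an illustration of the query ID column mentioned above, the following is a minimal sketch of how rows of a ranking dataset could be written to and read back from LIBSVM-style text, assuming the `<label> qid:<id> <index>:<value> ...` layout. The helper names `format_libsvm_row` and `parse_libsvm_row` are hypothetical and not part of XGBoost; they only show the file layout, not how XGBoost itself parses it.

```python
def format_libsvm_row(label, qid, features):
    """Render one row as '<label> qid:<qid> <index>:<value> ...'."""
    # Sort by feature index so the output is deterministic.
    pairs = " ".join(f"{idx}:{val}" for idx, val in sorted(features.items()))
    return f"{label} qid:{qid} {pairs}".rstrip()


def parse_libsvm_row(line):
    """Split a row into (label, qid, {index: value}); qid is None if absent."""
    tokens = line.split()
    label = float(tokens[0])
    qid = None
    features = {}
    for token in tokens[1:]:
        key, _, value = token.partition(":")
        if key == "qid":
            qid = int(value)
        else:
            features[int(key)] = float(value)
    return label, qid, features


# Hypothetical toy data: (label, query id, sparse features).
rows = [
    (1, 1, {101: 1.2, 102: 0.03}),
    (0, 1, {1: 2.1, 10001: 300.0}),
    (0, 2, {0: 1.3, 1: 0.3}),
]
lines = [format_libsvm_row(*row) for row in rows]
```

Rows that share a `qid` value belong to the same query group, so group membership travels with each row instead of living in a separate group file.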
  • GPU support
    • Quantile sketch, binning, and index compression are now performed on the GPU, eliminating PCIe transfers for the 'gpu_hist' algorithm (#3319, #3393)
    • Upgrade to NCCL2 for multi-GPU training (#3404).
    • Use shared memory atomics for faster training (#3384).
    • Dynamically allocate GPU memory, to prevent large allocations for deep trees (#3519)
    • Fix memory copy bug for large files (#3472)
  • Python package
    • Importing data from Python datatable (#3272)
    • Pre-built binary wheels available for 64-bit Linux and Windows (#3424, #3443)
    • Add new importance measures 'total_gain', 'total_cover' (#3498)
    • Sklearn API now supports saving and loading models (#3192)
    • Arbitrary cross validation fold indices (#3353)
    • predict() function in Sklearn API uses best_ntree_limit if available, to make early stopping easier to use (#3445)
    • Informational messages are now directed to Python's print() rather than standard output (#3438). This way, messages appear inside Jupyter notebooks.
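To make the difference between the averaged and the new 'total_gain' / 'total_cover' importance measures concrete, here is a small sketch in plain Python. It assumes each split in the trained trees is summarized as a (feature, gain, cover) tuple; the function name `summarize_importance` and the toy numbers are hypothetical and are not XGBoost's own API or output.

```python
from collections import defaultdict


def summarize_importance(splits):
    """splits: iterable of (feature, gain, cover) tuples, one per tree split."""
    count = defaultdict(int)
    total_gain = defaultdict(float)
    total_cover = defaultdict(float)
    for feature, gain, cover in splits:
        count[feature] += 1
        total_gain[feature] += gain
        total_cover[feature] += cover
    return {
        f: {
            "weight": count[f],                    # number of splits on f
            "total_gain": total_gain[f],           # summed over all splits
            "gain": total_gain[f] / count[f],      # average per split
            "total_cover": total_cover[f],
            "cover": total_cover[f] / count[f],
        }
        for f in count
    }


# Hypothetical splits: f0 is used twice with modest gain, f1 once with high gain.
splits = [("f0", 4.0, 10.0), ("f0", 2.0, 6.0), ("f1", 5.0, 8.0)]
importance = summarize_importance(splits)
```

In this toy data, f1 ranks first by average gain (5.0 versus 3.0), while f0 ranks first by total gain (6.0 versus 5.0), which is why the choice of measure can change a feature ranking.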
  • R package
    • Oracle Solaris support, per CRAN policy (#3372)
  • JVM packages
    • Single-instance prediction (#3464)
    • Pre-built JARs are now available from Maven Central (#3401)
    • Add NULL pointer check (#3021)
    • Consider spark.task.cpus when controlling parallelism (#3530)
    • Handle missing values in prediction (#3529)
    • Eliminate outputs of System.out (#3572)
  • Refactored C++ DMatrix class for simplicity and de-duplication (#3301)
  • Refactored C++ histogram facilities (#3564)
  • Refactored constraints / regularization mechanism for split finding (#3335, #3429). Users may specify an elastic net (L2 + L1 regularization) on leaf weights as well as monotonic constraints on test nodes. The refactor will be useful for a future addition of feature interaction constraints.
  • Statically link libstdc++ for MinGW32 (#3430)
  • Enable loading from group, base_margin and weight for Python, R, and JVM packages (#3431)
  • Fix model saving for count:poisson so that max_delta_step doesn't get truncated (#3515)
  • Fix loading of sparse CSC matrix (#3553)
  • Fix incorrect handling of base_score parameter for Tweedie regression (#3295)
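
The elastic net (L2 + L1) regularization on leaf weights mentioned in the refactoring item above can be illustrated with the commonly cited closed form for a leaf weight under a second-order objective: the gradient sum is soft-thresholded by the L1 penalty and then divided by the Hessian sum plus the L2 penalty. The sketch below is written under that assumption and is not a copy of XGBoost's C++ code; the function name `leaf_weight` is hypothetical, and the parameter names `reg_alpha` (L1) and `reg_lambda` (L2) are chosen for illustration.

```python
def leaf_weight(grad_sum, hess_sum, reg_alpha=0.0, reg_lambda=1.0):
    """Leaf weight minimizing G*w + 0.5*(H + lambda)*w^2 + alpha*|w|."""
    # If the gradient sum is inside the L1 dead zone, the best weight is zero.
    if abs(grad_sum) <= reg_alpha:
        return 0.0
    # Otherwise shrink the gradient sum toward zero by alpha (soft threshold).
    if grad_sum > 0:
        shrunk = grad_sum - reg_alpha
    else:
        shrunk = grad_sum + reg_alpha
    # The L2 penalty adds to the Hessian sum in the denominator.
    return -shrunk / (hess_sum + reg_lambda)


# Hypothetical gradient and Hessian sums for one leaf.
w_l2_only = leaf_weight(-6.0, 2.0, reg_alpha=0.0, reg_lambda=1.0)
w_elastic = leaf_weight(-6.0, 2.0, reg_alpha=3.0, reg_lambda=1.0)
w_zeroed = leaf_weight(0.5, 2.0, reg_alpha=1.0, reg_lambda=1.0)
```

With L2 alone the weight is only scaled down, whereas adding L1 can push a weakly supported leaf weight exactly to zero, as the third call shows.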