All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Updated `tables` dependency to v3.9.x, which fixes issues with installation of the package. This requires Python 3.9 or above, however, so we require that here as well.
- The `Boot` class now has `save` and `load` methods, which use `joblib` under the hood.
- The dataset-specific dependencies are now put in a separate `datasets` extra, to make the core package leaner. You can install the package with all the dependencies using `pip install doubt[datasets]`.
- Now also allows `pandas` 2.x.x versions.
- Updated `urllib3` to 2.0.7 due to a security update.
- Now saves the models during training with a `Boot` and reuses those during inference, speeding up inference. Thanks to @andrepugni for this contribution!
- Downgraded `tables` to 3.7.x to fix an installation bug.
- Downgraded `scikit-learn` to `>=1.1,<1.3`, as the decision tree API in v1.3 is incompatible with the previous ones. This will be dealt with separately in the future.
### Fixed
- When `return_all` is specified in `Boot.predict` and multiple samples are passed in, it now returns an array of shape `(num_samples, num_boots)` rather than the previous `(num_boots, num_samples)`.
- Added a `return_all` argument to the `Boot.predict` method, which will override the `uncertainty` and `quantiles` arguments and return the raw bootstrap distribution over which the quantiles would normally be calculated. This allows other uses of the bootstrap distribution than for computing prediction intervals.
- Previously, all the trees in `QuantileRegressionForest` were the same. This has now been fixed. Thanks to @gugerlir for noticing this!
- The `random_seed` argument in `QuantileRegressionTree` and `QuantileRegressionForest` has been changed to `random_state` to be consistent with `DecisionTreeRegressor`, and to avoid an `AttributeError` when accessing the estimators of a `QuantileRegressionForest`.
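
To illustrate the `(num_samples, num_boots)` layout described for `return_all` above, here is a minimal, standard-library-only sketch of a raw bootstrap distribution and one way to take quantiles from it. It is not the library's implementation: `fit_line` and `bootstrap_predictions` are hypothetical helpers, the model is a one-feature least-squares line, and the interval is a plain percentile interval rather than whatever `Boot.predict` computes internally.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_predictions(xs, ys, new_xs, num_boots=100, seed=0):
    """Hypothetical helper: nested list of shape (num_samples, num_boots)."""
    rng = random.Random(seed)
    n = len(xs)
    # One row per new sample, one column per bootstrap resample.
    preds = [[] for _ in new_xs]
    for _ in range(num_boots):
        # Resample the training data with replacement and refit the line.
        idx = [rng.randrange(n) for _ in range(n)]
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        for row, x in zip(preds, new_xs):
            row.append(slope * x + intercept)
    return preds


# Synthetic data, made up purely for this illustration.
noise = random.Random(42)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + 1.0 + noise.gauss(0.0, 1.0) for x in xs]

preds = bootstrap_predictions(xs, ys, new_xs=[0.0, 10.0, 20.0])
num_samples, num_boots = len(preds), len(preds[0])

# A 90% percentile interval per sample, taken from its row of bootstrap values.
intervals = []
for row in preds:
    cuts = statistics.quantiles(row, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    intervals.append((cuts[0], cuts[-1]))
```

Keeping samples on the first axis means each row is the full bootstrap distribution for one input, so per-sample summaries other than quantiles (means, standard deviations, histograms) can be computed row by row.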
### Added
- The `QuantileRegressionForest` now has a `feature_importances_` attribute.
- `Boot.fit` and `Boot.predict` methods are now parallelised, speeding up both training and prediction time a bit.
- Updated `README` to include generalised linear models, rather than only mentioning linear regression.
- Removed mention of `PyTorch` model support, as that has not been implemented yet.
- The `verbose` argument to `QuantileRegressionForest` also displays a progress bar during inference now.
- Fixed `QuantileRegressionForest.__repr__`.
- Added a `verbose` argument to `QuantileRegressionForest`, which displays a progress bar during training.
- The default value of `QuantileRegressionForest.min_samples_leaf` has changed from 1 to 5, to ensure that the quantiles can always be computed sensibly with the default setting.
- The `logkow` feature in the `FishBioconcentration` dataset is now converted into a float, rather than a string.
- Typo in example script in `README`
- Added `__repr__` to `QuantileRegressor`
- `QuantileLinearRegression` has been removed, and `QuantileRegressor` should be used instead
- Added `quantiles` argument to `QuantileRegressionTree` and `Boot`, as an alternative to specifying `uncertainty`, if you want to return specific quantiles.
- Added general `QuantileRegressor`, which can wrap any general linear model for quantile predictions.
- The predictions in `Boot.predict` were previously based on a fit of the model to one of the bootstrapped datasets. They are now based on the entire dataset, which in particular means that the predictions will be deterministic. The intervals will still be stochastic, as they should be.
- Updated Numpy random number generation to their new API
- All residuals in `Boot` are now calculated during fitting, which should decrease the prediction times a tiny bit.
- Package no longer relies on `statsmodels`
- A handful of docstring style changes to yield a cleaner Sphinx documentation
- Sphinx documentation
- Implemented `score` method to `QuantileLinearRegression`, which either outputs the mean negative pinball loss function, or the R^2 value
- Added more documentation to `QuantileLinearRegression`
- Outputs more informative error message when a singular feature matrix is being used with `QuantileLinearRegression`
- Datasets look prettier in notebooks now
- Removed docstring comments about closing datasets after use, as this is automatic
- Small mistake in the computation of the prediction intervals in `Boot.predict`, where the definition of `generalisation` should be the difference of the means of the residuals, and not the difference between the individual quantiles. Makes a very tiny difference to the prediction intervals. Thanks to Bryan Shalloway for catching this mistake.
- `Boot.__repr__` was not working properly
- Added proper `__repr__` descriptions to all models
- Changed the ordering of `Dataset.split` to `X_train`, `X_test`, `y_train` and `y_test`, to agree with `scikit-learn`
- Moved some `Dataset` attributes to the private API