added rank minrelation coefficient, add blomqvist beta, add winsorized correlation, add cumulative skew, changed docs, clean up tests
glevv committed Dec 23, 2023
1 parent d5158cc commit ad5e716
Showing 14 changed files with 242 additions and 54 deletions.
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -18,5 +18,5 @@ repository-code: 'https://github.com/glevv/obscure_stats'
repository-artifact: 'https://pypi.org/project/obscure_stats'
abstract: Collection of lesser-known statistical measures
license: MIT
version: 0.1.7
version: 0.1.8
date-released: '2023-10-21'
10 changes: 7 additions & 3 deletions README.md
@@ -37,11 +37,12 @@
* Area Under the Skewness Curve (weighted and unweighted);
* Bickel Mode Skewness Coefficient;
* Bowley Skewness Coefficient;
* Cumulative Skewness Coefficient;
* Forhad-Shorna Rank Skewness Coefficient;
* Groeneveld Skewness Coefficient;
* Hossain-Adnan Skewness Coefficient;
* Kelly Skewness Coefficient;
* L-Skewness;
* L-Skewness Coefficient;
* Medeen Skewness Coefficient;
* Pearson Median Skewness Coefficient;
* Pearson Mode Skewness Coefficient.
@@ -53,18 +54,21 @@
* Moors Octile Kurtosis;
* Reza-Ma Kurtosis.
- Collection of measures of association - `obscure_stats/association`:
* Chatterjee Xi correlation Coefficient (original and symmetric versions);
* Blomqvist's Beta;
* Chatterjee Xi Correlation Coefficient (original and symmetric versions);
* Concordance Correlation Coefficient;
* Concordance Rate;
* Rank Minrelation Coefficient;
* Tanimoto Similarity;
* Winsorized Correlation Coefficient;
* Zhang I Correlation Coefficient.
- Collection of measures of qualitative variation - `obscure_stats/variation`:
* AVDev;
* B Index;
* Extropy;
* Gibbs M1;
* Gibbs M2;
* ModVR;
* Negative Extropy;
* RanVR.

## Installation
11 changes: 5 additions & 6 deletions pyproject.toml
@@ -1,26 +1,25 @@
[tool.poetry]
name = "obscure_stats"
version = "0.1.7"
version = "0.1.8"
description = "Collection of lesser-known statistical functions"
authors = ["Gleb Levitski"]
authors = ["Hleb Levitski"]
readme = "README.md"
classifiers = [
"Development Status :: 3 - Alpha",
"Intended Audience :: Science/Research",
"Intended Audience :: Developers",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3 :: Only",
"Topic :: Software Development",
"Topic :: Scientific/Engineering",
"Topic :: Software Development",
"Typing :: Typed",
"Operating System :: OS Independent",
"Natural Language :: English",
]

[tool.poetry.dependencies]
6 changes: 6 additions & 0 deletions src/obscure_stats/association/__init__.py
@@ -1,19 +1,25 @@
"""Association module."""

from .association import (
blomqvistbeta,
chatterjeexi,
concordance_corrcoef,
concordance_rate,
rank_minrelation_coefficient,
symmetric_chatterjeexi,
tanimoto_similarity,
winsorized_correlation,
zhangi,
)

__all__ = [
"blomqvistbeta",
"chatterjeexi",
"concordance_corrcoef",
"concordance_rate",
"rank_minrelation_coefficient",
"symmetric_chatterjeexi",
"tanimoto_similarity",
"winsorized_correlation",
"zhangi",
]
155 changes: 140 additions & 15 deletions src/obscure_stats/association/association.py
@@ -73,9 +73,9 @@ def chatterjeexi(x: np.ndarray, y: np.ndarray) -> float:
Parameters
----------
x : array_like
Measured values.
Input array.
y : array_like
Target values.
Input array.
Returns
-------
@@ -118,9 +118,9 @@ def concordance_corrcoef(x: np.ndarray, y: np.ndarray) -> float:
Parameters
----------
x : array_like
Measured values.
Input array.
y : array_like
Reference values.
Input array.
Returns
-------
@@ -162,14 +162,14 @@ def concordance_rate(
Parameters
----------
x : array_like
Measured values.
Input array.
y : array_like
Reference values.
Input array.
Returns
-------
cr : float.
The value of the quadrant count ratio.
The value of the concordance rate.
References
----------
@@ -213,14 +213,14 @@ def symmetric_chatterjeexi(x: np.ndarray, y: np.ndarray) -> float:
Parameters
----------
x : array_like
Measured values.
Input array.
y : array_like
Target values.
Input array.
Returns
-------
sxi : float.
The value of the xi correlation coefficient.
The value of the symmetric xi correlation coefficient.
References
----------
@@ -266,9 +266,9 @@ def zhangi(x: np.ndarray, y: np.ndarray) -> float:
Parameters
----------
x : array_like
Measured values.
Input array.
y : array_like
Reference values.
Input array.
Returns
-------
@@ -309,14 +309,14 @@ def tanimoto_similarity(x: np.ndarray, y: np.ndarray) -> float:
Parameters
----------
x : array_like
Measured values.
Input array.
y : array_like
Reference values.
Input array.
Returns
-------
ts : float.
The value of the tanimoto similarity measure
The value of the Tanimoto similarity measure
References
----------
@@ -336,3 +336,128 @@ def tanimoto_similarity(x: np.ndarray, y: np.ndarray) -> float:
xx = np.mean(x**2)
yy = np.mean(y**2)
return xy / (xx + yy - xy)
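
The closing lines of `tanimoto_similarity` can be illustrated with a small standalone sketch. The hunk does not show how `xy` is computed, so the line `xy = np.mean(x * y)` below is an assumption, as is the helper name `tanimoto_sketch`; the array checks and preprocessing done by the library are omitted.

```python
import numpy as np


def tanimoto_sketch(x, y):
    """Hypothetical standalone version of the ratio xy / (xx + yy - xy)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xy = np.mean(x * y)  # assumption: not visible in the hunk above
    xx = np.mean(x**2)
    yy = np.mean(y**2)
    return float(xy / (xx + yy - xy))


# Identical inputs give a similarity of one, orthogonal inputs give zero.
ts_same = tanimoto_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
ts_orth = tanimoto_sketch([1.0, 0.0], [0.0, 1.0])
```

With identical inputs the three means coincide, so the ratio reduces to one; when the element-wise products are all zero the numerator vanishes.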


def blomqvistbeta(x: np.ndarray, y: np.ndarray) -> float:
"""Calculate Blomqvist's beta.
Also known as medial correlation. It is similar to Spearman Rho
and Kendall Tau correlations, but has some advantages over them.
Parameters
----------
x : array_like
Input array.
y : array_like
Input array.
Returns
-------
bb : float.
The value of Blomqvist's beta.
References
----------
Blomqvist, N. (1950).
On a measure of dependence between two random variables.
Annals of Mathematical Statistics, 21, 593-600.
Schmid, F.; Schmidt, R. (2007).
Nonparametric Inference on Multivariate Versions of
Blomqvist's Beta and Related Measures of Tail Dependence.
Metrika, 66(3), 323-354.
See Also
--------
scipy.stats.spearmanr - Spearman R coefficient.
scipy.stats.kendalltau - Kendall Tau coefficient.
"""
if _check_arrays(x, y):
return np.nan
x, y = _prep_arrays(x, y)
med_x = np.median(x)
med_y = np.median(y)
return np.mean(np.sign((x - med_x) * (y - med_y)))
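
As a rough illustration of the computation above, the following standalone sketch repeats the median-centering and sign-averaging steps without the library's `_check_arrays` and `_prep_arrays` helpers. The name `blomqvist_beta_sketch` and the sample data are made up for the example.

```python
import numpy as np


def blomqvist_beta_sketch(x, y):
    """Mean sign of the product of median-centered values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    centered_product = (x - np.median(x)) * (y - np.median(y))
    return float(np.mean(np.sign(centered_product)))


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
beta_pos = blomqvist_beta_sketch(x, 2 * x + 1)  # y rises with x
beta_neg = blomqvist_beta_sketch(x, -x)         # y falls as x rises
```

Points that fall in the same half of both samples (both above or both below the medians) contribute +1 and points in opposite halves contribute -1, so a strictly increasing relation gives 1 and a strictly decreasing one gives -1.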


def winsorized_correlation(x: np.ndarray, y: np.ndarray, k: float = 0.1) -> float:
"""Calculate winsorized correlation coefficient.
This correlation is a robust alternative to the Pearson correlation.
Parameters
----------
x : array_like
Input array.
y : array_like
Input array.
k : float
The fraction of values to winsorize on each side of each array.
Returns
-------
wcr : float.
The value of the winsorized correlation.
References
----------
Wilcox, R. R. (1993).
Some Results on a Winsorized Correlation Coefficient.
British Journal of Mathematical and Statistical Psychology, 46, 339-349.
See Also
--------
scipy.stats.pearsonr - Pearson correlation coefficient.
"""
if _check_arrays(x, y):
return np.nan
x, y = _prep_arrays(x, y)
x_w = stats.mstats.winsorize(x, (k, k))
y_w = stats.mstats.winsorize(y, (k, k))
return np.corrcoef(x_w, y_w)[0, 1]
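
To show why winsorizing before correlating helps with outliers, here is a NumPy-only sketch. It does not call `stats.mstats.winsorize`; instead `winsorize_sketch` is a hypothetical helper that replaces the `int(k * n)` smallest and largest values with their nearest retained neighbours, which is an assumption about the intended behaviour rather than a copy of the SciPy routine.

```python
import numpy as np


def winsorize_sketch(a, k):
    """Replace the int(k * n) lowest and highest values by their neighbours."""
    a = np.asarray(a, dtype=float).copy()
    n = a.size
    g = int(k * n)  # number of values replaced on each side
    if g == 0:
        return a
    order = np.argsort(a)
    a[order[:g]] = a[order[g]]              # pull up the lowest g values
    a[order[n - g:]] = a[order[n - g - 1]]  # pull down the highest g values
    return a


def winsorized_correlation_sketch(x, y, k=0.1):
    """Pearson correlation of the winsorized arrays."""
    return float(np.corrcoef(winsorize_sketch(x, k), winsorize_sketch(y, k))[0, 1])


x = np.arange(10, dtype=float)
y = x.copy()
y[-1] = 1000.0  # a single gross outlier

r_plain = float(np.corrcoef(x, y)[0, 1])
r_wins = winsorized_correlation_sketch(x, y, k=0.1)
```

In this toy data the outlier is the only thing breaking the linear relation, so after winsorizing one value on each side the two arrays coincide and the correlation returns to one, while the plain Pearson coefficient is pulled well below it.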


def rank_minrelation_coefficient(x: np.ndarray, y: np.ndarray) -> float:
"""Calculate rank minrelation coefficient.
This measure estimates p(y > x) when x and y are continuous random variables.
In short, if a variable x exhibits a minrelation to y then,
as x increases, y is likely to increase too.
Parameters
----------
x : array_like
Input array.
y : array_like
Input array.
Returns
-------
rmc : float.
The value of the rank minrelation coefficient.
References
----------
Meyer, P. E. (2013).
A Rank Minrelation-Majrelation Coefficient.
arXiv preprint arXiv:1305.2038.
Notes
-----
This measure is asymmetric: rmc(x, y) != rmc(y, x).
See Also
--------
Concordance rate.
Concordance correlation coefficient.
"""
if _check_arrays(x, y):
return np.nan
x, y = _prep_arrays(x, y)
n_sq = len(x) ** 2
rank_x_inc = (np.argsort(x) + 1) ** 2 / n_sq - 0.5
rank_y_inc = (np.argsort(y) + 1) ** 2 / n_sq - 0.5
rank_y_dec = 0.5 - (np.argsort(-y) + 1) ** 2 / n_sq
lower = np.sum((-rank_x_inc < rank_y_inc) * (rank_x_inc + rank_y_inc) ** 2)
higher = np.sum((rank_x_inc > rank_y_dec) * (rank_x_inc - rank_y_dec) ** 2)
return (lower - higher) / (lower + higher)
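
The variables above are named `rank_*`, while the body calls `np.argsort`. In NumPy these are different things: `np.argsort` returns the indices that would sort the array, not the rank of each element. The short sketch below shows the difference on made-up data. Whether the function is meant to use ranks is an assumption based only on its name and docstring.

```python
import numpy as np

x = np.array([30.0, 10.0, 20.0])

# Indices that would sort x: position 1 holds the smallest value, and so on.
sort_idx = np.argsort(x)

# 1-based rank of each element (no ties here): argsort applied twice.
ranks = np.argsort(np.argsort(x)) + 1
```

For data with ties, `scipy.stats.rankdata` offers several tie-handling methods and may be a better fit than the double `argsort`.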
2 changes: 1 addition & 1 deletion src/obscure_stats/central_tendency/central_tendency.py
@@ -238,7 +238,7 @@ def half_sample_mode(x: np.ndarray) -> float:
Returns
-------
hsm : float
The value of Half Sample Mode.
The value of half sample mode.
References
----------
8 changes: 4 additions & 4 deletions src/obscure_stats/dispersion/dispersion.py
@@ -234,7 +234,7 @@ def morisita_index(x: np.ndarray) -> float:


def standard_quantile_absolute_deviation(x: np.ndarray) -> float:
"""Calculate Standard quantile absolute deviation.
"""Calculate standard quantile absolute deviation.
This measure is a robust measure of dispersion that has higher
Gaussian efficiency, but a lower breakdown point than MAD.
@@ -247,7 +247,7 @@ def standard_quantile_absolute_deviation(x: np.ndarray) -> float:
Returns
-------
sqad : float
The value of the SQAD.
The value of the standard quantile absolute deviation.
References
----------
@@ -276,7 +276,7 @@ def shamos_estimator(x: np.ndarray) -> float:
Returns
-------
se : float
The value of Hodges-Lehmann-Sen estimator.
The value of Shamos estimator.
References
----------
@@ -311,7 +311,7 @@ def coefficient_of_range(x: np.ndarray) -> float:
Returns
-------
cr : float
The value of the linear coefficient of variation.
The value of the range coefficient.
References
----------
2 changes: 2 additions & 0 deletions src/obscure_stats/skewness/__init__.py
@@ -4,6 +4,7 @@
auc_skew_gamma,
bickel_mode_skew,
bowley_skew,
cumulative_skew,
forhad_shorna_rank_skew,
groeneveld_skew,
hossain_adnan_skew,
@@ -19,6 +20,7 @@
"auc_skew_gamma",
"bickel_mode_skew",
"bowley_skew",
"cumulative_skew",
"forhad_shorna_rank_skew",
"groeneveld_skew",
"hossain_adnan_skew",