Commit

Merge pull request #6 from glevv/r_entropy
Add Renyi entropy and McIntosh's D
glevv authored Jan 5, 2024
2 parents ad5e716 + 0ef7cbf commit 2d03f5c
Showing 10 changed files with 144 additions and 29 deletions.
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -18,5 +18,5 @@ repository-code: 'https://github.com/glevv/obscure_stats'
repository-artifact: 'https://pypi.org/project/obscure_stats'
abstract: Collection of lesser-known statistical measures
license: MIT
version: 0.1.8
version: 0.1.9
date-released: '2023-10-21'
2 changes: 1 addition & 1 deletion CODE_OF_CONDUCT.md
@@ -1 +1 @@
This projects adopts Python Software Foundation Code of Conduct, [please read it here](https://www.python.org/psf/conduct/).
This project adopts the Python Software Foundation Code of Conduct; [please read it here](https://www.python.org/psf/conduct/).
48 changes: 34 additions & 14 deletions CONTRIBUTING.md
@@ -2,30 +2,50 @@

Thank you for considering contributing to this project!

All contributions are appreciated, from reporting bugs to implementing new features. This guide will try to help you make
the first steps.
All contributions are appreciated, from reporting bugs to implementing new features. This guide will try to help you take the first steps.

## Setup

- [First, fork the repository](https://docs.github.com/en/get-started/quickstart/fork-a-repo).
- Copy forked repositrory to local machine.
- Clone the forked repository to your local machine:
```bash
>>> git clone https://github.com/<username>/obscure_stats.git
>>> cd obscure_stats
```
- This project uses `poetry` as a pakcage manager. [How to install poetry](https://python-poetry.org/docs/#installation).
- Set up local enviroment that poetry will use. You can do it with [pyenv](https://github.com/pyenv/pyenv#installation) or venv or any other enviroment manager that you like.
- run `poetry init` to initialize your local enviroment.
- This project uses `poetry` as a package manager. See how to install it [here](https://python-poetry.org/docs/#installation).
- Set up a local environment that `poetry` will use. You can do it with [pyenv](https://github.com/pyenv/pyenv#installation), venv, or any other environment manager that you like.
- To initialize your local environment, run:
```bash
>>> poetry init
```
- You are good to go!

## Workflow

- Every change should be tested - you need to add new tests for the new functionality (`pytest` and `pytest-cov` will do help you with this).
- Every change should be documented - you need to add docstring (`numpy` style) with reference to scientific paper (preprints accepted).
- Every change should be clean - you need to run linters, formatters, typechekers (`ruff` and `mypy` will take care of this).
- Every change should be tested; you need to add new tests for the new functionality (`pytest` and `pytest-cov` will help you with this).
- Every change should be documented; you need to add a docstring (`numpy` style) with reference to a scientific paper (preprints accepted).
- Every change should be clean; you need to run linters, formatters, and type checkers (`ruff` and `mypy` will take care of this).

After you have made some changes to the codebase, you should run the following commands:
```bash
>>> poetry run ruff check . --fix
```
This command runs the linters and tries to fix any problems automatically. Anything it cannot fix automatically should be fixed manually.

```bash
>>> poetry run ruff format .
```
This command runs the autoformatter.

```bash
>>> poetry run mypy .
```
This command runs the type checker. All typing problems should be fixed.

```bash
>>> pytest --cov-report term-missing --cov=obscure_stats
```
This command runs the test suite. All tests should pass, and code coverage should be high enough.


After you have made some changes to the codebase you should run following commands:
- `poetry run ruff check . --fix` - this command will run linters and other useful stuff and try to fix all the problems. If something is unfixable automatically you should try to fix it manually.
- `poetry run ruff format .` - this command will run autoformatter.
- `poetry run mypy .` - this command will type checker. All typing problems should be fixed.
- `pytest --cov-report term-missing --cov=obscure_stats` - this command will run the test suite. All tests should pass, as well as codecoverage should be high enough.
Happy coding!
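
The four checks listed in the guide could also be chained from one small script. The following is a minimal sketch, not something shipped with the project: `CHECKS` and `run_checks` are hypothetical names, and the commands are copied verbatim from the guide. It assumes `poetry` and `pytest` are available on the `PATH`.

```python
import subprocess

# Commands copied from the contributing guide, in the order it lists them.
CHECKS = [
    ["poetry", "run", "ruff", "check", ".", "--fix"],
    ["poetry", "run", "ruff", "format", "."],
    ["poetry", "run", "mypy", "."],
    ["pytest", "--cov-report", "term-missing", "--cov=obscure_stats"],
]


def run_checks(runner=subprocess.run):
    """Run each check in order; return the first failing command, or None."""
    for cmd in CHECKS:
        # A non-zero exit code means the tool reported a problem.
        if runner(cmd).returncode != 0:
            return cmd
    return None
```

Calling `run_checks()` from the repository root stops at the first failing tool, which keeps the output short when, for example, the linter already reports problems. The `runner` parameter exists only so the function can be exercised without invoking the real tools.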
14 changes: 11 additions & 3 deletions README.md
@@ -67,9 +67,11 @@
* B Index;
* Gibbs M1;
* Gibbs M2;
* McIntosh's D;
* ModVR;
* Negative Extropy;
* RanVR.
* RanVR;
* Rényi entropy.

## Installation

@@ -94,8 +96,14 @@ Robust measure of central tendency = 1.09±0.42

## Code of Conduct

This projects adopts Python Software Foundation Code of Conduct, [please read it here](https://www.python.org/psf/conduct/).
This project adopts the Python Software Foundation Code of Conduct; [please read it here](https://www.python.org/psf/conduct/).

## Contributing

If you would like to contribute, you can read a short guide [here](https://github.com/glevv/obscure_stats/blob/main/CONTRIBUTING.md).

## License

The content of this repository is licensed under a [MIT license](https://github.com/glevv/obscure_stats/blob/main/LICENSE).
The content of this repository is licensed under an [MIT license](https://github.com/glevv/obscure_stats/blob/main/LICENSE.txt).

This repository bundles several libraries that are compatibly licensed. A full list can be found [here](https://github.com/glevv/obscure_stats/blob/main/LICENSES_bundled.txt).
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "obscure_stats"
version = "0.1.8"
version = "0.1.9"
description = "Collection of lesser-known statistical functions"
authors = ["Hleb Levitski"]
readme = "README.md"
1 change: 1 addition & 0 deletions src/obscure_stats/central_tendency/central_tendency.py
@@ -251,6 +251,7 @@ def half_sample_mode(x: np.ndarray) -> float:
--------
scipy.stats.mode - Mode estimator.
"""
# heavily inspired by https://github.com/cran/modeest/blob/master/R/hsm.R
y = np.sort(x)
y = y[np.isfinite(y)]
_corner_cases = (4, 3) # for 4 samples and 3 samples
2 changes: 1 addition & 1 deletion src/obscure_stats/kurtosis/kurtosis.py
@@ -64,7 +64,7 @@ def moors_kurt(x: np.ndarray) -> float:
The meaning of kurtosis: Darlington reexamined.
The American Statistician, 40 (4): 283-284,
"""
return np.nanvar(np.square(stats.zscore(x, nan_policy="omit"))) + 1
return np.nanvar(stats.zscore(x, nan_policy="omit") ** 2) + 1
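
The return statement rests on a small identity: when z-scores are computed with the population standard deviation (the `ddof=0` default of both `stats.zscore` and `np.nanvar`), the mean of `z**2` is 1, so `var(z**2) + 1` equals the mean of `z**4`, the ordinary (non-excess) kurtosis. The pure-Python sketch below checks this numerically. The helper names are illustrative and not part of `obscure_stats`, and NaN handling is left out.

```python
import math


def zscores(data):
    """Standardize data using the population (ddof=0) standard deviation."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in data) / n)
    return [(v - mean) / std for v in data]


def moors_kurt_sketch(data):
    """Population variance of the squared z-scores, plus one."""
    z2 = [z**2 for z in zscores(data)]
    mean_z2 = sum(z2) / len(z2)
    return sum((v - mean_z2) ** 2 for v in z2) / len(z2) + 1


def fourth_moment_kurt(data):
    """Ordinary (non-excess) kurtosis: the mean of z**4."""
    z = zscores(data)
    return sum(v**4 for v in z) / len(z)


sample = [2.0, 3.5, 1.0, 7.25, 4.0, 4.5, 0.5, 9.0]
# Both expressions agree up to floating-point rounding.
print(moors_kurt_sketch(sample), fourth_moment_kurt(sample))
```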


def moors_octile_kurt(x: np.ndarray) -> float:
4 changes: 4 additions & 0 deletions src/obscure_stats/variation/__init__.py
@@ -5,17 +5,21 @@
b_index,
gibbs_m1,
gibbs_m2,
mcintosh_d,
mod_vr,
negative_extropy,
range_vr,
renyi_entropy,
)

__all__ = [
"avdev",
"b_index",
"gibbs_m1",
"gibbs_m2",
"mcintosh_d",
"mod_vr",
"negative_extropy",
"range_vr",
"renyi_entropy",
]
80 changes: 72 additions & 8 deletions src/obscure_stats/variation/variation.py
@@ -1,6 +1,6 @@
"""Module for measures of categorical variations."""


import math
from collections import Counter

import numpy as np
@@ -33,8 +33,7 @@ def mod_vr(x: np.ndarray) -> float:
The Western Political Quarterly. 26 (2): 325-343.
"""
cnts = np.asarray(list(Counter(x).values()))
n = len(x)
return 1 - np.max(cnts) / n
return 1 - np.max(cnts) / len(x)


def range_vr(x: np.ndarray) -> float:
@@ -105,7 +104,7 @@ def gibbs_m1(x: np.ndarray) -> float:
Special case of Tsallis entropy (alpha = 2).
"""
freq = np.asarray(list(Counter(x).values())) / len(x)
return 1 - np.sum(np.square(freq))
return 1 - np.sum(freq**2)


def gibbs_m2(x: np.ndarray) -> float:
@@ -135,7 +134,7 @@ def gibbs_m2(x: np.ndarray) -> float:
"""
freq = np.asarray(list(Counter(x).values())) / len(x)
k = len(freq)
return (k / (k - 1)) * (1 - np.sum(np.square(freq)))
return (k / (k - 1)) * (1 - np.sum(freq**2))


def b_index(x: np.ndarray) -> float:
@@ -164,7 +163,7 @@ def b_index(x: np.ndarray) -> float:
"""
n = len(x)
freq = np.asarray(list(Counter(x).values())) / n
return 1 - np.sqrt(1 - np.square(stats.gmean(freq * len(freq) / n)))
return 1 - (1 - (stats.gmean(freq * len(freq) / n)) ** 2) ** 0.5


def avdev(x: np.ndarray) -> float:
@@ -198,6 +197,44 @@ def avdev(x: np.ndarray) -> float:
return 1 - (np.sum(np.abs(freq - mean)) / (2 * mean * max(k - 1, 1)))


def renyi_entropy(x: np.ndarray, alpha: float = 2) -> float:
    """Calculate Renyi entropy (bits).

    Rényi entropy is a quantity that generalizes various notions of entropy,
    including Hartley entropy, Shannon entropy, collision entropy, and min-entropy.
    Low values of Rényi entropy correspond to lower variation and
    high values to higher variation.

    Parameters
    ----------
    x : array_like
        Input array.
    alpha : float
        Order of the Rényi entropy

    Returns
    -------
    ren : float
        The value of Rényi entropy.

    References
    ----------
    Rényi, A. (1961).
    On measures of information and entropy.
    Proceedings of the fourth Berkeley Symposium on Mathematics,
    Statistics and Probability 1960. pp. 547-561.
    """
    if alpha < 0:
        msg = "Parameter alpha should be positive!"
        raise ValueError(msg)
    freq = np.asarray(list(Counter(x).values())) / len(x)
    if alpha == 1:
        # return Shannon entropy to avoid division by 0
        return -np.sum(freq * np.log2(freq))
    return 1 / (1 - alpha) * math.log2(np.sum(freq**alpha))
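
To illustrate how `alpha` selects the special cases named in the docstring, here is a pure-Python sketch of the same formula. `renyi_sketch` is a hypothetical stand-in written for this example, not the library function, and it uses plain lists instead of NumPy arrays.

```python
import math
from collections import Counter


def renyi_sketch(values, alpha=2.0):
    """Rényi entropy (bits) of the empirical category frequencies."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    freq = [c / len(values) for c in Counter(values).values()]
    if alpha == 1:
        # Shannon entropy is the limit of the general formula as alpha -> 1.
        return -sum(p * math.log2(p) for p in freq)
    return math.log2(sum(p**alpha for p in freq)) / (1 - alpha)


# Frequencies 1/2, 1/4, 1/8, 1/8.
labels = ["a", "a", "a", "a", "b", "b", "c", "d"]

hartley = renyi_sketch(labels, alpha=0)    # log2(number of categories)
shannon = renyi_sketch(labels, alpha=1)    # -sum(p * log2(p))
collision = renyi_sketch(labels, alpha=2)  # -log2(sum(p**2))
near_one = renyi_sketch(labels, alpha=1 + 1e-6)

print(hartley, shannon, collision, near_one)
```

For this sample the Hartley value is 2 bits and the Shannon value is 1.75 bits. The value at `alpha = 1 + 1e-6` lands very close to the Shannon value, which is why the `alpha == 1` branch can return Shannon entropy directly instead of dividing by zero. The entropy is non-increasing in `alpha`, so the three values appear in descending order.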


def negative_extropy(x: np.ndarray) -> float:
"""Calculate Negative Information Extropy (bits).
@@ -225,5 +262,32 @@ def negative_extropy(x: np.ndarray) -> float:
Statistical Science, 30(1), 40-58.
"""
freq = np.asarray(list(Counter(x).values())) / len(x)
p = 1.0 - freq + 1e-7
return -np.sum(p * np.log2(p))
p_inv = 1.0 - freq
return -np.sum(p_inv * np.log2(p_inv))


def mcintosh_d(x: np.ndarray) -> float:
    """Calculate McIntosh's D.

    Ranges from 0 to 1, where 0 corresponds to no diversity,
    and 1 to maximum diversity.

    Parameters
    ----------
    x : array_like
        Input array.

    Returns
    -------
    mid : float
        The value of McIntosh's D.

    References
    ----------
    McIntosh, R. P. (1967).
    An index of diversity and the relation of certain concepts to diversity.
    Ecology, 48(3), 392-404.
    """
    n = len(x)
    counts = np.asarray(list(Counter(x).values()))
    # McIntosh's U is the Euclidean norm of the counts, sqrt(sum(n_i**2)).
    return (n - np.sum(counts**2) ** 0.5) / (n - n**0.5)
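
As a worked check of the 0-to-1 range stated in the docstring, the sketch below uses the usual textbook form of the index, D = (N - U) / (N - sqrt(N)) with U = sqrt(sum of squared category counts). `mcintosh_d_sketch` is a hypothetical pure-Python helper written for this example. Note that a sample of size 1 makes the denominator zero, which the sketch does not guard against.

```python
import math
from collections import Counter


def mcintosh_d_sketch(values):
    """McIntosh's D = (N - U) / (N - sqrt(N)), U = sqrt(sum of squared counts)."""
    n = len(values)
    u = math.sqrt(sum(c**2 for c in Counter(values).values()))
    return (n - u) / (n - math.sqrt(n))


# One category only: U = N, so D = 0.
no_diversity = mcintosh_d_sketch(["a"] * 9)
# Every observation distinct: U = sqrt(N), so D = 1.
max_diversity = mcintosh_d_sketch(list("abcdefghi"))
# Counts 4, 3, 2: an intermediate value.
mixed = mcintosh_d_sketch(list("aaaabbbcc"))

print(no_diversity, max_diversity, mixed)
```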
18 changes: 18 additions & 0 deletions tests/test_variation.py
@@ -9,19 +9,23 @@
b_index,
gibbs_m1,
gibbs_m2,
mcintosh_d,
mod_vr,
negative_extropy,
range_vr,
renyi_entropy,
)

all_functions = [
avdev,
b_index,
gibbs_m1,
gibbs_m2,
mcintosh_d,
mod_vr,
negative_extropy,
range_vr,
renyi_entropy,
]


@@ -69,3 +73,17 @@ def test_statistic_with_nans(
if np.isnan(func(c_array_nan)):
msg = "Statistic should not return nans."
raise ValueError(msg)


def test_renyi_entropy_edgecases(c_array_obj: np.ndarray) -> None:
    """Test for different edgecases of Renyi entropy."""
    with pytest.raises(ValueError, match="alpha should be positive"):
        renyi_entropy(c_array_obj, alpha=-1)
    renyi_0 = renyi_entropy(c_array_obj, alpha=0)
    if renyi_0 != pytest.approx(2.321928):
        msg = f"Results from the test and paper do not match, got {renyi_0}"
        raise ValueError(msg)
    renyi_1 = renyi_entropy(c_array_obj, alpha=1)
    if renyi_1 != pytest.approx(2.040373):
        msg = f"Results from the test and paper do not match, got {renyi_1}"
        raise ValueError(msg)
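
The expected value for `alpha=0` can be traced by hand. At order 0 every frequency is raised to the power 0, so the entropy reduces to log2 of the number of distinct categories, and 2.321928 matches log2(5). This suggests the `c_array_obj` fixture holds five distinct categories, but the fixture is not shown in this diff, so that is an inference. The sample below is a made-up stand-in, and `hartley_entropy` is a hypothetical helper.

```python
import math
from collections import Counter


def hartley_entropy(values):
    """Rényi entropy of order 0 (bits): log2 of the number of distinct categories."""
    return math.log2(len(Counter(values)))


# Hypothetical stand-in for the fixture: any sample with five distinct labels.
sample = ["a", "b", "b", "c", "c", "c", "d", "e", "e"]
print(round(hartley_entropy(sample), 6))  # 2.321928
```

Unlike the `alpha=1` value, this result does not depend on how often each label occurs, only on how many labels there are.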
