Skip to content

Commit

Permalink
ENH: add the percentileofscore gufunc.
Browse files Browse the repository at this point in the history
  • Loading branch information
WarrenWeckesser committed Sep 25, 2024
1 parent e5f59de commit 6731c9d
Show file tree
Hide file tree
Showing 5 changed files with 243 additions and 24 deletions.
100 changes: 78 additions & 22 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -82,28 +82,29 @@ processed with the script in `ufunclab/tools/conv_template.py`.

*Functions that reduce a one-dimensional array to one or two numbers.*

| Function | Description |
| -------- | ----------- |
| [`first`](#first) | First value that matches a target comparison |
| [`argfirst`](#argfirst) | Index of the first occurrence of a target comparison |
| [`argmin`](#argmin) | Like `numpy.argmin`, but a gufunc |
| [`argmax`](#argmax) | Like `numpy.argmax`, but a gufunc |
| [`minmax`](#minmax) | Minimum and maximum |
| [`argminmax`](#argminmax) | Indices of the min and the max |
| [`min_argmin`](#min_argmin) | Minimum value and its index |
| [`max_argmax`](#max_argmax) | Maximum value and its index |
| [`searchsortedl`](#searchsortedl) | Find position for given element in sorted seq. |
| [`searchsortedr`](#searchsortedr) | Find position for given element in sorted seq. |
| [`peaktopeak`](#peaktopeak) | Alternative to `numpy.ptp` |
| [`all_same`](#all_same) | Check all values are the same |
| [`gmean`](#gmean) | Geometric mean |
| [`hmean`](#hmean) | Harmonic mean |
| [`meanvar`](#meanvar) | Mean and variance |
| [`mad`](#mad) | Mean absolute difference (MAD) |
| [`rmad`](#rmad) | Relative mean absolute difference (RMAD) |
| [`gini`](#gini) | Gini coefficient |
| [`rms`](#rms) | Root-mean-square for real and complex inputs |
| [`vnorm`](#vnorm) | Vector norm |
| Function | Description |
| -------- | ----------- |
| [`first`](#first) | First value that matches a target comparison |
| [`argfirst`](#argfirst) | Index of the first occurrence of a target comparison |
| [`argmin`](#argmin) | Like `numpy.argmin`, but a gufunc |
| [`argmax`](#argmax) | Like `numpy.argmax`, but a gufunc |
| [`minmax`](#minmax) | Minimum and maximum |
| [`argminmax`](#argminmax) | Indices of the min and the max |
| [`min_argmin`](#min_argmin) | Minimum value and its index |
| [`max_argmax`](#max_argmax) | Maximum value and its index |
| [`searchsortedl`](#searchsortedl) | Find position for given element in sorted seq. |
| [`searchsortedr`](#searchsortedr) | Find position for given element in sorted seq. |
| [`peaktopeak`](#peaktopeak) | Alternative to `numpy.ptp` |
| [`all_same`](#all_same) | Check all values are the same |
| [`gmean`](#gmean) | Geometric mean |
| [`hmean`](#hmean) | Harmonic mean |
| [`meanvar`](#meanvar) | Mean and variance |
| [`mad`](#mad) | Mean absolute difference (MAD) |
| [`rmad`](#rmad) | Relative mean absolute difference (RMAD) |
| [`gini`](#gini) | Gini coefficient |
| [`rms`](#rms) | Root-mean-square for real and complex inputs |
| [`vnorm`](#vnorm) | Vector norm |
| [`percentileofscore`](#percentileofscore) | Percentile of score (like `scipy.stats.percentileofscore`) |

*Functions that reduce two one-dimensional arrays to a number.*

Expand Down Expand Up @@ -1370,6 +1371,61 @@ with orders 1, 2, 3, and inf. (Note that `abs(z)` is [2, 5, 0, 14].)
array([21. , 15. , 14.22263137, 14. ])
```

### `percentileofscore`

`percentileofscore(x, score, kind)`, a gufunc with signature `(n),(),()->()`,
performs the same calculation as `scipy.stats.percentileofscore`. As a gufunc,
it broadcasts all of its arguments, including `kind`.

Unlike the `kind` parameter of `scipy.stats.percentilescore`, here the third
argument is required, and it must be an integer type. The allowed values
are `ufunclab.op.KIND_RANK`, `ufunclab.op.KIND_WEAK`, `ufunclab.op.KIND_STRICT`
and `ufunclab.op.KIND_MEAN`.

```
>>> import numpy as np
>>> from ufunclab import percentileofscore, op
>>> a = [1, 2, 3, 3, 4]
>>> percentileofscore(a, [2, 2.5, 3, 3.5], op.KIND_RANK)
array([40., 40., 70., 80.])
```

With broadcasting, all for kinds of the calculation can be performed with
one function call.

```
>>> percentileofscore(a, 3, [op.KIND_RANK, op.KIND_WEAK, op.KIND_STRICT, op.KIND_MEAN])
array([70., 80., 40., 60.])
```

A more common application of broadcasting might be to compute the percentile
of a score in several arrays.

```
>>> rng = np.random.default_rng(121263137472525314065)
>>> x = rng.integer(0, 20, size=(3, 16))
array([[19, 0, 13, 0, 6, 19, 17, 9, 0, 9, 11, 14, 6, 15, 18, 3],
[ 7, 0, 16, 3, 5, 1, 1, 16, 0, 16, 3, 14, 5, 2, 10, 6],
[17, 10, 5, 17, 8, 11, 13, 18, 15, 7, 3, 3, 6, 0, 17, 15]])
```

Get the percentile of 10 in each row of x:

```
>>> percentileofscore(x, 10, op.KIND_RANK)
array([50., 75., 50.])
```

Get the percentiles of 9 and 10 for each row. For broadcasting, the `score`
has shape (2, 1).

```
>>> percentileofscore(x, [[9], [10]], op.KIND_RANK)
array([[46.875, 68.75 , 43.75 ],
[50. , 75. , 50. ]])
```

#### `vdot`

`vdot(x, y)` is the vector dot product of the real floating point vectors
Expand Down
41 changes: 41 additions & 0 deletions src/percentileofscore/define_cxx_gufunc_extmod.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
import numpy as np
from ufunc_config_types import UFuncExtMod, UFunc, UFuncSource


PERCENTILEOFSCORE_DOCSTRING = """\
percentileofscore(x, score, kind, /, ...)
See `scipy.stats.percentileofscore`.
"""

input_types = [np.dtype('int8'), np.dtype('uint8'),
np.dtype('int16'), np.dtype('uint16'),
np.dtype('int32'), np.dtype('uint32'),
np.dtype('int64'), np.dtype('uint64'),
np.dtype('f'), np.dtype('d')]

type_chars = [typ.char for typ in input_types]
type_sigs = [f'{c}{c}p->d' for c in type_chars]

percentileofscore_src = UFuncSource(
funcname='percentileofscore_core_calc',
typesignatures=type_sigs,
)

percentileofscore = UFunc(
name='percentileofscore',
header='percentileofscore_gufunc.h',
docstring=PERCENTILEOFSCORE_DOCSTRING,
signature='(n),(),()->()',
sources=[percentileofscore_src],
)


extmod = UFuncExtMod(
module='_percentileofscore',
docstring=("This extension module defines the gufuncs 'percentileofscore'."),
ufuncs=[percentileofscore],
# The call `status = add_percentileofscore_constants(module);` will be
# added to the end of the module init function.
extra_module_funcs=['add_percentileofscore_kind_constants']
)
110 changes: 110 additions & 0 deletions src/percentileofscore/percentileofscore_gufunc.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@

#ifndef PERCENTILEOFSCORE_GUFUNC_H
#define PERCENTILEOFSOCRE_GUFUNC_H

#define PY_SSIZE_T_CLEAN
#include "Python.h"

#define NPY_NO_DEPRECATED_API NPY_API_VERSION
#include "numpy/ndarraytypes.h"

#include "../src/util/strided.hpp"


enum percentileofscore_kind: int {
RANK = 0,
WEAK,
STRICT,
MEAN,
};

template<typename T>
static inline void
count(npy_intp n,
const T *p_x,
npy_intp stride,
const T score,
npy_intp *nless,
npy_intp *nequal,
npy_intp *ngreater)
{
*nless = 0;
*nequal = 0;
*ngreater = 0;
for (npy_intp k = 0; k < n; ++k) {
T value = get(p_x, stride, k);
if (value < score) {
++*nless;
}
else if(value == score) {
++*nequal;
}
else {
++*ngreater;
}
}
}

//
// `percentileofscore_core_calc`, the C++ core function
// for the gufunc `percentileofscore` with signature '(n),(),()->()'.
//
template<typename T>
static void
percentileofscore_core_calc(
npy_intp n, // core dimension n
T *p_x, // pointer to first element of x, a strided 1-d array with n elements
npy_intp x_stride, // stride (in bytes) for elements of x
T *p_score,
npy_intp *p_kind,
double *p_out
)
{
npy_intp nless, nequal, ngreater, sum;

count(n, p_x, x_stride, *p_score, &nless, &nequal, &ngreater);

switch (static_cast<int>(*p_kind)) {
case RANK:
sum = 2*nless + nequal;
if (nequal > 0) {
++sum;
}
*p_out = sum*50.0/n;
return;
case WEAK:
*p_out = (nless + nequal)*100.0/n;
return;
case STRICT:
*p_out = nless*100.0/n;
return;
case MEAN:
*p_out = (2*nless + nequal)*50.0/n;
return;
}
}

//--------------------------------------------------------

//
// The name of this function is listed in the `extra_module_funcs`
// attribute of the UFuncExtMod object that defines the gufuncs
// `first` and `argfirst`. A call of this function will be added
// to the end of the generated extension module.
//
int add_percentileofscore_kind_constants(PyObject *module)
{
// Expose the numerical values RANK, WEAK, STRICT and MEAN as integers
// in this module.
const char *opnames[] = {"_RANK", "_WEAK", "_STRICT", "_MEAN"};
const int opcodes[] = {RANK, WEAK, STRICT, MEAN};
for (int k = 0; k < 4; ++k) {
int status = PyModule_AddIntConstant(module, opnames[k], opcodes[k]);
if (status == -1) {
return -1;
}
}
return 0;
}

#endif
15 changes: 13 additions & 2 deletions ufunclab/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,12 @@
# To allow `op` to be created here, ._first is not lazy-loaded.
# It is imported here to have access to the constants _LT, _LE, etc.
from ._first import first, argfirst, _LT, _LE, _EQ, _NE, _GT, _GE

# Similarly, percentileofscore is not lazy-loaded.
# It is imported here to have access to the constants _RANK, _WEAK, _STRICT
# and _MEAN.
from ._percentileofscore import percentileofscore, _RANK, _WEAK, _STRICT, _MEAN

import numpy as _np


Expand Down Expand Up @@ -114,12 +120,17 @@ class op:
NE = _np.int8(_NE)
GT = _np.int8(_GT)
GE = _np.int8(_GE)
KIND_RANK = _np.intp(_RANK)
KIND_WEAK = _np.intp(_WEAK)
KIND_STRICT = _np.intp(_STRICT)
KIND_MEAN = _np.intp(_MEAN)



del _np, _LT, _LE, _EQ, _NE, _GT, _GE
del _np, _LT, _LE, _EQ, _NE, _GT, _GE, _RANK, _WEAK, _STRICT, _MEAN

__all__ = sorted(list(_name_to_module.keys()) +
['first', 'argfirst', 'op'])
['first', 'argfirst', 'op', 'percentileofscore'])


def __dir__():
Expand Down
1 change: 1 addition & 0 deletions ufunclab/meson.build
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ gufunc_cxx_src_dirs = [
'multivariate_logbeta',
'nextn',
'one_hot',
'percentileofscore',
'sosfilter',
'tri_area',
'vnorm',
Expand Down

0 comments on commit 6731c9d

Please sign in to comment.