Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept for a powerful bh107 module #622

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

dionhaefner
Copy link
Collaborator

@dionhaefner dionhaefner commented Jun 3, 2019

The idea is that there should be 2 modes of using bh107:

1. Implicitly through NumPy

(via __array_function__, __array_ufunc__, __array_interface__, requires recent NumPy)

Example usage:

import numpy as np

a = bh107.random.rand(10, 10)
np_array = np.random.rand(10, 10)

np.mean(a)  # returns bh107 array, computed via bohrium

np.maximum(a, np_array)  # returns bh107 array, np_array is cast to BhArray before computation

np.fft.fft(a)  # not implemented in bh107, computed via NumPy, returns bh107 array

bh107.BhArray.from_numpy(np.fft.fft(a.asnumpy()))  # what is done under the hood

np.add.reduce(a, at=mask)  # fancy argument we don't support, pass through NumPy, returns bh107 array

plt.imshow(a)  # works, because matplotlib casts its inputs via np.array()

some_other_library.my_fancy_function(a)  # probably raises an exception, users should use a.asnumpy()

The behavior of some of the more problematic implicit conversions could be controlled by an environment variable:

>>> import os
>>> os.environ['BH107_ON_NUMPY_CONVERSION'] = 'warn'  # default

>>> np.array(bh_array)  # should be using bh_array.asnumpy or bh_array.copy2numpy
ImplicitConversionWarning: Copying bh107 array to NumPy

>>> plt.imshow(bh_array)  # warns on implicit conversion via np.array()
ImplicitConversionWarning: Copying bh107 array to NumPy

>>> np.fft.fft(a)
ImplicitConversionWarning: Copying bh107 array to NumPy

>>> np.add.reduce(a, at=mask)
ImplicitConversionWarning: Copying bh107 array to NumPy

Other possible values for BH107_ON_NUMPY_CONVERSION would then be ignore and raise.

2. Explicitly

Guaranteed to have no NumPy interactions, but is restricted to the functions we implement.

Examples:

>>> import bh107

>>> a = bh107.random.rand(10, 10)

>>> bh.mean(a)   # good
>>> a.mean()  # equivalent
>>> bh.maximum(a, np_array)  # good

>>> bh.fft.fttn(a)  # oops
AttributeError: module 'bh107' has no attribute 'fft'

(best used with BH107_ON_NUMPY_CONVERSION=raise).


What is the gain compared to bohrium?

  • bh107.BhArray does not subclass np.ndarray, so it cannot be passed to C APIs without explicit conversion to NumPy, so no more corruption (fingers crossed)
  • Give control back to the user
  • We are explicit about what is supported and what isn't: dir(bh107)
  • Explicit feature support lets us meaningfully measure coverage and build a well-tested library
  • Barely losing any flexibility

Happy for any comments / API suggestions.

@dionhaefner
Copy link
Collaborator Author

dionhaefner commented Jun 10, 2019

To-do:

  • Check whether given dtype is supported by the Bohrium backend
  • Implement warning mechanism
  • Check whether a ufunc was called with unsupported options and fall back to NumPy
  • Implement more bh107 functions:
    • mean / median / percentile (requires sort)
    • stdev / variance
    • abs
    • squeeze
    • Abbreviations for reduction / accumulations (all, any, sum, prod, cumsum, cumprod, max, min)
    • linspace / logspace
    • transpose / reshape / roll / rollaxis
    • argmax / argmin
    • nan-aware reductions (nanargmax, nanargmin, nansum, nanprod, nancumsum, nancumprod, nanmax, nanmin, nanpercentile, nanmedian, nanmean, nanstd, nanvar)
    • stack / concatenate / append / hstack / vstack / dstack
    • round / clip
    • newaxis
    • flatten / flat
    • tri-adic where
    • diff
    • diag / eye
    • matrix multiplication / dot via BLAS?
  • Decide whether to support method functions (e.g. BhArray.mean)
  • Add bh107 functions as methods

This list of functions should be about everything we can reasonably support. Bohrium is a high-performance tool, so I don't really see a place for (1) stuff that is notoriously hard to parallelize (full sorts), (2) IO tools (loadtxt and friends), (3) advanced index tricks (unary where, einsum, argpartition, indices), or is (4) just too fringe (pad, piecewise, heaviside, packbits, date stuff).

This is not to say that all of these features need to be part of this PR, just brainstorming on my side.

Sorry, something went wrong.

@dionhaefner
Copy link
Collaborator Author

👋 @madsbk, do you have an opinion on methods on BhArray? I am talking about things like

>>> a = bh107.array([1, 2, 3])
>>> a.sum()
6

As I see it, they often make for a slick interface without the need to even import the module where the array comes from. However, I see some problems with that:

  • People might expect us to provide all of NumPy's supported methods, which seem too many for us to handle (and can / will change between NumPy versions):

    ['all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'cumprod', 'cumsum', 'diagonal', 'dot', 'fill', 'flat', 'flatten', 'imag', 'max', 'mean', 'min', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'trace', 'transpose', 'var']

  • It is ambiguous which function should be dispatched from the method, i.e., whether the method versions are allowed to fall back to NumPy or not.

I see three different ways forward:

  1. Only add the methods that are natively supported on BhArrays, and never fall back to NumPy.
  2. Try and be as compatible with NumPy as possible by adding all of the above methods, fall back to NumPy if necessary.
  3. Do not supply any methods, and force users to use the bh107 / NumPy functional interfaces.

I quite like 1, but happy to hear your input.

@madsbk
Copy link
Contributor

madsbk commented Jun 12, 2019

Agree, I also like option 1 the best.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants