numba acceleration #14

dsfulf · 2019-10-14T22:32:07Z

Given that a lot of geostatspy is written in pure Python, I would like to offer the suggestion that some minor refactoring be performed to enable adding numba @njit decorators to compute-intensive functions.

For example, taking the geostatspy.varmapv function, we can split the manipulation of the pandas.DataFrame object from the numerical code:

def varmapv(df,xcol,ycol,vcol,tmin,tmax,nxlag,nylag,dxlag,dylag,minnp,isill): 

    # Parameters - consistent with original GSLIB    
    # df - DataFrame with the spatial data, xcol, ycol, vcol coordinates and property columns
    # tmin, tmax - property trimming limits
    # xlag, xltol - lag distance and lag distance tolerance
    # nlag - number of lags to calculate
    # azm, atol - azimuth and azimuth tolerance
    # bandwh - horizontal bandwidth / maximum distance offset orthogonal to azimuth
    # isill - 1 for standardize sill

    # Load the data

    df_extract = df.loc[(df[vcol] >= tmin) & (df[vcol] <= tmax)]    # trim values outside tmin and tmax
    nd = len(df_extract)
    x = df_extract[xcol].values
    y = df_extract[ycol].values
    vr = df_extract[vcol].values
    
    # Summary statistics for the data after trimming
    ...

After refactoring:

from numba import njit

def varmapv(df, xcol, ycol, vcol, tmin, tmax, nxlag, nylag, dxlag, dylag, minnp, isill): 

    # Parameters - consistent with original GSLIB    
    # df - DataFrame with the spatial data, xcol, ycol, vcol coordinates and property columns
    # tmin, tmax - property trimming limits
    # xlag, xltol - lag distance and lag distance tolerance
    # nlag - number of lags to calculate
    # azm, atol - azimuth and azimuth tolerance
    # bandwh - horizontal bandwidth / maximum distance offset orthogonal to azimuth
    # isill - 1 for standardize sill

    # Load the data
    df_extract = df.loc[(df[vcol] >= tmin) & (df[vcol] <= tmax)]    # trim values outside tmin and tmax
    nd = len(df_extract)
    x = df_extract[xcol].values
    y = df_extract[ycol].values
    vr = df_extract[vcol].values
    
    return _varmapv(nd, x, y, vr, nxlag, nylag, dxlag, dylag, minnp, isill)

@njit
def _varmapv(nd, x, y, vr, nxlag, nylag, dxlag, dylag, minnp, isill):
    # Summary statistics for the data after trimming
    ...

Timing of the current implementation is 580 ms on my machine, while the numba-decorated version takes 2.02 ms.

For scaling up to several thousand data points, a factor of over 100x is considerable!
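To make the wrapper/kernel split concrete, here is a minimal, self-contained sketch of the same pattern. It is not the varmapv body (that is elided above); `mean_sq_diff`, `_mean_sq_diff`, and `maxdist` are hypothetical names invented for illustration. The wrapper only pulls columns out of the table and trims them, and the kernel only sees NumPy arrays and scalars, which is what lets `@njit` compile it. The `try`/`except` fallback is an assumption added so the sketch still runs, uncompiled, where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: fall back to plain Python if numba is missing.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


def mean_sq_diff(df, xcol, ycol, vcol, tmin, tmax, maxdist):
    """Hypothetical wrapper: column extraction and trimming only.

    `df` can be a pandas.DataFrame or any mapping of column name to values.
    """
    v = np.asarray(df[vcol], dtype=np.float64)
    keep = (v >= tmin) & (v <= tmax)  # trim values outside tmin and tmax
    x = np.asarray(df[xcol], dtype=np.float64)[keep]
    y = np.asarray(df[ycol], dtype=np.float64)[keep]
    return _mean_sq_diff(x, y, v[keep], maxdist)


@njit
def _mean_sq_diff(x, y, vr, maxdist):
    """Hypothetical kernel: half the mean squared difference of all
    pairs separated by at most maxdist. Arrays and scalars only."""
    total = 0.0
    npairs = 0
    n = x.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.sqrt((x[i] - x[j]) ** 2 + (y[i] - y[j]) ** 2)
            if dist <= maxdist:
                total += (vr[i] - vr[j]) ** 2
                npairs += 1
    if npairs == 0:
        return np.nan
    return 0.5 * total / npairs
```

When timing a function like this, keep in mind that the first call to an `@njit` function includes the compilation time, so it is worth calling it once before measuring.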

Further optimization is possible for functions that are parallelizable: numba can release the GIL and run the compiled code across multiple threads.
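As a rough sketch of what that could look like, numba's `parallel=True` option together with `prange` splits an outer loop across threads. The function below is a hypothetical toy (`sum_sq_diff_parallel` is not part of geostatspy), and whether any given geostatspy loop is safe to parallelize this way is an assumption that would need checking case by case. The fallback to plain `range` is only there so the sketch runs without numba.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption for this sketch: run serially in plain Python if numba is missing.
    prange = range

    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(parallel=True)
def sum_sq_diff_parallel(vr):
    """Hypothetical kernel: sum of squared differences over all pairs.

    Each outer iteration accumulates into its own local variable, and
    `total` is only updated once per iteration, so numba can treat it
    as a reduction across threads.
    """
    total = 0.0
    n = vr.shape[0]
    for i in prange(n):
        row = 0.0
        for j in range(i + 1, n):
            row += (vr[i] - vr[j]) ** 2
        total += row
    return total
```

Iterations of a `prange` loop must be independent of each other apart from such reductions; loops that write to shared array slots from different iterations need restructuring first.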

GeostatsGuy · 2020-01-12T16:39:07Z

Thank you dsfulf. Now that I have some more time (last term was crazy), I'll ask my graduate students to implement this.

Thank you for contributing to geostatspy, Michael

PauloCarvalhoRJ · 2020-01-13T01:20:25Z

It is certainly worth it. I numbafied parts of my Python code and the speedup is indeed in the order of the hundreds.
