Given that much of geostatspy is written in pure Python, I would like to suggest some minor refactoring to enable adding numba's @njit decorator to compute-intensive functions.
For example, taking the geostatspy.varmapv function, we can split the manipulation of the pandas.DataFrame object from the numerical code:
```python
def varmapv(df, xcol, ycol, vcol, tmin, tmax, nxlag, nylag, dxlag, dylag, minnp, isill):
    # Parameters - consistent with original GSLIB
    # df - DataFrame with the spatial data; xcol, ycol, vcol - coordinate and property columns
    # tmin, tmax - property trimming limits
    # nxlag, nylag - number of lags in the x and y directions
    # dxlag, dylag - lag distances in the x and y directions
    # minnp - minimum number of pairs to report a value
    # isill - 1 to standardize the sill

    # Load the data
    df_extract = df.loc[(df[vcol] >= tmin) & (df[vcol] <= tmax)]  # trim values outside tmin and tmax
    nd = len(df_extract)
    x = df_extract[xcol].values
    y = df_extract[ycol].values
    vr = df_extract[vcol].values

    # Summary statistics for the data after trimming
    ...
```
After refactoring:
```python
from numba import njit

def varmapv(df, xcol, ycol, vcol, tmin, tmax, nxlag, nylag, dxlag, dylag, minnp, isill):
    # Parameters - consistent with original GSLIB (same as above)

    # Load the data; the wrapper does all the pandas work and passes plain
    # NumPy arrays to the jitted kernel
    df_extract = df.loc[(df[vcol] >= tmin) & (df[vcol] <= tmax)]  # trim values outside tmin and tmax
    nd = len(df_extract)
    x = df_extract[xcol].values
    y = df_extract[ycol].values
    vr = df_extract[vcol].values
    return _varmapv(nd, x, y, vr, nxlag, nylag, dxlag, dylag, minnp, isill)

@njit
def _varmapv(nd, x, y, vr, nxlag, nylag, dxlag, dylag, minnp, isill):
    # Summary statistics for the data after trimming
    ...
```
Timing of the current implementation is 580 ms on my machine, while the numba-decorated version runs in 2.02 ms.
For scaling up to several thousand data points, a speedup of over 100x is considerable!
Further optimization can be performed for functions that are parallelizable, letting numba release the GIL (nogil=True) and distribute the work across threads (parallel=True).