Is there a more efficient way to extract WPTO Hindcast data for a large number of lat-lon pairs? #253
-
Hi all, I've spent some weeks figuring out how to extract WPTO hindcast data for a large number of coordinate pairs, using both rex and MHKiT.
from mhkit import wave
import numpy as np
import pandas as pd

# Create a list of (latitude, longitude) tuples from the grid file
coord = pd.read_csv('grid_20km_spacing_eastcoast_.csv')
coord = [tuple(row) for row in coord.to_records(index=False)]

# Check that each coordinate pair falls within a hindcast domain
data_type = '3-hour'  # use the 3-hour dataset
parameter = 'significant_wave_height'
rg = []
for i in range(len(coord)):
    region = wave.io.hindcast.hindcast.region_selection(coord[i])
    rg.append(region)
print(rg)
print(len(rg))

# Request the dataset for the full spatial and temporal extent:
# one call per (year, site) pair, stacking each site's years into
# a single continuous time series
years = np.arange(1979, 2011, 1)
swh = pd.DataFrame()
md = pd.DataFrame()
for j in range(len(coord)):
    site_years = []
    for i in range(len(years)):
        Hs, metadata = wave.io.hindcast.hindcast.request_wpto_point_data(
            data_type, parameter, coord[j], [int(years[i])])
        site_years.append(Hs.squeeze())
        md = pd.concat([md, metadata])
    swh[j] = pd.concat(site_years)  # one column per site
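For reference, the `request_wpto_point_data` docs say `years` takes a list, so each site's full record could in principle come back in one call instead of one call per year. A sketch of that idea, which I have not verified at this scale or against the API's rate limit:

# Sketch: one request per site covering every year at once, rather than
# one request per (year, site) pair. Untested at this scale.
swh_batched = pd.DataFrame()
for j in range(len(coord)):
    Hs, metadata = wave.io.hindcast.hindcast.request_wpto_point_data(
        data_type, parameter, coord[j], [int(y) for y in years])
    swh_batched[j] = Hs.squeeze()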
-
Hey @radityadanu 👋 and thank you for your interest in MHKiT.
I get the same issue as you, and it is due to the API. We are doing some overhauls to MHKiT to improve our responses as best we can from our end, but we are quite limited in working with the hindcast API.
The summary is that the API you are hitting will only allow you a few calls before it cuts you off. So at a minimum you need to add a wait time between API calls. Moreover, I would recommend saving each file between calls and performing a check to see where you left off.
As a suggestion, I have grabbed the solution you posted on the rex GitHub, which was very helpful, and done an outline of what I suggest above:
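A minimal sketch of that outline, assuming rex's `WaveX` with `hsds=True`, a `wave_file` list holding one hindcast file path per year, and your `coord_tuples` list of lat-lon pairs; each result is checkpointed to its own CSV so a cutoff only costs the in-flight request:

import os
import time

from rex import WaveX

for i in range(len(wave_file)):
    for j in range(len(coord_tuples)):
        out_file = f'results_{i}_{j}.csv'
        if os.path.exists(out_file):
            continue  # already saved -- pick up where we left off

        with WaveX(wave_file[i], hsds=True) as f:
            lat_lon_swh = f.get_lat_lon_df('significant_wave_height',
                                           coord_tuples[j])

        # Save to file before the next request
        lat_lon_swh.to_csv(out_file)

        # Wait between API calls to stay under the rate limit
        time.sleep(3)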
I have put 3 seconds here, which is very conservative, but I actually think you will still run into issues. Please follow up and let me know how this goes.
-
Hi @ssolson, I think it was again API related. After 3,629 min (approx. 2.5 days), an error occurred with the following traceback:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\h5pyd\_hl\dataset.py:1171, in Dataset.__getitem__(self, args, new_dtype)
1170 try:
-> 1171 rsp = self.GET(req, params=params, format="binary")
1172 except IOError as ioe:
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\h5pyd\_hl\base.py:980, in HLObject.GET(self, req, params, use_cache, format)
979 if len(http_chunks) == 0:
--> 980 raise IOError("no data returned")
981 if len(http_chunks) == 1:
982 # can return first and only chunk as response
OSError: no data returned
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
Cell In[8], line 15
13 for j in range(start_j if i == start_i else 0, len(coord_tuples)):
14 print(i, j)
---> 15 with WaveX(wave_file[i], hsds=True) as f:
16 lat_lon_swh = f.get_lat_lon_df('significant_wave_height', coord_tuples[j])
18 # Save to file
Cell In[8], line 16
14 print(i, j)
15 with WaveX(wave_file[i], hsds=True) as f:
---> 16 lat_lon_swh = f.get_lat_lon_df('significant_wave_height', coord_tuples[j])
18 # Save to file
19 lat_lon_swh.to_csv(f'results_{i}_{j}.csv')
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource_extraction\resource_extraction.py:789, in ResourceX.get_lat_lon_df(self, ds_name, lat_lon, check_lat_lon)
766 """
767 Extract timeseries of site(s) nearest to given lat_lon(s) and return
768 as a DataFrame
(...)
786 Time-series DataFrame for given site(s) and dataset
787 """
788 gid = self.lat_lon_gid(lat_lon, check_lat_lon=check_lat_lon)
--> 789 df = self.get_gid_df(ds_name, gid)
791 return df
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource_extraction\resource_extraction.py:1913, in WaveX.get_gid_df(self, ds_name, gid)
1911 df = df.reshape(ax1, ax2)
1912 else:
-> 1913 df = self[ds_name, :, gid]
1914 index = pd.Index(data=self.time_index, name='time_index')
1916 if isinstance(gid, (int, np.integer)):
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource_extraction\resource_extraction.py:104, in ResourceX.__getitem__(self, keys)
103 def __getitem__(self, keys):
--> 104 return self.resource[keys]
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource.py:674, in BaseResource.__getitem__(self, keys)
671 raise ResourceRuntimeError(msg)
673 else:
--> 674 out = self._get_ds(ds, ds_slice)
676 return out
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource.py:1337, in BaseResource._get_ds(self, ds_name, ds_slice)
1335 if len(ds_slice) > len(ds.shape):
1336 return self._get_ds_with_repeated_values(ds, ds_name, ds_slice)
-> 1337 return ResourceDataset.extract(ds, ds_slice,
1338 scale_attr=self.SCALE_ATTR,
1339 add_attr=self.ADD_ATTR,
1340 unscale=self._unscale)
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource.py:568, in ResourceDataset.extract(cls, ds, ds_slice, scale_attr, add_attr, unscale)
548 """
549 Extract data from Resource Dataset
550
(...)
563 Flag to unscale dataset data, by default True
564 """
565 dset = cls(ds, scale_attr=scale_attr, add_attr=add_attr,
566 unscale=unscale)
--> 568 return dset[ds_slice]
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource.py:57, in ResourceDataset.__getitem__(self, ds_slice)
54 def __getitem__(self, ds_slice):
55 ds_slice = parse_slice(ds_slice)
---> 57 return self._get_ds_slice(ds_slice)
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource.py:538, in ResourceDataset._get_ds_slice(self, ds_slice)
536 out = self._extract_list_slice(ds_slice)
537 else:
--> 538 out = self._extract_ds_slice(ds_slice)
540 if self._unscale:
541 out = self._unscale_data(out)
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource.py:483, in ResourceDataset._extract_ds_slice(self, ds_slice)
480 if ax_idx is not None:
481 idx_slice += (ax_idx,)
--> 483 out = self.ds[slices]
485 # check to see if idx_slice needs to be applied
486 if any(s != slice(None) if isinstance(s, slice) else True
487 for s in idx_slice):
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\h5pyd\_hl\dataset.py:1180, in Dataset.__getitem__(self, args, new_dtype)
1178 break
1179 else:
-> 1180 raise IOError(f"Error retrieving data: {ioe.errno}")
1181 if isinstance(rsp, str):
1182 # hexencoded response?
1183 # this is returned by API Gateway for lamba responses
1184 rsp = bytes.fromhex(rsp)
OSError: Error retrieving data: None
-
The API returned no data. It has effectively cut you off, but that is why we saved along the way, so we can pick up where we left off.
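If it helps, finding where you left off is just a scan over the checkpoint files before restarting the loop; a sketch, assuming the `results_{i}_{j}.csv` naming from the outline above:

import os

# Locate the first (file, site) pair that has not been saved yet
start_i, start_j = 0, 0
found = False
for i in range(len(wave_file)):
    for j in range(len(coord_tuples)):
        if not os.path.exists(f'results_{i}_{j}.csv'):
            start_i, start_j, found = i, j, True
            break
    if found:
        break
print(f'Resume from file {start_i}, site {start_j}')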
-
Radit, I apologize on behalf of whoever created this API. This has happened to me before as well. There comes a point where the API will no longer respond to me, no matter whether I change the API key or try to VPN in. At that point I have historically worked on other things, and eventually something in the system resets. I would be interested in what you end up trying.
I know that ultimately these responses are deeply unsatisfying workarounds. If I had any control over the system at all, I promise I would help you further. I encourage you to continue following up with the NREL team. I have hit many dead ends chasing down who is in charge of this system, and they tend not to respond. Hopefully in the meantime you and I can figure out how to work within this black box! Please let me know what you end up trying next.