Is there a more efficient way to extract WPTO Hindcast data for a large number of lat-lon pairs? #253
-
Hi all, I've spent some weeks figuring out how to extract WPTO hindcast data for a large number of coordinate pairs, using both rex and MHKiT.
from mhkit import wave
import numpy as np
import pandas as pd

# Create a list of (latitude, longitude) tuples from the grid file
coord = pd.read_csv('grid_20km_spacing_eastcoast_.csv')
coord = [tuple(row) for row in coord.to_records(index=False)]

# Check that each coordinate pair falls within a hindcast domain
data_type = '3-hour'  # use the 3-hour dataset
parameter = 'significant_wave_height'
rg = []
for i in range(len(coord)):
    region = wave.io.hindcast.hindcast.region_selection(coord[i])
    rg.append(region)
print(rg)
print(len(rg))

# Request the dataset for the full spatial and temporal extent:
# one call per (year, site) pair, stacking each site's years into
# a single continuous time series
years = np.arange(1979, 2011, 1)
swh = pd.DataFrame()
md = pd.DataFrame()
for j in range(len(coord)):
    site_years = []
    for i in range(len(years)):
        Hs, metadata = wave.io.hindcast.hindcast.request_wpto_point_data(
            data_type, parameter, coord[j], [int(years[i])])
        site_years.append(Hs.squeeze())
        md = pd.concat([md, metadata])
    swh[j] = pd.concat(site_years)  # one column per site
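For reference, the `request_wpto_point_data` docs say `years` takes a list, so each site's full record could in principle come back in one call instead of one call per year. A sketch of that idea, which I have not verified at this scale or against the API's rate limit:

# Sketch: one request per site covering every year at once, rather than
# one request per (year, site) pair. Untested at this scale.
swh_batched = pd.DataFrame()
for j in range(len(coord)):
    Hs, metadata = wave.io.hindcast.hindcast.request_wpto_point_data(
        data_type, parameter, coord[j], [int(y) for y in years])
    swh_batched[j] = Hs.squeeze()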
-
Hey @radityadanu 👋 and thank you for your interest in MHKiT.
I get the same issue as you, and it is due to the API. We are doing some overhauls to MHKiT to improve our responses as best we can from our end, but we are quite limited in working with the hindcast API.
The summary is that the API you are hitting will only allow you a few calls before it cuts you off. So at a minimum you need to add a wait time between API calls. Moreover, I would recommend saving each file between calls and performing a check to see where you left off.
As a suggestion, I have grabbed the solution you posted on the rex GitHub, which was very helpful, and done an outline of what I suggest above:
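A minimal sketch of that outline, assuming rex's `WaveX` with `hsds=True`, a `wave_file` list holding one hindcast file path per year, and your `coord_tuples` list of lat-lon pairs; each result is checkpointed to its own CSV so a cutoff only costs the in-flight request:

import os
import time

from rex import WaveX

for i in range(len(wave_file)):
    for j in range(len(coord_tuples)):
        out_file = f'results_{i}_{j}.csv'
        if os.path.exists(out_file):
            continue  # already saved -- pick up where we left off

        with WaveX(wave_file[i], hsds=True) as f:
            lat_lon_swh = f.get_lat_lon_df('significant_wave_height',
                                           coord_tuples[j])

        # Save to file before the next request
        lat_lon_swh.to_csv(out_file)

        # Wait between API calls to stay under the rate limit
        time.sleep(3)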
I have put 3 seconds here, which is very conservative, but I actually think you will still run into issues. Please follow up and let me know how this goes.
-
Hi @ssolson, I think it was again API related. After 3,629 min (approx. 2.5 days), an error occurred with the following traceback:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\h5pyd\_hl\dataset.py:1171, in Dataset.__getitem__(self, args, new_dtype)
1170 try:
-> 1171 rsp = self.GET(req, params=params, format="binary")
1172 except IOError as ioe:
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\h5pyd\_hl\base.py:980, in HLObject.GET(self, req, params, use_cache, format)
979 if len(http_chunks) == 0:
--> 980 raise IOError("no data returned")
981 if len(http_chunks) == 1:
982 # can return first and only chunk as response
OSError: no data returned
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
Cell In[8], line 15
13 for j in range(start_j if i == start_i else 0, len(coord_tuples)):
14 print(i, j)
---> 15 with WaveX(wave_file[i], hsds=True) as f:
16 lat_lon_swh = f.get_lat_lon_df('significant_wave_height', coord_tuples[j])
18 # Save to file
Cell In[8], line 16
14 print(i, j)
15 with WaveX(wave_file[i], hsds=True) as f:
---> 16 lat_lon_swh = f.get_lat_lon_df('significant_wave_height', coord_tuples[j])
18 # Save to file
19 lat_lon_swh.to_csv(f'results_{i}_{j}.csv')
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource_extraction\resource_extraction.py:789, in ResourceX.get_lat_lon_df(self, ds_name, lat_lon, check_lat_lon)
766 """
767 Extract timeseries of site(s) nearest to given lat_lon(s) and return
768 as a DataFrame
(...)
786 Time-series DataFrame for given site(s) and dataset
787 """
788 gid = self.lat_lon_gid(lat_lon, check_lat_lon=check_lat_lon)
--> 789 df = self.get_gid_df(ds_name, gid)
791 return df
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource_extraction\resource_extraction.py:1913, in WaveX.get_gid_df(self, ds_name, gid)
1911 df = df.reshape(ax1, ax2)
1912 else:
-> 1913 df = self[ds_name, :, gid]
1914 index = pd.Index(data=self.time_index, name='time_index')
1916 if isinstance(gid, (int, np.integer)):
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource_extraction\resource_extraction.py:104, in ResourceX.__getitem__(self, keys)
103 def __getitem__(self, keys):
--> 104 return self.resource[keys]
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource.py:674, in BaseResource.__getitem__(self, keys)
671 raise ResourceRuntimeError(msg)
673 else:
--> 674 out = self._get_ds(ds, ds_slice)
676 return out
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource.py:1337, in BaseResource._get_ds(self, ds_name, ds_slice)
1335 if len(ds_slice) > len(ds.shape):
1336 return self._get_ds_with_repeated_values(ds, ds_name, ds_slice)
-> 1337 return ResourceDataset.extract(ds, ds_slice,
1338 scale_attr=self.SCALE_ATTR,
1339 add_attr=self.ADD_ATTR,
1340 unscale=self._unscale)
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource.py:568, in ResourceDataset.extract(cls, ds, ds_slice, scale_attr, add_attr, unscale)
548 """
549 Extract data from Resource Dataset
550
(...)
563 Flag to unscale dataset data, by default True
564 """
565 dset = cls(ds, scale_attr=scale_attr, add_attr=add_attr,
566 unscale=unscale)
--> 568 return dset[ds_slice]
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource.py:57, in ResourceDataset.__getitem__(self, ds_slice)
54 def __getitem__(self, ds_slice):
55 ds_slice = parse_slice(ds_slice)
---> 57 return self._get_ds_slice(ds_slice)
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource.py:538, in ResourceDataset._get_ds_slice(self, ds_slice)
536 out = self._extract_list_slice(ds_slice)
537 else:
--> 538 out = self._extract_ds_slice(ds_slice)
540 if self._unscale:
541 out = self._unscale_data(out)
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\rex\resource.py:483, in ResourceDataset._extract_ds_slice(self, ds_slice)
480 if ax_idx is not None:
481 idx_slice += (ax_idx,)
--> 483 out = self.ds[slices]
485 # check to see if idx_slice needs to be applied
486 if any(s != slice(None) if isinstance(s, slice) else True
487 for s in idx_slice):
File c:\Users\Danu\miniconda3\envs\rex\Lib\site-packages\h5pyd\_hl\dataset.py:1180, in Dataset.__getitem__(self, args, new_dtype)
1178 break
1179 else:
-> 1180 raise IOError(f"Error retrieving data: {ioe.errno}")
1181 if isinstance(rsp, str):
1182 # hexencoded response?
1183 # this is returned by API Gateway for lamba responses
1184 rsp = bytes.fromhex(rsp)
OSError: Error retrieving data: None
-
The API returned no data. It has effectively cut you off, but that is why we saved along the way, so we can pick up where we left off.
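If it helps, finding where you left off is just a scan over the checkpoint files before restarting the loop; a sketch, assuming the `results_{i}_{j}.csv` naming from the outline above:

import os

# Locate the first (file, site) pair that has not been saved yet
start_i, start_j = 0, 0
found = False
for i in range(len(wave_file)):
    for j in range(len(coord_tuples)):
        if not os.path.exists(f'results_{i}_{j}.csv'):
            start_i, start_j, found = i, j, True
            break
    if found:
        break
print(f'Resume from file {start_i}, site {start_j}')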
-
Radit, I apologize on behalf of whoever created this API. This has happened to me before as well. There comes a point where the API will no longer respond to me, no matter whether I change the API key or try to VPN in. At that point I have historically worked on other things, and eventually something in the system resets. I would be interested in what you end up trying.
I know that ultimately these responses are deeply unsatisfying workarounds. If I had any control over the system at all, I promise I would help you further. I encourage you to continue following up with the NREL team. I have hit many dead ends chasing down who is in charge of this system, and they tend not to respond. Hopefully in the meantime you and I can figure out how to work within this black box! Please let me know what you end up trying next.