[WIP] Replace ECS with Harmony #613

Draft
wants to merge 20 commits into
base: low-hanging-refactors
143 changes: 143 additions & 0 deletions HARMONY_MIGRATION_NOTES.md
@@ -0,0 +1,143 @@
## Assumptions that are different in Harmony

* We can't use short name and version with Harmony like we do with ECS; we have to use
the concept ID (or DOI). We need to look this up in CMR using short name and version.
* Variable subsetting won't be supported on day 1.
* Not all of the ICESat-2 products we currently support will be supported on day 1.
* <https://nsidc.atlassian.net/wiki/spaces/DAACSW/pages/222593028/ICESat-2+data+sets+and+versions+we+are+supporting+for+Harmony>
* ECS and CMR shared some parameters. This is not the case with Harmony.


## Getting started on development

### Work so far

Work in progress is on the `harmony` branch. This depends on the `low-hanging-refactors`
branch being merged. A PR is open.

In addition to this work, refactoring, type checking, and type annotations have been
added to the codebase to support the migration to Harmony.


### Familiarize yourself with Harmony

* Check out this amazing notebook provided by Amy Steiker and Patrick Quinn:
<https://github.com/nasa/harmony/blob/main/docs/Harmony%20API%20introduction.ipynb>
* Review the interactive API documentation:
<https://harmony.earthdata.nasa.gov/docs/api/>


### Getting started replacing ECS with Harmony

1. Find the `WIP` commit (`ac916d6`) and use `git reset` to restore the changes into the
working tree. There are several breakpoints set, as well as an artificially
introduced exception class to help trace and narrow the code paths during
refactoring.
2. Exercise a specific code path. For example:

```python
import icepyx as ipx
import datetime as dt

q = ipx.Query(
    product="ATL06",
    version="006",
    spatial_extent=[-90, 68, 48, 90],
    # "./doc/source/example_notebooks/supporting_files/simple_test_poly.gpkg",
    date_range={
        "start_date": dt.datetime(2018, 10, 10, 0, 10, 0),
        "end_date": dt.datetime(2018, 10, 18, 14, 45, 30),
        # "end_date": '2019-02-28',
    }
)

q.download_granules("/tmp/icepyx")
```

3. Identify the first query to ECS. Queries, except the capabilities query in
`is2ref.py`, are formed from constants in `urls.py`. Continue this practice. Harmony
URLs in this file are placeholders.
4. Determine an equivalent Harmony query. The Harmony Coverages API has an equivalent to
the capabilities query in `is2ref.py`, for example.
5. Raise `RefactoringException` at the top of any functions or methods which currently
speak to ECS. This will help us find and delete those "dead code" functions later,
and prevent them from being inadvertently executed.
6. Write new functions or methods which speak to Harmony instead. It's important to
encapsulate the communication with the Harmony API in a single function. This may
mean replacing one function with several smaller functions during refactoring.
7. Maintain the high standard of documentation in the code. Include examples as doctests
in the new functions. Use Numpy style docstrings. **DO NOT** include type information
in docstrings -- write type annotations instead. They will be automatically
documented by the documentation generator.
8. Repeat from step 3 for the next EGI query.
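
Steps 5 through 7 could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `RefactoringException` is redefined locally so the snippet runs on its own (in the codebase it lives in `icepyx.core.exceptions`), `legacy_capabilities_url` and `build_capabilities_request` are hypothetical names, and the base URL constant is assumed to match what ends up in `urls.py`. Only request construction is shown; the network call itself would sit in one separate function.

```python
from typing import Final


class RefactoringException(Exception):
    """Local stand-in for icepyx.core.exceptions.RefactoringException."""


# Assumption: in the codebase this constant would be defined in urls.py.
CAPABILITIES_BASE_URL: Final = "https://harmony.earthdata.nasa.gov/capabilities"


def legacy_capabilities_url(product: str, version: str) -> str:
    """Build the ECS capabilities URL (hypothetical legacy function)."""
    # Step 5: fail fast so this soon-to-be-dead code is never run by accident.
    raise RefactoringException

    return f"/capabilities/{product}.{version}.xml"  # unreachable legacy code


def build_capabilities_request(concept_id: str) -> tuple[str, dict[str, str]]:
    """Build the URL and query parameters for a Harmony capabilities request.

    Parameters
    ----------
    concept_id :
        CMR concept ID of the collection.

    Returns
    -------
    :
        The base URL and the query parameters to send with it.

    Examples
    --------
    >>> build_capabilities_request("C1261703129-EEDTEST")
    ('https://harmony.earthdata.nasa.gov/capabilities', {'collectionId': 'C1261703129-EEDTEST'})
    """
    # Step 6: the new code path only knows about Harmony, never ECS.
    return CAPABILITIES_BASE_URL, {"collectionId": concept_id}
```

Keeping request construction free of I/O makes the doctest in step 7 runnable without network access.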

### Watch out for broken assumptions

It's important to note that two major assumptions will require significant refactoring.
The type annotations will help with this process!

1. Broken assumption: "CMR and EGI share parameter sets". My mental model looks like:
* Current: User passes in parameters to `Query(...)`. Those params are used to generate
separate "CMR parameters" and "reqparams". "CMRparams" are spatial and temporal
parameters compatible with CMR. I'm not sure about the naming of "reqparams", but I
think of them as the EGI parameters (which may include more than the user passed, like
`page_size`) _minus_ the CMR spatial and temporal parameters. The actual queries
submitted to CMR and EGI are based on those generated parameter sets.
* Future: In Harmony-land, the shared parameter assumption is broken. CMR and Harmony's
Coverages API have completely different parameter sets. The code can be drastically simplified:
User passes in parameters to `Query(...)`. Those params are used directly to generate
both CMR and Harmony queries without an intervening layer. E.g.
2. Broken assumption: "We can query with only short_name and version number". Harmony
requires a unique identifier (concept ID or DOI). E.g.:
<https://harmony.earthdata.nasa.gov/capabilities?collectionId=C1261703129-EEDTEST>
.
Since we want the user to be able to provide short_name and version, implementing the
concept ID as a `@cached_property` on `Query` which asks CMR for the concept ID makes
sense to me.
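
A rough sketch of how both ideas might fit together follows. Everything here is hypothetical: `QuerySketch` is a stripped-down stand-in for `Query`, the lookup function is injected so the example runs offline, and the CMR and Harmony parameter names reflect my reading of their public documentation and should be verified before use. Timestamps are assumed to be UTC.

```python
import datetime as dt
from functools import cached_property
from typing import Callable


class QuerySketch:
    """Hypothetical, stripped-down stand-in for icepyx's Query class."""

    def __init__(
        self,
        *,
        product: str,
        version: str,
        bbox: tuple[float, float, float, float],  # (west, south, east, north)
        start: dt.datetime,
        end: dt.datetime,
        lookup: Callable[[str, str], str],
    ) -> None:
        self.product = product
        self.version = version
        self.bbox = bbox
        self.start = start
        self.end = end
        # Injected so the sketch runs offline; the real lookup would ask CMR.
        self._lookup = lookup

    @cached_property
    def concept_id(self) -> str:
        # Computed on first access, then stored on the instance.
        return self._lookup(self.product, self.version)

    def cmr_params(self) -> dict[str, str]:
        # Assumed CMR granule-search parameter names.
        west, south, east, north = self.bbox
        return {
            "collection_concept_id": self.concept_id,
            "bounding_box": f"{west},{south},{east},{north}",
            "temporal": f"{self.start.isoformat()}Z,{self.end.isoformat()}Z",
        }

    def harmony_params(self) -> dict[str, list[str]]:
        # Assumed Coverages-style `subset` parameters; note that nothing
        # here is shared with the CMR parameter set above.
        west, south, east, north = self.bbox
        return {
            "subset": [
                f"lat({south}:{north})",
                f"lon({west}:{east})",
                f'time("{self.start.isoformat()}Z":"{self.end.isoformat()}Z")',
            ]
        }


def fake_lookup(product: str, version: str) -> str:
    """Offline stand-in for a CMR lookup; returns a made-up concept ID."""
    return "C0000000000-EXAMPLE"


q = QuerySketch(
    product="ATL06",
    version="006",
    bbox=(-90, 68, 48, 90),
    start=dt.datetime(2018, 10, 10, 0, 10, 0),
    end=dt.datetime(2018, 10, 18, 14, 45, 30),
    lookup=fake_lookup,
)
print(q.cmr_params())
print(q.harmony_params())
```

Because `concept_id` is cached, repeated query construction triggers at most one lookup per `QuerySketch` instance.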


### Don't forget to enhance along the way

* Now that we're ripping things apart and changing parameters, I think it's important to
replace the TypedDict annotations we're using with Pydantic models. This will enable us
to better encapsulate validation code that's currently spread around.
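
As a minimal sketch of what that could look like, assuming Pydantic is installed: the `BoundingBox` model and its field names below are purely illustrative and do not correspond to any existing icepyx type. Only `BaseModel`, `Field` range constraints, and `ValidationError` are used.

```python
from pydantic import BaseModel, Field, ValidationError


class BoundingBox(BaseModel):
    """Hypothetical model; field names are illustrative, not icepyx's."""

    # Range checks live with the data instead of in scattered helper functions.
    west: float = Field(ge=-180, le=180)
    south: float = Field(ge=-90, le=90)
    east: float = Field(ge=-180, le=180)
    north: float = Field(ge=-90, le=90)


try:
    # A latitude of 900 is out of range, so construction is rejected.
    BoundingBox(west=-90, south=68, east=48, north=900)
except ValidationError as err:
    print(f"rejected: {len(err.errors())} invalid field(s)")
```

Unlike a `TypedDict`, which is only checked statically, the model rejects bad values at construction time.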


## Integrating with other ongoing Icepyx work

Harmony is a major breaking change, so we'll be releasing it in Icepyx v2.

We know the community wants to break the API in some other ways, so we want to include those in v2 as well!

* Some of Icepyx's Query functionality is already served by earthaccess; refactor or replace the `Query` class?
* ?

Jessica is currently determining who can help work on these changes, and what that looks like. *If you, the
Harmony/ECS migration developer, identify opportunities to easily replace portions of Icepyx with _earthaccess_
or other libraries, take advantage of them.*

## FAQ

### Which API?

Harmony has two APIs:

* [OGC Environmental Data Retrieval API](https://harmony.earthdata.nasa.gov/docs/edr-api)
* [OGC Coverages API](https://harmony.earthdata.nasa.gov/docs/api/)

Which should be used, when, and why?


#### "Answer"

Use the [OGC Coverages API](https://harmony.earthdata.nasa.gov/docs/api/)!

> My take is that we ought to focus on the Coverages API for ICESat-2, since we aren’t
> making use of the new parameters. And this is what they primarily support. But I don’t
> have a good handle on whether we ought to pursue the EDR API at any point.
>
> - Amy Steiker

See this thread on EOSDIS Slack for more details:

<https://nsidc.slack.com/archives/CLC2SR1S6/p1716482829956969>
14 changes: 0 additions & 14 deletions icepyx/__init__.py
@@ -1,17 +1,3 @@
from warnings import warn

deprecation_msg = """icepyx v1.x is being deprecated; the back-end systems on which it relies
will be shut down as of late 2024. At that time, upgrade to icepyx v2.x, which uses the
new NASA Harmony back-end, will be required. Please see
<https://icepyx.readthedocs.io/en/latest/user_guide/changelog/v1.3.0.html> for more
information!
"""
# IMPORTANT: This is being done before the other icepyx imports because the imported
# code changes warning filters. If this is done after the imports, the warning won't
# work.
warn(deprecation_msg, FutureWarning, stacklevel=2)


from _icepyx_version import version as __version__

from icepyx.core.query import GenQuery, Query
Expand Down
24 changes: 24 additions & 0 deletions icepyx/core/cmr.py
@@ -1,3 +1,27 @@
from typing import Final

import requests

from icepyx.core.urls import COLLECTION_SEARCH_BASE_URL

CMR_PROVIDER: Final = "NSIDC_CPRD"


def get_concept_id(*, product: str, version: str) -> str:
    response = requests.get(
        COLLECTION_SEARCH_BASE_URL,
        params={
            "short_name": product,
            "version": version,
            "provider": CMR_PROVIDER,
        },
    )
    metadata = response.json()["feed"]["entry"]

    if len(metadata) != 1:
        raise RuntimeError(f"Expected 1 result from CMR, received {metadata}")

    return metadata[0]["id"]


# TODO: Extract CMR collection query from granules.py
8 changes: 8 additions & 0 deletions icepyx/core/exceptions.py
Expand Up @@ -53,3 +53,11 @@ class ExhaustiveTypeGuardException(TypeGuardException):
Used exclusively in cases where the typechecker needs a typeguard to tell it that a
check is exhaustive.
"""


class RefactoringException(Exception):
    def __str__(self):
        return (
            "This code is being refactored."
            " The code after this exception is expected to require major changes."
        )
13 changes: 8 additions & 5 deletions icepyx/core/granules.py
Expand Up @@ -20,11 +20,12 @@
from icepyx.core.cmr import CMR_PROVIDER
import icepyx.core.exceptions
from icepyx.core.types import (
CMRParams,

Check failure on line 23 in icepyx/core/granules.py

View workflow job for this annotation

GitHub Actions / test

"CMRParams" is unknown import symbol (reportAttributeAccessIssue)
EGIRequiredParamsDownload,

Check failure on line 24 in icepyx/core/granules.py

View workflow job for this annotation

GitHub Actions / test

"EGIRequiredParamsDownload" is unknown import symbol (reportAttributeAccessIssue)
EGIRequiredParamsSearch,

Check failure on line 25 in icepyx/core/granules.py

View workflow job for this annotation

GitHub Actions / test

"EGIRequiredParamsSearch" is unknown import symbol (reportAttributeAccessIssue)
)
from icepyx.core.urls import DOWNLOAD_BASE_URL, GRANULE_SEARCH_BASE_URL, ORDER_BASE_URL
from icepyx.uat import EDL_ACCESS_TOKEN

Check failure on line 28 in icepyx/core/granules.py

View workflow job for this annotation

GitHub Actions / test

Import "icepyx.uat" could not be resolved (reportMissingImports)


def info(grans: list[dict]) -> dict[str, Union[int, float]]:
Expand Down Expand Up @@ -228,7 +229,11 @@
# if not hasattr(self, 'avail'):
self.avail = []

headers = {"Accept": "application/json", "Client-Id": "icepyx"}
headers = {
"Accept": "application/json",
"Client-Id": "icepyx",
"Authorization": f"Bearer {EDL_ACCESS_TOKEN}",
}
# note we should also check for errors whenever we ping NSIDC-API -
# make a function to check for errors

Expand Down Expand Up @@ -332,6 +337,7 @@
--------
query.Query.order_granules
"""
raise icepyx.core.exceptions.RefactoringException

self.get_avail(CMRparams, reqparams)

Expand Down Expand Up @@ -366,6 +372,7 @@
total_pages,
" is submitting to NSIDC",
)
breakpoint()
request_params.update({"page_num": page_num})

request = self.session.get(ORDER_BASE_URL, params=request_params)
Expand Down Expand Up @@ -523,10 +530,6 @@
--------
query.Query.download_granules
"""
"""
extract : boolean, default False
Unzip the downloaded granules.
"""

# DevNote: this will replace any existing orderIDs with the saved list
# (could create confusion depending on whether download was interrupted or kernel restarted)
Expand Down
13 changes: 13 additions & 0 deletions icepyx/core/harmony.py
@@ -0,0 +1,13 @@
from typing import Any

import requests

from icepyx.core.urls import CAPABILITIES_BASE_URL


def get_capabilities(concept_id: str) -> dict[str, Any]:
    response = requests.get(
        CAPABILITIES_BASE_URL,
        params={"collectionId": concept_id},
    )
    return response.json()
15 changes: 11 additions & 4 deletions icepyx/core/is2ref.py
Expand Up @@ -8,7 +8,8 @@
import numpy as np
import requests

from icepyx.core.urls import COLLECTION_SEARCH_BASE_URL, EGI_BASE_URL
from icepyx.core.exceptions import RefactoringException
from icepyx.core.urls import COLLECTION_SEARCH_BASE_URL

# ICESat-2 specific reference functions

Expand Down Expand Up @@ -92,16 +93,21 @@ def about_product(prod: str) -> dict:
# DevGoal: use a mock of this output to test later functions, such as displaying options and widgets, etc.
# options to get customization options for ICESat-2 data (though could be used generally)
def _get_custom_options(session, product, version):
"""
Get lists of what customization options are available for the product from NSIDC.
"""
"""Get lists of available customization options from Harmony."""
raise RefactoringException

cust_options = {}

if session is None:
raise ValueError(
"Don't forget to log in to Earthdata using query.earthdata_login()"
)

# concept_id_query_url = f"{COLLECTION_SEARCH_BASE_URL}?short_name={product}&version={version}"
# concept_id = session.get(concept_id_query_url).json()["feed"]["entry"][-1]["id"]
# capability_url = f"{CAPABILITIES_BASE_URL}?collectionId={concept_id}"
# response_json = session.get(capability_url).json()

capability_url = f"{EGI_BASE_URL}/capabilities/{product}.{version}.xml"
response = session.get(capability_url)
root = ET.fromstring(response.content)
Expand All @@ -111,6 +117,7 @@ def _get_custom_options(session, product, version):
cust_options.update({"options": subagent})

# reformatting
# cust_options.update({"fileformats": response_json["outputFormats"]})
formats = [Format.attrib for Format in root.iter("Format")]
format_vals = [formats[i]["value"] for i in range(len(formats))]
try:
Expand Down
27 changes: 19 additions & 8 deletions icepyx/core/query.py
Expand Up @@ -10,16 +10,17 @@

import icepyx.core.APIformatting as apifmt
from icepyx.core.auth import EarthdataAuthMixin
from icepyx.core.exceptions import DeprecationError
from icepyx.core.cmr import get_concept_id
from icepyx.core.exceptions import DeprecationError, RefactoringException
import icepyx.core.granules as granules
from icepyx.core.granules import Granules
import icepyx.core.is2ref as is2ref
import icepyx.core.spatial as spat
import icepyx.core.temporal as tp
from icepyx.core.types import (
CMRParams,

Check failure on line 21 in icepyx/core/query.py

View workflow job for this annotation

GitHub Actions / test

"CMRParams" is unknown import symbol (reportAttributeAccessIssue)
EGIParamsSubset,

Check failure on line 22 in icepyx/core/query.py

View workflow job for this annotation

GitHub Actions / test

"EGIParamsSubset" is unknown import symbol (reportAttributeAccessIssue)
EGIRequiredParams,

Check failure on line 23 in icepyx/core/query.py

View workflow job for this annotation

GitHub Actions / test

"EGIRequiredParams" is unknown import symbol (reportAttributeAccessIssue)
EGIRequiredParamsDownload,
)
import icepyx.core.validate_inputs as val
Expand Down Expand Up @@ -464,6 +465,13 @@
self.spatial_extent, self.dates, self.product, self.product_version
)

@cached_property
def concept_id(self) -> str:
return get_concept_id(
product=self.product,
version=self.product_version,
)

@property
def dataset(self) -> Never:
"""
Expand Down Expand Up @@ -605,6 +613,7 @@
>>> reg_a.reqparams # doctest: +SKIP
{'short_name': 'ATL06', 'version': '006', 'page_size': 2000, 'page_num': 1, 'request_mode': 'async', 'include_meta': 'Y', 'client_string': 'icepyx'}
"""
raise RefactoringException

if not hasattr(self, "_reqparams"):
self._reqparams = apifmt.Parameters("required", reqtype="search")
Expand Down Expand Up @@ -641,6 +650,8 @@
{'time': '2019-02-20T00:00:00,2019-02-28T23:59:59',
'bbox': '-55.0,68.0,-48.0,71.0'}
"""
raise RefactoringException

if not hasattr(self, "_subsetparams"):
self._subsetparams = apifmt.Parameters("subset")

Expand Down Expand Up @@ -977,16 +988,16 @@

Parameters
----------
verbose : boolean, default False
verbose :
Print out all feedback available from the order process.
Progress information is automatically printed regardless of the value of verbose.
subset : boolean, default True
subset :
Apply subsetting to the data order from the NSIDC, returning only data that meets the
subset parameters. Spatial and temporal subsetting based on the input parameters happens
by default when subset=True, but additional subsetting options are available.
Spatial subsetting returns all data that are within the area of interest (but not complete
granules. This eliminates false-positive granules returned by the metadata-level search)
email: boolean, default False
email :
Have NSIDC auto-send order status email updates to indicate order status as pending/completed.
The emails are sent to the account associated with your Earthdata account.
**kwargs : key-value pairs
Expand All @@ -1013,6 +1024,8 @@
.
Retry request status is: complete
"""
breakpoint()
raise RefactoringException

if not hasattr(self, "reqparams"):
self.reqparams
Expand Down Expand Up @@ -1106,10 +1119,6 @@
See Also
--------
granules.download
"""
"""
extract : boolean, default False
Unzip the downloaded granules.

Examples
--------
Expand All @@ -1131,6 +1140,8 @@
or len(self.granules.orderIDs) == 0
):
self.order_granules(verbose=verbose, subset=subset, **kwargs)
breakpoint()
raise RefactoringException

self.granules.download(verbose, path, restart=restart)

Expand Down