icesat2py · mfisher87 · Sep 16, 2024 · Sep 17, 2024 · Sep 17, 2024 · Sep 17, 2024
diff --git a/HARMONY_MIGRATION_NOTES.md b/HARMONY_MIGRATION_NOTES.md
@@ -0,0 +1,143 @@
+## Assumptions that are different in Harmony
+
+* We can't use short name and version with Harmony like we do with ECS, we have to use
+  Concept ID (or DOI). We need to get this from CMR using short name and version.
+* Variable subsetting won't be supported on day 1.
+* Not all of the ICESat-2 products we currently support will be supported on day 1.
+    * <https://nsidc.atlassian.net/wiki/spaces/DAACSW/pages/222593028/ICESat-2+data+sets+and+versions+we+are+supporting+for+Harmony>
+* ECS and CMR shared some parameters. This is not the case with Harmony.
+
+
+## Getting started on development
+
+### Work so far
+
+Work in progress is on the `harmony` branch. This depends on the `low-hanging-refactors`
+branch being merged. A PR is open.
+
+In addition to this work, refactoring, type checking, and type annotations have been
+added to the codebase to support the migration to Harmony.
+
+
+### Familiarize yourself with Harmony
+
+* Check out this amazing notebook provided by Amy Steiker and Patrick Quinn:
+  <https://github.com/nasa/harmony/blob/main/docs/Harmony%20API%20introduction.ipynb>
+* Review the interactive API documentation:
+  <https://harmony.earthdata.nasa.gov/docs/api/>
+
+
+### Getting started replacing ECS with Harmony
+
+1. Find the `WIP` commit (`ac916d6`) and use `git reset` to restore the changes into the
+   working tree. There are several breakpoints set, as well as an artificially
+   introduced exception class to help trace and narrow the code paths during
+   refactoring.
+2. Exercise a specific code path. For example:
+
+    ```python
+    import icepyx as ipx
+    import datetime as dt
+
+    q = ipx.Query(
+        product="ATL06",
+        version="006",
+        spatial_extent=[-90, 68, 48, 90],
+        # "./doc/source/example_notebooks/supporting_files/simple_test_poly.gpkg",
+        date_range={
+            "start_date": dt.datetime(2018, 10, 10, 0, 10, 0),
+            "end_date": dt.datetime(2018, 10, 18, 14, 45, 30),
+            # "end_date": '2019-02-28',
+        }
+    )
+
+    q.download_granules("/tmp/icepyx")
+    ```
+
+3. Identify the first query to ECS. Queries, except the capabilities query in
+   `is2ref.py`, are formed from constants in `urls.py`. Continue this practice. Harmony
+   URLs in this file are placeholders.
+4. Determine an equivalent Harmony query. The Harmony Coverages API has an equivalent to
+   the capabilities query in `is2ref.py`, for example.
+5. Raise `RefactoringException` at the top of any functions or methods which currently
+   speak to ECS. This will help us find and delete those "dead code" functions later,
+   and prevent them from being inadvertently executed.
+6. Write new functions or methods which speak to Harmony instead. It's important to
+   encapsulate the communication with the Harmony API in a single function. This may
+   mean replacing one function with several smaller functions during refactoring.
+7. Maintain the high standard of documentation in the code. Include examples as doctests
+   in the new functions. Use Numpy style docstrings. **DO NOT** include type information
+   in docstrings -- write type annotations instead. They will be automatically
+   documented by the documentation generator.
+8. Repeat from step 3 for the next EGI query.
+
+### Watch out for broken assumptions
+
+It's important to note that two major assumptions will require significant refactoring.
+The type annotations will help with this process!
+
+1. Broken assumption: "CMR and EGI share parameter sets". My mental model looks like:
+  * Current: User passes in parameters to `Query(...)`. Those params are used to generate
+    separate "CMR parameters" and "reqparams". "CMRparams" are spatial and temporal
+    parameters compatible with CMR. I'm not sure about the naming of "reqparams", but I
+    think of them as the EGI parameters (which may include more than the user passed, like
+    `page_size`) _minus_ the CMR spatial and temporal parameters. The actual queries
+    submitted to CMR and EGI are based on those generated parameter sets.
+  * Future: In Harmony-land, the shared parameter assumption is broken. CMR and Harmony's
+    Coverages API have completely different parameter sets. The code can be drastically simplified:
+    User passes in parameters to `Query(...)`. Those params are used directly to generate
+    both CMR and Harmony queries without an intervening layer. E.g.
+2. Broken assumption: "We can query with only short_name and version number". Harmony
+   requires a unique identifier (concept ID or DOI). E.g.:
+   <https://harmony.earthdata.nasa.gov/capabilities?collectionId=C1261703129-EEDTEST>
+   .
+   Since we want the user to be able to provide short_name and version, implementing the
+   concept ID as a `@cached_property` on `Query` which asks CMR for the concept ID makes
+   sense to me.
+
+
+### Don't forget to enhance along the way
+
+* Now that we're ripping things apart and changing parameters, I think it's important to
+  replace the TypedDict annotations we're using with Pydantic models. This will enable us
+  to better encapsulate validation code that's currently spread around.
+
+
+## Integrating with other ongoing Icepyx work
+
+Harmony is a major breaking change, so we'll be releasing it in Icepyx v2.
+
+We know the community wants to break the API in some other ways, so we want to include those in v2 as well!
+
+* Some of Icepyx's Query functionality is already served by earthaccess; refactor or replace the `Query` class?
+* ?
+
+Jessica is currently determining who can help work on these changes, and what that looks like. *If you, the
+Harmony/ECS migration developer, identify opportunities to easily replace portions of Icepyx with _earthaccess_
+or other libraries, take advantage of that opportunity.*
+
+## FAQ
+
+### Which API?
+
+Harmony has two APIs:
+
+* [OGC Environmental Data Retrieval API](https://harmony.earthdata.nasa.gov/docs/edr-api)
+* [OGC Coverages API](https://harmony.earthdata.nasa.gov/docs/api/)
+
+Which should be used and when and why?
+
+
+#### "Answer"
+
+Use the [OGC Coverages API](https://harmony.earthdata.nasa.gov/docs/api/)!
+
+> My take is that we ought to focus on the Coverages API for ICESat-2, since we aren’t
+> making use of the new parameters. And this is what they primarily support. But I don’t
+> have a good handle on whether we ought to pursue the EDR API at any point.
+>
+> - Amy Steiker
+
+See this thread on EOSDIS Slack for more details:
+
+<https://nsidc.slack.com/archives/CLC2SR1S6/p1716482829956969>
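Reviewer sketch for the "Future" bullet in the notes above: one way the user's inputs could feed both CMR and Harmony directly, with no shared intermediate parameter set. Everything here is an assumption for illustration: `QueryInputs` and its methods are hypothetical names, naive datetimes are treated as UTC, and the CMR (`bounding_box`, `temporal`) and Harmony Coverages (`subset=lat(...)`, `subset=lon(...)`, `subset=time(...)`) spellings should be checked against the current API docs before use.

```python
from __future__ import annotations

import datetime as dt
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryInputs:
    """Hypothetical container for what the user passes to `Query(...)`."""

    bounding_box: tuple[float, float, float, float]  # west, south, east, north
    start: dt.datetime  # assumed to be UTC
    end: dt.datetime  # assumed to be UTC

    @staticmethod
    def _iso(value: dt.datetime) -> str:
        # Format a datetime as an ISO 8601 string with a literal "Z" suffix.
        return value.strftime("%Y-%m-%dT%H:%M:%SZ")

    def to_cmr_params(self) -> dict[str, str]:
        # CMR-style spatial and temporal search parameters (assumed names).
        west, south, east, north = self.bounding_box
        return {
            "bounding_box": f"{west},{south},{east},{north}",
            "temporal": f"{self._iso(self.start)},{self._iso(self.end)}",
        }

    def to_harmony_params(self) -> dict[str, list[str]]:
        # Harmony Coverages-style repeated `subset` parameters (assumed syntax).
        west, south, east, north = self.bounding_box
        return {
            "subset": [
                f"lat({south}:{north})",
                f"lon({west}:{east})",
                f'time("{self._iso(self.start)}":"{self._iso(self.end)}")',
            ]
        }


inputs = QueryInputs(
    bounding_box=(-90.0, 68.0, 48.0, 90.0),
    start=dt.datetime(2018, 10, 10, 0, 10, 0),
    end=dt.datetime(2018, 10, 18, 14, 45, 30),
)
cmr_params = inputs.to_cmr_params()
harmony_params = inputs.to_harmony_params()
```

The two `to_*` methods are deliberately independent, so a change to one service's parameter format cannot leak into the other.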
diff --git a/icepyx/__init__.py b/icepyx/__init__.py
@@ -1,17 +1,3 @@
-from warnings import warn
-
-deprecation_msg = """icepyx v1.x is being deprecated; the back-end systems on which it relies
-will be shut down as of late 2024. At that time, upgrade to icepyx v2.x, which uses the
-new NASA Harmony back-end, will be required. Please see
-<https://icepyx.readthedocs.io/en/latest/user_guide/changelog/v1.3.0.html> for more
-information!
-"""
-# IMPORTANT: This is being done before the other icepyx imports because the imported
-# code changes warning filters. If this is done after the imports, the warning won't
-# work.
-warn(deprecation_msg, FutureWarning, stacklevel=2)
-
-
 from _icepyx_version import version as __version__
 
 from icepyx.core.query import GenQuery, Query

diff --git a/icepyx/core/cmr.py b/icepyx/core/cmr.py
@@ -1,3 +1,27 @@
 from typing import Final
 
+import requests
+
+from icepyx.core.urls import COLLECTION_SEARCH_BASE_URL
+
 CMR_PROVIDER: Final = "NSIDC_CPRD"
+
+
+def get_concept_id(*, product: str, version: str) -> str:
+    response = requests.get(
+        COLLECTION_SEARCH_BASE_URL,
+        params={
+            "short_name": product,
+            "version": version,
+            "provider": CMR_PROVIDER,
+        },
+    )
+    metadata = response.json()["feed"]["entry"]
+
+    if len(metadata) != 1:
+        raise RuntimeError(f"Expected 1 result from CMR, received {metadata}")
+
+    return metadata[0]["id"]
+
+
+# TODO: Extract CMR collection query from granules.py
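Reviewer sketch for `get_concept_id` above: the response parsing could be split into a pure function so it can be tested without touching the network. `extract_concept_id` is a hypothetical name, and the sample payload in the usage lines is made up to mirror the `["feed"]["entry"]` shape the diff already relies on. Separately, it may be worth calling `response.raise_for_status()` and passing `timeout=` to `requests.get` before parsing, though that is not shown here.

```python
from typing import Any


def extract_concept_id(payload: dict[str, Any]) -> str:
    """Return the single collection concept ID from a CMR JSON search response."""
    entries = payload["feed"]["entry"]

    # Report the count rather than the full metadata to keep the message readable.
    if len(entries) != 1:
        raise RuntimeError(
            f"Expected 1 collection from CMR, received {len(entries)}"
        )

    return entries[0]["id"]


# Made-up payload shaped like the response the diff indexes into.
sample_payload = {"feed": {"entry": [{"id": "C1261703129-EEDTEST"}]}}
concept_id = extract_concept_id(sample_payload)
```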
diff --git a/icepyx/core/exceptions.py b/icepyx/core/exceptions.py
@@ -53,3 +53,11 @@ class ExhaustiveTypeGuardException(TypeGuardException):
     Used exclusively in cases where the typechecker needs a typeguard to tell it that a
     check is exhaustive.
     """
+
+
+class RefactoringException(Exception):
+    def __str__(self):
+        return (
+            "This code is being refactored."
+            " The code after this exception is expected to require major changes."
+        )
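Reviewer sketch of how `RefactoringException` is meant to be used, per step 5 of the migration notes: raise it as the first statement of any function that still talks to ECS. The class body is copied from the diff above so the snippet runs on its own; `place_order_via_ecs` is a hypothetical stand-in, not a function in icepyx.

```python
class RefactoringException(Exception):
    """Copied from the diff above so this sketch is self-contained."""

    def __str__(self) -> str:
        return (
            "This code is being refactored."
            " The code after this exception is expected to require major changes."
        )


def place_order_via_ecs() -> None:
    """Hypothetical stand-in for a function that still speaks to ECS."""
    raise RefactoringException
    # Legacy ECS code would stay below, unreachable, until it is deleted.


try:
    place_order_via_ecs()
except RefactoringException as exc:
    message = str(exc)
```

Because the exception carries its own message, call sites need no arguments, and a search for `raise RefactoringException` later lists every code path still to be migrated.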
diff --git a/icepyx/core/granules.py b/icepyx/core/granules.py
@@ -20,11 +20,12 @@
 from icepyx.core.cmr import CMR_PROVIDER
 import icepyx.core.exceptions
 from icepyx.core.types import (
     CMRParams,
     EGIRequiredParamsDownload,
     EGIRequiredParamsSearch,
 )
 from icepyx.core.urls import DOWNLOAD_BASE_URL, GRANULE_SEARCH_BASE_URL, ORDER_BASE_URL
+from icepyx.uat import EDL_ACCESS_TOKEN
 
 
 def info(grans: list[dict]) -> dict[str, Union[int, float]]:
@@ -228,7 +229,11 @@
         # if not hasattr(self, 'avail'):
         self.avail = []
 
-        headers = {"Accept": "application/json", "Client-Id": "icepyx"}
+        headers = {
+            "Accept": "application/json",
+            "Client-Id": "icepyx",
+            "Authorization": f"Bearer {EDL_ACCESS_TOKEN}",
+        }
         # note we should also check for errors whenever we ping NSIDC-API -
         # make a function to check for errors
 
@@ -332,6 +337,7 @@
         --------
         query.Query.order_granules
         """
+        raise icepyx.core.exceptions.RefactoringException
 
         self.get_avail(CMRparams, reqparams)
 
@@ -366,6 +372,7 @@
                 total_pages,
                 " is submitting to NSIDC",
             )
+            breakpoint()
             request_params.update({"page_num": page_num})
 
             request = self.session.get(ORDER_BASE_URL, params=request_params)
@@ -523,10 +530,6 @@
         --------
         query.Query.download_granules
         """
-        """
-        extract : boolean, default False
-            Unzip the downloaded granules.
-        """
 
         # DevNote: this will replace any existing orderIDs with the saved list
         # (could create confusion depending on whether download was interrupted or kernel restarted)

diff --git a/icepyx/core/harmony.py b/icepyx/core/harmony.py
@@ -0,0 +1,13 @@
+from typing import Any
+
+import requests
+
+from icepyx.core.urls import CAPABILITIES_BASE_URL
+
+
+def get_capabilities(concept_id: str) -> dict[str, Any]:
+    response = requests.get(
+        CAPABILITIES_BASE_URL,
+        params={"collectionId": concept_id},
+    )
+    return response.json()
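Reviewer sketch showing the request URL that `get_capabilities` above would be expected to produce. The value of `CAPABILITIES_BASE_URL` is not part of this diff; the value below is an assumption inferred from the example URL in HARMONY_MIGRATION_NOTES.md, and `capabilities_url` is a hypothetical helper used only to make the query string visible.

```python
from urllib.parse import urlencode

# Assumed value; the real constant lives in icepyx/core/urls.py.
CAPABILITIES_BASE_URL = "https://harmony.earthdata.nasa.gov/capabilities"


def capabilities_url(concept_id: str) -> str:
    """Build the capabilities URL for one collection concept ID."""
    query_string = urlencode({"collectionId": concept_id})
    return f"{CAPABILITIES_BASE_URL}?{query_string}"


url = capabilities_url("C1261703129-EEDTEST")
```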
diff --git a/icepyx/core/is2ref.py b/icepyx/core/is2ref.py
@@ -8,7 +8,8 @@
 import numpy as np
 import requests
 
-from icepyx.core.urls import COLLECTION_SEARCH_BASE_URL, EGI_BASE_URL
+from icepyx.core.exceptions import RefactoringException
+from icepyx.core.urls import COLLECTION_SEARCH_BASE_URL
 
 # ICESat-2 specific reference functions
 
@@ -92,16 +93,21 @@ def about_product(prod: str) -> dict:
 # DevGoal: use a mock of this output to test later functions, such as displaying options and widgets, etc.
 # options to get customization options for ICESat-2 data (though could be used generally)
 def _get_custom_options(session, product, version):
-    """
-    Get lists of what customization options are available for the product from NSIDC.
-    """
+    """Get lists of available customization options from Harmony."""
+    raise RefactoringException
+
     cust_options = {}
 
     if session is None:
         raise ValueError(
             "Don't forget to log in to Earthdata using query.earthdata_login()"
         )
 
+    # concept_id_query_url = f"{COLLECTION_SEARCH_BASE_URL}?short_name={product}&version={version}"
+    # concept_id = session.get(concept_id_query_url).json()["feed"]["entry"][-1]["id"]
+    # capability_url = f"{CAPABILITIES_BASE_URL}?collectionId={concept_id}"
+    # response_json = session.get(capability_url).json()
+
     capability_url = f"{EGI_BASE_URL}/capabilities/{product}.{version}.xml"
     response = session.get(capability_url)
     root = ET.fromstring(response.content)
@@ -111,6 +117,7 @@ def _get_custom_options(session, product, version):
     cust_options.update({"options": subagent})
 
     # reformatting
+    # cust_options.update({"fileformats": response_json["outputFormats"]})
     formats = [Format.attrib for Format in root.iter("Format")]
     format_vals = [formats[i]["value"] for i in range(len(formats))]
     try:
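Reviewer sketch of where the commented-out lines in `_get_custom_options` seem to be heading: deriving the `fileformats` option from the Harmony capabilities JSON instead of the ECS XML. The `outputFormats` key comes from the commented code in this diff; the sample payload and the helper name `custom_options_from_capabilities` are hypothetical, and the real response shape should be confirmed against Harmony.

```python
from typing import Any


def custom_options_from_capabilities(capabilities: dict[str, Any]) -> dict[str, Any]:
    """Translate a Harmony capabilities document into icepyx-style options."""
    cust_options: dict[str, Any] = {}

    # Treat a missing key as "no reformatting available" rather than an error.
    cust_options["fileformats"] = list(capabilities.get("outputFormats", []))

    return cust_options


# Made-up capabilities document for illustration only.
sample_capabilities = {"outputFormats": ["application/x-netcdf4"]}
options = custom_options_from_capabilities(sample_capabilities)
```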

diff --git a/icepyx/core/query.py b/icepyx/core/query.py
@@ -10,16 +10,17 @@
 
 import icepyx.core.APIformatting as apifmt
 from icepyx.core.auth import EarthdataAuthMixin
-from icepyx.core.exceptions import DeprecationError
+from icepyx.core.cmr import get_concept_id
+from icepyx.core.exceptions import DeprecationError, RefactoringException
 import icepyx.core.granules as granules
 from icepyx.core.granules import Granules
 import icepyx.core.is2ref as is2ref
 import icepyx.core.spatial as spat
 import icepyx.core.temporal as tp
 from icepyx.core.types import (
     CMRParams,
     EGIParamsSubset,
     EGIRequiredParams,
     EGIRequiredParamsDownload,
 )
 import icepyx.core.validate_inputs as val
@@ -464,6 +465,13 @@
             self.spatial_extent, self.dates, self.product, self.product_version
         )
 
+    @cached_property
+    def concept_id(self) -> str:
+        return get_concept_id(
+            product=self.product,
+            version=self.product_version,
+        )
+
     @property
     def dataset(self) -> Never:
         """
@@ -605,6 +613,7 @@
         >>> reg_a.reqparams # doctest: +SKIP
         {'short_name': 'ATL06', 'version': '006', 'page_size': 2000, 'page_num': 1, 'request_mode': 'async', 'include_meta': 'Y', 'client_string': 'icepyx'}
         """
+        raise RefactoringException
 
         if not hasattr(self, "_reqparams"):
             self._reqparams = apifmt.Parameters("required", reqtype="search")
@@ -641,6 +650,8 @@
         {'time': '2019-02-20T00:00:00,2019-02-28T23:59:59',
         'bbox': '-55.0,68.0,-48.0,71.0'}
         """
+        raise RefactoringException
+
         if not hasattr(self, "_subsetparams"):
             self._subsetparams = apifmt.Parameters("subset")
 
@@ -977,16 +988,16 @@
 
         Parameters
         ----------
-        verbose : boolean, default False
+        verbose :
             Print out all feedback available from the order process.
             Progress information is automatically printed regardless of the value of verbose.
-        subset : boolean, default True
+        subset :
             Apply subsetting to the data order from the NSIDC, returning only data that meets the
             subset parameters. Spatial and temporal subsetting based on the input parameters happens
             by default when subset=True, but additional subsetting options are available.
             Spatial subsetting returns all data that are within the area of interest (but not complete
             granules. This eliminates false-positive granules returned by the metadata-level search)
-        email: boolean, default False
+        email :
             Have NSIDC auto-send order status email updates to indicate order status as pending/completed.
             The emails are sent to the account associated with your Earthdata account.
         **kwargs : key-value pairs
@@ -1013,6 +1024,8 @@
         .
         Retry request status is: complete
         """
+        breakpoint()
+        raise RefactoringException
 
         if not hasattr(self, "reqparams"):
             self.reqparams
@@ -1106,10 +1119,6 @@
         See Also
         --------
         granules.download
-        """
-        """
-        extract : boolean, default False
-            Unzip the downloaded granules.
 
         Examples
         --------
@@ -1131,6 +1140,8 @@
                 or len(self.granules.orderIDs) == 0
             ):
                 self.order_granules(verbose=verbose, subset=subset, **kwargs)
+        breakpoint()
+        raise RefactoringException
 
         self.granules.download(verbose, path, restart=restart)
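Reviewer sketch of the behavior the new `concept_id` cached property on `Query` relies on: `functools.cached_property` runs the lookup once per instance and then serves the stored value. `QuerySketch`, `fake_get_concept_id`, and the concept ID string are hypothetical stand-ins so the snippet runs without contacting CMR.

```python
from functools import cached_property
from typing import Callable

calls: list[tuple[str, str]] = []


def fake_get_concept_id(*, product: str, version: str) -> str:
    """Stand-in for the CMR lookup; records each call instead of using the network."""
    calls.append((product, version))
    return "C0000000000-FAKE"


class QuerySketch:
    """Minimal stand-in for `Query`, keeping only what the property needs."""

    def __init__(self, product: str, version: str, lookup: Callable[..., str]) -> None:
        self.product = product
        self.product_version = version
        self._lookup = lookup

    @cached_property
    def concept_id(self) -> str:
        return self._lookup(product=self.product, version=self.product_version)


q = QuerySketch("ATL06", "006", lookup=fake_get_concept_id)
first = q.concept_id
second = q.concept_id  # served from the instance cache; no second lookup
```

One consequence worth keeping in mind: the cached value is not invalidated if `product` or `product_version` changes afterwards. Deleting the attribute (`del q.concept_id`) clears it and forces a fresh lookup on next access.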