-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding marble specific fields to item properties #30
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -80,6 +80,8 @@ class CMIP6ItemProperties(STACItemProperties, validate_assignment=True): | |
license: str | ||
grid: str | ||
mip_era: str | ||
is_replica: bool = Field(..., serialization_alias="mrbl:is_replica") | ||
host_node: str = Field(..., serialization_alias="mrbl:host_node") | ||
Comment on lines
+83
to
+84
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These attributes would apply to every dataset hosted on the node, so I suggest they should be defined inside a Parent class that every collection inherits from. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we could also avoid the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, we should use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I understood your intention and I think the proposal can be useful. But I didn't understand (or can't quite visualize) your proposal for how to do that. Could you please elaborate? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This class inherits from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right I see. |
||
|
||
model_config = ConfigDict(alias_generator=add_cmip6_prefix, populate_by_name=True) | ||
|
||
|
@@ -108,7 +110,9 @@ class CMIP6populator(STACpopulatorBase): | |
item_properties_model = CMIP6ItemProperties | ||
item_geometry_model = GeoJSONPolygon | ||
|
||
def __init__(self, stac_host: str, data_loader: GenericLoader, update: Optional[bool] = False) -> None: | ||
def __init__( | ||
self, stac_host: str, data_loader: GenericLoader, host_node: str, update: Optional[bool] = False | ||
) -> None: | ||
"""Constructor | ||
|
||
:param stac_host: URL to the STAC API | ||
|
@@ -117,6 +121,7 @@ def __init__(self, stac_host: str, data_loader: GenericLoader, update: Optional[ | |
:type thredds_catalog_url: str | ||
""" | ||
super().__init__(stac_host, data_loader, update) | ||
self.host_node = host_node | ||
|
||
@staticmethod | ||
def make_cmip6_item_id(attrs: MutableMapping[str, Any]) -> str: | ||
|
@@ -146,11 +151,20 @@ def create_stac_item(self, item_name: str, item_data: MutableMapping[str, Any]) | |
""" | ||
iid = self.make_cmip6_item_id(item_data["attributes"]) | ||
|
||
try: | ||
item = STAC_item_from_metadata(iid, item_data, self.item_properties_model, self.item_geometry_model) | ||
except pydantic_core._pydantic_core.ValidationError: | ||
print(f"ERROR: ValidationError for {iid}") | ||
return -1 | ||
# Adding some Marble network related data to the item properties. | ||
# is_replica is always False when using this app since this app is only used to populate the hostnode | ||
# with the data residing on that node. Replica STAC items on other nodes in the Marble network will be | ||
# populated using a different app, and there the is_replica flag will be True. | ||
Comment on lines
+156
to
+157
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure what is meant exactly by replica here. In the ESGF world, this has to do with who mirrors from whom. Will MARBLE include replication of datasets? Also, more generally, if you are going to curate copies of ESGF files that you aggregate yourself, how are you going to determine whether one of the files making the aggregation has been updated, and this update needs to be propagated to your own version? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, you say that replica items would be populated by another app. Why would that be? The replica is going to be a netCDF file sitting on a THREDDS server. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or do you mean the replica of a STAC item (just the catalog entry, not the file) ? In that case, I think we need to work out how STAC catalogs are going to be shared from one node to another. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Not, datasets will not be replicated. We don't have enough space on the network. What I meant is that we have plans to ensure that STAC entries from one node are visible in the STAC browser of another node. As an example, at some point the Ouranos STAC catalog will also have a "CMIP6-UofT" collection whose entries be (near) duplicates of the entries on UofT's STAC catalog for the "CMIP6-UofT" (all the properties will be same and crucially, access links will be same), but
It is my understanding that when files are updated on ESGF, they are marked with a new version number, existing files are never overridden. We should follow the same approach. The aggregations will be for all files in a single version. If, for example, one of the files in say an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes just STAC items. We will be doing this by completing the implementation of STACLoader. This will crawl a STAC catalog, get the JSON data, modify it as needed (for example to change There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Something to consider for mirrors or "replica": https://github.com/stac-extensions/alternate-assets |
||
item_data["attributes"]["is_replica"] = False | ||
item_data["attributes"]["host_node"] = self.host_node | ||
|
||
item = STAC_item_from_metadata(iid, item_data, self.item_properties_model, self.item_geometry_model) | ||
# try: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. remove debugging code |
||
# item = STAC_item_from_metadata(iid, item_data, self.item_properties_model, self.item_geometry_model) | ||
# except pydantic_core._pydantic_core.ValidationError as e: | ||
# print(f"ERROR: ValidationError for {iid}") | ||
# # e.errors() | ||
# return -1 | ||
|
||
# Add the CMIP6 STAC extension | ||
item.stac_extensions.append( | ||
|
@@ -173,6 +187,7 @@ def create_stac_item(self, item_name: str, item_data: MutableMapping[str, Any]) | |
parser = argparse.ArgumentParser(prog="CMIP6 STAC populator") | ||
parser.add_argument("stac_host", type=str, help="STAC API address") | ||
parser.add_argument("thredds_catalog_URL", type=str, help="URL to the CMIP6 THREDDS catalog") | ||
parser.add_argument("host_node", type=str, help="Name of the Marble node on which the data resides") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with @fmigneault that this could be a flagged option. If we don't make this optional then we can never use this implementation to populate a stac instance that is not part of the Marble network. However, I think that we should add some validation checks here as well if
If any of these conditions are false, we should raise an error with an appropriate message. |
||
parser.add_argument("--update", action="store_true", help="Update collection and its items") | ||
|
||
args = parser.parse_args() | ||
|
@@ -187,5 +202,5 @@ def create_stac_item(self, item_name: str, item_data: MutableMapping[str, Any]) | |
# To be implemented | ||
data_loader = ErrorLoader(args.error_file) | ||
|
||
c = CMIP6populator(args.stac_host, data_loader, args.update) | ||
c = CMIP6populator(args.stac_host, data_loader, args.host_node, args.update) | ||
c.ingest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this command is becoming too cluttered with optional arguments.
Can you add a
--host-node
option or similar?I think it should use the URL of the
STAC_HOST
by default.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can streamline this.