Add a Global Forest Watch download script #34

Open
rudokemper opened this issue Dec 11, 2024 · 0 comments
Labels
connectors: Connector scripts for ETL from upstream data sources
feature: New specs for new behavior

Comments

@rudokemper
Member

Feature Request

Let's add a script to download Global Forest Watch "Integrated Deforestation Alerts", and possibly also GLAD alerts, as a fallback source for change detection alerts for users who do not have access to bespoke alerts provided via alerts_gcs.

See here and here for additional rationale.

For reference, some past work to fetch and enqueue GFW in the thespian Python framework:

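import re

import requests

# TimerMixin, Component, and handle_deleted_items come from the surrounding
# framework; their imports are not shown here.


class GFWPull(TimerMixin, Component):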
    """Polls the Global Forest Watch API for new change detection alerts."""

    def __init__(self, name="gfw-pull", initial_config=None):
        # Use None instead of a shared mutable default argument.
        super().__init__(name, initial_config if initial_config is not None else {})

    def _gfw_http_headers(self):
        return {
            "x-api-key": self.local_config.get("auth_token"),
            "Content-Type": "application/json",
        }

    def _gfw_request_body(self, type_of_alert, coordinates, min_date):

        return {
            "geometry": {
                "type": "Polygon",
                "coordinates": coordinates,
            },
            "sql": f"SELECT latitude, longitude, {type_of_alert}__date, {type_of_alert}__confidence FROM results WHERE {type_of_alert}__date >= '{min_date}'",
        }

    def fetch_submissions(self):
        self.log.debug("Fetching submissions now!")

        seen_submissions = self.local_config.get("seen_submissions", {})

        type_of_alert = self.local_config.get("type_of_alert")
        coordinates = self.local_config.get("geo").get("coordinates")
        min_date = self.local_config.get("min_date")

        # GFW API documentation: https://www.globalforestwatch.org/help/developers/guides/query-data-for-a-custom-geometry/
        # TODO: Handle maximum allowed payload size of 6291556 bytes
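        response = requests.post(
            f"https://data-api.globalforestwatch.org/dataset/{type_of_alert}/latest/query",
            headers=self._gfw_http_headers(),
            json=self._gfw_request_body(type_of_alert, coordinates, min_date),
            timeout=60,  # assumed value; without a timeout, requests can block indefinitely
        )
        # Fail loudly on HTTP errors: treating an error response as an empty
        # result would flag every previously seen alert as deleted below.
        response.raise_for_status()
        results = response.json().get("data", [])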

        new_ids = set()

        for raw_submission in results:
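            # GFW alerts do not have IDs, so build one by concatenating date, latitude, and longitude, then stripping every non-digit character.
            # NOTE: stripping signs and decimal points can make distinct alerts collide (e.g. latitudes 1.25 and 12.5, or 1.5 and -1.5).
            date_key = f"{type_of_alert}__date"  # column name follows the queried dataset
            id = f'{raw_submission[date_key]}{raw_submission["latitude"]}{raw_submission["longitude"]}'
            id = re.sub(r'\D', '', id)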

            seen_hash = seen_submissions.get(id)
            new_ids.add(id)

            # Enqueue the alert if its ID has not been seen before. Because the
            # stored value is the ID itself, the inequality check cannot detect
            # an alert whose contents changed since it was last seen.
            if seen_hash is None or seen_hash != id:

                # Structure the feature as a GeoJSON Feature
                # (since that is frizzle's lingua franca for now)
                feature = {
                    "type": "Feature",
                    "id": id,
                    "properties": {
                        "_id": id,
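                        # Read the columns named after the queried dataset rather than
                        # hard-coding "gfw_integrated_alerts", so GLAD alerts work too.
                        "date": raw_submission[f"{type_of_alert}__date"],
                        "confidence": raw_submission[f"{type_of_alert}__confidence"],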
                        "type_of_alert": type_of_alert,
                    },
                    "geometry": {
                        "type": "Point",
                        "coordinates": [raw_submission["longitude"], raw_submission["latitude"]],
                    }
                }

                self.enqueue(self.name, feature)

                # NOTE: Adding a submission to "raw submissions" does not mean
                # it has been successfully processed. It is the
                # responsibility of downstream listeners to mark messages
                # as "received" only if they have been successfully
                # processed, as is typical in message-passing systems.
                seen_submissions[id] = id
                self.set_config("seen_submissions", seen_submissions)
        
        seen_ids = set(seen_submissions.keys())

        deleted = seen_ids.difference(new_ids)
        handle_deleted_items(deleted, seen_ids, self.enqueue, self.set_config, self.name)

        self.log.debug(f"seen_submissions: {len(seen_submissions)}")
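
For a standalone download script, the framework-independent parts of the class above could be pulled out into plain functions. The following is a minimal sketch, not a tested implementation. The function names (`build_query_body`, `payload_size`, `make_alert_id`, `alert_to_feature`) are hypothetical, and the underscore-separated ID format is an assumption that differs from the digits-only IDs above, so previously stored IDs would not match. It also assumes the 6291556-byte limit from the TODO applies to the request body; if it applies to the response instead, a different approach (for example narrower date windows) would be needed.

```python
import json

# Limit quoted in the TODO above; assumed here to apply to the request body.
MAX_PAYLOAD_BYTES = 6291556


def build_query_body(type_of_alert, coordinates, min_date):
    """Build the query body in the same shape as _gfw_request_body above."""
    return {
        "geometry": {"type": "Polygon", "coordinates": coordinates},
        "sql": (
            f"SELECT latitude, longitude, {type_of_alert}__date, "
            f"{type_of_alert}__confidence FROM results "
            f"WHERE {type_of_alert}__date >= '{min_date}'"
        ),
    }


def payload_size(body):
    """Return the size in bytes of the body serialized as UTF-8 JSON."""
    return len(json.dumps(body).encode("utf-8"))


def make_alert_id(row, type_of_alert):
    """Derive an ID that keeps signs and decimal points (hypothetical format)."""
    date = row[f"{type_of_alert}__date"]
    return f"{date}_{row['latitude']}_{row['longitude']}"


def alert_to_feature(row, type_of_alert):
    """Convert one result row into a GeoJSON Feature, as in the class above."""
    alert_id = make_alert_id(row, type_of_alert)
    return {
        "type": "Feature",
        "id": alert_id,
        "properties": {
            "_id": alert_id,
            "date": row[f"{type_of_alert}__date"],
            "confidence": row[f"{type_of_alert}__confidence"],
            "type_of_alert": type_of_alert,
        },
        "geometry": {
            "type": "Point",
            "coordinates": [row["longitude"], row["latitude"]],
        },
    }
```

A script could call `payload_size(build_query_body(...))` before sending the request and split the geometry or fail early when the result exceeds `MAX_PAYLOAD_BYTES`. Keeping the separators in the ID avoids the case where two different coordinates reduce to the same digit string.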
@rudokemper rudokemper added the feature New specs for new behavior label Dec 11, 2024
@rudokemper rudokemper changed the title [frizzle] Add a Global Forest Watch download script [connectors] Add a Global Forest Watch download script Dec 16, 2024
@rudokemper rudokemper added the connectors Connector scripts for ETL from upstream data sources label Jan 3, 2025
@rudokemper rudokemper changed the title [connectors] Add a Global Forest Watch download script Add a Global Forest Watch download script Jan 3, 2025