From b2cc1d1424cab6ef28e176327afb7cb3e1318ace Mon Sep 17 00:00:00 2001 From: Andrew Huang Date: Wed, 27 Mar 2024 11:55:16 -0700 Subject: [PATCH] Rename --- notebooks/2.0_Spectral_Clustering_PC.ipynb | 957 +++++++++++++++++++++ 1 file changed, 957 insertions(+) create mode 100644 notebooks/2.0_Spectral_Clustering_PC.ipynb diff --git a/notebooks/2.0_Spectral_Clustering_PC.ipynb b/notebooks/2.0_Spectral_Clustering_PC.ipynb new file mode 100644 index 0000000..50a74be --- /dev/null +++ b/notebooks/2.0_Spectral_Clustering_PC.ipynb @@ -0,0 +1,957 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "95d6fc25-545a-4bcc-97ff-a17df7f6082e", + "metadata": {}, + "source": [ + "![Spectral Clustering](images/spectral_clustering_lake.png \"Spectral Clustering\")" + ] + }, + { + "cell_type": "markdown", + "id": "4257bf18-ba7c-42a6-81e3-2b5e48b3bc8b", + "metadata": {}, + "source": [ + "# Spectral Clustering" + ] + }, + { + "cell_type": "markdown", + "id": "796926b4-a134-4daf-ae19-46978acf5e89", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "9e7946b7-d7c6-4d60-bfb2-474188cfeb54", + "metadata": {}, + "source": [ + "## Overview" + ] + }, + { + "cell_type": "markdown", + "id": "09b5645f-ea45-4f49-9811-18bff4192034", + "metadata": {}, + "source": [ + "The current notebook will demonstrate a simplified machine learning approach to observe the change in a lake water's extent across time. In order to identify the water, we can use spectral clustering to classify each grid cell into a category based on the similarity of the combined set of pixels across [wavelength-bands](./0.0_Intro_Landsat) in our image stacks.\n", + "\n", + "Our example approach uses a version of spectral clustering from [dask_ml](http://ml.dask.org/clustering.html#spectral-clustering) that is a scalable equivalent of what is available in [scikit-learn](https://scikit-learn.org/stable/modules/clustering.html#spectral-clustering). We will begin this approach with a single image stack and then conduct a direct comparison on the results from different time points.\n", + "\n", + "This workflow uses data from Microsoft Planetary Computer but it can be adapted to work with any data ingestion approach from this cookbook." + ] + }, + { + "cell_type": "markdown", + "id": "f2d4e405-956d-4400-86f5-219664ef7b79", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "\n", + "| Concepts | Importance | Notes |\n", + "| --- | --- | --- |\n", + "| [Data Ingestion - Planetary Computer](1.0_Data_Ingestion-Planetary_Computer.ipynb) | Necessary | |\n", + "|[scikit-learn](https://scikit-learn.org/stable/modules/clustering.html#spectral-clustering) | Helpful | Spectral clustering |\n", + "| [dask_ml](http://ml.dask.org/clustering.html#spectral-clustering) | Helpful | Spectral clustering at scale | \n", + "\n", + "\n", + "- **Time to learn**: 20 minutes." + ] + }, + { + "cell_type": "markdown", + "id": "dd99cc55-fb0f-4bdf-bc7a-82044188a2f2", + "metadata": {}, + "source": [ + "## Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e106b4ce-e682-4a71-817b-966bbf926989", + "metadata": {}, + "outputs": [], + "source": [ + "# Data\n", + "import numpy as np\n", + "import odc.stac\n", + "import pandas as pd\n", + "import planetary_computer\n", + "import pystac_client\n", + "import xarray as xr\n", + "from dask.distributed import Client\n", + "from pystac.extensions.eo import EOExtension as eo\n", + "\n", + "# Analysis\n", + "from dask_ml.cluster import SpectralClustering\n", + "\n", + "# Viz\n", + "import hvplot.xarray" + ] + }, + { + "cell_type": "markdown", + "id": "2eef62e1-2df6-4e67-9e10-92c9bf74c136", + "metadata": {}, + "source": [ + "## Loading Data" + ] + }, + { + "cell_type": "markdown", + "id": "41855f82-661f-4c02-9c48-6893a694bfe4", + "metadata": {}, + "source": [ + "Let's start by loading some Landsat data. These steps are covered in the [Data Ingestion - Planetary Computer](1.0_Data_Ingestion-Planetary_Computer.ipynb) prerequisite." + ] + }, + { + "cell_type": "markdown", + "id": "5dbc14ea-a560-4b01-81c0-4fc01f767de9", + "metadata": {}, + "source": [ + "### Search the catalog" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5059488-8f7e-446f-97d4-992eab1e7928", + "metadata": {}, + "outputs": [], + "source": [ + "catalog = pystac_client.Client.open(\n", + " \"https://planetarycomputer.microsoft.com/api/stac/v1\",\n", + " modifier=planetary_computer.sign_inplace,\n", + ")\n", + "\n", + "bbox = [-118.89, 38.54, -118.57, 38.84] # Region over a lake in Nevada, USA\n", + "datetime = \"2017-06-01/2017-09-30\" # Summer months of 2017\n", + "collection = \"landsat-c2-l2\"\n", + "platform = \"landsat-8\"\n", + "cloudy_less_than = 1 # percent\n", + "\n", + "search = catalog.search(\n", + " collections=[\"landsat-c2-l2\"],\n", + " bbox=bbox,\n", + " datetime=datetime,\n", + " query={\"eo:cloud_cover\": {\"lt\": cloudy_less_than}, \"platform\": {\"in\": [platform]}},\n", + ")\n", + "items = search.get_all_items()\n", + "print(f\"Returned {len(items)} Items:\")\n", + "[[i, item.id] for i, item in enumerate(items)]" + ] + }, + { + "cell_type": "markdown", + "id": "e89ef2a4-8b5d-4799-a529-3d9adbc61a89", + "metadata": {}, + "source": [ + "### Load a dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4fbc1c60-ab38-49b8-beb2-167dc4b6f298", + "metadata": {}, + "outputs": [], + "source": [ + "item = items[1] # select one of the results" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d5f6b4f4-80aa-4e82-bc9b-bd70b6d63d49", + "metadata": {}, + "outputs": [], + "source": [ + "assets = []\n", + "for _, asset in item.assets.items():\n", + " try:\n", + " assets.append(asset.extra_fields[\"eo:bands\"][0])\n", + " except:\n", + " pass\n", + "\n", + "cols_ordered = [\n", + " \"common_name\",\n", + " \"description\",\n", + " \"name\",\n", + " \"center_wavelength\",\n", + " \"full_width_half_max\",\n", + "]\n", + "bands = pd.DataFrame.from_dict(assets)[cols_ordered]\n", + "bands" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d17bcf9e-116c-4471-90ca-754a3b3fb305", + "metadata": {}, + "outputs": [], + "source": [ + "ds_2017 = odc.stac.stac_load(\n", + " [item],\n", + " bands=bands.common_name.values,\n", + " bbox=bbox,\n", + " chunks={}, # <-- use Dask\n", + ").isel(time=0)" + ] + }, + { + "cell_type": "markdown", + "id": "03e8f2cb-31ef-4e42-a16f-ffa7f2e79d78", + "metadata": {}, + "source": [ + "### Retain CRS Attribute" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a8f8e88f-a517-4a13-b823-e8dff53b0e47", + "metadata": {}, + "outputs": [], + "source": [ + "epsg = item.properties[\"proj:epsg\"]\n", + "ds_2017.attrs[\"crs\"] = f\"epsg:{epsg}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7b7d3201-bb41-48d0-90b7-565c2e454f9c", + "metadata": {}, + "outputs": [], + "source": [ + "da_2017 = ds_2017.to_array(dim=\"band\")\n", + "da_2017" + ] + }, + { + "cell_type": "markdown", + "id": "865eb3d1-59a9-401a-b1f1-7588bb5c1700", + "metadata": {}, + "source": [ + "## Reshaping Data\n", + "\n", + "The shape of our data is currently `n_bands`, `n_y`, `n_x`. In order for dask-ml / scikit-learn to consume our data, we'll need to reshape our image stacks into `n_samples, n_features`, where `n_features` is the number of wavelength-bands and `n_samples` is the total number of pixels in each wavelength-band image. Essentially, we'll be creating a vector of pixels out of each image, where each pixel has multiple features (bands), but the ordering of the pixels is no longer relevant to the computation. " + ] + }, + { + "cell_type": "markdown", + "id": "042bfffb-c979-4958-9086-646a83918d61", + "metadata": {}, + "source": [ + "By using xarray methods to flatten the data, we can keep track of the coordinate labels 'x' and 'y' along the way. This means that we have the ability to reshape back to our original array at any time with no information loss!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "653adf8b-da16-4eb6-ae64-d172c8eae75f", + "metadata": {}, + "outputs": [], + "source": [ + "flattened_xda = da_2017.stack(z=(\"x\", \"y\")) # flatten each band\n", + "flattened_t_xda = flattened_xda.transpose(\"z\", \"band\")\n", + "flattened_t_xda" + ] + }, + { + "cell_type": "markdown", + "id": "e427b1dd-08c3-4657-84e7-09b0269edad9", + "metadata": {}, + "source": [ + "## Standardize Data\n", + "\n", + "Now that we have the data in the correct shape, let's standardize (or rescale) the values of the data. We do this to get all the flattened image vectors onto a common scale while preserving the differences in the ranges of values. Again, we'll demonstrate doing this first in NumPy and then xarray." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8cd5c3c-1256-400a-9cea-32debd51ec4d", + "metadata": {}, + "outputs": [], + "source": [ + "with xr.set_options(keep_attrs=True):\n", + " rescaled_xda = (flattened_t_xda - flattened_t_xda.mean()) / flattened_t_xda.std()\n", + "rescaled_xda" + ] + }, + { + "cell_type": "markdown", + "id": "37a47e0b-7f74-404f-8f16-bf80c3bb1995", + "metadata": {}, + "source": [ + "
\n", + "

Info

\n", + " Above, we are using a context manager \"with xr.set_options(keep_attrs=True):\" to retain the array's attributes through the operations. That is, we want any metadata like 'crs' to stay with our result so we can use 'geo=True' in our plotting.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "27d69080-eff4-471e-9536-02f99420bd0a", + "metadata": {}, + "source": [ + "As `rescaled_xda` is still a Dask object, if we wanted to actually run the rescaling at this point (provided that all the data can fit into memory), we would use `rescaled_xda.compute()`." + ] + }, + { + "cell_type": "markdown", + "id": "07b01093-da37-419c-93f2-9bba60be1578", + "metadata": {}, + "source": [ + "\n", + "## ML pipeline\n", + "Now that our data is in the proper shape and value range, we are ready to conduct spectral clustering. Here we will use a version of [spectral clustering from dask_ml](https://ml.dask.org/modules/generated/dask_ml.cluster.SpectralClustering.html) that is a scalable equivalent to operations from Scikit-learn that cluster pixels based on similarity (across all wavelength-bands, which makes it spectral clustering by spectra!)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "62075e17-55d1-470c-8df6-55be6ab895bd", + "metadata": {}, + "outputs": [], + "source": [ + "client = Client(processes=False)\n", + "client" + ] + }, + { + "cell_type": "markdown", + "id": "2d68c9d8-9b46-4123-8ab1-cef625ad9ea6", + "metadata": {}, + "source": [ + "Now we will compute and persist the rescaled data to feed into the ML pipeline. Notice that our `X` matrix below has the shape: `n_samples, n_features` as discussed earlier. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "083a3ea0-1ae2-42d0-ac83-31d66a01b11d", + "metadata": {}, + "outputs": [], + "source": [ + "X = client.persist(rescaled_xda)\n", + "X.shape" + ] + }, + { + "cell_type": "markdown", + "id": "c0f22024-87fd-49d2-8e06-6e6eff714708", + "metadata": {}, + "source": [ + "First we will set up the model with the number of clusters, and other options." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4afcf563-8d46-4b35-9936-3d330e226d26", + "metadata": {}, + "outputs": [], + "source": [ + "clf = SpectralClustering(\n", + " n_clusters=4,\n", + " random_state=0,\n", + " gamma=None,\n", + " kmeans_params={\"init_max_iter\": 5},\n", + " persist_embedding=True,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "0666c6af-b948-41ed-a789-7f7cae9459f0", + "metadata": {}, + "source": [ + "**This next step is the slow part.** We'll fit the model to our matrix `X`. Depending on your setup, it could take seconds to minutes to run depending on the size of our data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "372c890c-8ce7-4ce8-958b-43909c3de0e9", + "metadata": {}, + "outputs": [], + "source": [ + "%time clf.fit(X)" + ] + }, + { + "cell_type": "markdown", + "id": "60e746bd-5aa6-436a-84de-3b93e1bed1dd", + "metadata": {}, + "source": [ + "Let's check the shape of the result:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "73e9af39-dadf-4956-b23e-5dc3feb65c56", + "metadata": {}, + "outputs": [], + "source": [ + "labels = clf.assign_labels_.labels_.compute()\n", + "labels.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f9e1a56c-0d6e-4c89-bbe8-0c8d9d0dd05f", + "metadata": {}, + "outputs": [], + "source": [ + "labels" + ] + }, + { + "cell_type": "markdown", + "id": "e8e59036-e8c0-4a0e-8d78-9037509b8c90", + "metadata": {}, + "source": [ + "The result is a single vector of cluster labels." + ] + }, + { + "cell_type": "markdown", + "id": "084207cc-3e00-419e-a61d-300ad0868a6e", + "metadata": { + "tags": [] + }, + "source": [ + "## Un-flattening\n", + "\n", + "Once the computation is done, we can use the coordinates of our input array to restack our output array back into an image. Again, one of the main benefits of using `xarray` for this stacking and unstacking is that it keeps track of the coordinate information for us. " + ] + }, + { + "cell_type": "markdown", + "id": "625ae4c4-64cd-46fd-b6ec-43810c8be359", + "metadata": {}, + "source": [ + "Since the original array is n_samples by n_features (90000, 6) and the cluster label output is (90000,), we just need the coordinates from one of the original features in the shape of n_samples. We can just copy the coordinates from the first input feature and populate is with our output data:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "642cf283-64d8-4c9e-91c3-32be8ec35c4b", + "metadata": {}, + "outputs": [], + "source": [ + "template = flattened_t_xda[:, 0]\n", + "output_array = template.copy(data=labels)\n", + "output_array" + ] + }, + { + "cell_type": "markdown", + "id": "d321a58f-9617-4540-98c7-8a7bccd414ae", + "metadata": {}, + "source": [ + "With this new output array with coordinates copied from the input array, we can unstack back to the original `x` and `y` image dimensions by just using `.unstack()`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d16ec3f0-c6dc-470b-9e56-7c3540066af7", + "metadata": {}, + "outputs": [], + "source": [ + "unstacked_2017 = output_array.unstack()\n", + "unstacked_2017" + ] + }, + { + "cell_type": "markdown", + "id": "588814fe-3776-4ef5-9392-dd60e6e6776c", + "metadata": {}, + "source": [ + "Finally, we can visualize the results! By hovering over the resulting imge, we can see that the lake water has been clustered with a certain label or 'value'." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a9e6863f-0cec-436b-bdcc-9aa533e6df9f", + "metadata": {}, + "outputs": [], + "source": [ + "raw_plot_2017 = da_2017.sel(band=\"red\").hvplot.image(\n", + " x=\"x\", y=\"y\", geo=True, xlabel=\"lon\", ylabel=\"lat\", datashade=True, cmap=\"greys\", title=\"Raw Image 2017\",\n", + ")\n", + "\n", + "result_plot_2017 = unstacked_2017.hvplot(\n", + " x=\"x\", y=\"y\", cmap=\"Set3\", geo=True, xlabel=\"lon\", ylabel=\"lat\", colorbar=False, title=\"Spectral Clustering 2017\",\n", + ")\n", + "\n", + "raw_plot_2017 + result_plot_2017" + ] + }, + { + "cell_type": "markdown", + "id": "ba47bdd3-995b-4b98-8cf0-efeea65688da", + "metadata": {}, + "source": [ + "## Spectral Clustering for 1988" + ] + }, + { + "cell_type": "markdown", + "id": "bea4b2b5-5e74-4f8c-b171-89a24e57b8ae", + "metadata": {}, + "source": [ + "We have conducted the spectral clustering for 2017 and now we want to compare this result to the lake in 1988. Let's load data from 1988 and run the same analysis as above." + ] + }, + { + "cell_type": "markdown", + "id": "f4d284d4-23b9-4c77-9275-58aeb9c6e563", + "metadata": {}, + "source": [ + "We will use the same catalog, but we will search it for a different point in time and different Landsat mission" + ] + }, + { + "cell_type": "markdown", + "id": "dcb18049-fb4c-44e4-9d72-90bd24094bd7", + "metadata": {}, + "source": [ + "### Load the data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e00ab9e6-b36b-47ec-a5a3-4b6bbf6a9ca6", + "metadata": {}, + "outputs": [], + "source": [ + "bbox = [-118.89, 38.54, -118.57, 38.84] # Region over a lake in Nevada, USA\n", + "datetime = \"1988-06-01/1988-09-30\" # Summer months of 1988\n", + "collection = \"landsat-c2-l2\"\n", + "platform = \"landsat-5\" # Searching through an earlier landsat mission\n", + "cloudy_less_than = 1 # percent\n", + "\n", + "search = catalog.search(\n", + " collections=[\"landsat-c2-l2\"],\n", + " bbox=bbox,\n", + " datetime=datetime,\n", + " query={\"eo:cloud_cover\": {\"lt\": cloudy_less_than}, \"platform\": {\"in\": [platform]}},\n", + ")\n", + "\n", + "items = search.get_all_items()\n", + "item = items[1] # select one of the results" + ] + }, + { + "cell_type": "markdown", + "id": "1428fff4-fc72-4692-9c20-e9e168337fc2", + "metadata": {}, + "source": [ + "Notice that Landsat 5 data from 1988 has slightly different spectra than Landsat 8 from 2017. Details like this are important to keep in mind when performing analyses that directly compare across missions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "665e250d-6790-402d-b877-bbe967556089", + "metadata": {}, + "outputs": [], + "source": [ + "assets = []\n", + "for _, asset in item.assets.items():\n", + " try:\n", + " assets.append(asset.extra_fields[\"eo:bands\"][0])\n", + " except:\n", + " pass\n", + "\n", + "cols_ordered = [\n", + " \"common_name\",\n", + " \"description\",\n", + " \"name\",\n", + " \"center_wavelength\",\n", + " \"full_width_half_max\",\n", + "]\n", + "bands = pd.DataFrame.from_dict(assets)[cols_ordered]\n", + "bands" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cef800e2-55f5-4777-90a5-63d466ce5f6f", + "metadata": {}, + "outputs": [], + "source": [ + "ds_1988 = odc.stac.stac_load(\n", + " [item],\n", + " bands=bands.common_name.values,\n", + " bbox=bbox,\n", + " chunks={}, # <-- use Dask\n", + ").isel(time=0)\n", + "\n", + "epsg = item.properties[\"proj:epsg\"]\n", + "ds_1988.attrs[\"crs\"] = f\"epsg:{epsg}\"\n", + "\n", + "da_1988 = ds_1988.to_array(dim=\"band\")\n", + "da_1988" + ] + }, + { + "cell_type": "markdown", + "id": "057bd00b-193a-45b1-97ac-62a77fa0a322", + "metadata": {}, + "source": [ + "### Reshape and Standardize" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ceb4ee87-9a8b-4278-9721-c2af8a56f6b8", + "metadata": {}, + "outputs": [], + "source": [ + "flattened_xda = da_1988.stack(z=(\"x\", \"y\"))\n", + "flattened_t_xda = flattened_xda.transpose(\"z\", \"band\")\n", + "with xr.set_options(keep_attrs=True):\n", + " rescaled_xda = (flattened_t_xda - flattened_t_xda.mean()) / flattened_t_xda.std()\n", + "rescaled_xda" + ] + }, + { + "cell_type": "markdown", + "id": "61bf4b05-3266-4bbd-b618-f4c01b56eccd", + "metadata": {}, + "source": [ + "### Spectral Clustering" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4a4a50c6-f6dd-46c2-afb3-95196b341e21", + "metadata": {}, + "outputs": [], + "source": [ + "X = client.persist(rescaled_xda)\n", + "clf = SpectralClustering(\n", + " n_clusters=4,\n", + " random_state=0,\n", + " gamma=None,\n", + " kmeans_params={\"init_max_iter\": 5},\n", + " persist_embedding=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "229368ab-ed3a-4fc1-b328-d84b7e7ebebb", + "metadata": {}, + "outputs": [], + "source": [ + "%time clf.fit(X)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ec353001-4e0e-44af-b38d-f602f5a1dab8", + "metadata": {}, + "outputs": [], + "source": [ + "labels = clf.assign_labels_.labels_.compute()\n", + "labels.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "937d7db4-b848-499b-8da2-8d00287b5d81", + "metadata": {}, + "outputs": [], + "source": [ + "labels" + ] + }, + { + "cell_type": "markdown", + "id": "1c690262-bb13-48cd-9e24-d3be8ed8fb91", + "metadata": {}, + "source": [ + "### Unstack and Visualize" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba40cfc1-2da7-4be2-80f8-4bd0838dd64a", + "metadata": {}, + "outputs": [], + "source": [ + "template = flattened_t_xda[:, 0]\n", + "output_array = template.copy(data=labels)\n", + "unstacked_1988 = output_array.unstack()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b3be0319-7f51-4384-befe-f0afcd2fecff", + "metadata": {}, + "outputs": [], + "source": [ + "unstacked_1988" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d41ed729-af6c-4817-aad1-154671e743c5", + "metadata": {}, + "outputs": [], + "source": [ + "raw_plot_1988 = da_1988.sel(band=\"red\").hvplot.image(\n", + " x=\"x\", y=\"y\", geo=True, xlabel=\"lon\", ylabel=\"lat\", datashade=True, cmap=\"greys\", title=\"Raw 1988\"\n", + ")\n", + "\n", + "result_plot_1988 = unstacked_1988.hvplot(\n", + " x=\"x\", y=\"y\", cmap=\"Set3\", geo=True, xlabel=\"lon\", ylabel=\"lat\", colorbar=False, title=\"Spectral Clustering 1988\",\n", + ")\n", + "\n", + "raw_plot_1988 + result_plot_1988" + ] + }, + { + "cell_type": "markdown", + "id": "6ff424f8-b243-4a20-a069-ae45f2490370", + "metadata": {}, + "source": [ + "## Spectral Clustering Over Time" + ] + }, + { + "cell_type": "markdown", + "id": "c872fea9-55bc-4c60-8979-91796ef649a9", + "metadata": {}, + "source": [ + "Our hypothesis is that the lake's area is receding over time and so we want to visualize the potential change. Let's first visually compare the plot of the clustering results from the different time points." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b3f7a6d-3651-4324-8bed-a044034e3e67", + "metadata": {}, + "outputs": [], + "source": [ + "result_plot_1988 + result_plot_2017" + ] + }, + { + "cell_type": "markdown", + "id": "4b170f41-3442-4fce-91be-40a41317f38d", + "metadata": {}, + "source": [ + "By hovering over the lake in each image, we can see that the water was labeled ('value') with a certain cluster number in both images. At the time of writing, that cluster label turns out to be `2`. However, it didn't have to turn out to be the same values in each of the images. For instance, it could have been that the water-cluster for the '1988 Labels' was '1' and for the '2017 Labels' it could have been '2' and that would be fine; we would just have adjusted the water cluster label in next step." + ] + }, + { + "cell_type": "markdown", + "id": "3d1aba48-6886-4656-8b2d-964488bcea39", + "metadata": {}, + "source": [ + "
\n", + "

Warning

\n", + " Be sure to check that the label over each lake before proceeding and adjust the values below if necessary.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "0b51d984-1b83-408b-9758-c843e9d8c58e", + "metadata": {}, + "source": [ + "Let's set any value that isn't our water cluster label to 0." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f20df2e1-b66d-4857-a39d-9a16ee50ad22", + "metadata": {}, + "outputs": [], + "source": [ + "water_cluster_1988 = 2\n", + "water_cluster_2017 = 2\n", + "\n", + "water_1988 = unstacked_1988.where(unstacked_1988 == water_cluster_1988, 0)\n", + "water_2017 = unstacked_2017.where(unstacked_2017 == water_cluster_2017, 0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b90cabc5-cac6-4e53-8730-9b808c688104", + "metadata": {}, + "outputs": [], + "source": [ + "water_1988_plot = water_1988.hvplot(\n", + " x=\"x\", y=\"y\", cmap=\"greys\", geo=True, colorbar=False, title=\"1988 Water\"\n", + ")\n", + "\n", + "water_2017_plot = water_2017.hvplot(\n", + " x=\"x\", y=\"y\", cmap=\"greys\", geo=True, colorbar=False, title=\"2017 Water\"\n", + ")\n", + "\n", + "water_1988_plot + water_2017_plot" + ] + }, + { + "cell_type": "markdown", + "id": "313e08b0-bbec-43cf-94fa-a4b8a84e1177", + "metadata": {}, + "source": [ + "Now we can take the difference of these water label arrays to see exactly where the water levels has changed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4af122d1-3044-48cc-a0be-3922839474c1", + "metadata": {}, + "outputs": [], + "source": [ + "with xr.set_options(keep_attrs=True):\n", + " water_diff = water_1988 - water_2017" + ] + }, + { + "cell_type": "markdown", + "id": "d5517bf1-bced-455e-a053-528b47809e0a", + "metadata": {}, + "source": [ + "Red pixels (array value '1') of our image below are where water was lost from 1988 to 2017." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2b9ac483-0240-4ffc-b167-f84eac832a32", + "metadata": {}, + "outputs": [], + "source": [ + "water_diff.hvplot(\n", + " x=\"x\", y=\"y\", cmap='coolwarm', geo=True, xlabel=\"long\", ylabel=\"lat\", colorbar=False, title=\"Water Change 1988-2017\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "1466477f-19b5-4866-940c-5cc52c979b83", + "metadata": {}, + "source": [ + "We did it! We are observing the change in the lake shoreline over time using a simple spectral clustering approach." + ] + }, + { + "cell_type": "markdown", + "id": "d14ad11a-18df-4082-a596-a0be684e75a7", + "metadata": {}, + "source": [ + "Let's finish things off by adding some geo tiles as a background. To only display the colored pixels overlaid on geo tiles, we could either set the array's background value ('0') to 'Not a Number' (NaN), or we could just inform hvPlot that we want the background valued pixels to be transparent with `.redim.nodata(value=0)`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17ad0e8a-c880-4c67-9994-4e1425da2829", + "metadata": {}, + "outputs": [], + "source": [ + "water_diff.hvplot(\n", + " x=\"x\", y=\"y\", width=400, height=400, cmap='coolwarm', geo=True, xlabel=\"lon\", ylabel=\"lat\", alpha=1, colorbar=False, title=\"Water Loss from 1988 to 2017\", tiles=\"ESRI\",\n", + ").redim.nodata(value=0)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "46b24b86-d262-4ce3-8cae-5b4185574dc7", + "metadata": {}, + "source": [ + "___" + ] + }, + { + "cell_type": "markdown", + "id": "072e479f-7b69-495c-83a3-9b77352be007", + "metadata": {}, + "source": [ + "## Summary\n", + "Starting from raw Landsat data, we have used a simple spectral clustering approach to observe the change in a lake water's extent across time.\n", + "\n", + "### What's next?\n", + "Adapt this notebook for your own use case or select another workflow example notebook." + ] + }, + { + "cell_type": "markdown", + "id": "ebee9cbd-ed88-4827-809c-c820b8af509a", + "metadata": {}, + "source": [ + "## Resources and References\n", + "- Authored by Demetris Roumis circa Jan, 2023" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3993488c-4dcf-4613-8c21-b02431112866", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.10" + }, + "vscode": { + "interpreter": { + "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e" + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}