Process and archive complete IWP dataset (high, medium, & low) #6
Data Processing Methods:
To Do:
|
Questions for PDG team: Please provide either answers or links to resources where I can find the following information:
IWP data
and the slightly longer filename, with an inserted
Separate data package for coastal water, inland water, & glacier mask
|
Either option would be fine by me. I suggest you implement the most effective one. As per the description, that would be Option 2. |
Generally, the temporal coverage of the images falls between 2001 and 2021, but the majority are post-2008 or 2010. The spatial coverage of ALL processed files generally falls within the Arctic tundra region and is confined to low-, medium-, and high-ice areas within the tundra. These terms (high, medium, low) are from Brown et al. 1998. |
@ChandiWitharana Thank you for the feedback. The Abstract and Methods were drafted for option 1, rather than option 2. We can split the package to go with option 2 if @mbjones thinks option 2 makes more sense as well. That would mean submitting 2 more tickets for a total of 3 repositories published by Monday. |
I searched for the Brown et al. 1998 paper you mentioned, and found this: https://nsidc.org/sites/default/files/heginbottometal_1993.pdf Please let me know if you are referring to a different publication, and we can include a formal citation for it in the metadata. |
It would be better to use this link: https://nsidc.org/data/ggd318/versions/2 |
From Anna:
|
⭐️ Note ⭐️ @julietcohen: @dvirlar2 pre-issued a DOI for the IWP dataset once it is published: |
@robyngit @julietcohen I'm happy to publish the dataset with the above DOI once things are ready to go, just let me know! Our code has changed a little since Juliet was on the curation team, and I wouldn't want y'all to use old code 🙂 |
@robyngit Thanks for the DOI. For now, I think we could manually configure the DOI to point at a manually-created landing page for the dataset. Once it is published in the ADC, the DOI would then be updated to point at the ADC landing page. Does that sound reasonable? |
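For reference, repointing a pre-issued DOI at an interim landing page amounts to updating the DOI's target URL with the registrar. The sketch below assumes the DataCite REST API; the endpoint, payload shape, DOI, landing-page URL, and credentials are placeholders that should be checked against the DataCite documentation rather than taken as the ADC's actual tooling:

```python
# Minimal sketch: repoint a pre-issued DOI at an interim landing page.
# The DOI, URL, and credentials are placeholders; verify the endpoint and
# payload shape against the DataCite REST API documentation before use.
import requests

doi = "10.18739/XXXXXXX"                  # placeholder, not the real pre-issued DOI
landing_page = "https://example.org/iwp"  # hypothetical interim landing page

resp = requests.put(
    f"https://api.datacite.org/dois/{doi}",
    json={"data": {"type": "dois", "attributes": {"url": landing_page}}},
    headers={"Content-Type": "application/vnd.api+json"},
    auth=("REPOSITORY_ID", "PASSWORD"),   # placeholder credentials
)
resp.raise_for_status()
print(resp.json()["data"]["attributes"]["url"])  # confirm the new target URL
```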
Overview of package entity relationships, with the processing steps we associate with each:

```mermaid
flowchart LR
A[A. Maxar]-->|MAPLE| B(B. IWP Shapefiles)
B --> |Staging| C(C. IWP Geopackages)
C --> |Rasterization| D(D. IWP Geotiffs)
D --> |Web tiling| E[E. IWP PNGs]
C --> |3dTiling| F(F. IWP 3DTiles)
Note[Square boxes\n likely not\n to be archived]
```
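As a generic illustration of the "Web tiling" step in the diagram above (GeoTIFF tiles to PNG web tiles), one tile could be converted roughly as follows. This is not the PDG viz-workflow code; the tile path and the simple grayscale stretch are placeholder assumptions:

```python
# Generic sketch of GeoTIFF -> PNG web tiling; paths and styling are hypothetical.
import numpy as np
import rasterio
from PIL import Image

with rasterio.open("WGS1984Quad/11/330/155.tif") as src:  # hypothetical GeoTIFF tile
    band = src.read(1).astype("float64")

# Stretch the band to 0-255 and save an 8-bit grayscale PNG alongside the GeoTIFF.
lo, hi = np.nanmin(band), np.nanmax(band)
scaled = np.zeros_like(band) if hi == lo else (band - lo) / (hi - lo) * 255.0
scaled = np.nan_to_num(scaled)  # treat any nodata pixels as 0
Image.fromarray(scaled.astype("uint8"), mode="L").save("155.png")
```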
|
For the initial release of the IWP layer we are aiming for mid-July or later, to correspond with other announcements. Since this might not leave sufficient time to get all of the metadata in order, we discussed initially publishing a minimal version of the data package so that at least we have a DOI in place that points to relevant information, in case anyone needs to cite or reference the data. We envisioned that this MVP data package would comprise just 1) citation info, 2) an abstract, and 3) a link to the file tree for downloads. However, more fields are required in order to publish a package on the ADC, so I think the package should contain all the fields that are marked as mandatory in the editor, but exclude all of the entity information for now. I created a test version of this minimal package that is mostly a copy of what @julietcohen already created, but without any files (so no python scripts and no data object descriptions). 📑 The MVP test version is available here. There are some outstanding issues with the metadata we have:
|
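A very rough sketch of what such a stripped-down record could look like, assuming the usual EML skeleton (title, creator, abstract, contact) and leaving out all entity-level metadata; the EML version, packageId, and every value below are placeholders rather than the ADC editor's actual output:

```python
# Rough sketch of a minimal EML record; all values are placeholders.
import xml.etree.ElementTree as ET

EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"  # assumed EML version
ET.register_namespace("eml", EML_NS)

eml = ET.Element(f"{{{EML_NS}}}eml",
                 {"packageId": "urn:uuid:PLACEHOLDER", "system": "knb"})
dataset = ET.SubElement(eml, "dataset")
ET.SubElement(dataset, "title").text = "Ice-wedge polygon detections ... (placeholder title)"

creator = ET.SubElement(dataset, "creator")
creator_name = ET.SubElement(creator, "individualName")
ET.SubElement(creator_name, "givenName").text = "Given"
ET.SubElement(creator_name, "surName").text = "Surname"

abstract = ET.SubElement(dataset, "abstract")
ET.SubElement(abstract, "para").text = (
    "Placeholder abstract, including a link to the file tree for downloads."
)

contact = ET.SubElement(dataset, "contact")
contact_name = ET.SubElement(contact, "individualName")
ET.SubElement(contact_name, "surName").text = "Surname"

ET.ElementTree(eml).write("iwp_minimal_eml.xml", encoding="UTF-8", xml_declaration=True)
```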
High, Medium, and Low ice regions are a categorization adapted from Brown et al. 2002 for image selection and processing purposes. (Brown, J., O. Ferrians, J. A. Heginbottom, and E. Melnikov. (2002). Circum-Arctic Map of Permafrost and Ground-Ice Conditions, Version 2 [Data Set]. Boulder, Colorado USA. National Snow and Ice Data Center. https://doi.org/10.7265/skbg-kf16. Date Accessed 12-08-2022.) |
From the UConn side, the team would be: |
Location: What's given is fine. |
Metadata should be fine |
Thanks @ChandiWitharana! What can we include to define what is meant by "high", "medium", and "low" ice regions? |
In @robyngit 's to-do list above, methods steps 3-5 require a release for the PDG packages. I will take care of this so we can include it in the metadata. |
Preliminary thoughts after viewing the latest version of the package: Title, Abstract, and Keywords:
I have ideas for how to flesh out the methods section, but I'll get to that later when I have more time. |
@dvirlar2 Thank you for the feedback 👍🏼 Regarding your first suggestion: The title does already include "high ice Arctic regions". Would you suggest wording it in a different way or is that sufficient? Regarding your second suggestion: The Sampling section does already include a short description of what "high ice" means and mentions that there are also medium and low ice regions. "The geographic area sampled is the "high ice" regions of the Arctic, which are those the dataset authors identified to contain a relatively high proportion of ice. The study extent encompasses all high ice regions masked for coastal oceans, glaciers, and surface water. Further additions to this dataset will include "medium ice" and "low ice" regions of the Arctic as well. These regions were classified by less ice content." Perhaps this is not sufficient, or we could put it in a different section so it's more obvious? |
I can move the high ice / medium ice / low ice descriptions from the Sampling section to the Abstract, since it seems that is what you are suggesting, given that section is all you had time to review so far. |
The high/med/low ice distinctions are not that critical; they essentially just signal the order in which different spatial regions were processed. It's helpful for people to know that the spatial extent of the dataset will grow over time, but the results in each region are the same, and the divisions between the regions are pretty arbitrary. |
Given Chandi's earlier comment and link to the NSIDC dataset, I found this information about the designations between high/med/low ice. From the user guide:
Medium ice is characterized by 10-20%, and low ice is 0-10%, with no internal breakdown in terrain like the high ice. From the Heginbottom paper, under the "ATDBs" section:
Given the above descriptions, I think it's reasonable to include some combination of the above content to explain the difference between the High, Medium, and Low ice datasets to users. I think these specific descriptions should go in the Sampling Description section of the dataset like Juliet mentioned above, but there should also be a sentence in the abstract mentioning that an explanation is provided further on in the dataset. In the Sampling Description section, we should also link to the NSIDC dataset and provide brief direction for users to view the User Guide and Heginbottom paper for more in-depth information. |
Given my own confusion reading the dataset title earlier in this thread, and my experience of having a harder time distinguishing between datasets with very similar titles, I would recommend changing the title to something along the lines of
That way, the High, Medium, and Low distinctions are more immediately clear to users. Food for thought |
List of Orcid IDs:
Need to verify:
Still need to include, if desired by person:
|
@dvirlar2 The plan is to update the dataset with new version releases to include all of the high, med, and low ice regions, and we plan to do that soon. So I think the title should not include that distinction. A proposed title:
|
The ORCiD Daphne suggested for Amal is correct |
(1) An explanation of the file naming template for shapefiles.
(2) What is the NSF award number (such as "NSF Award 2240912"), and is there any non-NSF funding info?
(3) What are the ORCiDs for Mahendra R. Udawalpola and Amit Hasan?
@mbjones thank you for clarifying that! I got confused between this ticket and our meeting last week on how the datasets were going to be broken up. I agree with the title you proposed 👍🏽 |
I've added @julietcohen 's test version onto the production site. ADC people can view it here. I haven't checked yet who has access to the test version, but I can do that at a later date. From a curation standpoint, I've:
To-Do:
|
The following is the placeholder filename for the dummy shapefile in the package:
After reading the structure provided by @amalshehan, I have a few questions:
|
Regarding Daphne's to-do items above:
I would also add Kastan Day to the list of dataset contributors for the geopackages and rasters, and Anna Liljedahl. |
Regarding the discipline choices, let's ask @amalshehan and @ChandiWitharana to review the proposal, with a pointer to the ADCAD vocabulary for choices.
To Do for the IWP metadata package:
|
Access to dataset added via ORCIDs:
I also added Chandi to the list of editors. His ORCID is already in the dataset. I'll follow up with Howard and Ronald. Also, I'd like to re-emphasize that if any of the above people should be listed in the dataset citation, they should be listed under the Data Set Creator section, and not under "people and associated parties" 🙂 If not, then no worries |
Thanks Daphne! I emailed Kastan to confirm that's his ORCiD. You are correct about distinguishing between the Data Set Creator section and the "people and associated parties", sorry to cause confusion there. |
ORCID Updates:
|
Kastan also confirmed that the ORCiD listed above is his |
@dvirlar2 Based on my discussions with @ChandiWitharana, we would like to point to the PGC data docs for extra details on file naming, as the original data was acquired from PGC and the names were maintained as-is. If you think that we should document (archive) this, I can respond to the specific details you requested above. The PGC data doc I am referring to is this PDF: PGC Commercial Satellite Imagery Documentation (umn.edu)
There is no difference. The original time stamp is given by the vendor, and the acquisition time stamp is added by PGC.
Georectified images are corrected for any geometric distortions that may be present in the original/standard image due to the approach used to acquire the image.
No |
@dvirlar2, it would be good to also have Earth Science, Computer Vision, Geo AI, and Big Data. |
Should verify at some point how Torre Jorgenson wants to be identified in the dataset. Seems he goes by Torre among peers, but is professionally known as Mark. For now I'm putting him down as "M. Torre" in this dataset, and including information based on this recent dataset |
The dataset has been finalized from my POV, and I've sent it to Matt and Juliet to review before sending it off to others. You can view things here: Also, I thought I had sent the Academic Ontology for the dataset annotations, but I see that I did not! @amalshehan For context, this ontology is where we pull our "dataset annotations" from. Earlier I had mentioned cryology, soil science, and data science as possible choices. I ended up going with data science and cryology based on your earlier comments! Let me know if you have any questions 🙂 |
@julietcohen I rearranged the IWP dataset to streamline the directory structure as we discussed. Here's what I did, and the final file layout:

```bash
cd /var/data/10.18739/A2KW57K57/
cd iwp_geopackage_high/
# Promote the geopackage tile tree and the staging summary up out of staged/
mv staged/gpub020/WGS1984Quad .
mv staged/staging_summary.csv .
# Move what is left of staged/ out of the dataset directory
mv staged /var/data/submission/pdg/ice-wedge-polygon-data/
cd ../iwp_geotiff_high/
# Flatten the geotiff/ subdirectory, then remove the now-empty directory
mv geotiff/WGS1984Quad .
mv geotiff/raster_events.csv .
mv geotiff/raster_summary.csv .
mv geotiff/raster_summary_duplicate.csv .
rmdir geotiff
cd ..
tree -L 2 .
.
├── cleaning_materials
│ ├── add_date_attribute_footprints.py
│ └── cleaning_data
├── iwp_geopackage_high
│ ├── staging_summary.csv
│ └── WGS1984Quad
├── iwp_geotiff_high
│ ├── raster_events.csv
│ ├── raster_summary.csv
│ ├── raster_summary_duplicate.csv
│ └── WGS1984Quad
├── iwp_shapefile_detections
│ ├── high
│ ├── low
│ └── medium
└── iwp_shapefile_footprints
├── high
├── low
└── medium
```

I also revised the Mermaid diagram to reflect these changes, and worked a bit on the wording in that diagram:

```mermaid
flowchart LR
A["Maxar <br> (satellite images)"] -->|MAPLE| B("`**/iwp_shapefile_detections/**
Format: Shapefile
Irregularly shaped vector files, one per image`")
B -->|Create Tiles and <br> Identify Duplicates| C("`**/iwp_geopackage_high/**
Format: GeoPackage
Evenly-spaced vector tiles, with duplicates flagged`")
C -->|Rasterize and remove <br> flagged duplicates| D("`**/iwp_geotiff_high/**
Format: GeoTIFF
Evenly-spaced raster tiles, with duplicates removed`")
```
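As a generic illustration of the "Rasterize and remove flagged duplicates" step in the diagram above: read one GeoPackage tile, drop rows carrying a duplicate flag, and burn the remaining polygons into a GeoTIFF. This is not the actual PDG viz-workflow code; the file paths, the duplicate-flag column name, the tile size, and the burn value are all assumptions for illustration:

```python
# Generic sketch of GeoPackage tile -> GeoTIFF tile with duplicates dropped.
import geopandas as gpd
import rasterio
from rasterio import features
from rasterio.transform import from_bounds

gdf = gpd.read_file("iwp_geopackage_high/WGS1984Quad/11/330/155.gpkg")  # hypothetical tile
gdf = gdf[~gdf["staging_duplicated"]]  # assumed name of the duplicate-flag column

width = height = 256  # assumed pixels per tile
transform = from_bounds(*gdf.total_bounds, width, height)

# Burn a value of 1 into every pixel covered by an ice-wedge polygon.
raster = features.rasterize(
    ((geom, 1) for geom in gdf.geometry),
    out_shape=(height, width),
    transform=transform,
    fill=0,
    dtype="uint8",
)

with rasterio.open(
    "iwp_geotiff_high/WGS1984Quad/11/330/155.tif", "w",
    driver="GTiff", height=height, width=width, count=1,
    dtype="uint8", crs=gdf.crs, transform=transform,
) as dst:
    dst.write(raster, 1)
```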
|
Edits for next release:
|
@dvirlar2 For the IWP mapping the HPC resources used are from TACC allocation DPP20001 and ACCESS allocation DPP190001. Do you need any other details such as the specific systems used? |
Kenton provided the following to help fill in the ACCESS / TACC grant info:
National Science Foundation - Leadership Resource Allocation (LRAC): Harnessing big satel-
National Science Foundation - ACCESS Explore: Permafrost Discovery Gateway Pan-Arctic
And based on the format of the above ACCESS awards, the new allocation info is:
National Science Foundation - ACCESS Discover: Permafrost Discovery Gateway Pan-Arctic |
More info from Kenton, the IBM acknowledgement:
IBM-Illinois Discovery Accelerator Institute - Scaling Data-Intensive Discovery Workflows on
IBM-Illinois Discovery Accelerator Institute - HDC: A Full-Stack Solution for the Hybrid Cloud |
Thanks @amalshehan and @julietcohen! I think that's all the info I need, but I'll let you know if that changes. |
Since this issue has been stagnant for some time, an update: one run on Delta processed the high ice (more than half the data), and the other run processed the low and medium ice, so deduplication between those 2 tilesets was not executed. This is because the merging step executes deduplication for gpkg files that were staged on different nodes. Because merging so many files takes days and depletes our Delta credits, it would be best to finish developing the kubernetes and parsl workflow to run on the NCEAS server (or another server, such as Google Cloud Platform) so we can take advantage of fast and powerful hardware without the run time, credit, and memory limitations we experience on Delta. Tickets that describe the progress of the kubernetes workflow are documented in the viz-workflow repo. |
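As a tiny sketch of the parsl side of that idea (not the actual viz-workflow code): wrap the per-file staging step in a parsl app so files run concurrently, and swap the executor configuration (local threads here; a Kubernetes or HPC provider in production) without changing the workflow logic. The staging function body and input paths below are placeholders:

```python
# Minimal parsl sketch: run the per-file staging step concurrently.
import parsl
from parsl import python_app
from parsl.configs.local_threads import config  # illustration only; swap for a k8s/HPC config

parsl.load(config)

@python_app
def stage_one(shapefile_path):
    # Placeholder for the real staging step (shapefile -> geopackage tiles).
    return f"staged {shapefile_path}"

shapefiles = [
    "iwp_shapefile_detections/high/example_a.shp",  # hypothetical inputs
    "iwp_shapefile_detections/high/example_b.shp",
]

futures = [stage_one(p) for p in shapefiles]
print([f.result() for f in futures])  # block until all staging tasks finish
```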
- Also make minor changes to the clip_to_footprint deduplication method. Relates to #6
@ChandiWitharana and Elias, I'd like your opinions regarding how to archive the PDG data and metadata for the IWP and water/glacier clipped datasets. Elias processed the .shp files last week, and Kastan is running the workflow to stage, rasterize, and create the web tiles. We're processing both these datasets in advance of NNA (prioritizing the IWP dataset), and we'll have the .shp, .gpkg, and .tif data files archived on the Arctic Data Center.

Matt suggested 2 ways to archive the data:
Which option is best?
As I document each file type, I'll be checking in with you about metadata and authorship questions.