Meeting agenda, notes and actions for 2024-05-29 at 12 noon ET

Hackpad link

Organizer : Daniel Wheeler

Attendees :

  • Daniel Wheeler (he/him)
  • Hafiz Noman
  • Olga Wodo
  • Marvin Tegeler
  • Steve DeWitt (he/him)
  • Katsuyo Thornton
  • David Montiel

Links

Agenda

  1. Any questions or items to raise for discussion (please add)
  2. Reminders
    • Next office hours:
      • 2024-06-07, Friday, 11AM ET
      • 2024-06-21, Friday, 11AM ET
    • Next WG meet
      • 2024-06-26, Wednesday, 12-noon ET
  3. Summer student
    • Austyn Nguyen will be working with me on the schema design with RO-Crate and implementing an example for PFHub. He'll be joining our meetings.
  4. Think about rotating schema section editing
  5. Developing a FAIR-compliant Metadata Standard for Phase Field Data using Semantic Web Resources, link on overleaf
    • Does everyone have access?
  6. Computational execution mindmap, see below
    • Divide between persistent and varying metadata
  7. RO-Crate
    • During office hour on 2024-04-12 Michael Selzer strongly advocated for RO-Crate
    • ELN Consortia are using it
    • Based on schema.org
    • In its simplest form an RO-Crate describes a directory structure
    • SWC style lesson
  8. RO-Crate example, see below
  9. Workflow Run RO-Crate
  10. Failed to set up the literature review on Google Docs. Sorry!
    • I have the lit review in personal notes and will try to copy it over for the next meeting
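
Item 7 notes that in its simplest form an RO-Crate describes a directory structure. A minimal sketch of that idea, using only the Python standard library rather than any RO-Crate tooling: it walks a directory and writes one File entity per file. The helper name describe_directory and the demo file contents are placeholders, and a complete crate would also carry root dataset properties such as a description and a license.

```python
import json
import tempfile
from pathlib import Path


def describe_directory(root: Path) -> dict:
    """Build bare-bones RO-Crate 1.1 metadata listing every file under root."""
    files = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name != "ro-crate-metadata.json"
    )
    graph = [
        # The metadata descriptor points at the root dataset.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        },
        # The root dataset lists each file as a part.
        {
            "@id": "./",
            "@type": "Dataset",
            "hasPart": [{"@id": f} for f in files],
        },
    ]
    graph.extend({"@id": f, "@type": "File"} for f in files)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Demo on a throwaway directory; file names and contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "pfhub.yaml").write_text("name: demo\n")
    (demo_root / "data").mkdir()
    (demo_root / "data" / "free_energy_1a.csv").write_text("time,free_energy\n")
    metadata = describe_directory(demo_root)
    (demo_root / "ro-crate-metadata.json").write_text(json.dumps(metadata, indent=4))

print([entity["@id"] for entity in metadata["@graph"]])
```

Everything beyond this (people, licenses, workflows) is extra entities added to the same @graph list.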

Notes / Action Items

  • Include lit review notes on Google Docs
  • Hafiz: how inputs and outputs are related

Computational Execution Mindmap

[Image: computational execution mindmap]

RO-Crate Example

Using Python to create an RO-Crate

I used the tests in ro-crate-py (imported as rocrate) to figure out the code below. Documentation is poor.

Example of capturing software tools

from rocrate.rocrate import ROCrate
from rocrate.model.person import Person
from rocrate.model.entity import Entity

crate = ROCrate()

yaml = crate.add_file("working/pfhub.yaml", properties={
    "name": "PFHub meta data file",
    "encodingFormat": "text/yaml"
})
csv = crate.add_file("working/free_energy_1a.csv", properties={
    "name": "Free Energy",
    "encodingFormat": "text/csv"
})

license_id = "https://spdx.org/licenses/CC0-1.0"
wheeler_id = "https://orcid.org/0000-0002-2653-7418"
keller_id = "https://orcid.org/0000-0002-2920-8302"



wheeler = crate.add(
    Person(
        crate,
        wheeler_id,
        properties=dict(name="Daniel Wheeler", affiliation="NIST")
    )
)

license = crate.add(Entity(
    crate,
    identifier=license_id,
    properties={
        "@type": "CreativeWork",
        "name": "CC0-1.0",
        "description": "Creative Commons Zero v1.0 Universal",
        "url": "https://creativecommons.org/publicdomain/zero/1.0/"
    }
    )
)

crate.license = license
crate.root_dataset["author"] = wheeler
crate.description = "An example of generating an ro-crate from a PFHub result, for now this is only focused on the computational platform, environment and implementation"

# Property taken from the WorkflowHub metadata list: https://about.workflowhub.eu/docs/metadata-list/
crate.root_dataset["title"] = "PFHub title: fipy_1a_travis"
keller = crate.add(
    Person(
        crate,
        keller_id,
        properties=dict(name="Trevor Keller", affiliation="NIST")
    )
)
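# add_workflow registers the entity with the crate and returns it, so no
# separate crate.add(workflow) call is needed. Its default language is CWL,
# which is why a CWL ComputerLanguage entity appears in the JSON output.
workflow = crate.add_workflow('https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/blob/main/periodic/cahn-hilliard.py')
workflow.programmingLanguage = "Python 3.10"
workflow["creator"] = keller
workflow["dateCreated"] = "2017-01-09"
crate.write("exp_crate")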

RO-Crate JSON file

{
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "./",
            "@type": "Dataset",
            "author": {
                "@id": "https://orcid.org/0000-0002-2653-7418"
            },
            "datePublished": "2024-04-19T20:44:03+00:00",
            "description": "An example of generating an ro-crate from a PFHub result, for now this is only focused on the computational platform, environment and implementation",
            "hasPart": [
                {
                    "@id": "pfhub.yaml"
                },
                {
                    "@id": "free_energy_1a.csv"
                },
                {
                    "@id": "https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/blob/main/periodic/cahn-hilliard.py"
                }
            ],
            "license": {
                "@id": "https://spdx.org/licenses/CC0-1.0"
            },
            "title": "PFHub title: fipy_1a_travis"
        },
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {
                "@id": "./"
            },
            "conformsTo": {
                "@id": "https://w3id.org/ro/crate/1.1"
            }
        },
        {
            "@id": "pfhub.yaml",
            "@type": "File",
            "encodingFormat": "text/yaml",
            "name": "PFHub meta data file"
        },
        {
            "@id": "free_energy_1a.csv",
            "@type": "File",
            "encodingFormat": "text/csv",
            "name": "Free Energy"
        },
        {
            "@id": "https://orcid.org/0000-0002-2653-7418",
            "@type": "Person",
            "affiliation": "NIST",
            "name": "Daniel Wheeler"
        },
        {
            "@id": "https://spdx.org/licenses/CC0-1.0",
            "@type": "CreativeWork",
            "description": "Creative Commons Zero v1.0 Universal",
            "name": "CC0-1.0",
            "url": "https://creativecommons.org/publicdomain/zero/1.0/"
        },
        {
            "@id": "https://orcid.org/0000-0002-2920-8302",
            "@type": "Person",
            "affiliation": "NIST",
            "name": "Trevor Keller"
        },
        {
            "@id": "https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/blob/main/periodic/cahn-hilliard.py",
            "@type": [
                "File",
                "SoftwareSourceCode",
                "ComputationalWorkflow"
            ],
            "creator": {
                "@id": "https://orcid.org/0000-0002-2920-8302"
            },
            "dateCreated": "2017-01-09",
            "name": "https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark/blob/main/periodic/cahn-hilliard",
            "programmingLanguage": "Python 3.10"
        },
        {
            "@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl",
            "@type": "ComputerLanguage",
            "alternateName": "CWL",
            "identifier": {
                "@id": "https://w3id.org/cwl/"
            },
            "name": "Common Workflow Language",
            "url": {
                "@id": "https://www.commonwl.org/"
            }
        }
    ]
}
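
The metadata above is a flat list: entities point at each other with {"@id": ...} stubs rather than nesting. A small sketch of how a reader might follow those references with plain Python, assuming nothing beyond the json module. The helper names index_graph and resolve are hypothetical, and the inlined JSON is a trimmed copy of the file above so the snippet runs on its own.

```python
import json

# Trimmed copy of the metadata above, inlined so the sketch is self-contained.
METADATA = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "./", "@type": "Dataset",
     "author": {"@id": "https://orcid.org/0000-0002-2653-7418"},
     "hasPart": [{"@id": "pfhub.yaml"}, {"@id": "free_energy_1a.csv"}]},
    {"@id": "pfhub.yaml", "@type": "File", "name": "PFHub meta data file"},
    {"@id": "free_energy_1a.csv", "@type": "File", "name": "Free Energy"},
    {"@id": "https://orcid.org/0000-0002-2653-7418", "@type": "Person",
     "name": "Daniel Wheeler"}
  ]
}
""")


def index_graph(metadata):
    """Map each entity's @id to the entity itself."""
    return {entity["@id"]: entity for entity in metadata["@graph"]}


def resolve(value, entities):
    """Follow {"@id": ...} references one level; leave other values alone."""
    if isinstance(value, list):
        return [resolve(item, entities) for item in value]
    if isinstance(value, dict) and set(value) == {"@id"}:
        # Unknown ids (e.g. external URLs with no entity) are returned as-is.
        return entities.get(value["@id"], value)
    return value


entities = index_graph(METADATA)
root = entities["./"]
author = resolve(root["author"], entities)
parts = resolve(root["hasPart"], entities)

print(author["name"])               # Daniel Wheeler
print([p["name"] for p in parts])   # ['PFHub meta data file', 'Free Energy']
```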

Review of "Recording provenance of workflow runs with RO-Crate"

  • Presents Workflow Run RO-Crate (WRROC)
  • 3 new profiles
  • Has concept of retrospective and prospective provenance built into profiles
  • RO-Crate describes a directory structure in its simplest form
  • The Workflow Run RO-Crate is a set of 3 profiles that extends RO-Crate
  • WRROC strikes a balance between actionable and readable. The profiles are types.
  • List of requirements
    • Containers
    • memory usage
    • config files
    • env files
    • timings
    • success / failure status
    • inputs / outputs
    • versioning
    • scripts
    • parameters
  • Each requirement is linked to a GitHub issue!
  • 3 types of workflow run crates
    • Process run crate (describes the execution of one or more tools)
      • includes human executions
      • poorly defined
      • execution of multiple software apps
      • allows "composite" data sets
    • workflow run crate (predefined workflow)
      • well defined
    • provenance run crate (workflow computation including internal details)
      • internal details such as inputs / outputs between steps
  • Designed to have workflows rerun -- reproducible
  • inheritance mechanism allows reuse of common parts of descriptions
  • 7 different workflow systems now using WRROCs (including Galaxy)
  • The Provenance Run Crate profile includes everything in the other two profiles.
  • Supposedly the runcrate toolkit should allow a WRROC to actually be executed?
    • Documentation is poor though
  • Allows data to be described at very different levels of granularity
  • The paper says the runcrate toolkit will be expanded
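
On how inputs and outputs are related: as far as I understand the Process Run Crate profile, a run is recorded as a schema.org CreateAction whose instrument is the tool, object the inputs and result the outputs, with timings and success / failure status on the same entity. A rough sketch of such an entity built as a plain dictionary. The "#run-1" identifier, the timestamps, and the choice of pfhub.yaml as input and free_energy_1a.csv as output are illustrative assumptions, not taken from a real run.

```python
import json

# The script from the example crate, used here as the tool that was run.
TOOL_ID = (
    "https://github.com/usnistgov/FiPy-spinodal-decomposition-benchmark"
    "/blob/main/periodic/cahn-hilliard.py"
)


def process_run_action(tool_id, inputs, outputs, start, end, succeeded):
    """Describe one tool execution as a CreateAction entity."""
    status = "CompletedActionStatus" if succeeded else "FailedActionStatus"
    return {
        "@id": "#run-1",  # hypothetical local identifier
        "@type": "CreateAction",
        "instrument": {"@id": tool_id},            # what was executed
        "object": [{"@id": i} for i in inputs],    # what it consumed
        "result": [{"@id": o} for o in outputs],   # what it produced
        "startTime": start,
        "endTime": end,
        "actionStatus": {"@id": f"http://schema.org/{status}"},
    }


action = process_run_action(
    TOOL_ID,
    inputs=["pfhub.yaml"],              # assumed input, for illustration
    outputs=["free_energy_1a.csv"],     # assumed output, for illustration
    start="2024-04-19T20:00:00+00:00",  # placeholder timestamps
    end="2024-04-19T20:44:03+00:00",
    succeeded=True,
)

print(json.dumps(action, indent=4))
```

This entity would be appended to the @graph of the crate, with the root dataset pointing at it. The exact profile URI to declare in conformsTo depends on the WRROC version, so it is left out here.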