Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial validate support, with --strict option #142

Closed
wants to merge 28 commits into from
Closed
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
ef746b1
Initial validate support, with --warnings option
will-moore Dec 2, 2021
ace592a
Add jsonschema to deps in setup.py
will-moore Dec 2, 2021
7d26157
Copy schemas from ngff. Use LocalRefResolver to load schemas
will-moore Dec 3, 2021
145b938
Add validation tests to test_cli.py
will-moore Dec 3, 2021
4888698
Add jsonschema to requirements-test.txt
will-moore Dec 3, 2021
a20c8aa
Fix jsonschema dependency
will-moore Dec 3, 2021
7520081
Remove .DS_Store files
will-moore Dec 9, 2021
cbd4ee6
Add .DS_Store to .gitignore
will-moore Dec 9, 2021
f0683a6
Use latest schema version by default
will-moore Dec 9, 2021
d093cda
Use logging instead of print
will-moore Dec 9, 2021
33298bd
Move top-level /schemas to ome_zarr/schemas
will-moore Dec 9, 2021
4c28024
remove get_strict_schema(). use get_schema(strict)
will-moore Dec 9, 2021
b388a3a
fix duplicate validate()
will-moore Dec 9, 2021
674ebb7
Import schemas from ngff
will-moore Dec 14, 2021
7b3ab42
Delete schemas
will-moore Dec 14, 2021
e380451
Merge remote-tracking branch 'origin/master' into validate_json_schema
will-moore Mar 22, 2022
d301229
Update to use 'ome_ngff' package
will-moore Mar 22, 2022
c37e116
visit(path: str, func: callable)
will-moore Mar 23, 2022
c1ed016
Remove use of ngff repo. Use cached_path instead
will-moore Jul 1, 2022
9ccebe0
Merge remote-tracking branch 'origin/master' into validate_json_schema
will-moore Jul 1, 2022
e3abc33
Remove ome_ngff dependency
will-moore Jul 1, 2022
5fef946
Add cached_path to dependencies in environment.yml
will-moore Jul 4, 2022
3ad5315
Rename --warnings to --strict. Support Plate and Well
will-moore Jul 4, 2022
5fcfe94
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 4, 2022
6774922
Update test to use --strict
will-moore Jul 4, 2022
2bbce18
Support --clear_cache argument for validate
will-moore Jul 5, 2022
96961bd
Replace spec.get_schema() with SCHEMA_NAME
will-moore Jul 5, 2022
c03b0a0
Use format.get_metadata_version(data) to get version
will-moore Jul 5, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,5 @@ var
build
dist/
target/
*.DS_Store
*/.DS_Store
3 changes: 2 additions & 1 deletion .isort.cfg
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
[settings]
known_third_party = dask,numcodecs,numpy,pytest,scipy,setuptools,skimage,zarr

known_third_party = dask,jsonschema,numcodecs,numpy,ome_ngff,pytest,scipy,setuptools,skimage,zarr
multi_line_output = 3
include_trailing_comma = True
force_grid_wrap = 0
Expand Down
1 change: 1 addition & 0 deletions environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ channels:
dependencies:
- flake8
- ipython
- jsonschema
- mypy
- omero-py
- pip
Expand Down
13 changes: 13 additions & 0 deletions ome_zarr/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
from .scale import Scaler
from .utils import download as zarr_download
from .utils import info as zarr_info
from .utils import validate as zarr_validate


def config_logging(loglevel: int, args: argparse.Namespace) -> None:
Expand All @@ -29,6 +30,12 @@ def info(args: argparse.Namespace) -> None:
list(zarr_info(args.path, stats=args.stats))


def validate(args: argparse.Namespace) -> None:
"""Wrap the :func:`~ome_zarr.utils.validate` method."""
config_logging(logging.WARN, args)
list(zarr_validate(args.path, args.warnings))


def download(args: argparse.Namespace) -> None:
"""Wrap the :func:`~ome_zarr.utils.download` method."""
config_logging(logging.WARN, args)
Expand Down Expand Up @@ -99,6 +106,12 @@ def main(args: List[str] = None) -> None:
parser_info.add_argument("--stats", action="store_true")
parser_info.set_defaults(func=info)

# validate
parser_validate = subparsers.add_parser("validate")
parser_validate.add_argument("path")
parser_validate.add_argument("--warnings", action="store_true")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the role of this argument to trigger the validation of the recommended attributes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. We have various terminology: There are Spec rules that SHOULD be followed (so it's recommended that you do). These are covered by a strict schema and if you don't then this gives you warnings (if you use that flag)!
It would be nice to settle on ONE term that could cover all these cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Trying to look into the terminology used by others, I came back to SHACL which we have not decided to adopt at this stage but we might want to come back to in the near future.

A relevant concept is the idea of severity defined here and used by the validation report. In particular, looking at the pySHACL library, the --allow-warnings option effectively controls the severity level beyond which the result will be considered as invalid.

parser_validate.set_defaults(func=validate)

# download
parser_download = subparsers.add_parser("download")
parser_download.add_argument("path")
Expand Down
6 changes: 3 additions & 3 deletions ome_zarr/data.py
Original file line number Diff line number Diff line change
Expand Up @@ -147,19 +147,19 @@ def create_zarr(
"channels": [
{
"color": "FF0000",
"window": {"start": 0, "end": 255},
"window": {"start": 0, "end": 255, "min": 0, "max": 255},
"label": "Red",
"active": True,
},
{
"color": "00FF00",
"window": {"start": 0, "end": 255},
"window": {"start": 0, "end": 255, "min": 0, "max": 255},
"label": "Green",
"active": True,
},
{
"color": "0000FF",
"window": {"start": 0, "end": 255},
"window": {"start": 0, "end": 255, "min": 0, "max": 255},
"label": "Blue",
"active": True,
},
Expand Down
39 changes: 38 additions & 1 deletion ome_zarr/reader.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,12 @@
import dask.array as da
import numpy as np
from dask import delayed
from jsonschema import validate as jsonschema_validate
from jsonschema.validators import validator_for
from ome_ngff.schemas import LocalRefResolver, get_schema

from .axes import Axes
from .format import format_from_version
from .format import CurrentFormat, format_from_version
from .io import ZarrLocation
from .types import JSONDict

Expand Down Expand Up @@ -106,6 +109,12 @@ def load(self, spec_type: Type["Spec"]) -> Optional["Spec"]:
return spec
return None

def validate(self, warnings: bool) -> None:
# Validation for a node is delegated to each spec
# e.g. Labels may have spec for multiscales and labels
for spec in self.specs:
spec.validate(warnings)

def add(
self,
zarr: ZarrLocation,
Expand Down Expand Up @@ -177,6 +186,10 @@ def __init__(self, node: Node) -> None:
def lookup(self, key: str, default: Any) -> Any:
return self.zarr.root_attrs.get(key, default)

def validate(self, warnings: bool = False) -> None:
# If not implemented, ignore for now
pass


class Labels(Spec):
"""Relatively small specification for the well-known "labels" group which only
Expand Down Expand Up @@ -324,6 +337,30 @@ def array(self, resolution: str, version: str) -> da.core.Array:
# data.shape is (t, c, z, y, x) by convention
return self.zarr.load(resolution)

def validate(self, warnings: bool = False) -> None:
multiscales = self.lookup("multiscales", [])
version = multiscales[0].get("version", CurrentFormat().version)
LOGGER.info("Validating Multiscales spec at: %s" % self.zarr)
LOGGER.info("Using Multiscales schema version: %s" % version)
image_schema = get_schema(version)

# Always do a validation with the MUST rules
# Will throw ValidationException if it fails
json_data = self.zarr.root_attrs
jsonschema_validate(instance=json_data, schema=image_schema)

# If we're also checking for SHOULD rules,
# we want to iterate all errors and show as "Warnings"
if warnings:
strict_schema = get_schema(version, strict=True)
cls = validator_for(strict_schema)
cls.check_schema(strict_schema)
# Use our local resolver subclass to resolve local documents
localResolver = LocalRefResolver.from_schema(strict_schema)
validator = cls(strict_schema, resolver=localResolver)
for error in validator.iter_errors(json_data):
LOGGER.warn(error.message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, not familiar with the code/design pattern here, can't see if there is some MVC like pattern where the warnings&errors (the Model) are collected/returned by some validate (the Controller), and then displayed (by the Viewer).
For me, as a 3rd party user intending to use this library to validate it, would be useful to have clear separation of the Model/Controller from the Viewer which would be "mine".

Here dandi/dandi-cli#943 (comment) we are to "converge" on some data structure which we would use to cover validation reports across multiple validators possibly used for any specific file/dataset at hands. So it would be important for us to be able to collect errors/warnings/hints uniformly first and then report them via our own "viewer".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback. This work is at quite an early stage, so we haven't considered the usage of this tool outside of a cli validation. In fact, with changes to the schemas at https://github.com/ome/ngff/tree/main/0.4/schemas (include regular and "strict" schemas), there should be less logic required to validate.
You may find that you can choose the JSON schema validation tool of your choice, and don't need anything from ome-zarr-py.
In any case, this PR needs to be reviewed...
cc @sbesson



class OMERO(Spec):
@staticmethod
Expand Down
41 changes: 32 additions & 9 deletions ome_zarr/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
import json
import logging
from pathlib import Path
from typing import Iterator, List
from typing import Callable, Iterator, List

import dask
import dask.array as da
Expand All @@ -17,21 +17,26 @@
LOGGER = logging.getLogger("ome_zarr.utils")


def info(path: str, stats: bool = False) -> Iterator[Node]:
"""Print information about an OME-Zarr fileset.

All :class:`Nodes <ome_utils.reader.Node>` that are found from the given path will
be visited recursively.
"""
def visit(path: str, func: Callable) -> Iterator[Node]:
"""Call func(node) for each node read from path."""
zarr = parse_url(path)
assert zarr, f"not a zarr: {zarr}"
reader = Reader(zarr)
for node in reader():

if not node.specs:
print(f"not an ome-zarr node: {node}")
continue
yield func(node)


def info(path: str, stats: bool = False) -> Iterator[Node]:
"""Print information about an OME-Zarr fileset.

All :class:`Nodes <ome_utils.reader.Node>` that are found from the given path will
be visited recursively.
"""

def func(node: Node) -> Node:
print(node)
print(" - metadata")
for spec in node.specs:
Expand All @@ -43,7 +48,25 @@ def info(path: str, stats: bool = False) -> Iterator[Node]:
minmax = f" minmax={dask.compute(array.min(), array.max())}"
print(f" - {array.shape}{minmax}")
LOGGER.debug(node.data)
yield node
return node

return visit(path, func)


def validate(path: str, warnings: bool) -> Iterator[Node]:
"""
Validate OME-NGFF data

All :class:`Nodes <ome_utils.reader.Node>` that are found from the given path will
be visited recursively.
"""

def func(node: Node) -> Node:
if hasattr(node, "validate"):
node.validate(warnings)
return node

return visit(path, func)


def download(input_path: str, output_dir: str = ".") -> None:
Expand Down
2 changes: 2 additions & 0 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@ def read(fname):
install_requires += (["requests"],)
install_requires += (["scikit-image"],)
install_requires += (["toolz"],)
install_requires += (["jsonschema"],)
install_requires += (["ome_ngff"],)


setup(
Expand Down
9 changes: 9 additions & 0 deletions tests/test_cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,15 @@ def test_astronaut_info(self):
main(["create", "--method=astronaut", filename])
main(["info", filename])

@pytest.mark.parametrize("warnings", [False, True])
def test_astronaut_validation(self, warnings):
filename = str(self.path) + "-2"
main(["create", "--method=astronaut", filename])
if warnings:
main(["validate", "--warnings", filename])
else:
main(["validate", filename])

def test_astronaut_download(self, tmpdir):
out = str(tmpdir / "out")
filename = str(self.path) + "-3"
Expand Down