Commit

Merge pull request #157 from LDeakin/codec_aliasing
Refactor codec metadata and fill values
LDeakin authored Mar 9, 2025
2 parents 74bee58 + 27eb79e commit 6d3cc2d
Showing 121 changed files with 3,192 additions and 2,013 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -44,6 +44,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `ArrayCreateError::DataTypeCreateError` now uses a `PluginCreateError` internally
- **Breaking**: `ArrayError` is now marked as non-exhaustive
- Bump `half` to 2.3.1
- Use the `vlen-{utf8,bytes}` codec by default for `string`/`r*` data types
- `zarrs` previously used `vlen`, an experimental codec not supported by other implementations
- Refactor `codec` name handling and `CodecTraits` in alignment with ZEP0009 and the [`zarr-extensions`] repository
- All "experimental" codecs now use the `zarrs.` prefix (or `numcodecs.` if fully compatible)
- Add support for aliased codec names
- Enables pass-through of codecs from Zarr V2 to V3 without converting to a V3 equivalent (if supported)
- **Breaking**: Add `CodecTraits::{identifier,default_name,configuration[_opt]}()`
- **Breaking**: Remove `CodecTraits::create_metadata[_opt]()`

### Fixed
- Fixed reserving one more element than necessary when retrieving `string` or `bytes` array elements
@@ -1333,6 +1341,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
[0.3.0]: https://github.com/LDeakin/zarrs/releases/tag/v0.3.0
[0.2.0]: https://github.com/LDeakin/zarrs/releases/tag/v0.2.0

[`zarr-extensions`]: https://github.com/zarr-developers/zarr-extensions

[@LDeakin]: https://github.com/LDeakin
[@lorenzocerrone]: https://github.com/lorenzocerrone
[@dustinlagoy]: https://github.com/dustinlagoy
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -33,7 +33,7 @@ version = "0.2.0"
path = "zarrs_data_type"

[workspace.dependencies.zarrs_metadata]
-version = "0.3.6"
+version = "0.4.0"
path = "zarrs_metadata"

[workspace.dependencies.zarrs_plugin]
4 changes: 3 additions & 1 deletion zarrs/benches/array_blosc.rs
@@ -1,9 +1,11 @@
#![allow(missing_docs)]

use std::sync::Arc;

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use zarrs::{
array::codec::BloscCodec,
-metadata::v3::array::codec::blosc::{BloscCompressionLevel, BloscCompressor, BloscShuffleMode},
+metadata::codec::blosc::{BloscCompressionLevel, BloscCompressor, BloscShuffleMode},
};

fn array_blosc_write_all(c: &mut Criterion) {
69 changes: 53 additions & 16 deletions zarrs/doc/status/codecs.md
@@ -1,23 +1,60 @@
| Codec Type | Codec | ZEP | V3 | V2 | Feature Flag* |
| -------------- | ------------------ | ----------------- | ------- | ------- | ------------- |
| Array to Array | [transpose] | [ZEP0001] | ✓ | | **transpose** |
| Array to Bytes | [bytes] | [ZEP0001] | ✓ | | |
| | [sharding_indexed] | [ZEP0002] | ✓ | | **sharding** |
| Bytes to Bytes | [blosc] | [ZEP0001] | ✓ | ✓ | **blosc** |
| | [gzip] | [ZEP0001] | ✓ | ✓ | **gzip** |
| | [crc32c] | [ZEP0002] | ✓ | | **crc32c** |
| | [zstd] | [zarr-specs #256] | ✓ | ✓ | **zstd** |
| Codec Type | Default codec `name` | Status | Feature Flag* |
| -------------- | -------------------------| ------------ | ------------- |
| Array to Array | [`transpose`] | Core | **transpose** |
| | [`zarrs.bitround`] | Experimental | bitround |
| Array to Bytes | [`bytes`] | Core | |
| | [`sharding_indexed`] | Core | **sharding** |
| | [`vlen-array`] | Experimental | |
| | [`vlen-bytes`] | Experimental | |
| | [`vlen-utf8`] | Experimental | |
| | [`numcodecs.pcodec`] | Experimental | pcodec |
| | [`numcodecs.zfpy`] | Experimental | zfp |
| | [`zarrs.vlen`] | Experimental | |
| | [`zarrs.vlen_v2`] | Experimental | |
| | [`zarrs.zfp`] | Experimental | zfp |
| Bytes to Bytes | [`blosc`] | Core | **blosc** |
| | [`crc32c`] | Core | **crc32c** |
| | [`gzip`] | Core | **gzip** |
| | [`zstd`] | Experimental | **zstd** |
| | [`numcodecs.bz2`] | Experimental | bz2 |
| | [`numcodecs.fletcher32`] | Experimental | fletcher32 |
| | [`zarrs.gdeflate`] | Experimental | gdeflate |

<sup>\* Bolded feature flags are part of the default set of features.</sup>

[ZEP0001]: https://zarr.dev/zeps/accepted/ZEP0001.html
[ZEP0002]: https://zarr.dev/zeps/accepted/ZEP0001.html
[zarr-specs #256]: https://github.com/zarr-developers/zarr-specs/pull/256

[transpose]: crate::array::codec::array_to_array::transpose
[bytes]: crate::array::codec::array_to_bytes::bytes
[sharding_indexed]: crate::array::codec::array_to_bytes::sharding
[blosc]: crate::array::codec::bytes_to_bytes::blosc
[gzip]: crate::array::codec::bytes_to_bytes::gzip
[crc32c]: crate::array::codec::bytes_to_bytes::crc32c
[zstd]: crate::array::codec::bytes_to_bytes::zstd
[`transpose`]: crate::array::codec::array_to_array::transpose
[`zarrs.bitround`]: crate::array::codec::array_to_array::bitround

[`bytes`]: crate::array::codec::array_to_bytes::bytes
[`vlen-array`]: crate::array::codec::array_to_bytes::vlen_array
[`vlen-bytes`]: crate::array::codec::array_to_bytes::vlen_bytes
[`vlen-utf8`]: crate::array::codec::array_to_bytes::vlen_utf8
[`sharding_indexed`]: crate::array::codec::array_to_bytes::sharding
[`numcodecs.pcodec`]: crate::array::codec::array_to_bytes::pcodec
[`numcodecs.zfpy`]: crate::array::codec::array_to_bytes::zfpy
[`zarrs.vlen`]: crate::array::codec::array_to_bytes::vlen
[`zarrs.vlen_v2`]: crate::array::codec::array_to_bytes::vlen_v2
[`zarrs.zfp`]: crate::array::codec::array_to_bytes::zfp

[`blosc`]: crate::array::codec::bytes_to_bytes::blosc
[`crc32c`]: crate::array::codec::bytes_to_bytes::crc32c
[`gzip`]: crate::array::codec::bytes_to_bytes::gzip
[`zstd`]: crate::array::codec::bytes_to_bytes::zstd
[`numcodecs.bz2`]: crate::array::codec::bytes_to_bytes::bz2
[`numcodecs.fletcher32`]: crate::array::codec::bytes_to_bytes::fletcher32
[`zarrs.gdeflate`]: crate::array::codec::bytes_to_bytes::gdeflate

**Experimental codecs are recommended for evaluation only**.
They may change in future releases without maintaining backwards compatibility.
These codecs have not been standardised, but many are fully compatible with other Zarr implementations.

Codec `name`s and aliases are configurable with [`Config::codec_map_mut`](config::Config::codec_map_mut).
`zarrs` will persist codec names when opening an existing array or creating an array from metadata.
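
As a rough illustration of how default names, aliases, and name persistence could fit together, here is a minimal std-only sketch. It is not the `zarrs` implementation: `CodecNames`, `register`, `identifier`, `default_name`, and `name_to_write` are hypothetical names, and the bare `bz2` and `gdeflate` aliases are assumptions chosen for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a codec name map (NOT the `zarrs` API).
struct CodecNames {
    /// Codec identifier -> default `name` written to new metadata.
    default_names: HashMap<&'static str, &'static str>,
    /// Any accepted `name` (default or alias) -> codec identifier.
    aliases: HashMap<&'static str, &'static str>,
}

impl CodecNames {
    fn new() -> Self {
        let mut names = CodecNames {
            default_names: HashMap::new(),
            aliases: HashMap::new(),
        };
        // Assumed aliases, for illustration only.
        names.register("bz2", "numcodecs.bz2", &["bz2"]);
        names.register("gdeflate", "zarrs.gdeflate", &["gdeflate"]);
        names
    }

    /// Register a codec with its default name and any extra aliases.
    fn register(
        &mut self,
        identifier: &'static str,
        default_name: &'static str,
        aliases: &[&'static str],
    ) {
        self.default_names.insert(identifier, default_name);
        self.aliases.insert(default_name, identifier);
        for &alias in aliases {
            self.aliases.insert(alias, identifier);
        }
    }

    /// Resolve a `name` found in metadata to a codec identifier.
    fn identifier(&self, name: &str) -> Option<&'static str> {
        self.aliases.get(name).copied()
    }

    /// The `name` used when no existing name needs to be preserved.
    fn default_name(&self, identifier: &str) -> Option<&'static str> {
        self.default_names.get(identifier).copied()
    }

    /// Keep the name an existing array already uses; otherwise fall back to the default.
    fn name_to_write(&self, identifier: &str, existing: Option<&str>) -> Option<String> {
        existing
            .map(|name| name.to_string())
            .or_else(|| self.default_name(identifier).map(|name| name.to_string()))
    }
}
```

In this sketch, an array whose metadata says `bz2` resolves to the same codec as one that says `numcodecs.bz2`, and `name_to_write` leaves the stored name unchanged rather than rewriting it to the default.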

`zarrs` supports arrays created with `zarr-python` 3.x.x with various `numcodecs.zarr3` codecs.
However, arrays must be written with `numcodecs` 0.15+.

49 changes: 0 additions & 49 deletions zarrs/doc/status/codecs_experimental.md

This file was deleted.

48 changes: 28 additions & 20 deletions zarrs/examples/custom_data_type_fixed_size.rs
@@ -1,9 +1,8 @@
#![allow(missing_docs)]

use std::{borrow::Cow, sync::Arc};
use std::{borrow::Cow, collections::HashMap, sync::Arc};

use num::traits::{FromBytes, ToBytes};
use serde::{Deserialize, Serialize};
use zarrs::array::{
ArrayBuilder, ArrayBytes, ArrayError, DataTypeSize, Element, ElementOwned, FillValueMetadataV3,
};
@@ -19,12 +18,21 @@ use zarrs_metadata::{
use zarrs_plugin::{PluginCreateError, PluginMetadataInvalidError};
use zarrs_storage::store::MemoryStore;

#[derive(Clone, Copy, Debug, PartialEq, Deserialize, Serialize)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct CustomDataTypeFixedSizeElement {
x: u64,
y: f32,
}

impl From<CustomDataTypeFixedSizeElement> for FillValueMetadataV3 {
fn from(value: CustomDataTypeFixedSizeElement) -> Self {
FillValueMetadataV3::from(HashMap::from([
("x".to_string(), FillValueMetadataV3::from(value.x)),
("y".to_string(), FillValueMetadataV3::from(value.y)),
]))
}
}

type CustomDataTypeFixedSizeMetadata = CustomDataTypeFixedSizeElement;

type CustomDataTypeFixedSizeBytes = [u8; size_of::<u64>() + size_of::<f32>()];
@@ -144,34 +152,34 @@ impl DataTypeExtension for CustomDataTypeFixedSize {
&self,
fill_value_metadata: &FillValueMetadataV3,
) -> Result<FillValue, IncompatibleFillValueMetadataError> {
let custom_fill_value = match fill_value_metadata {
FillValueMetadataV3::Unsupported(value) => serde_json::from_value::<
CustomDataTypeFixedSizeMetadata,
>(value.clone())
.map_err(|_| {
IncompatibleFillValueMetadataError::new(self.name(), fill_value_metadata.clone())
})?,
_ => Err(IncompatibleFillValueMetadataError::new(
self.name(),
fill_value_metadata.clone(),
))?,
};
Ok(FillValue::new(custom_fill_value.to_ne_bytes().to_vec()))
let err =
|| IncompatibleFillValueMetadataError::new(self.name(), fill_value_metadata.clone());
let metadata_object = fill_value_metadata.as_object().ok_or_else(err)?;
let x = metadata_object
.get("x")
.ok_or_else(err)?
.as_u64()
.ok_or_else(err)?;
let y = metadata_object
.get("y")
.ok_or_else(err)?
.as_f64()
.ok_or_else(err)? as f32;
let element = CustomDataTypeFixedSizeElement { x, y };
Ok(FillValue::new(element.to_ne_bytes().to_vec()))
}

fn metadata_fill_value(
&self,
fill_value: &FillValue,
) -> Result<FillValueMetadataV3, IncompatibleFillValueError> {
let fill_value_metadata = CustomDataTypeFixedSizeMetadata::from_ne_bytes(
let element = CustomDataTypeFixedSizeMetadata::from_ne_bytes(
fill_value
.as_ne_bytes()
.try_into()
.map_err(|_| IncompatibleFillValueError::new(self.name(), fill_value.clone()))?,
);
Ok(FillValueMetadataV3::Unsupported(
serde_json::to_value(fill_value_metadata).unwrap(),
))
Ok(FillValueMetadataV3::from(element))
}

fn size(&self) -> zarrs::array::DataTypeSize {
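
The fixed-size example now converts between an object-shaped fill value and the element's native-endian bytes without going through `serde`. The following std-only sketch mirrors that round trip under stated assumptions: `Value` is a hypothetical stand-in for `FillValueMetadataV3`, and `Element`, `element_from_metadata`, and `element_to_metadata` are illustrative names, not `zarrs` items.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for fill value metadata (NOT `FillValueMetadataV3`).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    U64(u64),
    F64(f64),
    Object(HashMap<String, Value>),
}

/// Mirrors the `{ x: u64, y: f32 }` element from the example.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Element {
    x: u64,
    y: f32,
}

impl Element {
    /// Pack `x` then `y` into 12 native-endian bytes.
    fn to_ne_bytes(self) -> [u8; 12] {
        let mut bytes = [0u8; 12];
        bytes[..8].copy_from_slice(&self.x.to_ne_bytes());
        bytes[8..].copy_from_slice(&self.y.to_ne_bytes());
        bytes
    }

    /// Inverse of `to_ne_bytes`.
    fn from_ne_bytes(bytes: [u8; 12]) -> Self {
        let mut x = [0u8; 8];
        let mut y = [0u8; 4];
        x.copy_from_slice(&bytes[..8]);
        y.copy_from_slice(&bytes[8..]);
        Element {
            x: u64::from_ne_bytes(x),
            y: f32::from_ne_bytes(y),
        }
    }
}

/// Object metadata -> element; `None` if the shape or field types do not match.
fn element_from_metadata(value: &Value) -> Option<Element> {
    let map = match value {
        Value::Object(map) => map,
        _ => return None,
    };
    let x = match map.get("x")? {
        Value::U64(x) => *x,
        _ => return None,
    };
    let y = match map.get("y")? {
        Value::F64(y) => *y as f32, // narrowing cast, as in the example
        _ => return None,
    };
    Some(Element { x, y })
}

/// Element -> object metadata with fields `x` and `y`.
fn element_to_metadata(element: Element) -> Value {
    Value::Object(HashMap::from([
        ("x".to_string(), Value::U64(element.x)),
        ("y".to_string(), Value::F64(f64::from(element.y))),
    ]))
}
```

Returning `None` for any mismatch plays the role of `IncompatibleFillValueMetadataError` in the example; a fuller version would carry the offending metadata in the error.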
32 changes: 11 additions & 21 deletions zarrs/examples/custom_data_type_variable_size.rs
@@ -13,7 +13,7 @@ use zarrs_data_type::{
DataType, DataTypeExtension, DataTypePlugin, FillValue, IncompatibleFillValueError,
IncompatibleFillValueMetadataError,
};
use zarrs_metadata::v3::{array::fill_value::FillValueFloat, MetadataConfiguration, MetadataV3};
use zarrs_metadata::v3::{MetadataConfiguration, MetadataV3};
use zarrs_plugin::{PluginCreateError, PluginMetadataInvalidError};
use zarrs_storage::store::MemoryStore;

@@ -116,24 +116,16 @@ impl DataTypeExtension for CustomDataTypeVariableSize {
&self,
fill_value_metadata: &FillValueMetadataV3,
) -> Result<FillValue, IncompatibleFillValueMetadataError> {
let fill_value = match fill_value_metadata {
FillValueMetadataV3::Float(f) => Ok(f
.to_float::<f32>()
.ok_or_else(|| {
IncompatibleFillValueMetadataError::new(
self.name(),
fill_value_metadata.clone(),
)
})?
.to_ne_bytes()
.to_vec()),
FillValueMetadataV3::Unsupported(serde_json::Value::Null) => Ok(vec![]),
_ => Err(IncompatibleFillValueMetadataError::new(
if let Some(f) = fill_value_metadata.as_f32() {
Ok(FillValue::new(f.to_ne_bytes().to_vec()))
} else if fill_value_metadata.is_null() {
Ok(FillValue::new(vec![]))
} else {
Err(IncompatibleFillValueMetadataError::new(
self.name(),
fill_value_metadata.clone(),
)),
}?;
Ok(FillValue::new(fill_value))
))
}
}

fn metadata_fill_value(
@@ -142,12 +134,10 @@ impl DataTypeExtension for CustomDataTypeVariableSize {
) -> Result<FillValueMetadataV3, IncompatibleFillValueError> {
let fill_value = fill_value.as_ne_bytes();
if fill_value.len() == 0 {
Ok(FillValueMetadataV3::Unsupported(serde_json::Value::Null))
Ok(FillValueMetadataV3::Null)
} else if fill_value.len() == 4 {
let value = f32::from_ne_bytes(fill_value.try_into().unwrap());
Ok(FillValueMetadataV3::Float(FillValueFloat::Float(
value as f64,
)))
Ok(FillValueMetadataV3::from(value))
} else {
Err(IncompatibleFillValueError::new(
self.name(),
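
The variable-size example maps a `null` fill value to zero bytes and a float to four native-endian bytes, then dispatches on byte length when converting back. A minimal std-only sketch of that mapping follows; `fill_value_bytes` and `fill_value_from_bytes` are hypothetical helpers, with `Option<f32>` standing in for the metadata and the unexpected length returned as the error.

```rust
/// `None` (null metadata) -> empty bytes; `Some(f)` -> 4 native-endian bytes.
fn fill_value_bytes(value: Option<f32>) -> Vec<u8> {
    match value {
        Some(f) => f.to_ne_bytes().to_vec(),
        None => Vec::new(),
    }
}

/// Inverse mapping; any length other than 0 or 4 is rejected.
fn fill_value_from_bytes(bytes: &[u8]) -> Result<Option<f32>, usize> {
    match bytes.len() {
        0 => Ok(None),
        4 => {
            let mut buf = [0u8; 4];
            buf.copy_from_slice(bytes);
            Ok(Some(f32::from_ne_bytes(buf)))
        }
        len => Err(len),
    }
}
```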