- make `[]` for attributes (the overload taking a string and only the data type kind) check if the key exists and raise if not. Previously, if the attribute had not been opened before, the call would fail even if it exists in the file
- make `withAttr` check if the attribute exists in the file first. Previously it would fail if the attribute had not been opened before
- fix a small memory leak related to `cstrings` not being freed
- fix the `-d:DEBUG_HDF5` option for debug information
- add support for `ref object` in `fromH5` deserialization
- fixes an issue where `serialize` (`toH5`) would cause a crash if a contained reference type was `nil`
- fixes a possible crash when trying to write an attribute of string / seq type with zero length (simply does not call into the H5 function now; the attribute is still created though!)
- correctly handle `distinct` types in `copyflat`
- PARTIALLY BREAKING: Nim 1.6 support is a bit wonky. It is still supported, but when running with `refc` there seem to be cases where destructors are called too early, leading to subtle bugs. And with ORC one test case (`tWithDset`) does not compile. So please upgrade.
- refactor `attributes` read / write logic
- add support for more types in attributes (compound types etc.)
- add support to read all types in `withAttr`, including compound and `seq[dkObject]` types. Note that in those two cases the data is available as `attr` in the form of a `JsonNode`!
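A hedged usage sketch of the above (the file name, attribute name and the `.attrs` access path are assumptions; only the injected `attr` variable is stated in the entry):
```nim
import nimhdf5

var h5f = H5open("foo.h5", "r")
# `withAttr` injects a variable `attr` holding the attribute's value; for
# compound / seq[dkObject] attributes it arrives as a JsonNode
h5f.attrs.withAttr("runConfig"):
  echo attr
discard h5f.close()
```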
- make `copy_attributes` work for all types
- better handle empty inputs in `add` / `write_hyperslab`: do nothing in that case
- fix the `copy` implementation for certain use cases and fix a regression in it
- fix JSON dataset reading not taking into account the shape of a dataset when the dataset is only simple data (same datatype)
- fix naming of Matlab field structs, due to a misunderstanding on my part (I thought the `_ref` was part of the HDF5 designation in my test file)
- allow `references` to yield the input dataset if it is not a reference dataset
- BREAKING: fix issue with dataset and group iterators, where we treated the root group `/` specially: it was off by 1 in the counting of the depth. This means that no matter which group you start at, the meaning of `depth` is explicitly such that `depth == 1` corresponds to any child of the given group.
- export all `fromFlat` helper procs
- allow `read` from a raw `pointer` (intended for manual use with `Buffer`). Note: you may need to call `H5Dvlen_reclaim` if your read data contains VLEN data!
- add basic support for HDF5 references
- add basic support for reading HDF5 datasets and groups into `JsonNode`, for easier handling of runtime based decisions. See `nimhdf5/hdf5_json.nim`
- add experimental Matlab file loader (v7.3 Matlab files), see `nimhdf5/hdf5_matlab`. See `examples/read_matlab.nim` for an example.
- improves pretty printing of HDF5 objects (PR #65 by @AngelEzquerra)
- fix segfault due to stack corruption on Windows
Wow, this was hard to understand!
The final explanation is: in `getNumAttrs` we declare a variable `h5info` on the stack of type `H5O_info_t`. Due to `time_t` being interpreted as 4 bytes, the full `H5O_info_t` type had a size of 144 bytes for our Nim object. But the HDF5 library expects the object to be 160 bytes. We hand a pointer to the stack object to the HDF5 library, which writes to it. Because our object is too small, the HDF5 library caused a stack corruption leading to very bizarre bugs.
My favorite:
```nim
let isItNil = h5attrs.isNil
doAssert not h5attrs.isNil, "But was it nil? " & $isItNil
```
which failed in the assertion, but printed `isItNil = false`…
The confusing part was that the API changed between 1.10 (which is what I run locally on Linux) and 1.12 (I run 1.14 on Windows), causing me to think the problem was in the difference between the `H5O_info_t` types. But that is not the actual problem, because the type we actually map to is `H5O_info1_t`, which is backwards compatible, because we call the `H5Oget_info2` (and friends) procedure, which takes that type as an argument. The new `H5O_info2_t` type is used in the `H5Oget_info3` function.
- fix Windows support
  - use a custom `/` proc to handle `\` separators that are inserted intermittently from `os./` and `parentDir`
  - fix `normalizePath` to always use `/` as separator
  - fix the `int` type to always be mapped to an 8 byte HDF5 type if the machine's type is also 8 byte, by using `H5T_NATIVE_LLONG` in those cases
  - replace hardcoded paths in some tests by using `getTempDir`
- fixes potential source of segfaults in `copyflat`, where we could call `allocShared0` with a `0` argument
- support `array` types to be written (at least as part of a compound datatype)
- improve warning message when importing `blosc` filters without the Nim `blosc` library installed
- Drops support for Nim 1.4
- add basic serialization submodule to auto serialize most objects to a H5 file. Scalar types are written as attributes and non scalar ones as datasets. Can be extended for complicated custom types by using the `toH5` hook. See the `tSerialize.nim` test and the `serialize.nim` file. Note: currently no deserialization is supported; you need to parse the data from the file back into your types yourself if needed. An equivalent inverse can be added, but it has no priority at the moment.
- allow usage of tilde `~` in paths to H5 files
- replace distinct `hid_t` types by traced 'fat' objects
The basic idea here is the following: The `hid_t` identifiers all refer to objects that live in the H5 library (and possibly in a file). In our previous approach we kept track of different types by using `distinct hid_t` types. That’s great because we cannot mix and match the wrong type of identifiers in a given context. However, there are real resources underlying each identifier. Most identifiers require the user to call a `close` / `free` type of routine. While we can associate a destructor with a `=destroy` hook to a `distinct hid_t` (with `hid_t` just being an integer type), the issue is when that destructor is being called. In this old way the identifier is a pure value type. If an identifier is copied and the copy goes out of scope early, we release the resource despite still needing it! Therefore, we now have a ‘fat’ object that knows its internal id (just a real `hid_t`) and which closing function to call. Our actual IDs then are `ref objects` of these fat objects. That way we get sane releasing of resources in the correct moments, i.e. when the last reference to an identifier goes out of scope. This is the correct thing to do in 99% of the cases.
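A minimal sketch of the idea (simplified, with illustrative names only, not the exact nimhdf5 definitions):
```nim
type
  hid_t = distinct int64                    # raw HDF5 identifier (assumed alias here)
  FatId = object
    id: hid_t                               # the actual identifier inside the H5 library
    closeFn: proc (id: hid_t) {.nimcall.}   # which H5 close / free routine releases it
  TracedId = ref FatId                      # user facing handles are refs to the fat object

proc `=destroy`(x: var FatId) =
  # runs exactly once, when the last `TracedId` pointing at `x` goes out of
  # scope, so the underlying H5 resource is released at the right moment
  if not x.closeFn.isNil:
    x.closeFn(x.id)
```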
- add a `FileID` field to the parent file for datasets, similar to the one already present for groups. Convenient in practice.
- refactor `read` and `write` related procs. The meat of the code is now handled in one procedure each (which also takes care of reclaiming VLEN memory, for example).
- greatly improve automatic writing and reading of complex datatypes, including Nim objects that contain `string` fields or other VLEN data. This is done by copying to a suitable datatype that matches the H5 definition of the equivalent data in Nim. The `type_utils` and `copyflat` submodules are added to that end. In this context there is some trickiness involved, which causes the implementation to be more complex than one might expect: namely the necessity to get the correct alignment between naive `offsetOf` expectations and the reality of how structs are packed.
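A minimal illustration of the alignment pitfall (plain Nim, not nimhdf5 code): in-memory offsets include padding, so a naive "sum of field sizes" layout does not match what the compiler actually uses.
```nim
type Example = object
  a: uint8
  b: float64

echo offsetOf(Example, a)   # 0
echo offsetOf(Example, b)   # typically 8 (not 1) due to alignment padding
echo sizeof(Example)        # typically 16 (not 9)
```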
- remove support for reading into a `cstring`, as this is not well defined. A local cstring that needs to be created cannot be returned (without dealing manually with allocations)
- add `add`, `write_hyperslab` and `read` overloads working with `ptr T` for direct access to a manually managed memory region (useful when working with things like `Tensors`)
- reorder `dataset.nim` code a little bit
- support `openArray` in more places
- (finally!) add support for `string` datasets (see the sketch below)
  - fixed length string datasets, written by constructing a `create_dataset("foo", <size>, array[N, char])` dataset (writing is done by simply giving a `seq[string]`)
  - variable length string datasets, written by constructing a `create_dataset("foo", <size>, string)` dataset (writing is done by simply giving a `seq[string]`)
  - support strings as variable length arrays of type `char`, constructed by a `create_dataset("foo", <size>, special_type(char))` dataset (writing is done by simply giving a `seq[string]`)
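A hedged usage sketch based on the calls above (file name, dataset names and sizes are made up; error handling omitted):
```nim
import nimhdf5

var h5f = H5open("strings.h5", "rw")
# fixed length strings: each entry stored as up to 16 chars
var dFix = h5f.create_dataset("/fixed", 3, array[16, char])
dFix.write(@["foo", "bar", "baz"])
# variable length strings
var dVar = h5f.create_dataset("/variable", 3, string)
dVar.write(@["short", "a somewhat longer entry", "x"])
discard h5f.close()
```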
- add missing overload for `write` for the most general case, which was previously only possible via `[]=`. So with `let dset = ...`, calling `dset.write(data)` is now valid.
- implement slicing `read` and `write` procedures for 1D datasets:
```nim
let data = @[1, 2, 3]
var dset = h5f.create_dataset("foo", 3, int)
dset.write(data)
doAssert dset[0 .. 1] == data[0 .. 1]
doAssert dset.read(0 .. 1) == data[0 .. 1]
dset.write(1 .. 2, @[4, 5])
doAssert dset[1 ..< 3] == @[4, 5]
dset[0 .. 1] = @[10, 11]
doAssert dset[int] == @[10, 11, 5]
```
is now all valid. These are implemented using hyperslab reading / writing.
- fix bug in `write_norm` about coordinate selection, such that writing specific indices now actually works correctly
- fix bug in `write` when writing specific coordinates of a 1D dataset
- fix behavior of `delete` to make sure we also keep our internal `TableRef` in line with the file
- BREAKING: fully support writing datasets as `(N, )` instead of turning them into `(N, 1)` (especially for VLEN data). This has big implications when reading 1D data using hyperslabs: if an extra dimension is added, i.e. reading via
  `let data = dset.read_hyperslab(dtype, start = @[1000, 0], count = @[1000, 1])`
  instead of
  `let data = dset.read_hyperslab(dtype, start = @[1000], count = @[1000])`
  reading performance is orders of magnitude slower! Essentially, when handing an integer to `create_dataset` it is now kept as such (and turned into a 1 element tuple). For non VLEN data, creating and writing such datasets already worked correctly before, if I'm not mistaken.
- add more exception types for dealing with filters & in particular `blosc`: `HDF5FilterError`, `HDF5DecompressionError`, `HDF5BloscFilterError`, `HDF5BloscDecompressionError`
- add an `overwrite` option to the `write_dataset` convenience proc
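A hedged sketch (the proc name `write_dataset` and the `overwrite` option come from the entry above; the exact parameter order, file and dataset names are assumptions):
```nim
import nimhdf5

var h5f = H5open("out.h5", "rw")
# create-and-write in one call; with `overwrite = true` an existing
# dataset of the same name is replaced
h5f.write_dataset("/calibration", @[1.0, 2.0, 3.0], overwrite = true)
discard h5f.close()
```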
- avoid copy of input data when writing VLEN data
- produce a compile-time error if composite data with string fields is being read, as it's currently not supported (strings are VLEN data & VLEN in composite types isn't implemented)
- fix regression in `copy` due to the `distinct hid_t` variants
- extend `withDset` to work properly with VLEN data (returning the `dset` variable as `seq[seq[T]]`) and add a `withDset` overload working with a H5 file and the string name of a dataset
- add a test case for `withDset`
- treat the `akTruncate` flag as write access to the file (`create_dataset` was not working with it)
- fix the `blosc` filter, which regressed due to the recent `distinct` introductions
- further fixes to the `=destroy` hooks introduced in `v0.4.2`. Under some circumstances the defined hooks caused segmentation faults when deallocating objects (these hooks are finicky!)
- fix opening files with `akTruncate` (i.e. overwriting a file instead of appending)
- SEMI-BREAKING: raise an exception if opening a file failed. That we did not raise so far was more of an oversight than a feature. This is not really breaking in a strict sense, because in the past we simply failed in the `getNumAttrs` call that happened when trying to open the attributes of the root group of the file.
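A short sketch of the new behavior (the concrete exception type is not named in the entry, so `CatchableError` is caught here for illustration):
```nim
import nimhdf5

try:
  # opening a non-existing file read-only now raises immediately
  var h5f = H5open("/path/that/does/not/exist.h5", "r")
  discard h5f.close()
except CatchableError as e:
  echo "could not open file: ", e.msg
```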
- fixed the `=destroy` hooks introduced in `v0.4.2`
- added support for `SWMR` (see README)
- introduce better checks on whether an object is open by using the `H5I` interface
- turn the file access constants into an `enum` to better handle multiple constants at the same time as a `set`
- lots of cleanup of old code, replace includes by imports, …
- adds a `getOrCreateGroup` helper to always get a group, either returning the existing one or creating it. Before version `v0.4.0` this was the default behavior for `[]` as well as `create_group`. As of now, `[]` raises a `KeyError` if the group does not exist (this is a breaking change that is retroactively added to the changelog of `v0.4.0`). However, `create_group` does not throw if the group already exists. This may change in the future though.
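A minimal usage sketch of the helper described above (file and group names are made up):
```nim
import nimhdf5

var h5f = H5open("runs.h5", "rw")
# returns the existing group if present, otherwise creates it
let grp = h5f.getOrCreateGroup("/runs/run_42")
discard h5f.close()
```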
- adds the missing import of the `/` proc from `os` in `datasets.nim`, which got removed in the refactor
- fixes a regression in `open` for datasets in the case of a non-existing dataset
- NOTE: At the time of release of `v0.4.0` the following breaking change was not listed as such: `[]` for groups does not create a group anymore if it does not exist. Use `getOrCreateGroup`, added in `v0.4.2`, for that! This was an unintended side effect that was overlooked, as the implementation was based on `create_group`.
- major change: introduce multiple different distinct types for the different usages of `hid_t` in the HDF5 library. This gives us more readability, type safety etc. We can write proper type aware `close` procedures etc.
- also adds `=destroy` hooks for all relevant types, so manual closing is not required anymore (unless one wishes to close early)
- breaking: iterators taking a `depth` argument now treat it differently. A depth of 0 now means only the same level, where previously it meant all levels. The previous behavior is available via `depth = -1`. The default behavior has not changed though.
- breaking: renames the `shape_raw` and `dset_raw` arguments of `create_dataset` to simply `shape` and `dset`. The purpose of the `_raw` suffix is completely unimportant for a user of the library.
- improve output of pretty printing of datasets, groups and files
- add tests for the iterators and the `contains` procedure
- refactor out pretty printing, iterators, some attribute related code into their own files
- move constructors into `datatypes.nim`, as they don't depend on other things and are often useful in other modules (better separation, fewer recursive imports)
- move a lot of features into `h5util` that may be used commonly between modules
- fixes an issue with the iterator for groups, which could cause it to not find any datasets in a group, despite them existing
- fix segmentation fault in `visit_file` for the C++ backend
- fix `H5Attributes` return values for the `[]` template returning `AnyKind`
- change the `[]` and `[]=` templates for `H5Attributes` into procs
- fix the high level example to at least make it compile
- `visit_file` now does not open all groups and datasets anymore. It only recognizes which groups / files actually exist
- adds `close` for datasets / groups. Both are now aware of whether they are open or not
- add a string conversion for `H5Attr`
- fix accessing a dataset from a group. Now uses the path of the group as the base
- fix error message in `read_hyperslab_vlen`
- turn some templates into procs
- make `blosc` an optional import
- `H5File` as a proc is deprecated and replaced by `H5open`!
- reading of string attributes now takes care to check if they are variable length or fixed length strings
- the import of the `blosc` plugin is not automatic anymore, but needs to be done manually by compiling with `-d:blosc`
- remove a lot of old comments and imports from days past…
- change usage of `csize` to `csize_t` in the full wrapper / library. For most use cases this did not have any effect (`csize` was an `int` instead of unsigned). But for `H5T_VARIABLE = csize.high` this caused problems, because the value was not the one expected (`csize_t.high`)
- add support for compound datatypes. Creating a dataset / writing and reading data works for any object `T` whose fields can currently be stored in HDF5 files. Objects and tuples are treated the same!
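A hedged sketch (the object type and names are invented; `dset.all` as the full-dataset write accessor is an assumption, and newer releases additionally offer `dset.write(data)`):
```nim
import nimhdf5

type Particle = object
  energy: float64
  hits: int32

var h5f = H5open("compound.h5", "rw")
# each object field becomes a member of the HDF5 compound datatype
var dset = h5f.create_dataset("/particles", 2, Particle)
dset[dset.all] = @[Particle(energy: 1.2, hits: 3),
                   Particle(energy: 4.5, hits: 7)]
discard h5f.close()
```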
- add support for `seq[string]` attributes
- reorder `datasets.nim` and clean up the `[]` logic
- add a `[]` accessor from a `H5Group`
- add an `isVlen` helper to check if a dataset is variable length
- make `special_type` usage optional when reading datasets
- fix branching in `nimToH5type` to be fully compile time
- add `H5File` to replace `H5FileObj` (the latter is kept as a deprecated typedef)
- variable length data is created automatically if the user gives a `seq[T]` type in `create_dataset`
- `read` can automatically read variable length data if a `seq[T]` datatype is given
- add tests for compound data and `seq[string]` attributes
- change the `dtypeAnyKind` definition when creating a dataset
- improve iteration over subgroups / datasets
- fix mapping of H5 types to Nim types, see PR #36.
- remove dependency on the `typetraits` and `typeinfo` modules by introducing a custom `DtypeKind` enum