All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Changed: replace unmaintained iso-639 dependency with iso639-lang
- Fixed: ensure poetry can manage audformat
- Added: strict argument to audformat.utils.hash(). If set to True, the order of the data and its level/column names are taken into account when calculating the hash
- Changed: store tables as parquet files by default, by changing the default value of storage_format to "parquet" in audformat.Table.save() and audformat.Database.save()
- Fixed: load csv tables with pandas.read_csv() if pyarrow.csv.read_csv() fails
- Added: expand format specifications to allow parquet files as table files
- Added: support for storing tables as parquet files by adding "parquet" (audformat.define.TableStorageFormat.PARQUET) as an option for the storage_format argument of audformat.Table.save() and audformat.Database.save()
- Added: support for numpy>=2.0
- Added: mention text files as potential media files in the documentation
- Added: mention in the documentation of audformat.utils.hash() that column/level names do not influence its hash value
- Added: warn in the documentation of audformat.utils.hash() that the hash of a dataframe or series containing "Int64" as data type changes with pandas>=2.2.0
- Fixed: ensure "boolean" data type is always used in indices of misc tables that store boolean values
- Fixed: audformat.Database.get(), if its argument additional_schemes contains a non-existent scheme
- Added: as_dataframe argument to audformat.utils.read_csv()
- Fixed: audformat.utils.read_csv() now treats float/integer values in start, end columns as seconds
- Fixed: audformat.Database.load() when loading databases with a misc table that has an assigned split
- Changed: depend on audeer>=2.0.0
- Fixed: pandas deprecation warnings
- Added: audformat.Database.get() method to retrieve labels based on their schemes and independent of the tables in which they are stored
- Added: aggregate_function and aggregate_strategy arguments to audformat.utils.concat() to support overlapping values in the objects that should be concatenated
- Changed: audformat.Column.get(map=...) now returns dtype of labels
- Changed: audformat.Column.get(map=...) no longer raises an error if some of the mapped values are not available when stored in a dictionary as scheme labels
- Fixed: avoid deprecation warning by replacing pkg_resources internally with importlib.metadata
- Fixed: audformat.utils.hash() for pandas>=2.1.0
- Fixed: remove upper limit of pandas dependency
- Fixed: require pandas<2.1.0 as pandas>=2.1.0 introduced a bug in calculating the hash of an index
- Removed: deprecated root argument from audformat.testing.create_audio_files()
- Fixed: ensure audformat.utils.to_segmented_index() and audformat.Table.get() with as_segmented=True use the same precision for end values as audformat.segmented_index()
- Added: audformat.Scheme.labels_as_list property to list all scheme labels
- Added: example to the documentation of audformat.utils.to_filewise_index()
- Changed: convert dates to UTC timezone in audformat.Column.set() when using a scheme of type "date"
- Fixed: support pandas>=2.0.0
- Fixed: mention author, license, license_url, organization in the specification documentation of the database header
- Fixed: missing Raises section in the documentation of audformat.Database.load() and audformat.Database.attachments
- Fixed: when the root argument of audformat.utils.expand_file_path() is a relative path it is no longer expanded to an absolute path
- Added: copy_attachments argument to audformat.Database.update()
- Changed: preserve dtypes when audformat.Table.get() is called with an index
- Changed: speed up audformat.utils.union()
- Changed: allow saving a database with missing attachments
- Added: audformat.Attachment to store any kind of files/folders as part of the database
- Added: support for Python 3.10
- Added: support for Python 3.11
- Changed: require audeer>=1.19.0
- Changed: split API documentation into sub-pages for each function
- Fixed: support "meta" as key in meta dictionaries like the one passed as meta argument to audformat.Database
- Fixed: avoid FutureWarning when setting values in place for a series in audformat.Column.set()
- Fixed: improve sketches in the specifications section of the documentation
- Changed: audformat.Column.set() now lists values not matching the scheme of the column in the corresponding error message
- Fixed: audformat.Column.set() checking of values for a scheme with minimum and/or maximum when input values are given as np.array and contain NaN or None
- Fixed: audformat.Column.set() checking of values for a scheme with minimum and/or maximum when minimum or maximum is 0
- Added: audformat.Table.map_files()
- Fixed: audformat.Database.load() for databases that contain a scheme with labels stored in a misc table that is using schemes for its columns. Previously, it could fail if the schemes were not loaded in the correct order
- Fixed: audformat.Table.drop_index() and audformat.MiscTable.drop_index() when the provided index to drop contains entries not present in the index of the table. Previously, it extended the table by those entries in addition to dropping the overlapping ones
- Added: audformat.Scheme.uses_table to indicate if the scheme uses a misc table to store its labels
- Added: usage example to docstring of audformat.utils.to_segmented_index()
- Changed: forbid nesting of misc tables as scheme labels
- Fixed: support for pd.Index and pd.Series in audformat.utils.to_filewise_index()
- Fixed: description of audformat.Schemes.labels in API documentation
- Added: audformat.MiscTable which can store data not associated with media files
- Added: store scheme labels in a misc table
- Added: dictionary audformat.Database.misc_tables holding misc tables of a database
- Added: audformat.utils.difference() for finding index entries that are only part of a single index for a given sequence of indices
- Added: audformat.utils.is_index_alike() for checking if a sequence of indices has the same number of levels, level names, and matching dtypes
- Added: audformat.define.DataType.OBJECT
- Added: audformat.utils.set_index_dtypes() to change dtypes of an index
- Added: audformat.testing.add_misc_table()
- Added: audformat.Database.__iter__ iterates through all (misc) tables, e.g. a user can do list(db) to get a list of all (misc) tables
- Changed: audformat.Database.update() can now join schemes with different labels
- Changed: audformat.utils.union(), audformat.utils.intersect(), and audformat.utils.concat() now support any kind of index
- Changed: audformat.utils.intersect() no longer removes segments from a segmented index that are contained in a filewise index
- Changed: require pandas>=1.4.1
- Changed: use pandas dtype "string" instead of "object" for storing audformat dtype "str" entries
- Changed: use a misc table to store the "speaker" scheme labels in the emodb example in the documentation
- Changed: audformat.utils.join_labels() raises ValueError if labels are of different dtype
- Fixed: ensure column IDs are different from index level names
- Fixed: make sure audformat.Column.set() converts data to dtype of scheme before checking if values are in min-max-range of scheme
- Fixed: links to pandas API in the documentation
- Fixed: include methods to_dict(), from_dict(), dump(), and attributes description, meta in the documentation for the classes audformat.Column, audformat.Database, audformat.Media, audformat.Rater, audformat.Scheme, audformat.Split, audformat.Table
- Fixed: type hint of argument dtype in the documentation of audformat.Scheme
- Removed: support for Python 3.7
- Added: audformat.utils.map_country()
- Changed: improve speed of audformat.Table.drop_files() for segmented tables
- Added: audformat.utils.index_has_overlap()
- Added: audformat.utils.iter_index_by_file()
- Changed: store categories with integers as int64 instead of Int64
- Changed: require audeer>=1.18.0
- Changed: support pandas>=1.4.0
- Added: audformat.utils.map_file_path()
- Changed: ensure audformat.testing.create_database() uses Unix path separators
- Changed: don't allow \ path entries in a portable database
- Changed: mark deprecated root argument of audformat.testing.create_audio_files() to be removed in version 1.0.0
- Fixed: conversion of pickle protocol 5 files to pickle protocol 4 in cache
- Fixed: reintroduce sorting the output of audformat.Database.files and audformat.Database.segments
- Fixed: changelog for 0.13.0
- Changed: audformat.utils.union() no longer sorts levels
- Changed: audformat.Table.save() forces pickle format 4
- Changed: clean up test requirements
- Changed: require pandas<1.4.0
- Changed: the API documentation on the language argument of audformat.Database is more verbose now
- Changed: the difference between audformat.define.DataType.TIME and audformat.define.DataType.DATE is now discussed in the API documentation
- Fixed: saving a table that has not been loaded to CSV when a PKL file is present
- Fixed: pandas deprecation warnings
- Removed: Python 3.6 support
- Added: audformat.assert_no_duplicates()
- Changed: audformat.assert_index() no longer checks for duplicates
- Added: audformat.utils.hash()
- Added: audformat.utils.expand_file_path()
- Added: audformat.utils.replace_file_extension()
- Changed: use yaml.CLoader for faster header reading
- Added: as_segmented, allow_nat, root, num_workers arguments to audformat.Table.get()
- Added: as_segmented, allow_nat, root, num_workers arguments to audformat.Column.get()
- Added: files_duration argument to audformat.utils.to_segmented_index()
- Added: audformat.Database.files_duration()
- Changed: default value of load_data argument in audformat.Database.load() is now False
- Changed: speed up audformat.Database.files and audformat.Database.segments
- Fixed: re-add support for pandas>=1.3
- Added: support for Python 3.9
- Fixed: speed up audformat.utils.union()
- Fixed: audformat.Column.set() with pd.Series and np.array for a scheme with fixed labels and containing NaN values
- Removed: duration scheme and column from conventions and emodb example
- Added: custom BadKeyError when key is not found
- Changed: limit to pandas<1.3 until it works again for newer pandas versions
- Changed: remove the <1.0.0 limit for audiofile as a stable release is available and the API has not changed
- Added: audformat.utils.duration
- Fixed: description of audformat.Database.is_portable in documentation
- Added: audformat.utils.join_schemes
- Added: Database.is_portable
- Added: copy_media argument to Database.update()
- Changed: remove root argument from testing.create_audio_files() and instead use Database.root
- Fixed: utils.concat() converts to nullable dtype
- Fixed: utils.concat() returns DataFrame if input contains at least one DataFrame
Note: tables stored from this version upwards cannot be loaded with older versions
- Added: Database.root
- Added: utils.join_labels()
- Added: Scheme.replace_labels()
- Changed: set dependency to pandas>=1.1.5
- Changed: do not compress pickled table files
- Changed: allow_nat argument to utils.to_segmented_index()
- Fixed: audformat.assert_index() checks for correct dtypes
- Added: audformat.Database.update()
- Added: audformat.Table.update()
- Added: overwrite argument to audformat.utils.concat()
- Changed: result of audformat.Table.__add__() is no longer assigned to an audformat.Database
- Added: audformat.Database.license
- Added: audformat.Database.license_url
- Added: audformat.Database.author
- Added: audformat.Database.organization
- Added: audformat.utils.intersect() for index objects
- Added: audformat.utils.union() for index objects
- Changed: Database.load() raises an error if a table file is missing
- Changed: forbid duplicates in audformat-conform indices
- Fixed: audformat.Table.__add__() returned wrong values for some index combinations
- Added: update_other_formats argument to audformat.Table.save() to make sure existing files in other formats are updated as well
- Changed: use round_trip argument when loading CSV files to ensure dataframes are equal after storing and loading again
- Fixed: implement audformat.Database.__eq__ to return True for identical databases
- Changed: use the nullable pandas dtype "boolean" for bool schemes
- Fixed: Scheme.draw() generates boolean values if scheme is bool
- Changed: add arguments num_workers and verbose to audformat.Database.load()
- Fixed: avoid sphinx syntax in CHANGELOG
- Changed: add arguments num_workers and verbose to audformat.Database.drop_files(), audformat.Database.map_files(), audformat.Database.pick_files(), audformat.Database.save()
- Changed: audformat.segmented_index() supports int and float, which will be interpreted as seconds
- Fixed: audformat.utils.to_segmented_index() returns correct index type for NaT
- Fixed: add column name to HTML Series output in docs
- Fixed: removed mention of NotConformToUnifiedFormat error and RedundantArgumentError error
- Fixed: add missing errors to docstring of audformat.Table.set() and audformat.Column.set()
- Added: initial public release