Changes go here
Bug fix, released on 3 January 2023.
Bug:
- Table.collapse(..., one_to_many=True) had a lingering dense conversion being performed. Avoiding this conversion yields nearly a 100x performance gain, see PR #888.
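To illustrate why skipping a dense intermediate matters, here is a minimal pure-Python sketch of a one-to-many collapse that only ever touches nonzero entries. It is not biom's implementation: the `collapse_one_to_many` helper, the dict-of-keys table layout, and the choice to add the full count to every group are all assumptions made for this example.

```python
from collections import defaultdict


def collapse_one_to_many(data, groups_for):
    """Collapse observations into groups, keeping only nonzero entries.

    data: {(observation_id, sample_id): value}, nonzero entries only.
    groups_for: {observation_id: [group, ...]}; one observation may
        belong to several groups (the one-to-many case).
    """
    collapsed = defaultdict(float)
    for (obs_id, sample_id), value in data.items():
        for group in groups_for[obs_id]:
            # Every group receives the full count (an "add" strategy chosen
            # for this sketch); no dense groups-by-samples array is built.
            collapsed[(group, sample_id)] += value
    return dict(collapsed)


data = {("O1", "S1"): 2.0, ("O2", "S1"): 3.0, ("O2", "S2"): 1.0}
groups_for = {"O1": ["pathway_a"], "O2": ["pathway_a", "pathway_b"]}
result = collapse_one_to_many(data, groups_for)
```

The work done is proportional to the number of nonzero entries times the number of groups per observation, rather than to the full size of the output matrix.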
Substantial performance improvement and bug fix, released on 9 December 2022.
Important:
- Python 3.6 testing support has been removed.
New features:
- Table.collapse(..., one_to_many=True) now uses a sparse matrix on construction, substantially reducing memory overhead, see PR #884.
Bug fixes:
- Table.metadata_to_dataframe() now considers all rows for column names, see PR #881.
General maintenance and feature expansion, released on 25 March 2022.
Bug fixes:
- Table.from_json now respects the creation date in Python 3.7 and higher, see issue #770.
New Features:
- Python 3.10 support, see PR #865
- 10x improvement on Table.collapse when operating over many partitions, see PR #866
- Minor performance improvement, see PR #871
- Coerce consistent, and fixed width, types for IDs, see PR #872
- Table / metadata alignment support, see PR #859
Bug fix, released on 16 November 2020.
Bug fixes:
- During deployment testing for QIIME 2 2020.11, it was observed that certain combinations of hdf5 or h5py dependencies can result in metadata strings parsing as ASCII rather than UTF-8. Parsing of BIOM-Format 2.1.0 files now normalizes metadata strings as UTF-8, see PR #853.
New Features:
- Added support for aligning dataframes and trees against biom tables with Table.align_to_dataframe and Table.align_tree, see PR #859.
New features and support for Pandas >= 1.0, released on 5 November 2020.
Important:
- Cython and numpy are no longer required to be present before building, see PR #840
New Features:
- Added support for the AnnData format, see PR #845
- Performance boost to Table.remove_empty. For large tables this cuts the running time from 20 seconds to ~1.1 seconds, see PR #847
- A much faster way to merge tables (without metadata) has been added. For large tables, this was a few minutes rather than a few hours. This method is implicitly invoked when calling Table.merge if unioning both axes, and the tables lack metadata. Table.concat is still much faster, but assumes one axis is disjoint. See PR #848.
- Simplify interaction with the concatenation method, allowing for passing in an individual table and support for a general biom.concat(tables) wrapper. See PR #851.
- Added support for parsing adjacency table structures, see issue #823.
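The disjoint-axis assumption behind concatenation can be sketched in a few lines. This is a pure-Python illustration, not biom's code: the `concat_samples` helper and the nested-dict table layout are hypothetical. Because sample IDs never collide, columns can simply be appended, with no per-value merging.

```python
def concat_samples(tables):
    """Concatenate tables along the sample axis.

    Each table is {sample_id: {observation_id: count}}. Sample IDs must be
    disjoint across tables; observation IDs may overlap freely.
    """
    combined = {}
    for table in tables:
        overlap = combined.keys() & table.keys()
        if overlap:
            raise ValueError(f"sample IDs are not disjoint: {sorted(overlap)}")
        # Disjoint IDs mean columns are appended without merging values.
        combined.update({sid: dict(counts) for sid, counts in table.items()})
    return combined


t1 = {"S1": {"O1": 4, "O2": 1}}
t2 = {"S2": {"O2": 7}}
merged = concat_samples([t1, t2])
```

A general merge, by contrast, has to reconcile values wherever both axes overlap, which is where the extra cost comes from.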
Bug fixes:
- Support for pandas >= 1.0, see the comment and commits here
New features and bug fixes, released on 28 January 2020.
Important:
- Python 2.7 and 3.5 support has been dropped.
- Python 3.8 support has been added into Travis CI.
- A change to the defaults for Table.nonzero_counts was performed such that the default now is to count the number of nonzero features. See issue #685
- We now require SciPy >= 1.3.1. See issue #816
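The difference between counting nonzero features and summing their values can be shown with a small pure-Python sketch. The `nonzero_counts` function below is a stand-in written for this example; its `binary` flag and list-of-lists layout are assumptions and should not be read as the library's signature.

```python
def nonzero_counts(matrix, axis="sample", binary=True):
    """Count nonzero entries per sample or per observation.

    matrix is indexed [observation][sample]. With binary=True each nonzero
    entry contributes 1 (the number of nonzero features); with binary=False
    the nonzero values themselves are summed.
    """
    if axis == "sample":
        vectors = list(zip(*matrix))  # one tuple per sample (column)
    elif axis == "observation":
        vectors = matrix              # one list per observation (row)
    else:
        raise ValueError(f"unknown axis: {axis!r}")
    if binary:
        return [sum(1 for v in vec if v != 0) for vec in vectors]
    return [sum(v for v in vec if v != 0) for vec in vectors]


matrix = [[0, 5, 2],
          [1, 0, 3]]
per_sample_features = nonzero_counts(matrix)               # [1, 1, 2]
per_sample_totals = nonzero_counts(matrix, binary=False)   # [1, 5, 5]
```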
New Features:
- The detailed report is no longer part of the table validator. See issue #378.
- load_table now accepts open file handles. See issue #481.
- biom export-metadata has been added to export metadata as TSV. See issue #820.
- Table.to_tsv has been modified to allow for direct_io. See issue #836.
Bug fixes:
- Table.to_dataframe(dense=False) now correctly produces sparse data frames (and not accidentally dense ones as before). See issue #808.
- Order of error evaluations was unstable in Python versions without implicit OrderedDict. See issue #813. Thanks @gwarmstrong for identifying this bug.
- Table._extract_data_from_tsv would fail if taxonomy was provided, and if the first row had the empty string for taxonomy. See issue #827. Thanks @KasperSkytte for identifying this bug.
New features and bug fixes, released on 28 September 2018.
Important:
- Python 3.4 support has been dropped. We now only support Python 2.7, 3.5, 3.6 and 3.7.
- We will be dropping Python 2.7 support on the next release.
- Pandas >= 0.20.0 is now the minimum required version.
- pytest is now used instead of nose.
New Features:
- Massive performance boost to Table.collapse with the default collapse function. The difference was 10s of milliseconds vs. minutes stemming from prior use of operator.add. See issue #761.
- Table.align_to for aligning one table to another. This is useful in multi-omic analyses where multiple preparations have been performed on the same physical samples. This is essentially a helper method around Table.sort_order. See issue #747.
- Added additional sanity checks when calling Table.to_hdf5, see PR #769.
- Table.subsample() can optionally perform subsampling with replacement. See issue #774.
- Table.to_dataframe() now supports a dense argument to return pd.DataFrame. See issue #762.
- Parsing methods for BIOM-Format 1.0.0 tables now preserve dict ordering. See issue #781.
Bug fixes:
- Table.subsample(by_id=True, axis='observation') did not subsample over the 'observations'. Because of the nature of the bug, an empty table was returned, so the scope of the issue is such that it should not have produced misleading results but instead triggered empty table errors, with the exception of the pathological case of the ID namespaces between features and samples not being disjoint. See PR #759 for more information.
- Tables of shape (0, n) or (n, 0) were raising exceptions when being written out. See issue #619.
- Tables loaded with a list of empty dicts will have their metadata attributes set to None. See issue #594.
New features and bug fixes, released on 27 April 2017.
New Features:
- Table.from_hdf5 now supports a rapid subset in the event that metadata is not needed. In benchmarking against the Earth Microbiome Project BIOM table, the reduction in runtime was multiple orders of magnitude while also saving substantial memory.
- Table.rankdata has been added to convert values to ranked abundances on either axis. See issue #645.
- Format of numbers in biom summarize-table output is now more readable and localized. See issue #679.
- Table.concat has been added to the API and allows for concatenating multiple tables in which the IDs of one of the axes are known to be disjoint. This has substantial performance benefits over Table.merge.
- Table.sort_order was performing an implicit cast to dense, and not leveraging fancy indexing. A substantial performance gain was achieved. See PR #720.
- biom subset-table now accepts a QIIME-like mapping file when subsetting by IDs. See issue #587.
- Table.del_metadata was added to support the removal of metadata entries from the table. See issue #708.
- Table.to_dataframe was added to cast the internal matrix data to a Pandas SparseDataFrame. See issue #622.
- Table.metadata_to_dataframe was added to cast axis metadata to a Pandas DataFrame. See issue #622.
- test_table.py and test_util.py now use a stable random seed. See issue #728.
- Failure to cast a value when parsing a TSV will now print the associated line number which had the bad value. See #284.
- Table.remove_empty has been added to remove zero'd samples, observations or both. See #721.
- A subcommand of the command line interface was added to obtain a table's IDs: table-ids.
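As a rough picture of what removing zero'd samples and observations means, the following pure-Python sketch drops all-zero rows and columns from a small matrix. The `remove_empty` function here is a toy written for this example with an assumed list-of-lists layout. It is not the library method and always trims both axes.

```python
def remove_empty(matrix, row_ids, col_ids):
    """Drop rows and columns whose values are all zero.

    matrix: list of rows (lists of numbers), indexed [row][col].
    Returns (matrix, row_ids, col_ids) with empty vectors removed.
    """
    keep_rows = [i for i, row in enumerate(matrix) if any(row)]
    keep_cols = [j for j in range(len(col_ids))
                 if any(row[j] for row in matrix)]
    trimmed = [[matrix[i][j] for j in keep_cols] for i in keep_rows]
    return (trimmed,
            [row_ids[i] for i in keep_rows],
            [col_ids[j] for j in keep_cols])


matrix = [[0, 5, 0],
          [0, 0, 0],
          [0, 1, 2]]
trimmed, rows, cols = remove_empty(matrix,
                                   ["O1", "O2", "O3"],
                                   ["S1", "S2", "S3"])
```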
Bug fixes:
- The -o option is now a required parameter of biom from-uc. This was not the case previously, which resulted in a cryptic error message if -o was not provided. See issue #683.
- Matrices are now cast to csr on Table construction if the data evaluate as isspmatrix. This fixes #717 where some API methods assumed the data were csc or csr.
- Table.concat was not handling tables without metadata, resulting in an exception due to mismatched metadata shape. See #724.
- When validating a BIOM-Format 1.0.0 table, specifying the version string would trigger an error. See #664. An explicit regression test was not added as this stemmed from an integration, and there currently is not support for script usage tests; see #656.
- Table.nnz was not calling eliminate_zeros() on the underlying sparse matrix, resulting in the possibility of counting explicitly set zero values. See #727.
- Table.from_hdf5 was not properly turning bytes into str for the table_id and the type HDF5 attributes. See #731.
- Table.__init__ now always performs an astype(float) on the contained spmatrix. This type normalization is beneficial for underlying Cython code on the filtering and transform operations. It is possible this will introduce some performance overhead, however in most cases the data should already be float. See #718.
- Table.to_hdf5 was not handling lists of str appropriately in the general case. See #638.
- Table.to_hdf5 was not handling taxonomy as flat strings, which was a common mistake that was outside of expectations for the formatter. The formatter now attempts to split on semicolon if this scenario is encountered, and errors with a more informative error if a problem occurs. See #530.
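The flat-string taxonomy case from the last item can be sketched as a small normalization step. This is an illustration only: `normalize_taxonomy` is a hypothetical helper, and stripping whitespace around each level is an assumption of this example rather than documented formatter behavior.

```python
def normalize_taxonomy(value):
    """Return taxonomy as a list of str.

    Accepts a list or tuple of str, or a flat semicolon-delimited string.
    """
    if isinstance(value, str):
        # Flat string such as "k__Bacteria; p__Firmicutes": split on ';'.
        return [part.strip() for part in value.split(";")]
    if isinstance(value, (list, tuple)) and all(
            isinstance(v, str) for v in value):
        return list(value)
    # Anything else gets an explicit, descriptive error.
    raise TypeError(
        "taxonomy must be a list of str or a semicolon-delimited str, "
        f"got {type(value).__name__}")


flat = normalize_taxonomy("k__Bacteria; p__Firmicutes")
listed = normalize_taxonomy(["k__Bacteria", "p__Firmicutes"])
```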
New features and bug fixes, released on 21 October 2015.
Changes:
- Codebase is now Python 2/3 compatible. It is currently tested with Python versions 2.7, 3.4 and 3.5.
- biom-serve and the accompanying html interface have been removed.
New Features:
- Table.head has been added to retrieve the first few rows and/or columns from a table. This can be accessed through the new biom head command. See issue #639.
- biom.parse.from_uc has been added to support creation of biom.Table objects from vsearch/uclust/usearch .uc files. This can be accessed through the new biom from-uc command. See issue #648.
- Codebase now uses click instead of pyqi for its command line interface. See issue #631.
Bug fixes:
- Table.update_ids strict check was too aggressive. See issue #633.
- biom --version now prints the software version (previously the individual commands did this, but not the base command).
- Table.vlen_list_of_str_formatter was considering a str to be valid for formatting, resulting in an obscure error when a str, as opposed to a list of str, was used for taxonomy. See issue #709.
Bug fixes, released on April 22nd 2015
Changes:
- Codebase updated to reflect pep8 1.6.x
New features:
- Table.to_hdf5 and Table.from_hdf5 now support custom parsers and formatters, see issue #608
Bug fixes:
- Table.update_ids was not updating the internal ID lookup caches, issue #599
- The --is-json option has been removed from the table validator as it was being ignored anyway, issue #591
- biom summarize-table can now properly interact with pipes. This previously worked on OSX but did not on Linux. Issue #598
- biom convert was recording the wrong version information from HDF5 -> JSON, issue #595
- Table.collapse, under one_to_many, was not constructing the resulting matrix properly, issue #606
- Improve error message when trying to load an empty file, issue #614.
- Improve error handling when filtering tables, and return tables of shape (0, n) instead of (0, 0) when fully filtering out a table along an axis, issue #620
- Fix Table.nonzero to work on data that is not already in csr, issue #625.
Minor fixes, released on January 29, 2015
Bug fixes:
- Improve error message when trying to load an HDF5 file without h5py being installed.
- Allow validating json files when h5py is not installed.
Minor fixes, released on December 18, 2014
Bug fixes:
- Remove syntax error from normalize_table.py.
- Table.to_json was not serializing empty tables properly, see #571
- biom summarize-table could not handle empty tables, see #571
Minor fixes and performance improvements, released on November 19th 2014
Changes:
- The collapsing function to Table.collapse is now passed the entire table to allow for more complex collapses (e.g., median, random selection, etc). See #544, #545 and #547.
- Updated version strings in the project to be Semantic Versioning-style. This better matches with other open source python projects, and plays nicer with pip.
- Conversion from TSV now takes less memory. See #551.
- Parameter header_mark has been removed from _extract_data_from_tsv() in table.py
- Order of magnitude improvement in parsing HDF5 BIOM tables, see #529
- Added Table.length, see #548
- Order of magnitude performance increase in Table.nonzero, see #538
Bug fixes:
- Ensure that a copy is performed in Table.subsample
- Avoided a memory leak when checking if a table is JSON or TSV, see #552.
Format finalization, released on August 7th 2014
New features:
- Group metadata (e.g., a phylogenetic tree) can now be stored within the HDF5 representation. These data are available within the Table object
- Matrix data can now be accessed by the Table.matrix_data property
- Table IDs are now accessed via the Table.ids method
- Table metadata are now accessed via the Table.metadata method
- New method Table.update_ids, which allows for updating the ids along either axis.
- added normalize-table option to optparse and HTML interfaces which utilizes the new TableNormalizer command from table_normalizer.py
Changes:
- Metadata are now stored in individual datasets within HDF5. This resulted in a change to the BIOM-Format spec which has now been bumped to format version 2.1.
- Table.collapse min_group_size is now 1 by default, see #480
- General improvements to BIOM 2.x online documentation
- Table.pa now supports negative values
- dropped old, unused scripts
- added Table.iter_pairwise
- added Table.min and Table.max, see #459
- iter methods now support dense/sparse
- added Table.matrix_data property
- Table.filter yields a sparse vector, see #470
- Table.subsample can now sample by IDs (e.g., get a random subset of samples or observations from a Table).
- biom.util.generate_subsamples will generate an infinite number of subsamples and can be used for rarefaction.
- biom summarize-table can now operate on observations.
- 10% performance boost in Table.subsample, see #532
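An infinite stream of subsamples is naturally expressed as a Python generator, with the caller deciding how many draws to take. The sketch below shows that pattern using only the standard library. `endless_subsamples` is a hypothetical name and a simplified stand-in working on a single count vector; it is not the library's own function.

```python
import itertools
import random
from collections import Counter


def endless_subsamples(counts, n, seed=None):
    """Yield an endless stream of independent random subsamples of size n.

    counts: {feature_id: non-negative int}. Each subsample is drawn
    without replacement from the same original counts.
    """
    rng = random.Random(seed)
    # One pool entry per individual observed count.
    pool = [f for f, c in counts.items() for _ in range(c)]
    if n > len(pool):
        raise ValueError("n exceeds the total count")
    while True:
        yield dict(Counter(rng.sample(pool, n)))


sample = {"O1": 6, "O2": 3, "O3": 1}
# The caller bounds the infinite generator, e.g. three rarefaction draws.
first_three = list(itertools.islice(endless_subsamples(sample, 4, seed=0), 3))
```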
Bug fixes:
- Table.transform operates on full vectors now, see #476
- biom convert now handles taxonomy strings correctly, see #504
- Table.sort_order was not retaining Table.type, see #474
- convert_biom_to_table now uses load_table, see #478
- Table.pa now handles negative values, see #492
- Table.copy was not retaining Table.type, see #494
Bug fix release, released on June 3rd 2014
Changes:
- Light weight loading mechanism (biom.load_table) added
- Table.data now has a default axis
- Convert documentation updated
- Quick start page added to documentation
Bug fixes:
- missing fields from JSON representation reintroduced
- TableConverter works as expected
Major release, released on May 15th 2014
Changes:
- NumPy 1.7 or above is required
- Support for HDF5
- Codebase is PEP-8 compliant
- CSMat has been removed and Scipy is now a required dependency
- Requires pyqi 0.3.2
- New HTML interface
- No longer dependent on dateutil
- Table.bin_samples_by_metadata and Table.bin_observations_by_metadata have been combined into Table.partition, which takes an axis argument
- Table.collapse_samples_by_metadata and Table.collapse_observations_by_metadata have been combined into Table.collapse, which now takes an axis argument
- Table.filter_samples and Table.filter_observations have been combined into Table.filter, which now takes an axis argument
- Table.transform_samples and Table.transform_observations have been combined into Table.transform, which now takes an axis argument
- Table.norm_sample_by_observation and Table.norm_observation_by_sample have been combined into Table.norm, which now takes an axis argument
- Table.iter_samples and Table.iter_observations have been combined into Table.iter, which now takes an axis argument
- Table.iter_sample_data and Table.iter_observation_data have been combined into Table.iter_data, which now takes an axis argument
- Table.get_sample_index and Table.get_observation_index have been combined into Table.get_index, which now takes an axis argument
- Table.add_sample_metadata and Table.add_observation_metadata have been combined into Table.add_metadata, which now takes an axis argument
- Table.sample_data and Table.observation_data have been combined into Table.data, which now takes an axis argument
- Table.sample_exists and Table.observation_exists have been combined into Table.exists, which now takes an axis argument
- Table.sort_by_sample_ids and Table.sort_by_observation_ids have been combined into Table.sort, which now takes an axis argument
- Table.sort_sample_order and Table.sort_observation_order have been combined into Table.sort_order, which now takes an axis argument
- Table.norm_samples_by_metadata and Table.norm_observations_by_metadata have been removed
- Added Table.metadata to allow fetching of metadata by an ID instead of just by index
- Added Table.pa for conversion to presence/absence
- Added Table.subsample for randomly subsampling data
- Table now embraces numpydoc
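The pattern behind these renames, one method with an axis argument in place of a sample/observation pair, can be shown with a toy class. `MiniTable` below is purely illustrative: the class, its internals, and the default axis of "sample" are assumptions of this sketch, not a description of the real Table object.

```python
class MiniTable:
    """Toy table with one axis-aware method in place of paired methods."""

    _AXES = ("sample", "observation")

    def __init__(self, observation_ids, sample_ids):
        self._ids = {"observation": list(observation_ids),
                     "sample": list(sample_ids)}

    def _check_axis(self, axis):
        if axis not in self._AXES:
            raise ValueError(f"unknown axis: {axis!r}")

    def exists(self, id_, axis="sample"):
        """Stands in for a sample_exists / observation_exists pair."""
        self._check_axis(axis)
        return id_ in self._ids[axis]

    def get_index(self, id_, axis="sample"):
        """Stands in for a get_sample_index / get_observation_index pair."""
        self._check_axis(axis)
        return self._ids[axis].index(id_)


table = MiniTable(["O1", "O2"], ["S1", "S2", "S3"])
has_sample = table.exists("S2")                        # True
obs_index = table.get_index("O2", axis="observation")  # 1
```

Routing both axes through one code path halves the public surface and lets axis validation live in a single place.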
Documentation release, released on December 4th 2013
New Features:
- biom-format is now installable via pip! Simply run pip install biom-format.
Changes:
- Fixed installation instructions to be clearer about the various ways of installing biom-format. Also fixed a couple of minor formatting issues.
Feature release, released on December 4th 2013
New Features:
- Added new sparse matrix backend ScipySparseMat, which requires that scipy is installed if this backend is in use. This backend will generally yield improvements in both runtime and memory consumption, especially with larger sparse tables. The default sparse matrix backend is still CSMat (this means that scipy is an optional dependency of the biom-format project).
Changes:
- Sparse backends SparseDict and SparseMat have been removed in favor of CSMat. Cython is no longer a dependency.
- The BIOM Format project license is now Modified BSD (see COPYING.txt for more details) and is no longer GPL. To change the license, we obtained written permission (by email) from all past and present developers on the biom-format project. The core developers, including @gregcaporaso, @wasade, @jrrideout, and @rob-knight were included on these emails. For code that was derived from the QIIME and PyCogent projects, which are under the GPL license, written permission was obtained (by email) from the developers of the original code (tracing through the commit history, as necessary). @gregcaporaso, @wasade, @jrrideout, and @rob-knight were included on these emails.
- Removed the top-level python-code directory, moving all contents up one level. If you are installing the biom-format project by manually setting PYTHONPATH to <dir prefix>/biom-format/python-code, you will need to change the path to <dir prefix>/biom-format instead. Please see the installation instructions for more details.
- Reorganized sparse backend code into a new subpackage, biom.backends. This change should not affect client code.
New Features:
- Table.collapseObservationsByMetadata and Table.collapseSamplesByMetadata now have an additional argument, include_collapsed_metadata, which allows the user to either include or exclude collapsed metadata in the collapsed table.
- Table.collapseObservationsByMetadata and Table.collapseSamplesByMetadata now have an additional argument, one_to_many_mode, which allows the user to specify a collapsing strategy for one-to-many metadata relationships (currently supports adding and dividing counts).
- Table.binObservationsByMetadata, Table.binSamplesByMetadata, Table.collapseObservationsByMetadata, and Table.collapseSamplesByMetadata now have an additional argument, constructor, which allows the user to choose the return type of the binned/collapsed table(s).
- Table.delimitedSelf now has an additional argument, observation_column_name, which allows the user to specify the name of the first column in the output table (e.g. 'OTU ID', 'Taxon', etc.).
- Added new Table.transpose method.
- Table.__init__ has changed from __init__(self, data, sample_ids, observation_ids, sample_metadata=None, observation_metadata=None, table_id=None, type=None, **kwargs) to __init__(self, data, observation_ids, sample_ids, observation_metadata=None, sample_metadata=None, table_id=None, type=None, **kwargs). This is for clarity: the data is in the same order as the arguments to the constructor.
- table_factory has changed from table_factory(data, sample_ids, observation_ids, sample_metadata=None, observation_metadata=None, table_id=None, input_is_dense=False, transpose=False, **kwargs) to table_factory(data, observation_ids, sample_ids, observation_metadata=None, sample_metadata=None, table_id=None, input_is_dense=False, transpose=False, **kwargs). This is for clarity: the data is in the same order as the arguments to the function.
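A reordering of positional parameters like this is easiest to survive by passing IDs as keyword arguments. The sketch below uses a hypothetical `make_table` stand-in that follows the new observations-first order and checks that the data shape agrees with the IDs. It is not the real constructor or factory.

```python
def make_table(data, observation_ids, sample_ids):
    """Toy stand-in using the new argument order (observations first).

    data is indexed [observation][sample], so rows line up with
    observation_ids and columns with sample_ids.
    """
    if len(data) != len(observation_ids):
        raise ValueError("one row is required per observation ID")
    if any(len(row) != len(sample_ids) for row in data):
        raise ValueError("one column is required per sample ID")
    return {"data": data,
            "observation_ids": list(observation_ids),
            "sample_ids": list(sample_ids)}


data = [[1, 0, 4],   # O1 across S1..S3
        [0, 2, 3]]   # O2 across S1..S3

# Keyword arguments keep the call correct regardless of positional order.
table = make_table(data,
                   sample_ids=["S1", "S2", "S3"],
                   observation_ids=["O1", "O2"])
```

Calling the same helper positionally in the old sample-first order fails the shape check here, which is the kind of mistake the reordering is meant to make obvious.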
Changes:
- pyqi 0.2.0 is now a required dependency. This changes the look-and-feel of the biom-format command-line interfaces and introduces a new executable, biom, which can be used to see a list of all available biom-format command-line commands. The biom command is now used to run biom-format commands, instead of having a Python script (i.e., .py file) for each biom-format command. The old scripts (e.g., add_metadata.py, convert_biom.py, etc.) are still included but are deprecated. Users are pointed to the new biom command to run instead. Bash tab completion is now supported for all command and option names (see the biom-format documentation for instructions on how to enable this).
- The following scripts have had their names and options changed:
  - add_metadata.py is now biom add-metadata. Changed option names:
    - --input_fp is now --input-fp
    - --output_fp is now --output-fp
    - --sample_mapping_fp is now --sample-metadata-fp
    - --observation_mapping_fp is now --observation-metadata-fp
    - --sc_separated is now --sc-separated
    - --int_fields is now --int-fields
    - --float_fields is now --float-fields
    - --sample_header is now --sample-header
    - --observation_header is now --observation-header
    - New option --sc-pipe-separated
  - biom_validator.py is now biom validate-table. Changed option names:
    - -v/--verbose is now --detailed-report
    - --biom_fp is now --input-fp
  - convert_biom.py is now biom convert. Changed option names:
    - --input_fp is now --input-fp
    - --output_fp is now --output-fp
    - --biom_type is now --matrix-type
    - --biom_to_classic_table is now --biom-to-classic-table
    - --sparse_biom_to_dense_biom is now --sparse-biom-to-dense-biom
    - --dense_biom_to_sparse_biom is now --dense-biom-to-sparse-biom
    - --sample_mapping_fp is now --sample-metadata-fp
    - --observation_mapping_fp is now --observation-metadata-fp
    - --header_key is now --header-key
    - --output_metadata_id is now --output-metadata-id
    - --process_obs_metadata is now --process-obs-metadata
    - --biom_table_type is now --table-type
  - print_biom_python_config.py is now biom show-install-info.
  - print_biom_table_summary.py is now biom summarize-table. Changed option names:
    - --input_fp is now --input-fp
    - --output_fp is now --output-fp. This is now a required option (output is no longer printed to stdout).
    - --num_observations is now --qualitative
    - --suppress_md5 is now --suppress-md5
  - subset_biom.py is now biom subset-table. Changed option names:
    - --biom_fp is now --input-fp
    - --output_fp is now --output-fp
    - --ids_fp is now --ids
- biom.parse.parse_mapping has been replaced by biom.parse.MetadataMap. biom.parse.MetadataMap.from_file can be directly substituted in place of biom.parse.parse_mapping.
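Most of the option renames listed above are a mechanical underscore-to-hyphen swap, with a handful of exceptions. For anyone updating old shell scripts, the mapping can be sketched as follows. `modernize_option` is a hypothetical helper, and for simplicity it merges the per-script lists into one table, so it does not check which script an option belonged to.

```python
# Renames from the list above that are not a plain underscore-to-hyphen swap.
SPECIAL_RENAMES = {
    "--sample_mapping_fp": "--sample-metadata-fp",
    "--observation_mapping_fp": "--observation-metadata-fp",
    "--biom_fp": "--input-fp",
    "--biom_type": "--matrix-type",
    "--biom_table_type": "--table-type",
    "--ids_fp": "--ids",
    "--num_observations": "--qualitative",
    "--verbose": "--detailed-report",
}


def modernize_option(option):
    """Translate an old-style long option name to its new spelling."""
    if option in SPECIAL_RENAMES:
        return SPECIAL_RENAMES[option]
    # Everything else only swaps underscores for hyphens.
    return option.replace("_", "-")


old_args = ["--input_fp", "--ids_fp", "--suppress_md5"]
new_args = [modernize_option(arg) for arg in old_args]
```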
Bug Fixes:
- Fixed performance issue with formatting BIOM tables for writing to a file.
- Fixed issue with Table.addSampleMetadata and Table.addObservationMetadata when adding metadata to a subset of the samples/observations in a table that previously was without any sample/observation metadata.
- Fixed issue with Table.addSampleMetadata and Table.addObservationMetadata when updating a table's existing metadata, including the case where there are sample/observation IDs that are in the metadata file but not in the table.
New Features:
- Table.collapseObservationsByMetadata and Table.collapseSamplesByMetadata now support one-to-many relationships on the metadata field to collapse on.
- added new script called print_biom_table_summary.py (and accompanying tutorial) that prints summary statistics of the input BIOM table as a whole and on a per-sample basis
Changes:
- SparseMat now uses cython for loops more efficiently
Bug Fixes:
- fixed serious performance issue with Table.transformSamples/Observations when using CSMat as the sparse backend
Changes:
- added documentation for how to switch sparse backends via BIOM config file
Bug Fixes:
- performance issue on table creation with CSMat where an O(N) lookup was being performed
New Features:
- new default sparse matrix backend CSMat (COO/CSR/CSC) more efficient than SparseDict and SparseMat (pure python + numpy)
- support for biom config file, which allows specification of sparse backend to use. Currently supports CSMat (default), SparseMat, and SparseDict. Default can be found under support_files/biom_config, and can be copied to $HOME/.biom_config or located by setting $BIOM_CONFIG_FP
- new script called add_metadata.py with accompanying tutorial that allows users to add arbitrary sample and/or observation metadata to biom files
- new script called subset_biom.py that efficiently pulls out a subset of a biom table (either by samples or observations). Useful for very large tables where memory may be an issue
Changes:
- parser is more efficient for sparse tables and formatter is more efficient for both table types (less memory consumption)
- biom.Table objects are now immutable (except that metadata can still be added via addSampleMetadata/addObservationMetadata). __setitem__ and setValueByIds have been removed and SampleIds, ObservationIds, SampleMetadata, and ObservationMetadata members are now tuples as a result
- biom.Table object has a new method called getTableDensity()
- performance testing framework has been added for Table objects
Bug Fixes:
- convert_biom.py now converts dense tables to sparse tables (previously it didn't do anything)
- many misc. fixes to script help/documentation and docstrings (fixing typos, editing for clarity, etc.)
New Features:
- new default sparse matrix backend SparseMat (requires Cython), more efficient than the existing SparseDict backend
- format now accepts unicode but does not accept str due to JSON parsing from Python
- specification for metadata is now either null or an object
- PySparse has been gutted, sparse matrix support is now through Table.SparseDict
New Features:
- more table types!
Changes:
- Table.getBioFormatJsonString() and similar methods now require a generatedby string