Releases: tskit-dev/tskit
C API C_1.1.1
Bug fixes
- Fix segfault in tsk_variant_restricted_copy in tree sequences with large
numbers of alleles or very long alleles
(@jeromekelleher, #2437, #2429).
Python 0.5.1
Fixes
- Copies of a Variant object would cause a segfault when .samples was accessed. (@benjeffery, #2400, #2401)
Changes
- Tables in a table collection can be replaced using the replace_with method (@hyanwong, #1489, #2389)
- SVG drawing routines now return a special string object that is automatically rendered in a Jupyter notebook (@hyanwong, #2377)
Features
C API 1.1.0
Features
- Add num_children to tsk_tree_t, an array which contains counts of the number of child nodes of each node in the tree. (@GertjanBisschop, #2274, #2316)
- Add edge to tsk_tree_t, an array which contains the edge_id of the edge encoding the relationship between the child node and its parent for each (child) node in the tree. (@GertjanBisschop, #2304, #2340)
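
As an illustration of what a num_children array holds, here is a minimal plain-Python sketch. It does not use tskit or the C API: the function name count_children and the list-based parent array are assumptions made for the example, with -1 standing in for the null node ID.

```python
def count_children(parent):
    """Return a list where entry u is the number of children of node u.

    parent[u] is the parent of node u, or -1 if u has no parent.
    (Hypothetical helper for illustration; not part of tskit.)
    """
    num_children = [0] * len(parent)
    for p in parent:
        if p != -1:
            num_children[p] += 1
    return num_children


# Nodes 0 and 1 hang off node 4, nodes 2 and 3 off node 5,
# and nodes 4 and 5 off the root, node 6.
parent = [4, 4, 5, 5, 6, 6, -1]
print(count_children(parent))  # [0, 0, 0, 0, 2, 2, 2]
```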
Changes
- Reduce the maximum number of rows in a table by 1. This removes edge cases so that a tsk_id_t can be used to count the number of rows. (@benjeffery, #2336, #2337)
- Samples are now copied by tsk_variant_restricted_copy. (@benjeffery, #2400, #2401)
Python 0.5.0
Major Feature Release
Breaking Changes
- The JSON metadata codec now interprets the empty string as an empty object. This means that applying a schema to an existing table will no longer necessitate modifying the existing rows. (@benjeffery, #2064, #2104)
- Remove the previously deprecated as_bytes argument to TreeSequence.variants. If you need genotypes in byte form this can be done following the code in the to_macs method on line 5573 of trees.py. This argument was initially deprecated more than 3 years ago when the code was part of msprime. (@benjeffery, #605, #2172)
- Arguments after ploidy in write_vcf marked as keyword only (@jeromekelleher, #2329, #2315).
- When metadata equal to b'' is printed to text or HTML tables it will render as an empty string rather than "b''". (@hyanwong, #2349, #2351)
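
The JSON codec change in the first item above can be pictured with a small stdlib-only sketch. This is not tskit's codec implementation: decode_json_metadata is a hypothetical helper that only mirrors the described behaviour, where empty raw metadata decodes to an empty object instead of failing.

```python
import json


def decode_json_metadata(raw: bytes):
    """Decode raw metadata bytes as JSON, treating b"" as an empty object.

    Hypothetical helper for illustration; it only mimics the behaviour
    described in the release note.
    """
    if raw == b"":
        return {}
    return json.loads(raw.decode("utf-8"))


# Rows written before a schema was applied typically hold empty metadata.
print(decode_json_metadata(b""))  # {}
print(decode_json_metadata(b'{"name": "A"}'))  # {'name': 'A'}
```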
Changes
- A min_time parameter in draw_svg enables the youngest node as the y axis min value, allowing negative times. (@hyanwong, #2197, #2215)
- VcfWriter.write now prints the site ID of variants in the ID field of the output VCF files. (@roohy, #2103, #2107)
- Make dumping of tables and tree sequences to disk a zero-copy operation. (@benjeffery, #2111, #2124)
- Add copy argument to TreeSequence.variants which if False reuses the returned Variant object for improved performance. Defaults to True. (@benjeffery, #605, #2172)
- tree.mrca now takes 2 or more arguments and gives the common ancestor of them all. (@savitakartik, #1340, #2121)
- Add an edge attribute to the Mutation class that gives the ID of the edge that the mutation falls on. (@jeromekelleher, #685, #2279).
- Add the TreeSequence.split_edges operation which inserts nodes into edges at a specific time. (@jeromekelleher, #2276, #2296).
- Add the TreeSequence.decapitate (and closely related TableCollection.delete_older) operation to remove topology and mutations older than a given time. (@jeromekelleher, #2236, #2302, #2331).
- Add the TreeSequence.individuals_time and TreeSequence.individuals_population methods to return arrays of per-individual times and populations, respectively. (@petrelharp, #1481, #2298).
- Add the sample_mask and site_mask arguments to write_vcf to allow parts of an output VCF to be omitted or marked as missing data. Also add the as_vcf convenience function, to return VCF as a string. (@jeromekelleher, #2300).
- Add support for missing data to write_vcf, and add the isolated_as_missing argument. (@jeromekelleher, #2329, #447).
- Add Tree.num_children_array and Tree.num_children. Returns the counts of the number of child nodes for each or a single node in the tree respectively. (@GertjanBisschop, #2318, #2319, #2332)
- Add Tree.path_length. (@jeremyguez, #2249, #2259).
- Add B1 tree balance index. (@jeremyguez, @jeromekelleher, #2251, #2281, #2346).
- Add B2 tree balance index. (@jeremyguez, @jeromekelleher, #2252, #2353, #2354).
- Add Sackin tree imbalance index. (@jeremyguez, @jeromekelleher, #2246, #2258).
- Add Colless tree imbalance index. (@jeremyguez, @jeromekelleher, #2250, #2266, #2344).
- Add direction argument to TreeSequence.edge_diffs, allowing iteration over diffs in the reverse direction. NOTE: this comes with a ~10% performance regression as the implementation was moved from C to Python for simplicity and maintainability. Please open an issue if this affects your application. (@jeromekelleher, @benjeffery, #2120).
- Add Tree.edge_array and Tree.edge. Returns the edge id of the edge encoding the relationship of each node with its parent. (@GertjanBisschop, #2361, #2357)
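
For readers unfamiliar with the Sackin and Colless imbalance indexes listed above, the following plain-Python sketch computes both for a single-rooted tree given as a parent array. It uses the usual textbook definitions (Sackin: sum of leaf depths; Colless: sum over internal nodes of the absolute difference in leaf counts below the two children) and is not tskit's implementation. The function name and the -1 null convention are assumptions of the example.

```python
def sackin_and_colless(parent):
    """Return (sackin, colless) for a single-rooted tree.

    parent[u] is the parent of node u, or -1 for the root.
    Hypothetical helper for illustration; not part of tskit.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for u, p in enumerate(parent):
        if p != -1:
            children[p].append(u)

    def depth(u):
        # Number of edges between u and the root.
        d = 0
        while parent[u] != -1:
            u = parent[u]
            d += 1
        return d

    def num_leaves(u):
        if not children[u]:
            return 1
        return sum(num_leaves(c) for c in children[u])

    sackin = sum(depth(u) for u in range(n) if not children[u])

    colless = 0
    for u in range(n):
        if children[u]:
            if len(children[u]) != 2:
                raise ValueError("Colless index is only defined for binary trees")
            a, b = children[u]
            colless += abs(num_leaves(a) - num_leaves(b))
    return sackin, colless


# Balanced 4-leaf tree: ((0,1),(2,3))
print(sackin_and_colless([4, 4, 5, 5, 6, 6, -1]))  # (8, 0)
# Caterpillar 4-leaf tree: (((0,1),2),3)
print(sackin_and_colless([4, 4, 5, 6, 5, 6, -1]))  # (9, 3)
```

Both indexes grow as the tree becomes less balanced, which is why the caterpillar tree scores higher than the balanced one.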
C API 1.0.0
This major release marks the point at which the documented API becomes stable and supported.
Breaking changes
- Change the type of genotypes to int32_t, removing the TSK_16_BIT_GENOTYPES flag option. (@benjeffery, #463, #2108)
- tsk_variant_t now includes its tsk_site_t rather than pointing to it. (@benjeffery, #2161, #2162)
- Rename TSK_TAKE_TABLES to TSK_TAKE_OWNERSHIP. (@benjeffery, #2221, #2222)
- TSK_DEBUG, TSK_NO_INIT, TSK_NO_CHECK_INTEGRITY and TSK_TAKE_OWNERSHIP have moved to core.h (@benjeffery, #2218, #2230)
- Rename several flags:
  - All flags to simplify, for example TSK_KEEP_INPUT_ROOTS becomes TSK_SIMPLIFY_KEEP_INPUT_ROOTS.
  - All flags to subset, for example TSK_KEEP_UNREFERENCED becomes TSK_SUBSET_KEEP_UNREFERENCED.
  - TSK_BUILD_INDEXES -> TSK_TS_INIT_BUILD_INDEXES
  - TSK_NO_METADATA -> TSK_TABLE_NO_METADATA
  - TSK_NO_EDGE_METADATA -> TSK_TC_NO_EDGE_METADATA
  (@benjeffery, #1720, #2226, #2229, #2224)
- Remove the generic TSK_ERR_OUT_OF_BOUNDS, replacing it with specific errors. Remove TSK_ERR_NON_SINGLE_CHAR_MUTATION which was unused. (@benjeffery, #2260)
- Reorder stats API methods to place result as the last argument. (@benjeffery, #2292, #2285)
Features
- Make dumping of tables and tree sequences to disk a zero-copy operation. (@benjeffery, #2111, #2124)
- Add edge attribute to mutation_t struct and make available in tree sequence. (@jeromekelleher, #685, #2279)
- Reduce peak memory usage in tsk_treeseq_simplify. (@jeromekelleher, #2287, #2288)
Python 0.4.1
Bugfix release
Changes
- TableCollection.name_map has been deprecated in favour of table_name_map. (@benjeffery, #1981, #2086)
Fixes
- TreeSequence.dump_text now prints decoded metadata if there is a schema. (@benjeffery, #1860, #1527)
- Add missing ReferenceSequence.__eq__ method. (@benjeffery, #2063, #2085)
Python 0.4.0
Major Python release
Breaking changes
- The Tree.num_nodes method is now deprecated with a warning, because it confusingly returns the number of nodes in the entire tree sequence, rather than in the tree. Text summaries of trees (e.g. str(tree)) now return the number of nodes in the tree, not in the entire tree sequence (@hyanwong, #1966, #1968)
- The CLI info command now gives more detailed information on the tree sequence (@benjeffery, #1611)
- 64 bits are now used to store the sizes of ragged table columns such as metadata, allowing them to hold more data. This change is fully backwards and forwards compatible for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with large offset arrays that require 64 bits will fail to load in previous versions with error _tskit.FileFormatError: An incompatible type for a column was found in the file. (@jeromekelleher, #343, #1527, #1528, #1530, #1554, #1573, #1589, #1598, #1628, #1571, #1579, #1585, #1590, #1602, #1618, #1620, #1652).
- The Tree class now conceptually has an extra node, the "virtual root" whose children are the roots of the tree. The quintuply linked tree arrays (parent_array, left_child_array, right_child_array, left_sib_array and right_sib_array) all have one extra element. (@jeromekelleher, #1691, #1704).
- Tree traversal orders returned by the nodes method have changed when there are multiple roots. Previously orders were defined locally for each root, but are now defined globally across all roots. (@jeromekelleher, #1704).
- Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence. TableCollection.sort no longer sorts individuals. (@benjeffery, #1774, #1789)
- Metadata encoding errors now raise MetadataEncodingError (@benjeffery, #1505, #1827).
- For TreeSequence.samples all arguments after population are now keyword only (@benjeffery, #1715, #1831).
- Remove the method TreeSequence.to_nexus and replace with TreeSequence.as_nexus. As the old method was not generating standards-compliant output, it seems unlikely that it was used by anyone. Calls to to_nexus will result in a NotImplementedError, informing users of the change. See below for details on as_nexus.
- Change default value for missing_data_char in the TreeSequence.haplotypes method from "-" to "N". This is a more idiomatic usage to indicate missing data rather than a gap in an alignment. (@jeromekelleher, #1893, #1894)
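
To make the "virtual root" idea above concrete, here is a toy plain-Python sketch, not tskit code. It builds left_child and right_sib arrays with one extra slot for a virtual root whose children are the real roots, then runs a single preorder traversal from it. The helper names, the -1 null value and the rule that every parentless node counts as a root are assumptions of the example; tskit's own definition of roots and their ordering may differ.

```python
NULL = -1


def build_links(parent):
    """Build left_child/right_sib arrays with an extra virtual root slot.

    parent[u] is the parent of real node u, or NULL if u is a root.
    The virtual root gets index len(parent). Hypothetical helper.
    """
    n = len(parent)
    virtual_root = n
    left_child = [NULL] * (n + 1)
    right_sib = [NULL] * (n + 1)
    last_child = [NULL] * (n + 1)
    for u in range(n):
        p = parent[u] if parent[u] != NULL else virtual_root
        if left_child[p] == NULL:
            left_child[p] = u
        else:
            right_sib[last_child[p]] = u
        last_child[p] = u
    return left_child, right_sib, virtual_root


def preorder(left_child, right_sib, virtual_root):
    """Visit every real node in preorder, starting from the virtual root."""
    order = []
    stack = [virtual_root]
    while stack:
        u = stack.pop()
        if u != virtual_root:
            order.append(u)
        kids = []
        c = left_child[u]
        while c != NULL:
            kids.append(c)
            c = right_sib[c]
        # Push in reverse so the leftmost child is visited first.
        stack.extend(reversed(kids))
    return order


# A forest with two roots: 3 (children 0, 1) and 4 (child 2).
left_child, right_sib, virtual_root = build_links([3, 3, 4, NULL, NULL])
print(len(left_child))  # 6: five real nodes plus the virtual root
print(preorder(left_child, right_sib, virtual_root))  # [3, 0, 1, 4, 2]
```

Starting from the virtual root means the traversal needs no special loop over roots, which is the simplification the release note refers to.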
Features
- Add the ibd_segments method and associated classes to compute, summarise and store segments of identity by descent from a tree sequence (@gtsambos, @jeromekelleher).
- Allow skipping of site and mutation tables in TableCollection.sort (@benjeffery, #1475, #1826).
- Add TableCollection.sort_individuals to sort the individuals as this is no longer done by the default sort (@benjeffery, #1774, #1789).
- Add __setitem__ to all tables allowing single rows to be updated. For example tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE) (@jeromekelleher, @benjeffery, #1545, #1600).
- Add a new time parameter to TreeSequence.samples(), allowing samples at a specific time point or time interval to be selected. (@mufernando, @petrelharp, #1692, #1700)
- Add table.metadata_vector to all table classes to allow easy extraction of a single metadata key into an array (@petrelharp, #1676, #1690).
- Add time_units to TreeSequence to describe the units of the time dimension of the tree sequence. This is then used to generate an error if time_units is uncalibrated when using the branch lengths in statistics. (@benjeffery, #1644, #1760, #1832)
- Add the virtual_root property to the Tree class (@jeromekelleher, #1704).
- Add the num_edges property to the Tree class (@jeromekelleher, #1704).
- Improved performance for tree traversal methods in the nodes iterator. Roughly a 10X performance increase for "preorder", "postorder", "timeasc" and "timedesc" (@jeromekelleher, #1704).
- Substantial performance improvement for Tree.total_branch_length (@jeromekelleher, #1794, #1799)
- Add the discrete_genome property to the TreeSequence class which is true if all coordinates are discrete (@jeromekelleher, #1144, #1819)
- Add a random_nucleotides function. (@jeromekelleher, #1825)
- Add the TreeSequence.alignments method. (@jeromekelleher, #1825)
- Add alignment export in the FASTA and nexus formats using the TreeSequence.write_nexus and TreeSequence.write_fasta methods. (@jeromekelleher, @hyanwong, #1894)
- Add the discrete_time property to the TreeSequence class which is true if all time coordinates are discrete or unknown (@benjeffery, #1839, #1890)
- Add the skip_tables option to load to support only loading top-level information from a file. Also add the ignore_tables option to TableCollection.equals and TableCollection.assert_equals to compare only top-level information. (@clwgg, #1882, #1854).
- Add the skip_reference_sequence option to load. Also add the ignore_reference_sequence option to equals to compare two table collections without comparing their reference sequence. (@clwgg, #2019, #1971).
- tskit now supports Python 3.10 (@benjeffery, #1895, #1949)
Fixes
- dump_tables omitted individual parents. (@benjeffery, #1828, #1884)
- Add the Tree.as_newick method and deprecate ...
Python 0.4.0 BETA 1
BETA RELEASE
- Install with pip install --pre tskit
- Please report any issues.
Breaking changes
- The Tree.num_nodes method is now deprecated with a warning, because it confusingly returns the number of nodes in the entire tree sequence, rather than in the tree. Text summaries of trees (e.g. str(tree)) now return the number of nodes in the tree, not in the entire tree sequence (@hyanwong, #1966, #1968)
- The CLI info command now gives more detailed information on the tree sequence (@benjeffery, #1611)
- 64 bits are now used to store the sizes of ragged table columns such as metadata, allowing them to hold more data. This change is fully backwards and forwards compatible for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with large offset arrays that require 64 bits will fail to load in previous versions with error _tskit.FileFormatError: An incompatible type for a column was found in the file. (@jeromekelleher, #343, #1527, #1528, #1530, #1554, #1573, #1589, #1598, #1628, #1571, #1579, #1585, #1590, #1602, #1618, #1620, #1652).
- The Tree class now conceptually has an extra node, the "virtual root" whose children are the roots of the tree. The quintuply linked tree arrays (parent_array, left_child_array, right_child_array, left_sib_array and right_sib_array) all have one extra element. (@jeromekelleher, #1691, #1704).
- Tree traversal orders returned by the nodes method have changed when there are multiple roots. Previously orders were defined locally for each root, but are now defined globally across all roots. (@jeromekelleher, #1704).
- Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence. TableCollection.sort no longer sorts individuals. (@benjeffery, #1774, #1789)
- Metadata encoding errors now raise MetadataEncodingError (@benjeffery, #1505, #1827).
- For TreeSequence.samples all arguments after population are now keyword only (@benjeffery, #1715, #1831).
- Remove the method TreeSequence.to_nexus and replace with TreeSequence.as_nexus. As the old method was not generating standards-compliant output, it seems unlikely that it was used by anyone. Calls to to_nexus will result in a NotImplementedError, informing users of the change. See below for details on as_nexus.
- Change default value for missing_data_char in the TreeSequence.haplotypes method from "-" to "N". This is a more idiomatic usage to indicate missing data rather than a gap in an alignment. (@jeromekelleher, #1893, #1894)
Features
- Allow skipping of site and mutation tables in TableCollection.sort (@benjeffery, #1475, #1826).
- Add TableCollection.sort_individuals to sort the individuals as this is no longer done by the default sort (@benjeffery, #1774, #1789).
- Add __setitem__ to all tables allowing single rows to be updated. For example tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE) (@jeromekelleher, @benjeffery, #1545, #1600).
- Add a new time parameter to TreeSequence.samples(), allowing samples at a specific time point or time interval to be selected. (@mufernando, @petrelharp, #1692, #1700)
- Add table.metadata_vector to all table classes to allow easy extraction of a single metadata key into an array (@petrelharp, #1676, #1690).
- Add time_units to TreeSequence to describe the units of the time dimension of the tree sequence. This is then used to generate an error if time_units is uncalibrated when using the branch lengths in statistics. (@benjeffery, #1644, #1760, #1832)
- Add the virtual_root property to the Tree class (@jeromekelleher, #1704).
- Add the num_edges property to the Tree class (@jeromekelleher, #1704).
- Improved performance for tree traversal methods in the nodes iterator. Roughly a 10X performance increase for "preorder", "postorder", "timeasc" and "timedesc" (@jeromekelleher, #1704).
- Substantial performance improvement for Tree.total_branch_length (@jeromekelleher, #1794, #1799)
- Add the discrete_genome property to the TreeSequence class which is true if all coordinates are discrete (@jeromekelleher, #1144, #1819)
- Add a random_nucleotides function. (@jeromekelleher, #1825)
- Add the TreeSequence.alignments method. (@jeromekelleher, #1825)
- Add alignment export in the FASTA and nexus formats using the TreeSequence.write_nexus and TreeSequence.write_fasta methods. (@jeromekelleher, @hyanwong, #1894)
- Add the discrete_time property to the TreeSequence class which is true if all time coordinates are discrete or unknown (@benjeffery, #1839, #1890)
- Add the skip_tables option to load to support only loading top-level information from a file. Also add the ignore_tables option to TableCollection.equals and TableCollection.assert_equals to compare only top-level information. (@clwgg, #1882, #1854).
- Add the skip_reference_sequence option to load. Also add the ignore_reference_sequence option to equals to compare two table collections without comparing their reference sequence. (@clwgg, #2019, #1971).
- tskit now supports Python 3.10 (@benjeffery, #1895, #1949)
Fixes
- dump_tables omitted individual parents. (@benjeffery, #1828, #1884)
- Add the Tree.as_newick method and deprecate Tree.newick. The as_newick method by default labels samples with the pattern "n{node_id}" which is much more useful than the behaviour of Tree.newick (which mimics ...
C API 0.99.15
Breaking changes
- The tables argument to tsk_treeseq_init is no longer const, to allow for future no-copy tree sequence creation. (@benjeffery, #1718, #1719)
- Additional consistency checks for mutation tables are now run by tsk_table_collection_check_integrity even when TSK_CHECK_MUTATION_ORDERING is not passed in. (@petrelharp, #1713, #1722)
- num_tracked_samples and num_samples in tsk_tree_t are now typed as tsk_size_t (@benjeffery, #1723, #1727)
- The previously deprecated option TSK_SAMPLE_COUNTS has been removed. (@benjeffery, #1744, #1761).
- Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence. tsk_table_collection_sort no longer sorts individuals. (@benjeffery, #1774, #1789)
- The tsk_tree_t.left_root member has been removed. Client code can be updated most easily by using the equivalent tsk_tree_get_left_root function. However, it may be worth considering updating code to use either the standard traversal functions (which automatically iterate over roots) or to use the virtual_root member (which may lead to more concise code). (@jeromekelleher, #1796, #1862)
- Rename tsk_tree_t.left and tsk_tree_t.right members to tsk_tree_t.interval.left and tsk_tree_t.interval.right respectively. (@jeromekelleher, #1686, #1913)
- kastore is now vendored into this repo instead of being a git submodule. Developers need to run git submodule update. (@jeromekelleher, #1687, #1973)
- Tree arrays such as left_sib, right_child etc. now have an additional "virtual root" node at the end. (@jeromekelleher, #1691, #1704)
- num_samples, num_tracked_samples, marked and mark have been removed from tsk_tree_t. (@jeromekelleher, #1936)
Features
- Add tsk_table_collection_individual_topological_sort to sort the individuals as this is no longer done by the default sort. (@benjeffery, #1774, #1789)
- The default behaviour for table size growth is now to double the current size of the table, up to a threshold. To keep the previous behaviour, use (e.g.) tsk_edge_table_set_max_rows_increment(tables->edges, 1024), which results in adding space for 1024 additional rows each time we run out of space in the edge table. (@benjeffery, #5, #1683)
- tsk_table_collection_check_integrity now has a TSK_CHECK_MIGRATION_ORDERING flag. (@petrelharp, #1722)
- The default behaviour for ragged column growth is now to double the current size of the column, up to a threshold. To keep the previous behaviour, use (e.g.) tsk_node_table_set_max_metadata_length_increment(tables->nodes, 1024), which results in adding space for 1024 additional entries each time we run out of space in the ragged column. (@benjeffery, #1703, #1709)
- Support for compiling the C library on Windows using msys2 (@jeromekelleher, #1742).
- Add time_units to tsk_table_collection_t to describe the units of the time dimension of the tree sequence. This is then used to generate an error if time_units is uncalibrated when using the branch lengths in statistics. (@benjeffery, #1644, #1760)
- Add the TSK_LOAD_SKIP_TABLES option to load just the top-level information from a file. Also add the TSK_CMP_IGNORE_TABLES option to compare only the top-level information in two table collections. (@clwgg, #1882, #1854).
- Add reference sequence. (@jeromekelleher, @benjeffery, #146, #1911, #1944, #1911)
- Add the TSK_LOAD_SKIP_REFERENCE_SEQUENCE option to load a table collection without the reference sequence. Also add the TSK_CMP_IGNORE_REFERENCE_SEQUENCE option to compare two table collections without comparing their reference sequence. (@clwgg, #2019, #1971).
- Add a "virtual root" to Tree arrays such as left_sib, right_child etc. The virtual root is appended to each array, has all real roots as its children, but is not the parent of any node. Simplifies traversal algorithms. (@jeromekelleher, #1691, #1704)
- Add num_edges to tsk_tree_t to count the edges that define the topology of the tree. (@jeromekelleher, #1704)
- Add the tsk_tree_get_size_bound function which returns an upper bound on the number of nodes reachable from the roots of a tree. Useful for tree stack allocations (@jeromekelleher, #1704).
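
The two growth policies described above (doubling up to a threshold versus a fixed increment) can be compared with a small Python sketch. It is only a model of the described behaviour, not the C library's code: next_capacity, its parameters and in particular the doubling_threshold value are hypothetical, since the release note does not state the actual threshold.

```python
def next_capacity(current, needed, max_rows_increment=0, doubling_threshold=2**21):
    """Return the new row capacity after the table runs out of space.

    max_rows_increment > 0 models the fixed-increment policy; 0 models the
    default of doubling, capped so a single step never adds more than
    doubling_threshold rows. All names and the threshold value are
    assumptions for illustration.
    """
    if max_rows_increment > 0:
        increment = max_rows_increment
    else:
        increment = min(max(current, 1), doubling_threshold)
    # Always make room for at least the requested number of rows.
    return max(current + increment, needed)


print(next_capacity(4096, 4097))  # 8192 (doubling)
print(next_capacity(4096, 4097, max_rows_increment=1024))  # 5120 (fixed step)
```

Doubling keeps the number of reallocations logarithmic in the final table size, while the cap limits over-allocation once tables are already large.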
C API 0.99.14
Breaking changes
- 64 bits are now used to store the sizes of ragged table columns such as metadata,
allowing them to hold more data. As such tsk_size_t is now 64 bits wide.
This change is fully backwards and forwards compatible for all tree-sequences whose
ragged column sizes fit into 32 bits. New tree-sequences with
large offset arrays that require 64 bits will fail to load in previous versions with
error TSK_ERR_BAD_COLUMN_TYPE.
(@jeromekelleher, #343, #1527, #1528, #1530,
#1554, #1573, #1589, #1598, #1628, #1571,
#1579, #1585, #1590, #1602, #1618, #1620, #1652).
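
As background for the change above, a ragged column stores variable-length rows as one flat data array plus an offset array with one more entry than there are rows. The plain-Python sketch below illustrates that layout and why the offsets, not the row count, determine whether 32 bits suffice. The helper names are made up for the example and nothing here uses tskit.

```python
def pack_ragged(rows):
    """Pack variable-length byte rows into (data, offsets).

    Row j is data[offsets[j]:offsets[j + 1]]. Hypothetical helper.
    """
    data = b"".join(rows)
    offsets = [0]
    for row in rows:
        offsets.append(offsets[-1] + len(row))
    return data, offsets


def unpack_row(data, offsets, j):
    """Return row j of a packed ragged column."""
    return data[offsets[j]:offsets[j + 1]]


def offsets_fit_in_32_bits(offsets):
    """True if every offset can be stored as an unsigned 32-bit integer."""
    return offsets[-1] <= 2**32 - 1


data, offsets = pack_ragged([b"ab", b"", b"cde"])
print(offsets)  # [0, 2, 2, 5]
print(unpack_row(data, offsets, 2))  # b'cde'
print(offsets_fit_in_32_bits(offsets))  # True
```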
Features
- Add tsk_X_table_update_row methods which allow modifying single rows of tables
(@jeromekelleher, #1545, #1552).