Releases: tskit-dev/tskit

C API 1.1.1

29 Jul 18:13

Bug fixes

  • Fix segfault in tsk_variant_restricted_copy in tree sequences with large
    numbers of alleles or very long alleles
    (@jeromekelleher, #2437, #2429).

Python 0.5.1

14 Jul 12:08
f41eddc

Fixes

  • Copies of a Variant object would cause a segfault when .samples was accessed.
    (@benjeffery, #2400, #2401)

Changes

  • Tables in a table collection can be replaced using the replace_with method
    (@hyanwong, #1489 #2389)

  • SVG drawing routines now return a special string object that is automatically
    rendered in a Jupyter notebook (@hyanwong, #2377)

Features

  • New Site.alleles() method (@hyanwong, #2380, #2385)

  • The variants(), haplotypes() and alignments() methods can now
    take a list of sample ids and a left and right position, to restrict the
    size of the output (@hyanwong, #2092, #2397)

C API 1.1.0

14 Jul 12:07
f41eddc

Features

  • Add num_children to tsk_tree_t, an array which contains the number of child
    nodes of each node in the tree. (@GertjanBisschop, #2274, #2316)

  • Add edge to tsk_tree_t, an array which contains, for each (child) node in the tree,
    the edge_id of the edge encoding the relationship between that node and its parent.
    (@GertjanBisschop, #2304, #2340)
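
To make the two new per-node arrays concrete, here is a minimal pure-Python
sketch of what they hold for a single tree. It is an illustration only, not the
C implementation: the per_node_arrays helper, the (edge_id, parent, child) tuple
layout and the NULL constant are all hypothetical names chosen for this example.

```python
# Illustrative sketch only; this is not how tskit computes these arrays.
NULL = -1  # hypothetical stand-in for a null ID


def per_node_arrays(num_nodes, edges):
    """Build num_children and edge arrays for one tree.

    `edges` is a list of (edge_id, parent, child) tuples for the edges
    that are present in the tree.
    """
    num_children = [0] * num_nodes
    edge = [NULL] * num_nodes
    for edge_id, parent, child in edges:
        num_children[parent] += 1  # one more child below this parent
        edge[child] = edge_id      # the edge that sits above this child
    return num_children, edge


# Nodes 0, 1 and 2 are leaves; 3 is the parent of 0 and 1; 4 is the root.
tree_edges = [(0, 3, 0), (1, 3, 1), (2, 4, 2), (3, 4, 3)]
num_children, edge = per_node_arrays(5, tree_edges)
print(num_children)  # [0, 0, 0, 2, 2]
print(edge)          # [0, 1, 2, 3, -1]
```

The root has no edge above it, so its entry in the edge array is the null value.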

Changes

  • Reduce the maximum number of rows in a table by 1. This removes edge cases so that a tsk_id_t can be
    used to count the number of rows. (@benjeffery, #2336, #2337)

  • Samples are now copied by tsk_variant_restricted_copy. (@benjeffery, #2400, #2401)

Python 0.5.0

22 Jun 15:05

Major Feature Release

Breaking Changes

  • The JSON metadata codec now interprets the empty string as an empty object. This means
    that applying a schema to an existing table will no longer necessitate modifying the
    existing rows. (@benjeffery, #2064, #2104)

  • Remove the previously deprecated as_bytes argument to TreeSequence.variants.
    If you need genotypes in byte form this can be done following the code in the
    to_macs method on line 5573 of trees.py.
    This argument was initially deprecated more than 3 years ago when the code was part of
    msprime.
    (@benjeffery, #605, #2172)

  • Arguments after ploidy in write_vcf are now keyword only
    (@jeromekelleher, #2329, #2315).

  • When metadata equal to b'' is printed to text or HTML tables it will render as
    an empty string rather than "b''". (@hyanwong, #2349, #2351)
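
The JSON codec change in the first item above can be pictured with a small
stand-alone sketch. The decode_json_metadata function is hypothetical and only
mirrors the described behaviour (empty input decodes to an empty object); it is
not the codec shipped with tskit.

```python
import json


def decode_json_metadata(raw: bytes) -> dict:
    """Decode JSON metadata, treating empty bytes as an empty object."""
    if raw == b"":
        # Rows written before a schema was applied carry no metadata bytes.
        return {}
    return json.loads(raw.decode("utf-8"))


print(decode_json_metadata(b""))               # {}
print(decode_json_metadata(b'{"name": "a"}'))  # {'name': 'a'}
```

Under this rule, rows that were created without metadata remain readable after
a JSON schema is attached, which is why existing rows no longer need rewriting.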

Changes

  • A min_time parameter in draw_svg allows the time of the youngest node to be used
    as the minimum value of the y axis, allowing negative times.
    (@hyanwong, #2197, #2215)

  • VcfWriter.write now prints the site ID of variants in the ID field of the
    output VCF files.
    (@roohy, #2103, #2107)

  • Make dumping of tables and tree sequences to disk a zero-copy operation.
    (@benjeffery, #2111, #2124)

  • Add copy argument to TreeSequence.variants which if False reuses the
    returned Variant object for improved performance. Defaults to True.
    (@benjeffery, #605, #2172)

  • tree.mrca now takes 2 or more arguments and returns the most recent common ancestor of all of them.
    (@savitakartik, #1340, #2121)

  • Add an edge attribute to the Mutation class that gives the ID of the
    edge that the mutation falls on.
    (@jeromekelleher, #685, #2279).

  • Add the TreeSequence.split_edges operation which inserts nodes into
    edges at a specific time.
    (@jeromekelleher, #2276, #2296).

  • Add the TreeSequence.decapitate (and closely related
    TableCollection.delete_older) operation to remove topology and mutations
    older than a given time.
    (@jeromekelleher, #2236, #2302, #2331).

  • Add the TreeSequence.individuals_time and TreeSequence.individuals_population
    methods to return arrays of per-individual times and populations, respectively.
    (@petrelharp, #1481, #2298).

  • Add the sample_mask and site_mask arguments to write_vcf to allow parts
    of an output VCF to be omitted or marked as missing data. Also add the
    as_vcf convenience function, to return VCF as a string.
    (@jeromekelleher, #2300).

  • Add support for missing data to write_vcf, and add the isolated_as_missing
    argument. (@jeromekelleher, #2329, #447).

  • Add Tree.num_children_array and Tree.num_children. These return the number of
    child nodes for every node in the tree and for a single node, respectively.
    (@GertjanBisschop, #2318, #2319, #2332)

  • Add Tree.path_length.
    (@jeremyguez, #2249, #2259).

  • Add B1 tree balance index.
    (@jeremyguez, @jeromekelleher, #2251, #2281, #2346).

  • Add B2 tree balance index.
    (@jeremyguez, @jeromekelleher, #2252, #2353, #2354).

  • Add Sackin tree imbalance index.
    (@jeremyguez, @jeromekelleher, #2246, #2258).

  • Add Colless tree imbalance index.
    (@jeremyguez, @jeromekelleher, #2250, #2266, #2344).

  • Add direction argument to TreeSequence.edge_diffs, allowing iteration
    over diffs in the reverse direction. NOTE: this comes with a ~10% performance
    regression as the implementation was moved from C to Python for simplicity
    and maintainability. Please open an issue if this affects your application.
    (@jeromekelleher, @benjeffery, #2120).

  • Add Tree.edge_array and Tree.edge. Returns the edge id of the edge encoding
    the relationship of each node with its parent.
    (@GertjanBisschop, #2361, #2357)
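
For readers unfamiliar with the imbalance indices listed above, the following
sketch computes the Sackin and Colless indices from their textbook definitions:
Sackin is the sum of leaf depths, and Colless is the sum over internal nodes of
the absolute difference in leaf counts between the two children. The function
names and the dictionary-of-children tree layout are assumptions made for this
example; tskit's own methods may handle multiple roots or non-binary trees
differently.

```python
def leaf_depths(children, root):
    """Return {leaf: depth} for the tree rooted at `root`."""
    depths = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        kids = children.get(node, [])
        if not kids:
            depths[node] = depth  # a node without children is a leaf
        for kid in kids:
            stack.append((kid, depth + 1))
    return depths


def sackin_index(children, root):
    """Sum of the depths of all leaves."""
    return sum(leaf_depths(children, root).values())


def num_leaves(children, node):
    """Number of leaves below (or equal to) `node`."""
    kids = children.get(node, [])
    if not kids:
        return 1
    return sum(num_leaves(children, kid) for kid in kids)


def colless_index(children):
    """Sum of |leaves(left) - leaves(right)| over internal nodes (binary only)."""
    total = 0
    for left, right in children.values():
        total += abs(num_leaves(children, left) - num_leaves(children, right))
    return total


# Node 4 is the root with children 3 and 2; node 3 has children 0 and 1.
toy_tree = {4: [3, 2], 3: [0, 1]}
print(sackin_index(toy_tree, 4))  # 5
print(colless_index(toy_tree))    # 1
```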

C API 1.0.0

24 May 17:15

This major release marks the point at which the documented API becomes stable and supported.

Breaking changes

  • Change the type of genotypes to int32_t, removing the TSK_16_BIT_GENOTYPES flag option.
    (@benjeffery, #463, #2108)

  • tsk_variant_t now includes its tsk_site_t rather than pointing to it.
    (@benjeffery, #2161, #2162)

  • Rename TSK_TAKE_TABLES to TSK_TAKE_OWNERSHIP.
    (@benjeffery, #2221, #2222)

  • TSK_DEBUG, TSK_NO_INIT, TSK_NO_CHECK_INTEGRITY and TSK_TAKE_OWNERSHIP have moved to core.h
    (@benjeffery, #2218, #2230)

  • Rename several flags:

    • All flags to simplify; for example, TSK_KEEP_INPUT_ROOTS becomes TSK_SIMPLIFY_KEEP_INPUT_ROOTS.
    • All flags to subset; for example, TSK_KEEP_UNREFERENCED becomes TSK_SUBSET_KEEP_UNREFERENCED.
    • TSK_BUILD_INDEXES -> TSK_TS_INIT_BUILD_INDEXES
    • TSK_NO_METADATA -> TSK_TABLE_NO_METADATA
    • TSK_NO_EDGE_METADATA -> TSK_TC_NO_EDGE_METADATA

    (@benjeffery, #1720, #2226, #2229, #2224)

  • Remove the generic TSK_ERR_OUT_OF_BOUNDS - replacing with specific errors.
    Remove TSK_ERR_NON_SINGLE_CHAR_MUTATION which was unused.
    (@benjeffery, #2260)

  • Reorder stats API methods to place result as the last argument. (@benjeffery, #2292, #2285)
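
When porting client code across these renames, a small search-and-replace
helper can reduce manual editing. The sketch below is one possible approach and
covers only the flag names spelled out in the notes above; other simplify and
subset flags would have to be added by hand. The rename_flags helper and the
sample C line are hypothetical.

```python
import re

# Old name -> new name, taken only from the renames listed in these notes.
RENAMES = {
    "TSK_TAKE_TABLES": "TSK_TAKE_OWNERSHIP",
    "TSK_KEEP_INPUT_ROOTS": "TSK_SIMPLIFY_KEEP_INPUT_ROOTS",
    "TSK_KEEP_UNREFERENCED": "TSK_SUBSET_KEEP_UNREFERENCED",
    "TSK_BUILD_INDEXES": "TSK_TS_INIT_BUILD_INDEXES",
    "TSK_NO_METADATA": "TSK_TABLE_NO_METADATA",
    "TSK_NO_EDGE_METADATA": "TSK_TC_NO_EDGE_METADATA",
}

# Word boundaries ensure that only whole identifiers are matched.
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(k) for k in RENAMES) + r")\b")


def rename_flags(source: str) -> str:
    """Return `source` with every old flag name replaced by its new name."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


old_line = "tsk_treeseq_init(&ts, &tables, TSK_BUILD_INDEXES | TSK_TAKE_TABLES);"
print(rename_flags(old_line))
```

Because the new names never contain an old name as a whole identifier, running
the helper twice leaves already-migrated code unchanged.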

Features

Python 0.4.1

11 Jan 14:21

Bugfix release

Changes

Fixes

Python 0.4.0

10 Dec 17:25

Major Python release

Breaking changes

  • The Tree.num_nodes method is now deprecated with a warning, because it confusingly
    returns the number of nodes in the entire tree sequence, rather than in the tree. Text
    summaries of trees (e.g. str(tree)) now return the number of nodes in the tree,
    not in the entire tree sequence (@hyanwong, #1966 #1968)

  • The CLI info command now gives more detailed information on the tree sequence
    (@benjeffery, #1611)

  • 64 bits are now used to store the sizes of ragged table columns such as metadata,
    allowing them to hold more data. This change is fully backwards and forwards compatible
    for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with
    large offset arrays that require 64 bits will fail to load in previous versions with
    error _tskit.FileFormatError: An incompatible type for a column was found in the file.
    (@jeromekelleher, #343, #1527, #1528, #1530,
    #1554, #1573, #1589,#1598,#1628, #1571,
    #1579, #1585, #1590, #1602, #1618, #1620, #1652).

  • The Tree class now conceptually has an extra node, the "virtual root" whose
    children are the roots of the tree. The quintuply linked tree arrays
    (parent_array, left_child_array, right_child_array, left_sib_array and right_sib_array)
    all have one extra element.
    (@jeromekelleher, #1691, #1704).

  • Tree traversal orders returned by the nodes method have changed when there
    are multiple roots. Previously orders were defined locally for each root, but
    are now defined globally across all roots. (@jeromekelleher, #1704).

  • Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence.
    TableCollection.sort no longer sorts individuals.
    (@benjeffery, #1774, #1789)

  • Metadata encoding errors now raise MetadataEncodingError
    (@benjeffery, #1505, #1827).

  • For TreeSequence.samples all arguments after population are now keyword only
    (@benjeffery, #1715, #1831).

  • Remove the method TreeSequence.to_nexus and replace with TreeSequence.as_nexus.
    As the old method was not generating standards-compliant output, it seems unlikely
    that it was used by anyone. Calls to to_nexus will result in a
    NotImplementedError, informing users of the change. See below for details on
    as_nexus.

  • Change default value for missing_data_char in the TreeSequence.haplotypes
    method from "-" to "N". This is a more idiomatic usage to indicate
    missing data rather than a gap in an alignment. (@jeromekelleher,
    #1893, #1894)
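
The "virtual root" item above is easier to follow with a toy example. This
sketch builds left_child and right_sib arrays with one extra slot at the end
for the virtual root, then runs a single preorder traversal that covers every
root. It assumes every node listed belongs to the tree and uses hypothetical
helper names; it is a simplified picture rather than tskit's implementation.

```python
NULL = -1  # hypothetical stand-in for a null ID


def build_links(parent):
    """Build left_child / right_sib arrays with one extra virtual-root slot."""
    num_nodes = len(parent)
    virtual_root = num_nodes  # the extra element at the end of each array
    left_child = [NULL] * (num_nodes + 1)
    right_sib = [NULL] * (num_nodes + 1)
    last_child = [NULL] * (num_nodes + 1)
    for node in range(num_nodes):
        # Parentless nodes are roots, which hang off the virtual root.
        p = parent[node] if parent[node] != NULL else virtual_root
        if left_child[p] == NULL:
            left_child[p] = node
        else:
            right_sib[last_child[p]] = node
        last_child[p] = node
    return left_child, right_sib, virtual_root


def preorder(left_child, right_sib, virtual_root):
    """Visit all real nodes in preorder, starting from the virtual root."""
    order = []
    stack = [virtual_root]
    while stack:
        node = stack.pop()
        if node != virtual_root:
            order.append(node)
        kids = []
        child = left_child[node]
        while child != NULL:
            kids.append(child)
            child = right_sib[child]
        stack.extend(reversed(kids))  # leftmost child is popped first
    return order


# Two roots (4 and 5): 4 is the parent of 0 and 1, 5 is the parent of 2 and 3.
parent = [4, 4, 5, 5, NULL, NULL]
left_child, right_sib, virtual_root = build_links(parent)
print(len(left_child))                              # 7
print(preorder(left_child, right_sib, virtual_root))  # [4, 0, 1, 5, 2, 3]
```

Starting from the virtual root removes the need for a separate loop over roots,
which is the simplification the release note refers to.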

Features

  • Add the ibd_segments method and associated classes to compute, summarise
    and store segments of identity by descent from a tree sequence
    (@gtsambos, @jeromekelleher).

  • Allow skipping of site and mutation tables in TableCollection.sort
    (@benjeffery, #1475, #1826).

  • Add TableCollection.sort_individuals to sort the individuals as this is no longer done by the
    default sort (@benjeffery, #1774, #1789).

  • Add __setitem__ to all tables allowing single rows to be updated. For example
    tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE)
    (@jeromekelleher, @benjeffery, #1545, #1600).

  • Add a new time parameter to TreeSequence.samples(), allowing samples to be
    selected at a specific time point or within a time interval.
    (@mufernando, @petrelharp, #1692, #1700)

  • Add table.metadata_vector to all table classes to allow easy extraction of a single
    metadata key into an array
    (@petrelharp, #1676, #1690).

  • Add time_units to TreeSequence to describe the units of the time dimension of the
    tree sequence. This is then used to generate an error if time_units is uncalibrated when
    using the branch lengths in statistics. (@benjeffery, #1644, #1760, #1832)

  • Add the virtual_root property to the Tree class (@jeromekelleher, #1704).

  • Add the num_edges property to the Tree class (@jeromekelleher, #1704).

  • Improved performance for tree traversal methods in the nodes iterator.
    Roughly a 10X performance increase for "preorder", "postorder", "timeasc"
    and "timedesc" (@jeromekelleher, #1704).

  • Substantial performance improvement for Tree.total_branch_length
    (@jeromekelleher, #1794 #1799)

  • Add the discrete_genome property to the TreeSequence class which is true if
    all coordinates are discrete (@jeromekelleher, #1144, #1819)

  • Add a random_nucleotides function. (@jeromekelleher, #1825)

  • Add the TreeSequence.alignments method. (@jeromekelleher, #1825)

  • Add alignment export in the FASTA and nexus formats using the
    TreeSequence.write_nexus and TreeSequence.write_fasta methods.
    (@jeromekelleher, @hyanwong, #1894)

  • Add the discrete_time property to the TreeSequence class which is true if
    all time coordinates are discrete or unknown (@benjeffery, #1839, #1890)

  • Add the skip_tables option to load to support only loading
    top-level information from a file. Also add the ignore_tables option to
    TableCollection.equals and TableCollection.assert_equals to
    compare only top-level information. (@clwgg, #1882, #1854).

  • Add the skip_reference_sequence option to load. Also add the
    ignore_reference_sequence option to equals to compare two table
    collections without comparing their reference sequence. (@clwgg,
    #2019, #1971).

  • tskit now supports python 3.10 (@benjeffery, #1895, #1949)
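
The row-update idiom from the __setitem__ item above (read a row, build a
modified copy with replace, assign it back) can be illustrated with a toy table.
NodeRow and ToyNodeTable are hypothetical classes written for this sketch and
are far simpler than tskit's real table classes; only the value of
NODE_IS_SAMPLE (1) is taken from tskit.

```python
import dataclasses

NODE_IS_SAMPLE = 1  # same value as tskit.NODE_IS_SAMPLE


@dataclasses.dataclass(frozen=True)
class NodeRow:
    """An immutable row; changes are made by building a modified copy."""

    flags: int = 0
    time: float = 0.0

    def replace(self, **changes):
        return dataclasses.replace(self, **changes)


class ToyNodeTable:
    """A minimal table that supports reading and replacing single rows."""

    def __init__(self):
        self._rows = []

    def add_row(self, flags=0, time=0.0):
        self._rows.append(NodeRow(flags, time))
        return len(self._rows) - 1  # ID of the new row

    def __getitem__(self, index):
        return self._rows[index]

    def __setitem__(self, index, row):
        self._rows[index] = row


nodes = ToyNodeTable()
nodes.add_row(time=1.5)
nodes[0] = nodes[0].replace(flags=NODE_IS_SAMPLE)
print(nodes[0])  # NodeRow(flags=1, time=1.5)
```

Keeping rows immutable means an update is always an explicit assignment back
into the table, which is the pattern shown in the release note's one-liner.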

Fixes

  • dump_tables omitted individual parents. (@benjeffery, #1828, #1884)

  • Add the Tree.as_newick method and deprecate ...

Python 0.4.0 BETA 1

07 Dec 19:44

BETA RELEASE

  • Install with pip install --pre tskit
  • Please report any issues.

Breaking changes

  • The Tree.num_nodes method is now deprecated with a warning, because it confusingly
    returns the number of nodes in the entire tree sequence, rather than in the tree. Text
    summaries of trees (e.g. str(tree)) now return the number of nodes in the tree,
    not in the entire tree sequence (@hyanwong, #1966 #1968)

  • The CLI info command now gives more detailed information on the tree sequence
    (@benjeffery, #1611)

  • 64 bits are now used to store the sizes of ragged table columns such as metadata,
    allowing them to hold more data. This change is fully backwards and forwards compatible
    for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with
    large offset arrays that require 64 bits will fail to load in previous versions with
    error _tskit.FileFormatError: An incompatible type for a column was found in the file.
    (@jeromekelleher, #343, #1527, #1528, #1530,
    #1554, #1573, #1589,#1598,#1628, #1571,
    #1579, #1585, #1590, #1602, #1618, #1620, #1652).

  • The Tree class now conceptually has an extra node, the "virtual root" whose
    children are the roots of the tree. The quintuply linked tree arrays
    (parent_array, left_child_array, right_child_array, left_sib_array and right_sib_array)
    all have one extra element.
    (@jeromekelleher, #1691, #1704).

  • Tree traversal orders returned by the nodes method have changed when there
    are multiple roots. Previously orders were defined locally for each root, but
    are now defined globally across all roots. (@jeromekelleher, #1704).

  • Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence.
    TableCollection.sort no longer sorts individuals.
    (@benjeffery, #1774, #1789)

  • Metadata encoding errors now raise MetadataEncodingError
    (@benjeffery, #1505, #1827).

  • For TreeSequence.samples all arguments after population are now keyword only
    (@benjeffery, #1715, #1831).

  • Remove the method TreeSequence.to_nexus and replace with TreeSequence.as_nexus.
    As the old method was not generating standards-compliant output, it seems unlikely
    that it was used by anyone. Calls to to_nexus will result in a
    NotImplementedError, informing users of the change. See below for details on
    as_nexus.

  • Change default value for missing_data_char in the TreeSequence.haplotypes
    method from "-" to "N". This is a more idiomatic usage to indicate
    missing data rather than a gap in an alignment. (@jeromekelleher,
    #1893, #1894)

Features

  • Allow skipping of site and mutation tables in TableCollection.sort
    (@benjeffery, #1475, #1826).

  • Add TableCollection.sort_individuals to sort the individuals as this is no longer done by the
    default sort (@benjeffery, #1774, #1789).

  • Add __setitem__ to all tables allowing single rows to be updated. For example
    tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE)
    (@jeromekelleher, @benjeffery, #1545, #1600).

  • Add a new time parameter to TreeSequence.samples(), allowing samples to be
    selected at a specific time point or within a time interval.
    (@mufernando, @petrelharp, #1692, #1700)

  • Add table.metadata_vector to all table classes to allow easy extraction of a single
    metadata key into an array
    (@petrelharp, #1676, #1690).

  • Add time_units to TreeSequence to describe the units of the time dimension of the
    tree sequence. This is then used to generate an error if time_units is uncalibrated when
    using the branch lengths in statistics. (@benjeffery, #1644, #1760, #1832)

  • Add the virtual_root property to the Tree class (@jeromekelleher, #1704).

  • Add the num_edges property to the Tree class (@jeromekelleher, #1704).

  • Improved performance for tree traversal methods in the nodes iterator.
    Roughly a 10X performance increase for "preorder", "postorder", "timeasc"
    and "timedesc" (@jeromekelleher, #1704).

  • Substantial performance improvement for Tree.total_branch_length
    (@jeromekelleher, #1794 #1799)

  • Add the discrete_genome property to the TreeSequence class which is true if
    all coordinates are discrete (@jeromekelleher, #1144, #1819)

  • Add a random_nucleotides function. (@jeromekelleher, #1825)

  • Add the TreeSequence.alignments method. (@jeromekelleher, #1825)

  • Add alignment export in the FASTA and nexus formats using the
    TreeSequence.write_nexus and TreeSequence.write_fasta methods.
    (@jeromekelleher, @hyanwong, #1894)

  • Add the discrete_time property to the TreeSequence class which is true if
    all time coordinates are discrete or unknown (@benjeffery, #1839, #1890)

  • Add the skip_tables option to load to support only loading
    top-level information from a file. Also add the ignore_tables option to
    TableCollection.equals and TableCollection.assert_equals to
    compare only top-level information. (@clwgg, #1882, #1854).

  • Add the skip_reference_sequence option to load. Also add the
    ignore_reference_sequence option to equals to compare two table
    collections without comparing their reference sequence. (@clwgg,
    #2019, #1971).

  • tskit now supports python 3.10 (@benjeffery, #1895, #1949)

Fixes

  • dump_tables omitted individual parents. (@benjeffery, #1828, #1884)

  • Add the Tree.as_newick method and deprecate Tree.newick. The
    as_newick method by default labels samples with the pattern "n{node_id}"
    which is much more useful than the behaviour of Tree.newick (which mimics
    ...

C API 0.99.15

07 Dec 13:23
ee5fdb3

Breaking changes

  • The tables argument to tsk_treeseq_init is no longer const, to allow for future no-copy tree sequence creation.
    (@benjeffery, #1718, #1719)

  • Additional consistency checks for mutation tables are now run by tsk_table_collection_check_integrity
    even when TSK_CHECK_MUTATION_ORDERING is not passed in. (@petrelharp, #1713, #1722)

  • num_tracked_samples and num_samples in tsk_tree_t are now typed as tsk_size_t
    (@benjeffery, #1723, #1727)

  • The previously deprecated option TSK_SAMPLE_COUNTS has been removed. (@benjeffery, #1744, #1761).

  • Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence.
    tsk_table_collection_sort no longer sorts individuals.
    (@benjeffery, #1774, #1789)

  • The tsk_tree_t.left_root member has been removed. Client code can be updated
    most easily by using the equivalent tsk_tree_get_left_root function. However,
    it may be worth considering updating code to use either the standard traversal
    functions (which automatically iterate over roots) or to use the virtual_root
    member (which may lead to more concise code). (@jeromekelleher, #1796,
    #1862)

  • Rename tsk_tree_t.left and tsk_tree_t.right members to
    tsk_tree_t.interval.left and tsk_tree_t.interval.right respectively.
    (@jeromekelleher, #1686, #1913)

  • kastore is now vendored into this repo instead of being a git submodule. Developers need to run
    git submodule update. (@jeromekelleher, #1687, #1973)

  • Tree arrays such as left_sib, right_child etc. now have an additional
    "virtual root" node at the end. (@jeromekelleher, #1691, #1704)

  • num_samples, num_tracked_samples, marked and mark have been removed from
    tsk_tree_t. (@jeromekelleher, #1936)

Features

  • Add tsk_table_collection_individual_topological_sort to sort the individuals as this is no longer done by the
    default sort. (@benjeffery, #1774, #1789)

  • The default behaviour for table size growth is now to double the current size of the table,
    up to a threshold. To keep the previous behaviour, use (e.g.)
    tsk_edge_table_set_max_rows_increment(tables->edges, 1024), which results in adding
    space for 1024 additional rows each time we run out of space in the edge table.
    (@benjeffery, #5, #1683)

  • tsk_table_collection_check_integrity now has a TSK_CHECK_MIGRATION_ORDERING flag. (@petrelharp, #1722)

  • The default behaviour for ragged column growth is now to double the current size of the column,
    up to a threshold. To keep the previous behaviour, use (e.g.)
    tsk_node_table_set_max_metadata_length_increment(tables->nodes, 1024), which results in adding
    space for 1024 additional entries each time we run out of space in the ragged column.
    (@benjeffery, #1703, #1709)

  • Support for compiling the C library on Windows using msys2 (@jeromekelleher,
    #1742).

  • Add time_units to tsk_table_collection_t to describe the units of the time dimension of the
    tree sequence. This is then used to generate an error if time_units is uncalibrated when
    using the branch lengths in statistics. (@benjeffery, #1644, #1760)

  • Add the TSK_LOAD_SKIP_TABLES option to load just the top-level information from a
    file. Also add the TSK_CMP_IGNORE_TABLES option to compare only the top-level
    information in two table collections. (@clwgg, #1882, #1854).

  • Add reference sequence.
    (@jeromekelleher, @benjeffery, #146, #1911, #1944, #1911)

  • Add the TSK_LOAD_SKIP_REFERENCE_SEQUENCE option to load a table collection
    without the reference sequence. Also add the TSK_CMP_IGNORE_REFERENCE_SEQUENCE
    option to compare two table collections without comparing their reference
    sequence. (@clwgg, #2019, #1971).

  • Add a "virtual root" to Tree arrays such as left_sib, right_child etc.
    The virtual root is appended to each array, has all real roots as its children,
    but is not the parent of any node. Simplifies traversal algorithms.
    (@jeromekelleher, #1691, #1704)

  • Add num_edges to tsk_tree_t to count the edges that define the topology of
    the tree. (@jeromekelleher, #1704)

  • Add the tsk_tree_get_size_bound function which returns an upper bound on the number of nodes reachable from
    the roots of a tree. Useful for tree stack allocations (@jeromekelleher, #1704).
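
The two growth policies described above (doubling up to a threshold versus a
fixed increment such as 1024 rows) can be compared with a short simulation. The
next_capacity function, its threshold default and the behaviour once the
threshold is passed (growing by threshold-sized steps) are assumptions made for
this sketch; the notes do not state the actual threshold or what the library
does beyond it.

```python
def next_capacity(capacity, max_rows_increment=0, threshold=1024):
    """Return the capacity after one growth step.

    max_rows_increment > 0 selects fixed-increment growth (the previous
    behaviour); 0 selects doubling up to `threshold` (assumed value).
    """
    if max_rows_increment > 0:
        return capacity + max_rows_increment
    doubled = capacity * 2
    if doubled <= threshold:
        return doubled
    # Assumption: past the threshold, grow linearly by `threshold` rows.
    return capacity + threshold


def growth_sequence(start, steps, **kwargs):
    """List the capacities reached over `steps` successive growth steps."""
    capacities = [start]
    for _ in range(steps):
        capacities.append(next_capacity(capacities[-1], **kwargs))
    return capacities


print(growth_sequence(1, 5, threshold=8))                 # [1, 2, 4, 8, 16, 24]
print(growth_sequence(1024, 2, max_rows_increment=1024))  # [1024, 2048, 3072]
```

Doubling keeps the number of reallocations logarithmic in the final table size
for small tables, while the cap avoids very large over-allocation later on.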

C API 0.99.14

06 Sep 10:11
31797f6

Breaking changes

  • 64 bits are now used to store the sizes of ragged table columns such as metadata,
    allowing them to hold more data. As such tsk_size_t is now 64 bits wide.
    This change is fully backwards and forwards compatible for all tree-sequences whose
    ragged column sizes fit into 32 bits. New tree-sequences with
    large offset arrays that require 64 bits will fail to load in previous versions with
    error TSK_ERR_BAD_COLUMN_TYPE.
    (@jeromekelleher, #343, #1527, #1528, #1530,
    #1554, #1573, #1589,#1598,#1628, #1571,
    #1579, #1585, #1590, #1602, #1618, #1620, #1652).
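
To see why the offset width matters, recall that a ragged column stores all
rows in one flat buffer plus an offset array marking where each row starts and
ends. The sketch below uses hypothetical helpers and assumes that the width
needed is determined by the largest offset, as the compatibility statement
above suggests; it does not reproduce the actual file format logic.

```python
def build_ragged(values):
    """Pack a list of bytes objects into a flat buffer plus an offset array."""
    data = b"".join(values)
    offsets = [0]
    for value in values:
        offsets.append(offsets[-1] + len(value))
    return data, offsets


def get_row(data, offsets, j):
    """Return row j, which spans offsets[j] to offsets[j + 1]."""
    return data[offsets[j]:offsets[j + 1]]


def offset_bits_needed(offsets):
    """Return 32 if every offset fits in an unsigned 32-bit integer, else 64."""
    return 32 if offsets[-1] <= 2**32 - 1 else 64


data, offsets = build_ragged([b"abc", b"", b"de"])
print(offsets)                        # [0, 3, 3, 5]
print(get_row(data, offsets, 2))      # b'de'
print(offset_bits_needed(offsets))    # 32
print(offset_bits_needed([0, 2**32]))  # 64
```

Offsets are non-decreasing, so checking the final offset is enough to decide
whether a column still fits the older 32-bit layout.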

Features