Python 0.4.0

@github-actions github-actions released this 10 Dec 17:25
· 514 commits to main since this release

Major Python release

Breaking changes

  • The Tree.num_nodes method is now deprecated with a warning, because it confusingly
    returns the number of nodes in the entire tree sequence, rather than in the tree. Text
    summaries of trees (e.g. str(tree)) now return the number of nodes in the tree,
    not in the entire tree sequence (@hyanwong, #1966, #1968)

  • The CLI info command now gives more detailed information on the tree sequence
    (@benjeffery, #1611)

  • 64 bits are now used to store the sizes of ragged table columns such as metadata,
    allowing them to hold more data. This change is fully backwards and forwards compatible
    for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with
    large offset arrays that require 64 bits will fail to load in previous versions with
    error _tskit.FileFormatError: An incompatible type for a column was found in the file.
    (@jeromekelleher, #343, #1527, #1528, #1530,
    #1554, #1573, #1589, #1598, #1628, #1571,
    #1579, #1585, #1590, #1602, #1618, #1620, #1652).

  • The Tree class now conceptually has an extra node, the "virtual root" whose
    children are the roots of the tree. The quintuply linked tree arrays
    (parent_array, left_child_array, right_child_array, left_sib_array and right_sib_array)
    all have one extra element.
    (@jeromekelleher, #1691, #1704).

  • Tree traversal orders returned by the nodes method have changed when there
    are multiple roots. Previously, orders were defined locally for each root; they
    are now defined globally across all roots. (@jeromekelleher, #1704).

  • Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence.
    TableCollection.sort no longer sorts individuals.
    (@benjeffery, #1774, #1789)

  • Metadata encoding errors now raise MetadataEncodingError
    (@benjeffery, #1505, #1827).

  • For TreeSequence.samples all arguments after population are now keyword only
    (@benjeffery, #1715, #1831).

  • Remove the method TreeSequence.to_nexus and replace with TreeSequence.as_nexus.
    As the old method was not generating standards-compliant output, it seems unlikely
    that it was used by anyone. Calls to to_nexus will result in a
    NotImplementedError, informing users of the change. See below for details on
    as_nexus.

  • Change the default value of missing_data_char in the TreeSequence.haplotypes
    method from "-" to "N". "N" is the more idiomatic way to indicate missing data,
    whereas "-" conventionally denotes a gap in an alignment. (@jeromekelleher,
    #1893, #1894)
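
The virtual root and the change in traversal order can be illustrated with a small toy model. This is only a sketch using plain Python lists, not tskit's implementation: the helper names (build_arrays, preorder) are hypothetical, and it assumes that the extra array element sits at index num_nodes, that roots keep a null parent, and that ties in time are broken by node ID.

```python
# Toy model only: plain Python lists stand in for tskit's tree arrays.
NULL = -1  # conventional "no node" marker


def build_arrays(num_nodes, parent_of, roots):
    """Build parent, left_child and right_sib lists with one extra slot.

    Assumption: the extra slot (index num_nodes) is the virtual root.
    """
    virtual_root = num_nodes
    size = num_nodes + 1
    children = {u: [] for u in range(size)}
    parent = [NULL] * size
    for child, par in sorted(parent_of.items()):
        parent[child] = par
        children[par].append(child)
    # Roots keep parent == NULL in this sketch; they are only linked
    # downwards, as the children of the virtual root.
    children[virtual_root] = list(roots)
    left_child = [NULL] * size
    right_sib = [NULL] * size
    for u, kids in children.items():
        if kids:
            left_child[u] = kids[0]
        for a, b in zip(kids, kids[1:]):
            right_sib[a] = b
    return parent, left_child, right_sib


def preorder(left_child, right_sib, start):
    """Iterative preorder traversal using only the linked arrays."""
    order, stack = [], [start]
    while stack:
        u = stack.pop()
        order.append(u)
        kids = []
        v = left_child[u]
        while v != NULL:
            kids.append(v)
            v = right_sib[v]
        stack.extend(reversed(kids))  # leftmost child is visited first
    return order


def by_time(u):
    """Sort key: node time, then node ID (tie-break is an assumption)."""
    return (times[u], u)


# Two disjoint subtrees: root 4 over nodes 0, 1 and root 5 over nodes 2, 3.
times = [0.0, 0.0, 0.0, 0.0, 2.0, 1.0]
roots = [4, 5]
parent, left_child, right_sib = build_arrays(6, {0: 4, 1: 4, 2: 5, 3: 5}, roots)
virtual_root = 6

# "Local" ordering: sort each root's subtree separately, then concatenate.
local_timeasc = [
    u for r in roots for u in sorted(preorder(left_child, right_sib, r), key=by_time)
]

# "Global" ordering: one traversal from the virtual root, sorted across all roots.
all_nodes = preorder(left_child, right_sib, virtual_root)[1:]  # drop the virtual root
global_timeasc = sorted(all_nodes, key=by_time)

print(local_timeasc)   # [0, 1, 4, 2, 3, 5]
print(global_timeasc)  # [0, 1, 2, 3, 5, 4]
```

In the toy model each array has seven entries for six nodes, and the two orderings differ only because root 5 is younger than root 4. Code that indexed the tree arrays assuming exactly num_nodes elements, or that relied on per-root ordering, is the kind most likely to need review.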

Features

  • Add the ibd_segments method and associated classes to compute, summarise
    and store segments of identity by descent from a tree sequence
    (@gtsambos, @jeromekelleher).

  • Allow skipping of site and mutation tables in TableCollection.sort
    (@benjeffery, #1475, #1826).

  • Add TableCollection.sort_individuals to sort the individuals as this is no longer done by the
    default sort (@benjeffery, #1774, #1789).

  • Add __setitem__ to all tables allowing single rows to be updated. For example
    tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE)
    (@jeromekelleher, @benjeffery, #1545, #1600).

  • Add a new time parameter to TreeSequence.samples(), allowing selection of
    samples at a specific time point or within a time interval.
    (@mufernando, @petrelharp, #1692, #1700)

  • Add table.metadata_vector to all table classes to allow easy extraction of a single
    metadata key into an array
    (@petrelharp, #1676, #1690).

  • Add time_units to TreeSequence to describe the units of the time dimension of the
    tree sequence. An error is now raised if branch lengths are used in statistics while
    time_units is uncalibrated. (@benjeffery, #1644, #1760, #1832)

  • Add the virtual_root property to the Tree class (@jeromekelleher, #1704).

  • Add the num_edges property to the Tree class (@jeromekelleher, #1704).

  • Improved performance for tree traversal methods in the nodes iterator.
    Roughly a 10X performance increase for "preorder", "postorder", "timeasc"
    and "timedesc" (@jeromekelleher, #1704).

  • Substantial performance improvement for Tree.total_branch_length
    (@jeromekelleher, #1794, #1799)

  • Add the discrete_genome property to the TreeSequence class which is true if
    all coordinates are discrete (@jeromekelleher, #1144, #1819)

  • Add a random_nucleotides function. (@jeromekelleher, #1825)

  • Add the TreeSequence.alignments method. (@jeromekelleher, #1825)

  • Add alignment export in the FASTA and nexus formats using the
    TreeSequence.write_nexus and TreeSequence.write_fasta methods.
    (@jeromekelleher, @hyanwong, #1894)

  • Add the discrete_time property to the TreeSequence class which is true if
    all time coordinates are discrete or unknown (@benjeffery, #1839, #1890)

  • Add the skip_tables option to load to support only loading
    top-level information from a file. Also add the ignore_tables option to
    TableCollection.equals and TableCollection.assert_equals to
    compare only top-level information. (@clwgg, #1882, #1854).

  • Add the skip_reference_sequence option to load. Also add the
    ignore_reference_sequence option to equals to compare two table
    collections without comparing their reference sequence. (@clwgg,
    #2019, #1971).

  • tskit now supports Python 3.10 (@benjeffery, #1895, #1949)
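
The single-row update idiom quoted in the __setitem__ entry above (read a row, call replace on it, assign it back) can be sketched with a toy table. ToyNodeRow and ToyNodeTable are hypothetical stand-ins written for illustration, not tskit classes; the sketch only assumes that rows are immutable, which is why the update goes through replace and assignment instead of in-place mutation.

```python
import dataclasses

NODE_IS_SAMPLE = 1  # same numeric value as tskit.NODE_IS_SAMPLE


@dataclasses.dataclass(frozen=True)
class ToyNodeRow:
    """Hypothetical immutable row with two of the node table's columns."""

    flags: int = 0
    time: float = 0.0

    def replace(self, **changes):
        # Return a modified copy; the original row is left untouched.
        return dataclasses.replace(self, **changes)


class ToyNodeTable:
    """Hypothetical table supporting row reads and single-row writes."""

    def __init__(self):
        self._rows = []

    def add_row(self, flags=0, time=0.0):
        self._rows.append(ToyNodeRow(flags=flags, time=time))
        return len(self._rows) - 1  # ID of the newly added row

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        return self._rows[index]

    def __setitem__(self, index, row):
        if not isinstance(row, ToyNodeRow):
            raise TypeError("expected a ToyNodeRow")
        self._rows[index] = row


nodes = ToyNodeTable()
nodes.add_row(flags=0, time=0.0)
nodes.add_row(flags=0, time=1.0)

# Same shape as the example in the release notes:
#   tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE)
nodes[0] = nodes[0].replace(flags=NODE_IS_SAMPLE)
print(nodes[0])  # ToyNodeRow(flags=1, time=0.0)
```

Only the row at index 0 changes; the other row, and every column not named in replace, keeps its previous value.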

Fixes

  • dump_tables omitted individual parents. (@benjeffery, #1828, #1884)

  • Add the Tree.as_newick method and deprecate Tree.newick. The
    as_newick method by default labels samples with the pattern "n{node_id}",
    which is much more useful than the behaviour of Tree.newick (which mimics
    ms output). (@jeromekelleher, #1671, #1838).

  • Add the as_nexus and write_nexus methods to the TreeSequence class,
    replacing the broken to_nexus method (see above). This uses the same
    sample labelling pattern as as_newick.
    (@jeetsukumaran, @jeromekelleher, #1785, #1835,
    #1836, #1838)

  • load_text created additional populations even if the population table was specified,
    and didn't strip newlines from input text (@hyanwong, #1909, #1910)
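
The "n{node_id}" labelling pattern mentioned for as_newick and as_nexus can be illustrated with a minimal Newick writer. This is a toy sketch, not tskit's code: the function toy_newick and its arguments are hypothetical, leaves are assumed to be the samples, and the way branch lengths are formatted here is an arbitrary choice that may differ from tskit's output.

```python
def toy_newick(children, times, root, label="n{}"):
    """Write a Newick string, labelling leaves as n<node_id>.

    children maps a node ID to its list of child IDs; times maps a node ID
    to its time. Both are plain dicts (an assumption of this sketch).
    """

    def subtree(u, parent_time):
        kids = children.get(u, [])
        if kids:
            text = "(" + ",".join(subtree(v, times[u]) for v in kids) + ")"
        else:
            text = label.format(u)  # leaf: use the n{node_id} pattern
        if parent_time is not None:
            # Branch length is the parent's time minus this node's time.
            text += f":{parent_time - times[u]:g}"
        return text

    return subtree(root, None) + ";"


children = {4: [0, 1], 5: [4, 2]}
times = {0: 0.0, 1: 0.0, 2: 0.0, 4: 1.0, 5: 3.0}
print(toy_newick(children, times, root=5))  # ((n0:1,n1:1):2,n2:3);
```

Because each label is derived from the node ID, a tip in the text output can be mapped straight back to its node, which ms-style numbering does not allow as directly.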