
Expressing phylogenetic claims

Jonathan A Rees edited this page Oct 15, 2019 · 4 revisions

See also: Declarative 'patch' system

On 25 June 2014, Jonathan gave a short presentation at iEvoBio 2014. The slide deck is here. This page attempts to explain the context and present status of this work, as of June 2014.

History

We've implemented two different 'patch' systems for making ad hoc modifications to the taxonomy. Currently there are a few hundred patches. The script that builds the taxonomy has become increasingly fragile and confusing because the patch directives are order-dependent and context-sensitive. Furthermore, we have identified regression testing as an important unmet need. The solution seems to be a fully declarative language in which one writes biological claims. A claim can either be asserted into the taxonomy or be tested against it as a regression test.

To reach this goal, the syntax and semantics of the claim language need to be specified. Hence the idea presented in the talk: terms of the language designate taxa, and claims are relations between taxa or are otherwise 'about' taxa.
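To make the idea concrete, here is a minimal Java sketch of a single claim type that can be used in both modes, tested against a taxonomy or asserted into it. It is not the API of Claim.java: the names ClaimSketch, Taxonomy, Claim, Within, test, and assertInto are all hypothetical, terms are reduced to bare taxon names (which sidesteps the homonym problem), and the taxonomy is a toy map from each taxon to its parent.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the API of smasher's Claim.java: terms are reduced
// to bare taxon names, and a claim is a relation between taxa that can be
// either tested against a taxonomy or asserted into it.
public class ClaimSketch {

    /** Toy taxonomy: each taxon name maps to its parent's name (null for a root). */
    static class Taxonomy {
        final Map<String, String> parent = new HashMap<>();

        void add(String taxon, String parentTaxon) {
            parent.put(taxon, parentTaxon);
        }

        /** True if taxon is ancestor itself or lies somewhere beneath it. */
        boolean isWithin(String taxon, String ancestor) {
            for (String t = taxon; t != null; t = parent.get(t)) {
                if (t.equals(ancestor)) return true;
            }
            return false;
        }
    }

    /** A declarative claim about taxa, usable in either of two modes. */
    interface Claim {
        /** Regression-test mode: does the taxonomy already satisfy the claim? */
        boolean test(Taxonomy tax);

        /** Patch mode: change the taxonomy so that the claim holds. */
        void assertInto(Taxonomy tax);
    }

    /** Claim: the taxon designated by 'child' is included in the one designated by 'ancestor'. */
    record Within(String child, String ancestor) implements Claim {
        public boolean test(Taxonomy tax) {
            return tax.isWithin(child, ancestor);
        }

        public void assertInto(Taxonomy tax) {
            if (test(tax)) return;                // already true: nothing to change
            if (tax.isWithin(ancestor, child)) {  // reattaching would create a cycle
                throw new IllegalStateException(child + " already contains " + ancestor);
            }
            tax.add(child, ancestor);             // reattach child (and its subtree) under ancestor
        }
    }

    public static void main(String[] args) {
        Taxonomy tax = new Taxonomy();
        tax.add("Mammalia", null);
        tax.add("Primates", "Mammalia");

        Claim claim = new Within("Hominidae", "Primates");
        System.out.println(claim.test(tax));   // false: Hominidae is not present yet
        claim.assertInto(tax);
        System.out.println(claim.test(tax));   // true
        System.out.println(new Within("Hominidae", "Mammalia").test(tax)); // true
    }
}
```

The same Within value acts as a patch through assertInto and as a regression test through test. Re-running test over every claim after all assertions have been applied is one way to catch a claim that a later assertion has undone.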

The language is only partially implemented (Claim.java in smasher), and much remains to be done. The talk was meant only to give a flavor of the theory, which grew out of months of experience and reflection, not to report on a working system.

Comparison with the current version of smasher

The smasher 'smashing' process, in which two taxonomies are combined, has two phases. The first is 'alignment', in which taxon name occurrences in the two taxonomies are assessed for coreference, as far as the available information allows. Most of the complexity here comes from homonyms: a single name can refer to different taxa, either within a single taxonomy or across taxonomies (i.e. a name that is not a homonym in either taxonomy but refers to different taxa in the two).
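One way to picture alignment in the presence of homonyms: index one taxonomy's nodes by name and, when a name has several candidates, prefer the candidate whose ancestors share the most names with the node being aligned. This is an illustrative heuristic only, not smasher's actual rule set; AlignSketch, Node, byName, and align are hypothetical names. The example uses Morus, which names both a genus of gannets (Sulidae) and the mulberry genus (Moraceae).

```java
import java.util.*;

// Illustrative alignment heuristic only; smasher's actual alignment rules differ.
public class AlignSketch {

    /** A taxon node: a name and an optional parent. Nodes may share a name (homonyms). */
    static final class Node {
        final String name;
        final Node parent;

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }

        /** Names of all proper ancestors, nearest first. */
        List<String> lineage() {
            List<String> out = new ArrayList<>();
            for (Node p = parent; p != null; p = p.parent) out.add(p.name);
            return out;
        }
    }

    /** Index a taxonomy's nodes by name; a list longer than 1 is a homonym. */
    static Map<String, List<Node>> byName(Collection<Node> nodes) {
        Map<String, List<Node>> index = new HashMap<>();
        for (Node n : nodes) index.computeIfAbsent(n.name, k -> new ArrayList<>()).add(n);
        return index;
    }

    /**
     * Pick the same-named node in the other taxonomy that shares the most
     * ancestor names with n, or return null if the evidence is insufficient.
     */
    static Node align(Node n, Map<String, List<Node>> index) {
        Set<String> lineage = new HashSet<>(n.lineage());
        Node best = null;
        int bestScore = -1;
        boolean tie = false;
        for (Node c : index.getOrDefault(n.name, List.of())) {
            int score = 0;
            for (String a : c.lineage()) if (lineage.contains(a)) score++;
            if (score > bestScore) { best = c; bestScore = score; tie = false; }
            else if (score == bestScore) tie = true;
        }
        if (best == null || tie) return null;  // no candidate, or an unresolved homonym
        // A lone same-named candidate with no shared ancestry may be a different
        // taxon (a homonym only across taxonomies), so refuse it if n has a lineage.
        if (bestScore == 0 && !lineage.isEmpty()) return null;
        return best;
    }

    public static void main(String[] args) {
        // Higher-priority taxonomy containing the homonym Morus (bird genus and plant genus).
        Node plantae = new Node("Plantae", null);
        Node moraceae = new Node("Moraceae", plantae);
        Node plantMorus = new Node("Morus", moraceae);
        Node aves = new Node("Aves", null);
        Node sulidae = new Node("Sulidae", aves);
        Node birdMorus = new Node("Morus", sulidae);
        Map<String, List<Node>> index =
            byName(List.of(plantae, moraceae, plantMorus, aves, sulidae, birdMorus));

        // Lower-priority taxonomy places Morus directly under Plantae.
        Node lowMorus = new Node("Morus", new Node("Plantae", null));
        System.out.println(align(lowMorus, index) == plantMorus); // true
    }
}
```

The final check in align targets the cross-taxonomy case: a name that is unique in each taxonomy but designates different taxa in the two would otherwise be aligned merely because the spellings match.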

In the second phase, taxa from the lower-priority taxonomy that were not aligned in the first phase are added to the structure of the first (higher-priority) taxonomy at compatible locations.
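A minimal sketch of the second phase, assuming the first phase produced a map from lower-priority nodes to their higher-priority counterparts: each unaligned node is copied beneath the image of its nearest aligned ancestor, which is one simple reading of "compatible location". MergeSketch, Node, graft, and path are hypothetical names, and smasher's real placement also has to detect and resolve conflicts, which this sketch omits.

```java
import java.util.*;

// Illustrative sketch of the second phase: unaligned lower-priority taxa are copied
// under the image of their nearest aligned ancestor. Conflict detection is omitted.
public class MergeSketch {

    /** Mutable tree node; constructing a node registers it with its parent. */
    static final class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) parent.children.add(this);
        }
    }

    /**
     * Walk the lower-priority subtree under 'low'. 'target' is the higher-priority
     * node where low's unaligned children belong; 'alignment' maps lower-priority
     * nodes to higher-priority ones (assumed to come from the first phase).
     */
    static void graft(Node low, Node target, Map<Node, Node> alignment) {
        for (Node child : low.children) {
            Node image = alignment.get(child);
            if (image == null) {
                image = new Node(child.name, target);  // unaligned: add a copy under target
            }
            graft(child, image, alignment);            // descend with the new target
        }
    }

    /** Slash-separated path from the root, e.g. "Life/Aves". */
    static String path(Node n) {
        Deque<String> parts = new ArrayDeque<>();
        for (Node p = n; p != null; p = p.parent) parts.addFirst(p.name);
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        Node hLife = new Node("Life", null);           // higher-priority taxonomy
        Node hAves = new Node("Aves", hLife);
        Node lLife = new Node("Life", null);           // lower-priority taxonomy
        Node lAves = new Node("Aves", lLife);
        new Node("Morus", new Node("Sulidae", lAves));

        Map<Node, Node> alignment = Map.of(lLife, hLife, lAves, hAves);
        graft(lLife, hLife, alignment);

        Node grafted = hAves.children.get(0).children.get(0);
        System.out.println(path(grafted)); // Life/Aves/Sulidae/Morus
    }
}
```

Passing the target down the recursion is what makes "nearest aligned ancestor" fall out naturally: an aligned node resets the target to its image, and an unaligned node becomes the target for its own descendants.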

(To be written: how current version works; how it should work based on the 'claim' analysis.)