
Tracking changes and merging branches during ontology development

Marta Costa edited this page Jun 12, 2015 · 2 revisions

Tracking changes to OBO files

OBO format has a recommended tight serialisation in which the order of lines and the order of components within each line are strictly specified. This makes tracking changes to OBO files easy: as long as all files are tightly serialised, simply use the diff tool of your choice to compare versions. There are two main options for ensuring tight serialisation: OORT and OBO-Edit. __WARNING:__ If you have hand-edited an OBO file, please ensure that IDs are NOT duplicated before re-serialising, as duplicated IDs will result in merged terms. Jenkins will check this for you if you commit to the repository <TEST!>, as OORT will fail if one term has >1 label.
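As a local check before re-serialising a hand-edited file, duplicated IDs can be detected with a few lines of Python. This is only a minimal sketch and is not part of OORT, OBO-Edit or the Jenkins job: the function name `find_duplicate_ids` and the `EX:` IDs are hypothetical, and the parsing assumes one `id:` line per stanza and ignores escaped `!` characters.

```python
from collections import Counter


def find_duplicate_ids(obo_text):
    """Return a sorted list of IDs that occur in more than one stanza.

    Simplified parsing (an assumption of this sketch): a stanza starts at a
    line such as "[Term]" and carries its ID on a line beginning with "id:".
    """
    counts = Counter()
    in_stanza = False
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            in_stanza = True
        elif in_stanza and line.startswith("id:"):
            # Drop any trailing "! comment" before counting the ID.
            value = line[len("id:"):].split("!")[0].strip()
            counts[value] += 1
    return sorted(term_id for term_id, n in counts.items() if n > 1)


# Hypothetical example content; real files would be read from disk.
sample = """format-version: 1.2

[Term]
id: EX:0000001
name: first term

[Term]
id: EX:0000002
name: second term

[Term]
id: EX:0000001
name: accidentally duplicated term
"""

duplicates = find_duplicate_ids(sample)
if duplicates:
    print("Duplicated IDs:", ", ".join(duplicates))
```

An empty result means no ID was found twice; any ID that is reported should be fixed by hand before the file is passed to a serialiser.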

For changes between consecutively committed versions, a repository diff tool is adequate and convenient. When there are larger numbers of changes, it can be useful to use a diff tool that allows individual changes to be reverted, such as the one built into Emacs.

We also have a script, obo_def_comp.pl, for tracking changes to comments and definitions between versions. In addition, Jenkins uses obo_track_new.pl to track obsoletions, merges and changes to names and IDs.
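To illustrate the kind of comparison involved, the sketch below reports terms whose `def` or `comment` line differs between two versions. It is not obo_def_comp.pl and makes no claim about how that script works; the function names and the `EX:` IDs are hypothetical, and the parsing assumes single-line `def:` and `comment:` tags.

```python
def parse_defs_and_comments(obo_text):
    """Map each stanza ID to the raw values of its 'def' and 'comment' tags."""
    records = {}
    current_id = None
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            current_id = None  # a new stanza begins; its ID is not known yet
        elif line.startswith("id:"):
            current_id = line[len("id:"):].strip()
            records[current_id] = {}
        elif current_id is not None:
            for tag in ("def", "comment"):
                if line.startswith(tag + ":"):
                    records[current_id][tag] = line[len(tag) + 1:].strip()
    return records


def changed_defs_and_comments(old_text, new_text):
    """List (id, tag, old value, new value) for terms present in both versions."""
    old = parse_defs_and_comments(old_text)
    new = parse_defs_and_comments(new_text)
    changes = []
    for term_id in sorted(old.keys() & new.keys()):
        for tag in ("def", "comment"):
            before = old[term_id].get(tag)
            after = new[term_id].get(tag)
            if before != after:
                changes.append((term_id, tag, before, after))
    return changes


# Hypothetical example content for two versions of the same file.
old_version = '[Term]\nid: EX:0000001\nname: a\ndef: "Old text." []\n'
new_version = '[Term]\nid: EX:0000001\nname: a\ndef: "New text." []\ncomment: Added.\n'

for term_id, tag, before, after in changed_defs_and_comments(old_version, new_version):
    print(term_id, tag, before, "->", after)
```

Terms that exist in only one of the two versions are deliberately ignored here, since new and obsoleted terms are tracked separately.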

Merging branched OBO files

It is possible to use svn merge successfully with tightly serialised OBO files, but we have traditionally used Emacs merge instead. Merging provides an opportunity to review non-clashing changes, and how they are merged, as well as to resolve clashes. Emacs merge makes it very easy to switch the winning change in a merge. For this strategy to work, it is essential to track the version in the trunk from which the current branched copies were made (svn would, of course, do this for you if you use svn merge with a standard branching strategy).
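The reason the trunk version must be tracked is that a three-way comparison needs it as the common base. The sketch below is an illustration of that idea only, not what Emacs or svn do internally: it classifies each stanza as a non-clashing or clashing change given the base and two branched copies. The function names and `EX:` IDs are hypothetical, and stanzas are assumed to be separated by blank lines with one `id:` line each.

```python
import re


def split_stanzas(obo_text):
    """Map stanza ID -> full stanza text, skipping the file header."""
    stanzas = {}
    for block in re.split(r"\n\s*\n", obo_text.strip()):
        lines = [line.rstrip() for line in block.strip().splitlines()]
        if not lines or not lines[0].startswith("["):
            continue  # header or empty block
        for line in lines:
            if line.startswith("id:"):
                stanzas[line[len("id:"):].strip()] = "\n".join(lines)
                break
    return stanzas


def classify_changes(base_text, ours_text, theirs_text):
    """Sort changed stanza IDs into non-clashing and clashing changes."""
    base = split_stanzas(base_text)
    ours = split_stanzas(ours_text)
    theirs = split_stanzas(theirs_text)
    result = {"non_clashing": [], "clashing": []}
    for term_id in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(term_id), ours.get(term_id), theirs.get(term_id)
        if o == t:
            continue  # unchanged, or both copies made the same change
        if o == b or t == b:
            result["non_clashing"].append(term_id)  # only one copy changed it
        else:
            result["clashing"].append(term_id)  # both changed it differently
    return result


# Hypothetical example content: a base version and two branched copies.
base = "[Term]\nid: EX:1\nname: a\n\n[Term]\nid: EX:2\nname: b\n"
ours = "[Term]\nid: EX:1\nname: a1\n\n[Term]\nid: EX:2\nname: b\n"
theirs = "[Term]\nid: EX:1\nname: a2\n\n[Term]\nid: EX:2\nname: b2\n"

print(classify_changes(base, ours, theirs))
```

Without the base version, the two kinds of change cannot be told apart: every difference between the two copies would look like a potential clash.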

Tracking changes and merging OWL files

This is harder than for OBO. At the time of writing, none of the various OWL formats is human readable given a strategy of having distinct labels and URIs - which we follow for all our ontologies.

Tight serialisations, as for OBO format, could work for the Functional syntax and Manchester syntax formats, but none is currently available. Manchester syntax is the most OBO-like, and certainly the most readable by non-logicians. Unfortunately, it is not currently fully expressive, and so is not considered safe as an archival format.

Various proposals exist to extend Manchester syntax to make it fully expressive, to improve readability by incorporating a system for displaying human-readable IDs, and to specify a tight serialisation. But until these are in place with tooling support, file-level diffs and merges remain problematic for OWL ontologies. At a push, Functional syntax + Protege provides a borderline sufficient solution, but it should be used with caution.

A number of GUI-based tools have been developed for diffs and merges of OWL files. These should be periodically investigated to see if they are usable.