Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Different languages for model specification #538

Merged
merged 13 commits into from
Jul 3, 2024
37 changes: 29 additions & 8 deletions doc/_static/petab_schema.yaml
Original file line number Diff line number Diff line change
@@ -38,13 +38,26 @@ properties:
files and optional visualization files.
properties:

sbml_files:
type: array
description: List of PEtab SBML files.

items:
type: string
description: PEtab SBML file name or URL.
model_files:
type: object
description: One or multiple models

# the model ID
patternProperties:
"^[a-zA-Z_]\\w*$":
type: object
properties:
location:
type: string
description: Model file name or URL
language:
type: string
description: |
Model language, e.g., 'sbml', 'cellml', 'bngl', 'pysb'
required:
- location
- language
additionalProperties: false

measurement_files:
type: array
@@ -78,8 +91,16 @@ properties:
type: string
description: PEtab visualization file name or URL.

mapping_files:
type: array
description: List of PEtab mapping files.

items:
type: string
description: PEtab mapping file name or URL.

required:
- sbml_files
- model_files
- observable_files
- measurement_files
- condition_files
131 changes: 95 additions & 36 deletions doc/documentation_data_format.rst
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@ PEtab data format specification
===============================


Format version: 1
Format version: 2.0.0

This document explains the PEtab data format.

@@ -41,12 +41,11 @@ Overview
---------

The PEtab data format specifies a parameter estimation problem using a number
of text-based files (`Systems Biology Markup Language (SBML) <http://sbml.org>`_
and
of text-based files (
`Tab-Separated Values (TSV) <https://www.iana.org/assignments/media-types/text/tab-separated-values>`_)
(Figure 2), i.e.

- An SBML model [SBML]
- A model

- A measurement file to fit the model to [TSV]

@@ -67,6 +66,9 @@ and
- (optional) A visualization file, which contains specifications how the data
and/or simulations should be plotted by the visualization routines [TSV]

- (optional) A mapping file, which allows mapping PEtab entity IDs to entity
IDs in the model, which might not have valid PEtab IDs themselves [TSV]

.. figure:: gfx/petab_files.png
:alt: Files constituting a PEtab problem

@@ -91,11 +93,11 @@ problem as such.
- Fields in "[]" are optional and may be left empty.


SBML model definition
---------------------

The model must be specified as valid SBML. There are no further restrictions.
Model definition
----------------

PEtab 2.0.0 is agnostic of specific model formats. A model file is referenced
in the PEtab problem description (YAML) via its file name or a URL.

Condition table
---------------
@@ -107,7 +109,7 @@ different experimental conditions).
This is specified as a tab-separated value file in the following way:

+--------------+------------------+------------------------------------+-----+---------------------------------------+
| conditionId | [conditionName] | parameterOrSpeciesOrCompartmentId1 | ... | parameterOrSpeciesOrCompartmentId${n} |
| conditionId | [conditionName] | modelEntityId1 | ... | modelEntityId${n} |
+==============+==================+====================================+=====+=======================================+
| STRING | [STRING] | NUMERIC\|STRING | ... | NUMERIC\|STRING |
+--------------+------------------+------------------------------------+-----+---------------------------------------+
@@ -140,32 +142,44 @@ Detailed field description
Condition names are arbitrary strings to describe the given condition.
They may be used for reporting or visualization.

- ``${parameterOrSpeciesOrCompartmentId1}``

Further columns may be global parameter IDs, IDs of species or compartments
as defined in the SBML model. Only one column is allowed per ID.
Values for these condition parameters may be provided either as numeric
values, or as IDs defined in the SBML model, the parameter table or both.

- ``${parameterId}``

The values will override any parameter values specified in the model.

- ``${speciesId}``

If a species ID is provided, it is interpreted as the initial
condition of that species (as amount if `hasOnlySubstanceUnits` is set to `True`
for the respective species, as concentration otherwise) and will override the
initial condition given in the SBML model or given by a preequilibration
condition. If no value is provided for a condition, the result of the
preequilibration (or initial condition from the SBML model, if
no preequilibration is defined) is used.

- ``${compartmentId}``

If a compartment ID is provided, it is interpreted as the initial
compartment size.

- ``${modelEntityId}``

Further columns may be the IDs of model entities that have globally unique
IDs, such as parameters, species or compartments defined in the model to set
condition-specific values. Only one column is allowed per ID.
Values for these entities may be provided either as numeric values, or as IDs
of globally unique entity IDs as defined in the model, the mapping table or
the parameter table.

Any non-``NaN`` value will override the original values of the model, or if
preequilibration was used, they will override the value obtained from
preequilibration. A ``NaN`` value indicates that the original value of the
model is to be used (when used in the preequilibration condition, or in the
simulation condition if no preequilibration is used) or that the result of
preequilibration is to be used (when used in the simulation condition after
preequilibration).

The value in the condition table either replaces the initial value or the
value at all timepoints based on whether the model entity has a rate law
assigned or not:

* For model entities that have constant algebraic assignments
(but not necessarily constant values), i.e, that do not have a rate of
change with respect to time assigned and that are not subject to event
assignments, the algebraic assignment is replaced statically at all
timepoints. Examples for such model entities are the targets of SBML
`AssignmentRules`.

* For all other entities, e.g., those that are assigned by SBML `RateRules`,
only the initial value can be assigned in the condition table. If an
assignment of the rate of change with respect to time or event assignment
is desired, the values of model entities that are used to define rate of
change or event assignments must be assigned in the condition table.
If no such model entities exist, assignment is not possible.

If the model has a concept of species and a species ID is provided, its
value is interpreted as amount or concentration in the same way as anywhere
else in the model.

Measurement table
-----------------
@@ -705,6 +719,49 @@ Detailed field description
legend and which defaults to the value in ``datasetId``.


Mapping table
-------------

Mapping PEtab entity IDs to entity IDs in the model. This optional file may be
used to reference model entities in PEtab files where the ID in the model would
not be a valid identifier in PEtab (e.g., due to inclusion of blanks, dots, or
other special characters).

The TSV file has two mandatory columns, ``petabEntityId`` and
``modelEntityId``. Additional columns are allowed.

+---------------+---------------+
| petabEntityId | modelEntityId |
+===============+===============+
| STRING | STRING |
+---------------+---------------+
| reaction1_k1 | reaction1.k1 |
+---------------+---------------+


Detailed field description
~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``petabEntityId`` [STRING, NOT NULL]

A valid PEtab identifier that is not defined in any other part of the PEtab
problem. This identifier may be referenced in condition, measurement,
parameter and observable tables, but cannot be referenced in the model
itself.

- ``modelEntityId`` [STRING, NOT NULL]

A globally unique identifier defined in the model,
*that is not a valid PEtab ID* (see :ref:`identifiers`).

For example, in SBML, local parameters may be referenced as
``$reactionId.$localParameterId``, which are not valid PEtab IDs as they
contain a ``.`` character. Similarly, this table may be used to reference
specific species in a BNGL model that may contain many unsupported
characters such as ``,``, ``(`` or ``.``. However, please note that IDs must
exactly match the species names in the BNGL-generated network file, and no
pattern matching will be performed.

Extensions
~~~~~~~~~~

@@ -743,7 +800,7 @@ Parameter estimation problems combining multiple models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Parameter estimation problems can comprise multiple models. For now, PEtab
allows to specify multiple SBML models with corresponding condition and
allows one to specify multiple models with corresponding condition and
measurement tables, and one joint parameter table. This means that the parameter
namespace is global. Therefore, parameters with the same ID in different models
will be considered identical.
@@ -1070,6 +1127,8 @@ float values are demoted to boolean values. For example, in ``1 + true``,
the expression is interpreted as ``true && true = true``.


.. _identifiers:

Identifiers
-----------