
Labeled Property Graph Model

Cátia Raquel Jesus Vaz edited this page Jun 30, 2023 · 32 revisions

Graph Model

phyloDB relies on a graph data model, represented in the following Figure. The model is built from a subset of the concepts and properties defined in the TypOn ontology [1, 2], which allows it to represent the main entities in phylogenetic analyses and their relationships. Additional entities, such as users and projects, support user and project management, including authorization and data versioning.


Domain description

The main concepts found in phylogenetic data are loci (Locus), Allele, typing Schema, allelic Profile and Isolate. A Taxon, or taxonomic unit, has several associated loci; in the data model depicted in the previous Figure, this is represented by the relationship 'CONTAINS' between Taxon and Locus. Likewise, since a locus denotes a region where alleles occur, the model includes a 'CONTAINS' relationship between Locus and Allele.
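To make this traversal concrete, the sketch below models Taxon, Locus and Allele nodes as a tiny in-memory graph in Python and follows the two 'CONTAINS' relationships from a taxon to its alleles. The Graph class, the node keys and the identifiers (taxonA, locusA, and so on) are hypothetical illustrations, not phyloDB code; in a graph database the same walk corresponds to a Cypher pattern such as (:Taxon)-[:CONTAINS]->(:Locus)-[:CONTAINS]->(:Allele).

```python
from collections import defaultdict


# Hypothetical in-memory stand-in for a labeled property graph: a node is a
# (label, id) tuple and each outgoing edge is a (relationship type, target) pair.
class Graph:
    def __init__(self):
        self.edges = defaultdict(list)

    def relate(self, source, rel_type, target):
        """Add a directed relationship source -[rel_type]-> target."""
        self.edges[source].append((rel_type, target))

    def neighbours(self, source, rel_type, label):
        """Return targets reached from source via rel_type that carry the given label."""
        return [t for r, t in self.edges[source] if r == rel_type and t[0] == label]


g = Graph()
taxon = ("Taxon", "taxonA")
for locus_id in ("locusA", "locusB"):
    locus = ("Locus", locus_id)
    g.relate(taxon, "CONTAINS", locus)
    for allele_id in ("1", "2"):
        # Prefix with the locus id so these hypothetical node keys stay distinct across loci.
        g.relate(locus, "CONTAINS", ("Allele", f"{locus_id}_{allele_id}"))

# Walk Taxon -CONTAINS-> Locus -CONTAINS-> Allele.
alleles = [a for locus_node in g.neighbours(taxon, "CONTAINS", "Locus")
           for a in g.neighbours(locus_node, "CONTAINS", "Allele")]
print(alleles)
```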

Typing schemas can rely on several loci to characterise different allelic profiles; this is expressed in the data model by the relationship 'HAS' between SchemaDetails and Locus. Allelic profiles belong to a specific dataset, as described by the relationship 'CONTAINS' between Dataset and Profile. All profiles in a dataset follow the same schema, so the dataset must be related to the typing method used, hence the 'HAS' relationship between DatasetDetails and Schema.

An isolate can be associated with typing information and a set of ancillary details. Ancillary data include information about the place where the microorganism was isolated, the environment, the host, and other contextual details. In the data model this is expressed by the relationship 'HAS' between IsolateDetails and Ancillary; in practice, a given isolate usually has several ancillary entries. Since an isolate may be associated with an allelic profile, there is also a 'HAS' relationship between IsolateDetails and Profile.

The User and Project entities, together with their UserDetails and ProjectDetails state nodes, provide the user and project management described above, including authorization and data versioning.

The data model also supports versioning and soft deletion. Given the data dependencies, the importance of keeping track of changes, and the need for reproducible results, we avoid deleting information from the database. Instead, a versioning and soft-delete strategy makes removal possible while keeping previous results valid for the version of the data they were computed on. Versioning works by separating each object from its state: the object is linked to its state nodes through a relationship annotated with the version number, and each change produces a new state node. As depicted in the Figure, the names of the state nodes end with Details, and the version annotations are stored on the 'CONTAINS_DETAILS' relationship. Accordingly, the model defines TaxonDetails, LocusDetails, AlleleDetails, SchemaDetails, ProfileDetails, IsolateDetails, DatasetDetails, UserDetails and ProjectDetails.
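The object/state split can be sketched in Python as follows. A VersionedEntity plays the role of an object node such as Taxon, and each DetailEdge stands for a 'CONTAINS_DETAILS' relationship carrying from, to and version together with the properties of its Details node. The class names, the rule that an update closes the previous detail by setting 'to', and the choice to refuse updates on deprecated entities are assumptions made for illustration, not phyloDB's implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class DetailEdge:
    """Hypothetical stand-in for a CONTAINS_DETAILS relationship plus its Details node."""
    version: int
    from_: datetime          # 'from' is a Python keyword, hence the trailing underscore
    to: Optional[datetime]   # None means this is the current detail
    details: dict            # properties of the Details (state) node


@dataclass
class VersionedEntity:
    """Hypothetical object node, e.g. a Taxon, holding its version history."""
    id: str
    deprecated: bool = False
    history: list = field(default_factory=list)

    def current(self) -> Optional[DetailEdge]:
        """The current detail is the one whose 'to' is still unset."""
        return next((e for e in self.history if e.to is None), None)

    def update(self, details: dict, now: datetime) -> int:
        """Close the current detail and append a new state node; returns the new version."""
        if self.deprecated:
            raise ValueError(f"{self.id} is deprecated")
        current = self.current()
        if current is not None:
            current.to = now
        version = current.version + 1 if current is not None else 1
        self.history.append(DetailEdge(version, now, None, dict(details)))
        return version

    def at_version(self, version: int) -> dict:
        """Old states are never overwritten, so earlier results stay reproducible."""
        for e in self.history:
            if e.version == version:
                return e.details
        raise KeyError(version)

    def soft_delete(self) -> None:
        """Flag the entity as deprecated instead of removing any node."""
        self.deprecated = True


taxon = VersionedEntity("taxonA")
taxon.update({"description": "first"}, datetime(2023, 1, 1))
taxon.update({"description": "revised"}, datetime(2023, 6, 1))
taxon.soft_delete()
print(taxon.current().version, taxon.at_version(1))
```

Keeping every DetailEdge, rather than overwriting a single properties map, is what lets an analysis computed against version 1 still resolve the exact state it used.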

The Coordinate concept describes the position of a profile in the layout computed by a given visualization algorithm.

Taxon

  • Properties:
    • id (Unique, Mandatory, String) - Taxonomic unit identifier.
    • deprecated (Mandatory, Boolean) - Deprecated flag.
  • Relations:
    • CONTAINS (To: Locus) - Expresses which loci compose this taxonomic unit.
    • CONTAINS_DETAILS (To: TaxonDetails) - Expresses which are the properties, by version, of this taxonomic unit.
      • Properties:
        • from (Mandatory, Date) - Date on which this detail was created.
        • to (Date) - Date until which this detail was the current version. If absent, this is the current detail.
        • version (Mandatory, Integer) - Number of the version of this detail.

TaxonDetails

  • Properties:
    • description (String) - Taxonomic unit description.

Locus

  • Properties:
    • id (Unique, Mandatory, String) - Locus identifier.
    • deprecated (Mandatory, Boolean) - Deprecated flag.
  • Relations:
    • CONTAINS (To: Allele) - Expresses which alleles compose this locus.
    • CONTAINS_DETAILS (To: LocusDetails) - Expresses which are the properties, by version, of this locus.
      • Properties:
        • from (Mandatory, Date) - Date on which this detail was created.
        • to (Date) - Date until which this detail was the current version. If absent, this is the current detail.
        • version (Mandatory, Integer) - Number of the version of this detail.

LocusDetails

  • Properties:
    • description (String) - Locus description.

Allele

  • Properties:
    • id (Unique, Mandatory, String) - Allele identifier.
    • deprecated (Mandatory, Boolean) - Deprecated flag.
  • Relations:
    • CONTAINS_DETAILS (To: AlleleDetails) - Expresses which are the properties, by version, of this allele.
      • Properties:
        • from (Mandatory, Date) - Date on which this detail was created.
        • to (Date) - Date until which this detail was the current version. If absent, this is the current detail.
        • version (Mandatory, Integer) - Number of the version of this detail.

AlleleDetails

  • Properties:
    • sequence (String) - Allele sequence.

Schema

  • Properties:
    • id (Unique, Mandatory, String) - Schema identifier.
    • type (Unique, Mandatory, String) - Schema type.
    • deprecated (Mandatory, Boolean) - Deprecated flag.
  • Relations:
    • CONTAINS_DETAILS (To: SchemaDetails) - Expresses which are the properties, by version, of this schema.
      • Properties:
        • from (Mandatory, Date) - Date on which this detail was created.
        • to (Date) - Date until which this detail was the current version. If absent, this is the current detail.
        • version (Mandatory, Integer) - Number of the version of this detail.
  • Restrictions:
    • A schema must contain only loci of the same taxonomic unit.
    • Type must be 'mlst', 'mlva' or 'snp'.
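The two Schema restrictions can be expressed as a small validation function. This is a hypothetical sketch: validate_schema and its input format (a list of (taxon id, locus id) pairs, one per locus linked to the schema) are not part of phyloDB, and this page does not say where phyloDB enforces these rules.

```python
# Allowed values for the Schema 'type' property, as listed in the restrictions.
VALID_SCHEMA_TYPES = {"mlst", "mlva", "snp"}


def validate_schema(schema_type: str, loci: list[tuple[str, str]]) -> None:
    """Raise ValueError if the schema breaks either Schema restriction.

    `loci` is a hypothetical list of (taxon_id, locus_id) pairs.
    """
    if schema_type not in VALID_SCHEMA_TYPES:
        raise ValueError(f"invalid schema type: {schema_type!r}")
    # All loci of a schema must come from the same taxonomic unit.
    taxa = {taxon_id for taxon_id, _ in loci}
    if len(taxa) > 1:
        raise ValueError(f"schema mixes loci from several taxa: {sorted(taxa)}")


validate_schema("mlst", [("taxonA", "locusA"), ("taxonA", "locusB")])
```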

SchemaDetails

  • Properties:
    • description (String) - Schema description.
  • Relations:
    • HAS (To: Locus) - Expresses which loci compose this schema.
      • Properties:
        • part (Mandatory, Integer) - Expresses the order of the locus in the schema.
        • version (Mandatory, Integer) - Expresses the version of the locus that is part of this schema.

Dataset

  • Properties:
    • id (Unique, Mandatory, String) - Dataset identifier.
    • deprecated (Mandatory, Boolean) - Deprecated flag.
  • Relations:
    • CONTAINS_DETAILS (To: DatasetDetails) - Expresses which are the properties, by version, of this dataset.
      • Properties:
        • from (Mandatory, Date) - Date on which this detail was created.
        • to (Date) - Date until which this detail was the current version. If absent, this is the current detail.
        • version (Mandatory, Integer) - Number of the version of this detail.
    • CONTAINS (To: Profile) - Expresses which allelic profiles exist in this dataset.
    • CONTAINS (To: Isolate) - Expresses which isolates exist in this dataset.
  • Restrictions:
    • A dataset must follow one schema.

DatasetDetails

  • Properties:
    • name (String) - Dataset name.
  • Relations:
    • HAS (To: Schema) - Expresses the schema that this dataset follows.
      • Properties:
        • version (Mandatory, Integer) - Expresses the version of the schema that this dataset follows.

Profile

  • Properties:
    • id (Unique, Mandatory, String) - Allelic profile identifier.
    • deprecated (Mandatory, Boolean) - Deprecated flag.
  • Relations:
    • CONTAINS_DETAILS (To: ProfileDetails) - Expresses which are the properties, by version, of this profile.
      • Properties:
        • from (Mandatory, Date) - Date on which this detail was created.
        • to (Date) - Date until which this detail was the current version. If absent, this is the current detail.
        • version (Mandatory, Integer) - Number of the version of this detail.
    • HAS (To: Coordinate) - Expresses which coordinate this profile has for a given visualization graph.
      • Properties:
        • inferenceId (Mandatory, String) - Inference identifier.
        • id (Mandatory, String) - Visualization identifier.
        • algorithm (Mandatory, String) - Name of the algorithm used.
        • component (Mandatory, Integer) - Visualization component group.
        • deprecated (Mandatory, Boolean) - Deprecated flag.
    • DISTANCES (To: Profile) - Expresses the distance, inferred by an analysis, to another profile within a graph.
      • Properties:
        • id (Mandatory, String) - Inference identifier.
        • algorithm (Mandatory, String) - Name of the algorithm used.
        • weight (Mandatory, Integer) - Weight value.
        • deprecated (Mandatory, Boolean) - Deprecated flag.
        • fromVersion (Mandatory, Integer) - Version of the source profile.
        • toVersion (Mandatory, Integer) - Version of the target profile.
  • Restrictions:
    • A profile can only be related to alleles of the loci that compose the schema followed by the dataset to which the profile belongs.
    • A profile can only be related to alleles that are public or that belong to the same project.
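The two Profile restrictions can be checked per allele, as in the hypothetical sketch below. AlleleRef and profile_violations are illustrative names, and treating an allele with no owning project as public is an assumption inferred from the Project 'CONTAINS' Allele relationship rather than something this page states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlleleRef:
    """Hypothetical view of an allele referenced by a profile."""
    locus_id: str
    allele_id: str
    project_id: Optional[str] = None  # assumption: None marks a public allele


def profile_violations(alleles: list[AlleleRef], schema_loci: set[str],
                       project_id: str) -> list[str]:
    """Return one message per broken Profile restriction (empty list if valid)."""
    problems = []
    for a in alleles:
        # Restriction 1: the allele must belong to one of the schema's loci.
        if a.locus_id not in schema_loci:
            problems.append(f"allele {a.allele_id}: locus {a.locus_id} is not in the schema")
        # Restriction 2: the allele must be public or belong to the profile's project.
        if a.project_id is not None and a.project_id != project_id:
            problems.append(f"allele {a.allele_id}: private to project {a.project_id}")
    return problems


alleles = [AlleleRef("locusA", "1"), AlleleRef("locusB", "3", project_id="projectX")]
print(profile_violations(alleles, {"locusA", "locusB"}, project_id="projectX"))
```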

ProfileDetails

  • Properties:
    • aka (String) - Allelic profile name.
  • Relations:
    • HAS (To: Allele) - Expresses which alleles define this profile.
      • Properties:
        • version (Mandatory, Integer) - Expresses the version of the allele that composes this profile.
        • part (Mandatory, Integer) - Part (position) of the schema that this allele defines.
        • total (Mandatory, Integer) - Total number of alleles needed to compose a complete profile for the schema.
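Because each ProfileDetails 'HAS' Allele relationship records its part and the total number of alleles, a profile can be rebuilt in schema order. The sketch below assumes parts are numbered from 1 to total and represents each relationship as a plain dict with a hypothetical allele_id key; parts with no relationship come back as None.

```python
from typing import Optional


def assemble_profile(has_edges: list[dict]) -> list[Optional[str]]:
    """Order a profile's alleles by 'part', sizing the result with 'total'.

    Each dict mimics the properties of a ProfileDetails -HAS-> Allele edge
    plus a hypothetical 'allele_id' key. Parts are assumed to run 1..total.
    """
    if not has_edges:
        return []
    totals = {e["total"] for e in has_edges}
    if len(totals) != 1:
        raise ValueError("inconsistent 'total' across edges")
    total = totals.pop()
    slots: list[Optional[str]] = [None] * total
    for e in has_edges:
        part = e["part"]
        if not 1 <= part <= total:
            raise ValueError(f"part {part} outside 1..{total}")
        slots[part - 1] = e["allele_id"]
    return slots


edges = [
    {"part": 2, "total": 3, "version": 1, "allele_id": "7"},
    {"part": 1, "total": 3, "version": 1, "allele_id": "4"},
]
print(assemble_profile(edges))
```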

Isolate

  • Properties:
    • id (Unique, Mandatory, String) - Isolate identifier.
    • deprecated (Mandatory, Boolean) - Deprecated flag.
  • Relations:
    • CONTAINS_DETAILS (To: IsolateDetails) - Expresses which are the properties, by version, of this isolate.
      • Properties:
        • from (Mandatory, Date) - Date on which this detail was created.
        • to (Date) - Date until which this detail was the current version. If absent, this is the current detail.
        • version (Mandatory, Integer) - Number of the version of this detail.
  • Restrictions:
    • An isolate can only have one profile.

IsolateDetails

  • Properties:
    • description (String) - Isolate description.
  • Relations:
    • HAS (To: Ancillary) - Expresses which ancillary details this isolate has.
    • HAS (To: Profile) - Expresses which allelic profile this isolate has.
      • Properties:
        • version (Mandatory, Integer) - Expresses the version of the profile that relates to this isolate.

Ancillary

  • Properties:
    • key (Mandatory, String) - Ancillary detail key.
    • value (Mandatory, String) - Ancillary detail value.
  • Restrictions:
    • The (key, value) pair must be unique.
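One reading of the Ancillary restriction is that each distinct (key, value) pair exists as a single node shared by every isolate that uses it. The hypothetical AncillaryStore below illustrates that reading, similar in spirit to a Cypher MERGE on both properties; the integer node ids are invented for the example.

```python
class AncillaryStore:
    """Hands out one shared Ancillary node per distinct (key, value) pair."""

    def __init__(self):
        # Maps (key, value) to a hypothetical integer node id.
        self._nodes: dict[tuple[str, str], int] = {}

    def get_or_create(self, key: str, value: str) -> int:
        """Return the existing node for the pair, creating it on first use."""
        pair = (key, value)
        if pair not in self._nodes:
            self._nodes[pair] = len(self._nodes)
        return self._nodes[pair]


store = AncillaryStore()
isolate1 = [store.get_or_create("country", "Portugal"), store.get_or_create("host", "human")]
isolate2 = [store.get_or_create("country", "Portugal")]
print(isolate1, isolate2)
```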

Coordinate

  • Properties:
    • x (Mandatory, String) - X axis value.
    • y (Mandatory, String) - Y axis value.

Project

  • Properties:
    • id (Unique, Mandatory, String) - Project identifier.
  • Relations:
    • CONTAINS_DETAILS (To: ProjectDetails) - Expresses which are the properties, by version, of this project.
      • Properties:
        • from (Mandatory, Date) - Date on which this detail was created.
        • to (Date) - Date until which this detail was the current version. If absent, this is the current detail.
        • version (Mandatory, Integer) - Number of the version of this detail.
    • CONTAINS (To: Allele) - Expresses which alleles this project has.
    • CONTAINS (To: Dataset) - Expresses which datasets this project has.

ProjectDetails

  • Properties:
    • description (String) - Project description.
    • type (Mandatory, String) - Project type.
    • name (Mandatory, String) - Project name.
  • Relations:
    • HAS (To: User) - Expresses which users have access to this project.
  • Restrictions:
    • Type must be 'public' or 'private'.
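The Project type restriction, combined with the 'HAS' relationship to User, suggests a simple access rule. The sketch below assumes that public projects are readable by every user and private projects only by users linked through 'HAS'; ProjectState and can_read are hypothetical names, and this page does not spell out phyloDB's actual authorization rules.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectState:
    """Mirrors ProjectDetails properties plus the ids of users linked via HAS."""
    name: str
    type: str
    users: set[str] = field(default_factory=set)
    description: str = ""

    def __post_init__(self):
        # Enforce the ProjectDetails restriction on 'type'.
        if self.type not in ("public", "private"):
            raise ValueError(f"type must be 'public' or 'private', got {self.type!r}")


def can_read(project: ProjectState, user_id: str) -> bool:
    # Assumption: public projects are open to all; private ones need a HAS link.
    return project.type == "public" or user_id in project.users


private = ProjectState("demo", "private", {"user1"})
print(can_read(private, "user1"), can_read(private, "user2"))
```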

User

  • Properties:
    • id (Unique, Mandatory, UUID) - User identifier.
    • Provider (Mandatory, String) - User identity provider.
  • Relations:
    • CONTAINS_DETAILS (To: UserDetails) - Expresses which are the properties, by version, of this user.
      • Properties:
        • from (Mandatory, Date) - Date on which this detail was created.
        • to (Date) - Date until which this detail was the current version. If absent, this is the current detail.
        • version (Mandatory, Integer) - Number of the version of this detail.

UserDetails

  • Properties:
    • role (String) - User role.
  • Restrictions:
    • Role must be either 'user' or 'admin'.

[1] J. Almeida, J. Tiple, M. Ramirez, J. Melo-Cristino, C. Vaz, A. P. Francisco, J. A. Carriço, An Ontology and a REST API for Sequence Based Microbial Typing Data, in: A. T. Freitas, A. Navarro (Eds.), Bioinformatics for Personalized Medicine, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 21–28.

[2] C. Vaz, A. P. Francisco, M. Silva, K. A. Jolley, J. E. Bray, H. Pouseele, J. Rothganger, M. Ramirez, J. A. Carriço, TypOn: the microbial typing ontology, Journal of Biomedical Semantics 5 (1) (2014) 43. doi:10.1186/2041-1480-5-43.
