Which means Concurrent Versioning of knowledge Graphs
This project aims to create a knowledge hub that can store and query a set of RDF datasets with a versioning system. The project is part of the BD team's research efforts within the LIRIS and VCity project. The aim of this POC is to query a set of city versions and extract the associated knowledge.
This system has a demonstration, and its source code is available on GitHub.
Our motivation is to find a method for retrieving knowledge from a set of urban data versions stored in RDF format.
Motivations for linking SPARQL and SQL are numerous, particularly in the fields of science, technology, and business, where there is a growing need to integrate increasingly diverse data sources (sensors, institutions, ...). By using a SPARQL to SQL translator, we can expose relational databases on the Semantic Web and query them with SPARQL (whether with the same performance as SQL remains an open question). This allows researchers and developers to work with RDF and relational data seamlessly and efficiently while leveraging the performance optimizations of existing relational databases.
A "from scratch" engine based on neither SPARQL nor SQL would not be interoperable with these query systems. Building on an existing SQL backend is also simpler than reimplementing the full stack (algebra, including join algorithms, optimisation, efficient storage and indexing), and we think that performance will be comparable to a dedicated implementation.
We want to evaluate the provenance, accuracy, efficiency, and reliability of querying a condensed representation of the versions of a dataset, compared with querying each version represented extensionally. In other words, we want to understand whether our proposal, which adds to each quad the set of versions it appears in instead of representing each version as a separate dataset, leads to a more efficient way of answering queries across versions. An example query scenario for this experiment is: "Which city version has the highest number of trees in district 1?"
Using SQL as a backend for SPARQL has been done before.
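To make the condensed representation concrete, here is a minimal Python sketch of the tree-counting scenario. It is not the project's implementation: the rows, the `tree:` and `district:` identifiers, the `locatedIn` predicate and the function names are all hypothetical, and the validity is modelled as a plain string whose leftmost character stands for version 1.

```python
from collections import Counter

# Hypothetical condensed rows: (subject, predicate, object, validity).
# validity[i] == "1" means the quad is present in version i + 1.
CONDENSED = [
    ("tree:1", "locatedIn", "district:1", "111"),
    ("tree:2", "locatedIn", "district:1", "011"),
    ("tree:3", "locatedIn", "district:1", "010"),
    ("tree:4", "locatedIn", "district:2", "111"),
]


def trees_per_version(rows, district):
    """Count, for each version index, the trees located in the district."""
    counts = Counter()
    for _subject, predicate, obj, validity in rows:
        if predicate == "locatedIn" and obj == district:
            for index, bit in enumerate(validity, start=1):
                if bit == "1":
                    counts[index] += 1
    return counts


def version_with_most_trees(rows, district):
    """Return the version index with the highest tree count."""
    counts = trees_per_version(rows, district)
    return max(counts, key=counts.get)


counts = trees_per_version(CONDENSED, "district:1")
best = version_with_most_trees(CONDENSED, "district:1")
print(dict(counts), best)
```

Each quad is scanned once, whatever the number of versions, whereas the extensional representation would require one pass over each separate version dataset.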
- A Mapping of SPARQL Onto Conventional SQL - W3C This paper discusses a semantics for expressing relational data as an RDF graph and an algebra for mapping SPARQL SELECT queries over that RDF to SQL queries over the original relational data. The goal is to provide a specification for SPARQL tool vendors and a foundation for the Semantic Web. It highlights the importance of creating a computable mapping from SPARQL semantics to SQL semantics.
This project uses Java 21 JDK + Maven and a dockerized PostgreSQL 17 database (make sure that Docker is installed too). If you don't have Java 21 installed by default, I recommend that you install SDKMAN! and use it to set Java 21 as the current session version.
SDKMAN! is a tool for managing parallel versions of multiple Software Development Kits on most Unix-based systems.
Once you have SDKMAN! installed, run:
sdk install java 21.0.1-amzn
sdk use java 21.0.1-amzn
Make sure you have Maven installed. If you don't, run: sudo apt install maven
This project:
- uses the jena-arq library for parsing SPARQL statements in Java,
- uses the springdoc-openapi-starter-webmvc-ui library to parse the Swagger API annotations and display the swagger-ui,
- needs a PostgreSQL 17 database, so the postgresql driver is installed too.
This project has been tested with:
- sonarqube, assuring the code quality,
- JaCoCo, testing the code coverage.
This project:
- uses the jena-fuseki-server library (Apache Jena Fuseki is a SPARQL server),
- needs a PostgreSQL 17 database if you use this target language, so the postgresql driver is installed too.
This project has been tested with: junit-jupiter-engine
# at the root of the project
# starts the database declared inside the docker-compose.yml file
docker compose up -d
# if you want to hack the import program
cd quads-loader
## wait until the PostgreSQL database is up
## starts the Java Spring application locally (http://localhost:8080/)
java "-DDATASOURCE_URL=<url>" "-DDATASOURCE_USERNAME=<username>" "-DDATASOURCE_PASSWORD=<password>" -jar target/quads-loader-0.0.1-SNAPSHOT.jar
# at the root of the project
# starts the database declared inside the docker-compose.yml file
docker compose up -d
# if you want to hack the import program
cd quads-query
## wait until the PostgreSQL database is up
# build the project
mvn package
## starts the Java Spring application locally (http://localhost:8081/)
java "-DDATASOURCE_URL=<url>" "-DDATASOURCE_USERNAME=<username>" "-DDATASOURCE_PASSWORD=<password>" ?"-DTARGET_LANG=<target language>" ?"-DCONDENSED_MODE=<boolean>" -jar quads-query-1.0-SNAPSHOT-jar-with-dependencies.jar
erDiagram
VersionedQuad |{--|{ Version: "bitstring index"
VersionedQuad ||--|{ VersionedNamedGraph: "(named_graph, bitstring index)"
Version ||--|{ VersionedNamedGraph: "index"
VersionedQuad {
text subject
text predicate
text object
text named_graph
bitstring validity
}
VersionedNamedGraph {
text versioned_named_graph
int index_version
text named_graph
}
Version {
int index_version
text message
timestamptz transaction_time_start
timestamptz transaction_time_end
}
Metadata {
text subject
text predicate
text object
}
erDiagram
VersionedQuad ||--|{ ResourceOrLiteral: "subject"
VersionedQuad ||--|{ ResourceOrLiteral: "object"
VersionedQuad ||--|{ ResourceOrLiteral: "predicate"
VersionedQuad ||--|{ ResourceOrLiteral: "named graph"
VersionedNamedGraph ||--|{ ResourceOrLiteral: "named graph"
VersionedNamedGraph ||--|{ ResourceOrLiteral: "versioned named graph"
Metadata }|--|{ ResourceOrLiteral: "subject"
Metadata }|--|{ ResourceOrLiteral: "object"
Metadata }|--|{ ResourceOrLiteral: "predicate"
VersionedQuad ||--|{ VersionedNamedGraph: "foreign key"
VersionedQuad {
int id_subject PK, FK
int id_predicate PK, FK
int id_object PK, FK
int id_named_graph FK
bitstring validity
}
VersionedNamedGraph {
int id_versioned_named_graph PK, FK
int index_version
int id_named_graph FK
}
ResourceOrLiteral {
int id_resource_or_literal PK, FK
text name
string type "Not null if literal"
}
Version {
int index_version "PK, (FK)"
text message
timestamptz transaction_time_start
timestamptz transaction_time_end
}
Metadata {
int id_subject PK, FK
int id_predicate PK, FK
int id_object PK, FK
}
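The second diagram replaces each text term with an integer key pointing to ResourceOrLiteral. The sketch below illustrates that dictionary-encoding idea in Python. It is an assumption-laden illustration rather than the project's loader: the function names are hypothetical, identifiers are assigned in order of first appearance, and the `type` column for literals is ignored.

```python
def encode_quads(quads):
    """Replace every term by an integer id.

    Returns (dictionary, encoded) where dictionary maps each distinct
    term to its id and encoded holds the quads as tuples of ids.
    """
    dictionary = {}
    encoded = []
    for quad in quads:
        # setdefault keeps the existing id or assigns the next free one
        ids = tuple(
            dictionary.setdefault(term, len(dictionary) + 1) for term in quad
        )
        encoded.append(ids)
    return dictionary, encoded


def decode_quad(ids, dictionary):
    """Turn a tuple of ids back into the original terms."""
    reverse = {identifier: term for term, identifier in dictionary.items()}
    return tuple(reverse[identifier] for identifier in ids)


GRAND_LYON = "http://example.edu/Named-Graph#Grand-Lyon"
quads = [
    ("http://example.edu/Building#1", "height", "10.5", GRAND_LYON),
    ("http://example.edu/Building#2", "height", "9.1", GRAND_LYON),
]
dictionary, encoded = encode_quads(quads)
print(encoded)
```

Repeated terms such as the predicate and the graph name are stored once, which is the point of the normalized schema.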
flowchart BT
CS[Computer Scientist] -->|Sends the SPARQL query to the endpoint| SE
SE -->|Sends the quads to the Computer Scientist| CS
subgraph Server
SE -->|Sends the SPARQL query for translation| ARQ[SPARQL to SQL translator]
ARQ -->|Sends the SQL translated query to JDBC| JDBC[Java Database Connectivity]
JDBC -->|The filtered quads| ARQ
ARQ -->|The filtered quads| SE
end
subgraph Database
JDBC -->|Sends the SQL query to the database| DB[PostgreSQL]
DB -->|Sends the result of the SQL query| JDBC
end
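The query flow above can be illustrated with a deliberately small Python sketch. It does not reproduce the project's translator, which works on the SPARQL algebra parsed by Jena ARQ. Here a single quad pattern is turned into a parameterized SQL query, `None` marks a variable, the `versioned_quad` table and its column names are assumptions derived from the diagrams, and an in-memory SQLite database stands in for PostgreSQL behind JDBC.

```python
import sqlite3

COLUMNS = ("subject", "predicate", "object", "named_graph")
IGN = "http://example.edu/Named-Graph#IGN"
GRAND_LYON = "http://example.edu/Named-Graph#Grand-Lyon"


def pattern_to_sql(pattern):
    """Translate one quad pattern into (sql, params); None is a variable."""
    conditions = []
    params = []
    for column, term in zip(COLUMNS, pattern):
        if term is not None:
            conditions.append(f"{column} = ?")
            params.append(term)
    sql = "SELECT subject, predicate, object, named_graph FROM versioned_quad"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


# In-memory stand-in for the relational backend (assumed schema)
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE versioned_quad ("
    "subject TEXT, predicate TEXT, object TEXT, named_graph TEXT, validity TEXT)"
)
connection.executemany(
    "INSERT INTO versioned_quad VALUES (?, ?, ?, ?, ?)",
    [
        ("http://example.edu/Building#1", "height", "10.5", GRAND_LYON, "11"),
        ("http://example.edu/Building#1", "height", "11", IGN, "10"),
        ("http://example.edu/Building#1", "height", "10.5", IGN, "01"),
    ],
)

# Pattern: ?s height ?o in the IGN named graph
sql, params = pattern_to_sql((None, "height", None, IGN))
rows = connection.execute(sql, params).fetchall()
for row in rows:
    print(row)
```

Binding the constants as parameters instead of concatenating them into the SQL string avoids quoting problems with IRIs and literals.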
flowchart TB
CS[Computer Scientist] -->|Sends the files to the import endpoint| SE
SE -->|Returns the version number via HTTP| CS
subgraph Server
SE -->|Sends files to import| RIOT[Jena RIOT]
RIOT -->|Send the quads for insertion| JDBC[Java Database Connectivity]
JDBC -->|Sends the version number| SE
end
subgraph Database
JDBC -->|Sends the SQL query to the database| DB[PostgreSQL]
DB -->|Sends the version information| JDBC
end
The API description is available on the swagger-ui at runtime.
# make sure your database is up
# starts the tests
mvn test
The code coverage and quality are available on the Sonarqube server after running a sonar inspection.
This project has been tested with a dataset created by the UD-Graph Project. This dataset has been transformed to be compatible with the designed conceptual model.
sequenceDiagram
title Transformation, Import and query workflow
autonumber
participant BSBM
participant Annotation
System ->>+ BSBM: Ask for a set of versions
BSBM ->>- System: Generate a set of versions
loop For each generated version
System ->>+ Annotation: Send the versionable data to annotate
Annotation ->>- System: The annotated data with the version index
System ->>+ Annotation: Send the versionable data to annotate
Annotation ->>- System: The annotated data with the graph name
end
participant Triple store
loop For each Annotated version
System ->>+ QuaDer: Sends the version to import
QuaDer ->>+ Database: Inserts the version
Database ->>- QuaDer: Returns the insert status
QuaDer ->>- System: Returns the version index
System ->>+ Triple store: Sends the version to import
Triple store ->>- System: Returns the insert status
end
System ->>+ Triple store: Sends the theoretical annotations to import
Triple store ->>- System: Returns the insert status
box QuaQue
participant SPARQL-SQL translator
participant SPARQL API
end
actor User client
User client ->>+ SPARQL API: Sends a SPARQL query
SPARQL API ->>+ SPARQL-SQL translator: Translates the SPARQL query
SPARQL-SQL translator ->>- SPARQL API: Returns the SQL query
SPARQL API ->>+ Database: Sends the SQL query
Database ->>- SPARQL API: Returns the queried result
SPARQL API ->>- User client: Returns the result
Before importing the dataset inside the triple store and the relational database, we transform the data to match the theoretical model and the implementation.
We add a graph name to each triple, turning it into a quad. Semantically, the graph name links the triple to the source of the data. The transformation was done with the annotate Python program. We used a virtual environment with pip 23.3.1 and Python 3.10.12.
# create a virtual environment
python3 -m venv venv
# activate the virtual environment
source venv/bin/activate
# install the dependencies
pip install -r python/requirements.txt
# run the program
cd workflows
/bin/bash workflow-bsbm.sh 2 7500 1000 10 > allout.txt 2>&1
# in another terminal
cd workflows
tail -f allout.txt
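As an illustration of the annotation step (attaching a graph name to each triple), here is a minimal Python sketch. It is not the annotate program itself: the `annotate` function and the in-memory tuple representation are hypothetical and only show the shape of the transformation.

```python
def annotate(triples, graph_name):
    """Turn (subject, predicate, object) triples into quads for one source."""
    return [
        (subject, predicate, obj, graph_name)
        for subject, predicate, obj in triples
    ]


triples = [
    ("http://example.edu/Building#1", "height", "10.5"),
    ("http://example.edu/Building#2", "height", "9.1"),
]
quads = annotate(triples, "http://example.edu/Named-Graph#Grand-Lyon")
for quad in quads:
    print(quad)
```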
Let's assume that we have a dataset with 2 versions with the following quads:
Version 1 (buildings-2015.trig):
Subject | Predicate | Object | Named Graph |
---|---|---|---|
http://example.edu/Building#1 | height | 10.5 | http://example.edu/Named-Graph#Grand-Lyon |
http://example.edu/Building#2 | height | 9.1 | http://example.edu/Named-Graph#Grand-Lyon |
http://example.edu/Building#1 | height | 11 | http://example.edu/Named-Graph#IGN |
Version 2 (buildings-2018.trig):
Subject | Predicate | Object | Named Graph |
---|---|---|---|
http://example.edu/Building#1 | height | 10.5 | http://example.edu/Named-Graph#IGN |
http://example.edu/Building#1 | height | 10.5 | http://example.edu/Named-Graph#Grand-Lyon |
http://example.edu/Building#3 | height | 15 | http://example.edu/Named-Graph#Grand-Lyon |
After some transformations, we have the following quads representing the theoretical model:
After the import inside the relational database, we have the following quads representing the implementation:
Subject | Predicate | Object | Named Graph | Validity |
---|---|---|---|---|
http://example.edu/Building#1 | height | 10.5 | http://example.edu/Named-Graph#Grand-Lyon | 11 |
http://example.edu/Building#2 | height | 9.1 | http://example.edu/Named-Graph#Grand-Lyon | 10 |
http://example.edu/Building#1 | height | 11 | http://example.edu/Named-Graph#IGN | 10 |
http://example.edu/Building#1 | height | 10.5 | http://example.edu/Named-Graph#IGN | 01 |
http://example.edu/Building#3 | height | 15 | http://example.edu/Named-Graph#Grand-Lyon | 01 |
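The validity column can be derived mechanically from the two versions. The following Python sketch shows one way to do it, assuming the leftmost bit stands for version 1, as the table suggests. The `condense` function is hypothetical and works on in-memory tuples instead of the PostgreSQL bit string type.

```python
GRAND_LYON = "http://example.edu/Named-Graph#Grand-Lyon"
IGN = "http://example.edu/Named-Graph#IGN"
B1 = "http://example.edu/Building#1"
B2 = "http://example.edu/Building#2"
B3 = "http://example.edu/Building#3"

VERSION_1 = [
    (B1, "height", "10.5", GRAND_LYON),
    (B2, "height", "9.1", GRAND_LYON),
    (B1, "height", "11", IGN),
]
VERSION_2 = [
    (B1, "height", "10.5", IGN),
    (B1, "height", "10.5", GRAND_LYON),
    (B3, "height", "15", GRAND_LYON),
]


def condense(versions):
    """Map each distinct quad to a bitstring, one bit per version."""
    validity = {}
    for index, quads in enumerate(versions):
        for quad in quads:
            # A quad seen for the first time starts with all bits at 0
            bits = validity.setdefault(quad, ["0"] * len(versions))
            bits[index] = "1"
    return {quad: "".join(bits) for quad, bits in validity.items()}


condensed = condense([VERSION_1, VERSION_2])
for quad, bits in condensed.items():
    print(*quad, bits, sep=" | ")
```

The six input quads collapse into five rows, because the Grand-Lyon height of Building#1 is shared by both versions and only its bitstring records that.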