Commit
Merge commit '89cb8ab2623eb3e7fad06c1eda45ffa5a0d950ba' as 'semantic_standardization'
Showing 55 changed files with 136,506 additions and 0 deletions.

@@ -0,0 +1,50 @@

.idea
.DS_Store
.history

# Eclipse #
.classpath
.project
.settings/
target


# Play! #
logs
.swagger-codegen-ignore
client/.gitignore
client/.swagger-codegen-ignore
client/build.gradle
client/build.sbt
client/git_push.sh
client/gradle.properties
client/gradle/
client/gradlew
client/gradlew.bat
client/pom.xml
client/settings.gradle
client/src/

# Play! #
bin/
/db
.eclipse
/lib/
/logs/
/modules
/project/project
/project/target
/target
tmp/
test-result
server.pid
*.eml
#/dist/
.cache
.cache-main
.cache-tests

# extra #
NO__lib

@@ -0,0 +1,275 @@

semantic_standardization
==========================

This project is currently a POC exploring the standardization of terms using the vocabulary `Istat-Classificazione-08-Territorio` and the ontology `CLV-AP_IT`.
Currently the components are designed to use an in-memory storage containing only that ontology and vocabulary, but they can be extended to work in a similar way for other use cases and ontology/vocabulary pairs.

Two endpoints are provided:

1. the first one retrieves a flat representation of a vocabulary (conceptually similar to a CSV, but in JSON), using an ad-hoc SPARQL query.
2. the second one exposes the list of properties from an ontology that are actually used in a vocabulary, returning the "local" hierarchy for each property.

The idea is that each endpoint (and its configured queries) acts on a very specific domain: future versions could introduce new vocabularies and ontologies, but each of them will need ad-hoc SPARQL queries to retrieve the information needed.

## semantic annotation in DAF ingestion

The [DAF](https://github.com/italia/daf) `semantic_annotation` currently has the following structure: `{ontology}.{concept}.{property}`.
During the ingestion of datasets into the DAF platform, a `semantic_annotation` is used to relate a column of a dataset to the most appropriate property of a given existing concept from the controlled vocabularies.

**Note** that while the annotation is used to relate cells with vocabularies, it does not explicitly save a reference to the vocabularies used. A reference to a concept from an ontology is used instead.
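
The tag can be split into its three parts on the `.` separator. The following Scala sketch is a hypothetical helper (`SemanticAnnotation` and `AnnotationParser` are not part of this project) and assumes that none of the three parts contains a `.`, which holds for the tags used in this README, such as `POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier`.

```scala
// Hypothetical model of a DAF semantic_annotation tag: {ontology}.{concept}.{property}.
final case class SemanticAnnotation(ontology: String, concept: String, property: String)

object AnnotationParser {
  // Assumes no part contains '.'. The -1 limit keeps trailing empty parts, so the
  // result is None unless there are exactly three non-empty, dot-separated parts.
  def parse(tag: String): Option[SemanticAnnotation] =
    tag.split("\\.", -1) match {
      case Array(o, c, p) if Seq(o, c, p).forall(_.nonEmpty) =>
        Some(SemanticAnnotation(o, c, p))
      case _ => None
    }
}
```

For example, `AnnotationParser.parse("POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier")` yields `Some(SemanticAnnotation("POI-AP_IT", "PointOfInterestCategory", "POIcategoryIdentifier"))`. Returning an `Option` would let a caller reject a malformed tag explicitly instead of failing later.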

## examples

### example: sequence of calls

1. retrieve the (vocabulary, ontology) reference from the `semantic_annotation` tag
```
curl -X GET "http://localhost:9000/kb/v1/daf/annotation/lookup?semantic_annotation=POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier" -H "accept: application/json" -H "content-type: application/json"
```

2. retrieve the hierarchies of the properties used in the vocabulary
```
curl -X GET "http://localhost:9000/kb/v1/hierarchies/properties?vocabulary_name=POICategoryClassification&ontology_name=poiapit&lang=it" -H "accept: application/json" -H "content-type: application/json"
```

3. retrieve the dataset values for a given vocabulary
```
curl -X GET "http://localhost:9000/kb/v1/vocabularies/POICategoryClassification?lang=it" -H "accept: application/json" -H "content-type: application/json"
```
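
The three request URLs of this sequence can also be built programmatically, for example in a client. This Scala sketch assumes the base URL `http://localhost:9000/kb/v1` used throughout this README; `KbUrls` and its methods are hypothetical helpers, not part of the project. Query keys and values are form-encoded with `java.net.URLEncoder`.

```scala
import java.net.URLEncoder

// Hypothetical client-side helper; base URL and parameter names come from the curl examples.
object KbUrls {
  val Base = "http://localhost:9000/kb/v1"

  // Form-encodes a query key or value (spaces become '+').
  private def enc(s: String): String = URLEncoder.encode(s, "UTF-8")

  // Builds "?k1=v1&k2=v2" from the given pairs, preserving their order.
  private def query(params: (String, String)*): String =
    params.map { case (k, v) => s"${enc(k)}=${enc(v)}" }.mkString("?", "&", "")

  // Step 1: look up the vocabulary/ontology pair for a semantic_annotation tag.
  def lookup(annotation: String): String =
    s"$Base/daf/annotation/lookup" + query("semantic_annotation" -> annotation)

  // Step 2: hierarchies of the properties used by a vocabulary.
  def hierarchies(vocabularyName: String, ontologyPrefix: String, lang: String): String =
    s"$Base/hierarchies/properties" +
      query("vocabulary_name" -> vocabularyName, "ontology_name" -> ontologyPrefix, "lang" -> lang)

  // Step 3: flat dataset of a vocabulary. The name is used as a path segment and is assumed
  // to contain only URL-safe characters (letters, digits, '-', '_', '.'), so it is not encoded.
  def vocabulary(vocabularyName: String, lang: String): String =
    s"$Base/vocabularies/$vocabularyName" + query("lang" -> lang)
}
```

For instance, `KbUrls.hierarchies("POICategoryClassification", "poiapit", "it")` produces the URL used in step 2. Encoding the values in one place avoids hand-escaping characters such as `&` or spaces in parameter values.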

----

### example: retrieve information from the semantic_annotation tag
With this endpoint we can retrieve information about the vocabulary/ontology pair related to a given `semantic_annotation` tag:

```
curl -X GET "http://localhost:9000/kb/v1/daf/annotation/lookup?semantic_annotation={semantic_annotation}" \
-H "accept: application/json" -H "content-type: application/json"
```

for example, for the Point Of Interest vocabulary:

```
curl -X GET 'http://localhost:9000/kb/v1/daf/annotation/lookup?semantic_annotation=POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier' \
-H "accept: application/json" -H "content-type: application/json"
```

This will return a data structure similar to the following one for each tag:

```
[
  {
    "vocabulary_id": "POICategoryClassification",
    "vocabulary": "http://dati.gov.it/onto/controlledvocabulary/POICategoryClassification",
    "ontology": "http://dati.gov.it/onto/poiapit",
    "semantic_annotation": "POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier",
    "property_id": "POIcategoryIdentifier",
    "concept_id": "PointOfInterestCategory",
    "ontology_prefix": "poiapit",
    "ontology_id": "POI-AP_IT",
    "concept": "http://dati.gov.it/onto/poiapit#PointOfInterestCategory",
    "property": "http://dati.gov.it/onto/poiapit#POIcategoryIdentifier"
  }
]
```

The idea is to return as much information as possible, so that the annotation can eventually be related to ontologies and vocabularies.
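
A typed model of this response can make the relations between its fields explicit. The Scala sketch below is hypothetical: `AnnotationLookup` and its `isConsistent` check are not part of the project. The checks encode relations that hold in the sample response for `POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier`; whether the service guarantees them for every vocabulary is an assumption.

```scala
// Hypothetical model of one element of the lookup response; comments give the JSON keys.
final case class AnnotationLookup(
    vocabularyId: String,       // "vocabulary_id"
    vocabulary: String,         // "vocabulary" (IRI)
    ontology: String,           // "ontology" (IRI)
    semanticAnnotation: String, // "semantic_annotation"
    propertyId: String,         // "property_id"
    conceptId: String,          // "concept_id"
    ontologyPrefix: String,     // "ontology_prefix"
    ontologyId: String,         // "ontology_id"
    concept: String,            // "concept" (IRI)
    property: String            // "property" (IRI)
) {
  // Relations observed in the sample response; treated here as assumed invariants.
  def isConsistent: Boolean =
    semanticAnnotation == s"$ontologyId.$conceptId.$propertyId" &&
      concept == s"$ontology#$conceptId" &&
      property == s"$ontology#$propertyId" &&
      ontology.endsWith("/" + ontologyPrefix)
}
```

Usage with the sample values: build `AnnotationLookup("POICategoryClassification", "http://dati.gov.it/onto/controlledvocabulary/POICategoryClassification", "http://dati.gov.it/onto/poiapit", "POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier", "POIcategoryIdentifier", "PointOfInterestCategory", "poiapit", "POI-AP_IT", "http://dati.gov.it/onto/poiapit#PointOfInterestCategory", "http://dati.gov.it/onto/poiapit#POIcategoryIdentifier")` and call `isConsistent`, which returns `true` for this response.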

### example: retrieving a vocabulary dataset

We can obtain a de-normalized, tabular version of a vocabulary (for example `Istat-Classificazione-08-Territorio`) using the following curl call:

```
curl -X GET "http://localhost:9000/kb/v1/vocabularies/{vocabulary_name}?lang={lang}" \
-H "accept: application/json" -H "content-type: application/json"
```

A `SPARQL` query is used to create a proper tabular representation of the data.

#### example: PointOfInterest / POI-AP_IT

```
curl -X GET "http://localhost:9000/kb/v1/vocabularies/POICategoryClassification?lang=it" \
-H "accept: application/json" -H "content-type: application/json"
```

which will return results similar to the following:

```
[
  [
    {
      "key": "POI-AP_IT_PointOfInterestCategory_definition",
      "value": "Rientrano in questa categoria tutti i punti di interesse connessi all'intrattenimento come zoo, discoteche, pub, teatri, acquari, stadi, casino, parchi divertimenti, ecc."
    },
    {
      "key": "POI-AP_IT_PointOfInterestCategory_POICategoryName",
      "value": "Settore intrattenimento"
    },
    {
      "key": "POI-AP_IT_PointOfInterestCategory_POICategoryIdentifier",
      "value": "cat_1"
    }
  ],
  ...
]
```

#### example: Istat places / CLV-AP_IT
```
curl -X GET "http://localhost:9000/kb/v1/vocabularies/Istat-Classificazione-08-Territorio?lang=it" -H "accept: application/json" -H "content-type: application/json"
```

this will return a result structure similar to the following one:

```
[
  [
    {"key": "CLV-AP_IT_Country_name", "value": "Italia"},
    {"key": "CLV-AP_IT_City_name", "value": "Abano Terme"},
    {"key": "CLV-AP_IT_Province_name", "value": "Padova"},
    {"key": "CLV-AP_IT_Region_name", "value": "Veneto"}
  ],
  [
    {"key": "CLV-AP_IT_Province_name", "value": "Lodi"},
    {"key": "CLV-AP_IT_City_name", "value": "Abbadia Cerreto"},
    {"key": "CLV-AP_IT_Country_name", "value": "Italia"},
    {"key": "CLV-AP_IT_Region_name", "value": "Lombardia"}
  ],
  ...
]
```

For technical reasons, the keys currently use an underscore-separated form such as `CLV-AP_IT_Region_name` in place of the dotted path `CLV-AP_IT.Region.name`.
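
Because the ontology id may itself contain an underscore (as in `CLV-AP_IT`), the dotted path cannot be recovered by replacing every `_` with `.`: the ontology id has to be known. The following Scala sketch of the reverse mapping is hypothetical (`FlatKeys`, `toDotted` and `rowToMap` are not part of the project) and assumes that concept names contain no `_`, which holds for the keys shown in this README; property names may contain one.

```scala
object FlatKeys {
  // Hypothetical helper: "CLV-AP_IT_Region_name" -> Some("CLV-AP_IT.Region.name").
  // The ontology id must be supplied because it may itself contain '_'.
  // Assumes the concept name contains no '_' (the property name may).
  def toDotted(key: String, ontologyId: String): Option[String] = {
    val prefix = ontologyId + "_"
    if (!key.startsWith(prefix)) None
    else
      key.stripPrefix(prefix).split("_", 2) match {
        case Array(concept, property) if concept.nonEmpty && property.nonEmpty =>
          Some(s"$ontologyId.$concept.$property")
        case _ => None
      }
  }

  // Converts one row of the vocabulary response (a list of key/value pairs) into a map
  // keyed by dotted paths, skipping keys that do not match the expected shape.
  def rowToMap(row: Seq[(String, String)], ontologyId: String): Map[String, String] =
    row.flatMap { case (k, v) => toDotted(k, ontologyId).map(_ -> v) }.toMap
}
```

For example, `FlatKeys.toDotted("POI-AP_IT_PointOfInterestCategory_POICategoryName", "POI-AP_IT")` gives `Some("POI-AP_IT.PointOfInterestCategory.POICategoryName")`.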

### example: retrieve the hierarchies for the properties used

If we have the example vocabulary `Istat-Classificazione-08-Territorio`, which uses terms from the ontology `clvapit`, we can retrieve the local hierarchy associated with each property with the following curl command:

```
curl -X GET "http://localhost:9000/kb/v1/hierarchies/properties?vocabulary_name={vocabulary_name}&ontology_name={ontology_prefix}&lang={lang}" \
-H "accept: application/json" -H "content-type: application/json"
```

#### example: POI / POI-AP_IT

```
curl -X GET "http://localhost:9000/kb/v1/hierarchies/properties?vocabulary_name=POICategoryClassification&ontology_name=poiapit&lang=it" \
-H "accept: application/json" -H "content-type: application/json"
```

which will return a data structure similar to the following:

```
[
  {
    "vocabulary": "POI-AP_IT",
    "path": "POI-AP_IT.PointOfInterestCategory.definition",
    "hierarchy_flat": "PointOfInterestCategory",
    "hierarchy": [
      {
        "class": "PointOfInterestCategory",
        "level": 0
      }
    ]
  },
  ...
]
```

#### example: Istat places / CLV-AP_IT

```
curl -X GET "http://localhost:9000/kb/v1/hierarchies/properties?vocabulary_name=Istat-Classificazione-08-Territorio&ontology_name=clvapit&lang=it" \
-H "accept: application/json" -H "content-type: application/json"
```

which will return results similar to the following:

```
[
  {
    "vocabulary": "CLV-AP_IT",
    "path": "CLV-AP_IT.Country.name",
    "hierarchy_flat": "Country",
    "hierarchy": "hierarchy"
  },
  {
    "vocabulary": "CLV-AP_IT",
    "path": "CLV-AP_IT.City.name",
    "hierarchy_flat": "Country.Region.Province.City",
    "hierarchy": "hierarchy"
  },
  ...
]
```
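
In the POI example response, `hierarchy_flat` is a `.`-separated list of classes, and `hierarchy` holds the same classes with a `level` (`PointOfInterestCategory` at level `0`). Assuming levels are numbered from `0` following the order of `hierarchy_flat` (for example `Country.Region.Province.City`), the structured form can be derived from the flat one. This Scala sketch is hypothetical; `HierarchyLevel` and `Hierarchies.fromFlat` are not part of the project.

```scala
// Hypothetical model of one element of the "hierarchy" array: {"class": ..., "level": ...}.
final case class HierarchyLevel(className: String, level: Int)

object Hierarchies {
  // Assumption: the first class in hierarchy_flat is level 0 and each following
  // class is one level deeper.
  def fromFlat(hierarchyFlat: String): List[HierarchyLevel] =
    hierarchyFlat
      .split('.')
      .toList
      .filter(_.nonEmpty)
      .zipWithIndex
      .map { case (cls, level) => HierarchyLevel(cls, level) }
}
```

With this assumption, `Hierarchies.fromFlat("PointOfInterestCategory")` reproduces the single-element `hierarchy` of the POI example.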

### example configurations

An example configuration for working with a vocabulary (VocabularyAPI):

```
"data_dir": "./data"
"Istat-Classificazione-08-Territorio" {
  vocabulary.name: "Istat-Classificazione-08-Territorio"
  vocabulary.ontology.name: "CLV-AP_IT"
  vocabulary.ontology.prefix: "clvapit"
  vocabulary.file: ${data_dir}"/vocabularies/Istat-Classificazione-08-Territorio.ttl"
  vocabulary.contexts: [ "http://dati.gov.it/onto/clvapit#" ]
  vocabulary.query.csv: ${data_dir}"/vocabularies/Istat-Classificazione-08-Territorio#dataset.csv.sparql"
}
```

The `vocabulary.query.csv` setting is a reference to a SPARQL query designed to produce a flat representation of the vocabulary information.
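
The configuration uses HOCON syntax, so `${data_dir}"/vocabularies/..."` is a substitution concatenated with a string. The following Scala sketch parses the vocabulary example above with the Typesafe Config library (artifact `com.typesafe:config`, which Play's `Configuration` is built on) to show how the paths resolve. It is an illustration under that assumption, not the way the project itself loads its configuration; `VocabularyConfigSketch` is a hypothetical name.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Sketch only: requires com.typesafe:config on the classpath.
object VocabularyConfigSketch {
  // The vocabulary configuration from this README, embedded as a plain (non-interpolated) string.
  val hocon: String =
    """|"data_dir": "./data"
       |"Istat-Classificazione-08-Territorio" {
       |  vocabulary.name: "Istat-Classificazione-08-Territorio"
       |  vocabulary.ontology.name: "CLV-AP_IT"
       |  vocabulary.ontology.prefix: "clvapit"
       |  vocabulary.file: ${data_dir}"/vocabularies/Istat-Classificazione-08-Territorio.ttl"
       |  vocabulary.contexts: [ "http://dati.gov.it/onto/clvapit#" ]
       |  vocabulary.query.csv: ${data_dir}"/vocabularies/Istat-Classificazione-08-Territorio#dataset.csv.sparql"
       |}""".stripMargin

  // resolve() replaces ${data_dir} with the root-level value; adjacent strings are concatenated.
  val config: Config = ConfigFactory.parseString(hocon).resolve()

  // The key contains '-' characters, so it is quoted in the path expression.
  val vocabulary: Config = config.getConfig("\"Istat-Classificazione-08-Territorio\"")
}
```

Here `VocabularyConfigSketch.vocabulary.getString("vocabulary.query.csv")` resolves to `./data/vocabularies/Istat-Classificazione-08-Territorio#dataset.csv.sparql`, and dotted keys such as `vocabulary.ontology.prefix` become nested objects.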

An example configuration for working with an ontology (OntologyAPI) could be similar to the following one:

```
clvapit {
  ontology.name: "CLV-AP_IT"
  ontology.prefix: "clvapit"
  ontology.file: ${data_dir}"/ontologies/agid/CLV-AP_IT/CLV-AP_IT.ttl"
  ontology.contexts: [ "http://dati.gov.it/onto/clvapit#" ]
  ontology.query.hierarchy: ${data_dir}"/ontologies/agid/CLV-AP_IT/CLV-AP_IT.hierarchy.sparql"
}
```

The `ontology.query.hierarchy` setting is a reference to a SPARQL query designed to produce the local hierarchy of each ontology property used in a vocabulary.

* * *

**Note** that `${data_dir}` can be replaced with a specific root path on disk: at this stage of development it is a relative folder (for example `/dist/data` for the sbt project).

Eventually, pre-loading ontologies and vocabularies from disk could be replaced by importing them from a central datastore (dedicated to maintaining the latest version of each ontology), where they are already loaded under conventional paths/names. This way we will be able to switch from tiny in-memory repositories (one for each ontology/vocabulary) to a central RDF/SPARQL repository containing all the pre-loaded ontologies and vocabularies.

----

## TODO

+ more documentation / comments
+ more proper tests
+ remove the redundant RDFRepository classes by importing the external kb-core dependency instead

## known ISSUES

...
@@ -0,0 +1,58 @@
/*
 * Copyright 2017 TEAM PER LA TRASFORMAZIONE DIGITALE
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import javax.inject._

import play.api.http.DefaultHttpErrorHandler
import play.api._
import play.api.mvc._
import play.api.mvc.Results._
import play.api.routing.Router

import scala.concurrent.Future

import de.zalando.play.controllers.PlayBodyParsing

/**
  * Overrides Play's default error reporting: errors are rendered with the first non-HTML type the client accepts, defaulting to application/json.
  */
class ErrorHandler @Inject() (
  env: Environment,
  config: Configuration,
  sourceMapper: OptionalSourceMapper,
  router: Provider[Router]
) extends DefaultHttpErrorHandler(env, config, sourceMapper, router) {

  private def contentType(request: RequestHeader): String =
    request.acceptedTypes.map(_.toString).filterNot(_ == "text/html").headOption.getOrElse("application/json")

  override def onProdServerError(request: RequestHeader, exception: UsefulException): Future[Result] = {
    implicit val writer = PlayBodyParsing.anyToWritable[Throwable](contentType(request))
    Future.successful(InternalServerError(exception))
  }

  // called when a route is found, but it was not possible to bind the request parameters
  override def onBadRequest(request: RequestHeader, error: String): Future[Result] = {
    implicit val writer = PlayBodyParsing.anyToWritable[String](contentType(request))
    Future.successful(BadRequest("Bad Request: " + error))
  }

  // 404 - page not found error
  override def onNotFound(request: RequestHeader, message: String): Future[Result] = {
    implicit val writer = PlayBodyParsing.anyToWritable[String](contentType(request))
    Future.successful(NotFound(request.path))
  }
}
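
The content negotiation done by the private `contentType` method can be illustrated without Play. In this stdlib-only sketch the accepted media ranges are plain strings, ordered by client preference as `request.acceptedTypes` provides them in the handler; `ErrorContentType.pick` is a hypothetical name, not part of the project.

```scala
object ErrorContentType {
  // Mirrors ErrorHandler.contentType: the first accepted type that is not text/html,
  // or application/json when no other type is listed (including an empty Accept list).
  def pick(acceptedTypes: Seq[String]): String =
    acceptedTypes.filterNot(_ == "text/html").headOption.getOrElse("application/json")
}
```

For example, a browser-style list `Seq("text/html", "application/xml")` yields `application/xml`. Note that in this sketch a wildcard such as `*/*` is returned unchanged rather than mapped to `application/json`.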

@@ -0,0 +1,10 @@
/**
  * Created by ale on 06/06/17.
  */
import javax.inject.Inject

import play.api.http.DefaultHttpFilters
import play.filters.cors.CORSFilter

class Filters @Inject() (corsFilter: CORSFilter)
  extends DefaultHttpFilters(corsFilter)