Content-encoding SPARQL query (België) #772
Comments
Adding a hardcoded charset in the HTTP header would suffice. Otherwise, the receiving server doesn't know which encoding was sent. |
On second thought... what if the server doesn't handle the charset properly? Or doesn't use UTF-8 as its default encoding? Then it would be nice if the Network of Terms could provide a charset that the server does handle properly. In those cases a dataset parameter would be necessary. |
What is |
@ddeboer construct_gtaa.rq is basically the gtaa.rq query, but it may include VALUES for query and datasetUri, variables that are filled in from within the Network of Terms. We renamed it to be able to use it as a test query file. In itself not so exciting. |
Currently these are the contents of the file:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX justskos: <http://justskos.org/ns/core#>
PREFIX text: <http://jena.apache.org/text#>
CONSTRUCT {
?uri a skos:Concept ;
skos:prefLabel ?prefLabel ;
skos:altLabel ?altLabel ;
skos:hiddenLabel ?hiddenLabel ;
skos:scopeNote ?scopeNote ;
skos:broader ?broader_uri ;
skos:narrower ?narrower_uri ;
skos:related ?related_uri .
?broader_uri skos:prefLabel ?broader_prefLabel .
?narrower_uri skos:prefLabel ?narrower_prefLabel .
?related_uri skos:prefLabel ?related_prefLabel .
}
WHERE {
VALUES ?query { "zelensky" }
VALUES ?datasetUri {
<http://data.beeldengeluid.nl/gtaa/Persoonsnamen>
}
?uri text:query (skos:prefLabel skos:altLabel skos:hiddenLabel ?query) .
?uri skos:inScheme ?datasetUri ;
justskos:status ?status .
FILTER(?status IN ('approved', 'candidate'))
OPTIONAL {
?uri skos:prefLabel ?prefLabel .
FILTER(LANG(?prefLabel) = "nl" )
}
OPTIONAL {
?uri skos:altLabel ?altLabel .
FILTER(LANG(?altLabel) = "nl")
}
OPTIONAL {
?uri skos:hiddenLabel ?hiddenLabel .
FILTER(LANG(?hiddenLabel) = "nl")
}
OPTIONAL {
?uri skos:scopeNote ?scopeNote .
FILTER(LANG(?scopeNote) = "nl")
}
OPTIONAL {
?uri skos:broader ?broader_uri .
?broader_uri skos:prefLabel ?broader_prefLabel .
FILTER(LANG(?broader_prefLabel) = "nl")
}
OPTIONAL {
?uri skos:narrower ?narrower_uri .
?narrower_uri skos:prefLabel ?narrower_prefLabel .
FILTER(LANG(?narrower_prefLabel) = "nl")
}
OPTIONAL {
?uri skos:related ?related_uri .
?related_uri skos:prefLabel ?related_prefLabel .
FILTER(LANG(?related_prefLabel) = "nl")
}
}
LIMIT 1000 |
For this issue it should be modified a bit:
VALUES ?query { "België" }
VALUES ?datasetUri {
<http://data.beeldengeluid.nl/gtaa/GeografischeNamen>
} |
For previous work on diacritics, see #426, netwerk-digitaal-erfgoed/network-of-terms-catalog#46 and netwerk-digitaal-erfgoed/network-of-terms-catalog#93. At least for Virtuoso sources (Adamlink), how diacritics are interpreted is out of our control. |
Tip from our developers... |
When searching for België in the GTAA, no results are returned, whilst searching for Belgie returns België among other results.
Testing by @wmelder showed the following:
The query for België via the construct_gtaa.rq query run via
curl -H Accept:text/turtle --data-urlencode "query@queries/construct_gtaa.rq" 'https://{username}:{password}@gtaa.apis.beeldengeluid.nl/sparql'
yields no results, but
curl -H "Content-type: application/x-www-form-urlencoded; charset=utf-8" -H Accept:text/turtle --data-urlencode "query@queries/construct_gtaa.rq" 'https://{username}:{password}@gtaa.apis.beeldengeluid.nl/sparql'
does give results!
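It seems the Comunica client (Network of Terms) sends UTF-8 but doesn't include a charset in the Content-Type header, so the server presumably falls back to a default encoding such as US-ASCII or ISO-8859-1 (these are not the same encoding, and we haven't confirmed which one is used).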
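To illustrate why a missing charset could lead to zero results, here is a minimal Python sketch of the suspected mismatch. It assumes the client percent-encodes the term as UTF-8 and that the server decodes it as ISO-8859-1 when no charset is given; the actual server-side fallback has not been verified.

```python
from urllib.parse import quote, unquote

term = "België"

# What a UTF-8 client puts on the wire in a form-urlencoded body.
encoded = quote(term, encoding="utf-8")

# A server that honours "charset=utf-8" recovers the original term.
decoded_utf8 = unquote(encoded, encoding="utf-8")

# ASSUMPTION: without a charset the server falls back to ISO-8859-1,
# so the two UTF-8 bytes of "ë" are read as two separate characters.
decoded_latin1 = unquote(encoded, encoding="iso-8859-1")

print(encoded)
print(decoded_utf8 == term)    # the term survives the round trip
print(decoded_latin1 == term)  # the term is garbled, so the text index finds nothing
```

If this assumption holds, the text index is searched for a garbled string rather than for België, which would explain the empty result.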
Should (or can) the charset be part of the dataset description of the GTAA within the Network of Terms (a client-side solution)? Or should a default charset (utf-8) be hardcoded in the Comunica call, with the option to override it via the dataset description?
Some other searches that have problems with terms containing diacritics: Ampèrestraat (Adamlink) and Curaçaostraat (Gouda Tijdmachine). I haven't checked whether adding a charset helps with these sources.
Some other searches that do not have a problem with terms containing diacritics: Eichstätt (WO2 thesaurus), Galileïsche (AAT), Henriëtte (RKDartists)
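The second option (a hardcoded default with a per-dataset override) could look roughly like the sketch below. It is written in Python for illustration only and is not the Comunica or Network of Terms API; `build_sparql_post`, `DEFAULT_CHARSET` and the `charset` key in the dataset description are hypothetical names.

```python
from urllib.parse import urlencode

# Hypothetical default, used when a dataset does not specify its own charset.
DEFAULT_CHARSET = "utf-8"


def build_sparql_post(query, dataset_config=None):
    """Return (headers, body) for a form-encoded SPARQL POST request.

    `dataset_config` stands in for a dataset description; its optional
    "charset" key (an assumption, not an existing setting) overrides the default.
    """
    charset = (dataset_config or {}).get("charset", DEFAULT_CHARSET)
    # Percent-encode the query using the same charset that is announced below.
    body = urlencode({"query": query}, encoding=charset).encode("ascii")
    headers = {
        "Content-Type": f"application/x-www-form-urlencoded; charset={charset}",
        "Accept": "text/turtle",
    }
    return headers, body


# Default: UTF-8 is both used and announced.
headers, body = build_sparql_post('VALUES ?query { "België" }')

# Override for a (hypothetical) source that only handles ISO-8859-1.
legacy_headers, legacy_body = build_sparql_post(
    'VALUES ?query { "België" }', {"charset": "iso-8859-1"}
)
```

The point of the design is that the encoding used for the body and the charset announced in the header always come from the same value, so they cannot drift apart.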