Content-encoding SPARQL query (België) #772

coret · 2022-10-21T08:51:59Z

When searching for België in the GTAA, no results are returned, whereas searching for Belgie returns België among other results.

Testing by @wmelder showed the following:

The query for België via the construct_gtaa.rq query run via
curl -H Accept:text/turtle --data-urlencode "query@queries/construct_gtaa.rq" 'https://{username}:{password}@gtaa.apis.beeldengeluid.nl/sparql'
yields no results, but
curl -H "Content-type: application/x-www-form-urlencoded; charset=utf-8" -H Accept:text/turtle --data-urlencode "query@queries/construct_gtaa.rq" 'https://{username}:{password}@gtaa.apis.beeldengeluid.nl/sparql'
does give results!

It seems the Comunica client (Network of Terms) sends UTF-8 but doesn't include a charset in the Content-Type header, so the server apparently falls back to a non-UTF-8 default (US-ASCII or ISO-8859-1).
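A minimal sketch of why a missing charset can turn a hit into a miss. It assumes the server falls back to ISO-8859-1, which has not been confirmed for the GTAA endpoint; the snippet only shows what happens to the bytes of "België" under that assumption.

```python
from urllib.parse import quote, unquote

term = "België"

# A UTF-8 client percent-encodes ë as the two bytes C3 AB.
encoded = quote(term, encoding="utf-8")
print(encoded)  # Belgi%C3%AB

# A server that is told (or assumes) UTF-8 recovers the original term.
as_utf8 = unquote(encoded, encoding="utf-8")
print(as_utf8 == term)  # True

# ASSUMPTION: a server defaulting to ISO-8859-1 reads each byte as one character,
# so the text index is searched for a different string than the one intended.
as_latin1 = unquote(encoded, encoding="iso-8859-1")
print(as_latin1)  # BelgiÃ«
```

If the server behaves like the last case, the full-text search looks for a string that does not occur in the thesaurus, which would match the empty result.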

Should / can the charset be part of the dataset description of the GTAA within the Network of Terms (client-side solution)? Or should a default charset (utf-8) be hardcoded in the Comunica call, with the option to override it via the dataset description?

Some other searches which have problems with searching for terms with diacritics: Ampèrestraat (Adamlink) and Curaçaostraat (Gouda Tijdmachine). Haven't checked if adding a charset helps with these sources.

Some other searches which do not have a problem with searching for terms with diacritics: Eichstätt (WO2 thesaurus), Galileïsche (AAT), Henriëtte (RKDartists).


wmelder · 2022-10-21T09:00:38Z

Should / can the charset be part of the dataset description of the GTAA within the Network of Terms (client-side solution)? Or should a default charset (utf-8) be hardcoded in the Comunica call, with the option to override it via the dataset description?

Adding a hardcoded charset in the HTTP header would suffice. Otherwise, the receiving server doesn't know which encoding was used.

wmelder · 2022-10-21T09:12:48Z

https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type

wmelder · 2022-10-21T15:26:06Z

Adding a hardcoded charset in the HTTP header would suffice. Otherwise, the receiving server doesn't know which encoding was used.

On second thoughts... what if the server doesn't handle the charset properly? Or doesn't have a UTF-8 default encoding? Then it would be nice if Network of Terms could provide a charset that the server will handle properly. In those cases a dataset parameter would be necessary.
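The two options discussed so far (a hardcoded utf-8 default plus an optional per-dataset override) could be sketched as follows. This is not how Network of Terms or Comunica builds requests; `build_form_request`, `DEFAULT_CHARSET` and the example.org endpoint are hypothetical names used only for illustration, and the request is constructed but not sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical default, used when a dataset description gives no charset.
DEFAULT_CHARSET = "utf-8"


def build_form_request(endpoint: str, query: str, charset: Optional[str] = None) -> Request:
    """Build a form-encoded SPARQL POST that states its charset explicitly."""
    charset = charset or DEFAULT_CHARSET
    # Percent-encode the query using the same charset that the header announces.
    body = urlencode({"query": query}, encoding=charset).encode("ascii")
    headers = {
        "Content-Type": f"application/x-www-form-urlencoded; charset={charset}",
        "Accept": "text/turtle",
    }
    return Request(endpoint, data=body, headers=headers, method="POST")


# Default case: utf-8 is announced and used.
default_request = build_form_request("https://example.org/sparql", "België")

# Override case: a dataset that is known to need another charset.
override_request = build_form_request("https://example.org/sparql", "België", charset="iso-8859-1")
```

The point of the sketch is that the body encoding and the announced charset always come from the same value, so the server never has to guess.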

ddeboer · 2022-10-25T10:16:30Z

What is construct_gtaa.rq and where can I find it?

wmelder · 2022-10-25T11:14:04Z

@ddeboer construct_gtaa.rq is basically the gtaa.rq query, but it includes VALUES for query and datasetUri, variables that are normally filled in from within the Network of Terms. We renamed it so that we could use it as a standalone test query file. In itself not so exciting.

wmelder · 2022-10-25T11:15:31Z

Currently these are the contents of the file:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX justskos: <http://justskos.org/ns/core#>
PREFIX text: <http://jena.apache.org/text#>

CONSTRUCT {
    ?uri a skos:Concept ;
        skos:prefLabel ?prefLabel ;
        skos:altLabel ?altLabel ;
        skos:hiddenLabel ?hiddenLabel ;
        skos:scopeNote ?scopeNote ;
        skos:broader ?broader_uri ;
        skos:narrower ?narrower_uri ;
        skos:related ?related_uri .
    ?broader_uri skos:prefLabel ?broader_prefLabel .
    ?narrower_uri skos:prefLabel ?narrower_prefLabel .
    ?related_uri skos:prefLabel ?related_prefLabel .
}
WHERE {
    VALUES ?query { "zelensky" }
    VALUES ?datasetUri {
        <http://data.beeldengeluid.nl/gtaa/Persoonsnamen>
        }
    ?uri text:query (skos:prefLabel skos:altLabel skos:hiddenLabel ?query) .
    ?uri skos:inScheme ?datasetUri ;
        justskos:status ?status .
    FILTER(?status IN ('approved', 'candidate'))

    OPTIONAL {
        ?uri skos:prefLabel ?prefLabel .
        FILTER(LANG(?prefLabel) = "nl" )
    }
    OPTIONAL {
        ?uri skos:altLabel ?altLabel .
        FILTER(LANG(?altLabel) = "nl")
    }
    OPTIONAL {
        ?uri skos:hiddenLabel ?hiddenLabel .
        FILTER(LANG(?hiddenLabel) = "nl")
    }
    OPTIONAL {
        ?uri skos:scopeNote ?scopeNote .
        FILTER(LANG(?scopeNote) = "nl")
    }
    OPTIONAL {
        ?uri skos:broader ?broader_uri .
        ?broader_uri skos:prefLabel ?broader_prefLabel .
        FILTER(LANG(?broader_prefLabel) = "nl")
    }
    OPTIONAL {
        ?uri skos:narrower ?narrower_uri .
        ?narrower_uri skos:prefLabel ?narrower_prefLabel .
        FILTER(LANG(?narrower_prefLabel) = "nl")
    }
    OPTIONAL {
        ?uri skos:related ?related_uri .
        ?related_uri skos:prefLabel ?related_prefLabel .
        FILTER(LANG(?related_prefLabel) = "nl")
    }
}
LIMIT 1000

wmelder · 2022-10-25T11:16:51Z

For this issue it should be modified a bit:

    VALUES ?query { "België" }
    VALUES ?datasetUri {
        <http://data.beeldengeluid.nl/gtaa/GeografischeNamen>
        }
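For testing several terms against the same query file, the two VALUES lines could be generated rather than edited by hand. A rough sketch, assuming single-line search terms; `sparql_string_literal` and `values_block` are hypothetical helpers, not part of Network of Terms.

```python
def sparql_string_literal(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def values_block(term: str, dataset_uri: str) -> str:
    """Render the VALUES clauses of the test query for one term and one dataset."""
    return (
        f"    VALUES ?query {{ {sparql_string_literal(term)} }}\n"
        f"    VALUES ?datasetUri {{\n"
        f"        <{dataset_uri}>\n"
        f"    }}"
    )


block = values_block("België", "http://data.beeldengeluid.nl/gtaa/GeografischeNamen")
print(block)
```

The generated text still has to be written to the query file as UTF-8 for the diacritics to survive.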

ddeboer · 2022-10-25T13:06:45Z

For previous work on diacritics, see #426, netwerk-digitaal-erfgoed/network-of-terms-catalog#46 and netwerk-digitaal-erfgoed/network-of-terms-catalog#93. At least for Virtuoso sources (Adamlink), how diacritics are interpreted is out of our control.

wmelder · 2022-10-25T14:32:40Z

The SPARQL documentation states that a POST with application/sparql-query is always UTF-8. For a POST with x-www-form-urlencoded, however, it does not say so. It may be better to use the application/sparql-query variant (with unescaped UTF-8, that is).

A tip from our developers...
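A minimal sketch of the direct POST variant mentioned above: the query goes into the body as raw UTF-8 with Content-Type application/sparql-query, so no percent-encoding or charset parameter is involved. `build_direct_query_request` and the example.org endpoint are hypothetical; whether the GTAA endpoint accepts this form has not been verified here, and the request is only built, not sent.

```python
from urllib.request import Request


def build_direct_query_request(endpoint: str, query: str, accept: str = "text/turtle") -> Request:
    """Build a SPARQL 'query via POST directly' request with an unencoded UTF-8 body."""
    return Request(
        endpoint,
        data=query.encode("utf-8"),  # raw query text, not form-encoded
        headers={"Content-Type": "application/sparql-query", "Accept": accept},
        method="POST",
    )


direct_request = build_direct_query_request(
    "https://example.org/sparql",
    'SELECT * WHERE { ?s ?p "België" }',
)
```

Passing the result to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access.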

ddeboer self-assigned this Oct 25, 2022
