Add ISNI #959

Open
coret opened this issue Apr 25, 2023 · 8 comments

@coret
Contributor

coret commented Apr 25, 2023

ISNI (International Standard Name Identifier) is an ISO standard, in use by numerous libraries, publishers, databases, and rights management organizations around the world. It is used to uniquely identify persons and organizations involved in creative activities, as well as public personas of both, such as pseudonyms, stage names, record labels or publishing imprints.

ISNI isn't available via a SPARQL-endpoint, but the ISNI organizations and ISNI persons are available for download (under a CC0 license) in an RDF format via https://isni.org/page/linked-data/. The files are updated every 6 months. As such, the NDE could make an ISNI SPARQL-endpoint available, as was done for Geonames NL/BE (and as this is RDF already, no transformations are needed).

When this ISNI SPARQL-endpoint is available, the ISNI can be added to the Network of Terms by making the appropriate configs and CONSTRUCT queries.

@rschalkrce
Contributor

Good suggestion! Can NDE host the triples?

@coret
Contributor Author

coret commented Apr 25, 2023

In a technical sense: yes (just like https://demo.netwerkdigitaalerfgoed.nl/geonames/sparql for Geonames NL/BE).
In an organisational sense: @EnnoMeijers ?

@EnnoMeijers
Contributor

I suggest publishing the ISNI data through the SPARQL endpoint in a silent manner, as we do with the Geonames endpoint. The main purpose is to support search queries by the NoT, not to become the ISNI SPARQL endpoint for the rest of the world. By just putting it up and not advertising it, we will be OK, I think. If we do get too much attention, we will talk to OCLC about running a SPARQL endpoint through their infrastructure.

@rschalkrce
Contributor

@ddeboer maybe we can use this case to show me how to add a term list?

@ddeboer
Member

ddeboer commented Apr 25, 2023

@rschalkrce Sure! Please note that setting up a SPARQL endpoint is not a regular requirement: usually that should be provided by the dataset publisher.

@rschalkrce
Contributor

rschalkrce commented May 1, 2023 via email

@coret
Contributor Author

coret commented May 1, 2023

The provided ISNI RDF/XML and JSON-LD cannot be directly loaded into a triplestore, as @EnnoMeijers found out.

Processing the files with Apache Jena's riot produces a lot of errors. As a workaround, the data can be converted to N-Triples with rapper:


$ rapper -i rdfxml ISNI_organizations.rdf -o ntriples > ISNI_organizations.nt
rapper: Parsing URI file:///home/http/netwerk-digitaal-erfgoed/isni/ISNI_organizations.rdf with parser rdfxml
rapper: Serializing with serializer ntriples
rapper: Warning - URI file:///home/http/netwerk-digitaal-erfgoed/isni/ISNI_organizations.rdf:2 - Using node element 'catalog' without a namespace is forbidden.
rapper: Parsing returned 25.655.560 triples (2.9Gb)

$ rapper -i rdfxml ISNI_persons.rdf -o ntriples > ISNI_persons.nt
rapper: Parsing URI file:///home/http/netwerk-digitaal-erfgoed/isni/ISNI_persons.rdf with parser rdfxml
rapper: Serializing with serializer ntriples
rapper: Warning - URI file:///home/http/netwerk-digitaal-erfgoed/isni/ISNI_persons.rdf:2 - Using node element 'catalog' without a namespace is forbidden.
rapper: Parsing returned 205.547.991 triples (21.9Gb)

Validating the resulting ISNI_persons.nt file shows the following:

$ riot --validate ISNI_persons.nt
18:36:32 WARN  riot            :: [line: 2060478, col: 1 ] Bad IRI: <https://isni.org/isni/0000 0000 3355 254X> Spaces are not legal in URIs/IRIs.
18:36:32 WARN  riot            :: [line: 2060479, col: 1 ] Bad IRI: <https://isni.org/isni/0000 0000 3355 254X> Spaces are not legal in URIs/IRIs.
18:36:32 WARN  riot            :: [line: 2060480, col: 1 ] Bad IRI: <https://isni.org/isni/0000 0000 3355 254X> Spaces are not legal in URIs/IRIs.
18:36:33 WARN  riot            :: [line: 2861764, col: 1 ] Bad IRI: <https://isni.org/isni/0000 0000 8382 8137> Spaces are not legal in URIs/IRIs.
18:36:33 WARN  riot            :: [line: 2861765, col: 1 ] Bad IRI: <https://isni.org/isni/0000 0000 8382 8137> Spaces are not legal in URIs/IRIs.
18:36:33 WARN  riot            :: [line: 2861766, col: 1 ] Bad IRI: <https://isni.org/isni/0000 0000 8382 8137> Spaces are not legal in URIs/IRIs.
18:36:36 WARN  riot            :: [line: 3994194, col: 1 ] Bad IRI: <https://isni.org/isni/0000 0000 3590 8896> Spaces are not legal in URIs/IRIs.
18:36:36 WARN  riot            :: [line: 3994195, col: 1 ] Bad IRI: <https://isni.org/isni/0000 0000 3590 8896> Spaces are not legal in URIs/IRIs.
18:36:36 WARN  riot            :: [line: 3994196, col: 1 ] Bad IRI: <https://isni.org/isni/0000 0000 3590 8896> Spaces are not legal in URIs/IRIs.
.....
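To gauge how many identifiers are affected before repairing anything, the distinct malformed IRIs can be listed straight from the N-Triples file. The following is a minimal sketch, not something that was run on the ISNI dumps: `list_bad_isni_iris` is a hypothetical helper name, the `http://example.org/p` predicate in the sample is a placeholder, and the pattern assumes the IRIs look like the ones in the riot warnings above (a literal space, or an escaped `\u0020`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct ISNI IRI that contains a literal
# space or an escaped \u0020. Reads the given files, or stdin if none are given.
list_bad_isni_iris() {
  grep -oE '<https://isni\.org/isni/[^>]*( |\\u0020)[^>]*>' "$@" | sort -u
}

# Small demo on sample triples (the predicate and literals are placeholders).
sample='<https://isni.org/isni/0000 0000 3355 254X> <http://example.org/p> "a" .
<https://isni.org/isni/0000 0000 3355 254X> <http://example.org/p> "b" .
<https://isni.org/isni/0000000395807810> <http://example.org/p> "c" .'
printf '%s\n' "$sample" | list_bad_isni_iris
```

On the real data this would be called as `list_bad_isni_iris ISNI_persons.nt | wc -l` to count the affected identifiers.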

These problems can be countered with the following two commands:

$ sed -i 's/\(\/isni\/[0-9]\{4\}\) \([0-9]\{4\}\) \([0-9]\{4\}\) \([0-9X]\{4\}\)/\1\2\3\4/g' ISNI_persons.nt
$ sed -i 's/\(\/isni\/[0-9]\{4\}\)\\u0020\([0-9]\{4\}\)\\u0020\([0-9]\{4\}\)\\u0020\([0-9X]\{4\}\)/\1\2\3\4/g' ISNI_persons.nt
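The same two substitutions can also be applied as a stream filter, which avoids rewriting a 21.9Gb file in place and makes the fix easy to try on a single line first. This is a sketch under assumptions: `fix_isni_iris` is a hypothetical name, the expressions are the two above rewritten for `sed -E` with `|` as delimiter, and the `http://example.org/p` predicate is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the separators inside ISNI IRIs, for both the
# literal-space form and the escaped \u0020 form. Reads stdin, writes stdout.
fix_isni_iris() {
  sed -E \
    -e 's|(/isni/[0-9]{4}) ([0-9]{4}) ([0-9]{4}) ([0-9X]{4})|\1\2\3\4|g' \
    -e 's|(/isni/[0-9]{4})\\u0020([0-9]{4})\\u0020([0-9]{4})\\u0020([0-9X]{4})|\1\2\3\4|g'
}

# Demo on one sample triple per form; both lines come out with the same IRI.
printf '%s\n' \
  '<https://isni.org/isni/0000 0000 3355 254X> <http://example.org/p> "x" .' \
  '<https://isni.org/isni/0000\u00200000\u00203355\u0020254X> <http://example.org/p> "x" .' \
  | fix_isni_iris
```

Assuming rapper writes its triples to stdout and its messages to stderr, the filter could sit directly in the conversion pipeline, for example `rapper -i rdfxml ISNI_persons.rdf -o ntriples | fix_isni_iris > ISNI_persons.nt`.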

But a check of a sample of the records showed another issue: the RDF seems incomplete!
This can also be seen on https://isni.org/isni/0000000395807810 : click one of the two download links and you can observe in the data that one name is missing ("Coret, Wilhelmus Johannes Hendricus").

Finally, the provenance of the data is weak: the source is given only as the literal "NTA". You'd expect a link to http://data.bibliotheken.nl/id/thes/p321693566 (for the example https://isni.org/isni/0000000395807810).

Before continuing with adding ISNI to the Network of Terms, I'll contact ISNI about these issues.

@coret
Contributor Author

coret commented May 1, 2023

Note: install the latest version of Apache Jena to get rid of a lot of errors during validation/conversion via riot!
