Move/add datacatalog to Dataset Register #795

Open
coret opened this issue Nov 7, 2022 · 5 comments

coret commented Nov 7, 2022

All thesauri/vocabularies should be findable in the Dataset Register, including information about a landing page, contact point and license.

ddeboer commented Nov 24, 2022

@coret Should we produce these dataset descriptions ourselves (adding them to dataset-register-entries), or do you plan to approach (some of) the publishers to produce the descriptions themselves?

coret commented Nov 25, 2022

@ddeboer I think we should start by producing the dataset descriptions ourselves.

For some, this means only adding a few attributes and then registering the GitHub raw version (within the Network of Terms repo, or maybe moved to https://github.com/netwerk-digitaal-erfgoed/dataset-register-entries) with the Dataset Register.

Some of the datasets (like the Goudse straten) are already in the Dataset Register. These have been provided by the source, so these sources should be informed of the specific schema:potentialAction requirements, and the descriptions should be changed at the source.

For all Dutch term sources, we want the source to provide the dataset description itself (something for the Dataset Register support team :-), but this will take some time...

How will the Network of Terms look up term sources in the Dataset Register? Via a list of dataset IDs (maintained within the Network of Terms), or by looking (read: SPARQL-ing) for a specific property (like schema:potentialAction or nde:networkOfTerms = true)? Do we need to update the dataset requirements for this? And finally, properties like schema:potentialAction (mapped to a DCAT property) aren't yet stored in our triplestore.
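To make the two lookup options concrete, here is a minimal sketch that builds a SPARQL query for each. It assumes the Register exposes datasets as dcat:Dataset with a dct:title; the flag IRI is a placeholder, because nde:networkOfTerms is only a suggestion in this thread and is not an existing property.

```python
# Sketch only: the flag IRI below is a placeholder, not an existing property.
NDE_FLAG = "https://example.org/nde#networkOfTerms"

PREFIXES = (
    "PREFIX dcat: <http://www.w3.org/ns/dcat#>\n"
    "PREFIX dct: <http://purl.org/dc/terms/>\n"
)


def query_by_ids(dataset_iris: list[str]) -> str:
    """Option 1: the Network of Terms maintains the list of dataset IRIs."""
    values = " ".join(f"<{iri}>" for iri in dataset_iris)
    return (
        PREFIXES
        + "SELECT ?dataset ?title WHERE {\n"
        + f"  VALUES ?dataset {{ {values} }}\n"
        + "  ?dataset a dcat:Dataset ;\n"
        + "           dct:title ?title .\n"
        + "}"
    )


def query_by_flag(flag_iri: str = NDE_FLAG) -> str:
    """Option 2: select every dataset that carries a marker property."""
    return (
        PREFIXES
        + "SELECT ?dataset ?title WHERE {\n"
        + "  ?dataset a dcat:Dataset ;\n"
        + f"           <{flag_iri}> true ;\n"
        + "           dct:title ?title .\n"
        + "}"
    )


if __name__ == "__main__":
    # Hypothetical dataset IRI, for illustration only.
    print(query_by_ids(["https://example.org/dataset/1"]))
    print(query_by_flag())
```

Option 1 needs no change to the dataset requirements but has to be kept in sync by hand. Option 2 only works once the marker property is part of the requirements and stored in the triplestore.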

ddeboer commented Nov 26, 2022

> How will the Network of Terms look up term sources in the Dataset Register? Via a list of dataset IDs (maintained within the Network of Terms), or by looking (read: SPARQL-ing) for a specific property (like schema:potentialAction or nde:networkOfTerms = true)? Do we need to update the dataset requirements for this? And finally, properties like schema:potentialAction (mapped to a DCAT property) aren't yet stored in our triplestore.

I suggest keeping schema:potentialAction, the list of dataset URIs and the actual SPARQL queries in the Network of Terms (for now). We can do a federated query over that stripped-down Network of Terms catalog plus the dataset descriptions from the sources themselves, if necessary via dataset-register-entries.

ddeboer commented Jun 26, 2023

> Some of the datasets (like the Goudse straten) are already in the Dataset Register.

@coret Can you point me to this dataset in the Dataset Register? I can’t seem to find it. It’s a good test case for this issue.

Found it, but it doesn’t have https://www.goudatijdmachine.nl/sparql/repositories/gtm as its distribution, which is used by the NoT.

Should we change this dataset’s URI in the NoT catalog to https://www.goudatijdmachine.nl/data/api/items/37818?

We have some additional complexity because the Register uses DCAT, whereas the NoT uses Schema.org.

Also keep in mind that the NoT may want to add/override descriptions for its users. If both the NoT and the Dataset Register have a description, the NoT’s one should win.
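A minimal sketch of that precedence rule, assuming both descriptions are available as flat property/value dicts. The DCAT to Schema.org mapping is an illustrative subset and the function names are hypothetical; none of this is the NoT's actual implementation.

```python
# Illustrative subset of a DCAT -> Schema.org property mapping (an assumption).
DCAT_TO_SCHEMA = {
    "dct:title": "schema:name",
    "dct:description": "schema:description",
    "dct:license": "schema:license",
}


def to_schema(register_description: dict) -> dict:
    """Rename known DCAT keys to Schema.org keys; leave unknown keys as they are."""
    return {DCAT_TO_SCHEMA.get(key, key): value
            for key, value in register_description.items()}


def merge_descriptions(register_description: dict, not_description: dict) -> dict:
    """Start from the Register's description, then let non-empty NoT values win."""
    merged = to_schema(register_description)
    merged.update({key: value for key, value in not_description.items()
                   if value not in (None, "")})
    return merged


if __name__ == "__main__":
    register = {"dct:title": "Goudse straten", "dct:description": "From the source"}
    override = {"schema:description": "Street names in Gouda", "schema:name": ""}
    print(merge_descriptions(register, override))
```

Empty NoT values are skipped, so an override that leaves a field blank does not erase what the source provided.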

ddeboer commented Jun 27, 2023

We cannot use the SPARQL SERVICE keyword because GraphDB lacks a SPARQL service description, yielding:

Error: Could not retrieve https://triplestore.netwerkdigitaalerfgoed.nl/repositories/registry (HTTP status 400):
Missing parameter: query

I’m now experimenting with a Comunica-based federated query, but that is very slow. Perhaps because of:

> Federated query execution does not just send the query to each source separately. Instead, the triples from all sources are considered one large virtual dataset, which can then be queried over.
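To illustrate why that distinction matters here, a toy sketch with in-memory triples standing in for the two sources. The data and prefixed names are made up for the example, and this is not how Comunica is implemented. A join whose patterns are split across the sources only produces results when it is evaluated over the union.

```python
from typing import Iterable

Triple = tuple[str, str, str]

# Hypothetical in-memory "sources"; the real ones would be SPARQL endpoints.
NOT_CATALOG: list[Triple] = [
    ("ex:dataset1", "schema:potentialAction", "ex:searchAction"),
]
REGISTER: list[Triple] = [
    ("ex:dataset1", "dct:title", "Goudse straten"),
]


def titles_of_term_sources(triples: Iterable[Triple]) -> list[str]:
    """Join two patterns: ?d schema:potentialAction ?a . ?d dct:title ?t ."""
    triples = list(triples)
    with_action = {s for s, p, _ in triples if p == "schema:potentialAction"}
    return [o for s, p, o in triples if p == "dct:title" and s in with_action]


# Sending the whole query to each source separately: neither source alone
# matches both patterns, so the combined result is empty.
per_source = titles_of_term_sources(NOT_CATALOG) + titles_of_term_sources(REGISTER)

# Treating all triples as one virtual dataset: the join succeeds.
virtual = titles_of_term_sources(NOT_CATALOG + REGISTER)

if __name__ == "__main__":
    print(per_source, virtual)
```

The union approach gives the right answer for cross-source joins, but the engine may need to pull matching triples for each pattern from every source, which is one plausible reason for the slowness.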
