Duplicate image in Notebooks #13

Open
mrchristian opened this issue Jun 14, 2024 · 3 comments
Assignees
Labels
demo enhancement New feature or request

Comments

@mrchristian

mrchristian commented Jun 14, 2024

All fields

@calnfynn

This comment was marked as outdated.

@calnfynn calnfynn added the demo label Jul 8, 2024
@calnfynn
Collaborator

calnfynn commented Jul 11, 2024

Problem: Duplicate Images caused by Duplicate Date Entries

The SPARQL query returns two results for each item: one with the publication date typed as EDTF Date/Time and output exactly as it was entered (e.g. 2005), and another with the publication date typed as dateTime (e.g. 2005-01-01T00:00:00) and displayed as a human-readable date (e.g. 1 January 2005).

This is what the results look like as JSON:

{
  "head": {"vars": ["itemLabel", "itemDescr", "imgItem", "imgUrl", "publishDate"]},
  "results": {
    "bindings": [
      {
        "imgItem": {"type":"uri","value":"https://computational-publishing-service.wikibase.cloud/entity/Q211"},
        "imgUrl": {"type":"uri","value":"https://previous.bildindex.de/bilder/fmd10005860a.jpg"},
        "publishDate": {"datatype":"http://www.w3.org/2001/XMLSchema#edtf","type":"literal","value":"2018"},
        "itemLabel": {"xml:lang":"en","type":"literal","value":"Knight's Hall & Room 72 - to the east"},
        "itemDescr": {"xml:lang":"en","type":"literal","value":"Part of: Weikersheim Castle Saalbau Wolfgang Beringer, builder & Stonemason - Georg Stegle, master builder - design: Georges Robin, architect - Elias Gunzenhäuser, carpenter - Weikersheim, Marktplatz 11 - from 1595"}
      },
      {
        "imgItem": {"type":"uri","value":"https://computational-publishing-service.wikibase.cloud/entity/Q211"},
        "imgUrl": {"type":"uri","value":"https://previous.bildindex.de/bilder/fmd10005860a.jpg"},
        "publishDate": {"datatype":"http://www.w3.org/2001/XMLSchema#dateTime","type":"literal","value":"2018-01-01T00:00:00Z"},
        "itemLabel": {"xml:lang":"en","type":"literal","value":"Knight's Hall & Room 72 - to the east"},
        "itemDescr": {"xml:lang":"en","type":"literal","value":"Part of: Weikersheim Castle Saalbau Wolfgang Beringer, builder & Stonemason - Georg Stegle, master builder - design: Georges Robin, architect - Elias Gunzenhäuser, carpenter - Weikersheim, Marktplatz 11 - from 1595"}
      }
    ]
  }
}

That is problematic because they aren't recognised as the same item, causing a duplicate result. This leads to the same image being loaded twice in the Section Notebook.
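To make the duplication easier to spot in a notebook, here is a minimal Python sketch that groups the bindings by `imgItem` and lists the `publishDate` datatypes returned for each item. It assumes the response has already been parsed into a list of binding dicts shaped like the JSON above; the function name `datatypes_per_item` and the shortened sample rows are only illustrative and not part of the existing notebook code.

```python
from collections import defaultdict

EDTF = "http://www.w3.org/2001/XMLSchema#edtf"
DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
Q211 = "https://computational-publishing-service.wikibase.cloud/entity/Q211"


def datatypes_per_item(bindings):
    """Map each imgItem URI to the publishDate datatypes returned for it."""
    seen = defaultdict(list)
    for row in bindings:
        item_uri = row["imgItem"]["value"]
        seen[item_uri].append(row["publishDate"].get("datatype"))
    return dict(seen)


# Sample rows reduced to the two fields that matter here (values from the JSON above).
sample_bindings = [
    {
        "imgItem": {"type": "uri", "value": Q211},
        "publishDate": {"datatype": EDTF, "type": "literal", "value": "2018"},
    },
    {
        "imgItem": {"type": "uri", "value": Q211},
        "publishDate": {"datatype": DATETIME, "type": "literal", "value": "2018-01-01T00:00:00Z"},
    },
]

per_item = datatypes_per_item(sample_bindings)

# Items that come back more than once are the ones that get loaded twice.
duplicated = [uri for uri, types in per_item.items() if len(types) > 1]
print(duplicated)
```

Any item that shows up in `duplicated` with both an EDTF and a dateTime entry is affected by the problem described above.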

Possible Solution 1: Changes to the SPARQL query

PREFIX cps: <https://computational-publishing-service.wikibase.cloud/entity/>
PREFIX cpss: <https://computational-publishing-service.wikibase.cloud/entity/statement/>
PREFIX cpsv: <https://computational-publishing-service.wikibase.cloud/value/>
PREFIX cpspt: <https://computational-publishing-service.wikibase.cloud/prop/direct/>
PREFIX cpsp: <https://computational-publishing-service.wikibase.cloud/prop/>
PREFIX cpsps: <https://computational-publishing-service.wikibase.cloud/prop/statement/>
PREFIX cpspq: <https://computational-publishing-service.wikibase.cloud/prop/qualifier/>

SELECT DISTINCT ?itemLabel ?itemDescr ?imgItem ?imgUrl ?publishDate
WHERE
{
  ?imgItem cpsp:P107 ?urlStatement. 
  ?urlStatement cpsps:P107 ?imgUrl. 
  ?imgItem cpsp:P60 ?dateStatement. 
  ?dateStatement cpsps:P60 ?publishDate. 
  ?imgItem cpsp:P6 ?partOfStatement.
  ?partOfStatement cpsps:P6 ?partOfItem.
  <placeholder>
  
  # Keep only the EDTF-typed date; this drops the duplicate xsd:dateTime row.
  FILTER (datatype(?publishDate) = xsd:edtf)

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en,de".
    ?imgItem rdfs:label ?itemLabel.
    ?imgItem schema:description ?itemDescr.
  }
}

The FILTER (datatype(?publishDate) = xsd:edtf) clause removes every result whose date is not typed as EDTF. Only one datatype remains, so each item is passed to the notebook exactly once.

+ : That still leaves the full functionality of EDTF.
– : Every new query will need this alteration.
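For comparison, the same filter could in principle be applied to the parsed JSON on the notebook side instead of inside each query. This is only a sketch under assumptions: the helper `keep_edtf_only` is hypothetical, it is not how the notebooks currently work, and it expects bindings shaped like the JSON example above.

```python
EDTF = "http://www.w3.org/2001/XMLSchema#edtf"
DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def keep_edtf_only(bindings, var="publishDate"):
    """Return only the rows whose `var` literal is typed as EDTF.

    Mirrors FILTER (datatype(?publishDate) = xsd:edtf) on the client side.
    Rows where `var` is missing or has no datatype are dropped as well.
    """
    return [
        row for row in bindings
        if row.get(var, {}).get("datatype") == EDTF
    ]


# Two rows for the same item, differing only in the datatype of publishDate.
rows = [
    {"imgUrl": {"type": "uri", "value": "https://previous.bildindex.de/bilder/fmd10005860a.jpg"},
     "publishDate": {"datatype": EDTF, "type": "literal", "value": "2018"}},
    {"imgUrl": {"type": "uri", "value": "https://previous.bildindex.de/bilder/fmd10005860a.jpg"},
     "publishDate": {"datatype": DATETIME, "type": "literal", "value": "2018-01-01T00:00:00Z"}},
]

filtered = keep_edtf_only(rows)
print(len(rows), "->", len(filtered))
```

The trade-off is the same as with the query change, just moved: the queries stay untouched, but every notebook that consumes the results would need to call the helper.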

Possible Solution 2: Changes in Wikibase?

Apparently the duplication issue is a consequence of SPARQL being used to display EDTF results (see here on the WikibaseEDTF Repo).

So it could also be fixed by changing the datatype of ?publishDate to "Point In Time", which doesn't have the problem.

But since PIT is used only for exact dates, that would mean losing the possibility to enter fuzzy dates or time intervals. That's no problem for the current unambiguous data, but it might be for another data set. I'm not sure if this is a viable solution.

I also don't know whether there is something else that could be changed in Wikibase to prevent the duplication from happening.

+ : No query alterations needed.
– : Loss of functionality.

@baillyk
Collaborator

baillyk commented Jul 11, 2024

classical case of "it's not a bug, it's a feature" :)

it's a hard decision to make as we don't know the future requirements yet, but i offer the following suggestion:
I would suggest removing the EDTF datatype and switching to PIT for the following reasons:

  • makes the sparql queries simpler (helps to reduce complexity right now)
  • intervals can be defined also with the PIT type (by using two dates like startdate/enddate instead of one edtf)
  • EDTF is currently not fully supported by semantic wikibase, which also leads to problems when visualizing the data in semantic mediawiki queries if an edtf value holds complex content like intervals or fuzzy dates.


3 participants