diff --git a/03-discovery-specification.bs b/03-discovery-specification.bs index 49754e7..5b41bd7 100644 --- a/03-discovery-specification.bs +++ b/03-discovery-specification.bs @@ -11,18 +11,18 @@ Mailing List: public-treecg@w3.org Mailing List Archives: https://lists.w3.org/Archives/Public/public-treecg/ Editor: Pieter Colpaert, https://pietercolpaert.be Abstract: - This specification defines how a client can find specific search trees of interest, as well as list the context information. + This specification defines how a client selects a specific dataset and search tree, as well as extracts relevant context information. -# The overview # {#overview} +# Definitions # {#overview} -A tree:Collection is a subclass of dcat:Dataset ([[!vocab-dcat-3]]). +A `tree:Collection` is a subclass of `dcat:Dataset` ([[!vocab-dcat-3]]). The specialization being that this particular dataset is a collection of _members_. -A tree:SearchTree is a subClassOf dcat:Distribution. +A `tree:SearchTree` is a subClassOf `dcat:Distribution`. The specialization being that it uses the main TREE specification to publish a search tree. -A node from which all other nodes can be found is a `tree:RootNode`, which MAY be explicitely typed as such. +A node from which all other nodes can be found is a `tree:RootNode`. Note: The `tree:SearchTree` and the `tree:RootNode` MAY be identified by the same IRI when no disambiguation is needed. @@ -30,70 +30,95 @@ A TREE client MUST be provided with a URL to start from, which we call the _entr # Initializing a client with a url # {#starting-from} -The goal of the client is to understand what `tree:Collection` it is using, and to find a `tree:RootNode` or search form to start the traversal phase from. +The goal of the client is to understand what `tree:Collection` it is using, and to find a `tree:RootNode` to start the traversal phase from. +This discovery specification extends the initialization step in the TREE specification for the cases in which multiple options are possible. -``` -IN: E: a URL of the entrypoint -OUT: N: tree:RootNode IRI and/or S: search form - ``` +The client MUST dereference the URL, which will result in a set of quads. The client now MUST first perform the init step from the main specification. +If that did not return any result, then the client MUST check whether the URL before redirects (`E`) has been used in one of the following discovery patterns described in the subsections: + 1. `E` is a `tree:Collection`: then the client needs to [select the right search tree](#tree-search-trees) + 2. `E` is a `dcat:Dataset`: then the client needs to [select the right distribution or dataservice from a catalog](#dcat-dataset) + 3. `E` is a `ldes:EventStream`: then the client MAY take into account [LDES specific properties](#ldes) + 4. `E` is a `dcat:Distribution`: then the client needs to [process it accordingly](#dcat-distribution) + 5 `E` is a `dcat:DataService`: then the client needs to [process it accordingly](#dcat-dataservice) + 6. `E` is a catalog or is not explicitly mentioned: then it needs to select a dataset based on [shape information](#tree-collection-shapes) and [DCAT Catalog information](#dcat-catalog) -The client MUST dereference the URL, which will result in a set of quads. -When the URL given to the TREE client, after all redirects, is used in a triple ex:C tree:view <> ., a client MUST assume the URL after redirects (`E'`) is an identifier of the intended `tree:RootNode` of the collection `ex:C`. -The client MUST check for this `tree:view` property and return the result of the discovery algorithm with `<> → N`. +## Selecting a collection via shapes ## {#tree-collection-shapes} -If there is no such triple, then the client MUST check whether the URL before redirects (`E`) has been used in one of the following patterns: - * `E tree:view ?N.` where there’s exactly one `?N`, then the algorithm MUST return `?N → N`. - * `E tree:rootNode ?N ; tree:search ?S .` then the algorithm MUST return `?N → N` and `?S → S`. - * `?DS dcat:servesDataset E ; dcat:endpointURL ?U` or `E dcat:endpointURL ?U`, then the algorithm MUST repeat the algorithm with `?U` as the entrypoint. +When multiple collections are found by a client, it can choose to prune the collections based on the `tree:shape` property. +The `tree:shape` property will refer to a first `sh:NodeShape`. +The collection MAY be pruned in case there is no overlap in properties the client needs. -Note: When data about the dataset, data service or search tree is found, it is a good idea to also pass this on to the client. +Issue: Will we document the precise algorithm to use? Should we extend shapes with cardinality approximations as well? -## tree:Collection ## {#collection} +## Selecting a collection via a catalog ## {#dcat-catalog} -In order to prioritize a specific view link, the relations and search forms in the entry nodes can be studied for their relation types, path or remaining items. -The class tree:ViewDescription indicates a specific TREE structure on a tree:Collection. -Through the property tree:viewDescription a tree:Node can link to an entity that describes the view, and can be reused in data portals as the dcat:DataService. +A DCAT Catalog is an overview of datasets, data services and distributions. +As TREE clients first need to select a dataset, and then a search tree to use, it aligns wll with how DCAT-AP works. +DCAT discovery extends upon the previous section in which a collection or dataset can be selected based on the `tree:shape` property. -
- ```turtle - ## What can be found in a tree:Node - ex:N1 a tree:Node ; - tree:viewDescription ex:View1 . - - ex:C1 a tree:Collection ; - tree:view ex:N1 . - - ## What can be found on a data portal - ex:C1 a dcat:Dataset . - ex:View1 a tree:ViewDescription, dcat:DataService ; - dcat:endpointURL ex:N1 ; # The entry point that can be advertised in a data portal - dcat:servesDataset ex:C1 . - ``` -
+For now, we will assume the DCAT information is available in subject pages. + +Issue: Do we need more text on how to handle different types of DCAT interfaces? + +The dataset descriptions can be used for filtering the datasets available in a catalog to a list of datasets that can be useful for the client. +Such properties may include the spatial extent, the time extent, or how it is possibly a part of another `dcat:Dataset`. + +Issue: How precise do we need to be in this specification? + +When the `dcat:Dataset` is a `tree:Collection`, the DCAT catalog is going to contain a `dct:type` property with `https://w3id.org/tree#Collection` or `https://w3id.org/ldes#EventStream` as the object. + +## Choosing from multiple SearchTrees with TREE ## {#tree-search-trees} + +Issue: This is yet to be done + +## Selecting a search tree via a DCAT dataset ## {#dcat-dataset} + +The are two ways in which you can find a search tree from a dataset: via the distributions and via the data services. Both need to be tested. +Selecting a distribution or data service when multiple are available needs to be done based on [the search tree description](tree-search-trees). +If nothing is available, all need to be tested by processing them as exemplifie din the next subsections. -When there is no tree:viewDescription property in this page, a client either already discovered the description of this view in an earlier tree:Node, either the current tree:Node is implicitly the ViewDescription. Therefore, when the property path tree:view → tree:viewDescription does not yield a result, the view properties MUST be extracted from the object of the tree:view triple. -A tree:Node can also be double typed as the tree:ViewDescription. A client must thus check for ViewDescriptions on both the current node without the tree:viewDescription qualification, as on the current node with the tree:viewDescription link. +### Selecting a search tree via DCAT Distribution ### {#dcat-distribution} -## dcat:Catalog ## {#collection} +`E dcat:distribution ?D . ?D dcat:downloadURL ?N .` then ?N is a rootnode of E. -When multiple collections are found by a client, it can choose to prune the collections based on the tree:shape property. -Therefore a data publisher SHOULD annotate a tree:Collection instance with a SHACL shape. -The tree:shape points to a SHACL description of the shape (sh:NodeShape). +Issue: This is yet to be done -Note: the shape can be a blank node, or a named node on which you should follow your nose when it is defined at a different HTTP URL. +### Selecting a search tree from a DCAT data service ### {#dcat-dataservice} -# Context data # {#context} + * `?DS dcat:servesDataset E ; dcat:endpointURL ?U` or `E dcat:endpointURL ?U`, then the algorithm MUST repeat the algorithm with `?U` as the entrypoint. + +Issue: This is yet to be done + +## Linked Data Event Streams ## {#ldes} + +In case the client is not made for query answering, but only for setting up a replication and synchronization system, then there is a special type that can be used to indicate the search tree is made for this purpose: the `ldes:EventSource`. +Clients that want to prioritize taking a _full_ copy MAY give full priority to this server hint. + +
+```turtle +E a ldes:EventSource ; + tree:rootNode|dcat:downloadURL . +``` +
+ +# Extracting content information # {#context} -Context information is important to understand who the creator of a certain dataset is, when it was last changed, what other datasets it was derived from, etc. +Issue: This is yet to be done -TODO +Context information enabled a cliento understand who the creator of a certain dataset is, when it was last changed, what other datasets it was derived from, etc. ## DCAT and dcterms ## {#context-dcat} +Issue: This is yet to be done + ## Provenance ## {#context-prov} +Issue: This is yet to be done + ## Linked Data Event Streams ## {#context-ldes} +Issue: This is yet to be done + LDES (https://w3id.org/ldes/specification) is a way to evolve search trees in a consistent way. It defines every member as immutable, and a collection as append-only. Therefore, one can make sure to only process each member once. Extra terms are added, such as the concept of an EventStream, retention policies and a timestampPath.