
Proposal #1: custom metadata mapping

Emanuele Tajariol edited this page Apr 29, 2014 · 3 revisions

Rationale

The CKAN CSW harvester (implemented in the ckanext-spatial extension) extracts information from ISO19139 records using a wide but fixed set of XPath expressions.

Site developers and admins may need to extract other information that is not already mapped.

We need a way to add new field mappings without editing the Python code.

Proposal

It shall be possible to import any field from the harvested ISO record into the dataset extras fields.

Two things are needed for each mapping: the name of the extras field to create, and the XPath of the information to extract from the metadata record.

The configuration may optionally contain an extras_mappings field.
It will be a map with these contents:

  • key: the name of the extra field that will be created
  • value: the xpath of the data that will be extracted

An XPath expression extracts a nodeset from an XML document, so we need to set some constraints and encoding rules:

  • Text nodes only:
    We only want to handle XPath expressions that select text nodes. If the XPath selects anything other than a text node, the harvester may throw an error.
  • Multiple values:
    If more than one text node is selected by a single XPath, the corresponding extras field will contain a list of strings encoded as a JSON array.
  • Empty values:
    If the XPath does not select anything, the extras field will be created as an empty string.
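
The three rules above can be sketched in Python as follows. This is only an illustration, not the harvester's actual code: the function names `encode_extra_value` and `build_extras` are hypothetical, and the XPath evaluation itself is passed in as a callable (for instance a wrapper around lxml's `xpath()` with a namespace map for `gmd`, `gco` and `srv`). The sketch also assumes that a single selected text node is stored as a plain string, which the proposal implies but does not state.

```python
import json


def encode_extra_value(selected):
    """Encode the result of one XPath evaluation as an extras value.

    `selected` is expected to be a list of text nodes (str instances).
    """
    # Text nodes only: reject scalar results and non-text nodes.
    if not isinstance(selected, list):
        raise ValueError("XPath must select a nodeset of text nodes")
    for node in selected:
        if not isinstance(node, str):
            raise ValueError(
                "XPath must select text nodes only, got %s" % type(node).__name__
            )

    # Empty values: nothing selected gives an empty string.
    if not selected:
        return ""

    # Assumption: a single text node is stored as a plain string.
    if len(selected) == 1:
        return str(selected[0])

    # Multiple values: a list of strings encoded as a JSON array.
    return json.dumps([str(node) for node in selected])


def build_extras(extras_mappings, evaluate_xpath):
    """Build the extras dict from a {field name: xpath} map.

    `evaluate_xpath` is a hypothetical callable that takes an XPath string
    and returns the list of nodes it selects in the harvested record.
    """
    return {
        name: encode_extra_value(evaluate_xpath(xpath))
        for name, xpath in extras_mappings.items()
    }
```

Keeping the encoding separate from the XPath evaluation means the rules can be checked without a real ISO record, by passing a stub evaluator.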

Example:

{
... other configuration fields ... ,

"extras_mappings":{
   "servicepurpose":"//gmd:identificationInfo/srv:SV_ServiceIdentification/gmd:purpose/gco:CharacterString/text()",
   "mytitle":"//gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString/text()"
}
}
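
A minimal sketch of how such a configuration could be read and validated, assuming the configuration arrives as a JSON string. The helper name `read_extras_mappings` is hypothetical and not part of ckanext-spatial; the only thing taken from the proposal is the optional `extras_mappings` key mapping field names to XPath strings.

```python
import json


def read_extras_mappings(config_text):
    """Return the extras_mappings dict from a JSON configuration string.

    The field is optional, so a missing key (or an empty configuration)
    yields an empty mapping rather than an error.
    """
    config = json.loads(config_text) if config_text else {}
    mappings = config.get("extras_mappings", {})

    if not isinstance(mappings, dict):
        raise ValueError("extras_mappings must be a JSON object")

    for name, xpath in mappings.items():
        # Each value must be a non-empty XPath string.
        if not isinstance(xpath, str) or not xpath.strip():
            raise ValueError("extras_mappings[%r] must be a non-empty XPath string" % name)

    return mappings
```

Validating the map when the configuration is saved, rather than at harvest time, would let a malformed entry be reported before any record is processed.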

Notes

Existing implementation

The existing implementation that extracts data from the ISO records is split into two steps:
