This document is intended for muninn extension developers. Muninn is a generic archiving framework. To be able to use it to archive specific (types of) products, it is necessary to install (or implement) one or more extensions.
Readers of this document are assumed to be familiar with the content of the muninn README.rst file, in particular the sections "Extensions", "Data types", "Namespaces", and "Links".
A muninn extension is a Python module or package that implements the muninn extension interface. Muninn defines three types of extensions: namespace extensions (that contain namespace definitions), product type extensions (that contain product type plug-ins) and remote backend extensions (that contain remote backend plug-ins).
A namespace is a set of related properties, i.e. a set of (key, value) pairs. The namespace definition specifies the keys (field names) available within the namespace, their type, and whether or not they are optional.
For example, this is the definition of the core namespace of muninn (see also the file core.py included in the muninn source distribution):

    from muninn.schema import *

    class Core(Mapping):
        uuid = UUID
        active = Boolean
        hash = optional(Text)
        size = optional(Long)
        metadata_date = Timestamp
        archive_date = optional(Timestamp)
        archive_path = optional(ArchivePath)
        product_type = Text
        product_name = Text
        physical_name = Basename
        validity_start = optional(Timestamp)
        validity_stop = optional(Timestamp)
        creation_date = optional(Timestamp)
        footprint = optional(Geometry)
        remote_url = optional(Remote)
A product type plug-in is an instance of a class that handles all product type specific details. The most important function of a product type plug-in is to extract properties from a product and return them in a form the archiving framework understands.
To represent product properties, a class called muninn.Struct is used, which is essentially an empty class derived from object. Product properties are added to an instance of this class via injection. Think of it as a dictionary, except that you can also use attribute access (the . operator) to get the value bound to a specific product property.
A muninn.Struct can be initialized with a Python dictionary. This will also convert all members that are dictionaries into muninn.Struct objects.
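The behaviour described above can be illustrated with a small stand-in class. MiniStruct below is hypothetical and is not muninn's implementation; it only mimics the two properties mentioned here, attribute access and recursive conversion of nested dictionaries.

```python
class MiniStruct(object):
    """Hypothetical stand-in for muninn.Struct (illustration only)."""

    def __init__(self, data=None):
        for key, value in (data or {}).items():
            # Nested dictionaries become nested MiniStruct objects.
            if isinstance(value, dict):
                value = MiniStruct(value)
            setattr(self, key, value)

    def __getitem__(self, key):
        # Dictionary-style access to the same values.
        return self.__dict__[key]


properties = MiniStruct({"core": {"product_type": "ABCD"}})

# Both access styles reach the same value.
product_type = properties.core.product_type
same_value = properties["core"]["product_type"]
```

In a real extension the class to use is muninn.Struct itself; the stand-in only exists so the sketch runs without muninn installed.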
By convention, product properties are named <namespace name>.<property name>. This means you usually have a single top-level Struct instance that contains a separate Struct instance for each namespace. For example:
    import datetime

    from muninn import Struct

    properties = Struct()
    properties.core = Struct()
    properties.core.product_type = "ABCD"
    properties.core.creation_date = datetime.datetime.utcnow()
    # ... more of the same ...
    properties.xml_pi = Struct()
    properties.xml_pi.startTime = datetime.datetime.utcnow()
    # ... more of the same ...
A remote backend plug-in adds the ability for an archive to pull products from remote sources using a protocol beyond the basic file/ftp/http/https protocols.
All attributes, functions, and methods described in this section are mandatory, unless explicitly stated otherwise.
Extensions are only allowed to raise muninn.Error or instances of exception classes derived from muninn.Error. If an extension raises an exception that does not derive from muninn.Error, or allows exceptions from underlying modules to propagate outside of the extension, this should be considered a bug.
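One way to honour this rule is to catch exceptions from underlying modules at the boundary of the extension and re-raise them as muninn.Error. The following is a minimal sketch; the Error class is a local stand-in so the code runs without muninn installed (a real extension would import it from muninn), and parse_timestamp is a hypothetical helper.

```python
import datetime


class Error(Exception):
    """Stand-in for muninn.Error (assumption: real code imports it from muninn)."""


def parse_timestamp(text):
    """Hypothetical helper: parse a timestamp such as '2020-01-31T12:00:00'."""
    try:
        return datetime.datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")
    except ValueError as exc:
        # Translate the underlying exception instead of letting it escape.
        raise Error("invalid timestamp %r: %s" % (text, exc))
```

Calling parse_timestamp("not a date") then fails with Error rather than with the ValueError raised by the datetime module.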
namespaces()
- Return a list containing the names of all namespaces defined by the extension.
namespace(namespace_name)
- Return the namespace definition of the specified namespace. An exception should be raised if the specified namespace is not defined by the extension.
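A namespace extension module could be laid out roughly as follows. This is a sketch under stated assumptions: Error is a local stand-in for muninn.Error, MyNamespace is a placeholder for a real definition derived from muninn.schema.Mapping (like the Core definition shown earlier), and the name "mynamespace" and the _NAMESPACES registry are hypothetical.

```python
class Error(Exception):
    """Stand-in for muninn.Error (assumption: real code imports it from muninn)."""


class MyNamespace(object):
    """Placeholder for a namespace definition.

    In a real extension this would be a class derived from
    muninn.schema.Mapping that lists the fields of the namespace.
    """


# Hypothetical registry: namespace name -> namespace definition.
_NAMESPACES = {"mynamespace": MyNamespace}


def namespaces():
    # Names of all namespaces defined by this extension.
    return list(_NAMESPACES)


def namespace(namespace_name):
    # Definition of one namespace; unknown names are reported via Error.
    try:
        return _NAMESPACES[namespace_name]
    except KeyError:
        raise Error("undefined namespace: %s" % namespace_name)
```

Keeping the definitions in a single dictionary means both functions stay consistent when a namespace is added or removed.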
All attributes, functions, and methods described in this section are mandatory, unless explicitly stated otherwise.
Extensions are only allowed to raise muninn.Error or instances of exception classes derived from muninn.Error. If an extension raises an exception that does not derive from muninn.Error, or allows exceptions from underlying modules to propagate outside of the extension, this should be considered a bug.
product_types()
- Return a list containing all product types for which this extension defines plug-ins.
product_type_plugin(product_type)
- Return an instance of a class that adheres to the product type plug-in API (see below) and that implements this interface for the specified product type. An exception should be raised if the extension does not support the specified product type.
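The two functions above might be implemented along these lines. The sketch assumes a local stand-in for muninn.Error, a skeleton plug-in class that omits the actual plug-in API, and the illustrative product type names "ABCD" and "EFGH".

```python
class Error(Exception):
    """Stand-in for muninn.Error (assumption: real code imports it from muninn)."""


class MyProductType(object):
    """Skeleton only; a real plug-in implements the product type plug-in API."""

    def __init__(self, product_type):
        self.product_type = product_type


# Hypothetical registry: one plug-in instance per supported product type.
_PLUGINS = {name: MyProductType(name) for name in ("ABCD", "EFGH")}


def product_types():
    # All product types for which this extension defines plug-ins.
    return sorted(_PLUGINS)


def product_type_plugin(product_type):
    plugin = _PLUGINS.get(product_type)
    if plugin is None:
        raise Error("unsupported product type: %s" % product_type)
    return plugin
```

Reusing one parameterized class for several product types is a design choice of this sketch; separate classes per product type work equally well.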
All attributes, functions, and methods described in this section are mandatory, unless explicitly stated otherwise.
Extensions are only allowed to raise muninn.Error or instances of exception classes derived from muninn.Error. If an extension raises an exception that does not derive from muninn.Error, or allows exceptions from underlying modules to propagate outside of the extension, this should be considered a bug.
remote_backends()
- Return a list containing the names of all remote backends defined by the extension.
remote_backend(name)
- Return the remote backend definition of the specified remote backend. An exception should be raised if the specified remote backend is not defined by the extension.
A product type plug-in is an instance of a class that implements the interface defined in this section.
All attributes, functions, and methods described in this section are mandatory, unless explicitly stated otherwise.
Product type plug-ins are only allowed to raise muninn.Error or instances of exception classes derived from muninn.Error. If an extension raises an exception that does not derive from muninn.Error, or allows exceptions from underlying modules to propagate outside of the extension, this should be considered a bug.
use_enclosing_directory
This variable should equal True if products of the type the plug-in is designed to handle consist of multiple files, False otherwise.
In the majority of cases, a product is represented by a single path (i.e. file, or directory). For such cases, this attribute should be set to False, and the analyze() method defined below can expect to be called with a list containing a single path.
If a product consists of two or more files that belong together (without them already being grouped together into a single top-level directory), this attribute should be set to True.
use_hash
- Determines if a SHA1 hash will be computed for products of the type the plug-in is designed to handle. Since computing a hash is an expensive operation, it is useful to set this attribute to False if storing a hash is not required.
cascade_rule
Determines what should happen to products of the type the plug-in is designed to handle when all products linked to these products (as source products) have been stripped or removed. (A stripped product is a product for which the data on disk has been deleted, but the entry in the product catalogue has been kept).
Possible values are defined by the muninn.extension.CascadeRule enumeration and are given below:
CascadeRule.IGNORE
- Do nothing.
CascadeRule.CASCADE_PURGE_AS_STRIP
- If all source products of a product have been removed, strip the product. If all source products of a product have been stripped, do nothing.
CascadeRule.CASCADE_PURGE
- If all source products of a product have been removed, remove the product. If all source products of a product have been stripped, do nothing.
CascadeRule.STRIP
- If all source products of a product have been removed, strip the product. If all source products of a product have been stripped, strip the product.
CascadeRule.CASCADE
- If all source products of a product have been removed, remove the product. If all source products of a product have been stripped, strip the product.
CascadeRule.PURGE
- If all source products of a product have been removed, remove the product. If all source products of a product have been stripped, remove the product.
This attribute is optional. If it is left undefined, CascadeRule.IGNORE is assumed.
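The three attributes described so far are plain class attributes. The sketch below shows one plug-in that sets all of them and one that omits cascade_rule. CascadeRule is a local stand-in for muninn.extension.CascadeRule (member names are taken from the list above, the values are arbitrary), and effective_cascade_rule is a hypothetical helper that only illustrates the documented default; how muninn reads the attribute internally is not described here.

```python
class CascadeRule(object):
    """Stand-in for muninn.extension.CascadeRule; the values are arbitrary."""
    IGNORE, CASCADE_PURGE_AS_STRIP, CASCADE_PURGE, STRIP, CASCADE, PURGE = range(6)


class MyProductType(object):
    use_enclosing_directory = False      # product is a single file or directory
    use_hash = True                      # a hash should be stored for these products
    cascade_rule = CascadeRule.CASCADE   # follow what happens to the source products


class MinimalProductType(object):
    use_enclosing_directory = False
    use_hash = False                     # skip the expensive hash computation
    # cascade_rule is deliberately left undefined.


def effective_cascade_rule(plugin):
    """Hypothetical helper: an undefined cascade_rule counts as IGNORE."""
    return getattr(plugin, "cascade_rule", CascadeRule.IGNORE)
```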
identify(self, paths)
Returns True if the specified list of paths constitutes a product of the product type the plug-in is designed to handle, False otherwise.
Note that a return value of True does not necessarily imply that properties can be extracted from the product without errors. For example, a valid implementation of this method could be as simple as checking the (base) names of the specified paths against an expected pattern.
analyze(self, paths)
Return properties extracted from the product that consists of the specified list of paths as a nested Struct (key, value) pair structure. Note that muninn will itself set the core metadata properties for uuid, active, hash, size, metadata_date, archive_date, archive_path, product_type, and physical_name. So these do not have to be returned by the analyze() function (they will be ignored if provided).
Optionally, a list of tags can be returned from this method in addition to the extracted product properties. Any tags returned will be applied to the product once it has been successfully ingested.
To include a list of tags, the method should return a tuple (or list) of two elements. The first element should be the nested Struct (key, value) pair structure containing product properties, and the second element should be the list of tags.
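A minimal sketch of identify() and analyze() for a single-file product is given below. It rests on several assumptions: Error and Struct are local stand-ins for muninn.Error and muninn.Struct, the file naming convention ABCD_<yyyymmdd>.dat is hypothetical, and so is the tag "example".

```python
import datetime
import os
import re


class Error(Exception):
    """Stand-in for muninn.Error (assumption: real code imports it from muninn)."""


class Struct(object):
    """Stand-in for muninn.Struct: an empty class used via attribute injection."""


class MyProductType(object):
    use_enclosing_directory = False
    use_hash = True

    # Hypothetical naming convention: ABCD_<yyyymmdd>.dat
    _pattern = re.compile(r"^ABCD_(\d{8})\.dat$")

    def identify(self, paths):
        # A single-path product: exactly one path whose base name matches.
        if len(paths) != 1:
            return False
        return self._pattern.match(os.path.basename(paths[0])) is not None

    def analyze(self, paths):
        name = os.path.basename(paths[0])
        match = self._pattern.match(name)
        if match is None:
            raise Error("unexpected product name: %s" % name)
        try:
            start = datetime.datetime.strptime(match.group(1), "%Y%m%d")
        except ValueError as exc:
            # Eight digits that do not form a valid date.
            raise Error("invalid date in product name %s: %s" % (name, exc))

        # Only properties that muninn does not set itself are filled in.
        properties = Struct()
        properties.core = Struct()
        properties.core.product_name = os.path.splitext(name)[0]
        properties.core.validity_start = start

        # Returning a (properties, tags) pair attaches tags after ingestion.
        return properties, ["example"]
```

Here identify() only inspects the file name, while analyze() performs the stricter date check and reports problems through Error.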
enclosing_directory(self, properties)
Return the name to be used for the enclosing directory.
Within the archive, any product is represented by a single path. For products that consist of multiple paths, this is achieved by transparently wrapping everything in an enclosing directory inside the archive.
A commonly used implementation of this method is to return the product name, i.e. properties.core.product_name.
This method is optional if use_enclosing_directory is False.
archive_path(self, properties)
Return the path, relative to the root of the archive, where the product, of the product type this plug-in is designed to handle, should be stored, based on the product properties passed in as a nested Struct (key, value) pair structure.
That is, this method uses the product properties passed in to generate a relative path inside the archive where the product will be stored.
A commonly used implementation is to return <product type>/<year>/<month>/<day>/<uuid>/<logical product name>, where the date corresponds to the validity start of the product.
In some cases, a different implementation is required. For example, when products cannot be said to cover a time range, as is the case for some auxiliary products.
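The two path-related methods can be sketched as follows. The example is an illustration under assumptions: it builds only the <product type>/<year>/<month>/<day> prefix of the layout mentioned above (whether further components are appended depends on the layout you need), it falls back to the bare product type for products without a validity start, and it uses types.SimpleNamespace as a stand-in for the nested Struct.

```python
import datetime
import os
from types import SimpleNamespace


class MyProductType(object):
    use_enclosing_directory = True

    def enclosing_directory(self, properties):
        # Wrap the files of a multi-file product in a directory
        # named after the product.
        return properties.core.product_name

    def archive_path(self, properties):
        core = properties.core
        start = getattr(core, "validity_start", None)
        if start is None:
            # Products that do not cover a time range are grouped by
            # product type only (a choice made for this sketch).
            return core.product_type
        return os.path.join(
            core.product_type,
            "%04d" % start.year,
            "%02d" % start.month,
            "%02d" % start.day,
        )


# Stand-in for the nested Struct that muninn would pass in.
properties = SimpleNamespace(
    core=SimpleNamespace(
        product_type="ABCD",
        product_name="ABCD_20200131",
        validity_start=datetime.datetime(2020, 1, 31),
    )
)
plugin = MyProductType()
relative_path = plugin.archive_path(properties)
```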
post_ingest_hook(self, archive, properties)
- This function is optional. If it exists, it will be called after a successful ingest of the product.
post_pull_hook(self, archive, properties)
- This function is optional. If it exists, it will be called after a successful pull of the product.
export_<format name>(self, archive, product, target_path)
Methods starting with export_ can be used to implement product type specific export functionality. For example, a method export_tgz could be implemented that exports a product as a gzipped tarball. The return value is the absolute path of the exported product.
These methods can use the archive instance passed in to, for example, locate associated products to be included in the exported product.
The target path is a path to the directory in which the exported product should be stored. The export method is free to create additional directories under this path, for example to create a <year>/<month>/<day> structure.
These methods are optional.
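As an illustration, an export_tgz method might look like the sketch below. How a product is located on disk is not specified in this document, so _product_path is a hypothetical helper that assumes the archive object exposes its root directory as archive.root; that attribute, the ".tgz" file naming, and the Error stand-in are all assumptions of the sketch.

```python
import os
import tarfile


class Error(Exception):
    """Stand-in for muninn.Error (assumption: real code imports it from muninn)."""


def _product_path(archive, product):
    """Hypothetical helper: location of the product on disk.

    Assumes an `archive.root` attribute; this is not a documented API.
    """
    core = product.core
    return os.path.join(archive.root, core.archive_path, core.physical_name)


class MyProductType(object):
    def export_tgz(self, archive, product, target_path):
        source = _product_path(archive, product)
        name = product.core.physical_name
        target = os.path.join(target_path, name + ".tgz")
        try:
            with tarfile.open(target, "w:gz") as tar:
                # Store the product under its own name, not its full path.
                tar.add(source, arcname=name)
        except (OSError, tarfile.TarError) as exc:
            raise Error("export of %s failed: %s" % (source, exc))
        # The return value is the absolute path of the exported product.
        return os.path.abspath(target)
```

Errors from the tarfile and os modules are converted to Error so that they do not propagate outside of the plug-in.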
A remote backend plug-in is an instance of a class that implements the interface defined in this section.
All attributes, functions, and methods described in this section are mandatory, unless explicitly stated otherwise.
Remote backend plug-ins are only allowed to raise muninn.Error or instances of exception classes derived from muninn.Error. If an extension raises an exception that does not derive from muninn.Error, or allows exceptions from underlying modules to propagate outside of the extension, this should be considered a bug.
pull(self, archive, product)
- Download the product specified. The existing product metadata should already specify the location of the file(s).