Install and configure Datahub::Factory, an application to transfer and convert data from a (museum) Collection Management System to an exchange format (LIDO) or a Datahub instance.
- This module uses meltwater-cpan to install the Datahub::Factory CPAN package. The meltwater-cpan module is only included, not configured, so you are free to configure it elsewhere in your manifests without running into errors.
There are two parts to this module: `datahub_factory`, which installs and configures Datahub::Factory, and `datahub_factory::pipeline`, which creates and manages pipeline configuration files.

`datahub_factory` takes no options:

```puppet
include datahub_factory
```
To create a pipeline configuration file (installed in `/etc/datahub-factory/pipelines`), use the defined type `datahub_factory::pipeline`:
```puppet
datahub_factory::pipeline { 'test.ini':
  importer_plugin  => 'Adlib',
  importer_options => {
    file_name => '/tmp/adlib.xml',
    data_path => 'recordList.record.*',
  },
  fixer_plugin  => 'Fix',
  fixer_id_path => 'administrativeMetadata.recordWrap.recordID.0._',
  fixer_options => {
    file_name => '/tmp/msk.fix',
  },
  exporter_plugin  => 'Exporter',
  exporter_options => {
    datahub_url         => 'my.thedatahub.io',
    datahub_format      => 'LIDO',
    oauth_client_id     => 'datahub',
    oauth_client_secret => 'datahub',
    oauth_username      => 'datahub',
    oauth_password      => 'datahub',
  },
  setup_cron     => true,
  cron_frequency => {
    hour   => 2,
    minute => 0,
  },
}
```
This creates the pipeline `/etc/datahub-factory/pipelines/test.ini`, which fetches data from an Adlib data dump (`/tmp/adlib.xml`), transforms it with a Catmandu fix called `/tmp/msk.fix`, and submits the result to a Datahub instance running at `my.thedatahub.io`. The entire operation is run every night at 2:00 by cron.
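The file written to disk is an INI-style Datahub::Factory pipeline configuration. As an illustration only (the exact section and key names are defined by Datahub::Factory itself, so treat this as a hedged sketch rather than the literal output), the resource above would produce a file along these lines:

```ini
; Sketch of the generated pipeline file (section/key names are assumptions)
[Importer]
plugin = Adlib

[plugin_importer_Adlib]
file_name = /tmp/adlib.xml
data_path = recordList.record.*

[Fixer]
plugin = Fix
id_path = administrativeMetadata.recordWrap.recordID.0._

[plugin_fixer_Fix]
file_name = /tmp/msk.fix

[Exporter]
plugin = Exporter

[plugin_exporter_Exporter]
datahub_url = my.thedatahub.io
datahub_format = LIDO
oauth_client_id = datahub
oauth_client_secret = datahub
oauth_username = datahub
oauth_password = datahub
```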
The base class must be included before you can define a pipeline, but it takes no options.
Creates a pipeline configuration file in `/etc/datahub-factory/pipelines` and optionally a cron job to run the pipeline at a set interval.
Configuring the pipeline is done by first selecting the importer, fixer and exporter plugins to use (`<type>_plugin`) and then passing a hash of key-value pairs to `<type>_options`. The contents of the hash depend on the options the plugin requires.
Add a cron job by setting `setup_cron` to `true` and passing a frequency (in the format puppet-cron expects) to `cron_frequency`. The job is run by the `datahub-factory` user (which is created automatically).
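For example, a frequency hash uses the usual cron fields. The `weekday` key below is an assumption based on common puppet-cron field names; check the puppet-cron documentation for the exact keys it supports:

```puppet
# Run every Sunday at 03:30 (field names assumed from puppet-cron conventions)
cron_frequency => {
  minute  => 30,
  hour    => 3,
  weekday => 0,
}
```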
- `importer_plugin`: select the importer plugin to use.
- `importer_options`: pass options to the importer plugin. Valid options depend on the plugin used.
- `fixer_plugin`: select the fixer plugin.
- `fixer_options`: options for the fixer plugin.
- `fixer_id_path`: the path to the ID of every item transformed by the fixer (after the transformation), used in logging.
- `exporter_plugin`: select the exporter plugin.
- `exporter_options`: options for the exporter plugin.
- `setup_cron`: set to `true` to create a cron job for this pipeline.
- `cron_frequency`: the frequency for cron (in the format puppet-cron expects).
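A minimal pipeline that skips the cron job only needs the plugin selections and their options. This sketch uses placeholder file names and assumes the exporter will accept a reduced option set; in practice the `Exporter` plugin will likely also require the OAuth options shown in the full example above:

```puppet
# Minimal pipeline: no cron job, placeholder paths
include datahub_factory

datahub_factory::pipeline { 'minimal.ini':
  importer_plugin  => 'Adlib',
  importer_options => { file_name => '/tmp/export.xml' },
  fixer_plugin     => 'Fix',
  fixer_id_path    => 'administrativeMetadata.recordWrap.recordID.0._',
  fixer_options    => { file_name => '/tmp/transform.fix' },
  exporter_plugin  => 'Exporter',
  exporter_options => { datahub_url => 'my.thedatahub.io' },
}
```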
Pull requests welcome at https://github.com/thedatahub/puppet-datahub_factory.