
adamsflow2docker

Library for generating Docker images with an ADAMS workflow running inside.

Command-line

Converts ADAMS workflows into Docker images.


Usage: [--help] [-m MAVEN_HOME] [-u MAVEN_USER_SETTINGS]
       [-j JAVA_HOME] -M MODULES -V VERSION [-d DEPENDENCY...]
       [-D FILE...] [-J JAR_OR_DIR...] [-v JVM...] -i INPUT
       -b DOCKER_BASE_IMAGE [-I DOCKER_INSTRUCTIONS]
       -o OUTPUT_DIR

Options:
-m, --maven_home MAVEN_HOME
	The directory with a local Maven installation to use instead of the
	bundled one.

-u, --maven_user_settings MAVEN_USER_SETTINGS
	The file with the maven user settings to use other than
	$HOME/.m2/settings.xml.

-j, --java_home JAVA_HOME
	The Java home to use for the Maven execution.

-M, --module MODULES
	The comma-separated list of ADAMS modules to use for the application,
	e.g.: adams-weka,adams-groovy,adams-excel

-V, --version VERSION
	The version of ADAMS to use, e.g., '20.1.1' or '20.2.0-SNAPSHOT'.

-d, --dependency DEPENDENCY
	The additional maven dependencies to use for bootstrapping ADAMS
	(group:artifact:version), e.g.: nz.ac.waikato.cms.weka:kfGroovy:1.0.12

-D, --dependency-file FILE
	The file(s) with additional maven dependencies to use for bootstrapping
	ADAMS (group:artifact:version), one dependency per line.

-J, --external-jar JAR_OR_DIR
	The external jar or directory with jar files to also include in the
	application.

-v, --jvm JVM
	The parameters to pass to the JVM to launch the workflow with.

-i, --input INPUT
	The ADAMS workflow to use.

-b, --docker_base_image DOCKER_BASE_IMAGE
	The docker base image to use, e.g. 'openjdk:11-jdk-slim-buster'.

-I, --docker_instructions DOCKER_INSTRUCTIONS
	File with additional docker instructions to use for generating the
	Dockerfile.

-o, --output_dir OUTPUT_DIR
	The directory to output the bootstrapped application, workflow and
	Dockerfile in.
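
Going by the usage above, -M, -V, -i, -b and -o are the required options, so a minimal invocation looks roughly like the sketch below; the module list, version and paths are placeholders taken from the help text and need to be adjusted to your setup:

# minimal sketch using only the required options (placeholder values):
# bootstraps an ADAMS application for the listed modules/version and
# generates the workflow setup and Dockerfile in the output directory
java -jar adamsflow2docker-0.0.2-spring-boot.jar \
  -M adams-weka,adams-excel \
  -V 20.1.1 \
  -i /path/to/some.flow \
  -b openjdk:11-jdk-slim-buster \
  -o /path/to/output_dir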

Example

For this example we use the weka_filter_pipeline.flow workflow and the additional weka_filter_pipeline.dockerfile Docker instructions. The workflow polls an input directory for ARFF files and cleans them with the InterquartileRange filter, removing outliers and extreme values. The cleaned datasets are placed in the output directory; the original input file is moved there as well, but with the extension .original instead of .arff.

The command-lines for this example assume this directory structure:

/some/where
|
+- data
|  |
|  +- adamsflow2docker   // contains the jar
|  |
|  +- flows
|  |  |
|  |  +- weka_filter_pipeline.flow       // actual flow
|  |  |
|  |  +- weka_filter_pipeline.dockerfile  // additional Dockerfile instructions
|  |
|  +- in    // input directory
|  |
|  +- out   // output directory
|
+- output
|  |
|  +- adamsflow  // will contain all the generated data, including "Dockerfile"
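
To replicate this layout, the directories can be created up front; the sketch below simply mirrors the tree above (replace /some/where with your own location):

# create the example directory layout used in the command-lines below
mkdir -p /some/where/data/adamsflow2docker \
         /some/where/data/flows \
         /some/where/data/in \
         /some/where/data/out \
         /some/where/output/adamsflow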

For our Dockerfile, we use the openjdk:11-jdk-slim-buster base image (-b), which contains an OpenJDK 11 installation on top of a Debian "buster" image. The weka_filter_pipeline.flow workflow (-i) then gets turned into a Docker image using the following command-line:

java -jar /some/where/data/adamsflow2docker/adamsflow2docker-0.0.2-spring-boot.jar \
  -i /some/where/data/flows/weka_filter_pipeline.flow \
  -o /some/where/output/adamsflow \
  -b openjdk:11-jdk-slim-buster \
  -I /some/where/data/flows/weka_filter_pipeline.dockerfile
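
The file supplied via -I contains plain Dockerfile directives that get used when generating the Dockerfile. The shipped weka_filter_pipeline.dockerfile is not reproduced here; as a rough, purely hypothetical idea, such an instructions file might declare things like default environment variables or volume mount points:

# hypothetical instructions file (NOT the content of the shipped
# weka_filter_pipeline.dockerfile), holding plain Dockerfile directives
cat > my_instructions.dockerfile <<'EOF'
ENV VERBOSE=false
VOLUME ["/data/in", "/data/out"]
EOF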

Now we build the Docker image called adamsflow from the Dockerfile that has been generated in the output directory /some/where/output/adamsflow (the -o option in the previous command-line):

cd /some/where/output/adamsflow
sudo docker build -t adamsflow .
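
Once the build finishes, the image should be available locally; a quick sanity check with a standard Docker setup is to list it:

sudo docker images adamsflow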

With the image built, we can now push the raw ARFF files through for cleaning. For this to work, we map the in/out directories from our directory structure into the Docker container (using the -v option) and we supply these input and output directories via the INPUT and OUTPUT environment variables (using the -e option) for the flow to pick them up. In order to see a few more messages, we also turn on the debugging output that is part of the workflow, using the VERBOSE environment variable:

sudo docker run -ti \
  -v /some/where/data/in:/data/in \
  -v /some/where/data/out:/data/out \
  -e INPUT=/data/in/ \
  -e OUTPUT=/data/out/ \
  -e VERBOSE=true \
  adamsflow
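
With the container polling /data/in, processing data then just means dropping ARFF files into the mapped input directory and picking up the results from the output directory. The dataset name below is only a placeholder:

# copy a raw dataset into the polled input directory ...
cp some_dataset.arff /some/where/data/in/
# ... and after the flow has processed it, the cleaned .arff plus the
# renamed .original file should show up in the output directory
ls -l /some/where/data/out/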

Releases

Maven

    <dependency>
      <groupId>nz.ac.waikato.cms.adams</groupId>
      <artifactId>adamsflow2docker</artifactId>
      <version>0.0.2</version>
    </dependency>