THESE DOCS ARE DEPRECATED SEE ActionML.com/docs

The Guides are moved

The markdown templates are now in https://github.com/actionml/docs.actionml.com. Changes there are automatically published to the live site: actionml.com/docs. Please make any PRs to that new repos.

#PredictionIO Standalone Server Guide: The Driver Machine

This is a guide to setting up the PredictionIO EventServer and Universal Recommender PredictionServer in a standalone fashion so all constituent services run on a single machine. At the end of this guide we will spin up a Spark cluster and offload the majority of training work to the cluster, then take it offline so it costs nothing while idle.

AWS

Create an instance on AWS (other services may work too, but this is tested on AWS) that has enough memory to run all of the PredictionIO services. This will be something like an m3.xlarge or m3.2xlarge.

##Before You Start

Follow the Small HA Cluster-setup instructions except for the following differences:

First remember that we will be setting up only one machine so where you see references to more than one, ignore the other machines.
Use the Driver Machine's DNS name for setup but never "localhost". This is so it will be easier to scale later.
Do not use /etc/hosts to add names for the Driver Machine, use the internal AWS DNS name in all configs.

For some not well understood reason you must use localhost to point HBase's Zookeeper to the Driver Machine when not in a cluster. So in /usr/local/hbase/conf/hbase-site.xml use the following:

 <configuration>
   <property>
     <name>hbase.rootdir</name>
     <value>hdfs://driver-machine:9000/hbase</value>
   </property>
 
   <property>
     <name>hbase.cluster.distributed</name>
     <value>true</value>
   </property>
 
   <property>
     <name>hbase.zookeeper.property.dataDir</name>
     <value>hdfs://driver-machine:9000/zookeeper</value>
   </property>
 
   <property>
     <name>hbase.zookeeper.quorum</name>
     <value>localhost</value>
   </property>
 
   <property>
     <name>hbase.zookeeper.property.clientPort</name>
     <value>2181</value>
   </property>
 </configuration>

Notice the hbase.zookeeper.quorum is localhost. Substituting

Do not create the /usr/local/hbase/conf/backupmasters file

Do not use HDFS for the PredictionIO "models" storage so set these value in /usr/local/pio/conf/pio-env.sh

 PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
 PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
 PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
 PIO_STORAGE_SOURCES_LOCALFS_HOSTS=$PIO_FS_BASEDIR/models

start platform services

 $ /usr/local/hadoop/sbin/start-dfs.sh
 $ /usr/local/spark/start-all.sh # if using the local host to run Spark

start the pio services and the EventServer
```
 $ pio-start-all
```

to restart pio serives

 $ pio-stop-all
 $ jps -lm 
 $ # check for orphaned HMaster or HRegionServer or QuorumServer
 $ # non-eventserver Console and kill separately to get a clean state
 $ kill some-pid

install pip to import data to the EventServer

 $ sudo apt-get install python-pip
 $ sudo pip install predictionio
 $ sudo pip install datetime

get the Universal Recommender

 $ git clone https://github.com/actionml/template-scala-parallel-universal-recommendation/tree/v0.3.0 universal
 $ cd universal
 $ pio app list # to see datasets in the EventServer
 $ pio app new handmade # if the app is not there
 $ python examples/import_handmade.py --access_key key-from-app-list

to retrain after any change to data or engin.json

 $ pio build # do this before every train
 $ pio train -- --master spark://some-master:7077 --driver-memory 3g

to retrain after a pio config change first restart pio as above, them retrain, no need to reimport unless you have rebuild HBase, in which case start from "start platform services" above.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

single-driver-machine.md

single-driver-machine.md

THESE DOCS ARE DEPRECATED SEE ActionML.com/docs

AWS

Files

single-driver-machine.md

Latest commit

History

single-driver-machine.md

File metadata and controls

THESE DOCS ARE DEPRECATED SEE ActionML.com/docs

AWS