THESE DOCS ARE DEPRECATED. SEE ActionML.com/docs

The Guides Have Moved

The markdown templates are now in https://github.com/actionml/docs.actionml.com. Changes there are automatically published to the live site: actionml.com/docs. Please make any PRs to the new repo.

# PredictionIO Standalone Server Guide: The Driver Machine

This guide sets up the PredictionIO EventServer and the Universal Recommender PredictionServer in standalone mode, with all constituent services running on a single machine. At the end of the guide we spin up a Spark cluster, offload most of the training work to it, and then shut the cluster down so it incurs no cost while idle.

## AWS

Create an AWS instance with enough memory to run all of the PredictionIO services, for example an m3.xlarge or m3.2xlarge. Other cloud services may work too, but this setup has only been tested on AWS.

## Before You Start

Follow the Small HA Cluster-setup instructions except for the following differences:

  • Remember that we are setting up only one machine, so wherever those instructions refer to more than one machine, ignore the others.

  • Use the Driver Machine's DNS name throughout setup, never "localhost" (the one exception is HBase's ZooKeeper quorum, below). This makes it easier to scale out later.

  • Do not add names for the Driver Machine to /etc/hosts; use its internal AWS DNS name in all configs.

  • For reasons that are not well understood, when not running in a cluster you must use localhost to point HBase's ZooKeeper at the Driver Machine. So in /usr/local/hbase/conf/hbase-site.xml use the following:

     <configuration>
       <property>
         <!-- replace driver-machine with the Driver Machine's internal AWS DNS name -->
         <name>hbase.rootdir</name>
         <value>hdfs://driver-machine:9000/hbase</value>
       </property>
     
       <property>
         <name>hbase.cluster.distributed</name>
         <value>true</value>
       </property>
     
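       <property>
         <!-- ZooKeeper keeps its data on local disk, so use a local path, not an hdfs:// URI; this path is only an example -->
         <name>hbase.zookeeper.property.dataDir</name>
         <value>/usr/local/hbase/zookeeper</value>
       </property>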
     
       <property>
         <name>hbase.zookeeper.quorum</name>
         <value>localhost</value>
       </property>
     
       <property>
         <name>hbase.zookeeper.property.clientPort</name>
         <value>2181</value>
       </property>
     </configuration>
    

    Notice that hbase.zookeeper.quorum is set to localhost. Substituting

  • Do not create the /usr/local/hbase/conf/backup-masters file; with a single machine there is no backup HBase master.

  • Do not use HDFS for PredictionIO "models" storage; instead set these values in /usr/local/pio/conf/pio-env.sh:

     PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
     PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
     PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
     PIO_STORAGE_SOURCES_LOCALFS_HOSTS=$PIO_FS_BASEDIR/models
    
  • Start platform services:

     $ /usr/local/hadoop/sbin/start-dfs.sh
     $ /usr/local/spark/sbin/start-all.sh # only if Spark also runs on this machine
    
  • Start the PIO services and the EventServer:

     $ pio-start-all
    
  • To restart the PIO services:

     $ pio-stop-all
     $ jps -lm
     $ # check for orphaned HMaster, HRegionServer, or HQuorumPeer (HBase's ZooKeeper) processes
     $ # and any Console process other than the EventServer; kill them to get a clean state
     $ kill <pid>
     $ pio-start-all
    
  • Install pip and the PredictionIO Python SDK, which are used to import data into the EventServer:

     $ sudo apt-get install python-pip
     $ sudo pip install predictionio
     $ sudo pip install datetime
    
  • Get the Universal Recommender and import the sample data:

     $ git clone -b v0.3.0 https://github.com/actionml/template-scala-parallel-universal-recommendation.git universal
     $ cd universal
     $ pio app list # to see datasets in the EventServer
     $ pio app new handmade # if the app is not there
     $ python examples/import_handmade.py --access_key key-from-app-list
    
  • To retrain after any change to the data or engine.json:

     $ pio build # do this before every train
     $ pio train -- --master spark://some-master:7077 --driver-memory 3g # arguments after -- go to spark-submit
    
  • To retrain after a PIO config change, first restart PIO as described above, then retrain. There is no need to reimport data unless you have rebuilt HBase, in which case start again from "start platform services" above.
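The single-machine values from the hbase-site.xml step can be checked before starting HBase. The following is a minimal Python sketch, not part of PredictionIO or HBase: `read_hbase_site` and `check_single_machine` are hypothetical helpers, and `driver-machine` stands in for the Driver Machine's internal AWS DNS name, as in the config above. It only checks the quorum, distributed-mode, client-port, and rootdir settings.

```python
import xml.etree.ElementTree as ET

# Values this guide sets for a single-machine install.
EXPECTED = {
    "hbase.cluster.distributed": "true",
    "hbase.zookeeper.quorum": "localhost",
    "hbase.zookeeper.property.clientPort": "2181",
}


def read_hbase_site(xml_text):
    """Return the <property> name/value pairs of hbase-site.xml as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        # findtext returns None for a missing element, "" for an empty one.
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        props[name] = value
    return props


def check_single_machine(props, driver_host):
    """Return a list of problems; an empty list means the settings match."""
    problems = []
    for name, want in EXPECTED.items():
        got = props.get(name)
        if got != want:
            problems.append(f"{name} is {got!r}, expected {want!r}")
    # hbase.rootdir should live in HDFS on the Driver Machine, addressed by DNS name.
    rootdir = props.get("hbase.rootdir", "")
    if not rootdir.startswith(f"hdfs://{driver_host}:"):
        problems.append(f"hbase.rootdir {rootdir!r} is not on hdfs://{driver_host}")
    return problems
```

For example, `check_single_machine(read_hbase_site(open("/usr/local/hbase/conf/hbase-site.xml").read()), "your-internal-dns-name")` returns an empty list when the file matches.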
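The pio-env.sh lines for model storage work by indirection: `PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS` names a source, and the `PIO_STORAGE_SOURCES_LOCALFS_*` variables describe that source. A minimal Python sketch that follows this chain is below. `parse_env` and `repository_backend` are hypothetical helpers, and the parser assumes plain `KEY=VALUE` lines with no shell expansion, so `$PIO_FS_BASEDIR` is left as written.

```python
import re

# Matches KEY=VALUE, optionally prefixed with "export".
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Z0-9_]+)=(.*)$")


def parse_env(text):
    """Collect KEY=VALUE assignments from pio-env.sh-style text.

    Full-line comments are skipped; inline comments and $VAR expansion
    are not handled.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = ASSIGNMENT.match(line)
        if match:
            key, value = match.groups()
            env[key] = value.strip().strip('"')
    return env


def repository_backend(env, repo):
    """Follow repository -> source -> type, e.g. MODELDATA -> LOCALFS -> localfs."""
    source = env.get(f"PIO_STORAGE_REPOSITORIES_{repo}_SOURCE")
    if source is None:
        return None, None
    return source, env.get(f"PIO_STORAGE_SOURCES_{source}_TYPE")
```

Reading `/usr/local/pio/conf/pio-env.sh` into `parse_env` and calling `repository_backend(env, "MODELDATA")` should report the LOCALFS source when the settings from this guide are in place.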
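The orphan check in the restart step can be scripted. This Python sketch parses `jps -lm` output and flags HBase processes plus any PredictionIO `Console` process that is not running the EventServer. It assumes that HBase's managed ZooKeeper appears as `HQuorumPeer` and that the EventServer's `Console` process has `eventserver` among its arguments. `jps_output` and `find_orphans` are hypothetical names, and the sketch only reports PIDs, leaving the `kill` to you.

```python
import subprocess

# Main-class suffixes treated as leftovers once pio-stop-all has run.
# Assumption: HQuorumPeer is the HBase-managed ZooKeeper process.
HBASE_SUFFIXES = (".HMaster", ".HRegionServer", ".HQuorumPeer")


def jps_output():
    """Run `jps -lm` (ships with the JDK) and return its stdout as text."""
    result = subprocess.run(["jps", "-lm"], capture_output=True, text=True, check=True)
    return result.stdout


def find_orphans(output):
    """Return (pid, main_class) pairs from `jps -lm` output that look orphaned."""
    orphans = []
    for line in output.splitlines():
        parts = line.split()
        # Each line is "<pid> <main class or jar> [args...]"; skip anything else.
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        pid, main_class, args = int(parts[0]), parts[1], parts[2:]
        if main_class.endswith(HBASE_SUFFIXES):
            orphans.append((pid, main_class))
        # Any PIO Console process that is not running the EventServer.
        elif main_class.endswith(".Console") and "eventserver" not in args:
            orphans.append((pid, main_class))
    return orphans
```

For example, `for pid, cls in find_orphans(jps_output()): print(pid, cls)` lists candidates to `kill` before running `pio-start-all` again.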
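The build-then-train sequence from the retrain step can be wrapped in a small Python helper. `pio train` passes the arguments after the bare `--` to `spark-submit`, which is how `--master` and `--driver-memory` reach Spark. This is a sketch: `retrain_commands` and `retrain` are hypothetical names, `spark://some-master:7077` is the same placeholder used in the guide, and `dry_run=True` prints the commands instead of running them.

```python
import subprocess


def retrain_commands(master_url, driver_memory="3g", extra_spark_args=()):
    """Return the argv lists for `pio build` followed by `pio train`."""
    # Everything after the bare "--" is forwarded by pio to spark-submit.
    spark_args = ["--master", master_url, "--driver-memory", driver_memory, *extra_spark_args]
    return [["pio", "build"], ["pio", "train", "--", *spark_args]]


def retrain(engine_dir, master_url, driver_memory="3g", dry_run=False):
    """Run build then train inside engine_dir, stopping if either step fails."""
    commands = retrain_commands(master_url, driver_memory)
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises CalledProcessError, so train never runs after a failed build.
            subprocess.run(cmd, cwd=engine_dir, check=True)
    return commands
```

Running `retrain("universal", "spark://some-master:7077")` from the directory that contains the cloned engine performs the same two steps as the shell commands in the retrain step.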