This repository scripts a multi-node Hadoop cluster using Vagrant.
- Clone this repository
- Run `vagrant up`
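The bring-up can be sketched as a short shell session. The repository URL below is a placeholder (replace it with the actual clone URL), and the commands are wrapped in a function so the script can be read or sourced without side effects.

```shell
#!/bin/sh
# Hypothetical clone URL -- substitute the real repository address.
REPO_URL="https://example.com/hadoop-vagrant.git"

# Sketch of the two setup steps above: clone the repo, then boot the cluster.
bring_up_cluster() {
    git clone "$REPO_URL" hadoop-vagrant &&
    cd hadoop-vagrant &&
    vagrant up &&       # provisions and boots every node defined in the Vagrantfile
    vagrant status      # each node should report "running"
}

# Call bring_up_cluster on a machine with Vagrant and a provider installed.
```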
Once the cluster is up and running:
- Log in to node1 as hduser.
- Source `~/.bashrc` to set up the environment variables.
- Set `JAVA_HOME` in the Hadoop configuration file `/usr/local/hadoop/etc/hadoop/hadoop-env.sh` (Ansible tasks will be added).
- Format the namenode: `/usr/local/hadoop/bin/hadoop namenode -format`
- Start Hadoop: `/usr/local/hadoop/sbin/start-all.sh`
- Check that the services are running with the `jps` command.
- Check that all nodes are listening: `sudo netstat -plten | grep java`
- Create a temp directory: `mkdir -p /tmp/gutenberg` (Ansible tasks already added)
- Download http://www.gutenberg.org/cache/epub/20417/pg20417.txt into `/tmp/gutenberg`.
- Create the HDFS directory: `hdfs dfs -mkdir -p /user/hduser`
- Copy the files from `/tmp` to HDFS: `hdfs dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg`
- To run the example, go to `/usr/local/hadoop/share/hadoop/mapreduce` and run `hadoop jar hadoop-mapreduce-examples-*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output`
- Copy the output back to the local filesystem: `hdfs dfs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output`
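The word-count walkthrough above can be collected into one script. It assumes the paths used in the steps above (Hadoop under `/usr/local/hadoop`, the hduser home in HDFS); the example jar name follows stock Hadoop distributions and may differ for your version. The commands are wrapped in a function so the script can be sourced safely on a machine without Hadoop.

```shell
#!/bin/sh
# Sketch of the word-count steps above, assuming Hadoop lives in /usr/local/hadoop.
HADOOP_HOME=/usr/local/hadoop
BOOK_URL="http://www.gutenberg.org/cache/epub/20417/pg20417.txt"

run_wordcount() {
    # Fetch the sample text into a local staging directory.
    mkdir -p /tmp/gutenberg
    wget -q -O /tmp/gutenberg/pg20417.txt "$BOOK_URL"

    # Stage the input in HDFS under the hduser home directory.
    hdfs dfs -mkdir -p /user/hduser
    hdfs dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg

    # Run the bundled word-count example; the jar name varies by Hadoop version.
    hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output

    # Merge the per-reducer output files back to the local filesystem.
    hdfs dfs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output
}

# Call run_wordcount on a node where Hadoop is installed and the cluster is running.
```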
License: MIT / BSD