This repository scripts a multi-node Hadoop cluster using Vagrant.
- Clone this repository
- Run `vagrant up`
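The bring-up can be sketched as a short shell session. The repository URL below is a placeholder (replace it with the actual clone URL), and the commands are wrapped in a function so the script can be read or sourced without side effects.

```shell
#!/bin/sh
# Hypothetical clone URL -- substitute the real repository address.
REPO_URL="https://example.com/hadoop-vagrant.git"

# Sketch of the two setup steps above: clone the repo, then boot the cluster.
bring_up_cluster() {
    git clone "$REPO_URL" hadoop-vagrant &&
    cd hadoop-vagrant &&
    vagrant up &&       # provisions and boots every node defined in the Vagrantfile
    vagrant status      # each node should report "running"
}

# Call bring_up_cluster on a machine with Vagrant and a provider installed.
```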
Once the cluster is up and running:
- Log in to node1 as hduser.
- Source `~/.bashrc` to set up the environment variables.
- Set `JAVA_HOME` in the Hadoop configuration file `/usr/local/hadoop/etc/hadoop/hadoop-env.sh` (Ansible tasks will be added).
- Format the namenode: `/usr/local/hadoop/bin/hadoop namenode -format`
- Start Hadoop: `/usr/local/hadoop/sbin/start-all.sh`
- Check that the services are running with the `jps` command.
- Check that all nodes are listening: `sudo netstat -plten | grep java`
- Create a temp directory: `mkdir -p /tmp/gutenberg` (Ansible tasks already added)
- Download http://www.gutenberg.org/cache/epub/20417/pg20417.txt into `/tmp/gutenberg`.
- Create the HDFS directory: `hdfs dfs -mkdir -p /user/hduser`
- Copy the files from `/tmp` to HDFS: `hdfs dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg`
- To run the example, go to `/usr/local/hadoop/share/hadoop/mapreduce` and run `hadoop jar hadoop-mapreduce-examples-*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output`
- Copy the output back to the local filesystem: `hdfs dfs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output`
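The word-count walkthrough above can be collected into one script. It assumes the paths used in the steps above (Hadoop under `/usr/local/hadoop`, the hduser home in HDFS); the example jar name follows stock Hadoop distributions and may differ for your version. The commands are wrapped in a function so the script can be sourced safely on a machine without Hadoop.

```shell
#!/bin/sh
# Sketch of the word-count steps above, assuming Hadoop lives in /usr/local/hadoop.
HADOOP_HOME=/usr/local/hadoop
BOOK_URL="http://www.gutenberg.org/cache/epub/20417/pg20417.txt"

run_wordcount() {
    # Fetch the sample text into a local staging directory.
    mkdir -p /tmp/gutenberg
    wget -q -O /tmp/gutenberg/pg20417.txt "$BOOK_URL"

    # Stage the input in HDFS under the hduser home directory.
    hdfs dfs -mkdir -p /user/hduser
    hdfs dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg

    # Run the bundled word-count example; the jar name varies by Hadoop version.
    hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output

    # Merge the per-reducer output files back to the local filesystem.
    hdfs dfs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output
}

# Call run_wordcount on a node where Hadoop is installed and the cluster is running.
```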
License: MIT / BSD