######################################
######################################
- Installation of Virtual machine:
- Go to https://www.virtualbox.org and download the version for your OS. (I have tried this with Ubuntu.)
- After the download finishes, run the installer (the .exe file on Windows) and install VirtualBox on your machine.
- Download image for Hadoop:
- Go to https://www.cloudera.com/downloads.html and click on the 'Download Hortonworks Sandbox' link.
- Click on the 'Download Now' button for Hortonworks HDP.
- Choose "Virtual Box" as the installation type in the 'Get Started Now' section and click on the 'Let's Go' button.
- Fill in the details, click on 'Continue', and then click on the 'Submit' button.
- Under the Sandbox HDP VirtualBox downloads, download version 2.5.0.
- Open the downloaded .ova file.
- Click on 'Import'.
- Click on 'Start' to start the machine.
- Web UI for Hadoop Ambari:
- Go to http://127.0.0.1:8888/.
- Click on 'Please Disable Popup Blocker'.
- Sign in with username: maria_dev and password: maria_dev
- To reset the password for user 'admin', run:
  - su root
  - ambari-admin-password-reset
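
If the page at http://127.0.0.1:8888/ does not load, the VM may still be booting. The following is a minimal sketch, not part of the sandbox tooling, that checks whether the port is accepting connections. The helper name `is_port_open` and the 3-second timeout are assumptions made for illustration.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 127.0.0.1:8888 is the sandbox address used in the steps above.
    if is_port_open("127.0.0.1", 8888):
        print("Port 8888 is reachable; try opening the web UI.")
    else:
        print("Port 8888 is not reachable yet; the VM may still be starting.")
```

An open port only shows that something is listening; it does not confirm that Ambari itself has finished starting.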
- Data preparation:
- You can get the data directly from here: https://github.com/Kavita-Yadav/Learning-Hadoop-and-bigData/tree/master/MovieLensData.
- It has a detailed description of the data from GroupLens.
OR
- Go to https://grouplens.org/ (direct link for the 100k dataset: https://grouplens.org/datasets/movielens/100k/).
- Download the MovieLens 100k dataset by downloading the 'ml-100k.zip' data file.
- Unzip 'ml-100k.zip'.
- You can also try this with the 1M, 10M and 20M datasets from https://grouplens.org/datasets/movielens/.
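
The download and unzip steps can also be scripted. This is a rough sketch using only the Python standard library. The direct file URL in `ML_100K_URL` is an assumption and should be checked against the dataset page linked above. The helper names `download` and `unzip` are hypothetical.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# Assumed direct link to the archive; verify it on the GroupLens dataset page.
ML_100K_URL = "https://files.grouplens.org/datasets/movielens/ml-100k.zip"


def download(url: str, dest: Path) -> Path:
    """Download url to dest, skipping the download if dest already exists."""
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def unzip(archive: Path, target_dir: Path) -> List[str]:
    """Extract archive into target_dir and return the names of its members."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

For example, `unzip(download(ML_100K_URL, Path("ml-100k.zip")), Path("."))` would fetch the archive into the current directory and extract it there, assuming the URL is still valid.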