Hadoop Distributions

Choice of Hadoop Distribution

PNDA can be provisioned with either Hortonworks HDP or Cloudera CDH as the Hadoop distribution.

The Hadoop distribution provides the main data storage and data processing capabilities. The distribution brings together all the upstream component projects that make up 'Hadoop' in a tested package, with pre-built binaries and APIs for automated setup.

How to select the distribution

First, decide which distribution to use based on feature set, licensing, or pricing. See below for a basic overview of the differences.

The distribution is then selected at PNDA creation time through a single setting in the pnda_env.yaml file. The PNDA setup instructions cover this at the appropriate point.
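As a rough illustration of the single-setting approach, the sketch below validates a distribution choice once pnda_env.yaml has been parsed into a dictionary. The `hadoop` / `HADOOP_DISTRO` key names and the `select_distro` helper are assumptions made for this example; consult the PNDA setup instructions for the actual setting name and accepted values.

```python
# Hypothetical sketch: key names and helper are illustrative assumptions,
# not taken from the PNDA documentation.
SUPPORTED_DISTROS = {"HDP", "CDH"}


def select_distro(env: dict) -> str:
    """Return the Hadoop distribution named in a parsed pnda_env.yaml mapping."""
    hadoop_section = env.get("hadoop") or {}
    distro = str(hadoop_section.get("HADOOP_DISTRO", "")).upper()
    if distro not in SUPPORTED_DISTROS:
        raise ValueError(
            f"HADOOP_DISTRO must be one of {sorted(SUPPORTED_DISTROS)}, got {distro!r}"
        )
    return distro


# Example mapping, as a YAML parser might produce it from pnda_env.yaml
env = {"hadoop": {"HADOOP_DISTRO": "HDP"}}
print(select_distro(env))  # prints "HDP"
```

Validating the value up front means a misspelled distribution name fails immediately rather than partway through provisioning.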

The PNDA mirror (which provides all the resources required during PNDA creation) contains both CDH and HDP components, so there is no need to select one or the other when creating the PNDA mirror.

Hortonworks HDP

  • Hortonworks HDP is 100% open source
  • Uses Ambari for cluster monitoring, management, additional UIs and setup
  • Provides Hive for MPP SQL queries instead of Impala. If this type of SQL query is important to your use cases, a performance evaluation on your specific workloads is advisable.
  • A commercial subscription is required for support.

Cloudera CDH & Cloudera Manager

NOTE

All information provided about Hortonworks and Cloudera products and services is for convenience purposes only and may not reflect current licensing or pricing. Please visit or contact Hortonworks and Cloudera directly to determine your rights and obligations.