PNDA can be provisioned with either Hortonworks HDP or Cloudera CDH as the Hadoop distribution.
The Hadoop distribution provides the main data storage and data processing capabilities. The distribution brings together all the upstream component projects that make up 'Hadoop' in a tested package, with pre-built binaries and APIs for automated setup.
First, decide which distribution to use based on feature set, licensing or pricing. The lists below give a basic overview of the differences.
The distribution is then selected at PNDA creation time through a single setting in the pnda_env.yaml file. The PNDA setup instructions cover this at the appropriate point.
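
A minimal sketch of how that setting might look. The key name `hadoop.HADOOP_DISTRO` is an assumption based on the layout of the example environment file distributed with the PNDA provisioning tooling, so confirm it against the template for your PNDA release before relying on it:

```yaml
# pnda_env.yaml (fragment)
# Assumed key name; verify against the template for your PNDA release.
hadoop:
  # Hadoop distribution to provision: HDP or CDH
  HADOOP_DISTRO: HDP
```

Only this one value would need to change to switch distributions, since the mirror already holds the components for both.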
The PNDA mirror (which provides all the resources required during PNDA creation) contains both CDH and HDP components, so there is no need to select one or the other when creating the PNDA mirror.
- Hortonworks HDP is 100% open source
- Uses Ambari for cluster monitoring, management, additional UIs and setup
- Provides Hive for MPP SQL queries instead of Impala (if this type of SQL query is important to your use cases, a performance evaluation on your specific workloads is advisable).
- A commercial subscription is required for support.
- The Hadoop components in Cloudera CDH are open source
- Cloudera Manager and some other components are proprietary (certain core features may be used for free but advanced cluster management features require a commercial licence)
- Uses Cloudera Manager for cluster monitoring, management and setup
- Uses Hue for additional UIs
- Provides Impala for MPP SQL queries
- A commercial subscription is required for support.
All information provided about Hortonworks and Cloudera products and services is for convenience only and may not reflect current licensing or pricing. Please visit or contact Hortonworks and Cloudera directly to determine your rights and obligations.