first commit

pndaproject · Jul 4, 2016 · dd6b1eb
commit dd6b1eb

Showing 158 changed files with 6,666 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,9 @@
+_book
+pnda-guide.pdf
+pnda-guide.zip
+pnda-guide.epub
+pnda-guide.mobi
+*.graffle
+node_modules
+archives
+scripts
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,92 @@
+# Change Log
+All notable changes to this project will be documented in this file.
+
+## [0.1.0] 2016-07-01
+### First version
+
+## [Pre-release]
+
+### 2016-06-30
+
+* Added package server instructions.
+* Updated website to [pnda.io](http://pnda.io).
+
+### 2016-06-29
+
+* Fixed deployment issue. 
+
+### 2016-06-28
+
+* Renamed PaNDA to PNDA.  
+
+### 2016-06-17
+
+* Renamed data exploration [lab](exploration/lab.md) to tutorial.
+* Added links to repository. 
+
+### 2016-06-16
+
+* Renamed platform-heat-templates to pnda-heat-templates, platform-dib-elements to pnda-dib-elements. 
+* Updated standard PNDA flavor.
+
+### 2016-06-09
+
+* Updated repos.
+* Removed links between repos, as they work in the guide but not on the GitHub website.
+
+### 2016-06-08
+
+* [jupyter lab](exploration/lab.md): Added Jupyter lab.
+* [platform requirements](provisioning/platform_requirements.md): Updated resource requirements.
+* Removed TODO file.
+
+### 2016-06-08
+
+* [downloads](downloads/README.md): Added links to download book.
+* [date](README.md): The last updated date is now set automatically.
+
+### 2016-06-07
+
+* [getting started](gettingstarted/README.md): Explained "use the data generation tools to generate test data sets".
+* [getting started](gettingstarted/README.md): On last line, better instructions for opening Impala and what to look for.
+* [datasets](console/datasets.md): Added keep mode, removed nolimit policy.
+* Updated multiple repo README files.
+* Updated console screenshots with new design.
+* Added placeholders for [consumer](consumer/lab.md) and [exploration](exploration/lab.md) labs.
+* [cover](cover.jpg): Added cover.
+* [provisioning/heat.md](provisioning/heat.md): Added getting started with Heat document. 
+* [jupyter](exploration/jupyter.md): Added page that explains what you can do with a notebook.
+* [HBase tutorial](applications/ksh.md): Added lab from DevNet.
+* [OpenTSDB tutorial](applications/kso.md): Added lab from DevNet.
+
+### 2016-05-20
+
+* [console](console/README.md): Added better description of console workflow, and detailed screenshots.
+* [references](others/README.md): Assembled all external hyperlinks into one page.
+* [deployment manager](repos/platform-deployment-manager/README.md): Removed entire section that was already copied to the [applications](applications/README.md) page. Moved design section to the beginning of the chapter.
+* [openbmp](producer/openbmp.md): Removed "useful links" that were not very useful.
+* [examples](applications/examples.md): Added links to example repos.
+* [opendl](producer/opendl.md): Added link to PNDA integration instructions.
+* [example-spark-batch](repos/example-spark-batch/README.md): Added better build instructions.
+* [example-spark-streaming](repos/example-spark-streaming/README.md): Added better build instructions.
+* [deployment manager](repos/platform-deployment-manager/README.md): Added requirements and building sections.
+* [README](README.md): The version number is now updated from the git release version using scripts/version.sh.
+
+### 2016-05-17
+
+* [CHANGELOG.md](CHANGELOG.md): Added this file.
+* [TODO.md](TODO.md): Added file.
+* [provisioning/heat.md](provisioning/heat.md): Added new document with a link to the Heat wiki.
+* [security](security/README.md): Replaced with new blueprint, and removed everything else under security.
+* [provisioning/platform_requirements.md](provisioning/platform_requirements.md): Formatted resource requirements into tables.
+* [log-aggregation](log-aggregation/README.md): Added file. Added external links.
+* [introduction](README.md): Added paragraph for log aggregation. Shortened info for applications and packages.
+* [pnda-dib-elements](repos/pnda-dib-elements/README.md): Updated formatting.
+* [metrics](console/metrics.md): Added link to log aggregation page.
+* Capitalization: Heat not HEAT, YARN not Yarn.
+* Spacing: [80 GB not 80GB](http://www.engadget.com/2010/12/16/32-gb-versus-32gb-almost-everyone-is-writing-it-wrong/).
+* [logstash](repos/prod-logstash-codec-avro/README.md): Replaced producer/logstash.md with prod-logstash-codec-avro readme.
+* [packages](console/packages.md): Added link to getting started with info on uploading packages.
+* [datasets](console/datasets.md): Added link to getting started with info on where datasets come from.
+* [pmacct](producer/pmacct.md): Moved after OpenBMP.
+* [consumers](consumer/README.md): Removed links to DevNet Learning Labs.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,13 @@
+Copyright (c) 2016 Cisco and/or its affiliates.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
diff --git a/README.md b/README.md
@@ -0,0 +1,168 @@
+# PNDA Guide
+
+PNDA is a simple, scalable, open big data platform supporting operational and business intelligence analysis for networks and services. This guide provides an overview of PNDA and explains how to set up and use it in your own environment.
+
+This guide covers PNDA release 3. 
+
+Last updated: June 30, 2016
+
+Version: 0.1.2
+
+## [Overview](overview/README.md)
+
+This chapter covers the main components of PNDA, including:
+
+- Data ingress using Logstash, OpenDaylight & the bulk ingest tool
+- Data distribution with Kafka & ZooKeeper
+- High velocity stream processing with Spark Streaming
+- High volume batch processing with Spark
+- Free form data exploration with Jupyter
+- Structured query over big data with Impala
+- Handling time series with OpenTSDB & Grafana
+
+## [Download Book](downloads/README.md)
+
+You can read the latest version of this guide online, or download the book in a number of formats.
+
+## [Getting Started](gettingstarted/README.md)
+
+This checklist will get you started setting up a fully operational PNDA cluster, with data flowing in and out.
+
+## [Provisioning](provisioning/README.md)
+
+This chapter describes how to provision a PNDA cluster, and includes some background information on SaltStack and OpenStack Heat.
+
+ * [Platform requirements](provisioning/platform_requirements.md)
+ * [Getting started with Heat](provisioning/heat.md)
+ * [Creating images for use with Heat templates](repos/pnda-dib-elements/README.md)
+ * [Using the PNDA Heat templates](repos/pnda-heat-templates/README.md)
+ * [Getting started with SaltStack](provisioning/saltstack.md)
+ * [Provisioning with Salt Cloud](provisioning/salt-cloud.md)
+ * [Configuring a Salt Master](provisioning/saltmaster.md)
+
+## [Console](console/README.md)
+
+The PNDA console provides a real-time overview of all the components in a PNDA cluster. The home page shows health statistics for each component, color-coded by status. Components are grouped into categories, including data distribution, data processing, data storage, applications, etc.
+
+Other pages on the console let you view detailed metrics, deploy packages, run applications, and set data retention policies. 
+
+ * [Metrics](console/metrics.md)
+ * [Packages](console/packages.md)
+ * [Applications](console/applications.md)
+ * [Datasets](console/datasets.md)
+
+## [Producers](producer/README.md)
+
+Kafka is the "front door" of PNDA. It handles ingest of data streams from network sources and distributes data to all interested consumers. This chapter covers how to integrate and develop "producers", which feed data into Kafka.
+
+ * [Preparing data](producer/data-preparation.md)
+ * [Integrating Logstash](repos/prod-logstash-codec-avro/README.md)
+ * [Integrating OpenDaylight](producer/opendl.md)
+ * [Integrating OpenBMP](producer/openbmp.md)
+ * [Integrating Pmacct](producer/pmacct.md)
+ * [Developing a custom producer](producer/producer.md)
+
+## [Bulk Ingest](bulkingest/README.md)
+
+In addition to streaming ingest via Kafka producers, PNDA also provides an offline bulk ingest tool for those who would like to migrate pre-existing data into the PNDA platform. 
+
+ * [Bulk-ingest tool](repos/platform-tools/bulkingest/README.md)
+
+## [Consumers](consumer/README.md)
+
+Kafka has a simple, clean design that moves complexity traditionally found inside message brokers into its producers and consumers. A Kafka consumer pulls messages from one or more topics, using ZooKeeper for discovery and issuing fetch requests to the brokers leading the partitions it wants to consume. Rather than the broker maintaining state and controlling the flow of data, each consumer controls the rate at which it consumes messages.
+
+## [Packages & Applications](applications/README.md)
+
+Packages are independently deployable units of application layer functionality, and applications are instances of packages. You can use the PNDA console to deploy packages and manage the application lifecycle. The Deployment Manager documentation explains the structure of packages, and the REST API used to deploy them. 
+
+ * [Deployment Manager](repos/platform-deployment-manager/README.md)
+ * [Example Applications](applications/examples.md)
+ * [Spark Streaming and HBase tutorial](applications/ksh.md)
+ * [Spark Streaming and OpenTSDB tutorial](applications/kso.md)
+
+## [Log Aggregation](log-aggregation/README.md)
+
+Logs from the various component services that make up PNDA, and the applications that run on PNDA, are collected and stored on the logserver node. 
+
+## [Structured Query](query/README.md)
+
+Apache Impala is a parallel execution engine for SQL queries. It supports low-latency access and interactive exploration of data in HDFS and HBase. Impala allows data to be stored in raw form, with aggregation performed at query time rather than upfront.
+
+* [Impala](query/impala.md)
+
+## [Data Exploration](exploration/README.md)
+
+The [Jupyter Notebook](http://jupyter.org) is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. In PNDA, it supports exploration and presentation of data from HDFS and HBase.
+
+* [Using Jupyter](exploration/jupyter.md)
+* [Exploratory data analytics tutorial](exploration/lab.md)
+
+## [Time Series](timeseries/README.md)
+
+OpenTSDB is a scalable time series database that lets you store and serve massive amounts of time series data, without losing granularity. Grafana is a graph and dashboard builder for visualizing time series metrics.
+
+* [OpenTSDB](timeseries/opentsdb.md)
+* [Grafana](timeseries/grafana.md)
+
+## [Security](security/README.md)
+
+A big data infrastructure like PNDA involves a multitude of technologies and tools, and may be deployed in a multi-tenant environment. Providing enterprise-grade security for such a system is complex, and it is a primary concern for any production deployment. If you are implementing a client for a PNDA interface or developing a PNDA application, this chapter covers security guidelines that you should follow when working with individual components.
+
+## [Repositories](repos/README.md)
+
+The PNDA distribution consists of the following source code repositories and sub-projects:
+
+### Provisioning
+
+ * [platform-salt](repos/platform-salt/README.md): provisioning logic for creating PNDA
+ * [platform-salt-cloud](repos/platform-salt-cloud/README.md): cluster templates for creating PNDA with salt-cloud
+ * [pnda-heat-templates](repos/pnda-heat-templates/README.md): cluster templates for creating PNDA with Heat
+ * [pnda-dib-elements](repos/pnda-dib-elements/README.md): tools for building disk image templates
+ * [pnda-package-server-docker](repos/pnda-package-server-docker/README.md): tools for creating package server
+
+### Platform
+
+ * [platform-libraries](repos/platform-libraries/README.md): libraries for working with interactive notebooks
+ * [platform-tools](repos/platform-tools/README.md): tools for operating a cluster
+     * [bulkingest](repos/platform-tools/bulkingest/README.md): tools for performing a bulk ingest of data
+ * [platform-console-frontend](repos/platform-console-frontend/README.md): “single pane of glass” giving operational overview and access to application and data management functions
+ * [platform-console-backend](repos/platform-console-backend/README.md): APIs that provide data to the console frontend
+   * [console-backend-data-logger](repos/platform-console-backend/console-backend-data-logger/README.md): APIs to ingest data
+   * [console-backend-data-manager](repos/platform-console-backend/console-backend-data-manager/README.md): APIs to provide data
+ * [platform-testing](repos/platform-testing/README.md): modules that test both the end to end platform and individual components and collect metrics
+ * [platform-deployment-manager](repos/platform-deployment-manager/README.md): API to manage packages and application deployment and lifecycle
+ * [platform-data-mgmnt](repos/platform-data-mgmnt/README.md): tools to manage data retention
+   * [data-service](repos/platform-data-mgmnt/data-service/README.md): API to set data retention policies
+   * [hdfs-cleaner](repos/platform-data-mgmnt/hdfs-cleaner/README.md): cron job to clean up HDFS data
+   * [oozie-templates](repos/platform-data-mgmnt/oozie-templates/README.md): templates that archive or delete data
+ * [platform-elk-dashboards](repos/platform-elk-dashboards/README.md): pre-configured ELK dashboards
+ * [platform-package-repository](repos/platform-package-repository/README.md): manages a simple package repository backed by OpenStack Swift
+
+### Forked Projects
+
+ * [gobblin](repos/gobblin/README.md): customized fork of the Gobblin data ingest framework
+
+### Producers
+
+ * [prod-odl-kafka](repos/prod-odl-kafka/README.md): plugin to ingest data from OpenDaylight
+ * [prod-logstash-codec-avro](repos/prod-logstash-codec-avro/README.md): plugin to ingest data from Logstash
+
+### Examples
+
+ * [example-spark-batch](repos/example-spark-batch/README.md): example batch data processing application
+ * [example-spark-streaming](repos/example-spark-streaming/README.md): example streaming data processing application
+ * [example-jupyter-notebooks](repos/example-jupyter-notebooks/README.md): examples for working with Jupyter notebooks
+ * [example-kafka-clients](repos/example-kafka-clients/README.md): examples for working with Kafka clients
+   * [java](repos/example-kafka-clients/java/README.md)
+   * [php](repos/example-kafka-clients/php/README.md)
+   * [python](repos/example-kafka-clients/python/README.md)
+ * [example-kafka-spark-opentsdb-app](repos/example-kafka-spark-opentsdb-app/README.md): example consumer that feeds data to OpenTSDB
+
+### Documentation
+
+ * [pnda-guide](README.md): this guide
+
+## [References](others/README.md)
+
+## [Changelog](CHANGELOG.md)
diff --git a/SUMMARY.md b/SUMMARY.md
@@ -0,0 +1,85 @@
+# Summary
+
+* [PNDA Guide](README.md)
+* [Overview](overview/README.md)
+* [Download Book](downloads/README.md)
+* [Getting Started](gettingstarted/README.md)
+* [Provisioning](provisioning/README.md)
+    * [Platform requirements](provisioning/platform_requirements.md)
+    * [Getting started with Heat](provisioning/heat.md)
+    * [Creating images for use with Heat templates](repos/pnda-dib-elements/README.md)
+    * [Using the PNDA Heat templates](repos/pnda-heat-templates/README.md)
+    * [Building the package server](repos/pnda-heat-templates/README.md)
+    * [Getting started with SaltStack](provisioning/saltstack.md)
+    * [Provisioning with Salt Cloud](provisioning/salt-cloud.md)
+    * [Configuring a Salt Master](provisioning/saltmaster.md)
+* [Console](console/README.md)
+    * [Metrics](console/metrics.md)
+    * [Packages](console/packages.md)
+    * [Applications](console/applications.md)
+    * [Datasets](console/datasets.md)
+* [Producers](producer/README.md)
+    * [Preparing data](producer/data-preparation.md)
+    * [Integrating Logstash](repos/prod-logstash-codec-avro/README.md)
+    * [Integrating OpenDaylight](producer/opendl.md)
+    * [Integrating OpenBMP](producer/openbmp.md)
+    * [Integrating Pmacct](producer/pmacct.md)
+    * [Developing a producer](producer/producer.md)
+* [Bulk Ingest](bulkingest/README.md)
+    * [Bulk-ingest tool](repos/platform-tools/bulkingest/README.md)
+* [Consumers](consumer/README.md)
+* [Packages & Applications](applications/README.md)
+    * [Deployment Manager](repos/platform-deployment-manager/README.md)
+    * [Example Applications](applications/examples.md)
+    * [Spark Streaming and HBase tutorial](applications/ksh.md)
+    * [Spark Streaming and OpenTSDB tutorial](applications/kso.md)
+* [Log Aggregation](log-aggregation/README.md)
+* [Structured Query](query/README.md)
+    * [Impala](query/impala.md)
+* [Data Exploration](exploration/README.md)
+    * [Jupyter](repos/example-jupyter-notebooks/README.md)
+    * [Exploratory data analytics tutorial](exploration/lab.md)
+* [Time Series](timeseries/README.md)
+    * [OpenTSDB](timeseries/opentsdb.md)
+    * [Grafana](timeseries/grafana.md)
+* [Security](security/README.md)
+* [Repositories](repos/README.md)
+    * Provisioning
+        * [platform-salt](repos/platform-salt/README.md)
+        * [platform-salt-cloud](repos/platform-salt-cloud/README.md)
+        * [pnda-heat-templates](repos/pnda-heat-templates/link.md)
+        * [pnda-dib-elements](repos/pnda-dib-elements/link.md)
+        * [pnda-package-server-docker](repos/pnda-package-server-docker/link.md)
+    * Platform
+        * [platform-libraries](repos/platform-libraries/README.md)
+        * [platform-tools](repos/platform-tools/README.md)
+        * [platform-console-frontend](repos/platform-console-frontend/README.md)
+        * [platform-console-backend](repos/platform-console-backend/README.md)
+            * [console-backend-data-logger](repos/platform-console-backend/console-backend-data-logger/README.md)
+            * [console-backend-data-manager](repos/platform-console-backend/console-backend-data-manager/README.md)
+        * [platform-testing](repos/platform-testing/README.md)
+        * [platform-deployment-manager](repos/platform-deployment-manager/link.md)
+        * [platform-data-mgmnt](repos/platform-data-mgmnt/README.md)
+            * [data-service](repos/platform-data-mgmnt/data-service/README.md)
+            * [hdfs-cleaner](repos/platform-data-mgmnt/hdfs-cleaner/README.md)
+            * [oozie-templates](repos/platform-data-mgmnt/oozie-templates/README.md)
+        * [platform-elk-dashboards](repos/platform-elk-dashboards/README.md)
+        * [platform-package-repository](repos/platform-package-repository/README.md)
+    * Forked Projects
+        * [gobblin](repos/gobblin/README.md)
+    * Producers
+        * [prod-odl-kafka](repos/prod-odl-kafka/README.md)
+        * [prod-logstash-codec-avro](repos/prod-logstash-codec-avro/link.md)
+    * Examples
+        * [example-spark-batch](repos/example-spark-batch/README.md)
+        * [example-spark-streaming](repos/example-spark-streaming/README.md)
+        * [example-jupyter-notebooks](repos/example-jupyter-notebooks/link.md)
+        * [example-kafka-clients](repos/example-kafka-clients/README.md)
+            * [java](repos/example-kafka-clients/java/README.md)
+            * [php](repos/example-kafka-clients/php/README.md)
+            * [python](repos/example-kafka-clients/python/README.md)
+        * [example-kafka-spark-opentsdb-app](repos/example-kafka-spark-opentsdb-app/README.md)
+    * Documentation
+        * pnda-guide
+* [References](others/README.md)
+* [Changelog](CHANGELOG.md)