
Commit: Update README.md
pumpikano committed Apr 6, 2016
1 parent bacfc71 commit b70a259
Showing 1 changed file with 9 additions and 3 deletions.
README.md
# Spark Partition Server

`spark-partition-server` is a set of lightweight Python components for launching servers on the executors of an Apache Spark cluster.

## Overview

[Apache Spark](https://spark.apache.org/) is designed for manipulating and distributing data within a cluster, but not for allowing clients to interact with the data directly. `spark-partition-server` provides primitives for launching arbitrary servers on partitions of an RDD, registering and managing the partition servers on the driver, and collecting any resulting RDD after the partition servers are shut down.

There are many use cases, such as building ad hoc search clusters to query data more quickly by skipping Spark's job planning, allowing external services to interact directly with in-memory data on Spark as part of a computing pipeline, and enabling distributed computations among executors that involve direct communication (e.g., [CaffeOnSpark](https://github.com/yahoo/CaffeOnSpark)). Spark Partition Server provides building blocks for these use cases.
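To make the idea concrete, here is a minimal sketch of the underlying pattern written in plain PySpark rather than with `spark-partition-server`'s own API: each task materializes its partition behind a small HTTP server and reports the server's address back to the driver, so clients can talk to executors directly. The function and variable names below are illustrative only.

```python
import json
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

from pyspark import SparkContext


def serve_partition(index, rows):
    """Start an HTTP server over one partition and report its address."""
    data = list(rows)  # materialize this partition in executor memory

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps(data).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    server = HTTPServer(("0.0.0.0", 0), Handler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # Hand the endpoint back to the driver. A real implementation would keep
    # the task alive until an explicit shutdown signal so the server outlives
    # the job -- that lifecycle management is what spark-partition-server
    # is meant to handle.
    yield index, socket.gethostname(), server.server_port


if __name__ == "__main__":
    sc = SparkContext(appName="partition-server-sketch")
    rdd = sc.parallelize(range(100), numSlices=4)
    endpoints = rdd.mapPartitionsWithIndex(serve_partition).collect()
    print(endpoints)  # e.g. [(0, 'worker-1', 40123), (1, 'worker-2', 40124), ...]
```

A hand-rolled version like this leaves open the hard parts the Overview describes: keeping the servers alive for their clients, registering and managing them from the driver, and collecting any resulting RDD once they are shut down.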

## Installation

```
pip install spark-partition-server
```

## Simple Usage Example
