Apache Spark™ is a unified analytics engine for large-scale data processing.
Features
- Speed: Run workloads up to 100x faster than Hadoop MapReduce for in-memory processing.
- Compatibility: Written in Scala, with APIs for Java, Python, R, and SQL as well.
- Generality: Combine SQL, streaming, and complex analytics.
- Runs Everywhere: Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources.
  - You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes.
  - Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
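
To make the "Runs Everywhere" point more concrete, here is a minimal Python sketch of the master URL formats Spark uses to select a cluster manager. The helper `master_url`, the `_DEFAULT_PORTS` table, and the host names are hypothetical and only for illustration; the default ports are assumptions that may not match your deployment, and Mesos support depends on the Spark version you run.

```python
from typing import Optional

# Assumed default ports for each cluster manager; adjust to your deployment.
_DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050, "kubernetes": 443}


def master_url(manager: str, host: str = "localhost", port: Optional[int] = None) -> str:
    """Return the Spark master URL string for the given cluster manager.

    Hypothetical helper, not part of Spark itself.
    """
    manager = manager.lower()
    if manager == "local":
        # Run Spark in-process, using as many worker threads as there are cores.
        return "local[*]"
    if manager == "yarn":
        # The cluster location is read from the Hadoop configuration, not the URL.
        return "yarn"
    if manager not in _DEFAULT_PORTS:
        raise ValueError(f"unknown cluster manager: {manager}")
    if port is None:
        port = _DEFAULT_PORTS[manager]
    if manager == "standalone":
        return f"spark://{host}:{port}"
    if manager == "mesos":
        return f"mesos://{host}:{port}"
    # Kubernetes: the URL points at the cluster's API server.
    return f"k8s://https://{host}:{port}"


print(master_url("standalone", "master-host"))
print(master_url("kubernetes", "api.example.com", 6443))
```

The resulting string is the value you would typically pass to `spark-submit --master` or to `SparkSession.builder.master(...)`, so the same application code can be pointed at a different cluster manager without changes.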