Spark

Apache Spark™ is a unified analytics engine for large-scale data processing.

Overview

Features

  • Speed: Run workloads up to 100x faster than Hadoop MapReduce.
  • Compatibility: Written in Scala, with APIs for Java, Python, R, and SQL.
  • Generality: Combine SQL, streaming, and complex analytics.
  • Runs Everywhere: Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources.
    • You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes.
    • Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
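
The cluster managers listed above are selected through the master URL passed to `spark-submit --master` or `SparkSession.builder.master(...)`. As a minimal sketch, the helper below builds that URL for each manager. The function `master_url`, the host name, and the port are hypothetical and used only for illustration; the URL schemes themselves follow Spark's documented formats.

```python
# Hypothetical helper (not part of Spark): builds the master URL string
# that Spark expects for each cluster manager.
_URL_TEMPLATES = {
    "standalone": "spark://{host}:{port}",
    "mesos": "mesos://{host}:{port}",
    "kubernetes": "k8s://https://{host}:{port}",
}


def master_url(manager, host=None, port=None):
    """Return the Spark master URL for the given cluster manager."""
    if manager == "local":
        # Run in-process with one worker thread per available core.
        return "local[*]"
    if manager == "yarn":
        # The cluster location is read from the Hadoop configuration.
        return "yarn"
    if manager not in _URL_TEMPLATES:
        raise ValueError(f"unknown cluster manager: {manager!r}")
    if host is None or port is None:
        raise ValueError(f"{manager!r} needs a host and a port")
    return _URL_TEMPLATES[manager].format(host=host, port=port)


# Example host and port are placeholders, not real endpoints.
print(master_url("standalone", "master.example.com", 7077))
# prints "spark://master.example.com:7077"
```

The resulting string can be passed unchanged as the `--master` argument, so the same application code can move between local runs and a cluster without modification.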

Libraries

MLlib for machine learning

Spark Modes

Links