Apache Spark

Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

http://spark.apache.org/

Online Documentation

You can find the latest Spark documentation, including a programming guide, on the project web page.

Python Packaging

This README file only contains basic information related to pip-installed PySpark. This packaging is currently experimental and may change in future versions (although we will do our best to keep compatibility). Using PySpark requires the Spark JARs; if you are building this from source, please see the build instructions at "Building Spark".

The Python packaging for Spark is not intended to replace all of the other use cases. This Python-packaged version of Spark is suitable for interacting with an existing cluster (be it Spark standalone, YARN, or Mesos), but it does not contain the tools required to set up your own standalone Spark cluster. You can download the full version of Spark from the Apache Spark downloads page.
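Because a pip-installed PySpark acts only as a client to a cluster that already exists, the main thing it needs is the cluster's master URL. As a minimal sketch, the helper below assembles the master URL formats Spark documents for each cluster manager: `spark://HOST:PORT` for standalone (default port 7077), plain `yarn` for YARN, and `mesos://HOST:PORT` for Mesos (default port 5050). The function name `master_url` is hypothetical and not part of the PySpark API, and the commented usage assumes a reachable standalone master at a placeholder host.

```python
def master_url(manager, host=None, port=None):
    """Return a Spark master URL for an existing cluster (hypothetical helper)."""
    manager = manager.lower()
    if manager == "yarn":
        # In YARN mode the ResourceManager address comes from the Hadoop
        # configuration (HADOOP_CONF_DIR / YARN_CONF_DIR), so no host or port.
        return "yarn"
    # Scheme and default master port for the other supported cluster managers.
    defaults = {"standalone": ("spark", 7077), "mesos": ("mesos", 5050)}
    if manager not in defaults:
        raise ValueError("unsupported cluster manager: %r" % manager)
    if not host:
        raise ValueError("%s requires a master host" % manager)
    scheme, default_port = defaults[manager]
    return "%s://%s:%d" % (scheme, host, port or default_port)


# Usage sketch (needs pyspark and a running cluster; the host is a placeholder):
# from pyspark.sql import SparkSession
# spark = (SparkSession.builder
#          .master(master_url("standalone", "master.example.com"))
#          .appName("pip-installed-client")
#          .getOrCreate())
```

Keeping URL construction separate from session creation makes the cluster choice easy to switch through configuration without touching the rest of the application.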

NOTE: If you are using this with a Spark standalone cluster, you must ensure that the PySpark version (including the minor version) matches the cluster's version, or you may experience odd errors.
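The version requirement in the note can be checked before connecting. As a minimal sketch, the hypothetical helpers `major_minor` and `versions_match` below compare only the major and minor components of two version strings and ignore the patch level; treating the patch level as irrelevant is an interpretation of the note, not a stated guarantee. The client version would typically come from `pyspark.__version__`, while the cluster version is assumed to be known from your deployment (shown here as a placeholder).

```python
import re


def major_minor(version):
    """Extract (major, minor) from a version string such as '2.3.0' or '2.3.0.dev0'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def versions_match(client_version, cluster_version):
    """True when both versions share the same major and minor components."""
    return major_minor(client_version) == major_minor(cluster_version)


# Usage sketch: "2.2.1" is a placeholder for the standalone cluster's version.
# import pyspark
# if not versions_match(pyspark.__version__, "2.2.1"):
#     raise RuntimeError("PySpark client and standalone cluster versions differ")
```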

Python Requirements

At its core, PySpark depends on Py4J (currently version 0.10.6), but some additional sub-packages have their own extra requirements for some features (including numpy, pandas, and pyarrow).
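Since numpy, pandas, and pyarrow are only needed by some features, it can be useful to see which of them are importable before relying on those features. A minimal sketch using only the standard library's `importlib.util.find_spec`: the helper name `missing_modules` and the `CORE`/`OPTIONAL` groupings are hypothetical, and the sketch does not check minimum versions of any package.

```python
import importlib.util

# Py4J is the core dependency; the others are optional, per-feature extras.
CORE = ("py4j",)
OPTIONAL = ("numpy", "pandas", "pyarrow")


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing core:", missing_modules(CORE) or "none")
    print("missing optional:", missing_modules(OPTIONAL) or "none")
```

`find_spec` locates a module without executing it, so the check stays cheap even for heavy packages such as pandas.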
