Python Runtime

The Python runtime provides a simple mechanism for running a transform on a set of input data to produce a set of output data, all within a single Python execution environment. Two modes of Python execution are currently supported:

  • Sequential execution: all files are processed one after another.
  • Multiprocessing pool execution: the runtime starts a user-defined number of worker processes from the Python multiprocessing pool, which allows data processing to run in parallel.

Note that some transforms, for example exact dedup, do not support the multiprocessing Python runtime because they rely on shared classes, which this runtime does not support.
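As a rough illustration of why shared state matters here, the sketch below models a dedup step that depends on one set of seen hashes. It assumes the limitation comes from each pool worker holding its own private copy of that state, which is how separate Python processes behave in general; the names SeenHashes and dedup are hypothetical and are not part of this library. The two workers are modelled with two separate instances rather than real processes.

```python
import hashlib


class SeenHashes:
    """Hypothetical stand-in for the shared state a dedup transform relies on."""

    def __init__(self):
        self._hashes = set()

    def is_new(self, doc: str) -> bool:
        """Return True the first time a document is seen, False afterwards."""
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in self._hashes:
            return False
        self._hashes.add(digest)
        return True


def dedup(docs, seen):
    """Keep only the documents that this SeenHashes instance has not seen yet."""
    return [d for d in docs if seen.is_new(d)]


file_a = ["alpha", "beta"]
file_b = ["beta", "gamma"]

# Sequential execution: both files are checked against the same instance,
# so the second "beta" is recognised as a duplicate.
shared = SeenHashes()
sequential = dedup(file_a, shared) + dedup(file_b, shared)

# Pool workers each hold a private copy of the state (modelled here with two
# separate instances), so the duplicate across files goes unnoticed.
per_worker = dedup(file_a, SeenHashes()) + dedup(file_b, SeenHashes())
```

In this sketch the sequential result contains "beta" once, while the per-worker result contains it twice, which is the kind of incorrect output such transforms would produce without state shared across workers.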

To support the multiprocessing pool runtime, Python execution introduces an additional parameter:

  • runtime_num_processors defines the number of processors to use for execution. If this number is greater than 0, the multiprocessing pool runtime is used with the number of processors equal to runtime_num_processors. The default is 0, which selects sequential execution.

This parameter lets the user choose the type of Python execution runtime and, for the multiprocessing pool, configure the degree of parallelism.
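The selection rule described above can be sketched with the standard library alone. This is a minimal illustration of the idea and not the runtime's actual implementation: process_file, select_mode, and run are hypothetical names, and the stand-in transform simply upper-cases a file name.

```python
from multiprocessing import Pool


def process_file(path: str) -> str:
    """Hypothetical stand-in for a transform applied to a single file."""
    return path.upper()


def select_mode(runtime_num_processors: int) -> str:
    """Name the execution mode implied by the parameter value."""
    if runtime_num_processors > 0:
        return "multiprocessing pool"
    return "sequential"


def run(files, runtime_num_processors: int = 0):
    """Process files sequentially, or in a pool when the parameter is > 0."""
    if runtime_num_processors > 0:
        # One worker process per requested processor; map preserves input order.
        with Pool(processes=runtime_num_processors) as pool:
            return pool.map(process_file, files)
    # Default (0): process every file in the current process, one after another.
    return [process_file(f) for f in files]
```

With the default value, run(["a.parquet", "b.parquet"]) processes both files in the calling process. When passing a positive value from a script, place the call under an `if __name__ == "__main__":` guard, which the multiprocessing module requires on platforms that start workers by spawning a fresh interpreter.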

A PythonTransformLauncher class is provided to run the transform. For example:

launcher = PythonTransformLauncher(YourTransformConfiguration())
launcher.launch()

The YourTransformConfiguration class configures your transform. More details can be found in the transform tutorial.