#149: Update readme
kevinwallimann committed Jul 15, 2020
1 parent f86fcfb commit cce7667
README.md

#### Common writer properties

| Property Name | Required |Description |
| :--- | :---: | :--- |
| `writer.common.checkpoint.location` | Yes | Used for Spark property `checkpointLocation`. The checkpoint location has to be unique among different workflows. |
| `writer.common.trigger.type` | No | Either `Once` for one-time execution or `ProcessingTime` for micro-batch execution. Default: `Once`. See also [Behavior of Triggers](#behavior-of-triggers). |
| `writer.common.trigger.processing.time` | No | Interval in ms for micro-batch execution (using `ProcessingTime`). Default: 0ms, i.e. execution as fast as possible. |
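
For illustration, a minimal sketch of these properties in a workflow configuration file (the file format and the checkpoint path are assumptions; adapt them to how your deployment passes properties to Hyperdrive):

```properties
# Required: checkpoint location, must be unique per workflow (path is a placeholder)
writer.common.checkpoint.location=/tmp/checkpoints/my-workflow

# Optional: trigger type. Defaults to Once if omitted.
writer.common.trigger.type=ProcessingTime

# Optional: micro-batch interval in ms. Defaults to 0, i.e. as fast as possible.
writer.common.trigger.processing.time=10000
```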

#### Behavior of Triggers

| Trigger (`writer.common.trigger.type`) | Timeout (`ingestor.spark.termination.timeout`) | Runtime | Details |
| :--- | :--- | :--- | :--- |
| Once | No timeout | Limited | Consumes all data that is available at the beginning of the micro-batch. The query processes exactly one micro-batch and then stops, even if more data is available by the end of the micro-batch. |
| Once | With timeout | Limited | Same as above, but terminates at the timeout. If the timeout is reached before the micro-batch has been processed, the micro-batch is not completed and no data is committed. |
| ProcessingTime | With timeout | Limited | Consumes data in micro-batches and only stops when the timeout is reached or the query is killed. |
| ProcessingTime | No timeout | Long-running | Consumes data in micro-batches and only stops when the query is killed. |
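
As a sketch of the "limited runtime" combination from the table (the timeout value, and the assumption that it is given in ms, are illustrative, not confirmed by this document):

```properties
# ProcessingTime trigger with a termination timeout: the query consumes
# micro-batches and stops when the timeout is reached or the query is killed.
writer.common.trigger.type=ProcessingTime
ingestor.spark.termination.timeout=600000
```

Omitting `ingestor.spark.termination.timeout` turns this into the long-running variant from the last row of the table.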

- Note 1: The first micro-batch of the query contains all messages available for consumption and can therefore be quite large, even if the trigger `ProcessingTime` is configured and regardless of the configured micro-batch interval. To limit the size of a micro-batch, use the property `reader.option.maxOffsetsPerTrigger` (see the sketch after this list). See also http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
- Note 2: It is possible to define a timeout for the trigger `Once`. If the timeout is reached before the micro-batch has been processed, the micro-batch is not completed and no data is committed. Because this behavior is hard to predict, we don't recommend it.
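
To illustrate Note 1, a sketch of capping micro-batch size via the reader option (the value is an arbitrary example):

```properties
# Limit each micro-batch to at most 100000 Kafka offsets (example value);
# forwarded to the Spark Kafka source option maxOffsetsPerTrigger.
reader.option.maxOffsetsPerTrigger=100000
```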

See the [Spark Documentation](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers) for more information about triggers.

#### Hyperdrive Context
