#149: Update readme
kevinwallimann committed Jul 15, 2020
1 parent f86fcfb commit cce7667
README.md

#### Common writer properties

| Property Name | Required |Description |
| :--- | :---: | :--- |
| `writer.common.checkpoint.location` | Yes | Used for Spark property `checkpointLocation`. The checkpoint location has to be unique among different workflows. |
| `writer.common.trigger.type` | No | Either `Once` for one-time execution or `ProcessingTime` for micro-batch execution. Default: `Once`. See also [Behavior of Triggers](#behavior-of-triggers). |
| `writer.common.trigger.processing.time` | No | Interval in ms for micro-batch execution (using `ProcessingTime`). Default: 0ms, i.e. execution as fast as possible. |
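
For illustration, a minimal sketch of these properties in a workflow configuration file (the file format and the checkpoint path are assumptions; adapt them to how your deployment passes properties to Hyperdrive):

```properties
# Required: checkpoint location, must be unique per workflow (path is a placeholder)
writer.common.checkpoint.location=/tmp/checkpoints/my-workflow

# Optional: trigger type. Defaults to Once if omitted.
writer.common.trigger.type=ProcessingTime

# Optional: micro-batch interval in ms. Defaults to 0, i.e. as fast as possible.
writer.common.trigger.processing.time=10000
```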

#### Behavior of Triggers

| Trigger (`writer.common.trigger.type`) | Timeout (`ingestor.spark.termination.timeout`) | Runtime | Details |
| :--- | :--- | :--- | :--- |
| Once | No timeout | Limited | Consumes all data that is available at the beginning of the micro-batch. The query processes exactly one micro-batch and then stops, even if more data is available by the end of the micro-batch. |
| Once | With timeout | Limited | Same as above, but terminates at the timeout. If the timeout is reached before the micro-batch has been processed, the micro-batch is not completed and no data is committed. |
| ProcessingTime | With timeout | Limited | Consumes data in micro-batches and only stops when the timeout is reached or the query is killed. |
| ProcessingTime | No timeout | Long-running | Consumes data in micro-batches and only stops when the query is killed. |
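
As a sketch of the "limited runtime" combination from the table (the timeout value, and the assumption that it is given in ms, are illustrative, not confirmed by this document):

```properties
# ProcessingTime trigger with a termination timeout: the query consumes
# micro-batches and stops when the timeout is reached or the query is killed.
writer.common.trigger.type=ProcessingTime
ingestor.spark.termination.timeout=600000
```

Omitting `ingestor.spark.termination.timeout` turns this into the long-running variant from the last row of the table.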

- Note 1: The first micro-batch of the query contains all messages available for consumption and can therefore be quite large, even if the trigger `ProcessingTime` is configured and regardless of the configured micro-batch interval. To limit the size of a micro-batch, use the property `reader.option.maxOffsetsPerTrigger` (see the sketch after this list). See also http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
- Note 2: It is possible to define a timeout for the trigger `Once`. If the timeout is reached before the micro-batch has been processed, the micro-batch is not completed and no data is committed. Because this behavior is hard to predict, we don't recommend it.
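
To illustrate Note 1, a sketch of capping micro-batch size via the reader option (the value is an arbitrary example):

```properties
# Limit each micro-batch to at most 100000 Kafka offsets (example value);
# forwarded to the Spark Kafka source option maxOffsetsPerTrigger.
reader.option.maxOffsetsPerTrigger=100000
```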

See the [Spark Documentation](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers) for more information about triggers.

#### Hyperdrive Context
