Commit
chore: start beta5 updates (#312)
morenol authored Dec 15, 2024
1 parent da37cd2 commit ecef3e8
Showing 7 changed files with 95 additions and 39 deletions.
2 changes: 1 addition & 1 deletion sdf/SDF_VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
sdf-beta4
sdf-beta5
2 changes: 1 addition & 1 deletion sdf/_embeds/install-sdf.bash
@@ -1 +1 @@
fvm install sdf-beta4
fvm install sdf-beta5
20 changes: 12 additions & 8 deletions sdf/cli/deploy.mdx
@@ -56,17 +56,17 @@ SDF - Stateful Dataflow
Usage: <COMMAND>

Commands:
show Show or List states. Use `show state --help` for more info
select
delete
restart
stop
show Show or List states or dataflows Use `show --help` for more info
select Select dataflow in context
delete Delete a dataflow
restart Restart a dataflow
stop Stop a dataflow
sql Start sql mode
exit Stop interactive session
help Print this message or the help of the given subcommand(s)

```

#### `show state`
Show states or show state for given namespace and key.
@@ -88,9 +88,13 @@ Options:
Where:
* `--key` and `--filter` refines the result.
#### SQL mode
Use SQL mode in the CLI to run SQL queries on SDF states. For more details, see [sql mode for sdf run].
### Managing dataflow in interactive shell
Please see the [deployment] section for more details.

[deployment]: /sdf/deployment
[deployment]: /sdf/deployment
[sql mode for sdf run]: /sdf/cli/run.mdx#sql-mode
64 changes: 58 additions & 6 deletions sdf/cli/run.mdx
@@ -35,10 +35,8 @@ Options:
when set, it will skip running the service
--build-profile <BUILD_PROFILE>
[default: release]
--dev
set runtime to use dev mode [env: DEV=]
--prod
set runtime to use production mode [env: PROD=]
set runtime to use production mode this will disable dev configurations [env: PROD=]
--force-update
Force update
```
@@ -49,7 +47,6 @@ Where:
* `--env` sets environment variables to be passed to operators
* `--skip-running` - compiles components and exits without running the dataflow
* `--build-profile` - sets the build profile
* `--dev` - sets runtime to apply dev specific parameters
* `--prod` - sets runtime to apply prod specific parameters
* `--force-update` - forces the update of the project dependencies

@@ -68,10 +65,11 @@ Usage: <COMMAND>

Commands:
show Show or List states. Use `show state --help` for more info
sql Start sql mode
exit Stop interactive session
help Print this message or the help of the given subcommand(s)
```


#### `show state`

Show states or show state for given namespace and key.
@@ -95,7 +93,6 @@ Where:

#### Examples


##### Run command

Navigate to the directory with `dataflow.yaml` file, and run the command:
@@ -128,3 +125,58 @@ Show the detailed information:
Key Window succeeded failed
stats * 2 0
```

#### SQL mode

Use SQL mode in the CLI to run SQL queries on SDF states. For a given dataflow, the SQL context contains all the dataframe states, that is, the states with an `arrow-row` value.

For states that are scoped to a window, queries run against the last flushed state. For states that are not window-aware, queries run against the global state.

To enter SQL mode, type `sql` in the SDF interactive shell. In SQL mode you can run any SQL command supported by the Polars engine.

#### Examples

##### Run command

Navigate to the directory with `dataflow.yaml` file, and run the command:

```bash
$ sdf run
```

##### Enter the SQL mode

Using the sql command:

```bash
>> sql
SDF SQL version sdf-beta5
Type .help for help.
```

##### Show tables in context
```bash
sql>> show tables
shape: (1, 1)
┌────────────────┐
│ name           │
│ ---            │
│ str            │
╞════════════════╡
│ count_per_word │
└────────────────┘
```

##### Perform a query

```bash
sql>> select * from count_per_word;
shape: (1, 2)
┌──────┬─────────────┐
│ _key ┆ occurrences │
│ ---  ┆ ---         │
│ str  ┆ u32         │
╞══════╪═════════════╡
│ abc  ┆ 10          │
└──────┴─────────────┘
```
2 changes: 1 addition & 1 deletion sdf/concepts/dataflow-yaml.mdx
@@ -263,7 +263,7 @@ To develop package from start:

* Create a local package
* Add `dev` section to the `dataflow.yaml` file to locate the local package.
* Run the dataflow with the `--dev` flag to load the local package instead of downloading them from the Hub.
* Run the dataflow without the `--prod` flag to load the local package instead of downloading it from the Hub.
* Repeat the process until the package is ready for publishing.
* Then publish the package to the Hub.

15 changes: 10 additions & 5 deletions sdf/concepts/state-dataframe.mdx
@@ -35,6 +35,7 @@ Then this will be mapped to arrow dataframe as follows:
| banana | 2 |
| grape | 1 |

## Updating a Dataframe state

To update the state, you can use the `update-state` operator as below:

@@ -55,15 +56,16 @@ This API is invoked by the `update-state` operator, which only returns the value

In the example, `count_per_word` represents a row value of the dataframe. If the operator sees `apple`, that row is the first row in the dataframe above.

However, aggregate operators like `flush` can access the entire state and perform aggregation across all partitions. In this case, the `count_per_word` state function returns the entire DataFrame, not just individual rows. You can then perform DataFrame operations using the SQL API. The snippet below shows how to use SQL to get the 3 most frequent words.
## SQL function

Aggregate operators like `flush`, and external services that reference a state, can run SQL queries on the aggregated data of all partitions of a state. To support this, a `sql` function is added to the context. The `sql` function runs the SQL statement passed as its parameter on the aggregated view of the states, not on their individual rows. The snippet below shows how to use SQL to get the 3 most frequent words.

```yaml
flush:
run: |
fn aggregate_wordcount() -> Result<TopWords> {
let word_counts = count_per_word();
let top3 = word_counts.sql("select * from count_per_word order by count desc limit 3")?;
let top3 = sql("select * from count_per_word order by count desc limit 3")?;
let rows = top3.rows()?;
let mut top_words = vec![];
@@ -81,15 +83,18 @@ flush:
}
```

The output of the `sql` function also implements the following methods, which are described below: `sql`, `rows`, `col`, `key`, `next`.
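
To make the semantics of the `order by count desc limit 3` query concrete, here is a minimal plain-Rust sketch of the same computation. It does not use the SDF API: the `WordCount` struct and the `top_words` function are hypothetical names introduced only for illustration, and the in-memory `Vec` stands in for the aggregated state.

```rust
/// Hypothetical row type for this sketch; it is not an SDF type.
#[derive(Debug, Clone, PartialEq)]
struct WordCount {
    word: String,
    count: u32,
}

/// Returns up to `limit` rows with the highest counts, highest first,
/// mirroring `order by count desc limit <limit>`.
fn top_words(mut rows: Vec<WordCount>, limit: usize) -> Vec<WordCount> {
    // Sort in descending order of count. `sort_by` is stable, so rows with
    // equal counts keep their input order (an assumption of this sketch;
    // the SQL query itself does not define an order for ties).
    rows.sort_by(|a, b| b.count.cmp(&a.count));
    // Keep only the first `limit` rows.
    rows.truncate(limit);
    rows
}
```

Calling `top_words(rows, 3)` on the aggregated word counts returns at most three rows, which is what the `flush` snippet then iterates over.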

## SQL API

For any state that is a dataframe, you can use the SQL API to perform dataframe operations. SDF uses Polars SQL to execute them.
The result of a SQL operation is always a dataframe, so you can chain multiple SQL operations to get the desired result.

The SQL is executed in the context of the dataframe. And name of the dataframe is state as illustrated below:
The SQL is executed in the context of all the available dataframes, so you can run JOINs and other complex SQL operations across them. Each dataframe is represented as a table whose name is the state name with hyphens (`-`) replaced by underscores (`_`), as illustrated below.


```rust
let top3 = word_counts.sql("select * from count_per_word order by count desc limit 3")?;
let top3 = sql("select * from count_per_word order by count desc limit 3")?;
```
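
The table naming rule can be sketched as a one-line helper. This is only an illustration of the rule as stated, not SDF's implementation; the function name `table_name` is hypothetical.

```rust
/// Hypothetical helper (not part of the SDF API): derives the SQL table
/// name from a state name by replacing every hyphen with an underscore.
fn table_name(state_name: &str) -> String {
    state_name.replace('-', "_")
}
```

Under this rule, a state named `count-per-word` is queried as `count_per_word`, and a state name without hyphens is used unchanged.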

## Row API
29 changes: 12 additions & 17 deletions sdf/whatsnew.mdx
@@ -15,12 +15,6 @@ To upgrade CLI to the beta4, run the following command:

<CodeBlock language="bash">{InstallFvm}</CodeBlock>

Make sure that [`wasm32-wasip2`](https://doc.rust-lang.org/rustc/platform-support/wasm32-wasip2.html#wasm32-wasip2) target is installed. Typically, can be installed via:

```bash
$ rustup target add wasm32-wasip2
```

To upgrade host workers, shutdown and restart the worker:

```bash
@@ -30,26 +24,27 @@ $ sdf worker create <host-worker-name>

For upgrading cloud workers, please contact [InfinyOn support](#infinyon-support).

### Compatibility and Breaking changes
## Featured change

- Added [sql mode] to the interactive shell. With this change, users can run SQL queries (including JOINs) on the states of a dataflow. The states that support these queries are the [dataframe states]. In particular, when a state has a window context, the queries run against the last flushed state.

### CLI changes
- `sdf setup` now checks that Fluvio is running and that we can connect to it.

### Changes
- renamed [configuration used to connect to remote clusters] from `profile` to `remote_cluster_profile`.
- updated to use `wasm32-wasip2` target for building wasm modules.
- `sdf run` no longer accepts `--dev`. Development mode is now the default for `sdf run`. To run in non-development mode, use `--prod`.

### Improvements

## Improvements
- Support definition of [nested types].
- Added capability to run complex queries like join on states in operator context through the [sql function].
- Performance improvements.
- Improved error messages when nested types definitions are wrong.

## Bug Fixes
- When using windows, events with an older timestamp are skipped now.
### Changes
- Replaced dashes in table names. Previously, when a state name contained dashes, the state name had to be escaped with quotes in the SQL context. From sdf-beta5, use `_` instead of `-` in the table name, which avoids the escaping.

## InfinyOn Support

For any questions or issues, please contact InfinyOn support at [email protected] or https://discordapp.com/invite/bBG2dTz

[configuration used to connect to remote clusters]: concepts/dataflow-yaml.mdx#topics
[nested types]: concepts/types.mdx#nested-types
[sql mode]: cli/run.mdx#sql-mode
[sql function]: concepts/state-dataframe.mdx#sql-function
[dataframe states]: concepts/state-dataframe.mdx
