Commit

Adjusted to cover v24.5.2
nsoubelet committed Oct 26, 2024
1 parent 214ab95 commit 047eb4c
Showing 14 changed files with 185 additions and 43 deletions.
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -11,7 +11,7 @@ pipeline {
stage('Build and Publish') {
environment {
SERVICE_NAME = "kubling-docs"
VERSION = "24.5.1"
VERSION = "24.5.2"
DOCKERHUB_CREDS = credentials('kubling-dockerhub')
}
steps {
11 changes: 11 additions & 0 deletions component/Tag.js
@@ -0,0 +1,11 @@
function Tag({ description, bgColor, txtColor, txtFont }) {

return (
<span
className={ `inline-flex items-center rounded-md px-2 py-1 text-sm font-medium ring-1 ring-inset ring-gray-500/10 ${ txtFont ?? "font-gilroyMedium" } ${ bgColor ?? "bg-gray-50" } ${ txtColor ?? "text-gray-600" }` }>
{description}
</span>
);

}
export default Tag;
2 changes: 1 addition & 1 deletion pages/CLI/overview.mdx
@@ -7,7 +7,7 @@ The main goal of the CLI is to facilitate common tasks required when configuring
It is designed to be easily embedded into pipelines, as well as into developers' environments.

### A note on its size
Although Kubling is not a heavy application (its OCI Image is \<120MB), and its Community Edition (CE) version includes nearly all the features needed during
Although Kubling is not a heavy application (its OCI Image is \~125MB), and its Community Edition (CE) version includes nearly all the features needed during
development—making it a perfect fit for local development environments—certain tasks like testing can be challenging and time-consuming
if every small change to a Script Module requires restarting a test container.

1 change: 1 addition & 0 deletions pages/Engine/_meta.json
@@ -5,6 +5,7 @@
"queries": "Queries",
"script-context-members": "",
"data-sources": "Data Sources",
"aggregators": "Aggregators",
"endpoints": "",
"properties": "",
"transactions": ""
62 changes: 62 additions & 0 deletions pages/Engine/aggregators.mdx
@@ -0,0 +1,62 @@
# Aggregators

The **aggregator** is a special `schema` type that enables automatic view creation based on other data sources.
This feature simplifies querying across multiple sources by aggregating data into views, which are easier to manage.

Consider a scenario where we have several Kubernetes clusters, each defined as a data source within our VDB (Virtual Database).
If we want to know, for example, which cluster a specific deployment is running on, one approach is to write a query that joins tables from all relevant schemas.
However, this process becomes increasingly complex when managing dozens of clusters.

Relational databases address such scenarios with `VIEWs`, and Kubling follows the same pattern. A `VIEW` is a virtual table constructed from the result of a query,
which can be defined in DDL. But in our Kubernetes example, where each cluster has its own schema, it’s unclear where to define a `VIEW` that spans all clusters.

This is where **aggregators** come in. They automatically generate views from tables across multiple `schemas`, streamlining multi-cluster data queries.

Example:
```yaml
aggregatorSchemaModels:
  - name: "k8s"
    schemas:
      - "k8s_1"
      - "k8s_2"
    tables:
      - name: "DEPLOYMENT"
        options:
          updatable: false
      - name: "NAMESPACE"
        options:
          updatable: false
      - name: "DEPLOYMENT_CONDITIONS"
        options:
          updatable: false
      - name: "DEPLOYMENT_CONTAINER"
        options:
          updatable: false
      - name: "DEPLOYMENT_CONTAINER_VOLS"
        options:
          updatable: false
```
This configuration creates a new `SCHEMA` called `k8s` in the `VDB`.
The tables in this schema, such as `DEPLOYMENT` and `NAMESPACE`, are actually `VIEWs` that aggregate data from the `k8s_1` and `k8s_2` schemas using a `UNION ALL`.
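
As a rough illustration of what such a view stands for, the JavaScript sketch below builds one `UNION ALL` query per aggregated table. The `buildAggregatorViews` helper, its input shape (mirroring the YAML above) and the generated SQL text are assumptions made for illustration; they are not the DDL Kubling actually produces. Since the built-in Kubernetes tables include a `schema` column filled via `val_constant '{{ schema.name }}'`, rows in such a view should still reveal which cluster they come from.

```javascript
// Illustrative sketch only (hypothetical helper, not Kubling's view generator):
// an aggregator is modelled as "one view per table, built by stacking the same
// table from every listed schema with UNION ALL".
function buildAggregatorViews(aggregator) {
  return aggregator.tables.map((table) => ({
    view: `${aggregator.name}.${table.name}`,
    // UNION ALL keeps every row from every schema, duplicates included.
    sql: aggregator.schemas
      .map((schema) => `SELECT * FROM ${schema}.${table.name}`)
      .join("\nUNION ALL\n"),
  }));
}

const views = buildAggregatorViews({
  name: "k8s",
  schemas: ["k8s_1", "k8s_2"],
  tables: [{ name: "DEPLOYMENT" }, { name: "NAMESPACE" }],
});

for (const { view, sql } of views) {
  console.log(`-- ${view}\n${sql}\n`);
}
```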

## Primary Keys
When using aggregators, choose the primary keys of the tables in your `SCHEMA` definitions carefully: a key that is unique within one data source may be duplicated once the rows of several sources are merged.

If a data source does not provide a globally unique key, use the Kubling `val_pk` directive to build one, keeping the aggregation implications in mind.

For instance, in Kubernetes, if you have only one cluster, it may not be necessary to create primary keys that concatenate the cluster name.
However, if you plan to add more clusters in the future, you should construct the primary key to include a cluster identifier, as shown below:
```sql
...
identifier string OPTIONS(val_pk 'clusterName+metadata__namespace+metadata__name+name'),
PRIMARY KEY(identifier),
...
```
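
The key collision can be reproduced with a small JavaScript sketch: two rows describing the same container in the same namespace and deployment, but in different clusters. The sample rows, the `buildKey` and `hasDuplicateKeys` helpers, and the `|` separator are hypothetical; they model the idea of concatenating fields into a key, not the exact way `val_pk` builds it.

```javascript
// Hypothetical rows: the same container, namespace and deployment name,
// running in two different clusters.
const rows = [
  { clusterName: "cluster-a", metadata__namespace: "shop", metadata__name: "api", name: "web" },
  { clusterName: "cluster-b", metadata__namespace: "shop", metadata__name: "api", name: "web" },
];

// Concatenates the listed fields into a key. The "|" separator is an
// assumption for illustration, not necessarily what val_pk uses.
function buildKey(row, fields) {
  return fields.map((f) => row[f]).join("|");
}

// Returns true as soon as two rows produce the same key.
function hasDuplicateKeys(rows, fields) {
  const seen = new Set();
  for (const row of rows) {
    const key = buildKey(row, fields);
    if (seen.has(key)) return true;
    seen.add(key);
  }
  return false;
}

const withoutCluster = ["metadata__namespace", "metadata__name", "name"];
const withCluster = ["clusterName", ...withoutCluster];

console.log(hasDuplicateKeys(rows, withoutCluster)); // true: the keys collide
console.log(hasDuplicateKeys(rows, withCluster)); // false: the cluster keeps them unique
```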

## Errors or empty results
Since `VIEWs` in the aggregator context rely on `UNION ALL` (which keeps duplicates) to merge results,
the query planner (DQP) evaluates their costs differently than it does for standard queries.
In the case of Kubernetes data sources, this behavior may be influenced by the `blankNamespaceStrategy` configuration.

For more information on `blankNamespaceStrategy`, check out the detailed explanation [here](/Engine/data-sources#the-importance-of-blank-namespace-strategy-).
71 changes: 55 additions & 16 deletions pages/Engine/data-sources.mdx
@@ -1,4 +1,5 @@
import { Callout, Steps } from 'nextra/components'
import Tag from '../../component/Tag'

# Data Sources

@@ -113,6 +114,41 @@ Sample config in VDB file:

[Check here](/schemas#kubernetes-data-source-configuration) the Kubernetes Data Source configuration spec.

### The importance of blank namespace strategy <Tag description={"v24.5.2+"} />

In Kubernetes, resources can be divided into two main groups: namespaced and non-namespaced resources. For the first group, it is mandatory to place the resource within a namespace, whereas the second group consists of cluster-wide or node-wide resources.

This type of design introduces certain challenges to Kubling's DQP. The first challenge is how to behave when querying namespaced resources without specifying a namespace.

For example, the following query:
```sql
SELECT * from DEPLOYMENT
```

This query could mean two things to the user:
1. **Return all cluster deployments** or...
2. **Return deployments in the default namespace**, because if I needed specific namespaces or all, I would explicitly specify that in the query.
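
The two readings can be modelled as a small resolution step applied when a query has no namespace condition. In this JavaScript sketch, `resolveNamespace` and the strategy names `ALL_NAMESPACES` and `DEFAULT_NAMESPACE` are hypothetical placeholders for options 1 and 2; the values `blankNamespaceStrategy` actually accepts are defined in the Kubernetes Data Source configuration spec.

```javascript
// Hypothetical strategy names standing in for options 1 and 2 above;
// they are NOT the real blankNamespaceStrategy values.
const Strategy = Object.freeze({
  ALL_NAMESPACES: "ALL_NAMESPACES", // option 1
  DEFAULT_NAMESPACE: "DEFAULT_NAMESPACE", // option 2
});

// `queryNamespace` is the value of a metadata__namespace condition, or
// undefined when the query has none. Returns null for "no namespace filter",
// i.e. list the resource across the whole cluster.
function resolveNamespace(queryNamespace, strategy) {
  if (queryNamespace !== undefined) return queryNamespace;
  switch (strategy) {
    case Strategy.ALL_NAMESPACES:
      return null;
    case Strategy.DEFAULT_NAMESPACE:
      return "default";
    default:
      throw new Error(`Unknown strategy: ${strategy}`);
  }
}

console.log(resolveNamespace(undefined, Strategy.ALL_NAMESPACES)); // null
console.log(resolveNamespace(undefined, Strategy.DEFAULT_NAMESPACE)); // default
console.log(resolveNamespace("team-a", Strategy.DEFAULT_NAMESPACE)); // team-a
```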

The second challenge is how to weigh queries (often referred to as node cardinality) when option 2 is chosen.
The query planner makes decisions based on internal statistics, which can sometimes yield unpredictable results with APIs designed like Kubernetes'.

Let's use the following query as an example:
```sql
SELECT * from DEPLOYMENT dp
JOIN NAMESPACE AS ns ON dp.metadata__namespace = ns.metadata__name
```

In certain circumstances, the DQP may decide, based on internal statistics (or the lack thereof), that it is cheaper to fetch all deployments and all namespaces in parallel
and then evaluate them in memory, returning only the rows that match the criteria `dp.metadata__namespace = ns.metadata__name` (possibly rewritten as an `IN` instead of an equality),
than to fetch all namespaces first and then fetch the deployments of each namespace one by one.

If this happens and option 2 is selected, the result will likely include only deployments from the default namespace, which is clearly not what the query is trying to retrieve.
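
The difference between the two plans can be reproduced with a toy JavaScript simulation. Everything in it is hypothetical: the in-memory cluster, `listDeployments` as a stand-in for the Kubernetes API, and the strategy names `ALL_NAMESPACES`/`DEFAULT_NAMESPACE`, which are placeholders rather than real `blankNamespaceStrategy` values. It only shows why the same join can return different rows depending on the plan when a blank namespace means "default".

```javascript
// Hypothetical in-memory cluster: two namespaces, deployments in both.
const cluster = {
  namespaces: ["default", "team-a"],
  deployments: [
    { name: "web", namespace: "default" },
    { name: "api", namespace: "team-a" },
    { name: "worker", namespace: "team-a" },
  ],
};

// Stand-in for listing deployments. A missing namespace is resolved by the
// (hypothetical) strategy: every namespace, or only "default".
function listDeployments(namespace, strategy) {
  const ns = namespace ?? (strategy === "DEFAULT_NAMESPACE" ? "default" : null);
  return cluster.deployments.filter((d) => ns === null || d.namespace === ns);
}

// Plan A: fetch namespaces, then fetch deployments namespace by namespace.
function nestedLoopPlan(strategy) {
  return cluster.namespaces.flatMap((ns) => listDeployments(ns, strategy));
}

// Plan B: fetch deployments with no namespace, then join in memory (IN).
function fetchAllPlan(strategy) {
  const wanted = new Set(cluster.namespaces);
  return listDeployments(undefined, strategy).filter((d) => wanted.has(d.namespace));
}

for (const strategy of ["ALL_NAMESPACES", "DEFAULT_NAMESPACE"]) {
  const a = nestedLoopPlan(strategy).length;
  const b = fetchAllPlan(strategy).length;
  console.log(`${strategy}: nested loop=${a}, fetch-all=${b}`);
}
// ALL_NAMESPACES: nested loop=3, fetch-all=3
// DEFAULT_NAMESPACE: nested loop=3, fetch-all=1
```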

In conclusion, if you plan to use `JOIN`s, we suggest opting for **option 1** to avoid unpredictable results. It can be configured via `blankNamespaceStrategy` in
the Kubernetes Data Source configuration.

### DDL

Kubling comes with a built-in, statically compiled module for Kubernetes that uses, by default, a DDL file with all the supported tables.
Please note that it has only a few synthetic `TABLES`; therefore, if you need a much more specialized or even reduced version, please use the
following as a starting point.
@@ -169,8 +205,8 @@ CREATE FOREIGN TABLE NODE_CONDITIONS
metadata__name string NOT NULL OPTIONS(synthetic_type 'parent'),
metadata__namespace string OPTIONS(synthetic_type 'parent'),
lastTransitionTime string,
lastUpdateTime string,
lastTransitionTime timestamp,
lastUpdateTime timestamp,
message string,
reason string,
status string,
@@ -272,6 +308,7 @@ CREATE FOREIGN TABLE DEPLOYMENT
schema string OPTIONS(val_constant '{{ schema.name }}'),
metadata__name string,
metadata__namespace string,
metadata__uid string,
metadata__labels json OPTIONS(parser_format 'asJsonPretty'),
spec__template__spec__containers json OPTIONS(parser_format 'asJsonPretty'),
spec__selector__matchLabels json OPTIONS(parser_format 'asJsonPretty'),
@@ -295,9 +332,10 @@ CREATE FOREIGN TABLE DEPLOYMENT_CONDITIONS
schema string OPTIONS(val_constant '{{ schema.name }}'),
metadata__name string NOT NULL OPTIONS(synthetic_type 'parent'),
metadata__namespace string OPTIONS(synthetic_type 'parent'),
metadata__uid string OPTIONS(synthetic_type 'parent'),
lastTransitionTime string,
lastUpdateTime string,
lastTransitionTime timestamp,
lastUpdateTime timestamp,
message string,
reason string,
status string,
@@ -315,6 +353,7 @@ CREATE FOREIGN TABLE DEPLOYMENT_CONTAINER
metadata__name string NOT NULL OPTIONS(synthetic_type 'parent'),
metadata__namespace string OPTIONS(synthetic_type 'parent'),
metadata__labels string OPTIONS(updatable false, synthetic_type 'parent'),
metadata__uid string OPTIONS(synthetic_type 'parent'),
image string NOT NULL,
name string NOT NULL,
@@ -337,6 +376,7 @@ CREATE FOREIGN TABLE DEPLOYMENT_CONTAINER_VOLS
metadata__name string NOT NULL OPTIONS(synthetic_type 'parent'),
metadata__namespace string OPTIONS(synthetic_type 'parent'),
metadata__labels string OPTIONS(synthetic_type 'parent'),
metadata__uid string OPTIONS(synthetic_type 'parent'),
containerName string OPTIONS(synthetic_type 'parent_array_key', synthetic_parent_field 'name'),
containerImage string OPTIONS(synthetic_type 'parent_array_key', synthetic_parent_field 'image'),
@@ -375,7 +415,7 @@ CREATE FOREIGN TABLE PERSISTENT_VOLUME
spec__volumeMode string,
spec__additionalProperties json OPTIONS(parser_format 'asJsonPretty'),
status__lastPhaseTransitionTime string,
status__lastPhaseTransitionTime timestamp,
status__message string,
status__phase string,
status__reason string,
@@ -720,10 +760,10 @@ CREATE FOREIGN TABLE EVENT
action string,
deprecatedCount integer,
deprecatedFirstTimestamp string,
deprecatedLastTimestamp string,
deprecatedFirstTimestamp timestamp,
deprecatedLastTimestamp timestamp,
deprecatedSource json OPTIONS(parser_format 'asJsonPretty'),
eventTime json OPTIONS(parser_format 'asJsonPretty'),
eventTime timestamp,
note string,
reason string,
regarding json OPTIONS(parser_format 'asJsonPretty'),
@@ -840,7 +880,7 @@ CREATE FOREIGN TABLE HORIZONTAL_POD_AUTOSCALER
status__currentMetrics json OPTIONS(parser_format 'asJsonPretty'),
status__currentReplicas integer,
status__desiredReplicas integer,
status__lastScaleTime string,
status__lastScaleTime timestamp,
status__observedGeneration long,
identifier string NOT NULL OPTIONS(val_pk 'clusterName+metadata__namespace+metadata__name' ),
@@ -991,12 +1031,12 @@ CREATE FOREIGN TABLE JOB
status__active integer,
status__completedIndexes string,
status__completionTime string,
status__completionTime timestamp,
status__conditions json OPTIONS(parser_format 'asJsonPretty'),
status__failed integer,
status__failedIndexes string,
status__ready integer,
status__startTime string,
status__startTime timestamp,
status__succeeded integer,
status__terminating integer,
status__uncountedTerminatedPods json OPTIONS(parser_format 'asJsonPretty'),
@@ -1028,8 +1068,8 @@ CREATE FOREIGN TABLE CRON_JOB
spec__timeZone string,
status__active json OPTIONS(parser_format 'asJsonPretty'),
status__lastScheduleTime string,
status__lastSuccessfulTime string,
status__lastScheduleTime timestamp,
status__lastSuccessfulTime timestamp,
identifier string NOT NULL OPTIONS(val_pk 'clusterName+metadata__namespace+metadata__name' ),
PRIMARY KEY(identifier),
@@ -1241,11 +1281,11 @@ CREATE FOREIGN TABLE LEASE
metadata__namespace string,
metadata__labels json OPTIONS(parser_format 'asJsonPretty'),
spec__acquireTime string,
spec__acquireTime timestamp,
spec__holderIdentity string,
spec__leaseDurationSeconds integer,
spec__leaseTransitions integer,
spec__renewTime string,
spec__renewTime timestamp,
identifier string NOT NULL OPTIONS(val_pk 'clusterName+metadata__namespace+metadata__name' ),
PRIMARY KEY(identifier),
@@ -1283,7 +1323,6 @@ OPTIONS(updatable true,
supports_idempotency false,
tags 'kubernetes;{{ schema.properties.cluster_name }};crd;customresourcedefinition');
```

</details>

## Script Document Data Source (`SCRIPT_DOCUMENT_JS`)
1 change: 0 additions & 1 deletion pages/Engine/queries.mdx
@@ -6,7 +6,6 @@ import { Callout, Steps } from 'nextra/components'

Kubling provides most of the functionality of SQL-92 DML.
The idea of this document is not to cover SQL exhaustively, but rather to highlight how SQL is used in Kubling.
For details about the exact form of SQL that Kubling accepts, [see grammar.](/Engine/grammar)

## Identifiers

27 changes: 14 additions & 13 deletions pages/Observability/metrics.mdx
@@ -1,4 +1,5 @@
import { Callout } from 'nextra/components'
import Tag from '../../component/Tag'

# Metrics

@@ -152,46 +153,46 @@ Provides information about currently **active** HTTP requests.
## Engine related metrics
These metrics come without any tags by default. You can specify your own in the `metricsCommonTags` attribute of the [Main application configuration](/schemas#main-application-configuration).

### `kubling.engine.sql.sessions.active` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Gauge</span>
### `kubling.engine.sql.sessions.active` <Tag description={"Gauge"} />
Provides the number of current active SQL Sessions (Native & PG transports).

### `kubling.engine.sql.plan.active` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Gauge</span>
### `kubling.engine.sql.plan.active` <Tag description={"Gauge"} />
Number of current active SQL atomic plans. Atomic plans are the effective work items within the DQP, created once the DQP has identified the Data Sources involved in the query.

### `kubling.engine.sql.plan.enqueued.current` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Gauge</span>
### `kubling.engine.sql.plan.enqueued.current` <Tag description={"Gauge"} />
Number of current Jobs (queries sent by any valid session) waiting for a free thread to process the plan.

### `kubling.engine.sql.plan.enqueued_time.max` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Gauge</span>
### `kubling.engine.sql.plan.enqueued_time.max` <Tag description={"Gauge"} />
The maximum time a plan spent waiting in the queue for a free thread during the current execution.

### `kubling.engine.sql.threads.active` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Gauge</span>
### `kubling.engine.sql.threads.active` <Tag description={"Gauge"} />
Number of current active SQL related threads, like DQP's plan processing. Take into account that one query may trigger multiple threads, one per atomic plan.

### `kubling.engine.sql.jobs.total` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Gauge</span>
### `kubling.engine.sql.jobs.total` <Tag description={"Gauge"} />
Number of total submitted SQL Jobs.

### `kubling.engine.sql.jobs.completed` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Gauge</span>
### `kubling.engine.sql.jobs.completed` <Tag description={"Gauge"} />
Number of total completed SQL Jobs.

### `kubling.engine.sql.jobs.enqueued` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Gauge</span>
### `kubling.engine.sql.jobs.enqueued` <Tag description={"Gauge"} />
Number of current enqueued SQL Jobs.

### `kubling.engine.sql.jobs.max` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Gauge</span>
### `kubling.engine.sql.jobs.max` <Tag description={"Gauge"} />
Maximum number of SQL jobs running in parallel during the current execution.

### `kubling.engine.sql.jobs.enqueued.max` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Gauge</span>
### `kubling.engine.sql.jobs.enqueued.max` <Tag description={"Gauge"} />
Maximum number of SQL jobs enqueued during the current execution.

### `kubling.js.threads.active` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Gauge</span>
### `kubling.js.threads.active` <Tag description={"Gauge"} />
Provides the number of current active JavaScript context threads.

### `kubling.js.executions` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Counter</span>
### `kubling.js.executions` <Tag description={"Counter"} />
Provides the number of total JavaScript context threads claimed during current execution.

### `kubling.js.auth.thread.active` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Gauge</span>
### `kubling.js.auth.thread.active` <Tag description={"Gauge"} />
Provides the number of current active JavaScript Auth* context threads. Same measurement as `kubling.js.threads.active`.

### `kubling.js.auth.executions` <span class="inline-flex items-center rounded-md bg-gray-50 px-2 py-1 text-sm font-medium text-gray-600 ring-1 ring-inset ring-gray-500/10 font-gilroyMedium">Counter</span>
### `kubling.js.auth.executions` <Tag description={"Counter"} />
Provides the number of total JavaScript Auth* context threads claimed during current execution. Same measurement as `kubling.js.executions`.

## Prometheus
1 change: 1 addition & 0 deletions pages/_meta.json
@@ -10,6 +10,7 @@
"Observability": "",
"CLI": "",
"schemas": "Configuration files schemas",
"changelog": "Product Changelog",
"roadmap": "Product Roadmap",
"contact": {
"title": "Contact ↗",
7 changes: 7 additions & 0 deletions pages/changelog.mdx
@@ -0,0 +1,7 @@
# Product changelog

## v24.5.2
- Added `blankNamespaceStrategy` option to the Kubernetes Data Source Configuration.
- Added a new aggregator schema button in the console, under Virtual Databases.
- Fixed the JavaScript filesystem cache, which in some circumstances resulted in a false cache miss when the resource had been visited before, especially during `import`.
- Added `metadata__uid` to all deployment-related `TABLE`s in the built-in Kubernetes DDL.