Dynamic pipelines - a new foreach block #1480

Merged · 33 commits · Feb 12, 2025

Commits
94509e7 - Proposal for dynamic pipelines (ptodev, Aug 15, 2024)
1f2ce8d - Foreach prototype (ptodev, Oct 31, 2024)
247b68d - Initial implementation (ptodev, Nov 20, 2024)
afdeb23 - wip (wildum, Nov 20, 2024)
b917da4 - Fixes to summation1 and summation2 (ptodev, Dec 17, 2024)
439b300 - fix foreach run (wildum, Dec 17, 2024)
7de0aa1 - foreach uses the value from the collection via the var (wildum, Dec 18, 2024)
5174a33 - compute an ID for the foreach instances and add tests (wildum, Dec 19, 2024)
2290e44 - rework foreach txtar tests (wildum, Jan 7, 2025)
534e07e - support using modules inside of foreach (wildum, Jan 8, 2025)
68b7c7c - cleanup (wildum, Jan 9, 2025)
e192510 - update frontend to use the moduleID of the component instead of the m… (wildum, Jan 9, 2025)
56437f7 - plug the foreach node to the UI (wildum, Jan 9, 2025)
2625f1d - fix internal template components link (wildum, Jan 9, 2025)
91778c6 - update comment in component references (wildum, Jan 10, 2025)
0d95adc - cleanups (wildum, Jan 13, 2025)
0b9f400 - Disable debug metrics for components inside foreach, and for foreach … (ptodev, Jan 15, 2025)
6148600 - Add stability lvl to config blocks (#2441) (wildum, Jan 17, 2025)
b91aee3 - Add tests for types other than integers (#2436) (ptodev, Jan 23, 2025)
b690b16 - Add docs for foreach (#2447) (ptodev, Jan 23, 2025)
6bf628c - use full hash on foreach instances and fix test (wildum, Jan 27, 2025)
417bd4d - Add a changelog entry. (ptodev, Jan 27, 2025)
86435e4 - typo (wildum, Jan 31, 2025)
0b42dc3 - allow non alphanum strings (wildum, Jan 31, 2025)
8611fa3 - add test for wrong collection type (wildum, Jan 31, 2025)
7db7859 - added capsule test (wildum, Jan 31, 2025)
1ab03b9 - Add more tests for non-alphanumeric strings. (ptodev, Jan 31, 2025)
00c8dca - Apply suggestions from code review (ptodev, Feb 10, 2025)
737cd0d - Apply suggestions from code review (ptodev, Feb 10, 2025)
9781fab - Add comments regarding the override registry for modules (ptodev, Feb 10, 2025)
5ec5380 - Rename hashObject to objectFingerprint (ptodev, Feb 11, 2025)
8f2f642 - add comment for the hash function (wildum, Feb 12, 2025)
43c4d74 - add additional detail to the comment (wildum, Feb 12, 2025)
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,8 @@ Main (unreleased)

- Add the possibility to export span events as logs in `otelcol.connector.spanlogs`. (@steve-hb)

- (_Experimental_) A new `foreach` block which starts an Alloy pipeline for each item inside a list. (@wildum, @thampiotr, @ptodev)

### Enhancements

- (_Experimental_) Log instance label key in `database_observability.mysql` (@cristiangreco)
252 changes: 252 additions & 0 deletions docs/design/1443-dynamic-pipelines.md
@@ -0,0 +1,252 @@
# Proposal: Dynamic pipelines (a `foreach` block)

* Author: Paulin Todev (@ptodev), Piotr Gwizdala (@thampiotr)
* Last updated: 2024-08-15
* Original issue: https://github.com/grafana/alloy/issues/1443

## Abstract

We are proposing a new feature to the [Alloy standard library][stdlib].
It will be similar to a `map` operation over a collection such as a `list()`.
Each `map` transformation will be done by a chain of components (a "sub-pipeline") created for this transformation.
Each item in the collection will be processed by a different "sub-pipeline".

The final solution may differ from a standard `map` operation, since there may be multiple outputs for the same input.
For example, the sub-pipeline may branch into different `prometheus.relabel` components,
each of which sends outputs to different components outside of the sub-pipeline.

[stdlib]: https://grafana.com/docs/alloy/latest/reference/stdlib/

## Use cases

<!-- TODO: Add more use cases. It'd be helpful to gather feedback from the community and from solutions engineers. -->

### Using discovery components together with prometheus.exporter ones

Discovery components output a list of targets, but it isn't possible to pass such a list directly to most exporter components.

Suppose we have a list of targets produced by a `discovery` component:

```
[
{"__address__" = "redis-one:9115", "instance" = "one"},
{"__address__" = "redis-two:9116", "instance" = "two"},
]
```

The [Alloy type][alloy-types] of the list above is `list(map(string))`.
However, you may want to pipe information from this list of targets to a component which doesn't work with a `list()` or a `map()`.
For example, you may want to pass the `"__address__"` string to a `prometheus.exporter.redis`,
and use the `"instance"` string in a `discovery.relabel`.

[alloy-types]: https://grafana.com/docs/alloy/latest/get-started/configuration-syntax/expressions/types_and_values/
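
To make the mismatch concrete, here is a hedged, intentionally invalid sketch of what a user might naively try: `prometheus.exporter.redis` expects a single address string in `redis_addr`, so passing the whole targets list does not type-check.

```alloy
discovery.file "default" {
  files = ["/Users/batman/Desktop/redis_addresses.yaml"]
}

// This does NOT work: redis_addr expects a single string,
// but discovery.file.default.targets is a list(map(string)).
prometheus.exporter.redis "all" {
  redis_addr = discovery.file.default.targets
}
```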

## Proposal 1: A foreach block

A `foreach` block starts a sub-pipeline for each item of the `collection` specified in its arguments.

```alloy
// All components in the sub-pipeline will be scoped under "foreach.default/1/...".
// Here, "1" is sub-pipeline number 1.
// This way component names won't clash with other sub-pipelines from the same foreach,
// and with the names of components outside of the foreach.
foreach "default" {

// "collection" is what the for loop will iterate over.
collection = discovery.file.default.targets

// Each item in the collection will be accessible via the "target" variable.
// E.g. `target["__address__"]`.
var = "target"

// A sub-pipeline consisting of components which process each target.
...
}
```

<details>
<summary>Example</summary>

```alloy
discovery.file "default" {
files = ["/Users/batman/Desktop/redis_addresses.yaml"]
}

// Every component defined in the "foreach" block will be instantiated for each item in the collection.
// The instantiated components will be scoped using the name of the foreach block and the index of the
// item in the collection. For example: /foreach.redis/0/prometheus.exporter.redis.default
foreach "redis" {
collection = discovery.file.default.targets
// Here, "target" is a variable whose value is the current item in the collection.
var = "target"

prometheus.exporter.redis "default" {
redis_addr = target["__address__"] // we can also do the necessary rewrites before this.
}

discovery.relabel "default" {
targets = prometheus.exporter.redis.default.targets
// Add a label which comes from the discovery component.
rule {
target_label = "filepath"
// __meta_filepath comes from discovery.file
replacement = target["__meta_filepath"]
}
}

prometheus.scrape "default" {
targets = discovery.relabel.default.targets
forward_to = prometheus.remote_write.mimir.receiver
}
}

prometheus.remote_write "mimir" {
endpoint {
url = "https://prometheus-prod-05-gb-south-0.grafana.net/api/prom/push"
basic_auth {
username = ""
password = ""
}
}
}
```

</details>

Pros:
* The `foreach` name is consistent with other programming languages.

Cons:
* It looks less like a component than a `declare.dynamic` block.
In order to instantiate multiple `foreach` blocks with similar config, you'd have to wrap them in a `declare` block.
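
As a rough, hedged sketch of that workaround (the `redis_pipeline` component and the `prod`/`dev` labels are made up for illustration), a `foreach` could be wrapped in a `declare` block so that the same sub-pipeline definition can be reused with different collections:

```alloy
declare "redis_pipeline" {
  argument "targets" {
    comment = "The collection to iterate over."
  }

  foreach "default" {
    collection = argument.targets.value
    var        = "target"

    prometheus.exporter.redis "default" {
      redis_addr = target["__address__"]
    }

    // ... rest of the sub-pipeline ...
  }
}

// Instantiate the same foreach-based pipeline twice with different collections.
redis_pipeline "prod" {
  targets = discovery.file.prod.targets
}

redis_pipeline "dev" {
  targets = discovery.file.dev.targets
}
```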

## Proposal 2: A declare.dynamic block

A new `declare.dynamic` block would create a custom component which starts several sub-pipelines internally.
Users can use `argument` and `export` blocks, just like in a normal `declare` block.

```alloy
declare.dynamic "ex1" {
argument "input_targets" {
optional = false
comment = "We will create a sub-pipeline for each target in input_targets."
}

argument "output_metrics" {
optional = false
comment = "All the metrics gathered from all pipelines."
}

// A sub-pipeline consisting of components which process each target.
...
}

declare.dynamic.ex1 "default" {
input_targets = discovery.file.default.targets
output_metrics = [prometheus.remote_write.mimir.receiver]
}
```

<details>
<summary>Example</summary>

```alloy
// declare.dynamic "maps" each target to a sub-pipeline.
// Each sub-pipeline has 1 exporter, 1 relabel, and 1 scraper.
// Internally, one way this could be done is to serialize the pipeline to a string and then import it as a module?
declare.dynamic "redis_exporter" {
argument "input_targets" {
optional = false
comment = "We will create a sub-pipeline for each target in input_targets."
}

argument "output_metrics" {
optional = false
comment = "All the metrics gathered from all pipelines."
}

// "id" is a special identifier for every "sub-pipeline".
// The number of "sub-pipelines" is equal to len(input_targets).
prometheus.exporter.redis id {
redis_addr = input_targets["__address__"]
}

discovery.relabel id {
targets = prometheus.exporter.redis[id].targets
// Add a label which comes from the discovery component.
rule {
target_label = "filepath"
// __meta_filepath comes from discovery.file
replacement = input_targets["__meta_filepath"]
}
}

prometheus.scrape id {
targets = prometheus.exporter.redis[id].targets
forward_to = output_metrics
}

}
discovery.file "default" {
files = ["/Users/batman/Desktop/redis_addresses.yaml"]
}

declare.dynamic.redis_exporter "default" {
input_targets = discovery.file.default.targets
output_metrics = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
endpoint {
url = "https://prometheus-prod-05-gb-south-0.grafana.net/api/prom/push"
basic_auth {
username = ""
password = ""
}
}
}
```

</details>

Pros:
* Looks more like a component than a `foreach` block.
* Flexible number of inputs and outputs.

Cons:
* A name such as `declare.dynamic` doesn't sound as familiar to most people as `foreach` does.
* It may not be practical to implement this in a way that there's more than one possible input collection.
* How can we limit users to having just one collection?
* Having another variant of the `declare` block can feel complex.
Can we just add this functionality to the normal `declare` block, so that we can avoid having a `declare.dynamic` block?

## Proposal 3: Do nothing

It is customary to include a "do nothing" proposal, in order to evaluate whether the work is really required.

Pros:
* No effort required.
* Alloy's syntax is simpler since we're not adding any new types of blocks.

Cons:
* Not possible to integrate most `prometheus.exporter` components with the `discovery` ones.

## Unknowns

We should find answers to the unknowns below before this proposal is accepted:

* Will the solution only work for `list()`? What about `map()`?
  * If we go with a `foreach`, we could have a `key` attribute in addition to the `var` one, so that the key of each item can also be accessed. Could the `key` attribute be a no-op (or the item's index) when `collection` is a list rather than a map? (See the sketch after this list.)
* What about debug metrics? Should we aggregate the metrics for all "sub-pipelines"?
  * If there is 1 series for each sub-pipeline, the amount of metrics could be huge.
    Some service discovery mechanisms may generate a huge number of elements in a list of targets.
  * If we want to aggregate the metrics, how would we do that? Is it even possible to do within Alloy?
  * Can we have a configuration parameter which dictates whether the metrics should be aggregated or not?
* Do we have to recreate the sub-pipelines every time a new collection is received,
  even if the new collection has the same number of elements?
* Do we need to have more than one output, of a different type?
* Do we need to have more than one input, of a different type?
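
As a purely illustrative sketch of the `key` idea (the attribute name and its semantics are assumptions here, not part of the accepted design), it might look like this:

```alloy
foreach "redis" {
  // A map whose keys are instance names and whose values are addresses.
  collection = {"one" = "redis-one:9115", "two" = "redis-two:9116"}
  var = "addr"     // The value of the current item.
  key = "instance" // Hypothetical: the key (or index) of the current item.

  prometheus.exporter.redis "default" {
    redis_addr = addr // e.g. "redis-one:9115"
  }
}
```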

## Recommended solution

<!-- TODO: Fill this later -->