docs: Validate the LogPipeline OTel Setup and test the performance of the log agent #1705

Open
wants to merge 71 commits into
base: main
Changes from 23 commits
Commits
cfb1422
chore: Add image.repository to otel values charts
TeodorSAP Dec 4, 2024
33c3696
chore: functional log agent otel values file
TeodorSAP Dec 6, 2024
641dfcd
chore: LogAgent load test setup and config files
TeodorSAP Dec 18, 2024
c7ce037
Merge branch 'main' into chore/otel-logs-validation
TeodorSAP Dec 18, 2024
babe0c7
chore: Update load test files
TeodorSAP Dec 20, 2024
18eb9ed
docs: Add the log agent load test investigations results and final co…
TeodorSAP Dec 20, 2024
64ae962
chore: Add additional load test instruction
TeodorSAP Dec 20, 2024
a1aee11
Merge branch 'main' into chore/otel-logs-validation
TeodorSAP Jan 8, 2025
e52ed38
chore: WIP
TeodorSAP Jan 13, 2025
88f4385
chore: New findings
TeodorSAP Jan 15, 2025
cb80544
chore: configuration WIP
TeodorSAP Jan 16, 2025
dc4081b
WIP
TeodorSAP Jan 16, 2025
64465ad
chore: documentation update
TeodorSAP Jan 17, 2025
9f313af
chore: Documentation insights
TeodorSAP Jan 17, 2025
a95075a
chore: Fully document benchmarking session #2
TeodorSAP Jan 20, 2025
77358a1
chore: .md changes
TeodorSAP Jan 20, 2025
9f5de42
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
86031b0
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
5ff70b0
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
acbbc7a
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
48ebcc0
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
8e08ba7
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
7be8538
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
42c5244
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
dee9619
chore: Update config validation doc
TeodorSAP Jan 20, 2025
aae1ae7
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
0d3aa2e
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
a1c9c73
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
4e4e14e
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
e8fb721
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
fd77bfd
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
1c7082c
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
ddd2df5
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
8a9ba37
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
03bef54
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
cbc6ae7
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
ab06ae7
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
d1cd4d2
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
1e86e8f
chore: Update config validation doc
TeodorSAP Jan 20, 2025
69f57e3
Merge branch 'chore/otel-logs-validation' of github.com:TeodorSAP/tel…
TeodorSAP Jan 20, 2025
ac76542
chore: Update config validation doc
TeodorSAP Jan 20, 2025
8a6d3d9
chore: Update config validation doc
TeodorSAP Jan 20, 2025
55ab833
chore: Update config validation doc
TeodorSAP Jan 20, 2025
c75de3f
chore: Update config validation doc
TeodorSAP Jan 20, 2025
254239f
Merge branch 'main' into chore/otel-logs-validation
TeodorSAP Jan 21, 2025
d81dfc8
chore: Update config validation doc
TeodorSAP Jan 21, 2025
6b3bc2a
chore: Update config validation doc
TeodorSAP Jan 21, 2025
71319a6
chore: Update config validation doc
TeodorSAP Jan 21, 2025
be17e38
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
8a57bf0
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
2f21872
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
1d293b4
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
c3b1e9e
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
cc0098b
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
3d352b8
chore: Update config validation doc
TeodorSAP Jan 21, 2025
7a07704
Merge branch 'chore/otel-logs-validation' of github.com:TeodorSAP/tel…
TeodorSAP Jan 21, 2025
ec690fc
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
2d4e517
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
313001d
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
ba72832
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
e4f5c4a
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
452b97c
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
6ecd74e
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
69b0534
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
e35fe33
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
f5f4835
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
c8ea74a
chore: remove unfinished telemetrygen load test config file
TeodorSAP Jan 21, 2025
0774e38
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
3303b78
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
22186df
chore: Update config validation doc
TeodorSAP Jan 21, 2025
5b254c7
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
57 changes: 31 additions & 26 deletions docs/contributor/benchmarks/otlp-logs-validation.md
@@ -1,11 +1,17 @@
# OTel LogPipeline Setup Validation

This file documents the process of validating the whole LogPipeline with OTLP output flow. It defines the setup, which consists of the manually deployed log agent, the already-implemented log gateway, and log generators using flog.
- [Setup Configuration Steps](#setup-configuration-steps)
- [Resources Under Investigation](#resources-under-investigation)
- [Benchmarking Setup](#benchmarking-setup)
- [Performance Tests Results](#performance-tests-results)
- [📊 Benchmarking Session #1](#-benchmarking-session-1)
- [📊 Benchmarking Session #2](#-benchmarking-session-2)
- [Conclusions](#conclusions)

The scope is to performance test the agent, observing the resulting values (such as throughput, resource consumption, reaction to backpressure), and to compare the agent to the previous FluentBit-based setup.

## Configuring the Log Agent

## Setup Configuration Steps
To configure the log agent, deploy the [OTLP Logs Validation YAML](./otlp-logs-validation.yaml) either with Helm or manually:

- To set up the log agent with Helm, run:

@@ -30,26 +36,25 @@ The scope is to performance test the agent, observing the resulting values (such
```


## Relevant/Configurable Resources
## Resources Under Investigation
We investigate the following resources (for details, see the [OTLP Logs Validation YAML](./otlp-logs-validation.yaml)):

- Log Agent ConfigMap (OTel Config)
- Log Agent DaemonSet

See [OTLP Logs Validation YAML](./otlp-logs-validation.yaml)

#### Things to take into consideration (at implementation)
**Things to take into consideration when implementing the Log Agent in Telemetry Manager:**
- Dynamically include or exclude namespaces, based on LogPipeline spec attributes.
- Exclude the FluentBit container in the OTel configuration, and the OTel container in the FluentBit configuration.
- `receivers/filelog/operators`: Copying the body to `attributes.original` must be avoided if the `dropLogRawBody` flag is enabled.
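
As a rough illustration of the considerations above, a filelog receiver fragment could look like the following. This is a sketch under assumptions: the namespace, Pod, and container names in the globs are hypothetical, and in Telemetry Manager these values would presumably be derived from the LogPipeline spec rather than hard-coded.

```yaml
receivers:
  filelog:
    # Container logs live under
    # /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log
    include:
      # Hypothetical: collect only from a selected namespace.
      - /var/log/pods/my-namespace_*/*/*.log
    exclude:
      # Hypothetical names: skip the FluentBit agent's own container logs.
      - /var/log/pods/kyma-system_telemetry-fluent-bit-*/fluent-bit/*.log
    operators:
      # Keeps the raw body as an attribute. Omit this operator when
      # dropLogRawBody is enabled.
      - type: copy
        from: body
        to: attributes.original
```

The FluentBit configuration would need the mirror-image exclusion for the OTel agent container, so that neither agent ingests the other's logs.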

### How does checkpointing work
**How does checkpointing work?**
By enabling the storeCheckpoint preset (Helm), the `file_storage` extension is activated in the filelog receiver.
- The `file_storage` has the path `/var/lib/otelcol`.
- Later, this path is mounted as a `hostPath` volume in the DaemonSet spec.
- The extension is also set in the `storage` property of the filelog receiver.

> By enabling the storeCheckpoint preset (Helm) the `file_storage` extension is activated in the receiver
> - The `file_storage` has the path `/var/lib/otelcol`
> - This path is later mounted as a `hostPath` volume in the DaemonSet spec
> - The extension is also set in the `storage` property of the filelog receiver

> `storage` = The ID of a storage extension to be used to store file offsets. File offsets enable the receiver to pick up where it left off in the case of a collector restart. If no storage extension is used, the receiver manages offsets only in memory.
> **NOTE:** `storage` = The ID of a storage extension to be used to store file offsets. File offsets enable the filelog receiver to pick up where it left off in the case of a collector restart. If no storage extension is used, the receiver manages offsets only in memory.
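
The checkpointing wiring described above can be sketched as the following collector configuration fragment. This is a minimal sketch, not the rendered chart output: the `include` glob and the `debug` exporter are illustrative assumptions. Only the `/var/lib/otelcol` directory and the `storage` reference come from the description above.

```yaml
extensions:
  # Persists file offsets on disk so they survive a collector restart.
  file_storage:
    directory: /var/lib/otelcol

receivers:
  filelog:
    # Assumption: illustrative glob for container logs on the node.
    include:
      - /var/log/pods/*/*/*.log
    # ID of the storage extension that stores the file offsets.
    storage: file_storage

exporters:
  # Assumption: placeholder exporter to keep the fragment complete.
  debug: {}

service:
  # The extension must be listed here to be available to the receiver.
  extensions: [file_storage]
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
```

In the DaemonSet, the same directory is then backed by a `hostPath` volume, so the stored offsets outlive the Pod.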


## Benchmarking Setup
@@ -79,7 +84,7 @@ See [OTLP Logs Validation YAML](./otlp-logs-validation.yaml)
k apply -f telemetry-manager/hack/load-tests/log-backpressure-config.yaml
```

4. PromQL Queries used for measuring the results:
4. You can use the following PromQL queries to measure the results (the same or similar queries were used for the performance tests below):
``` sql
-- RECEIVED
round(sum(rate(otelcol_receiver_accepted_log_records{service="telemetry-log-agent-metrics"}[20m])))
@@ -100,15 +105,15 @@ See [OTLP Logs Validation YAML](./otlp-logs-validation.yaml)

## Performance Tests Results

### 📊 Benchmarking Session #1

| Icon | Meaning |
| ---- | ---------------------------------------------------- |
| ⏳ | Full-test, involving the whole setup, usually 20 min |
| 🪲 | Debugging session, usually shorter, not so reliable |
| 🏋️‍♀️ | Backpressure scenario |
| ⭐️ | Best results observed (in a given scenario) |

### 📊 Benchmarking Session #1

#### ⏳ 18 Dec 2024, 13:45 - 14:05 (20 min)
- **Generator:** 10 replicas x 10 MB
- **Agent:** no CPU limit, no queue
@@ -386,22 +391,22 @@ See [OTLP Logs Validation YAML](./otlp-logs-validation.yaml)


## Comparison with FluentBit Setup
In the FluentBit setup, for the very same (initial) scenario (i.e. 10 generator replicas [old set-up] / 2 agents), the [load test](https://github.com/kyma-project/telemetry-manager/actions/runs/12691802471) outputs the following values for the agent:
In the FluentBit setup, for the very same (initial) scenario (that is, 10 generator replicas [old set-up] or 2 agents), the [load test](https://github.com/kyma-project/telemetry-manager/actions/runs/12691802471) outputs the following values for the agent:
- Exported Log Records/second: 27.8K


## Conclusions
- Before 15 Jan. (first session):
### Benchmarking Session #1 (before 15 Jan)
- Compared to the FluentBit counterpart setup, a lower performance can be expected.
- Backpressure is currently not backpropagated from the gateway to the agent, resulting in logs being queued/lost on the gateway end. That's because the agent has no way of knowing when to stop, thus exports data continuously (this is a known issue, which is expected be solved by the OTel community in the next half year).
- Backpressure is currently not backpropagated from the gateway to the agent, resulting in logs being queued or lost on the gateway end. That's because the agent has no way of knowing when to stop and thus exports data continuously (this is a known issue, which is expected to be solved by the OTel community in the next half year).
- If the load is increased (that is, more generators, more logs, or more data), the log agent slows down.
- The network communication between the agent and the gateway, and/or the gateway itself, represents a bottleneck in this setup. We conclude this because higher throughput was observed when using just a debug endpoint as an exporter.
- CPU and memory consumption are surprisingly low, and this was not improved by removing the limits (quite the opposite was observed, with the CPU throttling more often and the throughput decreasing).
- If the batch processor is enabled, throughput increases. However, this comes at the cost of losing logs in some scenarios.
- Further methods of improving the throughput might still be worth investigating.
- After 15 jan. (second session):
- Removing the gateway improves throughput
- We now better understand the performance impact of each OTEL processor and of enabling/disabling compression
- Generators configuration greatly influence the setup => more generators exporting less data and taking less CPU leads to higher throughput than fewer generators taking more CPU and exporting more data
- There is a hard limit (see debug endpoint scenario) that we still not fully understand, since strictly based on the benchmarking numbers of OTEL, we should be getting higher throughput (i.e. something related to the infrastructure could be influencing this).
- We have now a more performant setup configuration, being more comparable with the numbers from the FluentBit setup

### Benchmarking Session #2 (after 15 Jan)
- Removing the gateway improves throughput.
- We now better understand the performance impact of each OTel processor and of enabling or disabling compression.
- The generators' configuration greatly influences the setup: More generators exporting less data and taking less CPU leads to higher throughput than fewer generators taking more CPU and exporting more data.
- There is a hard limit (see debug endpoint scenario) that we still don't fully understand, because strictly based on the benchmarking numbers of OTel, we should be getting higher throughput. It's possible that something related to the infrastructure could be influencing this.
- We now have a more performant setup configuration that is more comparable with the numbers from the FluentBit setup.
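
The knobs referred to in these conclusions (batching, exporter queue, compression) correspond to collector settings like the ones below. The values and the endpoint are placeholders chosen for illustration, not the configuration used in the benchmarks.

```yaml
processors:
  # Groups log records into larger export requests.
  batch:
    send_batch_size: 512   # placeholder value
    timeout: 10s           # placeholder value

exporters:
  otlp:
    # Hypothetical gateway endpoint.
    endpoint: telemetry-otlp-logs.kyma-system:4317
    # "none" avoids the CPU cost of compression at the price of more
    # network traffic; the OTLP gRPC exporter uses gzip by default.
    compression: none
    # In-memory queue that buffers batches while the receiver is slow.
    sending_queue:
      enabled: true
      queue_size: 1000     # placeholder value
    retry_on_failure:
      enabled: true
```

Because batches and queued items are held in memory, they can be lost if the agent restarts or the queue overflows, which is one way to read the log loss observed with the batch processor enabled.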
75 changes: 0 additions & 75 deletions hack/load-tests/log-agent-setup-telemetrygen.yml

This file was deleted.
