docs: Validate the LogPipeline OTel Setup and test the performance of the log agent #1705

Open
wants to merge 71 commits into
base: main
cfb1422
chore: Add image.repository to otel values charts
TeodorSAP Dec 4, 2024
33c3696
chore: functional log agent otel values file
TeodorSAP Dec 6, 2024
641dfcd
chore: LogAgent load test setup and config files
TeodorSAP Dec 18, 2024
c7ce037
Merge branch 'main' into chore/otel-logs-validation
TeodorSAP Dec 18, 2024
babe0c7
chore: Update load test files
TeodorSAP Dec 20, 2024
18eb9ed
docs: Add the log agent load test investigations results and final co…
TeodorSAP Dec 20, 2024
64ae962
chore: Add additional load test instruction
TeodorSAP Dec 20, 2024
a1aee11
Merge branch 'main' into chore/otel-logs-validation
TeodorSAP Jan 8, 2025
e52ed38
chore: WIP
TeodorSAP Jan 13, 2025
88f4385
chore: New findings
TeodorSAP Jan 15, 2025
cb80544
chore: configuration WIP
TeodorSAP Jan 16, 2025
dc4081b
WIP
TeodorSAP Jan 16, 2025
64465ad
chore: documentation update
TeodorSAP Jan 17, 2025
9f313af
chore: Documentation insights
TeodorSAP Jan 17, 2025
a95075a
chore: Fully document benchmarking session #2
TeodorSAP Jan 20, 2025
77358a1
chore: .md changes
TeodorSAP Jan 20, 2025
9f5de42
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
86031b0
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
5ff70b0
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
acbbc7a
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
48ebcc0
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
8e08ba7
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
7be8538
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
42c5244
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
dee9619
chore: Update config validation doc
TeodorSAP Jan 20, 2025
aae1ae7
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
0d3aa2e
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
a1c9c73
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
4e4e14e
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
e8fb721
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
fd77bfd
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
1c7082c
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
ddd2df5
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
8a9ba37
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
03bef54
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
cbc6ae7
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
ab06ae7
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
d1cd4d2
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 20, 2025
1e86e8f
chore: Update config validation doc
TeodorSAP Jan 20, 2025
69f57e3
Merge branch 'chore/otel-logs-validation' of github.com:TeodorSAP/tel…
TeodorSAP Jan 20, 2025
ac76542
chore: Update config validation doc
TeodorSAP Jan 20, 2025
8a6d3d9
chore: Update config validation doc
TeodorSAP Jan 20, 2025
55ab833
chore: Update config validation doc
TeodorSAP Jan 20, 2025
c75de3f
chore: Update config validation doc
TeodorSAP Jan 20, 2025
254239f
Merge branch 'main' into chore/otel-logs-validation
TeodorSAP Jan 21, 2025
d81dfc8
chore: Update config validation doc
TeodorSAP Jan 21, 2025
6b3bc2a
chore: Update config validation doc
TeodorSAP Jan 21, 2025
71319a6
chore: Update config validation doc
TeodorSAP Jan 21, 2025
be17e38
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
8a57bf0
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
2f21872
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
1d293b4
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
c3b1e9e
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
cc0098b
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
3d352b8
chore: Update config validation doc
TeodorSAP Jan 21, 2025
7a07704
Merge branch 'chore/otel-logs-validation' of github.com:TeodorSAP/tel…
TeodorSAP Jan 21, 2025
ec690fc
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
2d4e517
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
313001d
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
ba72832
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
e4f5c4a
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
452b97c
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
6ecd74e
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
69b0534
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
e35fe33
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
f5f4835
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
c8ea74a
chore: remove unfinished telemetrygen load test config file
TeodorSAP Jan 21, 2025
0774e38
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
3303b78
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
22186df
chore: Update config validation doc
TeodorSAP Jan 21, 2025
5b254c7
Update docs/contributor/benchmarks/otlp-logs-validation.md
TeodorSAP Jan 21, 2025
6 changes: 3 additions & 3 deletions docs/contributor/benchmarks/load-test-logs.md
@@ -34,9 +34,9 @@ The tests are executed for 20 minutes, so that each test case has a stabilized o
<div class="table-wrapper" markdown="block">

| config | logs received l/s | logs exported l/s | logs queued | cpu | memory MB | no. restarts of gateway | no. restarts of generator |
| --- | --- | --- | --- | --- | --- | ---|
| single | 7193 | 7195 | 16824 | 2.5 | 826 | 0 | 1 |
| batch | 16428 | 16427 | 0 | 3 | 265 | 0 | 1 |
| ------ | ----------------- | ----------------- | ----------- | --- | --------- | ----------------------- | ------------------------- |
| single | 7193 | 7195 | 16824 | 2.5 | 826 | 0 | 1 |
| batch | 16428 | 16427 | 0 | 3 | 265 | 0 | 1 |
</div>

## Interpretation
312 changes: 312 additions & 0 deletions docs/contributor/benchmarks/otlp-logs-validation.md
@@ -0,0 +1,312 @@
# OTel LogPipeline set-up validation

This document describes how the complete LogPipeline flow with OTLP output was validated. The setup consists of a manually deployed log agent, the already-implemented log gateway, and log generators based on flog.

The goal is to performance-test the agent, observing, among other things, throughput, resource consumption, and reaction to backpressure, and to compare the results with the previous FluentBit-based setup.



## 1. Set-up configuration steps

### With Helm

``` bash
# `k` is assumed to be a shell alias for `kubectl`
k apply -f telemetry-manager/config/samples/operator_v1alpha1_telemetry.yaml

# Execute knowledge-hub/scripts/create_cls_log_pipeline.sh with the corresponding environment variables

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

helm install -n kyma-system logging open-telemetry/opentelemetry-collector -f telemetry-manager/docs/contributor/pocs/assets/otel-log-agent-values.yaml
```

### Manual

``` bash
k apply -f telemetry-manager/config/samples/operator_v1alpha1_telemetry.yaml

# Execute knowledge-hub/scripts/create_cls_log_pipeline.sh with the corresponding environment variables

k apply -f ./otlp-logs-validation.yaml
```



## 2. Resulting Resources

### Agent ConfigMap (OTel Config)

See [OTLP Logs Validation YAML](./otlp-logs-validation.yaml)

#### Things to take into consideration (at implementation)
- Dynamic inclusion/exclusion of namespaces, based on the LogPipeline spec attributes
- Exclude the FluentBit container in the OTel configuration and the OTel container in the FluentBit configuration
- `receivers/filelog/operators`: Copying the body to `attributes.original` must be skipped if the `dropLogRawBody` flag is enabled
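
As a rough sketch of the first two points, the `include`/`exclude` lists of the filelog receiver could be generated along the following lines. The excluded namespace and the FluentBit Pod and container names are assumptions for illustration only. The `telemetry-log-agent` workload and `collector` container names are taken from the PromQL queries in this document.

``` yaml
receivers:
  filelog:
    include:
      # Kubernetes Pod log layout: /var/log/pods/<namespace>_<pod>_<uid>/<container>/*.log
      - /var/log/pods/*/*/*.log
    exclude:
      # Hypothetical: a namespace excluded through the LogPipeline spec
      - /var/log/pods/kube-system_*/*/*.log
      # Assumption: Pod and container name of the FluentBit agent
      - /var/log/pods/kyma-system_telemetry-fluent-bit-*/fluent-bit/*.log
      # The OTel log agent's own container, so that it does not collect its own logs
      - /var/log/pods/kyma-system_telemetry-log-agent-*/collector/*.log
```

The glob patterns rely only on the standard Pod log directory layout, so namespace-based filtering needs no additional processor in this sketch.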

### Agent DaemonSet

See [OTLP Logs Validation YAML](./otlp-logs-validation.yaml)

### How does checkpointing work

- Enabling the `storeCheckpoint` preset (Helm) activates the `file_storage` extension for the receiver
- The `file_storage` extension uses the path `/var/lib/otelcol`
- This path is later mounted as a `hostPath` volume in the DaemonSet spec
- The extension is also set in the `storage` property of the filelog receiver

> `storage` = The ID of a storage extension to be used to store file offsets. File offsets allow the receiver to pick up where it left off in the case of a collector restart. If no storage extension is used, the receiver will manage offsets in memory only.
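
The pieces described above could fit together roughly as follows. This is a minimal sketch and not the chart's generated output: the volume name is hypothetical, and the pipeline definition is omitted.

``` yaml
# Collector configuration (fragment)
extensions:
  file_storage:
    directory: /var/lib/otelcol      # path named in the list above

receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    storage: file_storage            # ID of the storage extension that persists file offsets

service:
  extensions: [file_storage]         # the extension must also be enabled here
---
# DaemonSet Pod spec (fragment); the volume name is a hypothetical example
volumes:
  - name: otelcol-checkpoints
    hostPath:
      path: /var/lib/otelcol
      type: DirectoryOrCreate
containers:
  - name: collector
    volumeMounts:
      - name: otelcol-checkpoints
        mountPath: /var/lib/otelcol
```

Because the offsets live on a `hostPath` volume, they are intended to survive a restart of the agent Pod on the same node.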



## 3. Benchmarking and Performance Tests Results

Setup Configuration:
``` bash
k create ns prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install -n "prometheus" "prometheus" prometheus-community/kube-prometheus-stack -f hack/load-tests/values.yaml --set grafana.adminPassword=myPwd

k apply -f telemetry-manager/hack/load-tests/log-agent-test-setup.yaml
```

To run the load tests, the generated logs must be isolated. To do so, replace the following line in the ConfigMap of the log agent:

``` yaml
receivers:
  filelog:
    # ...
    include:
      - /var/log/pods/*/*/*.log # replace with "/var/log/pods/log-load-test*/*flog*/*.log"
```

For the 🏋️‍♀️ Backpressure Scenario additionally apply:
``` bash
k apply -f telemetry-manager/hack/load-tests/log-backpressure-config.yaml
```

PromQL Queries:
``` sql
# RECEIVED (log records per second)
round(sum(rate(otelcol_receiver_accepted_log_records{service="telemetry-log-agent-metrics"}[20m])))

# EXPORTED (log records per second)
round(sum(rate(otelcol_exporter_sent_log_records{service="telemetry-log-agent-metrics"}[20m])))

# QUEUE
avg(sum(otelcol_exporter_queue_size{service="telemetry-log-agent-metrics"}))

# MEMORY (MiB per Pod)
round(sum(avg_over_time(container_memory_working_set_bytes{namespace="kyma-system", container="collector"}[20m]) * on(namespace,pod) group_left(workload) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{namespace="kyma-system", workload="telemetry-log-agent"}[20m])) by (pod) / 1024 / 1024)

# CPU (cores per Pod)
round(sum(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="kyma-system"}[20m]) * on(namespace,pod) group_left(workload) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{namespace="kyma-system", workload="telemetry-log-agent"}[20m])) by (pod), 0.1)
```

### ⭐️ Best Results (Scenario: Single Pipeline)
| Batching | RECEIVED (logs/s) | EXPORTED (logs/s) | QUEUE | MEMORY (MiB) | CPU (cores) |
| :------: | :---------------: | :---------------: | :---: | :----------: | :---------: |
|    ❌    |     max. 8.9K     |     max. 8.9K     |   0   |     ~63      |    ~0.5     |
|    ✅    |       8.6K        |       8.6K        |   0   |     ~73      |    ~0.6     |

### ⭐️🏋️‍♀️ Best Results (Scenario: Single Pipeline with Backpressure)
| Batching | RECEIVED (logs/s) | EXPORTED (logs/s) | QUEUE | MEMORY (MiB) | CPU (cores) |
| :------: | :---------------: | :---------------: | :---: | :----------: | :---------: |
|    ❌    |       6.8K        |       6.8K        | ~328  |     ~66      |    ~0.5     |
|    ✅    |         -         |         -         |   -   |      -       |      -      |

### 📊 Benchmarking Sessions

| Icon | Meaning |
| ---- | ---------------------------------------------------- |
| ⏳ | Full-test, involving the whole setup, usually 20 min |
| 🪲 | Debugging session, usually shorter, not so reliable |
| 🏋️‍♀️ | Backpressure Scenario |
| ⭐️ | Best results observed (in a given scenario) |

#### ⏳ 18 Dec 2024, 13:45 - 14:05 (20 min)
- **Generator:** 10 replicas x 10 MB
- **Agent:** no CPU limit, no queue
- **Results:**
  - Agent RECEIVED/EXPORTED: 6.06K
  - Agent Memory:
    - Pod1: 70
    - Pod2: 70
  - Agent CPU:
    - Pod1: 0.5
    - Pod2: 0.4
  - Gateway RECEIVED/EXPORTED: 6.09K
  - Gateway QUEUE: 0

#### ⏳ 18 Dec 2024, 14:08 - 14:28 (20 min)
- **Generator:** 20 replicas x 10 MB
- **Agent:** no CPU limit, no queue
- **Results:**
  - Agent RECEIVED/EXPORTED: 4.93K
  - Agent Memory:
    - Pod1: 71
    - Pod2: 72
  - Agent CPU:
    - Pod1: 0.5
    - Pod2: 0.4
  - Gateway RECEIVED/EXPORTED: 4.93K
  - Gateway QUEUE: 0 (max. 6 at some point)

#### ⏳ 18 Dec 2024, 14:50 - 15:10 (20 min)
- **Generator:** 10 replicas x 20 MB
- **Agent:** no CPU limit, no queue
- **Results:**
  - Agent RECEIVED/EXPORTED: 5.94K
  - Agent Memory:
    - Pod1: 76
    - Pod2: 81
  - Agent CPU:
    - Pod1: 0.5
    - Pod2: 0.5
  - Gateway RECEIVED/EXPORTED: 5.94K
  - Gateway QUEUE: 0

#### ⏳⭐️ 18 Dec 2024, 15:24 - 15:34 (10 min)
- **Generator:** 10 replicas x 10 MB
- **Agent:** with CPU limit (1), no queue
- **Results:**
  - Agent RECEIVED/EXPORTED: 8.9K
  - Agent Memory: 64/62
  - Agent CPU: 0.5/0.5
  - Gateway RECEIVED/EXPORTED: 8.9K
  - Gateway QUEUE: 0

#### 🏋️‍♀️⭐️ 18 Dec 2024, 15:36 - 15:56 (20 min) (backpressure scenario)
- **Generator:** 10 replicas x 10 MB
- **Agent:** with CPU limit (1), no queue
- **Results:**
  - Agent RECEIVED/EXPORTED: 6.8K
  - Agent Memory:
    - Pod1: 66
    - Pod2: 67
  - Agent CPU:
    - Pod1: 0.6
    - Pod2: 0.5
  - Gateway RECEIVED: 6.8K
  - Gateway EXPORTED: 256
  - Gateway QUEUE: 328
- **Remarks:**
  - Agent does not stop when gateway refuses logs (because backpressure does not backpropagate)
  - It slows down/stops in other scenarios (see below) => SUCCESS
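
For orientation, "no queue" on the agent side presumably corresponds to disabling the exporter's `sending_queue`. The following is a minimal sketch under that assumption, using the OTLP exporter. The gateway endpoint and the TLS setting are hypothetical placeholders, not values from this test setup.

``` yaml
exporters:
  otlp:
    # Hypothetical gateway address; replace with the actual gateway service
    endpoint: log-gateway.kyma-system:4317
    tls:
      insecure: true       # assumption: plain-text, in-cluster traffic
    sending_queue:
      enabled: false       # assumption: this is what "no queue" refers to
```

With the queue disabled, the exporter hands data to the gateway synchronously instead of buffering it in memory on the agent.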

#### 🪲 19 Dec 2024, Agent exports logs to a debug endpoint (5 min)
- no networking involved
- 12/14 log generators x 10 MB
  - 19.5K => ~20K
  - MEM: 43/47
  - CPU: 0.7/0.8
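
One way to reproduce a run without networking is to point the logs pipeline at the collector's `debug` exporter. This is a sketch under the assumption that the "debug endpoint" refers to that exporter. It is a fragment: the `filelog` receiver is expected to be defined elsewhere in the same configuration.

``` yaml
exporters:
  debug:
    verbosity: basic       # prints only summary counts, which keeps exporter overhead low

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]   # no OTLP exporter, so no traffic leaves the agent
```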

#### 🪲 19 Dec 2024, Agent exports logs directly to mock backend (5 min)
- networking, but avoiding gateway
- 10 log generators x 10 MB
  - 5.3K
  - MEM: 58/59
  - CPU: 0.4/0.5
- 12 log generators x 10 MB
  - not increasing

#### 🪲 19 Dec 2024, Agent exports logs directly to mock backend with batching processor (5 min)
- networking, but with batching mechanism in place
- 10 log generators x 10 MB, batch size: 1024
  - 8.3K
  - MEM: 68/73
  - CPU: 0.5/0.6
- 12 log generators x 10 MB, batch size: 1024
  - starts decreasing (~7.5K)
- 10 log generators x 10 MB, batch size: 2048
  - ~9K
  - MEM: 74/79
  - CPU: 0.6/0.7
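
The batching variant could be configured roughly as follows. This sketch assumes that "batch size" maps to the `send_batch_size` setting of the `batch` processor. It is a fragment: the `filelog` receiver and `otlp` exporter are expected to be defined elsewhere in the same configuration.

``` yaml
processors:
  batch:
    send_batch_size: 1024   # assumption: the "batch size" used above; 2048 was the other value tried

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]   # batches log records before they reach the exporter
      exporters: [otlp]
```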

#### ⏳ 19 Dec 2024, 13:46 - 14:06 (20 min)
- **Generator:** 10 replicas x 10 MB
- **Agent:** with CPU limit (1), no queue, with batch processing (1024)
- **Results:**
  - Agent RECEIVED/EXPORTED: 8.46K
  - Gateway RECEIVED/EXPORTED: 8.46K
  - Agent Memory: 69/76
  - Agent CPU: 0.5/0.7
  - Gateway QUEUE: 0 (max 191)

#### ⏳ 19 Dec 2024, ??:?? - ??:?? (20 min)
- **Generator:** 10 replicas x 10 MB
- **Agent:** with CPU limit (1), no queue, with batch processing (2048)
- **Results:**
  - lower throughput than in the 1024 scenario

#### ⏳⭐️ 19 Dec 2024, 15:55 - 16:15 (20 min)
- **Agent:** with CPU limit (1), no queue, with batch processing (1024)
- **Mock Backend:** memory limit x2 (2048Mi)
- **Generator:** 10 replicas x 10 MB
  - **Results:**
    - Agent RECEIVED/EXPORTED: 8.18K
    - Gateway RECEIVED/EXPORTED: 8.18K
    - Agent Memory: 70/71
    - Agent CPU: 0.6/0.6
    - Gateway QUEUE: 0
- **Generator:** 12 replicas x 10 MB (16:18 - 16:35)
  - **Results:**
    - Agent RECEIVED/EXPORTED: 8.6K
    - Gateway RECEIVED/EXPORTED: 8.6K
    - Agent Memory: 73/74
    - Agent CPU: 0.7/0.6
    - Gateway QUEUE: 0
- **Generator:** 14 replicas x 10 MB (16:35 - 16:40)
  - **Results:**
    - Agent RECEIVED/EXPORTED: 7.54K
    - Gateway RECEIVED/EXPORTED: 7.54K
    - lower throughput than with 12 replicas

#### ⏳ 19 Dec 2024, 16:50 - 17:10 (20 min)
- **Generator:** 12 replicas x 10 MB
- **Agent:** with CPU limit (1), no queue, with batch processing (2048)
- **Mock Backend:** memory limit x2 (2048Mi)
- **Results:**
  - Agent RECEIVED/EXPORTED: 8.1K
  - Gateway RECEIVED/EXPORTED: 8.11K
  - Agent Memory: 74/81
  - Agent CPU: 0.6/0.5
  - Gateway QUEUE: 0 (max 2)

#### 🪲 20 Dec 2024, Multiple agents loading the gateway (5 min)
- **Setup:** 10 nodes, 10 agents, 1 generator / node (DaemonSet)
- **Results (WITH BATCHING):**
  - Agent RECEIVED/EXPORTED: 61.5K => 6.1K / agent instance
  - Gateway RECEIVED/EXPORTED: 61.5K/29.5K => 30K/14.7K / gateway instance
  - Agent Memory: 61-68/agent
  - Agent CPU: 0.4-0.8/agent
  - Gateway QUEUE: 510 (max 512, full)
  - ~10% exporter failed enqueue logs
  - 0% receiver refused logs
  - 0% exporter send failed logs
- **Results (WITHOUT BATCHING):**
  - Agent RECEIVED/EXPORTED: 31.4K => 3.1K / agent instance
  - Gateway RECEIVED/EXPORTED: 31.4K => 11.4K / gateway instance
  - Agent Memory: 61-68/agent
  - Agent CPU: 0.4-0.5/agent
  - Gateway QUEUE: 0 (max 6)
  - 0% exporter failed enqueue logs
  - 0% receiver refused logs
  - 0% exporter send failed logs


## 4. Comparison with FluentBit setup
In the FluentBit setup, for the very same scenario, the [load test](https://github.com/kyma-project/telemetry-manager/actions/runs/12691802471) outputs the following values for the agent:
- Exported Log Records per second: 3.913
- Received Log Records per second: 3.868


## 5. Conclusions

- Lower performance can be expected compared to the FluentBit counterpart setup.
- Backpressure is currently not backpropagated from the gateway to the agent. The agent has no way of knowing when to stop, so it keeps exporting data, and logs are queued or lost on the gateway side. (This is a known issue that the OTel community is expected to solve within the next half year.)
- The agent slows down if the load is increased (that is, more generators, more logs, or more data).
- The network communication between the agent and the gateway, and/or the gateway itself, is a bottleneck in this setup: with only a debug endpoint as exporter, higher throughput was observed.
- CPU and memory consumption are surprisingly low. Removing the limits did not improve this; on the contrary, the CPU was throttled more often and throughput decreased.
- Enabling the batch processor increased throughput, but at the cost of losing logs in some scenarios.
- Further methods of improving the throughput might still be worth investigating.