Validate new OTLP log setup #1631

Open
3 of 6 tasks
a-thaler opened this issue Nov 22, 2024 · 0 comments · May be fixed by #1705
Labels: area/logs LogPipeline

a-thaler (Collaborator) commented Nov 22, 2024

Description
Review and collect the results of the PoCs done for #556 and come up with a final config for the OTLP-based log agent.
The agent should (a minimal config sketch follows the list):

  • tail logs from the container runtime
  • map the content to OTLP
  • parse the JSON payload, if applicable
  • send the data synchronously to the log gateway
  • pause tailing in case of gateway refusals
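
This is not the final config this issue is meant to produce, but a minimal sketch of how these requirements could map to a collector agent configuration, assuming the filelog receiver with the container and json_parser operators and a plain OTLP exporter; the gateway endpoint and all values are placeholders:

```yaml
receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]   # tail logs from the container runtime
    start_at: beginning
    operators:
      - type: container                  # map containerd/CRI-O/Docker lines to OTLP log records
      - type: json_parser                # if applicable, parse a JSON payload in the body
        if: 'body matches "^{.*}$"'
        parse_from: body

exporters:
  otlp:
    endpoint: telemetry-otlp-logs.kyma-system:4317   # assumed gateway address, placeholder
    tls:
      insecure: true                     # placeholder, depends on the gateway setup
    sending_queue:
      enabled: false        # no async queue: export synchronously ...
    retry_on_failure:
      enabled: true         # ... and retry on refusals, which pauses the tail (backpressure)

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlp]
```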

With that setup, perform a load test to see that the maximum ingestion rate of a Cloud Logging instance can be achieved.

  • Document the final ConfigMap and the load test results

Considerations:

  • The maximum observed output across the whole Fluent Bit DaemonSet was 60 MB/s or 44K records/s (at which point the backend could not sustain the load)
  • The maximum observed output per Fluent Bit instance was 4.4 MB/s or 15K records/s
  • In Fluent Bit format, Cloud Logging can handle 2.5K logs/s in the standard plan or an average of 15K logs/s (max 25K logs/s) in the large plan
  • In OTLP format, Cloud Logging can handle 10K logs/s in the standard plan or 30K logs/s in the large plan

After the first performance test results, the following aspects should be double-checked:

  • Check the official performance test results and see whether they show similar numbers. Is the resource utilization similarly low?
  • The agent throughput was halved by switching from the debug exporter to an OTLP exporter with a mock backend; can the impact really be that drastic?
  • Try out the new batching exporter and check the performance increase for the agent
  • Based on the actual requirements (surviving spike scenarios where backends refuse data for a short period of time), check whether a persistent queue really brings a benefit over a simple in-memory setup with backpressure (see the queue sketch after this list)
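
For the queue question, a hedged exporter-only fragment of the two variants to compare, assuming the file_storage extension for the persistent case; the endpoint, the storage path, and the batching-exporter details are assumptions and depend on the collector version:

```yaml
extensions:
  file_storage:                          # only needed for the persistent-queue variant
    directory: /var/lib/otelcol/queue    # assumed path; must survive pod restarts (hostPath/PVC)

exporters:
  otlp:
    endpoint: telemetry-otlp-logs.kyma-system:4317   # assumed gateway address, placeholder
    retry_on_failure:
      enabled: true                      # retries are what produce backpressure on the tail
    # Variant A: simple in-memory behaviour, no queue at all
    sending_queue:
      enabled: false
    # Variant B: persistent queue backed by the file_storage extension
    # sending_queue:
    #   enabled: true
    #   storage: file_storage
    # The new batching exporter would also be enabled on this exporter for the
    # perf comparison (exact fields depend on the collector version).
```

Variant A relies on the synchronous pipeline to pause tailing during refusals; Variant B trades extra disk I/O and node storage for buffering that survives agent restarts while the backend is refusing data.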

The key decision points are:

  • Is the performance good enough to base logging on the otel-collector already?
  • Decide on the architecture (agent pushes directly or via the gateway), considering aspects around persistence and memory consumption (API server caching)
@a-thaler a-thaler added the area/logs LogPipeline label Nov 22, 2024
@a-thaler a-thaler mentioned this issue Nov 22, 2024
24 tasks
@TeodorSAP TeodorSAP self-assigned this Nov 27, 2024