Vector scaling challenges with Prometheus Alertmanager to Kafka integration #22602
niksarangi asked this question in Q&A (Unanswered)
Our team is developing a proof of concept to forward alerts from Prometheus Alertmanager to a Kafka topic. We selected Vector as the data pipeline solution based on initial research and compatibility requirements.
Environment Configuration on Local Machine:
Source: Prometheus Alertmanager instance
Pipeline: Vector with appropriate source configuration in vector.toml
Destination: Kafka topic configured as sink
Test Harness: Custom Java application designed to generate high volumes of test alerts
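For context, a minimal sketch of this kind of load generator against Alertmanager's v2 alert API; the Alertmanager address (localhost:9093), label values, loop count, and class name below are illustrative assumptions rather than our actual harness code:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative load generator: POSTs synthetic alerts to Alertmanager's v2 API.
// Endpoint, labels, and volume are placeholder assumptions.
public class AlertLoadGenerator {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (int i = 0; i < 1_000_000; i++) {
            String body = "[{\"labels\":{\"alertname\":\"load_test\",\"instance\":\"gen-" + i + "\"},"
                    + "\"annotations\":{\"summary\":\"synthetic alert\"}}]";
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:9093/api/v2/alerts"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            if (response.statusCode() != 200) {
                System.err.println("Alertmanager returned " + response.statusCode() + " for alert " + i);
            }
        }
    }
}
```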
Issue Description
We are encountering significant performance limitations during load testing of our Vector implementation. When our test harness sends 1-2 million alerts to Alertmanager, only approximately 40,000-45,000 alerts (roughly 2-4% of the total volume) successfully reach our Kafka topic.
Even with verbose logging enabled in Vector, we have no visibility into the incoming messages being processed, which complicates troubleshooting.
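A minimal sketch of the kind of debug wiring we could add for visibility, assuming a temporary console sink attached to the source plus Vector's internal metrics (the component names and exporter port below are placeholders, not part of our current setup):

```toml
# Temporary debug sink: prints every event received by the source to stdout.
[sinks.debug_console]
type = "console"
inputs = ["alertmanager_source"]
encoding.codec = "json"

# Expose Vector's own metrics (received/sent/discarded event counts per component).
[sources.vector_internal]
type = "internal_metrics"

[sinks.vector_metrics_out]
type = "prometheus_exporter"
inputs = ["vector_internal"]
address = "0.0.0.0:9598"
```

The prometheus_exporter endpoint would then show per-component event counters, and `vector top` (which requires the Vector API to be enabled) can display per-component throughput live, if available in the installed version.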
Questions
Why do only a small fraction of the alerts sent to Alertmanager reach the Kafka topic, and where are the remaining alerts being dropped?
How can we gain visibility into the messages Vector is actually receiving and processing?
Additional Context:
vector.toml config:
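# HTTP server source; Alertmanager delivers alerts here via a webhook receiver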
[sources.alertmanager_source]
type = "http_server"
address = "0.0.0.0:8686"
buffer.max_event = 10000
buffer.type = "disk"
concurrency = 10
encoding.codec = "json"
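# Split each incoming webhook payload into one event per alert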
[transforms.split_alerts]
type = "remap"
inputs = ["alertmanager_source"]
source = '''
parsed = parse_json!(.message) # parse the raw webhook body as JSON
alerts = parsed.alerts
if !is_array(alerts) { alerts = [alerts] } # wrap a single alert so the result is always an array
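# assigning an array to the event root makes the remap emit one event per element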
. = alerts
'''
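# Forward each alert event to the Kafka topic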
[sinks.kafka_out]
type = "kafka"
inputs = ["split_alerts"]
bootstrap_servers = "localhost:9092"
topic = "vector-test"
encoding.codec = "json"
batch.max_events = 5000
batch.timeout_ms = 100000
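For reference, a sketch of an alternative wiring based on our reading of the Vector docs (buffering configured on the sink rather than the source, the request body decoded via decoding.codec on the http_server source, and the batch timeout given in seconds); the option names and sizes below are assumptions from the documentation, not a verified fix, and the split_alerts transform would stay as-is:

```toml
[sources.alertmanager_source]
type = "http_server"
address = "0.0.0.0:8686"
# The http_server source decodes the request body via decoding.codec.
decoding.codec = "json"

[sinks.kafka_out]
type = "kafka"
inputs = ["split_alerts"]
bootstrap_servers = "localhost:9092"
topic = "vector-test"
encoding.codec = "json"
# Buffering is a sink-level option; a disk buffer holds events under
# back-pressure instead of dropping them. max_size is in bytes (placeholder value).
buffer.type = "disk"
buffer.max_size = 1073741824
buffer.when_full = "block"
batch.max_events = 5000
# Batch timeouts are expressed in seconds.
batch.timeout_secs = 1
```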