Hello everyone,
I'm working on a metrics ingestion pipeline with Vector where the source is a Kafka cluster and the approximate throughput is 15 MB/s. The messages we receive range from roughly 300 to 2000 bytes in size.
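At that rate, messages of 300 to 2000 bytes work out to roughly 7,500 to 50,000 messages per second (taking 15 MB/s as 15,000,000 bytes/s), which is the volume the buffering settings below have to absorb.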
To control memory usage, I have made some adjustments to the `librdkafka_options` section and to the sink configuration. Here is an extract of my configuration:
```toml
[api]
enabled = true
[sources.kafka_in]
type = "kafka"
bootstrap_servers = "kafka1.example.com:9093,kafka2.example.com:9093,kafka3.example.com:9093"
group_id = "vector2kafka-XXXXXXXX.metrics"
topics = ["metrics_topic"]
tls.enabled = true
tls.verify_certificate = false
tls.ca_file = "/certs/ca.crt"
tls.crt_file = "/certs/tls.crt"
tls.key_file = "/certs/tls.key"
decoding.codec = "native"
auto_offset_reset = "earliest"
commit_interval_ms = 2000
fetch_wait_max_ms = 10
session_timeout_ms = 10000
acknowledgements.enabled = true
[sources.kafka_in.librdkafka_options]
"fetch.message.max.bytes" = "8192"
"queued.min.messages" = "100"
"queued.max.messages.kbytes" = "8192"
"socket.receive.buffer.bytes" = "65536"
[sources.internal_metrics]
type = "internal_metrics"
[transforms.filter_metrics]
type = "filter"
inputs = ["kafka_in"]
condition = 'exists(.tags.leanix) && .tags.leanix == "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" && exists(.tags.environment) && .tags.environment != "production"'
[transforms.route_metrics]
type = "route"
inputs = ["filter_metrics"]
route.private = 'exists(.tags.private) && .tags.private == "true"'
route.default = '!(exists(.tags.private) && .tags.private == "true")'
[sinks.vmbucket1_public]
type = "prometheus_remote_write"
inputs = ["route_metrics.default"]
endpoint = "http://vm-bucket-1.example.internal:8480/insert/0/prometheus/api/v1/write"
compression = "snappy"
healthcheck = false
batch.timeout_secs = 1
batch.max_events = 300
buffer.max_events = 500
[sinks.vmbucket1_private]
type = "prometheus_remote_write"
inputs = ["route_metrics.private"]
endpoint = "http://vm-bucket-1.example.internal:8480/insert/103/prometheus/api/v1/write"
compression = "snappy"
healthcheck = false
batch.timeout_secs = 1
batch.max_events = 300
buffer.max_events = 500
[sinks.prometheus_exporter]
type = "prometheus_exporter"
inputs = ["internal_metrics"]
address = "0.0.0.0:9201"`
My questions are the following:
Are there any additional settings or parameters I can adjust in Vector (or in `librdkafka_options`) to further optimize memory usage, without compromising throughput, in this ~15 MB/s metrics environment? Also, what methods do you recommend for verifying and measuring the actual size of the messages in my Kafka topic?
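For the second question, this is a rough sketch of how message sizes could be sampled with confluent-kafka (the Python client built on the same librdkafka that the Vector source uses); the broker address, certificate paths, and the throwaway group id are placeholders mirroring the config above, so adjust them for your environment:

```python
# Rough sketch: sample message sizes straight from the topic using
# confluent-kafka. Broker address, certificate paths and the probe
# group id below are placeholders taken from the config above.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "kafka1.example.com:9093",
    "group.id": "size-probe",      # separate group so Vector's offsets are untouched
    "auto.offset.reset": "latest",
    "security.protocol": "ssl",
    "ssl.ca.location": "/certs/ca.crt",
    "ssl.certificate.location": "/certs/tls.crt",
    "ssl.key.location": "/certs/tls.key",
})
consumer.subscribe(["metrics_topic"])

sizes = []
try:
    # Sample 10,000 live messages; at ~15 MB/s this only takes a moment.
    while len(sizes) < 10_000:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        sizes.append(len(msg.value()))
finally:
    consumer.close()

sizes.sort()
print(f"n={len(sizes)}  min={sizes[0]}  p50={sizes[len(sizes) // 2]}  "
      f"avg={sum(sizes) / len(sizes):.0f}  max={sizes[-1]} bytes")
```

A CLI alternative would be kcat, which can print the payload length of each message via its -f format string, but I would still like to hear what you use in practice.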
To give you more context: I am currently using a custom-built solution that consumes about 50% fewer resources than Vector.
I welcome any suggestions or experiences you can share. Thank you in advance!