Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(opentelemetry source): add support for enriching logs with HTTP headers #21674

Merged
merged 5 commits into from
Nov 5, 2024

Conversation

jblazquez
Copy link
Contributor

@jblazquez jblazquez commented Nov 1, 2024

Summary

This PR adds a new http.headers configuration to the opentelemetry source, which injects HTTP headers received in the OTLP/HTTP request into the log events, similarly to how http_server does it. Please see #21673 for the motivation behind this change.

This is my first contribution to Vector, so I'm not sure whether the changes in this PR are in the right form to be accepted. I wanted to create the PR to show other contributors the proposed implementation, but I'm open to different approaches.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

Added unit tests.

Also tested with an existing OTEL processing pipeline, replacing an existing http_server source (see below) with an opentelemetry source using the new http.headers config from this change.

Also ran a wrk load test to verify that performance is much higher with this solution than with an http_source that looks like this:

[sources.opentelemetry_logs_http_server]
type = "http_server"
address = "0.0.0.0:14318"
path = "/v1/logs"
decoding.codec = "protobuf"
decoding.protobuf.desc_file = "/path/to/opentelemetry-proto.desc"
decoding.protobuf.message_type = "opentelemetry.proto.collector.logs.v1.ExportLogsServiceRequest"
headers = [
    # To get access to the client IP address. 
    "x-forwarded-for",
]

See the linked issue (#21673) for more info on the performance impact of this new feature.

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

References

Closes: #21673

@jblazquez jblazquez requested a review from a team as a code owner November 1, 2024 01:29
@github-actions github-actions bot added the domain: sources Anything related to the Vector's sources label Nov 1, 2024
@@ -103,12 +114,69 @@ pub(crate) fn build_warp_filter(
log_filters.or(trace_filters).unify().boxed()
}

fn enrich_events(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is similar to SimpleHttpSource::enrich_events, but it only injects headers into the event.

Copy link
Contributor

@pront pront Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the code blocks that injects the headers are identical (except for the source names). I think it would be nice to convert this into a common function and move it to a new file src/sources/util/http_common.rs.


use crate::http::{KeepaliveConfig, MaxConnectionAgeLayer};
use crate::sources::http_server::{HttpConfigParamKind, SimpleHttpConfig};
Copy link
Contributor Author

@jblazquez jblazquez Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely sure if it's OK to use types from the source::http_server module here, but other sources (e.g. heroku_logs) also do it, so I assumed it is OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also move HttpConfigParamKind into the common file.

Copy link
Contributor Author

@jblazquez jblazquez Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided against making this change for now, as it touches a number of unrelated files and it wasn't super clear which common file to move this type to.

@pront pront added the source: opentelemetry Anything `opentelemetry` source related label Nov 1, 2024
@pront pront self-assigned this Nov 1, 2024
Copy link
Contributor

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jblazquez, thank you for this contribution!

headers_config.get(header_name).map(HeaderValue::as_bytes);

log_namespace.insert_source_metadata(
SimpleHttpConfig::NAME,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace with OpentelemetryConfig (or this can also resolved by the refactoring I suggested above).

@@ -103,12 +114,69 @@ pub(crate) fn build_warp_filter(
log_filters.or(trace_filters).unify().boxed()
}

fn enrich_events(
Copy link
Contributor

@pront pront Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the code blocks that injects the headers are identical (except for the source names). I think it would be nice to convert this into a common function and move it to a new file src/sources/util/http_common.rs.

src/sources/opentelemetry/tests.rs Show resolved Hide resolved
@jblazquez jblazquez requested review from a team as code owners November 2, 2024 03:31
@github-actions github-actions bot added the domain: external docs Anything related to Vector's external, public documentation label Nov 2, 2024
@jblazquez
Copy link
Contributor Author

@pront, thank you for the review. Now that I'm more confident about these changes, I reworked the PR to address your feedback:

  • 516d1da: Creates a new sources::util::http::headers module and adds an add_headers function there. I tried to follow the existing pattern for the sources::util::http::query::add_query_parameters function, including the Cargo features.
  • 36895e3: Modifies the http_server source to use the new add_headers function.
  • 43f474f: Modifies the opentelemetry source to use the new add_headers function, adds tests, and updates the documentation and changelog.
  • 0bf233c: Corrects the description of the behavior of the headers option in http_server. Unlike query_parameters, the headers do not overwrite existing fields with the same name. Added a test to confirm the correct behavior.

In addition to the commits above, I also added the following commit that extends this PR to also bring support for the host_key configuration to the opentelemetry source, for more consistency with http_server. If you think this change should be done in a different PR please let me know, but I felt it was closely related to the headers feature.

  • ea5c56f: Adds support for the host_key config to opentelemetry. I followed the existing pattern from http_server for plumbing through the remote IP.

Comment on lines 71 to 72
let remote_addr = conn.peer_addr();
let remote_addr_ref = enable_source_ip.then_some(remote_addr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following is preferable because it's Some only enable_source_ip is true:

Suggested change
let remote_addr = conn.peer_addr();
let remote_addr_ref = enable_source_ip.then_some(remote_addr);
let remote_addr = enable_source_ip.then(|| conn.peer_addr());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, if I come back to this work to add host_key to opentelemetry I will do this. I had copied what the http_server source did.

@@ -0,0 +1,3 @@
The `opentelemetry` source can now be configured to enrich log events with the source IP and any HTTP headers received in the OTLP/HTTP request.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also mention that now a host_key can be specified.

Note, I am skeptical about the connection of the host_key with source_ip. I understand there's precedence in other sources but I can see an argument in favor of making it a different setting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit ea5c56f mentioned this, but I reverted that commit to reduce the scope of this PR, since you said you were not sure that host_key should mean source_ip.

Comment on lines 83 to 85
if let Some(remote_addr_inner) = remote_addr_ref.as_ref() {
request.extensions_mut().insert(*remote_addr_inner);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if let Some(remote_addr_inner) = remote_addr_ref.as_ref() {
request.extensions_mut().insert(*remote_addr_inner);
}
if let Some(addr) = remote_addr {
request.extensions_mut().insert(addr);
}

.unwrap()
.is_timestamp());
assert_eq!(
meta.get(path!("opentelemetry", "host")).unwrap(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this to be remote_ip. I think this is also broken in http_server because we hardcode path!("host") when enriching events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wondered this, but I assumed it was intentional that - when using log namespacing - the source IP was always under path!("host") and that the host_key configuration was only for the legacy namespace.

But you bring up some good questions about the behavior of this setting, so I reverted the host_key change from this PR.

@pront
Copy link
Contributor

pront commented Nov 4, 2024

@pront, thank you for the review. Now that I'm more confident about these changes, I reworked the PR to address your feedback:

  • 516d1da: Creates a new sources::util::http::headers module and adds an add_headers function there. I tried to follow the existing pattern for the sources::util::http::query::add_query_parameters function, including the Cargo features.
  • 36895e3: Modifies the http_server source to use the new add_headers function.
  • 43f474f: Modifies the opentelemetry source to use the new add_headers function, adds tests, and updates the documentation and changelog.
  • 0bf233c: Corrects the description of the behavior of the headers option in http_server. Unlike query_parameters, the headers do not overwrite existing fields with the same name. Added a test to confirm the correct behavior.

In addition to the commits above, I also added the following commit that extends this PR to also bring support for the host_key configuration to the opentelemetry source, for more consistency with http_server. If you think this change should be done in a different PR please let me know, but I felt it was closely related to the headers feature.

  • ea5c56f: Adds support for the host_key config to opentelemetry. I followed the existing pattern from http_server for plumbing through the remote IP.

Thank you @jblazquez, much appreciated 👍

Adding the host_key introduced new scope so I did another review and some new things came up.

@jblazquez jblazquez changed the title feat(opentelemetry): add support for enriching logs with HTTP headers feat(opentelemetry source): add support for enriching logs with HTTP headers Nov 4, 2024
@jblazquez
Copy link
Contributor Author

@pront Appreciate the review!

You brought up some good questions about how host_key should behave, so I did not want to add that new scope to this PR. I reverted the commit that added host_key so this PR is purely about the headers config.

Let me know if there are other changes needed. Thanks!

@pront
Copy link
Contributor

pront commented Nov 5, 2024

@pront Appreciate the review!

You brought up some good questions about how host_key should behave, so I did not want to add that new scope to this PR. I reverted the commit that added host_key so this PR is purely about the headers config.

Let me know if there are other changes needed. Thanks!

Sounds great. Will review again today.

If you are still interested in adding a host_key, feel free to submit another after this one is merged. I think we have a good understanding of how it should work.

Copy link
Contributor

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jblazquez!

@pront pront enabled auto-merge November 5, 2024 16:12
@pront pront added this pull request to the merge queue Nov 5, 2024
Merged via the queue into vectordotdev:master with commit 872d0b5 Nov 5, 2024
97 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: external docs Anything related to Vector's external, public documentation domain: sources Anything related to the Vector's sources source: opentelemetry Anything `opentelemetry` source related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exposing HTTP headers in opentelemetry source
3 participants