-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(tracing): only extract distributed headers if a trace is not already started #9456
fix(tracing): only extract distributed headers if a trace is not already started #9456
Conversation
ASM: This introduces an experimental env var `DD_EXPERIMENTAL_APPSEC_STANDALONE_ENABLED` that will send the tag `apm.enabled` as 0 if used together with `DD_APPSEC_ENABLED=true` ## Checklist - [x] Change(s) are motivated and described in the PR description - [x] Testing strategy is described if automated tests are not included in the PR - [x] Risks are described (performance impact, potential for breakage, maintainability) - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed or label `changelog/no-changelog` is set - [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)) - [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) - [x] If this PR changes the public interface, I've notified `@DataDog/apm-tees`. ## Reviewer Checklist - [x] Title is accurate - [x] All changes are related to the pull request's stated goal - [x] Description motivates each change - [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - [x] Testing strategy adequately addresses listed risks - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] Release note makes sense to a user of the library - [x] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - [x] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
…n image (#9332) - Adds `vMAJOR` and `vMAJOR.MINOR` lib-injection images (in addition to `vMAJOR.MINOR.PATCH` and `latest`) - Fixes incorrect generation of `latest` image We want to enable customers to be able to pin to a major version. We were also incorrectly tagging images as `latest` when they weren't (only would have happened on hotfixes, so hasn't _actually_ occurred). We are following ([this doc's suggestions](https://docs.google.com/document/d/1xYvWBz8OXeavZWxBmIGheZmcZkF7Cx9tXqT6mxgHkgs/edit?pli=1#heading=h.nud45hbcwu7p)), but in summary: - Prerelease versions are _not_ tagged - Every non-prerelease release gets the `vMajor.Minor.Patch` version tag - `2.5.0` gets `v2.5.0` - `2.2.3` gets `v2.2.3` - `1.2.1` gets `v1.2.1` - Every non-prerelease release gets the `vMajor.Minor` version tag initially (which assumes we never "go back" in release values) - `2.5.0` gets `v2.5` - `2.2.3` gets `v2.2` - `1.2.1` gets `v1.2` - _Some_ releases get the `vMajor` version tag. Only releases for which this is the _highest_ version in the major get the tag. - `2.5.0` gets `v2` (_if_ there's no higher `2.x.x` release) - `1.2.1` gets `v1` (_if_ there's no higher `1.x.x` release) - _Some_ releases get the latest tag. Only releases for which this is the highest version **ever** get the tag. - `2.5.0` gets `latest` if it's the highest release so far - `1.2.1` will not get `latest` if there's already `2.x.x` releases The logic is now more complicated and requires knowing the state of the git repository. The script shown here mirrors [the one added for .NET](DataDog/dd-trace-dotnet#5544). > Note we're taking advantage of the [Datadog/public-images](https://github.com/DataDog/public-images) support for passing multiple csv values in the `IMG_DESTINATIONS` variable. You can see an example of this used in "the wild" [here](https://github.com/DataDog/datadog-ci/blob/a4041c4fca346122fa54dd67dad3d93d4f1731b9/.gitlab-ci.yml#L34). The generation stage is quite verbose about printing out all the variables, but overall this is obviously very hard to test so I set up [a dummy GitHub repository and GitLab YAML](https://github.com/DataDog/dd-trace-dotnet-gitlab-test/blob/main/.gitlab-ci.yml), which just echoes the values it receives, to confirm they're sent across to the child pipeline correctly. If it is safe to do so, we can test this by reverting [the `revert "TESTING"` commit](5cd4d75) I don't know how backporting works, so hopefully someone else could pick that up 😅 ## Checklist - [x] Change(s) are motivated and described in the PR description - [x] Testing strategy is described if automated tests are not included in the PR - [x] Risks are described (performance impact, potential for breakage, maintainability) - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed or label `changelog/no-changelog` is set - [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)) - [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) - [x] If this PR changes the public interface, I've notified `@DataDog/apm-tees`. ## Reviewer Checklist - [x] Title is accurate - [x] All changes are related to the pull request's stated goal - [x] Description motivates each change - [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - [x] Testing strategy adequately addresses listed risks - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] Release note makes sense to a user of the library - [x] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - [x] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) --------- Co-authored-by: Brett Langdon <[email protected]> Co-authored-by: Zachary Groves <[email protected]>
NOTE to apm-python reviewers: The only files this PR touches outside of LLM Observability ownership are snapshot files. This PR expands on #9417 by only set span.span_type = "llm" if LLMObs is enabled. This means that we will not even attempt to process the given span (i.e. set temporary LLMObs tags), which should minimize when the affect LLMObs code has outside of LLMObs contexts. The majority of this PR's LOC involves modifying all openai/langchain/bedrock snapshot files to remove "type":"llm". Existing tests should cover the reverted functionality. ## Checklist - [x] Change(s) are motivated and described in the PR description - [x] Testing strategy is described if automated tests are not included in the PR - [x] Risks are described (performance impact, potential for breakage, maintainability) - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed or label `changelog/no-changelog` is set - [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)) - [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) - [x] If this PR changes the public interface, I've notified `@DataDog/apm-tees`. ## Reviewer Checklist - [x] Title is accurate - [x] All changes are related to the pull request's stated goal - [x] Description motivates each change - [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - [x] Testing strategy adequately addresses listed risks - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] Release note makes sense to a user of the library - [x] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - [x] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
we're testing 79 packages: - 27 has integration tests - 77 has smoke tests (now with a AST patching verification) Top packages list imported from: - https://pypistats.org/top - https://hugovk.github.io/top-pypi-packages/ Some popular packages are not included in the list: - pypular package is discarded because it is not a real top package - wheel, importlib-metadata and pip is discarded because they are package to build projects - colorama and awscli are terminal commands - protobuf fails for all python versions with No module named 'protobuf' - sniffio and tzdata have no python files to patch - greenlet is a CPP project, no files to patch - coverage is a testing package ## Checklist - [x] Change(s) are motivated and described in the PR description - [x] Testing strategy is described if automated tests are not included in the PR - [x] Risks are described (performance impact, potential for breakage, maintainability) - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed or label `changelog/no-changelog` is set - [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)) - [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) - [x] If this PR changes the public interface, I've notified `@DataDog/apm-tees`. - [x] If change touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from `@DataDog/security-design-and-guidance`. ## Reviewer Checklist - [x] Title is accurate - [x] All changes are related to the pull request's stated goal - [x] Description motivates each change - [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - [x] Testing strategy adequately addresses listed risks - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] Release note makes sense to a user of the library - [x] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - [x] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
This PR introduces helper methods in the LLMObs service, namely `activate_distributed_headers()` and `inject_distributed_headers()`, to propagate LLMObs parent IDs (and regular APM parent/trace IDs) for users that may be using custom HTTP/web framework libraries that ddtrace does not provide an integration for. The `LLMObs.activate_distributed_headers()` method is very similar to `ddtrace.trace_utils.activate_distributed_headers()`, except it adds some checks and logs if the distributed context is not valid (not span/trace ID) or if the LLMObs distributed context (parent ID) is not valid. However as long as the context is valid (span/trace ID is valid, but LLMObs parent ID does not need to be present) it will still activate the distributed context to avoid breaking APM applications. The `LLMObs.inject_distributed_headers()` method uses `HTTPPropagator.inject()` under the hood to inject distributed request headers with the necessary context. We also add some checks and logs to see if the span/current_span context is valid. LLMObs parent ID injection happens under the hood as well. This PR also overrides the `config._llmobs_enabled` value to `True` on `LLMObs.enable()` since users could technically not set the environment variable `DD_LLMOBS_ENABLED` if they are enabling LLMObs in-code, i.e. `LLMObs.enable()`. ## Checklist - [x] Change(s) are motivated and described in the PR description - [x] Testing strategy is described if automated tests are not included in the PR - [x] Risks are described (performance impact, potential for breakage, maintainability) - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed or label `changelog/no-changelog` is set - [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)) - [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) - [x] If this PR changes the public interface, I've notified `@DataDog/apm-tees`. ## Reviewer Checklist - [x] Title is accurate - [x] All changes are related to the pull request's stated goal - [x] Description motivates each change - [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - [x] Testing strategy adequately addresses listed risks - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] Release note makes sense to a user of the library - [x] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - [x] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
…9451) Reverts #9428 This is causing new test failure, reverting for now to unblock others while we figure out what is going on. https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/62898/workflows/65e942fb-cc5c-4ccf-a357-4702597d6778/jobs/3915709 - [x] checklist
This PR sets up a new `ddtrace.internal._core` module which is a Rust module defined in `src/core/` and uses PyO3. The first component being moved over is the `ddtrace.internal.rate_limiter.RateLimter`. This is a well isolated component which has minimal need to be in Python. There are clear performance gains from moving this component to native. The main motivation from this change is to start to build the basis for adding/moving performance critical components to PyO3. ## Risks This introduces a requirement on the rust compiler for building from source. We have built this into our dev image for awhile, but there are other places where we do not yet have the proper setup and so processes may fail. For example, the benchmarking platform image does not have rust compiler yet. ## Testing Since we kept the same public interface as the original Python module, we are using the existing Python tests as the validation of this change. They should be comprehensive enough to validate the new native version. ## Benchmarks The benchmarking image is not yet updated to include rust compiler, so the following results are from running locally on my machine only. The new `rate_limiter` benchmark shows a roughly 3x performance improvement. | Benchmark | main | rate_limiter | |-----------------------------|------|--------------| | ratelimiter-defaults | 801 ns | 224 ns: 3.57x faster | | ratelimiter-no_rate_limit | 325 ns | 223 ns: 1.46x faster | | ratelimiter-low_rate_limit | 800 ns | 220 ns: 3.64x faster | | ratelimiter-high_rate_limit | 817 ns | 224 ns: 3.65x faster | | ratelimiter-short_window | 928 ns | 224 ns: 4.14x faster | | ratelimiter-long_window | 808 ns | 220 ns: 3.68x faster | | Geometric mean | (ref) | 3.19x faster | This is a great improvement, however, the rate limiter is such a small portion of an actual trace, because right now it's biggest impact is on starting a new trace and making a sampling decision. So applications with the biggest possible improvement are those which start a lot of small traces at high volume. Benchmarks for the `span` suite show a minor improvement, this is because the rate limiter today only takes up a small portion of the total lifecycle of a span. | Benchmark | main | rate_limiter | |------------------------------|------|--------------| | span-add-metrics | 39.2 ms | 38.3 ms: 1.02x faster | | span-start-finish | 16.6 ms | 15.9 ms: 1.05x faster | | span-start-finish-traceid128 | 17.8 ms | 16.5 ms: 1.08x faster | | Geometric mean | (ref) | 1.02x faster | Benchmarks for the `tracer` suite showing similar results to `span`. | Benchmark | main | rate_limiter | |------------------------------|------|--------------| | tracer-small | 94.0 us | 91.2 us: 1.03x faster | | tracer-medium | 844 us | 818 us: 1.03x faster | | Geometric mean | (ref) | 1.02x faster | `tracer-large` results are not show a statistically significant enough difference in overhead. Benchmarks for the `flask_simple` suite showing almost no improvement at all. | Benchmark | main | rate_limiter | |------------------------------|------|--------------| | flasksimple-tracer | 1.13 ms | 1.15 ms: 1.02x slower | | flasksimple-debugger | 566 us | 573 us: 1.01x slower | | flasksimple-appsec-post | 1.03 ms | 1.05 ms: 1.02x slower | | flasksimple-appsec-telemetry | 1.15 ms | 1.17 ms: 1.01x slower | | Geometric mean | (ref) | 1.01x slower | ## Follow-up future work - Move the RateLimiter tests over to rust - Add Rust formatting, static analysis, testing steps to the CI process ## Checklist - [x] Change(s) are motivated and described in the PR description - [x] Testing strategy is described if automated tests are not included in the PR - [x] Risks are described (performance impact, potential for breakage, maintainability) - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed or label `changelog/no-changelog` is set - [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)) - [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) - [x] If this PR changes the public interface, I've notified `@DataDog/apm-tees`. ## Reviewer Checklist - [ ] Title is accurate - [ ] All changes are related to the pull request's stated goal - [ ] Description motivates each change - [ ] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - [ ] Testing strategy adequately addresses listed risks - [ ] Change is maintainable (easy to change, telemetry, documentation) - [ ] Release note makes sense to a user of the library - [ ] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - [ ] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
Duplicate of #9443 with core api integration instead of wrappers. add support for sqli detection and prevention for exploit prevention add unit tests will also be tested by system-tests sqli rasp tests. (DataDog/system-tests#2514) APPSEC-53421 ## Checklist - [x] Change(s) are motivated and described in the PR description - [x] Testing strategy is described if automated tests are not included in the PR - [x] Risks are described (performance impact, potential for breakage, maintainability) - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed or label `changelog/no-changelog` is set - [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)) - [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) - [x] If this PR changes the public interface, I've notified `@DataDog/apm-tees`. ## Reviewer Checklist - [x] Title is accurate - [x] All changes are related to the pull request's stated goal - [x] Description motivates each change - [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - [x] Testing strategy adequately addresses listed risks - [x] Change is maintainable (easy to change, telemetry, documentation) - [x] Release note makes sense to a user of the library - [x] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - [x] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
…a.distributed.extract
Datadog ReportBranch report: ✅ 0 Failed, 120670 Passed, 56233 Skipped, 4h 17m 33.35s Total duration (5h 58m 24.97s time saved) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going to backport this to 2.8 and 2.9 (after release)? Or are we introducing it in 2.10 since it would break some cases that currently work?
I was thinking just 2.10. I do consider this a fix for a broken scenario, but not urgent or pressing enough to consider backporting it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this would be ideal but we will likely need an RFC first. Also this doesn't resolve the performance issues in nested apps.
The current change looks good to me
BenchmarksBenchmark execution time: 2025-01-17 21:12:15 Comparing candidate commit 4764abc in PR branch Found 3 performance improvements and 0 performance regressions! Performance is the same for 391 metrics, 2 unstable metrics. scenario:flasksqli-appsec-enabled
scenario:flasksqli-tracer-enabled
scenario:iast_aspects-ospathnormcase_aspect
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per discussion with @tabgok, I have added a "private" FF to control this behavior in case we need to roll back for any customers.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9456 +/- ##
===========================================
- Coverage 74.30% 10.42% -63.88%
===========================================
Files 1398 1365 -33
Lines 129873 127713 -2160
===========================================
- Hits 96500 13320 -83180
- Misses 33373 114393 +81020 ☔ View full report in Codecov by Sentry. |
Datadog ReportBranch report: ✅ 0 Failed, 1048 Passed, 550 Skipped, 16m 42.26s Total duration (23m 9.64s time saved) |
|
Today there are some integrations which always check for distributed tracing headers and if found will activate them as the current/parent context. This blind approach does not take into account if a parent trace has already been started.
This means if you have nested Flask apps (for example), then the child Flask app may make it's parent the upstream remote service, and not the first/initial Flask app in the process. This can result in a broken trace.
The scenario which "works" today which will be broken with this change is if there is "broken" instrumentation as the first span in a trace. This could either be custom instrumentation that did not check for distributed headers, or a built-in integration which is not properly parsing/activating distributed tracing headers.
This change would prioritize keeping the local trace accurate over ensuring a connection with the upstream calling service.
Checklist
changelog/no-changelog
is set@DataDog/apm-tees
.Reviewer Checklist