Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add APM Single Step Instrumentation #981

Closed
wants to merge 11 commits into from

Conversation

CharlyF
Copy link
Contributor

@CharlyF CharlyF commented Nov 15, 2023

What does this PR do?

Define a new struct in the APM feature to allow configuration of the Single Instrumentation Step. Sister PR of DataDog/helm-charts#1211.
Added a couple of tests, the CRD change and the main code in distinct commits.

Motivation

To be confirmed if the following is the expected format.

    - name: DD_APM_INSTRUMENTATION_ENABLED
      value: "true"
    - name: DD_APM_INSTRUMENTATION_DISABLED_NAMESPACES
      value: '["foo"]'
    - name: DD_APM_INSTRUMENTATION_LIB_VERSIONS
      value: '{"java":"1.2.3","test":"1.0.0"}'

but the DCA's config seems good:

  instrumentation:
    disabled_namespaces:
    - foo
    enabled: true
    enabled_namespaces: []
    lib_versions: '{"java":"1.2.3","test":"1.0.0"}'

Additional Notes

Anything else we should know when reviewing?

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: N/A
  • Cluster Agent: v7.39.0

Describe your test plan

spec:
  features:
    apm:
      enabled: true
      singleStepInstrumentation:
        disabledNamespaces:
        - foo
        enabled: true
        libVersions:
          java: 1.2.3
          test: 1.0.0

should start the Cluster Agent and configure: DD_APM_INSTRUMENTATION_ENABLED, DD_APM_INSTRUMENTATION_DISABLED_NAMESPACES and DD_APM_INSTRUMENTATION_LIB_VERSIONS.
If you use DD_APM_INSTRUMENTATION_DISABLED_NAMESPACES and DD_APM_INSTRUMENTATION_ENABLED_NAMESPACES, it will error out (not supported for now?).

{"level":"ERROR","ts":"2023-11-15T03:50:36Z","logger":"controller.datadogagent","msg":"Reconciler error","reconciler group":"datadoghq.com","reconciler kind":"DatadogAgent","name":"datadog","namespace":"default","error":"`enabledMamespaces` and `disabledNamespaces` cannot be set together"}

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label

@CharlyF CharlyF added the enhancement New feature or request label Nov 15, 2023
@CharlyF CharlyF added this to the v1.4.0 milestone Nov 15, 2023
@CharlyF CharlyF requested review from a team as code owners November 15, 2023 03:40
@codecov-commenter
Copy link

codecov-commenter commented Nov 15, 2023

Codecov Report

Merging #981 (5b0cfd9) into main (1c2da35) will increase coverage by 0.01%.
Report is 7 commits behind head on main.
The diff coverage is 75.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #981      +/-   ##
==========================================
+ Coverage   57.16%   57.17%   +0.01%     
==========================================
  Files         164      164              
  Lines       19574    19703     +129     
==========================================
+ Hits        11189    11265      +76     
- Misses       7732     7777      +45     
- Partials      653      661       +8     
Flag Coverage Δ
unittests 57.17% <75.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
apis/datadoghq/v2alpha1/datadogagent_default.go 88.58% <100.00%> (+0.65%) ⬆️
apis/datadoghq/v2alpha1/datadogagent_types.go 100.00% <ø> (ø)
controllers/datadogagent/feature/apm/feature.go 56.58% <70.76%> (+5.55%) ⬆️

... and 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c2da35...5b0cfd9. Read the comment docs.

@CharlyF CharlyF force-pushed the charly/apm-auto-instrumentation branch from 273f183 to 25ec543 Compare November 15, 2023 03:56
@CharlyF CharlyF requested a review from a team as a code owner November 15, 2023 03:56
@liliyadd
Copy link
Collaborator

Thanks @CharlyF for the PR!

In both helm and Cluster Agent, we have some validation of SSI parameters. I was wondering if https://github.com/DataDog/datadog-operator/blob/main/controllers/datadogagent/clusteragent.go#L689 would be the right place to put it for the Operator.

The validation logic is the following:

1. If spec.features.enabledNamespaces != nil AND spec.features.disabledNamespaces != nil, return the error with the message that enabledNamespaces and disabledNamespaces cannot be set at the same time. Datadog Operator should fail installing in this case.
2. If spec.features.enabled='true' AND spec.features.enabledNamespaces != nil, show the warning saying SSI will be enabled in the whole cluster.
3.  If spec.features.enabled='false' AND spec.features.disabledNamespaces != nil, show the warning saying SSI will be disabled in the whole cluster.

@liliyadd
Copy link
Collaborator

Oh I see this logic is in features.go. Will throwing an error there fail Operator installation?

@@ -228,6 +252,47 @@ func (f *apmFeature) ManageDependencies(managers feature.ResourceManagers, compo
// ManageClusterAgent allows a feature to configure the ClusterAgent's corev1.PodTemplateSpec
// It should do nothing if the feature doesn't need to configure it.
func (f *apmFeature) ManageClusterAgent(managers feature.PodTemplateManagers) error {
if f.singleStepInstrumentation != nil && f.singleStepInstrumentation.enabled {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The outer check should only have if f.singleStepInstrumentation != nil.

It's possible to have f.singleStepInstrumentation.enabled='false' && f.singleStepInstrumentation.enabledNamespaces != nil - in this case SSI is disabled in the whole cluster, but enabled in some namespaces.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if f.singleStepInstrumentation != nil {
  // Unsupported configuration
  if len(f.singleStepInstrumentation.disabledNamespaces) > 0 && 
     len(f.singleStepInstrumentation.enabledNamespaces) > 0 {
       return error
  }
  
  if f.singleStepInstrumentation.enabled && f.singleStepInstrumentation.enabledNamespaces {
    // SSI instrumentation is enabled on the cluster level
    log.warn("SSI will be enabled in the whole cluster")
    setEnv(DDAPMInstrumentationEnabled, true)
  } else if !f.singleStepInstrumentation.enabled && f.singleStepInstrumentation.disabledNamespaces {
     // SSI instrumentation is disabled on the cluster level
    log.warn("SSI will be disabled in the whole cluster")
    setEnv(DDAPMInstrumentationEnabled, false)
  } else if len(f.singleStepInstrumentation.enabledNamespaces) > 0 {
    // SSI instrumentation is enabled on the namespace level
    setEnv(DDAPMInstrumentationEnabled, false)
    setEnv(DDAPMInstrumentationEnabledNamespaces, parsedNs)
  } else if len(f.singleStepInstrumentation.disabledNamespaces) > 0 
  {
     // SSI instrumentation is disabled on the namespace level
     setEnv(DDAPMInstrumentationEnabled, true)
     setEnv(DDAPMInstrumentationDisabledNamespaces, parsedNs)
  }
setEnv(DDAPMInstrumentationLibVersions, libVersions)
...
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm that would be super confusing UX imo. The flag explicitly says enabled: false but then it's not actually disabled? Why should we do that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

singleStepInstrumentation.enabled is a cluster level configuration, singleStepInstrumentation.enabledNamespace is a namespace level configuration. SSI can be disabled in the cluster, but enabled on a specific namespace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not do that please. This would be unique behavior, no other product does that. It's also confusing, as a user you'd think you're disabling the feature but it's not actually the case. Global enable/disable should be global, namespace-level config should be namespace level, etc. and global should override everything, that's how every other config of the agent works.

@CharlyF CharlyF force-pushed the charly/apm-auto-instrumentation branch from c661027 to 810d6c8 Compare November 17, 2023 17:35
docs/configuration.v2alpha1.md Outdated Show resolved Hide resolved
if f.singleStepInstrumentation != nil {
if len(f.singleStepInstrumentation.disabledNamespaces) > 0 && len(f.singleStepInstrumentation.enabledNamespaces) > 0 {
// This configuration is not supported
return fmt.Errorf("`enabledNamespaces` and `disabledNamespaces` cannot be set together")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior I saw when testing DataDog/helm-charts#1252 was not to error and instead use the list based on the flag selection - i.e. if enabled use disabledNamespaces, if disabled use enabledNamespaces. Returning error will prevent reconciling and any agent installation.

Could you confirm this is what we want, @liliyadd?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was actually a bug in Helm Chart that didn't fail the helm chart installation when both enabledNamespaces and disabledNamespaces are set. With DataDog/helm-charts#1261 it won't be possible.

@liliyadd
Copy link
Collaborator

liliyadd commented Dec 1, 2023

I ran the following tests in the staging cluster:

  1. features.apm.instrumentation is not set
    Configuration:
  features:
    apm:

Cluster agent env variables:

    - name: DD_APM_INSTRUMENTATION_ENABLED
      value: "false"

Test app behavior:

  • No automatic injection in any ns
  1. features.apm.instrumentation.enabled is disabled
    Configuration:
  features:
    apm:
      instrumentation:
        enabled: false

Cluster agent env variables:

    - name: DD_APM_INSTRUMENTATION_ENABLED
      value: "false"

Test app behavior:

  • No automatic injection in any ns
  1. features.apm.instrumentation.enabledNamespaces is set
    Configuration:
  features:
    apm:
      instrumentation:
        enabledNamespaces:
        - rc-tests

Cluster agent env variables:

    - name: DD_APM_INSTRUMENTATION_ENABLED
      value: "false"
    - name: DD_APM_INSTRUMENTATION_ENABLED_NAMESPACES
      value: '["rc-tests"]'

Test app behavior:

  • 5 client libraries injected as init containers.
  • Traces observed in dd for the app running in rc-tests namespace.
  • Tracing libraries not injected to the app in apps ns.

4.features.apm.instrumentation.disabledNamespaces is set
Configuration:

  features:
    apm:
      instrumentation:
        disabledNamespaces:
        - rc-tests

Cluster agent env variables:

    - name: DD_APM_INSTRUMENTATION_ENABLED
      value: "false"
    - name: DD_APM_INSTRUMENTATION_DISABLED_NAMESPACES
      value: '["rc-tests"]'

Test app behavior:

  • No client libraries injected as init containers in the namespace rc-tests.
  • No traces observed in dd for the app running in rc-tests or other namespaces.
  • No client libraries injected as init containers for apps in other namespaces.

5.features.apm.instrumentation.disabledNamespaces && features.apm.instrumentation.enabledNamespaces are set
Configuration:

  features:
    apm:
      instrumentation:
        disabledNamespaces:
        - rc-tests
        enabledNamespaces:
        - apps

Result:
Datadog Agent and Cluster Agent don't start. Datadog Operator has the following log:
{"level":"ERROR","ts":"2023-12-01T22:12:46Z","logger":"controller.datadogagent","msg":"Reconciler error","reconciler group":"datadoghq.com","reconciler kind":"DatadogAgent","name":"datadog","namespace":"default","error":"enabledNamespacesanddisabledNamespaces cannot be set together"}

  1. features.apm.instrumentation.enabled is enabled
    Configuration:
  features:
    apm:
      instrumentation:
        enabled: true

Cluster agent env variables:

    - name: DD_APM_INSTRUMENTATION_ENABLED
      value: "true"

Test app behavior:

  • 5 client libraries injected as init containers.
  • Traces observed in dd app.

7.features.apm.instrumentation.enabled is enabled && features.apm.instrumentation.disabledNamespaces is set
Configuration:

  features:
    apm:
      instrumentation:
        enabled: true
        disabledNamespaces:
        - rc-tests

Cluster agent env variables:

    - name: DD_APM_INSTRUMENTATION_ENABLED
      value: "true"
    - name: DD_APM_INSTRUMENTATION_DISABLED_NAMESPACES
      value: '["rc-tests"]'

Test app behavior:

  • No client libraries injected as init containers in the namespace rc-tests.
  • No traces observed in dd for the app running in rc-tests namespace.
  • 5 client libraries injected as init containers for apps in other ns.
  • Traces observed in dd for the app running in other ns

8.features.apm.instrumentation.enabled is enabled && features.apm.instrumentation.enabledNamespaces is set
Configuration:

  features:
    apm:
      instrumentation:
        enabled: true
        enabledNamespaces:
        - rc-tests

Cluster agent env variables:

    - name: DD_APM_INSTRUMENTATION_ENABLED
      value: "true"
    - name: DD_APM_INSTRUMENTATION_ENABLED_NAMESPACES
      value: '["rc-tests"]'

Test app behavior:

  • 5 client libraries injected as init containers for apps in all namespaces
  • Traces observed in dd for the app running in other ns

Copy link
Collaborator

@liliyadd liliyadd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in the PR look good to me. There are some discrepancies between Operator and current Helm chart behavior around disabling SSI in a Datadog namespace by default that @CharlyF and I discussed - Operator won't support disabling SSI in DD namespace and this behavior needs to be documented for the end user. The logic of disabling SSI in DD namespace will be moved to Cluster Agent in 7.51 Agent release allowing Helm Chart cleanup and making Helm and Operator consistent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants