Kibana seems to be pushing corrupt policies out to Horde drones #134494

Closed
pjbertels opened this issue Jun 15, 2022 · 10 comments
Labels: bug (Fixes for quality problems that affect the customer experience), Team:Fleet (Team label for Observability Data Collection Fleet team)

@pjbertels

Kibana version:
8.3.0-460dc667 / 8.3.0-SNAPSHOT

Elasticsearch version:

Server OS version:

Browser version:

Browser OS version:

Original install method (e.g. download page, yum, from source, etc.):
ESS - perf-gofbm-custom 8.3.0-460dc667 / 8.3.0-SNAPSHOT

Describe the bug:
Horde drones cannot unmarshal policy updates sent down.
Some values in the JSON, for example 'processors', show up as 'Object object' and don't seem correct.

Steps to reproduce:
The cluster is up and running.

Expected behavior:

Screenshots (if relevant):
[screenshot attached]

Errors in browser console (if relevant):

Provide logs and/or server output (if relevant):

Any additional context:
I have an example of JSON for the integration policy.

@pjbertels added the bug and Team:Fleet labels on Jun 15, 2022
@elasticmachine (Contributor)

Pinging @elastic/fleet (Team:Fleet)

@ph (Contributor) commented Jun 15, 2022

@pjbertels Can you add the YAML format of the policy, removing any secrets from it?

@kpollich (Member)

I grabbed a random policy YAML from the Fleet UI on the performance cluster, using credentials I got from @pjbertels. Pasting the policy below:

```yaml
id: d5fa41a0-ece0-11ec-999b-910b6f3f85c8
revision: 2
outputs:
  default:
    type: elasticsearch
    hosts:
      - >-
        https://f3945852051a4a3c839383ce00fb4b87.us-west2.gcp.elastic-cloud.com:443
output_permissions:
  default:
    _elastic_agent_monitoring:
      indices:
        - names:
            - logs-elastic_agent.apm_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.apm_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.auditbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.auditbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.cloudbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.cloudbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.elastic_agent-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.endpoint_security-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.endpoint_security-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.filebeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.filebeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.fleet_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.fleet_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.heartbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.heartbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.metricbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.metricbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.osquerybeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.osquerybeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.packetbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.packetbeat-default
          privileges:
            - auto_configure
            - create_doc
    _elastic_agent_checks:
      cluster:
        - monitor
    c7a7efcd-dbc3-4d14-8d03-b228db0c654d:
      indices:
        - names:
            - logs-system.auth-default
          privileges:
            - auto_configure
            - create_doc
agent:
  monitoring:
    enabled: true
    use_output: default
    namespace: default
    logs: true
    metrics: true
inputs:
  - id: logfile-c7a7efcd-dbc3-4d14-8d03-b228db0c654d
    name: system-9e2059b5-6d45-46da-8226-cff05b3ead9d
    revision: 1
    type: logfile
    use_output: default
    meta:
      package:
        name: system
        version: 1.11.0
    data_stream:
      namespace: default
    streams:
      - id: logfile-system.auth-c7a7efcd-dbc3-4d14-8d03-b228db0c654d
        data_stream:
          dataset: system.auth
          type: logs
        paths:
          - /var/log/auth.log*
          - /var/log/secure*
        exclude_files:
          - .gz$
        multiline:
          pattern: ^\s
          match: after
        processors:
          - add_locale: null
fleet:
  hosts:
    - >-
      https://6145ecd013694673759b69c889cd8ebc.fleet.us-west2.gcp.elastic-cloud.com:443
```

The relevant processors field appears as:

```yaml
processors:
  - add_locale: null
```

Editing the policy shows the following in the policy editor:

[screenshot: policy editor]

This is somewhat concerning, because a brand-new system policy created on this cluster shows the following editor UI instead. Note the different inputs:

[screenshot: editor UI for a newly created system policy]

@jen-huang (Contributor) commented Jun 15, 2022

@kpollich Indeed, the inputs look incomplete. It seems the processors should be hard-coded from the agent YAML template:

https://github.com/elastic/integrations/blob/03103dae9f9de2bed8d49385411a088b99afc3a5/packages/system/data_stream/auth/agent/stream/log.yml.hbs#L9
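
For context, the referenced template line presumably hard-codes the processor using YAML's `~` (null) shorthand, along the lines of:

```yaml
processors:
  - add_locale: ~
```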

I wonder what ~ gets parsed as from YAML -> JSON?

@ph (Contributor) commented Jun 15, 2022

The line you are referring to, @jen-huang, is two years old, no? We haven't changed the parsing on the agent recently, so I am not sure what regression could create this issue.

@ph (Contributor) commented Jun 15, 2022

Looking at the YAML specification, ~ is equivalent to null | Null | NULL, so the value in the YAML above is appropriate. Looking at the whole configuration, this looks like a valid configuration and a valid YAML file to me. Is it possible that the formatting differs from the configuration sent to the agent? I'm trying to understand the 'Object object' at the processor level.

As for the ~ usage, this is a quirk in Beats: some processors, like the add_locale processor, take no configuration and are simply listed in the processors list; the ~ / nil value only ensures that the key exists.
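
For reference, a quick round-trip through js-yaml (the YAML parser Kibana uses; this snippet is illustrative, not Fleet code) confirms that `~` parses to null and serializes back as `null`, never as `[object Object]`:

```ts
import { load, dump } from 'js-yaml';

// '~' is one of YAML's spellings for null, so it parses to a JS null...
const parsed = load('processors:\n  - add_locale: ~');
console.log(JSON.stringify(parsed)); // {"processors":[{"add_locale":null}]}

// ...and dumping it back out serializes that null as 'null'.
console.log(dump(parsed)); // "processors:\n  - add_locale: null\n"
```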

@joshdover (Contributor)

While we need to figure out what is going on here, should we consider this an 8.3 blocker?

@kpollich (Member)

I pulled the JSON for a system integration policy from dev tools, so here's what we're storing in the saved object:

```
{
  "_index": ".kibana_8.3.0_001",
  "_id": "ingest-package-policies:1a321526-5a17-4594-b4ef-bf033381803c",
  "_version": 1,
  "_seq_no": 6893,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "ingest-package-policies": {
      "name": "system-6aa63f3d-34d4-4da8-9c82-b1a7b649e18e",
      "namespace": "default",
      "description": "System",
      "package": {
        "name": "system",
        "title": "System",
        "version": "1.11.0"
      },
      "enabled": true,
      "policy_id": "c8311960-ed6a-11ec-999b-910b6f3f85c8",
      "output_id": "6aa63f3d-34d4-4da8-9c82-b1a7b649e18e",
      "inputs": [
        {
          "type": "logfile",
          "enabled": true,
          "streams": [
            {
              "data_stream": {
                "type": "logs",
                "dataset": "system.auth"
              },
              "enabled": true,
              "vars": {
                "paths": {
                  "type": "text",
                  "value": [
                    "/var/log/auth.log*",
                    "/var/log/secure*"
                  ]
                }
              },
              "id": "logfile-system.auth-1a321526-5a17-4594-b4ef-bf033381803c",
              "compiled_stream": {
                "paths": [
                  "/var/log/auth.log*",
                  "/var/log/secure*"
                ],
                "exclude_files": [
                  ".gz$"
                ],
                "multiline": {
                  "pattern": """^\s""",
                  "match": "after"
                },
                "processors": [
                  {
                    "add_locale": null
                  }
                ]
              }
            }
          ],
          "vars": {},
          "config": {}
        }
      ],
      "revision": 1,
      "created_at": "2022-06-16T11:52:27.649Z",
      "created_by": "admin",
      "updated_at": "2022-06-16T11:52:27.649Z",
      "updated_by": "admin"
    },
    "type": "ingest-package-policies",
    "references": [],
    "migrationVersion": {
      "ingest-package-policies": "8.3.0"
    },
    "coreMigrationVersion": "8.3.0",
    "updated_at": "2022-06-16T11:52:27.649Z"
  }
}
```

For example:

```json
"processors": [
  {
    "add_locale": null
  }
]
```

This all seems correct to me, and aligns with the "add_locale" block we get when creating a brand-new system policy on this cluster, which looks identical.

Also, I'm in agreement with @ph above: we haven't touched this line of code in the system integration in two years, and we haven't seen two years' worth of failed policy rollouts for all system policies. Could there be some kind of JSON or YAML parsing issue in Horde when it pushes these policies out to agents? I'm not familiar enough with Horde's internals to know the workflow here for sure.

@ph (Contributor) commented Jun 16, 2022

I am looking at Horde; I recently touched it for the upgrade action, but that should not have impacted the serialization.
@joshdover This might be an issue with Horde only; I am investigating it now.

@ph self-assigned this and unassigned @kpollich on Jun 16, 2022
@kpollich (Member)

To wrap this up: we found that the performance-testing code is sending through entries like the following when creating package policies:

```json
{
  "type": "text",
  "value": {}
}
```

These are then serialized as strings, resulting in the '[object Object]' string values populating the policy objects (see the sketch after the list below). The fix/workaround is to prevent {} from coming through in place of null, or simply to omit the empty value. A more permanent fix might be to provide variable definitions in a more convenient location, like the package policy API, so that we can support the following workflow:

  1. Create a package policy via Kibana UI
  2. Grab its JSON representation from the Fleet API
  3. Use that JSON as a template to create further policies
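
For what it's worth, here is a minimal TypeScript sketch of the failure mode, with hypothetical variable names (this is not Fleet's actual compilation code): when a var's value arrives as an empty object and is later coerced to a string, JavaScript produces "[object Object]", which then lands verbatim in the compiled policy.

```ts
// Hypothetical reproduction of the coercion; not Fleet's actual code.
const varFromPerfTest = { type: 'text', value: {} };  // what the perf-test code sent
const varAsIntended = { type: 'text', value: null };  // null (or omitting the key) avoids the bug

// Coercing an object to a string is where "[object Object]" comes from.
console.log(`${varFromPerfTest.value}`); // "[object Object]"
console.log(`${varAsIntended.value}`);   // "null"
```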

This is already sort of a workaround workflow, though, as the real root issue is that it's very difficult to pass input/data stream variables to Fleet's API without detailed knowledge of an integration's structure and each variable's particular configuration.

I think this falls under #132263 and in turn #123150.

I created #134586 as a potential stopgap to make this easier for the performance team to consume in the short term, but if we can get a simple workaround to prevent the {} values in the create package policy responses, I'd rather defer to the above issue around making this API more human-readable.

I'm closing this, as we've identified the root cause.
