Limit usage of the deprecated log and container inputs #42295

Open
rdner wants to merge 8 commits into main

@rdner rdner commented Jan 13, 2025

The log input has been deprecated since 7.16; it's time to disable it. The container input is just an alias/preset for the log input.

They can still be used under the following circumstances:

  • When running under Elastic Agent (by integrations)
  • When running as a part of a module
  • When allow_deprecated_use is set as a part of the input configuration

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Users who still use the deprecated log and container inputs in their hand-crafted configuration will see an error message when Filebeat starts, and Filebeat will fail to start.
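
For users who hit this error, a minimal sketch of an equivalent filestream input might look like the following. It reuses the paths from the test configurations in this PR. The `id` value is a hypothetical name chosen for illustration; the only requirement is that it is unique. This sketch does not cover migrating the state of files already ingested by the log input.

```yaml
filebeat.inputs:
  - type: filestream
    # Hypothetical ID; every filestream input must have a unique one.
    id: my-app-logs
    paths:
      - "/logs/log*.log"
      - "/logs/log*.json"
```

Users who cannot migrate yet can instead keep `type: log` and set `allow_deprecated_use: true` on the input.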

How to test this PR locally

Running log input in the user-crafted configuration fails

filebeat.yml

filebeat.inputs:
  - type: log
    paths:
      - "/logs/log*.log"
      - "/logs/log*.json"
path.data: "/data"
logging:
  level: debug
output.console:
  enabled: true

This configuration produces the following error message:
Screenshot 2025-01-17 at 12 20 28

Similarly for the container input:

filebeat.inputs:
  - type: container
    paths:
      - "/logs/log*.log"
      - "/logs/log*.json"
path.data: "/data"
logging:
  level: debug
output.console:
  enabled: true
Screenshot 2025-01-17 at 12 17 09

Setting allow_deprecated_use: true enables the log input

filebeat.yml

filebeat.inputs:
  - type: log
    allow_deprecated_use: true
    paths:
      - "/logs/log*.log"
      - "/logs/log*.json"
path.data: "/data"
logging:
  level: debug
output.console:
  enabled: true

Similarly for the container input:

filebeat.inputs:
  - type: container
    allow_deprecated_use: true
    paths:
      - "/logs/log*.log"
      - "/logs/log*.json"
path.data: "/data"
logging:
  level: debug
output.console:
  enabled: true

These configurations run without the error.

log input still runs as a part of a module

For example, the postgresql module is using the log input:

If we try to start this module via the configuration, Filebeat produces no error:

filebeat.yml

filebeat.modules:
  - module: postgresql
    log:
      enabled: true
path.data: "/data"
logging:
  level: debug
output.console:
  enabled: true

log input still runs as a part of an integration (Elastic Agent)

I packaged the agent with Beats built from this branch:

DEV=true EXTERNAL=false SNAPSHOT=true PLATFORMS=darwin/arm64 PACKAGES=tar.gz mage -v package

Verified that the packaged agentbeat contains the code from this branch:

filebeat.yml

filebeat.inputs:
  - type: log
    allow_deprecated_use: true
    paths:
      - "/logs/log*.log"
      - "/logs/log*.json"
path.data: "/data"
logging:
  level: debug
output.console:
  enabled: true

#!/bin/bash

set -e

AGENT_ROOT=~/Projects/elastic-agent/build/distributions/elastic-agent-9.0.0-SNAPSHOT-darwin-aarch64/data/elastic-agent-c7c5ba/components

$AGENT_ROOT/agentbeat filebeat run -e --path.config "/deprecate_log/config/" #2> >(tee output.json | jq .)

and it failed as expected:

Screenshot 2025-01-17 at 13 51 59

I used the "Custom Logs" integration for the next test because it is based on the log input:

https://github.com/elastic/integrations/blob/a957144c41043e28e7effa343d4619f00f4deef3/packages/log/manifest.yml#L19

I created a cloud deployment and enrolled my custom agent in Fleet with the "Custom Logs" integration.

./elastic-agent status --output json
{
    "info": {
        "id": "bbc5aff6-81db-421c-b674-5dcdf6e67d16",
        "version": "9.0.0",
        "commit": "c7c5ba4132605f9bdef4b1bd63ab6337fef4abba",
        "build_time": "2025-01-17 12:36:46 +0000 UTC",
        "snapshot": true,
        "pid": 85790,
        "unprivileged": false,
        "is_managed": true
    },
    "state": 2,
    "message": "Running",
    "components": [
        {
            "id": "log-default",
            "name": "log",
            "state": 2,
            "message": "Healthy: communicating with pid '85831'",
            "units": [
                {
                    "unit_id": "log-default",
                    "unit_type": 1,
                    "state": 2,
                    "message": "Healthy"
                },
                {
                    "unit_id": "log-default-logfile-logs-e9751ac9-bb83-4573-a41a-3824f423aa37",
                    "unit_type": 0,
                    "state": 2,
                    "message": "Healthy",
                    "payload": {
                        "streams": {
                            "logfile-log.logs-e9751ac9-bb83-4573-a41a-3824f423aa37": {
                                "error": "",
                                "status": "HEALTHY"
                            }
                        }
                    }
                }
            ],
            "version_info": {
                "name": "beat-v2-client",
                "meta": {
                    "build_time": "2025-01-17 12:32:44 +0000 UTC",
                    "commit": "265dda891e5bc6f9b7dbe7e417917ab2ba11d4e7"
                }
            }
        }
    ],
    "FleetState": 2,
    "FleetMessage": "Connected"
}

I saw in the logs that the log input was running:

{
  "log.level": "debug",
  "@timestamp": "2025-01-17T13:15:38.405Z",
  "message": "Run input",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "log-default",
    "type": "log"
  },
  "log": {
    "source": "log-default"
  },
  "service.name": "filebeat",
  "ecs.version": "1.6.0",
  "log.logger": "input",
  "log.origin": {
    "file.line": 149,
    "file.name": "input/input.go",
    "function": "github.com/elastic/beats/v7/filebeat/input.(*Runner).Run"
  }
}
{
  "log.level": "debug",
  "@timestamp": "2025-01-17T13:15:38.405Z",
  "message": "Start next scan",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "log-default",
    "type": "log"
  },
  "log": {
    "source": "log-default"
  },
  "log.logger": "input",
  "log.origin": {
    "file.line": 246,
    "file.name": "log/input.go",
    "function": "github.com/elastic/beats/v7/filebeat/input/log.(*Input).Run"
  },
  "service.name": "filebeat",
  "input_id": "fd29f73b-9e51-4c3e-8133-1d7a6f807732",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2025-01-17T13:15:38.418Z",
  "message": "Check file for harvesting: /Users/rdner/Projects/es_confs/deprecate_log/logs/log1.log",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "log-default",
    "type": "log"
  },
  "log": {
    "source": "log-default"
  },
  "input_id": "fd29f73b-9e51-4c3e-8133-1d7a6f807732",
  "ecs.version": "1.6.0",
  "log.logger": "input",
  "log.origin": {
    "file.line": 494,
    "file.name": "log/input.go",
    "function": "github.com/elastic/beats/v7/filebeat/input/log.getFileState"
  },
  "service.name": "filebeat"
}
{
  "log.level": "debug",
  "@timestamp": "2025-01-17T13:15:38.418Z",
  "message": "Update existing file for harvesting: /Users/rdner/Projects/es_confs/deprecate_log/logs/log1.log, offset: 0",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "log-default",
    "type": "log"
  },
  "log": {
    "source": "log-default"
  },
  "state_id": "native::210784429-16777233",
  "os_id": "210784429-16777233",
  "source_file": "/Users/rdner/Projects/es_confs/deprecate_log/logs/log1.log",
  "old_finished": false,
  "ecs.version": "1.6.0",
  "input_id": "fd29f73b-9e51-4c3e-8133-1d7a6f807732",
  "log.logger": "input",
  "log.origin": {
    "file.line": 593,
    "file.name": "log/input.go",
    "function": "github.com/elastic/beats/v7/filebeat/input/log.(*Input).harvestExistingFile"
  },
  "old_source": "/Users/rdner/Projects/es_confs/deprecate_log/logs/log1.log",
  "service.name": "filebeat",
  "finished": false,
  "old_os_id": "210784429-16777233"
}
{
  "log.level": "debug",
  "@timestamp": "2025-01-17T13:15:38.418Z",
  "message": "Harvester for file is still running: /Users/rdner/Projects/es_confs/deprecate_log/logs/log1.log",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "log-default",
    "type": "log"
  },
  "log": {
    "source": "log-default"
  },
  "service.name": "filebeat",
  "input_id": "fd29f73b-9e51-4c3e-8133-1d7a6f807732",
  "os_id": "210784429-16777233",
  "source_file": "/Users/rdner/Projects/es_confs/deprecate_log/logs/log1.log",
  "state_id": "native::210784429-16777233",
  "old_finished": false,
  "finished": false,
  "old_source": "/Users/rdner/Projects/es_confs/deprecate_log/logs/log1.log",
  "ecs.version": "1.6.0",
  "log.logger": "input",
  "log.origin": {
    "file.line": 648,
    "file.name": "log/input.go",
    "function": "github.com/elastic/beats/v7/filebeat/input/log.(*Input).harvestExistingFile"
  },
  "old_os_id": "210784429-16777233"
}
{
  "log.level": "debug",
  "@timestamp": "2025-01-17T13:15:38.418Z",
  "message": "input states cleaned up. Before: 1, After: 1, Pending: 0",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "log-default",
    "type": "log"
  },
  "log": {
    "source": "log-default"
  },
  "log.logger": "input",
  "log.origin": {
    "file.line": 310,
    "file.name": "log/input.go",
    "function": "github.com/elastic/beats/v7/filebeat/input/log.(*Input).cleanupStates"
  },
  "service.name": "filebeat",
  "input_id": "fd29f73b-9e51-4c3e-8133-1d7a6f807732",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2025-01-17T13:15:43.378Z",
  "message": "End of file reached: /Users/rdner/Projects/es_confs/deprecate_log/logs/log1.log; Backoff now.",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "log-default",
    "type": "log"
  },
  "log": {
    "source": "log-default"
  },
  "input_id": "fd29f73b-9e51-4c3e-8133-1d7a6f807732",
  "finished": false,
  "harvester_id": "ce7d846b-bddc-4fa0-8af5-1b3af845775f",
  "log.origin": {
    "file.line": 111,
    "file.name": "log/log.go",
    "function": "github.com/elastic/beats/v7/filebeat/input/log.(*Log).Read"
  },
  "service.name": "filebeat",
  "source_file": "/Users/rdner/Projects/es_confs/deprecate_log/logs/log1.log",
  "state_id": "native::210784429-16777233",
  "os_id": "210784429-16777233",
  "ecs.version": "1.6.0",
  "log.logger": "input.harvester"
}

I saw the incoming data in Kibana too:

Screenshot 2025-01-17 at 14 22 02

@rdner rdner added the Filebeat, breaking change, backport-skip, Team:Elastic-Agent-Data-Plane, and deprecation labels Jan 13, 2025
@rdner rdner self-assigned this Jan 13, 2025
@botelastic botelastic bot added and removed the needs_team label Jan 13, 2025
rdner added 2 commits January 13, 2025 14:26
The log input has been deprecated since 7.16; it's time to disable it.
The container input is just an alias/preset for the log input.

They can still be used under the following circumstances:

* When running under Elastic Agent (by integrations)
* When running as a part of a module
* When `allow_deprecated_use` is set as a part of the input configuration
@rdner rdner force-pushed the deprecate-log-input branch from 6401032 to f1a1aa3 Compare January 13, 2025 13:29
@rdner rdner force-pushed the deprecate-log-input branch from 99d4d76 to 1f497a9 Compare January 15, 2025 10:14
@rdner rdner force-pushed the deprecate-log-input branch from 1f497a9 to ba5df37 Compare January 15, 2025 10:23
@rdner rdner force-pushed the deprecate-log-input branch from 4fb2853 to 41e6772 Compare January 15, 2025 12:34
@rdner rdner force-pushed the deprecate-log-input branch from 508685e to 6de94ca Compare January 15, 2025 15:48
@rdner rdner force-pushed the deprecate-log-input branch 10 times, most recently from a9aa39c to 6aa00a4 Compare January 16, 2025 14:50
@rdner rdner force-pushed the deprecate-log-input branch 2 times, most recently from 4dd7477 to b8d927c Compare January 16, 2025 16:30
@rdner rdner force-pushed the deprecate-log-input branch from b8d927c to 265dda8 Compare January 16, 2025 18:32
@@ -217,6 +217,7 @@ def test_wrong_module_no_reload(self):
         self.render_config_template(
             reload=False,
             reload_path=self.working_dir + "/configs/*.yml",
+            reload_type="modules",
Member Author

This was a bug; it was just luck that this test worked before.

@@ -227,6 +228,7 @@ def test_wrong_module_no_reload(self):
   test:
     enabled: true
     wrong_field: error
+    var.paths: []
Member Author

This test was broken: this line was missing, and the test was expecting an error for empty paths.

@rdner rdner marked this pull request as ready for review January 17, 2025 13:24
@rdner rdner requested a review from a team as a code owner January 17, 2025 13:24
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

Contributor

@belimawr belimawr left a comment

I'm just missing integration tests for both cases: standalone and under Elastic-Agent.

@@ -1,4 +1,4 @@
-type: log
+type: filestream
Contributor

Every Filestream instance must have a unique ID; even for tests and examples, it's better to always add it.

Suggested change
-type: filestream
+type: filestream
+id: foo-multi

@@ -1,4 +1,4 @@
-type: log
+type: filestream
Contributor

Same thing here

Suggested change
-type: filestream
+type: filestream
+id: foo-multibad


The following example configures {beatname_uc} to harvest lines from all log files that match the specified glob patterns:

["source","yaml",subs="attributes"]
-------------------------------------------------------------------------------------
{beatname_lc}.inputs:
-- type: log
+- type: filestream
Contributor

Suggested change
-- type: filestream
+- type: filestream
+  id: unique-ID

@@ -3,6 +3,12 @@
[id="{beatname_lc}-input-{type}"]
=== Container input

deprecated:[7.16.0]

The container input is deprecated. Please use the <<filebeat-input-filestream,`filestream input`>> and its <<filebeat-input-filestream-parsers-container,`container`>> parser.
Contributor

I'm wondering if we should add an example configuration here. There are a couple of pitfalls when using Filestream to ingest container data:

  • The ID must be unique
  • Following symlinks needs to be enabled.
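
To make the two pitfalls concrete, a minimal sketch of a filestream input that ingests container logs could look like the following. The ID and the path are assumptions for illustration, not values taken from this PR; the path is the usual Kubernetes location for container logs and may differ in other environments.

```yaml
filebeat.inputs:
  - type: filestream
    # Hypothetical ID; it must be unique across all filestream inputs.
    id: container-logs
    # Assumed path; adjust it to where your runtime writes container logs.
    paths:
      - "/var/log/containers/*.log"
    # Container log files are commonly symlinks, so following them must be enabled.
    prospector.scanner.symlinks: true
    parsers:
      - container:
          stream: all
          format: auto
```

The `container` parser takes the place of the options that the container input applied on top of the log input.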

Member

+1

@@ -58,6 +58,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Filestream inputs with duplicated IDs will fail to start. An error is logged showing the ID and the full input configuration. {issue}41938[41938] {pull}41954[41954]
- Filestream inputs can define `allow_deprecated_id_duplication: true` to keep the previous behaviour of running inputs with duplicated IDs. {issue}41938[41938] {pull}41954[41954]
- The Filestream input only starts to ingest a file when it is >= 1024 bytes in size. This happens because `fingerprint` is the default file identity now. To restore the previous behaviour, set `file_identity.native: ~` and `prospector.scanner.fingerprint.enabled: false` {issue}40197[40197] {pull}41762[41762]
- Filebeat fails to start when its configuration contains usage of the deprecated `log` or `container` inputs {pull}42295[42295]
Member

Worth mentioning the allow_deprecated_use flag here directly.

@@ -3,6 +3,12 @@
[id="{beatname_lc}-input-{type}"]
=== Container input

deprecated:[7.16.0]
Member

This is a retroactive deprecation, isn't it? There's no deprecation warning right now: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-container.html

We are the only people who know this is the log input underneath. We should explain that we consider it deprecated because it is just the log input underneath and can be replaced by filestream directly.

// AllowDeprecatedUse returns true if the configuration allows using the deprecated log input
func AllowDeprecatedUse(cfg *conf.C) bool {
	allow, _ := cfg.Bool(allowDeprecatedUseField, -1)
	return allow || fleetmode.Enabled() || fileset.CheckIfModuleInput(cfg)
Member

Why do we want to always allow the log input when it's run by Agent? Won't every standalone Agent configuration bypass the need for the allow_deprecated_use flag?

Can't we just add this flag to every integration in https://github.com/elastic/integrations/ that uses log or one of its aliases?
