file_output automatically escape HTML entities in log records. #36050

Open
dzhou3 opened this issue Oct 29, 2024 · 7 comments · May be fixed by #36388
Labels
bug Something isn't working pkg/stanza

Comments

@dzhou3

dzhou3 commented Oct 29, 2024

Component(s)

pkg/stanza

What happened?

Description

The file_output operator automatically HTML-escapes special characters in log records when a Go template is used to customize the output format.

Steps to Reproduce

Ingest logs that contain special characters (e.g. a single quote or a greater-than sign) from journaldreceiver, and pipe them to a file_output operator with a custom format (Go template).

Expected Result

The output file should show the same log content as the upstream log source.

Actual Result

The output file contains escaped characters.

I believe the root cause is that the file_output operator uses html/template instead of text/template.
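
To illustrate the suspected root cause, here is a minimal sketch that runs one template through both standard-library packages. The `record` type, the `format` constant, and the helper names are hypothetical and are not taken from the operator's source; only the `html/template` and `text/template` calls are real.

```go
package main

import (
	"bytes"
	"fmt"
	htmltemplate "html/template"
	texttemplate "text/template"
)

// format is a hypothetical single-field template used only for this comparison.
const format = "{{.Message}}"

// record is a hypothetical stand-in for a log entry.
type record struct {
	Message string
}

// renderHTML executes format with html/template, which escapes
// HTML-significant characters such as <, > and '.
func renderHTML(msg string) (string, error) {
	t, err := htmltemplate.New("html").Parse(format)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, record{Message: msg}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// renderText executes the same format with text/template, which writes
// values without any escaping.
func renderText(msg string) (string, error) {
	t, err := texttemplate.New("text").Parse(format)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, record{Message: msg}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	msg := "169.254.169.123 local addr 172.31.4.221 -> <null>"
	escaped, err := renderHTML(msg)
	if err != nil {
		panic(err)
	}
	raw, err := renderText(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("html/template:", escaped)
	fmt.Println("text/template:", raw)
}
```

With html/template the `>` and `<` in the message come out as `&gt;` and `&lt;`, while text/template leaves the message untouched.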

Collector version

v0.108.0

Environment information

Environment

OS: Ubuntu 22.04

OpenTelemetry Collector configuration

receivers:
  journald:
    priority: info
    directory: /var/log/journal
    operators:
      - id: output_log
        type: file_output
        path: /var/log/output_log
        format: "{{.Timestamp}} {{.Body._HOSTNAME}} [{{.Body._PID}}] {{.Body.SYSLOG_IDENTIFIER}} {{.Body.MESSAGE}}"

exporters:
  nop:

service:
  pipelines:
    logs:
      receivers: [journald]
      exporters: [nop]

Log output

/var/log/output_log:

2024-10-29 08:25:57.045724 +0000 UTC ip-172-31-4-221 [30859] ntpd ntpd exiting on signal 15 (Terminated)
2024-10-29 08:25:57.046092 +0000 UTC ip-172-31-4-221 [30859] ntpd 169.254.169.123 local addr 172.31.4.221 -> <null>
2024-10-29 08:25:57.046163 +0000 UTC ip-172-31-4-221 [1] systemd Stopping Network Time Service...
2024-10-29 08:25:57.046837 +0000 UTC ip-172-31-4-221 [1] systemd ntp.service: Deactivated successfully.
2024-10-29 08:25:57.047118 +0000 UTC ip-172-31-4-221 [1] systemd Stopped Network Time Service.
2024-10-29 08:25:57.049068 +0000 UTC ip-172-31-4-221 [1] systemd Starting Network Time Service...
2024-10-29 08:25:57.057964 +0000 UTC ip-172-31-4-221 [31080] ntpd ntpd [email protected] Wed Feb 16 17:13:02 UTC 2022 (1): Starting
2024-10-29 08:25:57.058367 +0000 UTC ip-172-31-4-221 [31080] ntpd Command line: /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 114:122
2024-10-29 08:25:57.058421 +0000 UTC ip-172-31-4-221 [31080] ntpd ----------------------------------------------------
2024-10-29 08:25:57.058482 +0000 UTC ip-172-31-4-221 [31080] ntpd ntp-4 is maintained by Network Time Foundation,
2024-10-29 08:25:57.058529 +0000 UTC ip-172-31-4-221 [31080] ntpd Inc. (NTF), a non-profit 501(c)(3) public-benefit
2024-10-29 08:25:57.05856 +0000 UTC ip-172-31-4-221 [31080] ntpd corporation.  Support and training for ntp-4 are
2024-10-29 08:25:57.058596 +0000 UTC ip-172-31-4-221 [31080] ntpd available at https://www.nwtime.org/support
2024-10-29 08:25:57.05863 +0000 UTC ip-172-31-4-221 [31080] ntpd ----------------------------------------------------
2024-10-29 08:25:57.578266 +0000 UTC ip-172-31-4-221 [31086] ntpd proto: precision = 0.056 usec (-24)
2024-10-29 08:25:57.57869 +0000 UTC ip-172-31-4-221 [31086] ntpd basedate set to 2022-02-04
2024-10-29 08:25:57.57877 +0000 UTC ip-172-31-4-221 [31086] ntpd gps base set to 2022-02-06 (week 2196)
2024-10-29 08:25:57.578809 +0000 UTC ip-172-31-4-221 [31086] ntpd leapsecond file ('/usr/share/zoneinfo/leap-seconds.list'): good hash signature
2024-10-29 08:25:57.578841 +0000 UTC ip-172-31-4-221 [31086] ntpd leapsecond file ('/usr/share/zoneinfo/leap-seconds.list'): loaded, expire=2024-12-28T00:00:00Z last=2017-01-01T00:00:00Z ofs=37
2024-10-29 08:25:57.57993 +0000 UTC ip-172-31-4-221 [] kernel audit: type=1400 audit(1730190357.573:33): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/etc/ssl/openssl.cnf" pid=31086 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0


Additional context

After changing the import from html/template to text/template in the file package and building a local otelcol-contrib, the issue disappeared.
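
As a rough sketch of what that import swap does for the format string in the configuration above, the following renders it with text/template. The `logEntry` type, `sampleEntry`, and `render` are hypothetical stand-ins written for this example; they are not the operator's actual types or code, and the sample values are copied from one line of the log output above.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
	"time"
)

// format is the format string from the collector configuration in this issue.
const format = "{{.Timestamp}} {{.Body._HOSTNAME}} [{{.Body._PID}}] {{.Body.SYSLOG_IDENTIFIER}} {{.Body.MESSAGE}}"

// logEntry is a hypothetical stand-in that models only the fields
// referenced by the format string.
type logEntry struct {
	Timestamp time.Time
	Body      map[string]any
}

// sampleEntry builds an entry whose message contains characters that
// html/template would escape.
func sampleEntry() logEntry {
	return logEntry{
		Timestamp: time.Date(2024, time.October, 29, 8, 25, 57, 46092000, time.UTC),
		Body: map[string]any{
			"_HOSTNAME":         "ip-172-31-4-221",
			"_PID":              "30859",
			"SYSLOG_IDENTIFIER": "ntpd",
			"MESSAGE":           "169.254.169.123 local addr 172.31.4.221 -> <null>",
		},
	}
}

// render executes the format with text/template, so field values are
// written exactly as they appear in the entry.
func render(e logEntry) (string, error) {
	t, err := template.New("file_output").Parse(format)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, e); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	line, err := render(sampleEntry())
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```
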
@dzhou3 dzhou3 added bug Something isn't working needs triage New item requiring triage labels Oct 29, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dzhou3 dzhou3 changed the title file_output is not able to parse some special characters in log records. file_output automatically some special characters in log records. Oct 29, 2024
@dzhou3 dzhou3 changed the title file_output automatically some special characters in log records. file_output automatically escape HTML entities in log records. Oct 29, 2024
@djaglowski djaglowski removed the needs triage New item requiring triage label Oct 29, 2024
@djaglowski
Member

Thanks for reporting, @dzhou3. This operator is really only for debugging purposes, but I agree this bug makes debugging imprecise, so we should address it.

@dzhou3
Author

dzhou3 commented Oct 29, 2024

> This operator is really only for debugging purposes

Ah, we were planning to use this operator (along with the router operator) to sort journald logs into different files based on log message patterns. Are you suggesting we should avoid doing this?

@djaglowski
Member

Interesting use case. Realistically, I don't expect the operator to go away or change meaningfully, but this is the first time I've ever heard of a case where it would be strictly depended upon. I assume you've ruled out the file exporter? I wonder if an enhancement to that component would be able to solve your case more reliably.

@dzhou3
Author

dzhou3 commented Oct 29, 2024

Yes, I've tried the combination journaldreceiver -> transformprocessor (for regex matching and tagging) -> routingprocessor -> fileexporter. However, the output logs are always in JSON format and contain lots of OTLP metadata. This doesn't meet our requirement: the output should be the same as the journald input, without introducing any external services (otherwise we would go with fluentbit instead).

@odubajDT
Contributor

Hi, I would like to look at this issue if possible.

@odubajDT
Contributor

odubajDT commented Nov 15, 2024

According to the documentation, text/template has the same interface as html/template, but it does not escape the characters that are escaped by default in HTML. I guess switching to it would be a good solution.
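
To show the shared interface concretely, here is a small sketch in which both template types sit behind one interface, so the calling code stays the same whichever package is chosen. The `executor` interface, `newExecutor`, and `render` are hypothetical names invented for this example; the only thing taken from the standard library is that both `*Template` types have the same `Execute(io.Writer, any) error` method.

```go
package main

import (
	"bytes"
	"fmt"
	htmltemplate "html/template"
	"io"
	texttemplate "text/template"
)

// executor is a hypothetical interface satisfied by both
// *html/template.Template and *text/template.Template.
type executor interface {
	Execute(wr io.Writer, data any) error
}

// newExecutor parses format with html/template when escapeHTML is true,
// and with text/template otherwise.
func newExecutor(format string, escapeHTML bool) (executor, error) {
	if escapeHTML {
		t, err := htmltemplate.New("fmt").Parse(format)
		if err != nil {
			return nil, err
		}
		return t, nil
	}
	t, err := texttemplate.New("fmt").Parse(format)
	if err != nil {
		return nil, err
	}
	return t, nil
}

// render runs any executor against data and returns the resulting string.
func render(ex executor, data any) (string, error) {
	var buf bytes.Buffer
	if err := ex.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	for _, escape := range []bool{true, false} {
		ex, err := newExecutor("{{.}}", escape)
		if err != nil {
			panic(err)
		}
		out, err := render(ex, "'a' > b")
		if err != nil {
			panic(err)
		}
		fmt.Printf("escapeHTML=%v: %s\n", escape, out)
	}
}
```

Only the constructor differs between the two branches, which is why changing the import alone is enough to change the escaping behavior.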
