Use time_format and types on multi_format parse #9

Open
carlosedp opened this issue Apr 24, 2018 · 8 comments

Comments

@carlosedp

I currently parse one tag with a simple filter, but I need to add a second pattern to the filter, so I'm planning to migrate to the multi_format plugin.

My question is whether it's possible to use the time_format and types fields on each pattern, like this:

<filter kubernetes.var.log.containers.traefik-ingress-**.log>
  @type parser
  key_name log
  reserve_data yes
  <parse>
    @type multi_format
    <pattern>
      format regexp
      expression /^(?<ip>[^-]*) - - \[(?<datetime>[^\]]*)\] "(?<method>[^ ]*) (?<path>[^ ]*) (?<http_version>[^"]*)" (?<status_code>[^ ]*) (?<body_bytes>[^ ]*) "(?<referer>[^"]*)" "(?<user_agent>[^"]*)" (?<seq>[^ ]*) "(?<domain>[^ ]*)" "(?<dest_url>[^ ]*)" (?<response_time>[^ ms]*)/
      time_key datetime
      time_format %d/%b/%Y:%H:%M:%S %z
      types status_code:integer,body_bytes:integer,seq:integer,response_time:integer
    </pattern>
    <pattern>
      format regexp
      expression /^\[(?<datetime>[^\]]*)\] - (?<data>[^ ]*)/
    </pattern>
  </parse>
</filter>
@sokoow

sokoow commented Feb 15, 2019

Probably not, because I get this when I use your example:

2019-02-15 09:35:16 +0000 [warn]: dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not match with data 'time=\"2019-02-15T09:35:16Z\" level=debug msg=\"Skipping Kubernetes event kind *v1.Endpoints\"\n'" location=nil tag="kubernetes.var.log.containers.traefik-ingress-controller-d2bv5_kube-system_traefik-ingress-lb-3f5d86da4a51d9909d67acae3fa9c8ddf1ba9f83f6b4ece55728a53b41175bae.log" time=2019-02-15 09:35:16.418231659 +0000 record={"log"=>"time=\"2019-02-15T09:35:16Z\" level=debug msg=\"Skipping Kubernetes event kind *v1.Endpoints\"\n", "stream"=>"stdout", "docker"=>{"container_id"=>"3f5d86da4a51d9909d67acae3fa9c8ddf1ba9f83f6b4ece55728a53b41175bae"}, "kubernetes"=>{"container_name"=>"traefik-ingress-lb", "namespace_name"=>"kube-system", "pod_name"=>"traefik-ingress-controller-d2bv5", "container_image"=>"traefik:latest", "container_image_id"=>"docker-pullable://traefik@sha256:79a9b27986068895c5deb438099fbd3072ed645cdcabc72af24e229f868c4cf2", "pod_id"=>"101009d3-2c79-11e9-b433-de1a34070007", "labels"=>{"controller-revision-hash"=>"696f6f7df", "k8s-app"=>"traefik-ingress-lb", "name"=>"traefik-ingress-lb", "pod-template-generation"=>"1"}, "host"=>"kube-master1", "master_url"=>"https://10.96.0.1:443/api", "namespace_id"=>"9e8bebb0-11a2-11e9-9ca2-de1a34070007"}}
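
The record in this warning is one of Traefik's own log lines (`time="..." level=debug msg="..."`), which matches neither of the two regular expressions in the example, so the "pattern not match" error comes from an unmatched line rather than from time_format or types. A minimal sketch, assuming unmatched lines should be kept instead of rejected: add a final catch-all pattern with `format none`, which stores the raw text under the `message` key. The first access-log pattern from the example is elided here for brevity and would stay in front, unchanged.

```
<filter kubernetes.var.log.containers.traefik-ingress-**.log>
  @type parser
  key_name log
  reserve_data yes
  <parse>
    @type multi_format
    # The access-log pattern from the example above goes first, unchanged.
    <pattern>
      format regexp
      expression /^\[(?<datetime>[^\]]*)\] - (?<data>[^ ]*)/
    </pattern>
    # Catch-all (an addition for illustration): tried last, accepts any line
    # and keeps it as {"message": "<raw line>"}.
    <pattern>
      format none
    </pattern>
  </parse>
</filter>
```

Patterns are tried in the order they are listed, so the catch-all has to come last.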

@nargmarg

Does anyone know where the types field should go? I've tried multiple places and nothing happened: all my fields stay as string types :-/

@repeatedly
Owner

@nargmarg If you have a problem, please post your configuration and an actual log example. We are not psychic, so it is hard to answer without that information :)

@nargmarg

nargmarg commented May 28, 2020

You're right, sorry.
Below is part of my fluentd config (the output part is omitted, since it isn't relevant here):

<filter kubernetes.**>
   @id filter_parser
   @type parser
   key_name log
   reserve_time true
   reserve_data true
   remove_key_name_field true
   <parse>
     @type multi_format
       <pattern>
           format /(?<web.remote_addr>[^ ]*) - - \[(?<web.time>[^\]]*)\] "(?<web.method>\S+)(?: +(? 
        <web.request>[^\"]*) +\S*)?" (?<web.code>[^ ]*) (?<web.size>[^ ]*) "(?<web.referer>[^\"]*)" "(? 
        <web.agent>[^\"]*)" (?<web.request_length>[^ ]*) (?<web.request_time>[^ ]*) \[(? 
        <web.namespace_service_port>[^ ]*)\] (?<web.upstream_addr>[^ ]*) (? 
        <web.upstream_response_length>[^ ]*) (?<web.upstream_response_time>[^ ]*) (? 
       <web.upstream_status>[^ ]*)/
         types web.code:integer,web.size:integer,web.request_length:integer,
         web.request_time:float,web.upstream_addr:array,web.upstream_response_length:integer,
         web.upstream_response_time:float,web.upstream_status:integer
       </pattern>
       <pattern>
         format /time="(?<external_dns.time>[^ ]*)" level=(?<external_dns.level>[^ ]*) msg="(? 
         <external_dns.msg>[^\"]*)"/
       </pattern>
       <pattern>
           format json
       </pattern>
   </parse>
 </filter>

I don't know if this is the right place for the types field, but I have tried many places and nothing happened. All fields stay as strings; nothing is converted...

Do you have any advice? Thanks a lot.
(Please ignore the indentation.)
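
Two things in the configuration above may be worth checking, assuming the line wraps are really present in the file and are not just a paste artifact: a Fluentd parameter value is normally written on a single line, so a types list continued over several lines may not be read in full, and the regular expression likewise needs to stay on one line. The sketch below shows the placement on a shortened version of the first pattern. The underscore group names (`web_code`, `web_size`, and so on) are illustrative stand-ins for the dotted names, and the expression is deliberately cut off after the size field; neither change is confirmed in this thread as the cause of the problem.

```
<filter kubernetes.**>
  @id filter_parser
  @type parser
  key_name log
  reserve_time true
  reserve_data true
  remove_key_name_field true
  <parse>
    @type multi_format
    <pattern>
      format regexp
      # Shortened, single-line expression; group names are hypothetical renames.
      expression /^(?<web_remote_addr>[^ ]*) - - \[(?<web_time>[^\]]*)\] "(?<web_method>\S+)(?: +(?<web_request>[^"]*) +\S*)?" (?<web_code>[^ ]*) (?<web_size>[^ ]*)/
      # The whole types list sits on one line, inside the same <pattern>
      # as the expression whose fields it converts.
      types web_code:integer,web_size:integer
    </pattern>
    <pattern>
      format json
    </pattern>
  </parse>
</filter>
```

Each types entry applies only to fields produced by its own pattern, so a line that ends up matching the json pattern is not affected by it.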

@hetii

hetii commented Nov 21, 2022

Same for me.
I parse JSON-escaped logs from the ingress controller and all fields end up as text type :/
This plugin probably ignores the types keyword.
@repeatedly Can you confirm or deny whether it's possible to define types for fields as described in
https://docs.fluentd.org/configuration/parse-section#parse-parameters

@fabio-s-franco

I was wondering if @repeatedly is still active. I've seen some things that would be quite useful (like the open PRs), and answers to questions like this one would be too.

I know you can use time_format in multi_format. Now I want to know whether I can handle messages with the same overall pattern (JSON) but with different names for time_key.

@repeatedly
Owner

I know you can use time_format in multi_format

Yes. This plugin forwards configuration and events to the actual parser plugins, so parser features should work.
I tested with a simple configuration and it works as expected.

<source>
  @type sample
  sample {"hello":"world","log":"{\"key\":\"value\",\"event_time\":\"22/Feb/2022:12:00:00 +0900\",\"num\":\"100\"}"}
  tag sample
</source>

<filter sample>
  @type parser
  key_name log
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key event_time
      time_format %d/%b/%Y:%H:%M:%S %z
      types num:integer
    </pattern>
    <pattern>
      format none
    </pattern>
  </parse>
</filter>

<match sample>
  @type stdout
</match>

with different names for time_key.

What does this mean? incoming events have different time key names like below?

{"k":"v1","time_key1":"time_value1"}
{"k":"v2","time_key2":"time_value2"}
{"k":"v3","time_key3":"time_value3"}
{"k":"v4","time_key2":"time_value3"}
...
{"k":"vN","time_key1":"time_valueN"}

@fabio-s-franco

What does this mean? incoming events have different time key names like below?

Exactly, @repeatedly. I've been struggling with this for a while and it doesn't really seem to work. For example:

 <source>
        @type http
        bind 0.0.0.0
        port 5880

        <parse>
          @type multi_format
          <pattern>
            format json
            time_key Timestamp
            keep_time_key false
            utc true
            time_format %Y-%m-%dT%H:%M:%S.%NZ
          </pattern>
          <pattern>
            format json
            time_key @t
            keep_time_key false
            utc true
            time_format %Y-%m-%dT%H:%M:%S.%NZ
          </pattern>
          <pattern>
            format json
            time_key @timestamp
            keep_time_key false
            utc true
            time_format %Y-%m-%dT%H:%M:%S.%NZ
          </pattern>
          <pattern>
            format json
            time_key timestamp
            keep_time_key false
            utc true
            time_format %Y-%m-%dT%H:%M:%S.%NZ
          </pattern>
        </parse>

        @label @HTTP
      </source>

My expectation was that it would go through the patterns until one matched on its time_key. But in the end all the different keys end up in the final object in Elastic, and the timestamp is the time the event was emitted by fluentd.

This is as far as I got, and I have put it aside for now. I find it very difficult to understand how fluentd treats time in general. If you have suggestions, I am all ears :)

Thanks for looking into it.
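
One possible explanation for the behavior described above, offered as an untested sketch rather than a confirmed fix: any valid JSON body parses successfully under the first `format json` pattern, and when its time_key is absent the parser falls back to the current time, so the later patterns are never reached. The parser parameter `estimate_current_event` controls that fallback. Assuming multi_format moves on to the next pattern when a pattern yields no time, disabling the fallback on each key-specific pattern should let it be skipped when its key is missing. The last pattern is an added catch-all for events that carry none of the four keys.

```
<source>
  @type http
  bind 0.0.0.0
  port 5880

  <parse>
    @type multi_format
    <pattern>
      format json
      time_key Timestamp
      keep_time_key false
      utc true
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      # Assumption: with the fallback off, a missing time key yields no time,
      # and the next pattern gets a chance.
      estimate_current_event false
    </pattern>
    <pattern>
      format json
      time_key @t
      keep_time_key false
      utc true
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      estimate_current_event false
    </pattern>
    <pattern>
      format json
      time_key @timestamp
      keep_time_key false
      utc true
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      estimate_current_event false
    </pattern>
    <pattern>
      format json
      time_key timestamp
      keep_time_key false
      utc true
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      estimate_current_event false
    </pattern>
    # Catch-all (an addition for illustration): plain JSON, stamped with the
    # current time when none of the keys above is present.
    <pattern>
      format json
    </pattern>
  </parse>

  @label @HTTP
</source>
```

If this assumption about pattern fall-through does not hold for the installed plugin version, the events will simply keep arriving with the current time, as before.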


6 participants