To continue with the example of the `cooltool` service added in the PCAP processors section, assume `cooltool` generates some textual log files to be parsed and indexed into Malcolm. Users will have configured `cooltool` in `cooltool.Dockerfile` and its section in the `docker-compose` files to write logs into a subdirectory or subdirectories in a shared folder, bind mounted in such a way that both the `cooltool` and `filebeat` containers can access it. Referring to the `zeek` container as an example, this is how the `./zeek-logs` folder is handled: both the `filebeat` and `zeek` services have `./zeek-logs` in their `volumes:` sections:
```
$ grep -P "^( - ./zeek-logs| [\w-]+:)" docker-compose.yml | grep -B1 "zeek-logs"
  filebeat:
      - ./zeek-logs:/data/zeek
--
  zeek:
      - ./zeek-logs/upload:/zeek/upload
…
```
Access to the `cooltool` logs must be provided in a similar fashion.
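For example, a hypothetical `./cooltool-logs` folder could be bind mounted into both containers; the `./cooltool-logs` host path and `/logs/cooltool` container path below are illustrative assumptions, not paths defined by Malcolm:

```
# sketch of docker-compose.yml excerpts (paths are hypothetical)
  filebeat:
    volumes:
      # read-only access is sufficient for the log shipper
      - ./cooltool-logs:/logs/cooltool:ro
  cooltool:
    volumes:
      # cooltool writes its log files here
      - ./cooltool-logs:/logs/cooltool
```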
Next, tweak [`filebeat-logs.yml`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/filebeat/filebeat-logs.yml) by adding a new log input path pointing to the `cooltool` logs so that they are sent along to the `logstash` container. The modified `filebeat-logs.yml` will need to be reflected in the `filebeat` container via bind mount or by rebuilding the image.
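A sketch of such an input, assuming the `/logs/cooltool` container path from the bind mount example above (the path and tag are assumptions; follow the layout of the existing inputs in the file):

```
filebeat.inputs:
- type: log
  paths:
    - /logs/cooltool/*.log
  # tag the entries so the Logstash pipeline can recognize them
  tags: ["cooltool"]
```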
Logstash can then be easily extended to add more [`logstash/pipelines`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/logstash/pipelines). At the time of this writing (as of the [v5.0.0 release]({{ site.github.repository_url }}/releases/tag/v5.0.0)), the Logstash pipelines basically look like this:

- input (from `filebeat`) sends logs to 1..n parse pipelines
- each parse pipeline does what it needs to do to parse its logs, then sends them to the enrichment pipeline
- the [enrichment pipeline]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/logstash/pipelines/enrichment) performs common lookups on the fields that have been normalized and indexes the logs into the OpenSearch data store
To add a new parse pipeline for `cooltool` after tweaking [`filebeat-logs.yml`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/filebeat/filebeat-logs.yml) as described above, create a `cooltool` directory under [`logstash/pipelines`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/logstash/pipelines) that follows the same pattern as the `zeek` parse pipeline. This directory will have an input file (tiny), a filter file (possibly large), and an output file (tiny). In the filter file, be sure to set the field `event.hash` to a unique value to identify indexed documents in OpenSearch; the fingerprint filter may be useful for this.
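As a sketch, the three files might look something like the following; the file names, the `cooltool-parse` and `enrichment` pipeline addresses, and the tag conditional are assumptions patterned after the conventions described above, so check the existing `zeek` pipeline for the exact addresses in use:

```
# logstash/pipelines/cooltool/10_input.conf (input file): receive logs forwarded from filebeat
input { pipeline { address => "cooltool-parse" } }

# logstash/pipelines/cooltool/11_parse.conf (filter file): parsing logic lives here
filter {
  if "cooltool" in [tags] {
    # ...parse the cooltool log line into fields here...
    # derive a unique event.hash for the indexed document
    fingerprint {
      source => [ "message" ]
      concatenate_sources => true
      method => "SHA256"
      target => "[event][hash]"
    }
  }
}

# logstash/pipelines/cooltool/99_output.conf (output file): hand logs to the enrichment pipeline
output { pipeline { send_to => [ "enrichment" ] } }
```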
Finally, in the `./config/logstash.env` file, set a new `LOGSTASH_PARSE_PIPELINE_ADDRESSES` environment variable to `cooltool-parse,zeek-parse,suricata-parse,beats-parse` (assuming the pipeline address from the previous step was named `cooltool-parse`) so that logs sent from `filebeat` to `logstash` are forwarded to all parse pipelines.
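In other words, `./config/logstash.env` would gain (or have updated) a single line like this:

```
LOGSTASH_PARSE_PIPELINE_ADDRESSES=cooltool-parse,zeek-parse,suricata-parse,beats-parse
```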
The following modifications must be made in order for Malcolm to parse new Zeek log files:

- Add a parsing filter file named so that it sorts after [`logstash/pipelines/zeek/1001_zeek_parse.conf`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/logstash/pipelines/zeek/1001_zeek_parse.conf) but before [`logstash/pipelines/zeek/1199_zeek_unknown.conf`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/logstash/pipelines/zeek/1199_zeek_unknown.conf)
- Follow patterns for existing log files as an example (see the sketch after this list)
- For common Zeek fields such as the `id` four-tuple, timestamp, etc., use the same convention used by existing Zeek logs in that file (e.g., `ts`, `uid`, `orig_h`, `orig_p`, `resp_h`, `resp_p`)
- The [`logstash/scripts/logstash-start.sh`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/logstash/scripts/logstash-start.sh) Logstash container startup script should automatically fix any issues with parsing the Zeek tab delimiter (e.g., converting spaces in the `dissect` and `split` filters to tabs)
- If necessary, perform log normalization in [`logstash/pipelines/zeek/1300_zeek_normalize.conf`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/logstash/pipelines/zeek/1300_zeek_normalize.conf) for values such as action (`event.action`), result (`event.result`), application protocol version (`network.protocol_version`), etc.
- If necessary, define conversions for floating point or integer values in [`logstash/pipelines/zeek/1400_zeek_convert.conf`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/logstash/pipelines/zeek/1400_zeek_convert.conf)
- Identify the new fields and add them as described in Adding new log fields
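As an illustration, a new stanza in that parsing filter file might use `dissect` to split a hypothetical `cooltool.log` Zeek TSV log into named columns; the `log_source` conditional and `zeek_cols` target shown here are modeled on the existing filters but should be verified against the current contents of the file, and the `widget_name` column is made up for this example:

```
filter {
  if ([log_source] == "cooltool") {
    # map the tab-separated Zeek columns onto named fields
    # (the spaces in the mapping below are converted to tabs at startup)
    dissect {
      mapping => {
        "[message]" => "%{[zeek_cols][ts]} %{[zeek_cols][uid]} %{[zeek_cols][orig_h]} %{[zeek_cols][orig_p]} %{[zeek_cols][resp_h]} %{[zeek_cols][resp_p]} %{[zeek_cols][widget_name]}"
      }
    }
  }
}
```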
The script [`scripts/zeek_script_to_malcolm_boilerplate.py`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/scripts/zeek_script_to_malcolm_boilerplate.py) may help by autogenerating these filters.
Malcolm's Logstash instance performs many enrichments automatically (see the [enrichment pipeline]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/logstash/pipelines/enrichment)), including MAC address to vendor lookup by OUI, GeoIP, ASN, and a few others. To take advantage of these enrichments that are already in place, normalize new fields to use the same standardized field names Malcolm uses for things such as IP addresses, MAC addresses, etc. Additional enrichments may be added by creating new `.conf` files containing Logstash filters in the [enrichment pipeline]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/logstash/pipelines/enrichment) directory and using either of the techniques in the Local modifications section to implement those changes in the `logstash` container.
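For instance, a new enrichment `.conf` file could tag documents based on one of the parsed fields; the file name, field, and tag in this sketch are all hypothetical:

```
# logstash/pipelines/enrichment/22_cooltool_enrich.conf (hypothetical)
filter {
  # flag traffic involving a widget of interest
  if [cooltool][widget_name] == "important-widget" {
    mutate { add_tag => [ "important_widget" ] }
  }
}
```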
The [logstash.Dockerfile]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/Dockerfiles/logstash.Dockerfile) installs the Logstash plugins used by Malcolm (search for `logstash-plugin install` in that file). Additional Logstash plugins could be installed by modifying this Dockerfile and rebuilding the `logstash` image.
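For example, to add the `logstash-filter-translate` plugin, a line like the following could be appended near the existing plugin installation step (the `logstash-plugin` path shown matches the official Logstash images, but follow however the Dockerfile already invokes it):

```
# install an additional Logstash plugin alongside the ones Malcolm already uses
RUN /usr/share/logstash/bin/logstash-plugin install logstash-filter-translate
```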