unexpected EOF #591

Closed
krep-dr opened this issue Oct 18, 2022 · 15 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. target/kubernetes Issues relating to kubernetes cluster scanning

Comments

@krep-dr

krep-dr commented Oct 18, 2022

What steps did you take and what happened:
I'm getting quite a lot of reconciler errors in the trivy-operator log. When the error occurs, the scan-vulnerability job has a Completed status but is never cleaned up, and no vulnerability report is created. I don't think it is related to the image, since a rerun sometimes works.

Thanks.

1.6660767565583582e+09    ERROR   Reconciler error        {"controller": "job", "controllerGroup": "batch", "controllerKind": "Job", "Job": {"name":"scan-vulnerabilityreport-bb4cd84bb","namespace":"core-trivy-operator"}, "namespace": "core-trivy-operator", "name": "scan-vulnerabilityreport-bb4cd84bb", "reconcileID": "c9be99c0-bc9e-4f76-921a-8342371a8556", "error": "unexpected EOF"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234

Anything else you would like to add:

This might be the same problem as described in aquasecurity/starboard#1031.

Environment:

  • Trivy-Operator version (use trivy-operator version): 0.3.0
  • Kubernetes version (use kubectl version): v1.23.9-gke.900
  • OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc): macOS 10.15
@krep-dr krep-dr added the kind/bug Categorizes issue or PR as related to a bug. label Oct 18, 2022
@chen-keinan
Contributor

chen-keinan commented Oct 18, 2022

@krep-dr thank you for reporting this issue.

  • Is this happening after upgrading to version 0.3.0?
  • Is this happening for all scan jobs?

@krep-dr
Author

krep-dr commented Oct 18, 2022

It is a clean install; I have not used trivy-operator before version 0.3.0.
Not all jobs are affected, but I have not been able to see a pattern.

@krep-dr
Author

krep-dr commented Oct 18, 2022

I guess the output from the scan jobs is corrupted for some reason

kubectl logs scan-vulnerabilityreport-f44855689-hlj84 | base64 --decode | bzip2 -dtvv
  (stdin): Defaulted container "lyd-web-lpm-test" out of: lyd-web-lpm-test, 923c6958-7b15-4cf9-94e7-70a72c45e4ba (init)

    [1: huff+mtf file ends unexpectedly

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
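
The check above decodes the base64 text and then tests the bzip2 stream. As a rough offline illustration, the Python sketch below runs the same stages one at a time and reports which one fails. The function names `encode_report` and `check_scan_log` are hypothetical helpers invented for this sketch, not part of trivy or trivy-operator, and the report used in the usage lines is made-up sample data. The operator's "unexpected EOF" message matches the text of Go's `io.ErrUnexpectedEOF`, which is what a reader returns when a stream ends early, so a "bzip2" result here would be consistent with a truncated log rather than a bad image.

```python
import base64
import binascii
import bz2
import json


def encode_report(report: dict) -> str:
    """Build sample input for the check: JSON -> bzip2 -> base64 text.

    Assumption: this only mimics the shape of the scan job output so the
    checker below has something to work on.
    """
    raw = json.dumps(report).encode("utf-8")
    return base64.b64encode(bz2.compress(raw)).decode("ascii")


def check_scan_log(log_text: str) -> str:
    """Return "ok", or the name of the stage at which decoding failed."""
    # The base64 CLI wraps its output, so drop all whitespace first.
    compact = "".join(log_text.split())
    try:
        compressed = base64.b64decode(compact, validate=True)
    except binascii.Error:
        return "base64"
    try:
        # Truncated input raises ValueError; corrupt input raises OSError.
        raw = bz2.decompress(compressed)
    except (OSError, ValueError):
        return "bzip2"
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return "json"
    return "ok"


if __name__ == "__main__":
    sample = encode_report({"Results": [{"Target": "demo"}] * 20})
    # Cut the text at a 4-character boundary so base64 itself stays valid.
    cut = sample[: (len(sample) // 2) // 4 * 4]
    print(check_scan_log(sample))
    print(check_scan_log(cut))
```

Cutting the sample at a multiple of four characters keeps the base64 layer decodable, so the failure surfaces at the bzip2 stage, which is the same stage that fails in the output above.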

@chen-keinan
Contributor

chen-keinan commented Oct 18, 2022

@krep-dr the scan job output is compressed (bzip2) and then encoded (base64).
Can you please try deleting the affected scan jobs and restarting trivy-operator?

@chen-keinan
Contributor

I guess the output from the scan jobs is corrupted for some reason

kubectl logs scan-vulnerabilityreport-f44855689-hlj84 | base64 --decode | bzip2 -dtvv
  (stdin): Defaulted container "lyd-web-lpm-test" out of: lyd-web-lpm-test, 923c6958-7b15-4cf9-94e7-70a72c45e4ba (init)

    [1: huff+mtf file ends unexpectedly

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

Yes, it looks like the scan didn't complete successfully. Are you using the regular (image) scan or the filesystem scan?

@chen-keinan chen-keinan added target/kubernetes Issues relating to kubernetes cluster scanning priority/backlog Higher priority than priority/awaiting-more-evidence. labels Oct 18, 2022
@krep-dr
Author

krep-dr commented Oct 19, 2022

I'm using the regular scan

Containers:
  lyd-web-lpm-test:
    Container ID:  containerd://118b4359f9dd3b41365ad7913bf7d7065146db9a1e97972a91c3d1f0cd8db2f5
    Image:         ghcr.io/aquasecurity/trivy:0.31.3
    Image ID:      ghcr.io/aquasecurity/trivy@sha256:3516c70972d05afaee84c47305d58c2497547f56d60fdaee99d84ff7e28159b2
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      trivy image 'org/lyd-web:9237f8f48ae68b660f2a2d2f98ffc2ee3353c60d' --security-checks vuln --cache-dir /tmp/trivy/.cache --quiet --skip-update --format json > /tmp/scan/result_lyd-web-lpm-test.json &&  bzip2 -c /tmp/scan/result_lyd-web-lpm-test.json | base64
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 17 Oct 2022 20:37:31 +0200
      Finished:     Mon, 17 Oct 2022 20:38:30 +0200

@krep-dr
Author

krep-dr commented Oct 19, 2022

After triggering the scanner job manually in Kubernetes with the args trivy image 'org/lyd-web:9237f8f48ae68b660f2a2d2f98ffc2ee3353c60d' --security-checks vuln --cache-dir /tmp/trivy/.cache --quiet --skip-update --format json, I can see that the scan result is missing some data. It looks like the output is cut off, which results in invalid JSON. After rerunning the job multiple times it suddenly started working again, but now another scan job is broken.

@chen-keinan
Contributor

@krep-dr let's keep this issue open and keep tracking it.

@krep-dr
Author

krep-dr commented Oct 21, 2022

@chen-keinan Adding a sleep 10 to the arg list for the scanner pod seems to fix the problem.

Args:
      -c
      trivy image 'org/lyd-web:9237f8f48ae68b660f2a2d2f98ffc2ee3353c60d' --security-checks vuln --cache-dir /tmp/trivy/.cache --quiet --skip-update --format json > /tmp/scan/result_lyd-web-lpm-test.json &&  bzip2 -c /tmp/scan/result_lyd-web-lpm-test.json | base64 && sleep 10

Based on that observation, one theory is that the sleep gives the full report enough time to be written out before the pod terminates. Maybe there is a buffer that needs to be flushed.
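
If the theory holds, the reader sometimes sees only a prefix of the report. A reader-side alternative to the sleep would be to re-read the output until it parses. The sketch below is only an illustration of that idea and is not how trivy-operator handles logs: `read_complete_json` and its `fetch` callback are hypothetical names, and `fetch` stands in for whatever returns the job's uncompressed JSON output.

```python
import json
import time
from typing import Callable


def read_complete_json(
    fetch: Callable[[], str], attempts: int = 5, delay: float = 0.0
) -> dict:
    """Call fetch() until its output parses as JSON, or give up.

    Assumption: fetch() returns everything written so far, so a later call
    can return a longer (eventually complete) document.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return json.loads(fetch())
        except json.JSONDecodeError as err:
            # Output was cut off or not yet fully written; wait and retry.
            last_error = err
            time.sleep(delay)
    raise RuntimeError(
        f"output still incomplete after {attempts} attempts"
    ) from last_error


if __name__ == "__main__":
    full = json.dumps({"Results": [1, 2, 3]})
    # Simulate a log that grows between reads.
    reads = iter([full[:5], full[:10], full])
    print(read_complete_json(lambda: next(reads)))
```

A retry like this only helps if the missing data eventually becomes readable. If the tail of the output is lost for good when the pod terminates, only a writer-side change such as the sleep shown above would make a difference.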

@chen-keinan
Contributor

@krep-dr thanks for checking it out.
Note: trivy-operator initiates the scan-job deletion after the logs are parsed.

@chen-keinan
Contributor

chen-keinan commented Oct 28, 2022

@krep-dr I would suggest trying it again with trivy-operator v0.5.0, using this Helm parameter:
--set="trivyOperator.scanJobCompressLogs=false"

@chen-keinan
Contributor

@krep-dr please let me know if the issue persists after v0.5.0 ☝️

@krep-dr
Author

krep-dr commented Nov 4, 2022

@chen-keinan thanks. I haven't gotten around to testing it yet. Hopefully I can take a look at it later today or next week.

@chen-keinan
Contributor

@krep-dr can this issue be closed?

@krep-dr
Author

krep-dr commented Nov 25, 2022

@chen-keinan apologies for my late response. It seems to be running more stably now with compression disabled, so let's close it for now 🙂


2 participants