Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incomplete verifier log in 1.16 #1554

Open
weli-l opened this issue Aug 30, 2024 · 7 comments
Open

Incomplete verifier log in 1.16 #1554

weli-l opened this issue Aug 30, 2024 · 7 comments
Labels
bug Something isn't working release-blocker Needs resolution before the next release

Comments

@weli-l
Copy link

weli-l commented Aug 30, 2024

Describe the bug

When I called collection.LoadAndAssign, I found that the error log of ebpf program loading failure was intercepted, resulting in very few logs available to me. Is there any way to restore all these logs?

time="2024-08-27T02:38:37Z" level=error msg="loadKmeshSockConnObjects failed: program cluster_manager: load program: bad address: regs=10 stack=0 before 994: (67) r4 <<= 32: regs=10 stack=0 (truncated, 960 line(s) omitted)" subsys=pkg/bpf
time="2024-08-27T02:38:37Z" level=info msg="Clean kmesh_version map and bpf prog" subsys=pkg/bpf

How to reproduce

code a wrong bpf prog

Version information

v0.16.0

@weli-l weli-l added the bug Something isn't working label Aug 30, 2024
@florianl
Copy link
Contributor

Hi @weli-l

Which values did you use for LogLevel and LogSize in ProgramOptions? And can you provide a minimal reproducer?

@weli-l
Copy link
Author

weli-l commented Aug 31, 2024

Hi @weli-l

Which values did you use for LogLevel and LogSize in ProgramOptions? And can you provide a minimal reproducer?

now I use v0.16.0 , perhaps there is no need to pass these two parameters,When I use v0.5.0, I set LogLevel to 1 and LogSize to 1280000, and I test it in a program, there is no minimal reproducer

@mozillazg
Copy link

@weli-l You can use this way to get the whole verifier error:

        var ve *ebpf.VerifierError
	if errors.As(err, &ve) {
		// Using %+v will print the whole verifier error, not just the last
		// few lines.
		log.Fatalf("Verifier error: %+v", ve)
	}

@weli-l
Copy link
Author

weli-l commented Aug 31, 2024

@weli-l You can use this way to get the whole verifier error:

        var ve *ebpf.VerifierError
	if errors.As(err, &ve) {
		// Using %+v will print the whole verifier error, not just the last
		// few lines.
		log.Fatalf("Verifier error: %+v", ve)
	}

Thanks! It works. There are more logs printed, but many logs are still omitted, making it impossible to support the location problem. In addition, I use this method

return fmt.Errorf("bpf Load failed: %-100v", ve)

It also cannot print the last 100 lines, and it still prints from the beginning of the log

@ti-mo
Copy link
Collaborator

ti-mo commented Sep 5, 2024

We shipped some changes to verifier log handling in 1.16, but there was no intended breakage. I didn't vet the changes thoroughly, and this is a very thorny piece of code by nature, so it's possible for there to be a regression.

In 1.16, LogSize is a no-op, the library now tries to probe the required buffer for the log. The LogSize field was removed on the dev branch right after the 1.16 release to avoid breaking API.

@weli-l On 1.16, please try to revert the following commits in the given order:

218b9f9
f2b2f6d

Printing with %+v should always print the full log if VerifierError.Truncated is false (this means we got the full log). Note that with these commits removed, you'll have to set LogSize to an appropriately-large value to fit your log, otherwise the end will be missing. Thanks!

@ti-mo ti-mo changed the title log has been omitted Incomplete verifier log in 1.16 Sep 5, 2024
@mtardy
Copy link
Member

mtardy commented Sep 12, 2024

Supporting this issue: using 1.16, I had a similar situation where I couldn't see the verifier output for the subprogram and couldn't link the func#number to its name. And it would make sense since the subprogram verification should have happened early and be at the beginning of the logs.

@ti-mo
Copy link
Collaborator

ti-mo commented Oct 10, 2024

I've taken a look at the new verifier log handling and nothing sticks out as a potential cause for this. The info provided here is incredibly limited, so I don't have much to go on. I've played around with the verifier log tests on 5.10, 5.4 and 4.19 and things behave as expected.

However, it looks like the initial post has bad address: in the message, which corresponds to EFAULT, which is sprinkled generously around linux/kernel/bpf/verifier.c and looks like it's exclusively used for signaling verifier bugs. I suspect you've (both) hit a spot in the verifier that makes it return EFAULT instead of ENOSPC, without checking the size of the log buffer.

Without a reproducer, this is going to be impossible for me to troubleshoot, so please try to get me an ELF that triggers the bug. This may need to be fixed upstream, but making this waterproof in the future may be non-trivial.

I'll consider reinstating the ProgramOptions.LogSize field and treat it as the starting size of the buffer, so at least when this happens again in the future, the caller has a way of manually sizing the buffer to obtain the full log so we can confirm it's a verifier bug.

@ti-mo ti-mo added the release-blocker Needs resolution before the next release label Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working release-blocker Needs resolution before the next release
Projects
None yet
Development

No branches or pull requests

5 participants