Enable the profiling, goroutines, metrics endpoints and signal handler as soon as possible #59087

Open
YangKeao opened this issue Jan 21, 2025 · 3 comments · May be fixed by #59122
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@YangKeao
Member

YangKeao commented Jan 21, 2025

Enhancement

At present, the observability infrastructure is not started during the TiDB boot process, which makes it hard to investigate any issue that occurs before the server starts. I think the following are especially important and helpful for debugging:

  1. /debug/pprof, especially /debug/pprof/profile and /debug/pprof/goroutine. If TiDB is blocked in the boot stage, getting the backtrace of all goroutines helps us understand where it is blocked.
  2. /metrics. Without it, Prometheus cannot scrape the metrics of the booting TiDB server.
  3. SIGUSR1 signal handler. After the server has started, TiDB handles the SIGUSR1 signal by printing the backtrace of all goroutines. Without it, every other signal that prints the backtrace, such as SIGQUIT, also kills the process.

Current behavior

All of them are started only after the storage, dom, server, etc. have been created.

storage, dom := createStoreDDLOwnerMgrAndDomain(keyspaceName)
repository.SetupRepository(dom)
svr := createServer(storage, dom)

exited := make(chan struct{})
signal.SetupSignalHandler(func() {
	svr.Close()
	resourcemanager.InstanceResourceManager.Stop()
	cleanup(svr, storage, dom)
	cpuprofile.StopCPUProfiler()
	executor.Stop()
	close(exited)
})
topsql.SetupTopSQL(svr)
terror.MustNil(svr.Run(dom))

The signal handler is set up in signal.SetupSignalHandler. The status server is set up in svr.Run(dom).

Change

To address this, I propose the following changes:

  1. Create a temporary status server that serves /metrics and /debug/pprof before svr.Run(dom), and stop it before the fully functional status server is created.
  2. Set up the SIGUSR1 signal handler earlier.
@D3Hunter
Contributor

Change 1 already covers the functionality of change 2.

@YangKeao
Member Author

Change 1 already covers the functionality of change 2.

Yes. I treat it as "better than nothing" 🤦 .

@tiancaiamao
Contributor

If the HTTP /debug endpoint is available, SIGUSR1 is not a must, because we can curl http://xxx:10080/debug/pprof/goroutine
