-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
metric: add database and table name to handle_query_duration #38261
metric: add database and table name to handle_query_duration #38261
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
Welcome @noucas! |
/cc @Defined2014 @dveeden |
/check-issue-triage-complete |
@noucas Could you sign the CLA? Could you also give some details (output, screenshots, etc.) about the testing that you did? |
Would this impact any of the grafana dashboards? Would it make sense to update the grafana dashboards after this has been merged? |
Would the increase in load/data be an issue for prometheus? |
server/conn.go
Outdated
@@ -133,20 +133,6 @@ var ( | |||
mysql.ComSetOption: metrics.QueryTotalCounter.WithLabelValues("SetOption", "Error"), | |||
} | |||
|
|||
queryDurationHistogramUse = metrics.QueryDurationHistogram.WithLabelValues("Use") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the removal of these lines intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I'm not active in this repo for a while, so I should miss some newer info for TiDB, but I can give some old info and background about this...
Prometheus's WithLabelValues is a slow operation, we can observe some cost in go-pprof when running the lightweight operation benchmark like "sysbench oltp read only", so it's the reason why it is pre-defined some const label value as a global variable here.
But in this PR, it wants to let the table name be a label, so it's impossible to pre-define as previously, letting the table name be a label is useful for maintaining, but it should observe some performance degradation in the benchmark like sysbench.
At last, it should take care of the cluster with huge numbers of tables, it's not a good practice to store the huge number of label values in Prometheus. (it can be ignored if TiDB has to be changed to some other better ts storage..)
CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinalities (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dveeden Yes, it is intentional. The reason to remove them is that they are fixed code with only 1 label of sqlType
by default and it's okay because we only have a limited set of values for sqlType
so hard code will work. But when we increase the labels with db
and table
, we have no way to hard coded these labels, it should be specified dynamically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lysu Thanks for your comment so I understand why there so much hard code like this.
- Is it a good idea that we define a config flag to enable/disable the metric? So user will have the right to choose between them like what we can do with
record-db-qps
config? - In case of we can not add the label
table
due to the high cardinalities problem, Is it okay that we still add labeldb
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having many tables is something that may happen with something like "<application...>" tables where there is one database, but each customer has its own set of tables with a certain prefix. This can be a wordpress hosting solution or something similar. The same can happen if instead of a table prefix a database is created for each customer. However with say 5 tables per customer and 2000 customers you may end up with 5×2000=10000 tables or 2000 databases. So doing this per database would improve the situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lysu Thanks for your comment so I understand why there so much hard code like this.
- Is it a good idea that we define a config flag to enable/disable the metric? So user will have the right to choose between them like what we can do with
record-db-qps
config?- In case of we can not add the label
table
due to the high cardinalities problem, Is it okay that we still add labeldb
?
Adding a configuration for it like record-db-qps
looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lysu Is it okay that we define a new config flag and if it is turn off, the metrics of handle_query_duration_seconds
does not appear, and when it is turn on the metrics register with 3 labels {sqlType, dbName, tableName} ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to keep the original metrics as they used to be. There're many scripts of Grafana depending on it. It is better to add new metrics for the new feature. And the new config flag control the new metrics.
cc @lysu |
@bb7133 PTAL |
/run-mysql-test |
/run-build |
@dveeden It is all green. |
PTAL @jackysp |
…uration_seconds-metric
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: dveeden The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/ok-to-test |
@noucas Looks like you tried to update/rebase this PR, but it seems like that didn't go as planned? |
/hold |
There is a merged PR that already added a new label to the metric |
What problem does this PR solve?
Issue Number: close #37892
Problem Summary:
What is changed and how it works?
Add additional information about database name and table to the current metric tidb_server_handle_query_duration_seconds
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.