Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metric: add database and table name to handle_query_duration #38261

Closed
wants to merge 2,893 commits into from
Closed

metric: add database and table name to handle_query_duration #38261

wants to merge 2,893 commits into from

Conversation

noucas
Copy link

@noucas noucas commented Sep 29, 2022

What problem does this PR solve?

Issue Number: close #37892

Problem Summary:

What is changed and how it works?

Add additional information about database name and table to the current metric tidb_server_handle_query_duration_seconds

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Sep 29, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • dveeden

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@sre-bot
Copy link
Contributor

sre-bot commented Sep 29, 2022

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 29, 2022
@ti-chi-bot
Copy link
Member

Welcome @noucas!

It looks like this is your first PR to pingcap/tidb 🎉.

I'm the bot to help you request reviewers, add labels and more, See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to pingcap/tidb. 😃

@noucas noucas changed the title Add database name and table name to handle_query_duration metric metric: ddd database name and table name to handle_query_duration Sep 29, 2022
@noucas noucas changed the title metric: ddd database name and table name to handle_query_duration metric: add database name and table name to handle_query_duration Sep 29, 2022
@noucas noucas changed the title metric: add database name and table name to handle_query_duration metric: add database and table name to handle_query_duration Sep 29, 2022
@dveeden
Copy link
Contributor

dveeden commented Sep 30, 2022

/cc @Defined2014 @dveeden

@dveeden
Copy link
Contributor

dveeden commented Sep 30, 2022

/check-issue-triage-complete

@dveeden
Copy link
Contributor

dveeden commented Sep 30, 2022

@noucas Could you sign the CLA? Could you also give some details (output, screenshots, etc.) about the testing that you did?

@dveeden
Copy link
Contributor

dveeden commented Sep 30, 2022

image

@dveeden
Copy link
Contributor

dveeden commented Sep 30, 2022

Would this impact any of the grafana dashboards? Would it make sense to update the grafana dashboards after this has been merged?

@dveeden
Copy link
Contributor

dveeden commented Sep 30, 2022

Would the increase in load/data be an issue for prometheus?

server/conn.go Outdated
@@ -133,20 +133,6 @@ var (
mysql.ComSetOption: metrics.QueryTotalCounter.WithLabelValues("SetOption", "Error"),
}

queryDurationHistogramUse = metrics.QueryDurationHistogram.WithLabelValues("Use")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the removal of these lines intentional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm not active in this repo for a while, so I should miss some newer info for TiDB, but I can give some old info and background about this...

Prometheus's WithLabelValues is a slow operation, we can observe some cost in go-pprof when running the lightweight operation benchmark like "sysbench oltp read only", so it's the reason why it is pre-defined some const label value as a global variable here.

But in this PR, it wants to let the table name be a label, so it's impossible to pre-define as previously, letting the table name be a label is useful for maintaining, but it should observe some performance degradation in the benchmark like sysbench.

At last, it should take care of the cluster with huge numbers of tables, it's not a good practice to store the huge number of label values in Prometheus. (it can be ignored if TiDB has to be changed to some other better ts storage..)

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinalities (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

from https://prometheus.io/docs/practices/naming/#labels

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dveeden Yes, it is intentional. The reason to remove them is that they are fixed code with only 1 label of sqlType by default and it's okay because we only have a limited set of values for sqlType so hard code will work. But when we increase the labels with db and table, we have no way to hard coded these labels, it should be specified dynamically.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lysu Thanks for your comment so I understand why there so much hard code like this.

  1. Is it a good idea that we define a config flag to enable/disable the metric? So user will have the right to choose between them like what we can do with record-db-qps config?
  2. In case of we can not add the label table due to the high cardinalities problem, Is it okay that we still add label db?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having many tables is something that may happen with something like "<application...>" tables where there is one database, but each customer has its own set of tables with a certain prefix. This can be a wordpress hosting solution or something similar. The same can happen if instead of a table prefix a database is created for each customer. However with say 5 tables per customer and 2000 customers you may end up with 5×2000=10000 tables or 2000 databases. So doing this per database would improve the situation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lysu Thanks for your comment so I understand why there so much hard code like this.

  1. Is it a good idea that we define a config flag to enable/disable the metric? So user will have the right to choose between them like what we can do with record-db-qps config?
  2. In case of we can not add the label table due to the high cardinalities problem, Is it okay that we still add label db?

Adding a configuration for it like record-db-qps looks good to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lysu Is it okay that we define a new config flag and if it is turn off, the metrics of handle_query_duration_seconds does not appear, and when it is turn on the metrics register with 3 labels {sqlType, dbName, tableName} ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to keep the original metrics as they used to be. There're many scripts of Grafana depending on it. It is better to add new metrics for the new feature. And the new config flag control the new metrics.

@dveeden
Copy link
Contributor

dveeden commented Sep 30, 2022

cc @lysu

@lysu lysu requested a review from bb7133 September 30, 2022 13:16
@noucas
Copy link
Author

noucas commented Oct 4, 2022

@bb7133 PTAL

@hawkingrei
Copy link
Member

/run-mysql-test

@hawkingrei
Copy link
Member

/run-build

@hawkingrei
Copy link
Member

@dveeden It is all green.

@bb7133
Copy link
Member

bb7133 commented Nov 1, 2022

PTAL @jackysp

@noucas noucas requested a review from a team as a code owner December 12, 2022 10:07
@ti-chi-bot ti-chi-bot removed the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 12, 2022
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. component/dumpling This is related to Dumpling of TiDB. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 11, 2023
@ti-chi-bot ti-chi-bot bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. approved and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 11, 2023
Copy link

ti-chi-bot bot commented Nov 11, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dveeden

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the approved label Nov 11, 2023
@dveeden
Copy link
Contributor

dveeden commented Nov 12, 2023

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Nov 12, 2023
@dveeden
Copy link
Contributor

dveeden commented Nov 12, 2023

@noucas Looks like you tried to update/rebase this PR, but it seems like that didn't go as planned?

@dveeden
Copy link
Contributor

dveeden commented Nov 12, 2023

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 12, 2023
@noucas
Copy link
Author

noucas commented Nov 14, 2023

There is a merged PR that already added a new label to the metric tidb_server_handle_query_duration_seconds, we can close this PR.
@dveeden Thanks for your kind support to me from my reported issue to this PR. I am happy that my work can finally contribute a little to TiDB community and will keep doing in the future.

@dveeden dveeden closed this Nov 15, 2023
@noucas noucas deleted the add-database-and-table-name-handle_query_duration_seconds-metric branch November 16, 2023 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/dumpling This is related to Dumpling of TiDB. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. status/LGT1 Indicates that a PR has LGTM 1.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding schema level information to tidb_server_handle_query_duration_seconds_sum