Metrics-scraper: SQLite usage is not thread safe #9452

Open
foslage opened this issue Sep 10, 2024 · 2 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.

Comments

foslage commented Sep 10, 2024

What happened?

When metrics-scraper runs UpdateDatabase() or CullDatabase() at the exact moment a client is querying metrics, the function receives a "database is locked (5)" (SQLITE_BUSY) error and aborts.

This results in UpdateDatabase() or CullDatabase() not closing the transaction correctly. Any future attempts to open a (new) transaction will result in a SQL logic error: cannot start a transaction within a transaction (1) error.

This effectively renders metrics-scraper unusable and results in kubernetes-dashboard only showing outdated metrics or not showing metrics at all.

What did you expect to happen?

Expected it to handle multiple concurrent queries correctly.

How can we reproduce it (as minimally and precisely as possible)?

  1. Build metrics-scraper from current master (a12c809) and start it on a cluster with a few pods in it (my cluster had ~5 nodes and ~700 pods):
     git clone https://github.com/kubernetes/dashboard.git kubernetes-dashboard
     cd kubernetes-dashboard/modules/metrics-scraper
     go build
     # use a metric resolution of 1 second to increase the odds of the error occurring
     rm -f /tmp/metrics.db* && ./metrics-scraper --kubeconfig /path/to/kubeconfig --metric-resolution 1s
  2. Run bombardier or a similar tool to generate some metrics query requests:
     git clone https://github.com/codesenberg/bombardier.git
     cd bombardier
     go build
     ./bombardier -c 1 -d 5s http://localhost:8000/api/v1/dashboard/namespaces/[path to some metric that actually exists]

Anything else we need to know?

A good solution would be to update database.go to close transactions correctly in case of errors.

A great solution would be to also make SQLite thread safe as described in mattn/go-sqlite3 #209. This would allow clients to query metrics while UpdateDatabase() or CullDatabase() are active.

Until this bug is fixed, a possible workaround is to use the --db-file parameter to make SQLite use a shared cache. This can be done in the helmfile's values.yaml:

metricsScraper:
  containers:
    args:
      - --db-file
      - file:/tmp/metrics.db?cache=shared

What browsers are you seeing the problem on?

Chrome, Safari, Microsoft Edge, Firefox, Others

Kubernetes Dashboard version

7.5.0

Kubernetes version

v1.30.3

Dev environment

$ go version
go version go1.22.6 linux/amd64
$ node --version
v22.4.1

foslage added the kind/bug label Sep 10, 2024

BraceCY commented Nov 26, 2024

We have hit this issue too and hope it will be fixed.
@floreks

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Feb 24, 2025