MGMT-19840: Gather operational metrics from installercache #299

Open
wants to merge 2 commits into
base: MGMT-14453

paul-maidment
Owner

The intent of this PR is to track the following statistics, implemented as counters that are incremented from the applicable parts of the solution.

counterDescriptionInstallerCachePrunedHardlink           "Counts the number of times the installercache pruned a hardlink for being too old"
counterDescriptionInstallerCacheGetReleaseOK             "Counts the number of times that a release was fetched successfully"
counterDescriptionInstallerCacheGetReleaseTimeout        "Counts the number of times that a release timed out or had the context cancelled"
counterDescriptionInstallerCacheGetReleaseError          "Counts the number of times that a release fetch resulted in error"
counterDescriptionInstallerCacheReleaseCached            "Counts the number of times that a release was found in the cache"
counterDescriptionInstallerCacheReleaseExtracted         "Counts the number of times that a release was extracted"
counterDescriptionInstallerCacheTryEviction              "Counts the number of times that the eviction function was called"
counterDescriptionInstallerCacheReleaseEvicted           "Counts the number of times that a release was evicted"

Combined with the event-based metrics gathered in openshift#7156, these counters should provide enough information to track the behaviour of the cache.
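
As a rough illustration of how the three `GetRelease` outcome counters could be told apart, here is a minimal sketch using only the Go standard library. The `CacheCounters` type, the `RecordGetRelease` method and the `demoCounts` helper are hypothetical names invented for this example; the PR's actual metrics plumbing is not shown on this page, so treat this as an assumption about the shape of the logic rather than the real implementation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
)

// CacheCounters is a hypothetical in-memory stand-in for the metrics sink.
// The field names mirror three of the counter descriptions in this PR.
type CacheCounters struct {
	GetReleaseOK      atomic.Int64
	GetReleaseTimeout atomic.Int64
	GetReleaseError   atomic.Int64
}

// RecordGetRelease increments exactly one counter per fetch attempt.
// A deadline or cancellation (even when wrapped) counts as a timeout;
// any other non-nil error counts as an error.
func (c *CacheCounters) RecordGetRelease(err error) {
	switch {
	case err == nil:
		c.GetReleaseOK.Add(1)
	case errors.Is(err, context.DeadlineExceeded), errors.Is(err, context.Canceled):
		c.GetReleaseTimeout.Add(1)
	default:
		c.GetReleaseError.Add(1)
	}
}

// demoCounts records four sample outcomes and returns the resulting totals.
func demoCounts() (ok, timeout, failed int64) {
	var c CacheCounters
	c.RecordGetRelease(nil)
	c.RecordGetRelease(context.Canceled)
	c.RecordGetRelease(fmt.Errorf("fetching release: %w", context.DeadlineExceeded))
	c.RecordGetRelease(errors.New("no space left on device"))
	return c.GetReleaseOK.Load(), c.GetReleaseTimeout.Load(), c.GetReleaseError.Load()
}

func main() {
	ok, timeout, failed := demoCounts()
	fmt.Println(ok, timeout, failed) // prints "1 2 1"
}
```

Using `errors.Is` rather than a direct comparison means a cancellation wrapped by an intermediate layer still lands in the timeout counter instead of the generic error counter.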

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@paul-maidment paul-maidment force-pushed the add-metrics-to-installercache branch from 678fab0 to 2e4c181 Compare January 30, 2025 21:39
@paul-maidment paul-maidment force-pushed the MGMT-14453 branch 28 times, most recently from a8ff664 to e95cd1d Compare February 5, 2025 21:11
@paul-maidment paul-maidment force-pushed the MGMT-14453 branch 2 times, most recently from 9e67c72 to 2fc7ba4 Compare February 5, 2025 23:12
…in the installer cache (openshift#7205)

This PR resolves multiple bugs in the installer cache. Given the poor condition of the current cache, it makes sense to fix them in a single PR.

* https://issues.redhat.com/browse/MGMT-14452
Installer cache removes in-used cached image when out of space
* https://issues.redhat.com/browse/MGMT-14453
INSTALLER_CACHE_CAPACITY small value cause to assisted-service crash
* https://issues.redhat.com/browse/MGMT-14457
Installer cache - fails to install when running parallel with same version
* Additionally, the cache did not respect its limits; this is also addressed here.

Fixes:

I have implemented fixes for each of the following issues.

* The mutex was ineffective because it was not instantiated correctly, leading to [MGMT-14452](https://issues.redhat.com//browse/MGMT-14452), [MGMT-14453](https://issues.redhat.com//browse/MGMT-14453).
* Naming convention for hardlinks changed to be UUID based to resolve [MGMT-14457](https://issues.redhat.com//browse/MGMT-14457).
* Any time we either extract or use a release, the modified time must be updated, not only for cached releases. This was causing premature pruning of hardlinks.
* LRU cache order updated to be based on microseconds instead of seconds.
* Eviction checks updated to consider max release size and also cache threshold.
* We now check there is enough space before writing.
* During eviction - releases without hard links will be evicted before releases with hard links.
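
The eviction ordering described in the list above (releases without hard links first, then least recently used at microsecond resolution) can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: the `release` struct, its `HardLinks` field, and the `evictionOrder` and `demoOrder` functions are hypothetical names, and the hard link count is assumed to be known already.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// release is a hypothetical view of one cached release file.
type release struct {
	Path      string
	ModTime   time.Time // updated whenever the release is extracted or used
	HardLinks int       // links currently pointing at the file (assumed known)
}

// evictionOrder returns a sorted copy of releases: entries with no hard links
// come first, and within each group the oldest modification time comes first,
// compared in microseconds so that same-second entries are still ordered.
func evictionOrder(releases []release) []release {
	ordered := append([]release(nil), releases...)
	sort.SliceStable(ordered, func(i, j int) bool {
		a, b := ordered[i], ordered[j]
		if (a.HardLinks == 0) != (b.HardLinks == 0) {
			return a.HardLinks == 0
		}
		return a.ModTime.UnixMicro() < b.ModTime.UnixMicro()
	})
	return ordered
}

// demoOrder builds three releases touched within the same second and
// returns their paths in eviction order.
func demoOrder() []string {
	base := time.Date(2025, 2, 5, 12, 0, 0, 0, time.UTC)
	releases := []release{
		{Path: "in-use-old", ModTime: base, HardLinks: 1},
		{Path: "idle-newer", ModTime: base.Add(1500 * time.Microsecond)},
		{Path: "idle-older", ModTime: base.Add(500 * time.Microsecond)},
	}
	var paths []string
	for _, r := range evictionOrder(releases) {
		paths = append(paths, r.Path)
	}
	return paths
}

func main() {
	fmt.Println(demoOrder()) // prints "[idle-older idle-newer in-use-old]"
}
```

With second-level timestamps the two idle releases in this example would compare as equal, which is the kind of ambiguity the move to microseconds avoids.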
@paul-maidment paul-maidment force-pushed the add-metrics-to-installercache branch from 2e4c181 to f7b0e81 Compare February 6, 2025 12:47