No data and generic rules not created #1217

Open
idrikay opened this issue Jul 17, 2024 · 7 comments

@idrikay
idrikay commented Jul 17, 2024

I have tried deploying Pyrra with both the manifests and the Helm chart. Both fail to create generic rules, and I also get no data in the requests or errors graphs.

  • I see the service monitor in Prometheus
  • Using the Helm chart, I added the genericRules key to the chart values, without success. No metrics with the pyrra prefix are created:
genericRules:
  enabled: true
  • I sometimes see the requests graph for a couple of minutes, but the data disappears on refresh.
  • Here is the SLO definition:
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: coredns-response-errors
  namespace: monitoring
spec:
  description: ""
  indicator:
    ratio:
      errors:
        metric: coredns_dns_responses_total{job="coredns",rcode="SERVFAIL"}
      total:
        metric: coredns_dns_responses_total{job="coredns"}
  target: "99.99"
  window: 2w
@vidomas
vidomas commented Oct 28, 2024

I solved the same issue by adding the "release: prometheus-community" label to my ServiceLevelObjective.

@sebastiangaiser
Contributor

I'm having a similar problem: in some clusters, metrics are created for each SLO resource, and in others they are not. For me this is independent of the label. I'm also using genericRules.enabled: true from the Helm chart.

@vidomas
vidomas commented Oct 29, 2024

In my case, Prometheus is provisioned by the Prometheus Operator (kube-prometheus-stack Helm chart), and the Prometheus CRD has a label-based rule selector:

spec:
  ruleSelector:
    matchLabels:
      release: prometheus-community
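A Kubernetes matchLabels selector only matches objects that carry every listed key/value pair, which is why a PrometheusRule carrying only the labels from the SLO in the original report (prometheus: k8s and role: alert-rules) would not be picked up by this ruleSelector. The minimal Go sketch below shows that check with a hypothetical matchesLabels helper rather than the real Kubernetes client libraries, assuming (as the fix reported above implies) that Pyrra copies the SLO's labels onto the PrometheusRule it generates.

```go
package main

import "fmt"

// matchesLabels reports whether every key/value pair in matchLabels is
// present on the object's labels, mirroring how a matchLabels selector
// behaves. Extra labels on the object are ignored.
func matchesLabels(matchLabels, objectLabels map[string]string) bool {
	for k, v := range matchLabels {
		if got, ok := objectLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	ruleSelector := map[string]string{"release": "prometheus-community"}

	// Labels from the ServiceLevelObjective in the original report.
	before := map[string]string{"prometheus": "k8s", "role": "alert-rules"}
	// The same labels with the release label added, as in the reported fix.
	after := map[string]string{"prometheus": "k8s", "role": "alert-rules", "release": "prometheus-community"}

	fmt.Println("before:", matchesLabels(ruleSelector, before))
	fmt.Println("after:", matchesLabels(ruleSelector, after))
}
```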

@sebastiangaiser
Contributor

@vidomas, can you check which metrics are produced by Pyrra?
Using matchLabels makes total sense for your kube-prometheus-stack deployment, so that the generated PrometheusRules get picked up. But the original issue is that no metrics are produced for a given ServiceLevelObjective, or for all of them.

@sebastiangaiser
Contributor

I found my error when looking through the code:

if len(o.Indicator.BoolGauge.Grouping) > 0 {

Combining bool_gauge with grouping is not supported.

@mtthwcmpbll
I'm curious whether the original author ever sorted out their issue. I'm seeing the same thing while testing Pyrra, deploying the example SLO from the Pyrra repo:

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: pyrra-connect-errors
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: '99'
  window: 2w
  description: Pyrra serves API requests with connect-go either via gRPC or HTTP.
  indicator:
    ratio:
      errors:
        metric: connect_server_requests_total{job="pyrra",code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss"}
      total:
        metric: connect_server_requests_total{job="pyrra"}

I've removed the grouping, since the docs on generic rules say grouping isn't supported, and I've verified that other metrics from the pyrra-kubernetes pod, such as controller_runtime_reconcile_time_seconds_bucket, are scraped and available. When I hit the pod's /metrics endpoint manually, I don't see any metrics that start with pyrra_.

@mtthwcmpbll
Actually, I think I found the issue on my end. I had assumed that, because the generic metrics have generic names like pyrra_availability, they would be published on one of the pods' /metrics endpoints, and I was surprised to see no changes there. After making sure that my SLO had no grouping and comparing the generated PrometheusRules with and without the --generic-rules flag, I can see that the generic rules are included in the PrometheusRule. The pyrra_ series are therefore produced when those rules are evaluated, not exposed by Pyrra itself.

I also tracked down errors in our Mimir cluster: some limits, including max_label_names_per_series and ruler_max_rule_groups_per_tenant, were too low and had to be raised so that Mimir would stop rejecting metrics from Pyrra.
