feat: push prometheus metrics to pushgateway on-demand #1404

Ekleog · 2025-01-20T01:12:27Z

aawsome

Sorry for the late review.. From my side this is almost ready for merging, I just left a view comments.

General question: What do you think about which env variables are used? The "standard" is to use RUSTIC_XXX env variable for a xxx option. We have the pushgateway_url option and we have (only) env options for PUSHGATEWAY_USER and PUSHGATEWAY_PASS - I personally would have the later 2 also in the config and then we should decide about the env names. Or use RUSTIC_PUSHGATEWAY_* (by clap) if present and the PUSHGATEWAY_* as alternatives if they are commonly used variables for the pushgateway?

src/commands/backup.rs

src/prometheus.rs

Ekleog · 2025-01-27T21:34:03Z

Thank you for your review!

About the authentication environment variables, I actually thought about it. My argument against having CLI flags with user/pass is, it's essentially insecure to use this way, because any process on your machine can then read the CLI flags, and thus bypass auth.

Especially when Pushgateway is likely often deployed on the same host it's keeping metrics for (and firewalled out of the internet), this seems to me like having its credentials in the CLI would basically completely kill any security the credentials would bring.

I just renamed them RUSTIC_* for consistency with clap.

Also, I would not at all be opposed to having these as a parameter in the config file, but TBH I have absolutely no idea how to actually do it without also exposing a clap argument — I haven't dived deep enough in rustic's config handling yet.

WDYT?

aawsome · 2025-02-06T22:30:27Z

Hi @Ekleog - sorry for the late reply...

I overworked this PR a bit:

now uses the crate prometheus_push which allows to fine-tune the used reqwest. Especially it allows to build without openssl which may be a blocker for building on some targets.
I moved most of the options to GlobalOptions as I think prometheus metrics may be also interesting for other commands in future (like forget, prune, maybe check). This also includes moving the push code as a method of this struct.
Regarding the user/password discussion: We already have the --password option which is also not advisable to use. I think we should be consistent here to also have the pushgateway user/pass as CLI arguments.

Can you please test if this still works with my changes?

If you are fine with what I was doing, I think the only open points are to adapt unit tests and docu / config examples accordingly.

Ekleog · 2025-02-07T00:08:53Z

Hmm… for prometheus-push, are you really sure you need it for one of your supported targets? It seems to be very young, especially compared to the main prometheus crate; and a target with zero native-tls-supported target sounds specific enough to likely not be supported by rustic anyway… I think?

As for your other changes, they definitely make sense to me! I'll review and re-test as soon as you confirm where you want to go wrt. prometheus-push. I personally would not be willing to accept, in a project I maintain, a dependency with a first release 12 months ago, a last release 8 months ago, and that started for some reason at version 0.2.1, all that with less than 10k downloads. But that's your choice, so just let me know your decision and I'll follow! :)

Ekleog · 2025-02-07T00:09:36Z

Also, no worries at all about the reply delay, and thank you for your very involved answer!

aawsome · 2025-02-07T13:52:45Z

Thanks for the discussion @Ekleog

I changed it now by copy&pasting few lines from the prometheus codebase which allows to use a self-defined reqwest without any additional dependency.

Hmm… for prometheus-push, are you really sure you need it for one of your supported targets? It seems to be very young, especially compared to the main prometheus crate; and a target with zero native-tls-supported target sounds specific enough to likely not be supported by rustic anyway… I think?

Ok, in fact it is more about the need to have something like libssl-dev on the system compiling rustic - and the runtime library then needed to run it. This is all source of potential trouble. And when it come to cross-compiling, things may even go wild..

aawsome · 2025-02-07T14:00:25Z

@Ekleog

One thing regarding the job name: In case we want to monitor, lets say, backup and prune both with their own metric, would a use want to have identical job names here (like rustic-backup/rustic-prune) or would a common name be used (rustic)?
Currently the job name is a global option, if it may differ, it should be rather moved to the command...

And about labels: This is currently done such that users can give labels on the global level and addtional labels on the backup (and even specific to snapshots defined in the config profile) level. I hope this is general enough to suit everybody's need...

Ekleog · 2025-02-07T14:12:23Z

The changes make sense to me!

For the job name, I think (though I haven't checked) that the push gateway will delete old metrics upon getting new metrics with the same job, even if the old metrics was not the same.

So if people want to run backup then prune with no interval, but keep seeing the metrics until the next run of each, we'll need to have a different job name per command.

This being said it could still make sense to have it be in globaloptions. Thinking more, I'm actually thinking that we should not have it have any default value: forcing the user to configure it would solve all the questions, and it'd avoid users with multiple backup runs erasing their own metrics by default.

Does that make sense?

Ekleog · 2025-02-07T14:15:11Z

(as for the global/extra labels, TBH I'm thinking labels should be defined as CLI arguments usually because people may have multiple backup/forget jobs that'd want different labels, so I'd say I'm opinionless on this one)

aawsome · 2025-02-07T22:21:44Z

@Ekleog about the CLI-options:
Of course everything can be given as CLI options, however the question is about how to enable it in the config profile such that users can only run rustic backup or rustic prune with everything already defined. So, I moved the job name to the backup options and did some finalizing including the unit tests and docu, see the full.toml or the Readme for an example how a config profile may look like.

I just encountered one (hopefully last) thing: You set the label path to the path, but in labels no / is allowed, so this can produce errors in most cases. Any idea how we should handle/escape the /?

Also, can you please test if everything is working as expected? I didn't setup a prometheus or pushgateway, so I can't really test it.. ;-)

Ekleog · 2025-02-08T00:17:18Z

I'm not sure I get what you mean by "no / allowed"? Prometheus label values can be any Unicode character

I'll test it out hopefully by the end of weekend, I have a lot of things to do right now unfortunately! I'll make sure to report here as soon as I have tested it 😄

Ekleog · 2025-02-08T00:19:31Z

Also, thank you very much for all your involvement in this PR! Sorry I've been underwater to answer you over the last few days, I'll do my best to give you a proper test asap ❤️

aawsome · 2025-02-10T07:42:10Z

@Ekleog I changed the labels to use base64 encoding as described in https://github.com/prometheus/pushgateway/blob/master/README.md.

One thing: There are quite some default labels set like paths, uid, gid - and those can't be unset by users. I would actually prefer using a minimal set of labels - especially as users can always define their own labels. Maybe even use no label at all or restrict them to the ones which can be used for grouping: hostname, label (the rustic label) and paths. WDYT?

Also would be nice to have a tests against a running prometheus pushclient before we merge this.

Ekleog · 2025-02-10T15:01:03Z

That's a great idea, to avoid issues related to eg. quotes in paths or other labels' values!

As for limiting the default labels, I can confirm that hostname+label+paths would be enough for my personal use case! I actually only need an id label, that I would define myself on the backup, so 🤷

Just for the test against a running prometheus pushclient, do you mean a cargo unit test (which seems hard considering it'd require a running pushclient), or a CI job that'd run rustic once and confirm there are expected metrics?

aawsome · 2025-02-10T21:57:06Z

As for limiting the default labels, I can confirm that hostname+label+paths would be enough for my personal use case! I actually only need an id label, that I would define myself on the backup, so 🤷

I restricted the default labels to hostname+label+paths+tags.

Just for the test against a running prometheus pushclient, do you mean a cargo unit test (which seems hard considering it'd require a running pushclient), or a CI job that'd run rustic once and confirm there are expected metrics?

I added a unit test. I think the only thing missing is just one manual test to verify that the format is really what pushclient can work with...

aawsome · 2025-02-14T19:00:05Z

I installed a pushgateway and tested myself - needed a few corrections and added debug and a better testcase.

@Ekleog Only one thing I stumbled over: We are using Gauge for all values. I don't have experience with prometheus, but it seems there are other metric types. I wonder if we really want to use Gauge for all values or if we might differentiate. E.g. the dates are now saved as seconds since 1970, if I am correct - doesn't prometheus support a datetime metric?

Ekleog

Oh wow, I'm impressed by how motivated you are on this PR, thank you for all the work you put in here!

For the metrics, here are the supported Prometheus metric types. In this case, I think gauge would be the right choice for all the metrics — because it's from a batch job, there's no meaningful "reset to 0" semantics for counters :)

Also, I did try switching my production system to the current tip of this PR (a65f2e4).

Everything seems to be working fine for me! 🎉 Though I still need to adjust the exact labels now that everything's going through the pushgateway, which broke my dashboards and alerts — hopefully I'll manage to get something working again soon, but it should not be an issue considering all metrics seem to properly have entered prometheus :)

I'll be very happy to switch back to a released version of rustic as soon as it lands 😄

Ekleog · 2025-02-16T20:55:23Z

src/config.rs

+            .iter()
+            .map(|x| (x.0.as_str(), x.1.as_str()))
+            .chain(extra_labels.iter().map(|(a, b)| (a.as_str(), b.as_str())))
+            .collect();


Hmm… I think this order will lead to weird override mechanisms? Like it'd be, from least to most strongly binding:

Global labels

Hardcoded labels (paths, etc.)

Backup-specific labels

It might be worth not adding the labels here, but only taking them from the function call, thus letting the caller deal with putting them with the right priority?

Ekleog · 2025-02-16T20:55:49Z

src/commands/backup.rs

+            if let Err(err) = publish_metrics(&snap, self.prometheus_job, self.prometheus_labels) {
+                warn!("error pushing prometheus metrics: {err}");
+            }
+        }


Should there be an error if prometheus was configured but the feature was not enabled at compile-time?

Ekleog · 2025-02-16T23:26:16Z

(I added some comments in a review to not touch the code you had written, but I'd be happy to fix them myself if you confirm they sound reasonable to you, next time I find some time to work on my personal backup system! :))

Ekleog · 2025-02-20T21:56:33Z

Ok, I can confirm that a few days after, all my metrics dashboards are adjusted and look the same as before; and backups seem to be working without issues either!

Just let me know if you want me to handle the comments I posted, I don't want to disrupt your workflow by pushing to the branch at an unexpected time 😄

feat: push prometheus metrics to pushgateway on-demand

cd9085d

Ekleog mentioned this pull request Jan 20, 2025

JSON output as prometheus metric format? #1403

Open

aawsome requested changes Jan 26, 2025

View reviewed changes

src/commands/backup.rs Outdated Show resolved Hide resolved

src/prometheus.rs Outdated Show resolved Hide resolved

src/prometheus.rs Outdated Show resolved Hide resolved

chore: handle review comments

a02cf39

restructure and use prometheus_push

d83136d

refactor without http-auth-basic and prometheus-push

d0561f7

finalize

0f3d8ee

use show config snapshot with empty rustic.toml

13c2c35

use base64 encoding

298335f

adjust labels and add test

bda590c

finalize

a65f2e4

Ekleog commented Feb 16, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: push prometheus metrics to pushgateway on-demand #1404

feat: push prometheus metrics to pushgateway on-demand #1404

Ekleog commented Jan 20, 2025

aawsome left a comment

Ekleog commented Jan 27, 2025

aawsome commented Feb 6, 2025 •

edited

Loading

Ekleog commented Feb 7, 2025

Ekleog commented Feb 7, 2025

aawsome commented Feb 7, 2025

aawsome commented Feb 7, 2025

Ekleog commented Feb 7, 2025

Ekleog commented Feb 7, 2025

aawsome commented Feb 7, 2025 •

edited

Loading

Ekleog commented Feb 8, 2025

Ekleog commented Feb 8, 2025

aawsome commented Feb 10, 2025

Ekleog commented Feb 10, 2025

aawsome commented Feb 10, 2025

aawsome commented Feb 14, 2025

Ekleog left a comment

Ekleog Feb 16, 2025

Ekleog Feb 16, 2025

Ekleog commented Feb 16, 2025

Ekleog commented Feb 20, 2025

feat: push prometheus metrics to pushgateway on-demand #1404

Are you sure you want to change the base?

feat: push prometheus metrics to pushgateway on-demand #1404

Conversation

Ekleog commented Jan 20, 2025

aawsome left a comment

Choose a reason for hiding this comment

Ekleog commented Jan 27, 2025

aawsome commented Feb 6, 2025 • edited Loading

Ekleog commented Feb 7, 2025

Ekleog commented Feb 7, 2025

aawsome commented Feb 7, 2025

aawsome commented Feb 7, 2025

Ekleog commented Feb 7, 2025

Ekleog commented Feb 7, 2025

aawsome commented Feb 7, 2025 • edited Loading

Ekleog commented Feb 8, 2025

Ekleog commented Feb 8, 2025

aawsome commented Feb 10, 2025

Ekleog commented Feb 10, 2025

aawsome commented Feb 10, 2025

aawsome commented Feb 14, 2025

Ekleog left a comment

Choose a reason for hiding this comment

Ekleog Feb 16, 2025

Choose a reason for hiding this comment

Ekleog Feb 16, 2025

Choose a reason for hiding this comment

Ekleog commented Feb 16, 2025

Ekleog commented Feb 20, 2025

aawsome commented Feb 6, 2025 •

edited

Loading

aawsome commented Feb 7, 2025 •

edited

Loading