-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add flags to control telemetry behaviors #25849
Conversation
a99774b
to
3753cff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, but I'll wait for Praveen to look. I'm ok with having it disabled on features, as long as we're sure any actual public builds and Docker images has it enabled.
influxdb3/Cargo.toml
Outdated
@@ -97,3 +97,4 @@ test_helpers.workspace = true | |||
tonic.workspace = true | |||
tower.workspace = true | |||
test-log.workspace = true | |||
predicates = "3.1.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this? We generally want all our dependencies in the workspace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a library used by the assert_cmd
crate to assert on CLI stdout with code that looks like this (snipped from this PR, there are other examples where the inverse logic is applied with the .not()
method):
// validate we get a debug message indicating upload disabled
AssertCmd::cargo_bin("influxdb3")
.unwrap()
.args(serve_args)
.arg("-vv")
.arg("--disable-telemetry-upload")
.timeout(std::time::Duration::from_millis(200))
.assert()
.failure()
.stdout(predicates::str::contains(expected_disabled));
I added it using cargo add --dev -p ...
and didn't noticed that it wasn't added to the workspace -- I'll do that now!
Okay, I'll take a look at adding that after I get back from a late lunch! |
9dbf454
to
f6728fc
Compare
Okay I've added the telemetry opt-in feature flag and verified that it works as expected with a local build: % cargo build --workspace --features "influxdb3/release_default"
Compiling influxdb3 v0.1.0 (/home/wayne/projects/github.com/influxdata/influxdb/influxdb3)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 5.60s
% ./target/debug/influxdb3 serve -vv --host-id="meowmeowmeow" --http-bind='0.0.0.0:8086' --object-store=file --data-dir $PWD/tmp-data-dir_no-license_1
2025-01-17T01:07:41.498496Z INFO influxdb3::commands::serve: InfluxDB 3 Core server starting writer_id=meowmeowmeow git_hash=v2.5.0-14272-gb8a94488b5693503731ec01c3e5b5a47255df516-dirty version=0.1.0 uuid=01f21a8d-6bef-4cf4-9b84-33c443c8d7eb num_cpus=12
2025-01-17T01:07:41.498551Z DEBUG influxdb3::commands::serve: build configuration build_malloc_conf=
2025-01-17T01:07:41.498760Z INFO influxdb3_clap_blocks::object_store: Object Store db_dir="/home/wayne/projects/github.com/influxdata/influxdb/tmp-data-dir_no-license_1" object_store_type="Directory"
2025-01-17T01:07:41.499037Z INFO influxdb3::commands::serve: Creating shared query executor num_threads=12
2025-01-17T01:07:41.502096Z DEBUG datafusion_execution::memory_pool::pool: Created new GreedyMemoryPool(pool_size=8589934592)
2025-01-17T01:07:41.523637Z INFO influxdb3::commands::serve: catalog initialized instance_id="eda38d95-df71-4127-8383-dc757ff037d7"
2025-01-17T01:07:41.525658Z DEBUG influxdb3_wal::object_store: >>> replaying
2025-01-17T01:07:41.526019Z INFO influxdb3::commands::serve: setting up background mem check for query buffer
2025-01-17T01:07:41.526079Z DEBUG influxdb3::commands::serve: setting up background buffer checker mem_threshold_bytes=23298950349
2025-01-17T01:07:41.526174Z INFO influxdb3::commands::serve: setting up telemetry store
2025-01-17T01:07:41.526266Z DEBUG influxdb3::commands::serve: Initializing TelemetryStore with upload enabled for https://telemetry.v3.influxdata.com. And without the feature flag:
|
Hmmm, maybe it's too confusing to use both CLI options and build flags for this since we would have to enable the feature flag for the CLI options to have any effect 🤔 I think I'll go ahead and remove the feature flag. |
f6728fc
to
8e8d446
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall 👍 - just left couple of clarifications queries
.arg("--disable-telemetry-upload") | ||
.timeout(std::time::Duration::from_millis(500)) | ||
.assert() | ||
.failure() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it failure()
because process starts and hangs around till 500 millis and then shuts down? Just wondered if it could've used the TestServer
to spin up the server and got hold of the stdout for your assertions. If you'd already looked into it and it didn't work for you that's fine, but it has the mechanism built in to kill the process after the test without timeout
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it failure() because process starts and hangs around till 500 millis and then shuts down?
I added the failure()
call just to be clear in this test case that we expect a non-zero exit code when the process is killed using the timeout
method. It's not strictly necessary for this test case (we only really care about checking the stdout), just me trying to be explicit.
Just wondered if it could've used the TestServer to spin up the server and got hold of the stdout for your assertions.
I did look through the TestSever
code a little, and it looked to me like it was intended more for API-level integration testing than what I'm doing here, which is asserting on the CLI's stdout. Looking at it again, it's still not obvious how to get the stdout from it without modifying it non-trivially to preserve stdout.
it has the mechanism built in to kill the process after the test without timeout.
Yeah, I see the it has the mechanism built in to kill the process after the test without timeout using the kill
method, but that would require additional boilerplate to have the test case sleep 500 ms before killing it (assuming we also added more code to preserve and retrieve the TestServer
stdout).
To be clear, the timeout
method isn't used to kill the process after the test -- the test happens after we've killed the process. We kill it after a constant 500 ms because that's what I thought would reliably be enough time to reach the stdout debug messages that are being asserted on (a too small timeout could cause this test to be flaky since we would be less likely to reach that debug message). I know it's not ideal asserting on stdout to test whether the telemetry store is running in but I think the only other option would be to actually set up a listening http server and point it at that, and test that something reached it from the TelemetryStore
(or not as the test case might call for), but that seemed like it might no be worth the additional effort...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did think the test happens after killing the process as well with the timeout - just wasn't sure if there was easy wins in extending TestServer
.
influxdb3/tests/server/cli.rs
Outdated
.timeout(std::time::Duration::from_millis(500)) | ||
.assert() | ||
.failure() | ||
.stdout(predicates::str::contains(expected_disabled).not()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using predicates
is fine but I've been following the existing pattern of using assert_contains!
and assert_not_contains!
macros from test_helpers
just to keep it consistent. If those macros are lacking in functionality for your use case, definitely keep predicates
in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I now see the run*
functions used elsewhere in this file and how they are using those assertions you mentioned. But those test cases seem to be running short-lived subcommands rather than the long-lived serve
subcommand. I like the assert_cmd
abstractions for their succinctness in a case like this -- particularly the timeout
method to avoid extra boilerplate around waiting a fixed time before killing the process. Using predicates
just seemed a natural extension of using the assert_cmd
crate, which I noticed was already being used here for its cargo_bin
command constructor. I'll look into using the other assertions though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea - it'd have had to wait till server started fully. I think we already loop through in TestServer
till it hits the last line to capture the address. So, you could just extend TestServer
to hold all the lines it's seen during the startup and access it in the tests. When TestServer::spawn().await
completes, that means server is fully up. Then you can just loop through and check the line you wanted to assert is there.
See if that works, if not go ahead and merge this PR - the point about TestServer
is not really critical, I just wanted to check if you'd seen it and still thought it's not fit for your test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the value in adding more code to TestServer
when all we need to accomplish the task at hand is std::process::Command
? I see the specific value of TestServer
where it's used -- it abstracts over a running server while executing API-driven test cases and displaying logs in the test case if TEST_LOG
is set. That makes sense. But it's super heavy-handed when all that's necessary for these new test cases is std::process::Command
. Is it just the assert_cmd
helpers that you don't like? It would still be simpler to remove those (then add more code to TestServer
that is), but the test case code becomes more verbose in doing so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, it's the timeouts that stood out. The initial thought was TestServer
already abstracts the server starting part already and it also hooks into server logs and then the assertions stood out as well. But as I said, both aren't critical they're part of test infra, we can ship it.
@@ -60,7 +60,7 @@ pub const DEFAULT_DATA_DIRECTORY_NAME: &str = ".influxdb3"; | |||
/// The default bind address for the HTTP API. | |||
pub const DEFAULT_HTTP_BIND_ADDR: &str = "0.0.0.0:8181"; | |||
|
|||
pub const DEFAULT_TELMETRY_ENDPOINT: &str = "https://telemetry.v3.influxdata.com"; | |||
pub const DEFAULT_TELEMETRY_ENDPOINT: &str = "https://telemetry.v3.influxdata.com"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
8e8d446
to
3328494
Compare
This PR is meant to address #25813 by providing two new options for the
influxdb3 serve
subcommand:--disable-telemetry-upload
--telemetry-endpoint
I also considered including some kind of Rust feature flag that would be off by default but which we could enable for release builds; this feature flag would give us local dev builds of the server with telemetry uploads disabled for local testing to prevent unknowingly pushing bogus telemetry data, but I wanted to get the team's input before putting that effort in.
I think this hypothetical off-by-default feature flag is useful in addition to the CLI flag because the latter requires us to explicitly do something to turn telemetry off in dev environments whereas the former doesn't.