Add CRT error to CRT request logs, move CRT per-request logs to DEBUG #669

dannycjones · 2023-12-06T14:31:59Z

Description of change

At the moment, errors for individual requests made by the CRT client (inside of a meta request) can fail and we log that they failed, but only with the HTTP Status (or lack thereof as "-1").

For example, these three requests failed for different reasons.

WARNING request failed request_type=GetObject http_status=-1 range=None duration=3.048089489s ttfb=None request_id=<unknown>
WARNING request failed request_type=GetObject http_status=500 range=None duration=2.968672313s ttfb=Some(2.713102782s) request_id=0H2C1MFGVVR9ARMV
WARNING request failed request_type=GetObject http_status=-1 range=None duration=290.737317ms ttfb=None request_id=<unknown>

We have access to a CRT error code here, so we should surface that. These also log on success, so I've wrapped it in an Option<>.

I've also updated the message itself to be "CRT request" to try and highlight that these are inner requests made by the CRT that can be retried or be part of a much larger meta request.

An error message may look like this now:

WARNING CRT request failed request_type=GetObject crt_error=Some(Error(1048, "aws-c-io: AWS_IO_SOCKET_TIMEOUT, socket operation timed out.")) http_status=-1 range=None duration=3.126663123s ttfb=None request_id=<unknown>

I almost wonder if we should be logging these as warning, as it has caused some confusion around what has actually happened and if the data was really retrieved or the application failed to handle that error. For now, I'll leave them as warning.

Relevant issues: N/A

Does this change impact existing behavior?

No behavior change, logs contain new information. Message changes slightly.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

Signed-off-by: Daniel Carl Jones <[email protected]>

dannycjones · 2023-12-06T14:32:09Z

FYI @IsaevIlya

passaro

The change looks good. We could go a bit further, but happy to do that in a separate PR if you prefer.

mountpoint-s3-client/src/s3_crt_client.rs

Signed-off-by: Daniel Carl Jones <[email protected]>

jamesbornholt · 2023-12-08T01:04:04Z

mountpoint-s3-client/src/s3_crt_client.rs

+                    "CRT request failed"
                } else {
-                    "request finished"
+                    "CRT request finished"
                };
-                event!(log_level, %request_type, http_status, ?range, ?duration, ?ttfb, %request_id, "{}", message);
-                trace!(detailed_metrics=?metrics, "request completed");
+                debug!(%request_type, ?crt_error, http_status, ?range, ?duration, ?ttfb, %request_id, "{}", message);
+                trace!(detailed_metrics=?metrics, "CRT request completed");


it's just bikeshedding but i don't think "CRT request" is the right way to describe these. it's an S3 request that failed. if we're going to push these down to debug level, i don't see the need to also write "CRT" here.

A side effect of awslabs#669 was that there's now no way to get request IDs for failed requests at the default logging settings, as only DEBUG-level messages include the request IDs. This change adds request IDs to the meta request failure message when available, so that these WARN-level messages still include request IDs. I also added some new infrastructure to test metrics and log messages. For metrics, we build a new `metrics::Recorder` that collects all the metrics and can then be searched to find them. For log messages, we build a `tracing_subscriber::Layer` that collects all tracing events emitted while enabled. In both cases, the new objects aren't thread safe, as both `Recorder`s and `Layer`s are global state. So these tests need to continue to use `rusty_fork` to split into a new process per test. Signed-off-by: James Bornholt <[email protected]>

A side effect of #669 was that there's now no way to get request IDs for failed requests at the default logging settings, as only DEBUG-level messages include the request IDs. This change adds request IDs to the meta request failure message when available, so that these WARN-level messages still include request IDs. I also added some new infrastructure to test metrics and log messages. For metrics, we build a new `metrics::Recorder` that collects all the metrics and can then be searched to find them. For log messages, we build a `tracing_subscriber::Layer` that collects all tracing events emitted while enabled. In both cases, the new objects aren't thread safe, as both `Recorder`s and `Layer`s are global state. So these tests need to continue to use `rusty_fork` to split into a new process per test. Signed-off-by: James Bornholt <[email protected]>

Add CRT error to underlying request log entries

d862417

Signed-off-by: Daniel Carl Jones <[email protected]>

dannycjones requested a review from passaro December 6, 2023 14:32

dannycjones had a problem deploying to PR integration tests December 6, 2023 14:32 — with GitHub Actions Failure

dannycjones temporarily deployed to PR integration tests December 6, 2023 14:32 — with GitHub Actions Inactive

dannycjones had a problem deploying to PR integration tests December 6, 2023 14:32 — with GitHub Actions Failure

dannycjones temporarily deployed to PR integration tests December 6, 2023 14:32 — with GitHub Actions Inactive

dannycjones had a problem deploying to PR integration tests December 6, 2023 14:32 — with GitHub Actions Failure

dannycjones temporarily deployed to PR integration tests December 6, 2023 14:32 — with GitHub Actions Inactive

dannycjones temporarily deployed to PR integration tests December 6, 2023 15:47 — with GitHub Actions Inactive

dannycjones had a problem deploying to PR integration tests December 6, 2023 15:47 — with GitHub Actions Failure

dannycjones temporarily deployed to PR integration tests December 6, 2023 15:47 — with GitHub Actions Inactive

dannycjones had a problem deploying to PR integration tests December 6, 2023 15:48 — with GitHub Actions Failure

dannycjones temporarily deployed to PR integration tests December 6, 2023 15:48 — with GitHub Actions Inactive

dannycjones had a problem deploying to PR integration tests December 6, 2023 15:48 — with GitHub Actions Failure

dannycjones temporarily deployed to PR integration tests December 6, 2023 15:48 — with GitHub Actions Inactive

passaro reviewed Dec 6, 2023

View reviewed changes

mountpoint-s3-client/src/s3_crt_client.rs Outdated Show resolved Hide resolved

mountpoint-s3-client/src/s3_crt_client.rs Outdated Show resolved Hide resolved

dannycjones temporarily deployed to PR integration tests December 6, 2023 17:57 — with GitHub Actions Inactive

dannycjones had a problem deploying to PR integration tests December 6, 2023 17:57 — with GitHub Actions Failure

dannycjones temporarily deployed to PR integration tests December 6, 2023 17:57 — with GitHub Actions Inactive

dannycjones had a problem deploying to PR integration tests December 6, 2023 17:57 — with GitHub Actions Failure

dannycjones temporarily deployed to PR integration tests December 6, 2023 17:57 — with GitHub Actions Inactive

dannycjones had a problem deploying to PR integration tests December 6, 2023 17:57 — with GitHub Actions Failure

dannycjones added 2 commits December 7, 2023 09:28

Merge branch 'main' into add-crt-err-underlying-req

7111118

Move all CRT request logging to debug even on error

0f8118c

Signed-off-by: Daniel Carl Jones <[email protected]>

dannycjones changed the title ~~Add CRT error to underlying request log entries~~ Add CRT error to CRT request logs, move CRT per-request logs to DEBUG Dec 7, 2023

dannycjones temporarily deployed to PR integration tests December 7, 2023 09:43 — with GitHub Actions Inactive

dannycjones had a problem deploying to PR integration tests December 7, 2023 09:43 — with GitHub Actions Failure

dannycjones temporarily deployed to PR integration tests December 7, 2023 09:43 — with GitHub Actions Inactive

dannycjones enabled auto-merge December 7, 2023 09:44

dannycjones temporarily deployed to PR integration tests December 7, 2023 10:55 — with GitHub Actions Inactive

dannycjones requested a review from passaro December 7, 2023 17:02

jamesbornholt reviewed Dec 8, 2023

View reviewed changes

jamesbornholt approved these changes Dec 8, 2023

View reviewed changes

dannycjones added this pull request to the merge queue Dec 8, 2023

Merged via the queue into awslabs:main with commit 863087d Dec 8, 2023
24 checks passed

dannycjones deleted the add-crt-err-underlying-req branch December 8, 2023 01:34

jamesbornholt mentioned this pull request Feb 28, 2024

Add request ID to meta request failures and add tests #790

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add CRT error to CRT request logs, move CRT per-request logs to DEBUG #669

Add CRT error to CRT request logs, move CRT per-request logs to DEBUG #669

dannycjones commented Dec 6, 2023

dannycjones commented Dec 6, 2023

passaro left a comment

jamesbornholt Dec 8, 2023

Add CRT error to CRT request logs, move CRT per-request logs to DEBUG #669

Add CRT error to CRT request logs, move CRT per-request logs to DEBUG #669

Conversation

dannycjones commented Dec 6, 2023

Description of change

Does this change impact existing behavior?

dannycjones commented Dec 6, 2023

passaro left a comment

Choose a reason for hiding this comment

jamesbornholt Dec 8, 2023

Choose a reason for hiding this comment