Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CRT error to CRT request logs, move CRT per-request logs to DEBUG #669

Merged
merged 3 commits into from
Dec 8, 2023

Conversation

dannycjones
Copy link
Contributor

Description of change

At the moment, errors for individual requests made by the CRT client (inside of a meta request) can fail and we log that they failed, but only with the HTTP Status (or lack thereof as "-1").

For example, these three requests failed for different reasons.

WARNING request failed request_type=GetObject http_status=-1 range=None duration=3.048089489s ttfb=None request_id=<unknown>
WARNING request failed request_type=GetObject http_status=500 range=None duration=2.968672313s ttfb=Some(2.713102782s) request_id=0H2C1MFGVVR9ARMV
WARNING request failed request_type=GetObject http_status=-1 range=None duration=290.737317ms ttfb=None request_id=<unknown>

We have access to a CRT error code here, so we should surface that. These also log on success, so I've wrapped it in an Option<>.

I've also updated the message itself to be "CRT request" to try and highlight that these are inner requests made by the CRT that can be retried or be part of a much larger meta request.

An error message may look like this now:

WARNING CRT request failed request_type=GetObject crt_error=Some(Error(1048, "aws-c-io: AWS_IO_SOCKET_TIMEOUT, socket operation timed out.")) http_status=-1 range=None duration=3.126663123s ttfb=None request_id=<unknown>

I almost wonder if we should be logging these as warning, as it has caused some confusion around what has actually happened and if the data was really retrieved or the application failed to handle that error. For now, I'll leave them as warning.

Relevant issues: N/A

Does this change impact existing behavior?

No behavior change, logs contain new information. Message changes slightly.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@dannycjones dannycjones requested a review from passaro December 6, 2023 14:32
@dannycjones dannycjones temporarily deployed to PR integration tests December 6, 2023 14:32 — with GitHub Actions Inactive
@dannycjones
Copy link
Contributor Author

FYI @IsaevIlya

@dannycjones dannycjones temporarily deployed to PR integration tests December 6, 2023 14:32 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 6, 2023 14:32 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 6, 2023 14:32 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 6, 2023 15:47 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 6, 2023 15:47 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 6, 2023 15:48 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 6, 2023 15:48 — with GitHub Actions Inactive
Copy link
Contributor

@passaro passaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good. We could go a bit further, but happy to do that in a separate PR if you prefer.

mountpoint-s3-client/src/s3_crt_client.rs Outdated Show resolved Hide resolved
mountpoint-s3-client/src/s3_crt_client.rs Outdated Show resolved Hide resolved
@dannycjones dannycjones temporarily deployed to PR integration tests December 6, 2023 17:57 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 6, 2023 17:57 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 6, 2023 17:57 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 6, 2023 17:57 — with GitHub Actions Inactive
@dannycjones dannycjones changed the title Add CRT error to underlying request log entries Add CRT error to CRT request logs, move CRT per-request logs to DEBUG Dec 7, 2023
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 09:43 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 09:43 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 09:43 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 09:43 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 09:43 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 09:43 — with GitHub Actions Inactive
@dannycjones dannycjones enabled auto-merge December 7, 2023 09:44
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 10:55 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 10:55 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 10:55 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 10:55 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 10:55 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 10:55 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests December 7, 2023 10:55 — with GitHub Actions Inactive
@dannycjones dannycjones requested a review from passaro December 7, 2023 17:02
Comment on lines +460 to +465
"CRT request failed"
} else {
"request finished"
"CRT request finished"
};
event!(log_level, %request_type, http_status, ?range, ?duration, ?ttfb, %request_id, "{}", message);
trace!(detailed_metrics=?metrics, "request completed");
debug!(%request_type, ?crt_error, http_status, ?range, ?duration, ?ttfb, %request_id, "{}", message);
trace!(detailed_metrics=?metrics, "CRT request completed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just bikeshedding but i don't think "CRT request" is the right way to describe these. it's an S3 request that failed. if we're going to push these down to debug level, i don't see the need to also write "CRT" here.

@dannycjones dannycjones added this pull request to the merge queue Dec 8, 2023
Merged via the queue into awslabs:main with commit 863087d Dec 8, 2023
24 checks passed
@dannycjones dannycjones deleted the add-crt-err-underlying-req branch December 8, 2023 01:34
jamesbornholt added a commit to jamesbornholt/mountpoint-s3 that referenced this pull request Feb 28, 2024
A side effect of awslabs#669 was
that there's now no way to get request IDs for failed requests at the
default logging settings, as only DEBUG-level messages include the
request IDs. This change adds request IDs to the meta request failure
message when available, so that these WARN-level messages still include
request IDs.

I also added some new infrastructure to test metrics and log messages.
For metrics, we build a new `metrics::Recorder` that collects all the
metrics and can then be searched to find them. For log messages, we
build a `tracing_subscriber::Layer` that collects all tracing events
emitted while enabled. In both cases, the new objects aren't thread
safe, as both `Recorder`s and `Layer`s are global state. So these tests
need to continue to use `rusty_fork` to split into a new process per
test.

Signed-off-by: James Bornholt <[email protected]>
jamesbornholt added a commit to jamesbornholt/mountpoint-s3 that referenced this pull request Feb 28, 2024
A side effect of awslabs#669 was
that there's now no way to get request IDs for failed requests at the
default logging settings, as only DEBUG-level messages include the
request IDs. This change adds request IDs to the meta request failure
message when available, so that these WARN-level messages still include
request IDs.

I also added some new infrastructure to test metrics and log messages.
For metrics, we build a new `metrics::Recorder` that collects all the
metrics and can then be searched to find them. For log messages, we
build a `tracing_subscriber::Layer` that collects all tracing events
emitted while enabled. In both cases, the new objects aren't thread
safe, as both `Recorder`s and `Layer`s are global state. So these tests
need to continue to use `rusty_fork` to split into a new process per
test.

Signed-off-by: James Bornholt <[email protected]>
jamesbornholt added a commit to jamesbornholt/mountpoint-s3 that referenced this pull request Feb 28, 2024
A side effect of awslabs#669 was
that there's now no way to get request IDs for failed requests at the
default logging settings, as only DEBUG-level messages include the
request IDs. This change adds request IDs to the meta request failure
message when available, so that these WARN-level messages still include
request IDs.

I also added some new infrastructure to test metrics and log messages.
For metrics, we build a new `metrics::Recorder` that collects all the
metrics and can then be searched to find them. For log messages, we
build a `tracing_subscriber::Layer` that collects all tracing events
emitted while enabled. In both cases, the new objects aren't thread
safe, as both `Recorder`s and `Layer`s are global state. So these tests
need to continue to use `rusty_fork` to split into a new process per
test.

Signed-off-by: James Bornholt <[email protected]>
jamesbornholt added a commit to jamesbornholt/mountpoint-s3 that referenced this pull request Feb 28, 2024
A side effect of awslabs#669 was
that there's now no way to get request IDs for failed requests at the
default logging settings, as only DEBUG-level messages include the
request IDs. This change adds request IDs to the meta request failure
message when available, so that these WARN-level messages still include
request IDs.

I also added some new infrastructure to test metrics and log messages.
For metrics, we build a new `metrics::Recorder` that collects all the
metrics and can then be searched to find them. For log messages, we
build a `tracing_subscriber::Layer` that collects all tracing events
emitted while enabled. In both cases, the new objects aren't thread
safe, as both `Recorder`s and `Layer`s are global state. So these tests
need to continue to use `rusty_fork` to split into a new process per
test.

Signed-off-by: James Bornholt <[email protected]>
jamesbornholt added a commit to jamesbornholt/mountpoint-s3 that referenced this pull request Feb 28, 2024
A side effect of awslabs#669 was
that there's now no way to get request IDs for failed requests at the
default logging settings, as only DEBUG-level messages include the
request IDs. This change adds request IDs to the meta request failure
message when available, so that these WARN-level messages still include
request IDs.

I also added some new infrastructure to test metrics and log messages.
For metrics, we build a new `metrics::Recorder` that collects all the
metrics and can then be searched to find them. For log messages, we
build a `tracing_subscriber::Layer` that collects all tracing events
emitted while enabled. In both cases, the new objects aren't thread
safe, as both `Recorder`s and `Layer`s are global state. So these tests
need to continue to use `rusty_fork` to split into a new process per
test.

Signed-off-by: James Bornholt <[email protected]>
jamesbornholt added a commit to jamesbornholt/mountpoint-s3 that referenced this pull request Mar 7, 2024
A side effect of awslabs#669 was
that there's now no way to get request IDs for failed requests at the
default logging settings, as only DEBUG-level messages include the
request IDs. This change adds request IDs to the meta request failure
message when available, so that these WARN-level messages still include
request IDs.

I also added some new infrastructure to test metrics and log messages.
For metrics, we build a new `metrics::Recorder` that collects all the
metrics and can then be searched to find them. For log messages, we
build a `tracing_subscriber::Layer` that collects all tracing events
emitted while enabled. In both cases, the new objects aren't thread
safe, as both `Recorder`s and `Layer`s are global state. So these tests
need to continue to use `rusty_fork` to split into a new process per
test.

Signed-off-by: James Bornholt <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Mar 7, 2024
A side effect of #669 was
that there's now no way to get request IDs for failed requests at the
default logging settings, as only DEBUG-level messages include the
request IDs. This change adds request IDs to the meta request failure
message when available, so that these WARN-level messages still include
request IDs.

I also added some new infrastructure to test metrics and log messages.
For metrics, we build a new `metrics::Recorder` that collects all the
metrics and can then be searched to find them. For log messages, we
build a `tracing_subscriber::Layer` that collects all tracing events
emitted while enabled. In both cases, the new objects aren't thread
safe, as both `Recorder`s and `Layer`s are global state. So these tests
need to continue to use `rusty_fork` to split into a new process per
test.

Signed-off-by: James Bornholt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants