Describe the bug
The C++ SDK retries requests based on both the exception name (from the XML error document in the response body) and the HTTP response code.
The aws-c-s3 client retries based only on the response code.
We are encountering fatal errors because retries are not applied in cases like the following:
    [ERROR] 2024-11-20 18:29:29.374 S3MetaRequest [139830786781184] id=0x7eae94c4e500 Meta request failed from error 2058 (The connection has closed or is closing.). (request=0x7f2cdf6a6600, response status=400). Try to setup a retry.
    terminate called after throwing an instance of 'av::CheckException'
      what():  Check failure at perception/dataset/tensor_group_io.cc:145: Expected: 'x is ok', with x := 'blobstore::write_blob(outfile, all_data)'
    [av::status::Status] x = PutObject() failed
    where: cloud/aws/s3/s3_streambuf.cc:100
    extra: s3://aurora-cloud-swe-prod-batch-artifacts/opt/2c9f8615/logs/77b0a8ba-1732-4444-bf99-a537dc6e7ddd/543317464f8ecea758af168e7cb50d99.rats: HTTP response code: 400
    Resolved remote host IP address:
    Request ID: 23TK08F9MWREV3JE
    Exception name: RequestTimeout
    Error message: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
    7 response headers:
    connection : close
    content-type : application/xml
    date : Wed, 20 Nov 2024 18:29:27 GMT
    server : AmazonS3
    transfer-encoding : chunked
    x-amz-id-2 : QY98w0bZSLwaRtEN4fxse39l6wcXtgEaG6/U7nTcjIMDIFkQA7Gzj8OI8B21wjd+jXgiKGtQbhU=
    x-amz-request-id : 23TK08F9MWREV3JE
This is a common case: S3 closes the connection and returns a 400 RequestTimeout error (see e.g. here). However, similar retry support is lacking in aws-c-s3.
Expected Behavior
400 errors are retried based on evaluating the Exception name.
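A minimal sketch of the expected decision, assuming the exception name has already been read from the XML error body. The function `should_retry_s3_error` and the retryable-code list are hypothetical and not part of the aws-c-s3 API; only `RequestTimeout` comes from this report, and treating every 5xx status as retryable is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of S3 exception names treated as retryable.
 * RequestTimeout is the case from this report; extend as needed. */
static const char *const k_retryable_codes[] = {
    "RequestTimeout",
};

/* Returns true if a failed S3 response should be retried.
 * status: HTTP response status code.
 * code:   exception name (<Code> from the XML error body), or NULL if none. */
static bool should_retry_s3_error(int status, const char *code) {
    /* Assumption of this sketch: server-side (5xx) errors are always retried. */
    if (status >= 500 && status <= 599) {
        return true;
    }
    /* Any other status (e.g. 400) is retried only for known exception names. */
    if (code == NULL) {
        return false;
    }
    for (size_t i = 0; i < sizeof(k_retryable_codes) / sizeof(k_retryable_codes[0]); ++i) {
        if (strcmp(code, k_retryable_codes[i]) == 0) {
            return true;
        }
    }
    return false;
}
```

With this rule, a 400 carrying `RequestTimeout` is retried, while a 400 with an unknown or missing exception name stays fatal, which matches today's behavior for genuine client errors.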
Current Behavior
Any 400 error is automatically fatal, because it is translated into AWS_ERROR_S3_INVALID_RESPONSE_STATUS.
There is support for XML in source/s3_util.c, but it is not currently used to parse the response bodies of failed requests.
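For illustration, a naive, dependency-free way to pull the exception name out of an S3 XML error body such as `<Error><Code>RequestTimeout</Code>...</Error>`. `extract_error_code` is a hypothetical helper, not the XML support in source/s3_util.c; an actual fix would presumably reuse that existing parser instead of string matching. The body is passed with an explicit length because response buffers are not necessarily NUL-terminated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Naive sketch: copy the text of the first <Code>...</Code> element in an
 * S3 XML error body into out (NUL-terminated). This is NOT a real XML
 * parser: it ignores namespaces, CDATA, entities, and attributes on <Code>.
 * Returns false if there is no complete <Code> element or it does not fit. */
static bool extract_error_code(const char *body, size_t body_len,
                               char *out, size_t out_size) {
    static const char open_tag[] = "<Code>";
    static const char close_tag[] = "</Code>";
    const size_t open_len = sizeof(open_tag) - 1;
    const size_t close_len = sizeof(close_tag) - 1;

    /* Scan for the opening tag within body[0, body_len). */
    for (size_t i = 0; i + open_len <= body_len; ++i) {
        if (memcmp(body + i, open_tag, open_len) != 0) {
            continue;
        }
        size_t start = i + open_len;
        /* Scan for the closing tag that follows it. */
        for (size_t j = start; j + close_len <= body_len; ++j) {
            if (memcmp(body + j, close_tag, close_len) == 0) {
                size_t n = j - start;
                if (n + 1 > out_size) {
                    return false; /* code plus NUL does not fit in out */
                }
                memcpy(out, body + start, n);
                out[n] = '\0';
                return true;
            }
        }
        return false; /* <Code> was never closed */
    }
    return false; /* no <Code> element at all */
}
```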
Reproduction Steps
Have the S3 backend return 400 errors with retryable exception names (e.g. RequestTimeout). No retries happen.
Possible Solution
Parse the XML response bodies of failed requests and update the retry logic to retry RequestTimeout, as is already done by the aws-cli, the Golang v1 SDK, and the AWS C++ SDK for S3 (not S3-CRT).
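A sketch of how the proposed change could fit together in a retry loop, assuming a hypothetical `send` callback that performs one attempt (including replaying the PutObject body) and reports the status together with the parsed `<Code>`. All names, the 100 ms base delay, and the 20 s cap are illustrative assumptions, not aws-c-s3 defaults or APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of one attempt: HTTP status plus the <Code>
 * parsed from the XML error body ("" if there was none). */
struct attempt_result {
    int status;
    char code[64];
};

/* Hypothetical callback performing one request attempt. */
typedef struct attempt_result (*send_fn)(void *ctx);

static bool is_retryable(const struct attempt_result *r) {
    if (r->status >= 500 && r->status <= 599) {
        return true; /* assumption: all 5xx are retried */
    }
    /* 400 RequestTimeout: S3 closed an idle connection, so resend. */
    return r->status == 400 && strcmp(r->code, "RequestTimeout") == 0;
}

/* Exponential backoff: base_ms * 2^attempt, capped at max_ms. */
static unsigned backoff_ms(unsigned attempt, unsigned base_ms, unsigned max_ms) {
    unsigned delay = base_ms < max_ms ? base_ms : max_ms;
    for (unsigned i = 0; i < attempt; ++i) {
        if (delay > max_ms / 2) {
            return max_ms; /* doubling would exceed the cap */
        }
        delay *= 2;
    }
    return delay;
}

/* Sends once, then retries retryable failures until max_attempts total
 * attempts have been made (at least one attempt is always made).
 * sleep_ms is injected so the loop can be tested without real delays. */
static struct attempt_result send_with_retry(send_fn send, void *ctx,
                                             unsigned max_attempts,
                                             void (*sleep_ms)(unsigned)) {
    struct attempt_result r = send(ctx);
    for (unsigned attempt = 1; attempt < max_attempts && is_retryable(&r); ++attempt) {
        sleep_ms(backoff_ms(attempt - 1, 100, 20000));
        r = send(ctx);
    }
    return r;
}
```

Injecting `sleep_ms` keeps the loop testable; a production client would more likely add jitter and defer to its configured retry strategy rather than a hard-coded schedule.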
aws-c-s3 version used
v0.2.3
Compiler and version used
clang-15.0.7
Operating System and version
ubuntu 22.04