Add new cases of expected behaviours in troubleshooting doc #789

Merged
merged 3 commits into from
Mar 8, 2024

Conversation

sauraank
Contributor

Description of change

Updated the troubleshooting documentation with 4 new cases. It covers the expected behaviour when all files in a directory are deleted, when credentials expire while the bucket is still mounted, when requests are throttled, and when the hostname is invalid for a storage provider that does not support virtual-hosted-style addressing.
NOTE: I am not adding the cases of invalid credentials at mount time, as the error messages in those cases are already clear and users don't need to go to the logs to understand the issue.

Relevant issues:
#722

Does this change impact existing behavior?

No. It is just a documentation change.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@sauraank sauraank force-pushed the troubleshooting_update branch from fe7afc1 to 82a5fe6 Compare February 27, 2024 09:41

## Throttling Errors

When looking at the logs, these will appear as failed requests with `http_status=503` or `http_status=429`.
Contributor

Log examples, please! Also, as a developer apart from realising that S3 is throttling my app, what should I do? Will MP retry those automatically?

Contributor Author

@sauraank sauraank Feb 29, 2024

From Mountpoint's point of view, it does not do anything special on throttling, such as retries. I added this section because it was in the support runbook. I can add ways to mitigate 503 errors, such as restructuring the bucket to use more prefixes, as described in the public AWS troubleshooting pages.

Contributor

I see, so the main point here is that Mountpoint does not do throttling itself. Thanks for describing a way of mitigating S3 throttling errors, but where does this come from? You mention 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests. These values may become outdated, so I'd add a link to the source of that information.

Otherwise LGTM!

@sauraank sauraank force-pushed the troubleshooting_update branch from 216c995 to 96c580e Compare February 29, 2024 11:57
@sauraank sauraank force-pushed the troubleshooting_update branch from 96c580e to 6431806 Compare February 29, 2024 13:34

In this case, try using the `--force-path-style` CLI option when mounting the bucket with Mountpoint.

NOTE - Third-party storage providers are not officially supported by Mountpoint for Amazon S3.
Member

nit: delete

1: Client error
2: Unknown CRT error
3: CRT error 1059: aws-c-io: AWS_IO_DNS_INVALID_NAME, Host name was invalid for dns resolution.
Error: Failed to create mount process
Member

nit: delete this line


For more details on how Mountpoint handles endpoints, please see our [configuration documentation](https://github.com/awslabs/mountpoint-s3/blob/main/doc/CONFIGURATION.md#endpoints-and-aws-privatelink).
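
To illustrate why path-style addressing can help here, the following is a minimal sketch of how the two addressing styles place the bucket name in a URL. It is not Mountpoint's implementation; the function `object_url`, the endpoint `storage.example.com`, and the bucket and key names are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint, bucket, key, path_style):
    """Build an object URL in path style or virtual-hosted style."""
    parts = urlsplit(endpoint)
    if path_style:
        # Path style: the bucket is the first path segment and the host is unchanged.
        host, path = parts.netloc, f"/{bucket}/{key}"
    else:
        # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host,
        # so DNS must be able to resolve that bucket-specific host name.
        host, path = f"{bucket}.{parts.netloc}", f"/{key}"
    return urlunsplit((parts.scheme, host, path, "", ""))


endpoint = "https://storage.example.com"  # hypothetical third-party endpoint
print(object_url(endpoint, "my-bucket", "dir/file.txt", path_style=False))
print(object_url(endpoint, "my-bucket", "dir/file.txt", path_style=True))
```

With path style, only the endpoint host itself has to resolve, which is why it is the usual fallback for providers that do not serve bucket-specific host names.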

## Directory disappear after deleting all the files within it
Member

Suggested change
- ## Directory disappear after deleting all the files within it
+ ## Directory disappears after deleting all the files within it


## Directory disappear after deleting all the files within it

Amazon S3 does not support directories and objects are just grouped using prefix.
Member

Just echo the existing doc:

Suggested change
- Amazon S3 does not support directories and objects are just grouped using prefix.
+ The Amazon S3 data model is a flat structure, with no hierarchy of subdirectories.

Comment on lines 176 to 179
So, if all the files within a prefix are deleted, the prefix itself and the corresponding directory cease to exist.
In this case, it is expected that mountpoint will not be able to show the directory for listing or other file system operation.

Workaround to persist a directory could be creating an empty file (for example, `.keep`). These files can be hidden as `ls` filters out files with prefix `.` without `-a` option.
Member

Suggested change
- So, if all the files within a prefix are deleted, the prefix itself and the corresponding directory cease to exist.
- In this case, it is expected that mountpoint will not be able to show the directory for listing or other file system operation.
- Workaround to persist a directory could be creating an empty file (for example, `.keep`). These files can be hidden as `ls` filters out files with prefix `.` without `-a` option.
+ If all the files within a prefix are deleted, the prefix itself and the corresponding directory cease to exist.
+ In this case, it is expected that Mountpoint will no longer show the directory or be able to create new files within it. You can recreate the directory with `mkdir` and then continue creating new files within it. Alternatively, you can prevent a directory from disappearing by creating an empty, hidden file (for example, `.keep`) inside it.
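
The `.keep` workaround could be scripted roughly as follows. This is a sketch under assumptions: `keep_directory` is a hypothetical helper, the temporary directory stands in for a path under a mount, and it assumes the mount permits creating new files.

```python
import tempfile
from pathlib import Path


def keep_directory(directory):
    """Create an empty `.keep` file so the directory always holds one file."""
    marker = Path(directory) / ".keep"
    if not marker.exists():
        # Exclusive create: write a new zero-byte file, never overwrite an existing one.
        with open(marker, "xb"):
            pass
    return marker


# A temporary directory stands in for a directory under the mount (assumption).
with tempfile.TemporaryDirectory() as mount_dir:
    data_dir = Path(mount_dir) / "data"
    data_dir.mkdir()
    marker = keep_directory(data_dir)
    print(marker.name, marker.stat().st_size)
```

Because the marker name starts with `.`, a plain `ls` without `-a` will not list it.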

Comment on lines 183 to 209
## Input/output error after running workload for some time

Your AWS credentials may expire while your bucket is still mounted using Mountpoint.
In that case, filesystem operations will fail with errors like the following:

```
ls: reading directory '.': Input/output error
```

```
cat: new_file.txt: Input/output error
```

Please check the Mountpoint logs. If you see the following errors, then your AWS credentials have expired.

```
[WARN] lookup{req=104 ino=1 name="Input"}:
list_objects{id=20 bucket=plutodemo continued=false delimiter=/ max_keys=1 prefix=Input/}: mountpoint_s3_client::s3_crt_client:
meta request failed duration=10.023623ms error=ClientError(Forbidden("The provided token has expired."))

[WARN] lookup{req=106 ino=1 name="Input"}:
head_object{id=21 bucket=plutodemo key=Input}: mountpoint_s3_client::s3_crt_client:
meta request failed duration=11.087781ms error=ClientError(Forbidden("<no message>"))
```

You may need to remount your bucket with valid credentials, or increase your credential session duration if that suits your use case.
For more details on configuring AWS credentials, see the [configuration documentation](https://github.com/awslabs/mountpoint-s3/blob/main/doc/CONFIGURATION.md#aws-credentials).
Member

Delete this for the moment. This sounds like a bug — we should be refreshing temporary credentials automatically.
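
For reference while this is investigated, the expired-token message in the log excerpt quoted above could be detected with a small script such as the sketch below. The helper `has_expired_token` is hypothetical, and the sample lines are shortened from that excerpt; the exact log format may differ between Mountpoint versions.

```python
import re

# Matches the error text from the quoted log excerpt.
EXPIRED = re.compile(r'error=ClientError\(Forbidden\("The provided token has expired\."\)\)')


def has_expired_token(lines):
    """Return True if any log line reports an expired token."""
    return any(EXPIRED.search(line) for line in lines)


# Shortened from the log excerpt quoted in this thread.
log_lines = [
    'meta request failed duration=10.023623ms error=ClientError(Forbidden("The provided token has expired."))',
    'meta request failed duration=11.087781ms error=ClientError(Forbidden("<no message>"))',
]
print(has_expired_token(log_lines))
```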


## Throttling Errors

When looking at the logs, these errors will appear as failed requests with `http_status=503` or `http_status=429` . For example:
Member

Suggested change
- When looking at the logs, these errors will appear as failed requests with `http_status=503` or `http_status=429` . For example:
+ When looking at the logs, throttling errors will appear as failed requests with `http_status=503` or `http_status=429`. For example:

The 503 or 429 status codes mean that request limits have been exceeded.
Mountpoint itself does not do any throttling, so any throttling will be from S3 or possibly dependent services, like STS which is used to provide credentials.
Member

Suggested change
- Mountpoint itself does not do any throttling, so any throttling will be from S3 or possibly dependent services, like STS which is used to provide credentials.
+ Mountpoint itself does not do any throttling. These errors are returned from S3 or from dependent services, like STS which is used to provide credentials.

Comment on lines 228 to 230
You can try to mitigate throttling (503 Slow Down) by distributing objects across multiple prefixes if your use case allows it.
Since, you can send 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in an S3 bucket, increasing the prefix in the bucket would allow more requests to be processed by S3.
Amazon S3 gradually scales up to handle requests for each of the prefixes separately.
Member

Suggested change
- You can try to mitigate throttling (503 Slow Down) by distributing objects across multiple prefixes if your use case allows it.
- Since, you can send 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in an S3 bucket, increasing the prefix in the bucket would allow more requests to be processed by S3.
- Amazon S3 gradually scales up to handle requests for each of the prefixes separately.
+ Amazon S3 automatically scales to high request rates.
+ Your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned Amazon S3 prefix.
+ You can reduce the impact of throttling errors by distributing objects across multiple prefixes in your bucket.
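
One way to distribute objects across prefixes is to derive a prefix from a hash of the key, sketched below. This is an illustration only: `prefixed_key` and the choice of 16 prefixes are assumptions, not a recommendation from the S3 documentation, and how S3 partitions prefixes is not controlled by the application.

```python
import hashlib


def prefixed_key(key, num_prefixes=16):
    """Prepend a stable, hash-derived prefix so keys spread over `num_prefixes` prefixes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first four bytes of the digest to pick a prefix deterministically.
    shard = int.from_bytes(digest[:4], "big") % num_prefixes
    return f"{shard:02x}/{key}"


# Hypothetical object names for illustration.
for name in ["logs/2024-03-08.txt", "images/cat.png", "data/part-0001.bin"]:
    print(prefixed_key(name))
```

The same key always maps to the same prefix, so readers can recompute the location. Through a mount, each prefix appears as a top-level directory, so this layout only suits workloads that do not depend on the original directory structure.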

@sauraank sauraank force-pushed the troubleshooting_update branch from 6431806 to ec07f83 Compare March 8, 2024 16:10
@jamesbornholt jamesbornholt added this pull request to the merge queue Mar 8, 2024
Merged via the queue into awslabs:main with commit 81ae0da Mar 8, 2024
23 checks passed
3 participants