wait_for_snapshot undocumented behavior / enhancement #120409

Open
murtll opened this issue Jan 17, 2025 · 0 comments
Labels
>enhancement, needs:triage (Requires assignment of a team area label)

murtll commented Jan 17, 2025

Description

The wait_for_snapshot action is not well documented, and its actual behavior does not suit one-day indices (for example, log storage).

My case:
I needed to keep indices for some time, take a snapshot 5 days before deleting each index, and then delete the index, but only after checking that the snapshot was created. If the index's snapshot is not available, the index must not be deleted.
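For reference, a delete phase of this kind can be sketched as an ILM policy body, built here as a Python dict. This is only an illustration: the `30d` retention and the SLM policy name `daily-snapshots` are assumptions, not values from my cluster, and `build_delete_phase_policy` is a hypothetical helper.

```python
import json


def build_delete_phase_policy(min_age: str, slm_policy: str) -> dict:
    """Return an ILM policy body whose delete phase waits for an SLM snapshot.

    Both arguments are placeholders chosen for illustration.
    """
    return {
        "policy": {
            "phases": {
                "delete": {
                    "min_age": min_age,
                    "actions": {
                        # Step that is expected to block deletion until
                        # the named SLM policy has snapshotted the index.
                        "wait_for_snapshot": {"policy": slm_policy},
                        "delete": {},
                    },
                }
            }
        }
    }


# Assumed values: 30 days of retention, SLM policy named "daily-snapshots".
policy_body = build_delete_phase_policy("30d", "daily-snapshots")
print(json.dumps(policy_body, indent=2))
```

The resulting body is what would be sent when creating or updating the lifecycle policy.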

According to the documentation, it should work like that. In fact, before 8.13 the wait_for_snapshot step did not check that a snapshot of the specified index was available (it only checked that the policy had run), and I unexpectedly lost some old logs because of a configuration mistake.
After that I upgraded Elasticsearch to 8.16, where the availability of the index snapshot is checked, but it now only checks whether the LAST snapshot of the policy contains the specified index, so I cannot take a snapshot in advance.

Also, that snapshot must be created after the ILM action time, so if the snapshot was created at 03:00 and the ILM action happened at 04:00, the step waits until the next day and the next SLM policy run. After that next SLM run, it checks whether the new snapshot contains the specified index, but it does not, because that index was included in the previous snapshot.
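To make the timing problem concrete, here is a small Python model of the check as I understand it. It is a simplified sketch of the behavior described above, not the Elasticsearch source: the `Snapshot` class, the function name, and the index names are all hypothetical, and it assumes each SLM run captures only one day's index.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List


@dataclass(frozen=True)
class Snapshot:
    started_at: datetime
    indices: FrozenSet[str]


def wait_for_snapshot_passes(
    index: str, action_time: datetime, snapshots: List[Snapshot]
) -> bool:
    """Model of the 8.16 behavior: only the LAST snapshot of the policy counts.

    It must have started after the ILM action time and contain the index.
    """
    if not snapshots:
        return False
    latest = max(snapshots, key=lambda s: s.started_at)
    return latest.started_at > action_time and index in latest.indices


index = "logs-2025.01.12"
action_time = datetime(2025, 1, 17, 4, 0)  # ILM action at 04:00

# Day 1: the snapshot at 03:00 contains the index but predates the action.
day1 = [Snapshot(datetime(2025, 1, 17, 3, 0), frozenset({index}))]
blocked_day1 = not wait_for_snapshot_passes(index, action_time, day1)

# Day 2: the new snapshot is recent enough but no longer contains the index.
day2 = day1 + [Snapshot(datetime(2025, 1, 18, 3, 0), frozenset({"logs-2025.01.13"}))]
blocked_day2 = not wait_for_snapshot_passes(index, action_time, day2)

print(blocked_day1, blocked_day2)
```

In this model the step stays blocked on both days, which matches what I observed: the index is never deleted unless a later snapshot happens to include it again.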

After learning all that from reading the source code, I worked around it by taking the snapshot one day AFTER ILM moves the index to the delete phase. But this does not seem like a good solution to me, because I want to take the snapshot in advance of index deletion, so that I have some time to fix SLM if it is broken. With the current setup, if SLM breaks, I will run out of disk space in a couple of days because old indices will not be deleted.

In summary, at a minimum this should be documented and the behavior confirmed as intended. At most, I think it would be good to have 2 types of "waiting" for a snapshot: one as it works currently (for long-lived indices) and another for one-day indices, which would check that the index exists in at least one snapshot of the policy (no matter whether it is the last one or not) and would not check the snapshot creation time.
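The second, proposed type of waiting could be sketched like this in Python. It only illustrates the idea: `index_in_any_snapshot` is a hypothetical name, and each snapshot is modeled simply as the set of index names it contains.

```python
from typing import Iterable, Set


def index_in_any_snapshot(index: str, snapshots: Iterable[Set[str]]) -> bool:
    """Proposed check: pass if ANY snapshot of the policy contains the index.

    Snapshot creation time is deliberately ignored.
    """
    return any(index in indices for indices in snapshots)


# Snapshots of one SLM policy, oldest first; the index is only in the first.
policy_snapshots = [
    {"logs-2025.01.12"},
    {"logs-2025.01.13"},
]

in_advance_ok = index_in_any_snapshot("logs-2025.01.12", policy_snapshots)
missing_blocked = not index_in_any_snapshot("logs-2025.01.14", policy_snapshots)
print(in_advance_ok, missing_blocked)
```

With a check like this, a snapshot taken days in advance would still allow deletion, while an index that was never snapshotted would remain protected.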

murtll added the >enhancement and needs:triage labels on Jan 17, 2025