Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Garbage Collect old persisted queries #217

Closed
jasonbahl opened this issue May 5, 2023 · 4 comments · Fixed by #227
Closed

Garbage Collect old persisted queries #217

jasonbahl opened this issue May 5, 2023 · 4 comments · Fixed by #227
Assignees
Labels
type: enhancement Improvements to existing functionality

Comments

@jasonbahl
Copy link
Collaborator

When using Automated Persisted Queries, the query documents are stored but never cleaned up, so anytime clients change the documents they're using the old ones stick around.

I believe that Query Documents should update their modified timestamp any time they're executed against, and any document that's not been executed in xx amount of time should be deleted.

Query Documents should be able to be marked as excluded from automatic garbage collection as well.

@jasonbahl jasonbahl added the type: enhancement Improvements to existing functionality label May 5, 2023
@jasonbahl jasonbahl self-assigned this May 5, 2023
@jmotes
Copy link

jmotes commented May 5, 2023

Thanks for this. I just set up automatic persisted queries in our project yesterday and noticed that the persisted queries list grows pretty quickly as changes to queries are pushed up and was wondering if there was a mechanism to purge them.

If I can beta test or anything just let me know!

@jasonbahl
Copy link
Collaborator Author

Some questions:

If I create a query in the admin, should that query be garbage collected? Or should it be ignored from garbage collection? How can I flag a manually created query to not be garbage collected (or flag APQ queries to be garbage collected?)

Do we use the "created" date (or modified date?), to determine expiration or something else?

  • for example, we could have a cron job or something run daily that checks for queries created xx time before now and deletes all of them. Or we could look for queries with an expiration (meta field or other?) (I'm leaning toward created or modified)

  • one thought here is that we use the "modified" date. Any query that's not been modified in xx amount of time would be deleted. When a Cache Miss occurs and the query is executed and resolved, we could update the modified time at that point, indicating that the query is still active by the client and would make it ineligible for garbage collection

  • or we don't care about the modified time. We just garbage collect and the client will "pay the price" to create the query again if it's already been deleted

What should the default time be for garbage collection?

  • 30 days? filterable / override via a setting. (i.e. 30 days after post_modified / post_create date, query is deleted, unless flagged to not be garbage collected (posts manually created in the admin?))

@markkelnar
Copy link
Contributor

To answer questions from above.

By default garbage collection is off globally. Also when the plugin is initially activated. Unless the toggle setting is already present in the database as on.

The garbage collection toggle when enabled, will register the cron job.
When the job runs, it will verify that the admin setting is still set to on. Cause in my brief development testing, I saw a scenario where the admin setting was disabled but the cron job was still scheduled.

What about queries created in WP admin editor?

We do not have a working example of someone who uses WP admin editor created/managed and APQ queries for the same site. The proposal is to add a taxonomy field per query visible in the WP admin editor to designate the query as 'skip garbage collection'. For this issue, I vote that we wait until the need arises.

What if queries are saved in the system before this garbage collection feature that the user wants to skip garbage collection? After installing this new feature but before toggle on enabling garbage collection, the user should designate those specific queries to skip garbage collection, using whatever feature/interface is created.

Use the modified date for the post to determine age of the query in the system.

Limit queries loaded/deleted in one cron job.
Do not load ALL queries meeting the expiration/age criteria. Instead, load a smaller number of queries.
We do not have specific performance numbers to accurately know the right number. Anywhere less than 72k at a time sounds right. What about two batches of 36k? Smaller sized batches that are run frequently enough to garbage collect at a rate fast enough to mitigate the garbage creation.

@jasonbahl
Copy link
Collaborator Author

After reviewing #227, it made me think that we could potentially address 2 issues at once:

The PR (#227) opened to resolve this issue registers a taxonomy and allows a term to be saved to it. If a document has that term saved to it, then that document is skipped during garbage collection.

I think if we re-name the Taxonomy to something like "Groups" (what Paw / RapidAPI client calls groups of requests) or "Collections" (what Postman calls groups of requests) then we could resolve both issues at once.

In addition, we'd be able to delete the code we have in the PR that adds a custom meta box, checkbox and nonce for storing the term, and instead could make use of core WordPress UIs for associating the "graphql_document" post types with a "Group" / "Collection" taxonomy.

something like:

CleanShot 2023-07-19 at 12 05 08@2x

CleanShot 2023-07-19 at 12 05 19@2x

The logic to handle garbage collection would remain essentially the same, but instead of checking for graphql_documents that are NOT IN the individual term, we would check for documents that are not in any term (would need testing, but I believe the technique shown here would work: https://wordpress.stackexchange.com/questions/114978/get-all-posts-without-tags#answer-333485)


I believe by making this change, we would serve the current users that would benefit from garbage collection, while also providing added value to users (including myself and others see: https://twitter.com/wpgraphql/status/1678973750048616449?s=20) that use the Admin UIs to create GraphQL Documents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement Improvements to existing functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants