-
Notifications
You must be signed in to change notification settings - Fork 447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] Support concurrent updates for Zep users #326
Comments
Thanks for raising this. We candidly did not design user metadata handling with high concurrency requirements in mind. It would be helpful to understand how you're using user metadata. What data are you storing in the user object? |
Hi @danielchalef , thanks for your quick response. My data flow is the following:
So in the user metadata I save:
What happens is that sometimes both async tasks or even "main" thread updates, and one of the async tasks attempt to update at the same time and the app breaks because of APIError (produced by the advisory lock in pg db). Ideally, one of the concurrent updates should keep waiting for the release to happen. Or, at least, the API should provide a method like |
Candidate fix in #329 |
@nicoeiris11 Zep v0.24.0 includes an experimental approach to locking user metadata that should cope better with high-concurrency updates to the same user record. Please try it out and let me know if this fixes your issue. We'll apply this fix more widely if so. |
@danielchalef, thank you so much for the quick fix. I tested v.0.24.0 and the number of failed updates decreased significantly, but still had some cases in which Zep failed after the 3 retries. Is it possible to experiment with a retry policy with exponential backoff and more time between attempts? I appreciate your time and dedication to improving the user experience/development of the tool. |
Good point. An exponential backoff may work better here. I'm loath to increase the max backoff time beyond ~10 seconds, preferring the client time out and retry. See #330 |
v0.25.0 - Use Exponential Backoff for Metadata Lock Fails is building. Please let me know your thoughts once you've had a chance to check it out. |
@danielchalef thank you so much for releasing a new version with the changes so fast. I'll let you know as soon as I have updates from the new tag testing. |
@danielchalef I want to confirm that with v0.25.0 my load test passed 100% ok. There will always be a threshold of requests load that will cause Zep to return time out due to the number of concurrent DB operations. But from my last experiments, I didn't notice ApiError due to locks anymore (only time outs when I manually force a very high load scenario). I want to thank you for your hard work and dedication to maintaining the repo and addressing developers' issues. |
@nicoeiris11 Great to hear! If you're using Zep's default |
@danielchalef Yes, I'm using docker-compose in production pointing to stable tag release. |
The Postgres website has good guidance on tuning. I've also found this tool useful: https://pgtune.leopard.in.ua/ |
Is your feature request related to a problem? Please describe.
I'm having a lot of problems IN PROD updating Zep users in async tasks. I run Celery tasks to process data and save it to Users metadata. Different async tasks could potentially update the same user metadata at the same time. I'm aware that Zep implements DB advisory locks in the update as follows:
However, not having control in my app over this behavior makes my data inconsistent and generates many tasks to fail on my side (Zep ApiError).
Describe the solution you'd like
I'd like better handling of locks on Zep side, implementing:
Describe alternatives you've considered
Either having the Zep user update operation to wait until the lock is released to update the user, or , another strategy could be providing a method to check if a user is locked, so the client can wait until the user is locked to perform the next update op.
Also, having a method to force the release of a lock if something is wrong on Zep side could also be useful.
Additional context
Currently, the solution I implemented which is not working, is each time I want to perform an update, I do the following:
This is the way I found to handle concurrent updates in my Celery async tasks.
But still having issues for many cases. Any thoughts? Any better possible solution?
Thank you so much and great work guys!!
The text was updated successfully, but these errors were encountered: