-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix LRU crash when getting too many random lua scripts #1310
base: unstable
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks. This make sense to me, btw, can we do something in defrag.c to resume this pointer?
please fix the DCO btw
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #1310 +/- ##
============================================
+ Coverage 70.55% 70.69% +0.14%
============================================
Files 115 115
Lines 63158 63158
============================================
+ Hits 44561 44652 +91
+ Misses 18597 18506 -91
|
02c4673
to
6222d74
Compare
Signed-off-by: Seungmin Lee <[email protected]>
6222d74
to
81e4cd1
Compare
Thanks for the comment. Are you saying about tracking references during defragmentation? if not, can you elaborate on your questions please? |
@@ -203,6 +203,7 @@ void scriptingInit(int setup) { | |||
* sha with the dictionary, so free fn is not set. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment need to update
yean, something like that. |
Problem
Valkey stores scripts in a dictionary (lua_scripts) keyed by their SHA1 hashes, but it needs a way to know which scripts are least recently used. It uses an LRU list (lua_scripts_lru_list) to keep track of scripts in usage order. When the list reaches a maximum length, Valkey evicts the oldest scripts to free memory in both the list and dictionary. The problem here is that the sds from the LRU list can be pointing to already freed/moved memory by active defrag that the sds in the dictionary used to point to. It results in assertion error at this line
Solution
If we duplicate the sds when adding it to the LRU list, we can create an independent copy of the script identifier (sha). This duplication ensures that the sha string in the LRU list remains stable and unaffected by any defragmentation that could alter or free the original sds. In addition, dictUnlink doesn't require exact pointer match(ref) so this change makes sense to unlink the right dictEntry with the copy of the sds.
Reproduce
To reproduce it with tcl test:
Crash info
Crash report:
Defrag info:
Test:
Run the test script to push 100,000 scripts to ensure the LRU list keeps 500 maximum length without any crash.