-
-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using cache.shared
in JavaScript rules can lead to unexpected java.lang.IllegalStateException: Multi threaded access requested by thread
#4413
Comments
Given this is only an issue for JS Scripting, this issue might need to be moved to the add-ons repo unless core is amenable to putting a lock on the shared cache for all languages. A lock on the private cache might be advisable as well given timers. If not, locks will need to be impelmented in the JS Scripting add-on similar to the locks already in place for timers, assuming such locks are feasible. Note that if you have rules that call other rules, there are also no locks in place in that case so if two rules call the same other rule at the same time the exception will occur and as far as I've been told it is not technically feasible to put locks in place in this context, at least not in the JS Scripting add-on by itself. @florian-h05, you are most familiar with the locking stuff that was done for JS Scripting so far, wdyt? |
As I work-around, I have been trying to create String Items to store values (as JSON) and Groups to simulate arrays of such values. This seems to mostly avoid the issue: I very infrequently encounter this exception now. It doesn't completely resolve it and does make the Items page near impossible to use (as explained in openhab/openhab-webui#2819). If I come up with a better work-around, I'll let you know. |
Wrt to the proposed solutions by OP, 2 and 3 are no option. Though I wonder if the shared cache does need a synchronisation in general, because even if the script engine allows multi-threaded access to it‘s context, modifying something in the shared cache by two different threads can lead to bad side-effects. If a solution is only to be implemented in JS Scripting (which however would not synchronise the access from scripts in other languages than JS), I think we could wrap core‘s shared cache and associate the lock of the engine with the value in the shared cache on value store. GraalJS approach to disallow multi-threaded access is sometimes a PITA, but I understand that decision given the side-effects that can occur and given that NodeJS is single-threaded as well.
I don‘t think that is needed, as the values in the private cache can only be accessed by their „owner“ engine. As timer execution is synchronised in the engine, it is not possible for a timer to execute while the script is active somewhere else. |
I assumed calls to
|
They are, just checked the core code 👍 |
@J-N-K May I ask for your help here? The best solution I can currently think of is that every value in the shared cache has an owner, that can provide a lock which has to be acquired if another script is accessing the value. The lock has to be unlocked after the execution of that other script. However this poses the risk that scripts block others from execution for a possibly long time. |
It sounds like that is going to be a solution that will cause unexpected and unexplained (e.g. hard to debug) freezes in rules. For instance, I have a rule that handles calendar events. The rule uses timers to do things a number of minutes before/after an event starts/ends. These timers are set as soon as the event is seen (i.e. one of the next two events on the calendar). This script may therefore set these timers hours or even days before they fire. This rule accesses values from the shared cache before setting these timers, so it will block access to those values pretty much continuously. If anyone has a similar rule, they will find themselves with a completely broken system when this fix is released. I'm sure that's not ideal. |
Wrong. The shared cache is only blocked while the script engine is executing code. In case of timers, you run your script and create the timer in that run. The timer is then scheduled, and once the timer expires the callback is run by the same engine. In between, this engine can do other stuff and if it is not doing anything, the stuff in the shared cache it owns can be accessed by others. |
I don't fully understand the issue. The shared caches uses locks on put/get/remove, so where do you see a problem here? |
You have two rules, both having their own GraalJS script engine. The problem here is not that the cache itself does not synchronize its operations, but that script B accesses an object that is owned by script A while script A is currently running and this is a illegal access to script A's context from script B's thread. |
IMO that's wrong usage of the cache. You should store unmodifiable objects there, and instead of modifying an existing object create a modified copy and put that to the cache. |
Indeed: I can get a reference to a single JS object instance in two rules through the shared cache. These rules can run in separate threads and JavaScript engines are designed to run single-threaded. Using the cache to access properties of a single object shared between two rules at the same time appears to result in exceptions. The example code I created writes to the object but the problem should exist if it reads properties as well. I will run a quick test to make sure. |
I've updated the repro code to only read a property and not modify the object:
Here's a sample of the errors I'm getting after starting
|
Note that sometimes I can catch an exception and sometimes I get the "Failed to execute rule..." error. |
I think the issue really is that you try to use an object that itself does not support multi-thread access. It's not an issue of the cache. Try something simple, put an integer to the cache, start with 0, increase after every read and put the new number to the cache. This should work perfectly fine. |
That's the problem with objects in Graal ... primitives should work perfectly fine.
👍 |
Sure that would work, but I'm trying to allow two rules to communicate with more than just integers. I've got a number of rules that want to play messages through audio, these store a string that refers to the mp3 file to be played in a shared array. A single rule control the audio player and removes the top entry from the array to play, and the the next, etc. I cannot implement that with a shared cache that only allows integers. |
Note that my original repro is based on an example from the documentation, which uses an object with a property that serves as a counter. This suggests that this is a totally normal thing to do and should not result in exceptions. See here: https://www.openhab.org/addons/automation/jsscripting/#cache |
This will result in issues. One rule will read the string, parse it to an array, add an mp3 file, stringify and set. Meanwhile another rule will do the same. Both get the same initial string from the cache, add a string, and write to the cache. The last one will overwrite the changes made by the first one. |
Now that I know what you want to achieve, try that out: Have one String Item. All rules that want to queue an mp3 file should sent the filename to this Item. |
I had that setup initially but dropped it for some reason - I don't recall what it was. I can try it again and find out why that didn't work for me. But if the shared cache should never contain anything other than primitives, that should be documented and it should throw an error when you try to do so IMHO. |
Sure - the docs are only that way because no one noticed this problem before. I will take care of documentation and to log a warning when putting an object into shared cache. |
This is almost exactly the use case I had when I created the Separation of Behavior and Gatekeeper design patterns and Gatekeeper class in openhab_rules_toole. In either case you'd only need the private cache, not the shared cache, so there won't be any multithreaded access problems. However, note that you will want to use a String Item for this and not calling the rule from another rule becuase that latter approach is not locked so if two rules attempt to call the same rule at the same time the multithreaded exception will occur. I still think having locks on the Objects in the shared cache is still a good idea because even if one used immutable Objects, the problem @SkyLined points out could still occur where two rules (regardless of the language) start at near the same time, each grab the value from the cache at near the same time, one modifies it and puts the modifications back into the cache. The other one modifies their now stale copy of the Object and puts that back into the cache, wiping out the changes from the first. This remains a potential problem regardless of the rules language. |
I think that there won't be a multithreaded exception, just tried the following: Have a rule A that has a JS script action with a sleep of 10 seconds to ensure the rule runs long enough. Have a rule B that attemps to run rule A and pass some information. Running B while A is running does not lead to a multithreaded exception but instead to:
|
Update docs and log a warning if done so. Refs openhab/openhab-core#4413. Signed-off-by: Florian Hotze <[email protected]>
Update docs and log a warning if done so. Refs openhab/openhab-core#4413. Signed-off-by: Florian Hotze <[email protected]>
I routinely see Multi-threaded exceptions in my config. Something is not being properly locked in the combination of rules running which create timers which call another rule. Assuming that rule locking and timer locking are working all that's left is two rules calling the same other rule at the same time. But I don't think that test case actually tests what I'm talking about. My internet is down right now so I can't test this just yet, but I'd test it as follows: Create a script or rule A with a JS Action and make sure that the script also has a variable or two. Put the sleep in that for sure at the end to make sure it ruins long enough but I don't know that the sleep is necessary. But now create rules B and C with a cron trigger and JS action that calls rule A at the same time with passed values, at least one of which is an Object (I don't know it that matters or not, primites may also cause the error). You need three rules to test the behavior I'm talking about because I don't think it has to do with the fact that rule A is already running as much as rule B and C are trying to call it at the same time. IIRC you've said that Or maybe some recent changes have changed this behavior and the engine now generates a new error? Im on milestone releases. I wish I could get to my logs right now. I'll try to remember to come back once I have internet again. |
I think I remember why I stopped using a single string Item to queue files and a rule that triggers when it's set: The rule needed a way to play files one after the other when two rules set the Item around the same time. For that I created a queue in a local variable, which did not cause problems, and started a timer to play the next file in the queue when the previous one was done. I experienced a lot of exceptions with these timers, which would break this setup. I attempted to work around that by creating a rule that runs every second in place of the timer, and use the shared cache to allow both rules to access the queue. I'll revert back to using timers and see if I can pinpoint all multi-threaded access issues I get with them. Fixing that would make life a lot easier for me, and I expect everyone that uses them. |
With the introduction of script pre-compilation, a regression wrt to multi-thread access synchronization has been introduced for UI-based scripts, which has been fixed just yesterday: openhab/openhab-addons#17510 If you have used UI-based rules for your first attempt, the above could apply to your problems. |
Gotta love Heisenbugs. I'll see about updating to the snapshot this weekend and see if my periodic multithreaded exceptions go away.
They way I've set this up in the past was something along these lines. I trigger the rule based on command to a String Item. The code for the Action is along these lines:
The gatekeeper is in the private cache and it manages a queue of commands and a Timer to run a command (a passed in function) and wait the timeout time minus how ever long it took to run the command function before running the next command on the queue. I don't think I ever experienced a multithreaded exception from Gatekeeper in my setup, so this approach might work around the regression that @florian-h05 just fixed too. Though it could also just be a fluke of timing. |
Expected Behavior
The
cache.shared
object is documented as allowing any script to access it and "share" information. The documentation comes with an example forcache.private
that we can adjust to usecache.shared
:We would expect the above code to work when used by two rules; every time either of the rules fires, the code is executed, the counter increased and the old value logged.
Current Behavior
cache.shared
allows sharing JavaScript objects between two rules. JavaScript code for two different rules will get executed in two different threads. The JavaScript engine used by OpenHAB is not designed to be run in multiple threads simultaneously. The Java code is aware of this and will throw ajava.lang.IllegalStateException
exception when two threads attempt to access the same JavaScript object. The message associate with this exception is:This exception unexpectedly stop the execution of the rule using
cache.shared
and breaks its functionality.Possible Solution
cache.shared
to share information between multiple rules. Obviously, limiting the amount of sharing seriously reduces the usefulness ofcache.shared
, so this is more of a stop-gap than a solution IMHO.Steps to Reproduce (for Bugs)
This issue quite easy to reproduce If we create two JavaScript rules that accesses a shared object over a prolonged period and execute them at nearly the same time. In this set-up, the JavaScript code is almost guaranteed to simultaneously attempt to access the object, which triggers the exception. Here's a step by step guide to do this:
Test1
andTest2
Test1
that fires every second (cron:* * * * * ? *
)Test2
does not need a trigger: you will run it manually later.Test1
andTest2
to execute the same JavaScript code (found below).Test1
(the cron rule)Test2
Here is the code used in both rules:
Here is an example log output showing the exception after starting the first rule and manually running the second:
The exception can happen in either
Test1
orTest2
depending on timing.Context
I am creating a queue for audio files to be played. The queue is stored in an
Array
incache.shared
. Any rule can useArray.push()
to add a file to the queue. A dedicated rule uses a timer to repeatedly check if it is time to play the next file and usesArray.shift()
to get the next file from the queue and send it to the player. This system breaks when two rules try to add or remove files to/from the queue at the same time. These exceptions make the system unreliable.Your Environment
The text was updated successfully, but these errors were encountered: