Careful! Confusing C# Concept: Closures Capturing Crap Carries Confounding Caveats, Creating Curiously Complex Conditions #3226
Replies: 1 comment
-
A note about another way to help avoid the problem, that might sometimes look like a silly or pointless optimization in some changes I've submitted: If you can make your lambda static, your lambda is almost certainly immune to the problem. A static delegate can't capture non-static variables, meaning it can't capture local variables or parameters. Thus, it almost always protects you from the problem. Trying to make lambdas static is a worthwhile goal for that reason alone. That does, however, also prevent you from using the copy trick, so it's not always an option, depending on how much control you have over the signature of the delegate itself. |
Beta Was this translation helpful? Give feedback.
-
This one is a fun concept that crops up from time to time and can be easy to miss even in tests (especially if no test sets up the problematic situations, which can be hard to think up sometimes).
Capture of variables in closures.
What is it?
Any time you have a closure, such as but probably not limited to a lambda, anonymous method, or local method (all of which are delegates)...
If the code inside that closure accesses a variable or parameter from its enclosing code, that's capturing a variable.
So long as that value doesn't change outside the scope of the delegate, and so long as the delegate doesn't make changes to the variable that the enclosing code might depend on1, you're probably fine.
But, if it does, and the analyzers can detect it (they can for quite a wide range of scenarios), Visual studio warns you about it for some very good reasons.
For one thing, it results in the actual code emitted by the compiler being a lot different than what you wrote. It'll be a method that mostly looks similar to the delegate you made, but it will belong to a compiler-generated class and might not have a signature that looks quite like what you wrote. And the class will contain fields for each of the captured variables and will usually be constructed ONCE and reused. That, by itself, is nasty, and is a key piece of the bad behavior that can result. If you've ever seen that
<>c__DisplayClass0_0
type stuff in a stack trace, that's that mechanism in action.Another potentially problematic issue is that a reference captured by a closure could outlive the variable it refers to, with all sorts of possible bad effects - usually NullReferenceExceptions or ObjectDisposedExceptions, but potentially even access to a completely different instance of the object. Especially with things like user-initiated events either raised by actual event handling code or via the Command pattern (both of which are just invoking supplied delegates at a later time), this can crop up in weird places at unknown later times, leading to anything from memory leaks (common, at least temporarily) to unexpected behavior to program crashes or data corruption, any of which can be heisenbugs.
The big and really non-obvious result of the way it all works is that, especially if your closure performs any kind of late evaluation or late binding2, even if references aren't borked or it's a value type, what actually happens may not be what you expect, due to how and when things are evaluated. It's not restricted to LINQ, though, and the consequences of it may be difficult to realize without actually experiencing the resulting behavior.
What can happen?
Well, take this simple code that just creates some action delegates in a quick for loop:
What's the output? Looks like it should be 0 to 4, right? After all, we created all those delegates in the loop and explicitly gave them the then-current value of aVariableToCapture, right? And that's a value type, so it's immutable and there's no issue because we therefore got a copy, right?
Well... Of course I wouldn't be writing this if any of that were true.
The actual output will be:
Wait... What? Why are they all the same? And, furthermore, why are they all 5? Our loop didn't even go to 5.
Well... That's why visual studio and resharper were giving you the warning about a captured variable being modified in an outer scope.
Neither those lambdas nor anything at all inside them were executed or evaluated in-line. That becomes more obvious when you realize the output didn't happen til the second loop.
Behind the scenes, each of those got stuck into an instance of a compiler-generated class that has an int field and a method with the body as written3. Here's what it looks like, before becoming IL:
Condensing that all down, what we get is five delegates that are all just the same method on the same instance of the same compiler-generated class. So, clearly invoking any one of them is going to do the same thing as any of the others.
Ok. So why is the output 5?
That was a post-increment operation in the loop. The final evaluation of the for loop increments the variable but then fails the check, so it does not execute. But the variable is now 5 (which is why the check failed).
So, when we call the methods, we get 5 for each one.
Ok, but who does that?
Actually, it's all over the place in a lot of code, including in this project. I'm guilty of it sometimes, too, though I do usually at least check to make sure what I just wrote won't be susceptible to the problem (because you certainly can avoid the issue so long as you don't depend on captured values at any later point). The analyzer isn't always capable of telling when your use of a captured variable is ok, so it takes the safe option of just warning you any time you're using a captured variable that gets modified outside the scope of the closure.
It doesn't have to be in a loop, either. That's just an easy way to force the problem for illustration purposes. You can encounter the issue in linear code, as well. In fact, just unroll the loop in the above code, and you'll still get exactly the same behavior, so long as you're feeding your closures the variable.
How do I know if my code is susceptible to it?
Well, the safest thing is to assume that, if you see the warning, you have the problem. If you copy the value into a temporary variable inside the closure, the problem will go away. See this code, which has a 1-and-a-partial-line difference from the original, but outputs the expected values:
This correctly outputs:
And, if we look at the resulting low-level C# code, it's clear why that is:
Note that now we are getting individual instances of the generated class, each constructed inside the loop, with a copy of the loop variable at that iteration.
This is a fix that VS and ReSharper are capable of making for you, though you'll typically want to rename the new local variable it creates right before your closure.
If you fully understand the way your code will behave in all scenarios with all legal inputs, and you are certain you can ignore the problem, then it may be acceptable to ignore it for performance purposes. Otherwise, copying the value is a good thing to do.
But what if the captured variable is a reference type?
Then things can get even more hairy and hard to predict, trace, or debug. Obviously, you can't simply copy a reference type4 by declaring a new reference to it. That's just a new name for the same reference. Typically, it's ideal to do your best to not pass references into closures via capture if it is at all avoidable, unless you can be 100% certain those references will not only still be valid and not disposed, but that dereferencing them when the delegate actually executes will result in operating on the instance you thought it was, and that the values of the members of that instance will be what you thought when you created the delegate.
The key is to remember that, in all cases, any non-constant or non-local value (meaning it came from a different/higher scope5) or reference given to a closure will be evaluated at the time the closure itself is executed, meaning when the event is raised or the action delegate is invoked.
It's critical for this behavior to be correct in the library, because consuming code depends on deterministic behavior, and things can get even more out of control if a consumer's code makes that kind of mistake on top of that mistake being made in the Terminal.Gui code they are interacting with. All bets are off if that situation ever arises. While bad consumer code isn't our fault, making it worse because we didn't handle closures correctly is on us.
This is a common source of heisenbugs in some projects that don't deal with this issue. And they're difficult to reproduce unless someone who hits it is willing to share specific-enough code to hit the behavior. They're also often not the easiest things to write unit tests for, since you have to already know what the code is vulnerable to and then intentionally do something that looks right but is actually wrong, which explodes rapidly in complexity as the complexity of the method under test increases. Big LINQ chains, in particular, can get REALLY nasty, here, because you can have multiple levels of capture happening in your method call chain or nested method calls. And then materializing them introduces weird sync points that further obfuscate the source of weird behavior.
TL;DR, please?
If visual studio or any tool warns you about a captured variable, and you are not 100% sure of how to handle it properly, do the following:
Other notes
Remember that this is a library project for use in other software. We don't get to make assumptions about threading/concurrency, so we should always assume a multi-threaded situation and that there may be multiple concurrent or reentrant executions of the same code path, on one to infinite threads, either foreground or background.
This issue, in particular, is a NASTY one when it interacts with thread safety issues or is the root cause of them, because now things can happen in a completely unpredictable order.
A multi-threaded re-implementation of the first code sample here, for example, could output anything from the already wrong output to things that don't even look like integers, and the order of that garbage is not even predictable. It's even possible for it to occasionally output the correct text.
If something is written that we are aware is not thread-safe, it's a good idea for that to be mentioned in the XmlDoc comments, in a dedicated
<para></para>
block inside the remarks section, with a heading in bold like<b>Thread Safety:</b>
followed by the relevant notes.Alrighty. That's enough for now. Just wanted to point this quirk of the language and runtime out, since I recently had to explain it to a former colleague and then, coincidentally, came across a reddit thread about it earlier tonight, as well.
Footnotes
This one is highly situational. It's fine in a bunch of cases and not fine in a bunch of other cases. Be careful and, if VS warns you, pay attention and respect the warning. ↩
Which is often the case with LINQ methods, and is always the case with events and "Command" dispatch, but appears all over the place in common code and is easily overlooked. ↩
This is how ALL lambdas and other anonymous delegates and anonymous types work - it always results in a compiler-generated class. ↩
record
types come with implicit compiler-generated copy constructors accessible via thewith
operator, but they are by-value shallow copies (values of references are copied, not the instances they refer to, and value types are copied because they already are values), so keep that in mind when using that. ↩Remember that the scope of the contents of the parentheses of some control flow statement are the same as that of the keyword that began that statement and not of the following block (such as
if
,case
, and other conditionals), and some are scoped to the statement and its children, such as most cases offor
andforeach
loop variables. ↩Beta Was this translation helpful? Give feedback.
All reactions