-
Notifications
You must be signed in to change notification settings - Fork 698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix a logic error that caused AbandonedMutexException while executing migrations #4859
Conversation
test/NuGet.Core.Tests/NuGet.Common.Test/MigrationRunnerTests.cs
Outdated
Show resolved
Hide resolved
bool signal = mutex.WaitOne(TimeSpan.FromMinutes(1), false); | ||
if (signal && !File.Exists(expectedMigrationFilename)) | ||
bool captured; | ||
try |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand why the try catch is not wrapping the complete method below.
Is there a particular reason for that?
imo, it'd be cleaner that way, but maybe you had a reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed a commit introducing this change to address https://github.com/NuGet/NuGet.Client/pull/4859/files/1f4771d337b4e4eba1d7360715fa883a93234f37#r1000018933 comment. Happy to get your feedback and adjust functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my suggestion is
try
{
if (!File.Exists(expectedMigrationFilename))
{
// Only run migrations that have not already been run
int highestMigrationRun = GetHighestMigrationRun(migrationsDirectory);
for (int i = highestMigrationRun + 1; i < Migrations.Count; i++)
{
try
{
Migrations[i]();
// Create file for every migration run, so that if an older version of NuGet is run, it doesn't try to run
// migrations again.
string migrationFile = Path.Combine(migrationsDirectory, (i + 1).ToString(CultureInfo.InvariantCulture));
File.WriteAllText(migrationFile, string.Empty);
}
catch { }
}
}
mutex.ReleaseMutex();
}
catch (AbandonedMutexException ex)
{
ex.Mutex?.ReleaseMutex();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zivkan raised a good question in #4859 (comment) comment, saying that My understanding of the docs is that AbandonedMutexException means that the current process successfully obtained the mutex, just that the other process failed to release it cleanly. If the other process (or thread?) was killed, then migrations might not have completed, so the current thread should probably ensure that all migrations completed successfully.
AbandonedMutexException
is only raised by Mutex.WaitOne
method. If that method invocation is not in the try block, then it may not fix this actual issue we are trying to address in this PR. Hence, I modified the location of catch block in the new commit so that migrations can run one more time incase if the previous thread didn't run successfully.
Please let me know if I missed anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be more clear to readers if you extract the try-catch into a method named WaitForMutex
. Though a comment on the catch AbandonedMutexException
to explain that it obtained the mutex, despite the exception, would be very helpful too since it's really unintuitive to anyone who isn't familiar with this already.
} | ||
catch (AbandonedMutexException ex) | ||
{ | ||
ex.Mutex?.ReleaseMutex(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't release the mutex until after the migration is complete, otherwise there can be two migrations running at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should really just lean towards a try/catch/finally
where we release the mutex in the finally.
Wouldn't that solve the other logical error too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Andy's feedback is still valid in the latest iteration.
This is a logical error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree that I didn't address this comment yet. So far in my debugging experience "ex.Mutex" was always null when the exception was thrown. Docs also state that this property will be null if the abandoned mutex can't be found. https://learn.microsoft.com/en-us/dotnet/api/system.threading.abandonedmutexexception.mutex?view=net-7.0 docs states that "If the exception is thrown on a call to WaitOne or WaitAll, this property always returns null." I think "ex.Mutex" will always be null in our case.
I'm leaning removing "ex.Mutex?.ReleaseMutex();" method call and return true so that migrations can run. If another thread throws mutex abandoned exception there is a chance of running migration logic one more time only when previous thread did not create migration file yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the try block, we use mutex.WaitOne
, so yes, it looks like we're in the "this property always returns null" scenario.
However, even if that wasn't true, if used WaitAny
and it returns a non-null mutex, we just waited on the "NuGet-Migrations" mutex, and AbandonedMutex
is telling us that we successfully got the exclusive access to "NuGet-Mutex", but the reason we captured it is because the previous holder didn't release it gracefully. Now we have two references to the "NuGet-Migrations" mutex, mutex
and ex.Mutex
. I don't know if they're the same instance or different instances, but they're representing the same win32 mutex. Therefore, releasing it immediately on capturing exclusive access to the mutex, rather than waiting until after the work that should be only done by one process/thread at a time, is still a bug.
Afterall, WaitHandle.WaitAny
is similar to Task.WhenAny
, in that it needs to return to you which handle/task in an array was completed, so the app can take action depending on which one is complete. The complication here is that WaitAny
can't return a value when the AbandonedMutexException
is thrown, so the "return value" (which mutex, from a list) is via the exception property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the doc more closely, I think we don't need ex.Mutex?.ReleaseMutex()
because as explained above it will be always null when Mutex.WaitOne
throws AbandonedMutexException
. The example in the docs also confirm that https://learn.microsoft.com/en-us/dotnet/api/system.threading.abandonedmutexexception?view=net-7.0#examples.
I copied the trimmed version of example from docs below for reference. The following points can be noticed in the below example.
ex.Mutex?.ReleaseMutex()
is not invoked in thecatch
block forAbandonedMutexException
.ReleaseMutex();
is invoked in the finally block. The comment saysWhether or not the exception was thrown, the current thread owns the mutex, and must release it.
- In our case (I mean the changes proposed in this PR), we only invoke
ReleaseMutex();
method whenWaitForMutex()
method returnstrue
. https://learn.microsoft.com/en-us/dotnet/api/system.threading.mutex.releasemutex?view=net-7.0#remarks also states thatIf the attempt to get ownership of the mutex fails (for example, when a call to the WaitOne method with a millisecondsTimeout or a timeout parameter returns false because the request times out), the thread shouldn't call ReleaseMutex
.
public class Example
{
private static Mutex _orphan1 = new Mutex();
[MTAThread]
public static void Main()
{
// Start a thread that takes the mutex, and then
// ends without releasing them.
Thread t = new Thread(new ThreadStart(AbandonMutex));
t.Start();
// Make sure the thread is finished.
t.Join();
// Wait on one of the abandoned mutexes. The WaitOne returns
// immediately, because its wait condition is satisfied by
// the abandoned mutex, but on return it throws
// AbandonedMutexException.
try
{
_orphan1.WaitOne();
Console.WriteLine("WaitOne succeeded.");
}
catch(AbandonedMutexException ex)
{
Console.WriteLine("Exception on return from WaitOne." +
"\r\n\tMessage: {0}", ex.Message);
}
finally
{
// Whether or not the exception was thrown, the current
// thread owns the mutex, and must release it.
//
_orphan1.ReleaseMutex();
}
}
[MTAThread]
public static void AbandonMutex()
{
_orphan1.WaitOne();
// Abandon the mutexes by exiting without releasing them.
Console.WriteLine("Thread exits without releasing the mutexes.");
}
}
bool signal = mutex.WaitOne(TimeSpan.FromMinutes(1), false); | ||
if (signal && !File.Exists(expectedMigrationFilename)) | ||
bool captured; | ||
try |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be more clear to readers if you extract the try-catch into a method named WaitForMutex
. Though a comment on the catch AbandonedMutexException
to explain that it obtained the mutex, despite the exception, would be very helpful too since it's really unintuitive to anyone who isn't familiar with this already.
f8ed6e5
to
43e303d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Thanks for fixing this!
Bug
Fixes: NuGet/Home#12159
Regression? Last working version:
Description
In 0966444, we added logic to fix permissions on non-Windows platforms which is executed only once per machine. Customers have reported a regression after October 2022 servicing release dotnet/sdk#28488. In this PR, I propose changes to fix the logical error that caused
AbandonedMutexException
when multiple migrations code is invoked in parallel.PR Checklist
PR has a meaningful title
PR has a linked issue.
Described changes
Tests
Documentation