Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix a logic error that caused AbandonedMutexException while executing migrations #4859

Merged
merged 11 commits into from
Oct 24, 2022

Conversation

kartheekp-ms
Copy link
Contributor

Bug

Fixes: NuGet/Home#12159

Regression? Last working version:

Description

In 0966444, we added logic to fix permissions on non-Windows platforms which is executed only once per machine. Customers have reported a regression after October 2022 servicing release dotnet/sdk#28488. In this PR, I propose changes to fix the logical error that caused AbandonedMutexException when multiple migrations code is invoked in parallel.

PR Checklist

  • PR has a meaningful title

  • PR has a linked issue.

  • Described changes

  • Tests

    • Automated tests added
  • Documentation

    • N/A

@kartheekp-ms kartheekp-ms requested a review from a team as a code owner October 19, 2022 23:03
src/NuGet.Core/NuGet.Common/Migrations/Migration1.cs Outdated Show resolved Hide resolved
src/NuGet.Core/NuGet.Common/Migrations/Migration1.cs Outdated Show resolved Hide resolved
bool signal = mutex.WaitOne(TimeSpan.FromMinutes(1), false);
if (signal && !File.Exists(expectedMigrationFilename))
bool captured;
try
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why the try catch is not wrapping the complete method below.

Is there a particular reason for that?
imo, it'd be cleaner that way, but maybe you had a reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a commit introducing this change to address https://github.com/NuGet/NuGet.Client/pull/4859/files/1f4771d337b4e4eba1d7360715fa883a93234f37#r1000018933 comment. Happy to get your feedback and adjust functionality.

Copy link
Member

@nkolev92 nkolev92 Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my suggestion is

                    try
                    {
                        if (!File.Exists(expectedMigrationFilename))
                        {
                            // Only run migrations that have not already been run
                            int highestMigrationRun = GetHighestMigrationRun(migrationsDirectory);
                            for (int i = highestMigrationRun + 1; i < Migrations.Count; i++)
                            {
                                try
                                {
                                    Migrations[i]();
                                    // Create file for every migration run, so that if an older version of NuGet is run, it doesn't try to run
                                    // migrations again.
                                    string migrationFile = Path.Combine(migrationsDirectory, (i + 1).ToString(CultureInfo.InvariantCulture));
                                    File.WriteAllText(migrationFile, string.Empty);
                                }
                                catch { }
                            }
                        }
                        mutex.ReleaseMutex();
                    }
                    catch (AbandonedMutexException ex)
                    {
                        ex.Mutex?.ReleaseMutex();
                    }

Copy link
Contributor Author

@kartheekp-ms kartheekp-ms Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zivkan raised a good question in #4859 (comment) comment, saying that My understanding of the docs is that AbandonedMutexException means that the current process successfully obtained the mutex, just that the other process failed to release it cleanly. If the other process (or thread?) was killed, then migrations might not have completed, so the current thread should probably ensure that all migrations completed successfully.

AbandonedMutexException is only raised by Mutex.WaitOne method. If that method invocation is not in the try block, then it may not fix this actual issue we are trying to address in this PR. Hence, I modified the location of catch block in the new commit so that migrations can run one more time incase if the previous thread didn't run successfully.

Please let me know if I missed anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more clear to readers if you extract the try-catch into a method named WaitForMutex. Though a comment on the catch AbandonedMutexException to explain that it obtained the mutex, despite the exception, would be very helpful too since it's really unintuitive to anyone who isn't familiar with this already.

}
catch (AbandonedMutexException ex)
{
ex.Mutex?.ReleaseMutex();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't release the mutex until after the migration is complete, otherwise there can be two migrations running at the same time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should really just lean towards a try/catch/finally where we release the mutex in the finally.
Wouldn't that solve the other logical error too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Andy's feedback is still valid in the latest iteration.

This is a logical error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree that I didn't address this comment yet. So far in my debugging experience "ex.Mutex" was always null when the exception was thrown. Docs also state that this property will be null if the abandoned mutex can't be found. https://learn.microsoft.com/en-us/dotnet/api/system.threading.abandonedmutexexception.mutex?view=net-7.0 docs states that "If the exception is thrown on a call to WaitOne or WaitAll, this property always returns null." I think "ex.Mutex" will always be null in our case.

I'm leaning removing "ex.Mutex?.ReleaseMutex();" method call and return true so that migrations can run. If another thread throws mutex abandoned exception there is a chance of running migration logic one more time only when previous thread did not create migration file yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the try block, we use mutex.WaitOne, so yes, it looks like we're in the "this property always returns null" scenario.

However, even if that wasn't true, if used WaitAny and it returns a non-null mutex, we just waited on the "NuGet-Migrations" mutex, and AbandonedMutex is telling us that we successfully got the exclusive access to "NuGet-Mutex", but the reason we captured it is because the previous holder didn't release it gracefully. Now we have two references to the "NuGet-Migrations" mutex, mutex and ex.Mutex. I don't know if they're the same instance or different instances, but they're representing the same win32 mutex. Therefore, releasing it immediately on capturing exclusive access to the mutex, rather than waiting until after the work that should be only done by one process/thread at a time, is still a bug.

Afterall, WaitHandle.WaitAny is similar to Task.WhenAny, in that it needs to return to you which handle/task in an array was completed, so the app can take action depending on which one is complete. The complication here is that WaitAny can't return a value when the AbandonedMutexException is thrown, so the "return value" (which mutex, from a list) is via the exception property.

Copy link
Contributor Author

@kartheekp-ms kartheekp-ms Oct 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the doc more closely, I think we don't need ex.Mutex?.ReleaseMutex() because as explained above it will be always null when Mutex.WaitOne throws AbandonedMutexException. The example in the docs also confirm that https://learn.microsoft.com/en-us/dotnet/api/system.threading.abandonedmutexexception?view=net-7.0#examples.

I copied the trimmed version of example from docs below for reference. The following points can be noticed in the below example.

  • ex.Mutex?.ReleaseMutex() is not invoked in the catch block for AbandonedMutexException.
  • ReleaseMutex(); is invoked in the finally block. The comment says Whether or not the exception was thrown, the current thread owns the mutex, and must release it.
  • In our case (I mean the changes proposed in this PR), we only invoke ReleaseMutex(); method when WaitForMutex() method returns true. https://learn.microsoft.com/en-us/dotnet/api/system.threading.mutex.releasemutex?view=net-7.0#remarks also states that If the attempt to get ownership of the mutex fails (for example, when a call to the WaitOne method with a millisecondsTimeout or a timeout parameter returns false because the request times out), the thread shouldn't call ReleaseMutex.
public class Example
{
    private static Mutex _orphan1 = new Mutex();

    [MTAThread]
    public static void Main()
    {
        // Start a thread that takes the mutex, and then
        // ends without releasing them.        
        Thread t = new Thread(new ThreadStart(AbandonMutex));
        t.Start();
        // Make sure the thread is finished.
        t.Join();

        // Wait on one of the abandoned mutexes. The WaitOne returns
        // immediately, because its wait condition is satisfied by
        // the abandoned mutex, but on return it throws
        // AbandonedMutexException.
        try
        {
            _orphan1.WaitOne();
            Console.WriteLine("WaitOne succeeded.");
        }
        catch(AbandonedMutexException ex)
        {
            Console.WriteLine("Exception on return from WaitOne." +
                "\r\n\tMessage: {0}", ex.Message);
        }
        finally
        {
            // Whether or not the exception was thrown, the current
            // thread owns the mutex, and must release it.
            //
            _orphan1.ReleaseMutex();
        }
    }

    [MTAThread]
    public static void AbandonMutex()
    {
        _orphan1.WaitOne();
        // Abandon the mutexes by exiting without releasing them.
        Console.WriteLine("Thread exits without releasing the mutexes.");
    }
}

bool signal = mutex.WaitOne(TimeSpan.FromMinutes(1), false);
if (signal && !File.Exists(expectedMigrationFilename))
bool captured;
try
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more clear to readers if you extract the try-catch into a method named WaitForMutex. Though a comment on the catch AbandonedMutexException to explain that it obtained the mutex, despite the exception, would be very helpful too since it's really unintuitive to anyone who isn't familiar with this already.

@kartheekp-ms kartheekp-ms force-pushed the dev-kartheekp-ms-oct22-patch branch from f8ed6e5 to 43e303d Compare October 21, 2022 23:55
zivkan
zivkan previously approved these changes Oct 24, 2022
Copy link
Member

@zivkan zivkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Thanks for fixing this!

@kartheekp-ms kartheekp-ms merged commit 41c08c5 into dev Oct 24, 2022
@kartheekp-ms kartheekp-ms deleted the dev-kartheekp-ms-oct22-patch branch October 24, 2022 18:18
@kartheekp-ms kartheekp-ms mentioned this pull request Oct 25, 2022
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: MSB4242: SDK Resolver Failure due to System.Threading.AbandonedMutexException
3 participants