Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating stats jobs to use new Azure SDK #10323

Open
wants to merge 39 commits into
base: dev
Choose a base branch
from
Open

Conversation

Lanaparezanin
Copy link
Contributor

@Lanaparezanin Lanaparezanin commented Jan 21, 2025

I updated the stats jobs (Stats.CollectAzureChinaCDNLogs and Stats.CollectAzureChinaCDNLogs) to use the new Azure Storage SDK.

This is part of the work for stats jobs migration: https://github.com/orgs/NuGet/projects/21/views/1?filterQuery=milestone%3A%22Sprint+2025-01%22+assignee%3A%40me&pane=issue&itemId=64259916&issue=NuGet%7CEngineering%7C5445

@Lanaparezanin Lanaparezanin requested a review from a team as a code owner January 29, 2025 18:59
//start a task that will renew the lease until the token is cancelled or the Release methods was invoked
var renewStatusTask = new Task( (lockresult) =>
var leaseResult = await _blobLeaseService.TryAcquireAsync(blob.Name, TimeSpan.FromSeconds(MaxRenewPeriodInSeconds), CancellationToken.None);
if (!leaseResult.IsSuccess)
Copy link
Contributor

@erdembayar erdembayar Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please invert the order. Please check the success first then it's easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the success before what?

/// <returns>An <see cref="AzureBlobLockResult"/> indicating the result of the lease acquisition.
/// If the lease is successfully acquired, the result will contain the lease ID and a cancellation token
/// source that can be used to stop the lease renewal task.</returns>
public async Task<AzureBlobLockResult> AcquireLease(BlobClient blob)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used outside of stats job world? Can we rename this to AcquireLeaseAsync without breaking anything? If not, ignore this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't, seems like this is the only reference:
image

return new AsyncOperationResult(true, null);
}

await resultStream.FlushAsync();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the previous implementation had a second explicit check for blob existence before flushing/commit the data so that if another process wrote out the file while this one was processing, the action was cancelled. New one, later one wins. Do we know if this slight change has any tangible impact on what we copy?

cancellationToken: token);
bool deleteResult = await sourceBlob.DeleteIfExistsAsync(DeleteSnapshotsOption.IncludeSnapshots,
conditions: blobRequestConditions,
cancellationToken: CancellationToken.None);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably pass through the cancellation token passed in here (unless there is a specific reason we are passing through None).

@erdembayar
Copy link
Contributor

Nit: Using more explicit type makes code more readable.

else throw new ArgumentException(nameof(containerName));
}

_blobLeaseService = new BlobLeaseService(blobServiceClient, containerName, basePath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduces a potential issue: AzureBlobLeaseManager can be constructed with a BlobServiceClient configured for a certain storage account with a certain container and base path, and then AcquireLease can be called passing a blob that might reside in a different storage account, or different container or not use the base path passed into AzureBlobLeaseManager constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 3 added arguments (blobServiceClient, containerName, basePath) seem to exist only to be passed into a BlobLeaseService constructor. I suggest to change AzureBlobLeaseManager constructor to accept a IBlobLeaseService object instead and update the job setup accordingly.

Copy link
Contributor

@agr agr Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, we need to sort out the blob validation. One way of doing that could be to add a TryAcquireAsync overload in BlobLeaseService that would accept a blob URL instead of a resourceName, validate its values against the object configuration then infer resourceName from URL and call the original implementation. Then this new method (passing in full blob URL) can be used in AzureBlobLeaseManager.AcquireLease.

@@ -49,7 +48,7 @@ public async Task<BlobLeaseResult> TryAcquireAsync(string resourceName, TimeSpan

try
{
return await TryAcquireAsync(blob, leaseTime, cancellationToken);
return await TryAcquireAsync(blob, leaseTime, cancellationToken); ;
}
catch (RequestFailedException ex) when (ex.Status == (int)HttpStatusCode.NotFound)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a way to configure the object not to try to create blobs when they were not found. I suggest adding a boolean flag to BlobLeaseService's (defaulting to true for backwards compatibility) constructor to indicate if it is allowed to create blobs when they are missing and check it here. Then make sure that for this job it is not allowed to create blobs.


int sleepBeforeRenewInSeconds = MaxRenewPeriodInSeconds - OverlapRenewPeriodInSeconds < 0 ? MaxRenewPeriodInSeconds : MaxRenewPeriodInSeconds - OverlapRenewPeriodInSeconds;
if (!blobLockResult.BlobOperationToken.IsCancellationRequested)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why we had this check in the original code. Technically, it is checking for a different thing than the while loop below, but I think the check in while loop would cover all cases this check would test. Any ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting a task might be an expensive operation, so it's beneficial to check before entering the task to improve performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no starting tasks between line 66 and 68. They literally are run one after another.

blobLockResult.LeaseId);
}
catch (StorageException exception)
await Task.Delay(TimeSpan.FromSeconds(MaxRenewPeriodInSeconds - OverlapRenewPeriodInSeconds), lockResult.BlobOperationToken.Token);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not pass cancellation token into Task.Delay. It will throw if cancelled while waiting (unlike Thread.Sleep in the original code).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants