Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OpenTelemetry event tracing to connections #3102

Merged

Conversation

SeanFarrow
Copy link
Contributor

This PR add event tracing to IConnectionLifetime.

@jeremydmiller I have a few questions:
How do we want to test the EventConnectionLifetime and MartenTracing classes?
When do we want to update the documentation?
Do we want to wait to merge this until all the other OTel stuff is done?

Thanks,
Sean.

@jeremydmiller
Copy link
Member

Honestly, just smoke tests. We'll have to rig up Prometheus or Signoz or something in a test harness to really test it manually.

The decorator around the connection lifetime, does that even need all the if/then checks? Shouldn't it be, only apply the decorator if that option is selected in the first place? You want to absolutely minimize the runtime steps because everything is in the hot path there.

For event metrics, I assumed we'd use an IDocumentSessionListener anyway

@SeanFarrow
Copy link
Contributor Author

SeanFarrow commented Apr 2, 2024 via email

@SeanFarrow SeanFarrow force-pushed the feature/sf/2909-open-telemetry branch from 9abe946 to 72e9668 Compare April 2, 2024 08:25
Comment on lines 9 to 10
public static string MartenTenantId = "MartenTentantId";
public static string MartenCorelationId = "MartenCorelationId";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these be const are static readonly?
When they should be setable, then make them a property.

"Marten",
typeof(MartenTracing).Assembly.GetName().Version!.ToString());

public static Activity? StartConnectionActivity(Activity? parentActivity =null, IEnumerable<KeyValuePair<string, object?>>? tags =null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
public static Activity? StartConnectionActivity(Activity? parentActivity =null, IEnumerable<KeyValuePair<string, object?>>? tags =null)
public static Activity? StartConnectionActivity(Activity? parentActivity =null, IEnumerable<KeyValuePair<string, object?>>? tags = null)

return StartActivity("connection", parentActivity, tags);
}

public static Activity StartActivity(string spanName, Activity? parentActivity =null, IEnumerable<KeyValuePair<string, object?>>? tags =null, ActivityKind activityKind =ActivityKind.Internal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static Activity StartActivity(string spanName, Activity? parentActivity =null, IEnumerable<KeyValuePair<string, object?>>? tags =null, ActivityKind activityKind =ActivityKind.Internal)
public static Activity StartActivity(string spanName, Activity? parentActivity = null, IEnumerable<KeyValuePair<string, object?>>? tags = null, ActivityKind activityKind = ActivityKind.Internal)


public static Activity StartActivity(string spanName, Activity? parentActivity =null, IEnumerable<KeyValuePair<string, object?>>? tags =null, ActivityKind activityKind =ActivityKind.Internal)
{
return ActivitySource.CreateActivity(spanName, activityKind, parentActivity?.ParentId, tags);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be

Suggested change
return ActivitySource.CreateActivity(spanName, activityKind, parentActivity?.ParentId, tags);
return ActivitySource.StartActivity(spanName, activityKind, parentActivity?.ParentId, tags);

?

Otherwise an explicit Start needs to be done on the activity.

Or this method should be called CreateActivity too, else it's very misleading.

Comment on lines 16 to 21
private const string MartenCommandExecutionStarted = "MartenCommandExecutionStarted";
private const string MartenCommandExecutionFailed = "MartenCommandExecutionFailed";
private const string MartenBatchExecutionStarted = "MartenBatchExecutionStarted";
private const string MartenBatchExecutionFailed = "MartenBatchExecutionFailed";
private const string MartenBatchPagesExecutionStarted = "MartenBatchPagesExecutionStarted";
private const string MartenBatchPagesExecutionFailed = "MartenBatchPagesExecutionFailed";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According OTel semantics these should have different names which we should follow.

Suggested change
private const string MartenCommandExecutionStarted = "MartenCommandExecutionStarted";
private const string MartenCommandExecutionFailed = "MartenCommandExecutionFailed";
private const string MartenBatchExecutionStarted = "MartenBatchExecutionStarted";
private const string MartenBatchExecutionFailed = "MartenBatchExecutionFailed";
private const string MartenBatchPagesExecutionStarted = "MartenBatchPagesExecutionStarted";
private const string MartenBatchPagesExecutionFailed = "MartenBatchPagesExecutionFailed";
private const string MartenCommandExecutionStarted = "marten.command.execution.started";
private const string MartenCommandExecutionFailed = "marten.command.execution.failed";
private const string MartenBatchExecutionStarted = "marten.batch.execution.started";
private const string MartenBatchExecutionFailed = "marten.batch.execution.failed";
private const string MartenBatchPagesExecutionStarted = "marten.batch.pages.execution.started";
private const string MartenBatchPagesExecutionFailed = "marten.batch.pages.execution.failed";

}
catch (Exception e)
{
if (_openTelemetryOptions.TrackConnectionEvents)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same. I'll stop to comment on these, but there are some more below.

{
if (_openTelemetryOptions.TrackConnectionEvents)
{
_databaseActivity.AddEvent(new ActivityEvent("Database command execution started"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Event name should be a constant too, and have a name following OTel semantics (see first comment).

/// <summary>
/// Used to track OpenTelemetry events for a connection, for example when a command or data reader has been executed. This defaults to false.
/// </summary>
public bool TrackConnectionEvents { get; set; } = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public bool TrackConnectionEvents { get; set; } = false;
public bool TrackConnectionEvents { get; set; }

false is the default by C# language specs, so can be omitted.

@@ -93,6 +101,23 @@ internal IConnectionLifetime Initialize(DocumentStore store, CommandRunnerMode m
throw new DefaultTenantUsageDisabledException();
}

var innerConnectionLifetime =GetInnerConnectionLifetime(store, mode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var innerConnectionLifetime =GetInnerConnectionLifetime(store, mode);
var innerConnectionLifetime = GetInnerConnectionLifetime(store, mode);

@@ -93,6 +101,23 @@ internal IConnectionLifetime Initialize(DocumentStore store, CommandRunnerMode m
throw new DefaultTenantUsageDisabledException();
}

var innerConnectionLifetime =GetInnerConnectionLifetime(store, mode);

var corelationId = Activity.Current is not null ? Activity.Current.RootId : Guid.NewGuid().ToString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When there's no activity that is recored (Current = null), is it OK to just create a new id?
I don't know what's best here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand things, if the current activity is null, this means you are the first activity in the chain, so creating a new activity/id is the right thing to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the current activity is null, this means you are the first activity in the chain

Being the first one in the chain means that there's no parent, but Current is set an Activity.

When there's no listeneres interested in an activity or no listeners are registered, then ActivitySource.StartActivity will return null, hence Activity.Current will be null too.

So when there's no activity I don't know if just creating a new (random) Id is correct.
Maybe in that case it should just be empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Activity would be null if the app isn't actually emitting Otel with a configured exporter. Sorry, just what @gfoidl already said

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremydmiller
So should we set the current activity if it is null? or not start an activity at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's common for the Activity to be null, so just expect it. If you look in the Wolverine code, you'll see that every call to Activity is Activity?.AddTag() or similar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code needed at all? The activity will internally check for a current activity and grab the correlation ID if required.

@jeremydmiller
Copy link
Member

@SeanFarrow One combination of the LInq tests will fail periodically on CI, but always succeed on a subsequent run. I've never been able to reproduce it locally.


public Task<int> ExecuteAsync(NpgsqlCommand command, CancellationToken token = new CancellationToken())
{
if (_openTelemetryOptions.TrackConnectionEvents)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this checked here if it's already checked when initializing this object?

var innerConnectionLifetime =GetInnerConnectionLifetime(store, mode);

var corelationId = Activity.Current is not null ? Activity.Current.RootId : Guid.NewGuid().ToString();
var tags = new Dictionary<string, object>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer TagsList when dealing with OTEL tags. This should also be lazy initialized only if EventTracingConnectionLifetime is being used.

Comment on lines 113 to 116
return OpenTelemetryOptions.TrackConnectionEvents
? new EventTracingConnectionLifetime(innerConnectionLifetime, OpenTelemetryOptions,
StartConnectionActivity(Activity.Current, tags))
: innerConnectionLifetime;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we only allocate the event tracing lifetime if there's any active listeners:

Suggested change
return OpenTelemetryOptions.TrackConnectionEvents
? new EventTracingConnectionLifetime(innerConnectionLifetime, OpenTelemetryOptions,
StartConnectionActivity(Activity.Current, tags))
: innerConnectionLifetime;
return (OpenTelemetryOptions.TrackConnectionEvents && ActivitySource.HasListeners())
? new EventTracingConnectionLifetime(innerConnectionLifetime, OpenTelemetryOptions,
StartConnectionActivity(Activity.Current, tags))
: innerConnectionLifetime;

@SeanFarrow
Copy link
Contributor Author

OK, I think I've fixed everything, can people please check they are happy. I'll then look at tests.
@jeremydmiller Do you have any opposition to using Testcontainers in a separate project to run prometheus?

@SeanFarrow
Copy link
Contributor Author

@jeremydmiller I'm just adding tests. Things gone a bit mad at work as I've just switched teams.
I'll push what I've got later today. It's up to you whether you pick this last bit up.

I'll start the event/other stuff at the end of next week when things calm down.
Thanks,
Sean.

@SeanFarrow
Copy link
Contributor Author

SeanFarrow commented Apr 17, 2024 via email

…thod nullable in the MartenTracing class and remove unneeded using declarations.
…ing OpenTelemetry when the appropriate option is configured and we have listeners.
@jeremydmiller
Copy link
Member

@SeanFarrow You'd done a ton man, I'll take you up on some docs, but I'm good to pull this in sooner rather than later. I can easily deal w/ whatever CI hiccups are happening.

@SeanFarrow
Copy link
Contributor Author

@jeremydmiller I'm done and happy.
Please check you are happy with the test for the EventTracingConnectionLifetime class.
Other than that it's just docs and the CI issues.
I should be able to help with the event document listener in a couple of weeks if you don't get to it first.

Thanks.
Sean

@jeremydmiller
Copy link
Member

@jeremydmiller I'm done and happy. Please check you are happy with the test for the EventTracingConnectionLifetime class. Other than that it's just docs and the CI issues. I should be able to help with the event document listener in a couple of weeks if you don't get to it first.

Thanks. Sean

Cool, and thanks Sean! I'll try to pull this in in the next couple days. Might be until next week before 7.9 gets out with this though.

@jeremydmiller jeremydmiller merged commit 05692ce into JasperFx:master Apr 29, 2024
7 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants