Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates required to MP Telemetry docs #7655

Open
NottyCode opened this issue Oct 30, 2024 · 18 comments
Open

Updates required to MP Telemetry docs #7655

NottyCode opened this issue Oct 30, 2024 · 18 comments
Assignees
Milestone

Comments

@NottyCode
Copy link
Member

I have a lot of feedback on the MP Telemetry docs which are here and here.

Feedback on Enable observability with MicroProfile Telemetry doc

  1. The title of the doc isn't in keeping with the remainder of the titles in the docs. The title is a task title, but Open Liberty docs are not task oriented. This makes it look very strange. I would suggest the title simply be Open Telemetry.
  2. The doc should probably be lower down in the nav, I would put it below logs and metrics, although @donbourne might have a good perspective here.
  3. The first sentence of the first section reads

To enable MicroProfile Telemetry in your Open Liberty runtime, you must add the MicroProfile Telemetry feature to your server.xml file and enable the OpenTelemetry SDK.

This is true of any Liberty feature, so doesn't need to be said here. The last part about needing to enable the Open Telemetry SDK is true, but the following sentence repeats it:

OpenTelemetry is disabled by default.

This is also confusing to me because all function is disabled by default because by default no feature is enabled. What it is trying to say is even configuring the feature isn't enough, you also have to set a property but this isn't clear.

  1. I think the runtime and application level configuration needs to be simplified significantly. I think perhaps it would have been better as a table.
  2. The following paragraph doesn't provide any context as to what it is intending to convey:

For more information about configuration sources and their ordinal values, see Default configuration sources.

  1. The first sentence here should give an example or at least the feature name.

Enable the MicroProfile Telemetry feature in your server.xml file.

  1. I dislike the lack of progressive disclosure implied by the following:

To export metrics or logs, you must enable mpTelemetry-2.0 or later.
This is below the fold and there hasn't been any mention until here that there are different MP Telemetry features or what the difference in behaviour might be. This should not be introduced here. There should have been something earlier summarizing the differences between the various capabilities between MP Telemetry 1.0, 1.1 and 2.0.

  1. The following sentence doesn't tell the user how to set this property. We are expecting them to know:

Set the otel.sdk.disabled=false property at the runtime level or the application level.

  1. This is superfluous. We have just told them to set this to false in this step. We do not need to write a sentence that caveats the behaviour like this:

At runtime initialization, if the otel.sdk.disabled property is set to false, the runtime-level OpenTelemetry SDK instance is created.
Also the OpenTelemetry SDK is an internal detail, we likely don't need to talk about it as if it is something they care about.

  1. Concepts should be introduced prior to examples using them. In this case we introduce an example for otel.service.name before telling them what that means.
  2. This sentence doesn't actually tell me what the purpose of otel.service.name is

The otel.service.name property creates a name for any telemetry that this OpenTelemetry SDK instance collects and emits.
It might make sense if I'm deep in the weeks of Open Telemetry, but I am probably not if I'm here.

  1. If we are going to give examples of multiple ways to do something they should be presented as such, similar to how this is done for the feature doc. The following is confusing and doesn't tell someone why they would care to do bootstrap.properties vs server.env. To be honest we should document the one we expect and ignore the other options.

Alternatively, you can set runtime-level configuration properties by using environment variables in your server.env file. For any property definition, make the key name uppercase and convert any punctuation to underscores. For example, the otel.sdk.disabled=false property is equivalent to the OTEL_SDK_DISABLED=false environment variable.

  1. I do not understand why I would do this section. It says you can, but doesn't explain why you would. There is an earlier bit about runtime vs application but it seems totally disconnected to this:

To enable the OpenTelemetry SDK at the application level, set the otel.sdk.disabled=false property in the microprofile-config.properties file of each application.

  1. There is no explanation or justification for why someone would want to mix application and runtime. The example doesn't make sense to me given the description which states that runtime takes precedence. The application enables Otel, but that seems to just be application level. So I'd expect maybe this would allow me to disable Otel for an application, but the example doesn't show that. I think this should be dropped unless we have a clear explanation of what the point is.

Optionally, use a combination of application-level and runtime-level configuration.

  1. In the sections on Logging, Tracing and Metrics they all start with this:

Enable the MicroProfile Telemetry feature and specify a MicroProfile Config property or environment variable to enable the OpenTelemetry SDK.
This is a pointless step since it is covered by the intro section and just links up. It doesn't even usefully link to where the information on how to configure it exists. Further metrics and logs require mpTelementry-2.0 but at the point it would be useful to explain this because it has the context (and by that I mean the logs and metrics section) it isn't mentioned.

  1. The setting otel.exporter.otlp.endpoint=http://localhost:4317/ is mentioned only in the section on trace and yet it is relevant to all, it should have been in the main section on enabling Telemetry. There is no doc on how the collector and exporter concept works either in this page, so we have a setting for a concept we haven't explained.
  2. These sections are all written as a series of steps, but it doesn't make sense. There is an attempt to document some different scenarios, but by trying to shoehorn them into steps it makes it less comprehensible. An example is the section where it talks about zipkin, but without context of why you would do that:

Alternatively, to use a Zipkin server, you might add configuration similar to the following example to your bootstrap.properties file:

  1. There is an optional step that configures to output logs to console rather than OTLP. This isn't an optional step, it undermines the whole point of the first two steps. This shouldn't be an optional step and it needs to explain why you would want to do this.

For example, if you want to export traces to Open Liberty log files, set the following property:

  1. This isn't a step, this is a scary last step where you leave the user dangling. I'm pretty sure they get something and this is about customizing but the way it is expressed it sounds like there is an unknown bit of extra work. The linked doc is also to dev work which an ops person isn't going to know what to do with.

Depending on how you choose to instrument your application code for tracing, further configuration might be required.

  1. The trouble shooting section should be a separate page like the security one.
  2. There is a reference to converting properties to env vars which likely isn't required because that is just how MicroProfile config works, but if it isn't a table showing the properties and env var alternatives along with a description would be better than telling people how to do a conversion.

I could write more, but I think this is enough. I would recommend a complete rewrite.

Feedback on MP Open Telemetry Feature Example

  1. There is only one example on this page and it doesn't tell you to set the otel.sdk.disabled property to true.
@dmuelle
Copy link
Member

dmuelle commented Nov 1, 2024

Additional @NottyCode feedback from #7657:

This page has been written as a task and Liberty docs are not tasks.

It is also wrong for a dev topic to link to an ops topic for important context.


response from @lauracowen
To clarify, I think Alasdair's point is not that we don't document how-to information at all in the Liberty docs, but that we don't make everything into a WebSphere Liberty-style task topic when it's not necessary. In particular in this instance, no one cares about preparing their development environment for Telemetry; that's not a something someone would have in their heads to do. What they care about is adding observability to their applications. Enabling the telemetry feature is a minor step in that, not a whole task in itself.

The reason that we care about this is that the approach of adding many tasks in the Liberty IBM Docs has led to many unnecessary topics, bloating the entire set of docs and making it harder to find the information you actually need. That's not to say that the OL docs are perfect by any means, but laying the Liberty IBM Docs format on them like this is not going to help.

It mostly comes down to understanding the target audience and ensuring that the information that user audience needs, and only the information that they need, is provided, as concisely as possible.


I general, I don't think it's necessarily bad to link to topics for aimed at other users as long as that information is relevant to the user you're currently writing for (and it's clear what info you're expecting them to read there).

In this particular case, though, I agree with Alasdair because the developer needs to understand things like application vs runtime-level configuration in the context of the app they're developing and what to choose etc. The current approach of just linking to the OPERATIONS topic is dumping and running - how is the developer meant to decide what to set? The "preparing dev env" topic is fairly uninformative and doesn't need to exist, but the info the developer actually needs isn't available to them. That info should be provided in a topic in the DEVELOPMENT section couched in language that makes sense to the developer in an intro/overview topic called something like "Observability in microservices" in the DEVELOPMENT section.

@dmuelle
Copy link
Member

dmuelle commented Nov 1, 2024

additional @NottyCode feedback from #7658

I do not think there is value to this page. I think this would be of more value if it were in the Ops oriented page here

@donbourne
Copy link
Member

@dmuelle , the page you're commenting about is something that is only useful to developers, since it's their job to use java.util.logging (or something that routes to JUL) to have their logs included with the mpTelemetry logs. The operations team can't control that.

@dmuelle
Copy link
Member

dmuelle commented Nov 4, 2024

@donbourne - sorry I should have attributed that quote properly to @NottyCode . I am reviewing these comments from Alasdair and @lauracowen to come up a with a rewrite.

In the case of this particular comment, it the logs info could be included in a revised version of the Development> Observability overview that consolidates conceptual information about developing with OpenTelemetry on Open Liberty.

I think Alasdiar and Laura's feedback was generally that the doc is not well served by having a bunch of smaller task topics that mix config and conceptual info. Instead, we could consolidate to topics for Ops and Dev that explain OpenTelemetry as a single solution for logs, trace, and metrics, and point to the feature page for configuration examples. This is more in keeping with the overall content strategy of the open liberty docs. In any case, I will share and discuss any edits before making any public-facing changes. Hoping to contain this work in the 24.0.0.12 release

@donbourne
Copy link
Member

donbourne commented Nov 4, 2024

some comments on @NottyCode 's original list of comments:

Feedback on Enable observability with MicroProfile Telemetry doc:

  1. sounds good
  2. I don't have a strong opinion... could go after Metrics topic
  3. sounds good
    4/5/6/8/9/12/13/14/15. perhaps this topic could be simplified by having 2 sections - one for people with a single app running on Liberty, another for people with multiple apps. Then we could just be prescriptive about what the config should look like. Have it address logging/metrics/tracing all in one, perhaps with comments to show which attributes apply to each signal.
  4. describing earlier versions of mpTelemetry might make this a more confusing topic. would it make sense to cover the other versions in a different topic?
  5. sounds good
  6. sounds good
  7. we could start this doc by giving an intro to a few OTel concepts, and explain that the doc will show how to connect Liberty to an OpenTelemetry Collector (after we explain what an otel collector is).
  8. sounds good
    18/21. we could include a section on debugging, where we could show how to use the console exporter
  9. somewhere in the doc we need to explain that any of the otel properties defined at https://opentelemetry.io/docs/languages/java/configuration/ can be used
  10. sounds good

dmuelle added a commit that referenced this issue Nov 6, 2024
dmuelle added a commit that referenced this issue Nov 6, 2024
dmuelle added a commit that referenced this issue Nov 8, 2024
@dmuelle dmuelle mentioned this issue Nov 8, 2024
dmuelle added a commit that referenced this issue Nov 8, 2024
@dmuelle dmuelle mentioned this issue Nov 8, 2024
dmuelle added a commit that referenced this issue Nov 8, 2024
@dmuelle dmuelle mentioned this issue Nov 8, 2024
dmuelle added a commit that referenced this issue Nov 11, 2024
@dmuelle dmuelle mentioned this issue Nov 11, 2024
dmuelle added a commit that referenced this issue Nov 15, 2024
@dmuelle dmuelle mentioned this issue Nov 15, 2024
dmuelle added a commit that referenced this issue Nov 15, 2024
@dmuelle dmuelle mentioned this issue Nov 15, 2024
dmuelle added a commit that referenced this issue Nov 18, 2024
dmuelle added a commit that referenced this issue Nov 19, 2024
@dmuelle dmuelle mentioned this issue Nov 19, 2024
@dmuelle
Copy link
Member

dmuelle commented Nov 19, 2024

Hi @donbourne @NottyCode @lauracowen - I've revised the following pages per your comments:

Please let me know if further edits are needed. So far, I've just worked on the Ops topics. Once this is sorted, I'll work on the dev topics to follow a similar pattern to what we agree upon here w/r/t OpenTelemetry. Thanks again for your feedback.

@donbourne
Copy link
Member

@dmuelle , I've just looked at the first doc so far. Here are my comments:

https://docs-draft-openlibertyio.mqj6zf7jocq.us-south.codeengine.appdomain.cloud/docs/latest/microprofile-telemetry.html

adding th following
->
adding the following

when describing the combination of runtime-level and application-level configuration you should point out that the otel.sdk.disabled property must be set at the application level (otherwise mpTelemetry-2.0 will start in single application mode).

OpenTelemtry
->
OpenTelemetry

When you use the otlp default log exporter, the OpenTelemetry Batch LogRecord Processor (BLRP) is enabled and log records are exported in batches according to BLRP default settings. You can can adjust these settings with otel.blrp.* properties. For more information about the available properties and their default settings, see see MicroProfile Config properties: MicroProfile Telemetry.

  • this is tricky as people using the "single application mode" should not use MP Config properties -- they must use properties visible to the runtime. I see that using MicroProfile Config properties is mentioned in multiple places.

OpenTelemetry collects both JVM runtime-environment metrics and HTTP metrics.

  • this makes it sound like those are the only metrics we collect (whereas we collect all of the ones shown in the MP Telemetry metrics reference list)

dmuelle added a commit that referenced this issue Nov 19, 2024
@dmuelle
Copy link
Member

dmuelle commented Nov 19, 2024

Thanks Don- I've addressed these in the updated draft. For MP Config properties, I've changed this to "system properties" and just mention mp-config in the context of application-level enablement. I also made some edits to the config properties reference page to try and clarify that these are OpenTelemetry properties that are available when you enable MP Telemetry, rather than mp config properties that you specify in mp config properties files.

MicroProfile Telemetry: OpenTelemetry properties

We also might consider documenting these on a separate page instead of with the MP Config properties. Mp Telemetry is different from ohter mp specs in that it's just exposing another non-MP API so I'm trying to reorient the doc to focus on OpenTelemetry and realy only mention MP Telemetry in the context of enabling the OL feature.

@lauracowen
Copy link
Member

OpenTelemetry topic

  • I feel like the title of the "OpenTelemetry" topic should be more explanatory, eg "Log, metrics, and traces management with OpenTelemetry" (or something shorter but that's the best I could come up with; maybe could remove "management" from it to make it shorter?).
  • I like how there are references to OpenTelemetry 2.0 and why in each of the existing Logs and Metrics topics.
  • I don't think the <feature manager...<feature... etc snippet should be given here - that should just be a short instruction to add it to the server.xml then link to the feature config page for the actual XML.
  • Should this be ordered list rather than bullet points?
  • I feel like the instruction to set the ....disabled=false property is repeated many times in this topic.

Troubleshooting topic

  • I like the troubleshooting info being in a separate topic.

feature config topic

  • Is the "third-party APIs" bit really a development environment thing? Isn't it just something that's in the app/server's config? Maybe in the previous "Enable..." section?
  • Is there some overlap between the "Enable..." section and the "Prepare your dev env" section? Could these be one section?
  • Maven snippet - I think this would be relevant to developers doing anything with this feature, wouldn't it? Is it possible (in our current hacky way) to insert it after the "Enabling this feature" section before the "Examples" section, titled "Maven coordinates" (or whatever the correct Maven terminology would be)?

dmuelle added a commit that referenced this issue Nov 21, 2024
@dmuelle
Copy link
Member

dmuelle commented Nov 21, 2024

Thanks @lauracowen -

I feel like the title of the "OpenTelemetry" topic should be more explanatory

Agreed- maybe "Collect logs, metrics, and traces with OpenTelemetry"? or "Unified observability with OpenTelemetry"? Still thinking

I don't think the <feature manager...<feature... etc snippet should be given here

Deleted these

Should this be ordered list rather than bullet points?

yes

I feel like the instruction to set the ....disabled=false property is repeated many times in this topic.

Agreed- I took it out where I could

For the "Preparing your development environment..." both these are things you only need to do if ypou want to customize your app for for OpenTelemetry- ie define custom metrics in your app code or manually instrument traces. If you just want to turn it on and collect data without making any changes to your app code, you dont need this stuff.

So it seems like it might be more confusing in two separate sections, and we dont want it to get mixed up with the basic prereq of enabling the feature and the system property. It's just about exposing the API and annotations for your app code, if you want to do custom stuff. Maybe a different title would help? "Customize your application telemetry with the OpenTelemetry API" ?

@lauracowen
Copy link
Member

Yes, maybe a separate section with all the custom stuff in it? And in there make it clear that you don't have to do this for most cases (and give examples or the full scope of where you can just use the default stuff without any custom stuff). That way, they can get up and running as quickly and easily as possible without having to contemplate custom stuff. (I think the word "custom" in the title will help put people off rushing to do it too.)

dmuelle added a commit that referenced this issue Nov 22, 2024
@dmuelle
Copy link
Member

dmuelle commented Nov 25, 2024

Ok, I tried to add more context to make it clear that you only need this config if you want to use the OT API to customize your metrics or trace. I also moved the example down the page so it wouldn't be easily confused with basic enablement:

Customize your application telemetry with the OpenTelemetry API

dmuelle added a commit that referenced this issue Nov 26, 2024
@dmuelle dmuelle mentioned this issue Nov 26, 2024
@lauracowen
Copy link
Member

There's a random + before the code snippet.

@dmuelle
Copy link
Member

dmuelle commented Jan 15, 2025

The following topics are updated per the feedback compiled in this issue:

Topics in the Development>Observability section will be updated under #7659

@lauracowen
Copy link
Member

Would it make sense in here to modify:

When your Open Liberty runtime hosts only one application, you can enable

to

When your Open Liberty runtime hosts only one application (typical in container configurations), you can enable

(I'm not sure if the word "configuration" is correct, but basically that's what I mean.)
Could do similar for the multiple apps version of that sentence (eg "not typical in container configurations" or similar - check the technical accuracy of this tho).

I can't remember exactly how the article was before but I like the separation of single and multiple instructions - they seem clear to me.

Otherwise, it all looks good to me. Thanks, @dmuelle

@dmuelle
Copy link
Member

dmuelle commented Jan 16, 2025

Maybe:

When your Open Liberty runtime hosts only one application, as is typical in containerized environments, you can enable OpenTelemetry at the runtime level to collect and export logs, metrics, and traces for both the runtime and the application.

When your Liberty runtime hosts multiple applications, which is typically associated with bare metal or virtual machine environments, you might choose to enable OpenTelemetry at the application level to collect logs, metrics, and traces for each application.

or could sub "non-containerized" for "bare metal or virtual machine" as I've see that term used

dmuelle added a commit that referenced this issue Jan 16, 2025
@dmuelle dmuelle mentioned this issue Jan 16, 2025
@lauracowen
Copy link
Member

I like that. You could just lose "bare metal or". I think we used to generally talk about virtual machines being the alternative to containers (I don't know how much we used the term in the docs but that was how we talked about it when planning the docs, I think?)

@dmuelle
Copy link
Member

dmuelle commented Jan 16, 2025

Done, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants