Have you ever used the Azure Storage Account Queues and realized that you lack visibility into the number of messages in your individual queues? This can be particularly concerning when you have things like poison queues set up—you definitely want to know if something's in them, right? 🕵🏻♀️
Of course, you could explore different tools like Service Bus, but making a switch like that isn’t always feasible.
Not to worry—I’ve got a solution for you! 😉
This repository contains a sample that demonstrates how to use a small service based on Azure Functions’ Timer Trigger functionality to create a monitoring solution. ⚡️
Note
What if your architecture is already using multi-instance Azure Functions? Still not a problem! This approach can live as its own service or within an existing Azure Function instance. The Timer Trigger handles concurrency, ensuring that only one instance of your solution emits the metrics and sparing you from implementing a distributed lock for a simple monitoring solution. 🔒
- ⚙️ Configurable Queue Connections: Monitor a set of queues using either Managed Identity or the Storage Account Key (see the connection sketch right after this list).
- 🏗️ Scalability: Integrate seamlessly into solutions that already have a multi-instance Azure Function setup.
- 📜 Infrastructure as Code: Full Terraform setup for the case where you only want the monitoring service up and running.
- 🚨 Alerting: Illustrates how you can set up alerts based on the newly introduced metrics.
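To make the connection options concrete, here is a minimal sketch of the two ways to build a queue client with the `azure-storage-queue` SDK. The account URL, queue name, and environment variable name are illustrative placeholders, not values from the sample:

```python
# Sketch only: two ways to build a QueueClient, depending on how you authenticate.
import os

from azure.identity import DefaultAzureCredential
from azure.storage.queue import QueueClient

# Option 1: Managed Identity (or whatever DefaultAzureCredential resolves to).
queue_client = QueueClient(
    account_url="https://<your-account>.queue.core.windows.net",  # placeholder
    queue_name="demo-queue-1",
    credential=DefaultAzureCredential(),
)

# Option 2: Storage Account Key via a connection string.
queue_client = QueueClient.from_connection_string(
    conn_str=os.environ["STORAGE_CONNECTION_STRING"],  # illustrative env var name
    queue_name="demo-queue-1",
)
```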
The design of this solution is based on the fact that we can query the queue length using the Azure Storage Account SDK, even though the resource itself does not publish this information as a metric.
In short, we will do the following (a minimal code sketch follows the list):
- Create a gauge for each queue 📏
- Periodically query each queue for its length 🔄
- Emit the new length value as a metric measurement to your telemetry sink 📊
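Here is what these three steps could look like with the `azure-storage-queue` SDK and the OpenTelemetry metrics API (a sketch, assuming an `opentelemetry-api` version that ships the synchronous gauge; the meter and metric names are illustrative, not the ones used in the sample):

```python
# Sketch only: query each queue's length and record it on a gauge.
from opentelemetry import metrics

meter = metrics.get_meter("queue-monitor")  # illustrative meter name
queue_length_gauge = meter.create_gauge(
    "queue-length",  # illustrative metric name
    description="Approximate number of messages in a storage queue",
)

def emit_queue_lengths(queue_clients):
    """Record the approximate message count for each QueueClient (e.g., built as above)."""
    for queue_client in queue_clients:
        properties = queue_client.get_queue_properties()
        queue_length_gauge.set(
            properties.approximate_message_count,
            attributes={"queue.name": queue_client.queue_name},
        )
```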
Now, as for how to run this periodically, we have multiple options:
One option, if you have a pre-existing single-instance application, is to run the second and third steps above in a background task that executes periodically, or to use an asynchronous gauge.
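The asynchronous-gauge route could look roughly like this; the metrics SDK then polls the callback on every collection cycle, so no explicit background task is needed (again a sketch, with illustrative names and `queue_clients` assumed to exist as above):

```python
# Sketch only: an observable (asynchronous) gauge polled by the metrics SDK.
from opentelemetry import metrics
from opentelemetry.metrics import CallbackOptions, Observation

def observe_queue_lengths(options: CallbackOptions):
    for queue_client in queue_clients:  # assumed list of QueueClient instances
        count = queue_client.get_queue_properties().approximate_message_count
        yield Observation(count, {"queue.name": queue_client.queue_name})

meter = metrics.get_meter("queue-monitor")
meter.create_observable_gauge(
    "queue-length",
    callbacks=[observe_queue_lengths],
    description="Approximate number of messages in a storage queue",
)
```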
However, in most cases, we deal with services that have multiple instances (for resilience and other reasons... yeah, I know how inconvenient 🙄), which makes this a tad bit trickier. You could have each instance emit metrics, but that would produce a 💩-ton of telemetry data. Alternatively, you could implement a full-blown distributed lock for a simple metric.
Don't get me wrong—I’m the first to emphasize how important telemetry is—but making sure your lock is foolproof might give you more headaches than necessary, especially when there are already solutions out there to help you!
✨ And this is where Azure Functions and its Timer Trigger come into play! ✨
Azure Functions are very lightweight, making them a small yet powerful addition to your architecture. If you're already using Azure Functions, even better! Regardless of how many instances of the same function you have running, as long as they all use the same underlying storage account, Azure Functions will ensure that only one of those instances executes the timer trigger. So it's handling the distributed lock for you! No headache, no worries! 😱
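With the Python v2 programming model for Azure Functions, the timer-triggered entry point could look roughly like the sketch below. The schedule, function name, and the `emit_queue_lengths` helper (from the earlier sketch) are illustrative; the sample's own function is the source of truth:

```python
# Sketch only: a timer trigger that fires every minute and emits the queue metrics.
import azure.functions as func

app = func.FunctionApp()

@app.timer_trigger(schedule="0 */1 * * * *", arg_name="timer", run_on_startup=False)
def monitor_queues(timer: func.TimerRequest) -> None:
    # Only one function instance executes this per tick, courtesy of the Timer Trigger.
    emit_queue_lengths(queue_clients)  # illustrative helper and client list from above
```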
Now let's see what the described design looks like visually. The diagram below shows the setup described above, with Application Insights chosen as the telemetry data sink.
Since the sample uses OpenTelemetry as the instrumentation standard, you can also emit your telemetry data to a sink other than Application Insights (e.g., an OTel Collector, Prometheus, etc.) simply by using a different exporter.
Finally, let's look at how to use the sample in the sections below!
- Prerequisites 📋
- Running Locally 🖥️
- Deploy 🚀
- Application Insights 📊 (either created manually or using the Terraform scripts)
- Python 3.11 🐍
- Azurite 🚀
- Azure Functions Core Tools ⚒️
- Terraform (Optional) 🌍
Important
This sample leverages Azure Application Insights as a sink for your telemetry data as part of the auto-instrumentation with OpenTelemetry Distro. If you need the telemetry data to be sent elsewhere, don't worry! Simply replace the auto-instrumentation part with your own, and add one of the many sinks that OpenTelemetry provides for you 😉
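As an example, switching to an OTLP exporter pointing at an OTel Collector could look roughly like this (a sketch, assuming the `opentelemetry-exporter-otlp-proto-grpc` package is installed; the endpoint is a placeholder):

```python
# Sketch only: export metrics to an OTLP endpoint (e.g., an OTel Collector)
# instead of Application Insights.
from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="http://localhost:4317")  # placeholder collector endpoint
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
```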
To get you started, there is a handy `Makefile` in the root of this repository! Use its `setup` command, and it will get you going in no time. 🚀
It will do the following things:
- Create a virtual environment `.venv`
- Activate said `.venv`
- Install all dependencies using `poetry`
- Create the Azurite demo queues called `demo-queue-1` and `demo-queue-2` with the script in `scripts/azurite_setup.py` (sketched conceptually below)
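For reference, the queue-creation step conceptually boils down to something like the sketch below, using Azurite's well-known development connection string (the actual `scripts/azurite_setup.py` is the source of truth):

```python
# Sketch only: create the demo queues against a locally running Azurite instance.
from azure.core.exceptions import ResourceExistsError
from azure.storage.queue import QueueClient

# Azurite's well-known development-storage account, pointed at the default queue port.
AZURITE_CONNECTION_STRING = (
    "DefaultEndpointsProtocol=http;"
    "AccountName=devstoreaccount1;"
    "AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;"
    "QueueEndpoint=http://127.0.0.1:10001/devstoreaccount1;"
)

for queue_name in ("demo-queue-1", "demo-queue-2"):
    client = QueueClient.from_connection_string(AZURITE_CONNECTION_STRING, queue_name)
    try:
        client.create_queue()
    except ResourceExistsError:
        pass  # queue already exists, nothing to do
```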
To run the setup, just execute the following command from the root of the repository:
Important
Make sure you have started Azurite up before you do that!
make setup
So now you should be all set to start running the monitor! ✅
Let's now configure the monitor correctly before we run it. To do that, copy the `local.settings.sample.json` in the `src/` directory to `local.settings.json`.
Replace the placeholders with your Application Insights connection string. The monitored queue list is already pre-configured to use the `demo-queue-1` and `demo-queue-2` queues created in the previous step. Unless you want to monitor other queues as well, you can keep that as is for now.
Next, let's get this thing up and running in the next section! ⏩
Here again, the handy `Makefile` can help us.
By running the command below from the root, the function will start up.
make start
Under the hood, this will navigate into the `src/` folder and run `func host start --port 7071 --verbose`.
After a short while, you should be able to see custom metrics in your Application Insights, like in the image below:
There is now a metric table, but it looks rather empty and lame. To actually see something there, we need to place messages into the queues.
For that, there is a sample script, `scripts/send_message.py`, which will publish a few messages in intervals between 1-10 min into our demo queues. 📨
By running another `make` command, we can execute said script.
make send-message
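If you prefer to send a message by hand instead, the SDK call boils down to something like this (reusing the Azurite connection string from the earlier sketch):

```python
# Sketch only: put a single test message onto one of the demo queues.
from azure.storage.queue import QueueClient

queue_client = QueueClient.from_connection_string(
    AZURITE_CONNECTION_STRING,  # as defined in the Azurite sketch above
    "demo-queue-1",
)
queue_client.send_message("hello from the monitor sample")  # any payload will do
```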
Now, after waiting a while, the message count should be reflected in Application Insights.
How would this whole thing look deployed, you wonder? In that case, let's look at the next section, which is provisioning the infra and deploying the monitor. 🔍
If you want to see how the solution will look when deployed, the sample includes the necessary Terraform scripts in the `infra` folder.
What the infra contains:
- Application Insights 📊
- Azure Container Registry 📦
- Storage Account with queues that will be monitored (`<prefix>-poison-queue` and `<prefix>-queue`) 🗄️
- Azure Function with the Monitor Service deployed using a Docker container ⚡️
- Managed Identity role assignment for the Azure Function to access the queues it’s monitoring 🔐
- Alerts for the `<prefix>-poison-queue-length` metric 🚨
So, let's get started! 🌟
Navigate to the `infra` folder and initialize Terraform. Then, plan it with the commands below:
terraform init
terraform plan
This will show you the changes that Terraform will make, in this case, the resources it will create. If the plan looks good, run the following command to apply the changes:
terraform apply
This might take a moment, so feel free to grab a coffee ☕ while you wait. When you return, your environment should be provisioned!
Note
Given that we don’t have a Docker image in the ACR yet, the function will show an error if you visit the resource. Not to worry, as we will resolve this in the next step!
Next, we’ll build the image of our Monitoring Service and push it.
To deploy the monitor in this sample, we chose to package the service in a Docker image and push it to the ACR we created in the previous step. There is a `Dockerfile` under `src/` which can be used to build and push the image. To make things easier and reduce configurable variables, the image name in the infra is hardcoded to `storage-queue-monitor`, and the ACR will be called `qmsampleacr` with the default values. Hence, there is already a make command to handle the build and push step.
If you have changed the Terraform variable `var.prefix` or the image name, feel free to adjust the make command or do it manually. Otherwise, use the command below:
make build-push
If you restart the Azure Function, you should have the monitor ready to run, and the metrics should start showing up shortly as well.
The service also has two helpful endpoints so you can check on its health and configuration:
- `/api/health/`: A classic health endpoint
- `/api/get_config`: Returns the configured queues that are monitored
Caution
`/api/get_config` should be disabled in production as it might contain a storage account key!
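For a quick local smoke test of both endpoints (assuming the function host is running on the default port 7071):

```python
# Sketch only: poke the two endpoints of a locally running monitor.
import urllib.request

for path in ("/api/health/", "/api/get_config"):
    with urllib.request.urlopen(f"http://localhost:7071{path}") as response:
        print(path, response.status, response.read().decode())
```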
Hope that was helpful! With that, enjoy monitoring 👩🏻💻