Cloud Canaries

Cloud Canaries in the Enterprise Cloud

Written by Cloud Canaries | Jun 6, 2024 7:52:37 PM

In many cases, the most effective way to detect reliability, monitoring, observability, and SLA compliance issues in your enterprise cloud environments is also the simplest: using canary applications, which offer a lightweight and nimble means of testing workloads and predicting risk.

However, managing a multitude of canaries can be a lot of work – if you try to do it manually. Individual canary apps are easy to deploy, but it quickly becomes impractical to keep track of all your canaries – let alone make sense of the data they generate – if you have dozens or hundreds of canaries operating across your environment.

We built the Aviary Platform to solve this challenge and make it easy for every enterprise to take advantage of canary applications. As we explain in this article, the Platform allows enterprises to deploy and manage canaries at scale while also keeping track of workload performance and compliance insights – no matter how many canaries you have to manage or how many workloads or environments you are supporting.

What Are Canaries?

Canaries are lightweight applications that allow engineers to test workloads and predict risk. By deploying canary applications into the same cloud environments that host production apps, teams can detect problems (such as networking issues or API failures) that are likely to impact production applications. They can also anticipate patterns (like slowdowns in performance as the volume of application requests increases) and take action to address them before they turn into failures.
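To make the idea concrete, here is a minimal sketch of what a canary might look like in Python. It is not Aviary's implementation: the names `ProbeResult`, `http_probe`, and `run_canary`, and the three health states, are assumptions chosen for illustration. The canary issues one HTTP request, times it, and classifies the outcome.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeResult:
    ok: bool            # True if the request returned a 2xx status
    latency_ms: float   # wall-clock duration of the probe
    detail: str         # status code or error message, for logging


def http_probe(url: str, timeout_s: float = 5.0) -> ProbeResult:
    """Issue one GET request and record whether it succeeded and how long it took."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            ok = 200 <= resp.status < 300
            detail = f"HTTP {resp.status}"
    except (urllib.error.URLError, OSError) as exc:
        # Covers HTTP 4xx/5xx responses, DNS failures, refused connections, and timeouts.
        ok, detail = False, f"error: {exc}"
    latency_ms = (time.perf_counter() - start) * 1000.0
    return ProbeResult(ok, latency_ms, detail)


def run_canary(probe: Callable[[], ProbeResult], max_latency_ms: float) -> str:
    """Classify one probe as 'healthy', 'degraded' (slow but working), or 'failing'."""
    result = probe()
    if not result.ok:
        return "failing"
    if result.latency_ms > max_latency_ms:
        return "degraded"
    return "healthy"
```

A scheduler could call `run_canary(lambda: http_probe("https://example.com/health"), 500.0)` at a fixed interval. The "degraded" state is what provides the early warning: the endpoint still answers, but more slowly than expected.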

Thus, you can think of canary applications as a type of early-warning system – akin to canaries in a coal mine – for your cloud environments. If your canaries fail or experience performance degradations, you know that trouble is brewing for your production apps, too – and you can take action to address it before you experience an actual production application failure or SLA compliance violation.

It's important to note that canary apps are different from canary deployments. A canary deployment is a type of application release strategy that involves pushing out a new version of an application to some users before others. In contrast, a canary app is a lightweight app that teams can use to track workload health and compliance inside production environments. Canary apps and canary deployments are both useful ways of reducing risk, but they work in distinct ways and serve different goals.

 

The Role of the Aviary Platform

While the job of canaries is to simplify workload reliability and compliance testing, the job of the Aviary Platform is to simplify the management of canaries while also maximizing the value of the insights they provide.

The Platform does this in several ways:

Creating Canaries

Each canary serves a unique purpose, but the code within canaries may overlap significantly. Rather than writing each canary from scratch, developers and DevOps engineers can benefit from a streamlined approach that allows them to draw on templates to build canaries.

For this purpose, the Aviary Platform allows users to generate Canary Template Libraries from OpenAPI 3.0 schemas. They can then use the templates to create new canaries or modify them based on their needs. Templates make it possible not just to set up canaries quickly, but also to deploy canaries of the same type for different environment settings.

To work with templates, access the "My Schema Page" from the Admin dropdown menu. There, you can upload an API schema, as well as generate and download a new Canary Template Library. Aviary can also generate RESTful Python libraries for more advanced canary development.
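The Platform's generator is not shown here, but the general idea of deriving canary templates from an API schema can be sketched as follows, assuming the uploaded schema is an OpenAPI 3.0 document already parsed into a dictionary. The function name `templates_from_openapi` and the shape of the resulting template are hypothetical. Only the `servers`, `paths`, `operationId`, and `responses` fields come from the OpenAPI specification.

```python
from typing import Any, Dict, List

# HTTP methods that may appear as keys of an OpenAPI 3.0 path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def templates_from_openapi(spec: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Derive one hypothetical canary template per operation in an OpenAPI 3.0 document."""
    servers = spec.get("servers") or [{"url": ""}]
    base_url = servers[0].get("url", "").rstrip("/")
    templates = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters" or "summary"
            # Keep the documented 2xx codes; "default" is a catch-all, not a status code.
            codes = [str(c) for c in operation.get("responses", {}) if c != "default"]
            success = sorted(c for c in codes if c.startswith("2"))
            templates.append({
                "name": operation.get("operationId", f"{method}_{path}"),
                "method": method.upper(),
                "url": base_url + path,
                "expected_status": success or ["200"],
            })
    return templates
```

Each resulting dictionary carries enough information (method, URL, expected status codes) to parameterize a simple HTTP check, which is why one schema can yield many canaries of the same type.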

To create a new canary using a template, first select the appropriate library in the Platform:

Then, configure the canary as needed:

With this approach, you can build canaries customized for your needs in minutes rather than spending hours coding each one from scratch.

On top of this, you can easily share libraries between development and DevOps teams or acquire third-party templates using the Cloud Canaries Store:

All the above means that you can spend more time generating insights from canaries – and much less time developing the canaries.

Canary Lifecycle Management

The Aviary Platform streamlines the management of canaries at all stages of their lifecycle. Using a dashboard that gives you centralized access to all of your canaries, you can easily monitor the status of each canary as well as start or stop canaries as needed:

Note, too, that the Platform allows you to deploy canaries as standalone executables or as deployments on Kubernetes clusters. No matter where you need your canaries to run, the Platform gives you centralized control over all aspects of their lifecycle, including creation, configuration, scheduling, deployment, communication, and data collection.

Data Analysis

Canaries are of little value unless they deliver actionable insights that help you detect and understand problems with your apps or virtual environment. To that end, the Aviary Platform enables centralized collection and analysis of data produced by canaries. The Platform's deep learning neural network generates recommendations and can guide autonomous actions when DevOps teams are unable to respond.

For example, you can track the compliance status of individual canaries through a single pane of glass:

You can also dive deep into the performance of individual canaries and gain access to time-series metrics that show how performance and compliance outcomes have varied over time:

Going further, you can gain a holistic view of the health of a complete workload by tracking all canaries related to it:

As you can see in the screenshot above, the health compliance dashboard makes it easy to detect which aspects of this workload are at risk – specifically, HTTP response rates, which are falling short of the SLA requirements that were configured for the canary that monitors HTTP responses in the example.

These insights allow teams to home in quickly on the source of failures while also keeping track of the health of multiple workloads running across any cloud environment.
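The underlying check is simple arithmetic: compare each canary's observed success rate with its configured SLA target, then list the canaries that fall short. The sketch below illustrates that logic only. The names `evaluate_sla` and `canaries_at_risk`, the data layout, and the 99% target in the usage note are assumptions, not the Platform's data model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ComplianceStatus:
    success_rate: float  # fraction of successful checks, 0.0 to 1.0
    target: float        # SLA target as a fraction, e.g. 0.99
    compliant: bool


def evaluate_sla(outcomes: List[bool], target: float) -> ComplianceStatus:
    """Compare the observed success rate of one canary against its SLA target."""
    if not outcomes:
        raise ValueError("no check results to evaluate")
    rate = sum(outcomes) / len(outcomes)
    return ComplianceStatus(rate, target, rate >= target)


def canaries_at_risk(results: Dict[str, List[bool]], targets: Dict[str, float]) -> List[str]:
    """Return the names of a workload's canaries whose success rate is below target."""
    return sorted(
        name
        for name, outcomes in results.items()
        if not evaluate_sla(outcomes, targets[name]).compliant
    )
```

For a workload with an `http_response` canary that succeeded 97 times out of 100 and a `dns` canary that succeeded every time, both with a 0.99 target, `canaries_at_risk` returns only `["http_response"]`.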

All the compliance and health data generated by the Platform is easy to export to external tools, in case you want to store or analyze it elsewhere. You can also use the Platform's customer-facing APIs to share health status data and notifications (which we'll discuss in more detail in just a moment) with external tools.

Alarms and Notifications

With the Aviary Platform, you don't have to sit in front of your dashboards all day to detect a problem. You can easily configure notifications to fire in channels of your choosing – including within Aviary itself or within external tools like Slack and Jira.

This means that no matter where your teams work, they'll know immediately when a health or compliance check falls short of the thresholds you configure. They can then act before a critical failure occurs.
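As a rough illustration of threshold-based alerting, the sketch below builds a message when a metric drops below its threshold and posts it to a webhook. It is not how Aviary integrates with Slack or Jira. The function names and message wording are hypothetical, and the payload assumes a Slack-style incoming webhook that accepts a JSON body with a `text` field.

```python
import json
import urllib.request
from typing import Optional


def build_alert(canary: str, metric: str, value: float, threshold: float) -> Optional[dict]:
    """Return a webhook payload if value falls below threshold, otherwise None."""
    if value >= threshold:
        return None
    message = f"[ALERT] {canary}: {metric} is {value:.2%}, below the {threshold:.2%} threshold"
    return {"text": message}


def send_to_webhook(webhook_url: str, payload: dict, timeout_s: float = 5.0) -> int:
    """POST the payload as JSON and return the HTTP status code of the response."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return resp.status
```

Separating `build_alert` from `send_to_webhook` keeps the threshold logic testable without network access and lets the same message be routed to more than one channel.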

Performance and Compliance Forecasting

Being able to track compliance status and generate alerts when something goes wrong is great if your main goal is to react to problems. But in a perfect world, you'd do more than just react – you'd be proactive by anticipating and reacting to problems even before your canaries start to fail.

The Aviary Platform enables proactive performance and compliance management through forecasting. Using a deep learning neural network, the Platform can identify failure patterns in advance – from months to minutes before an issue occurs.

Five-day look-ahead composite forecast:

Daily forecasts by Service:

Individual Canary forecasts:

Forecasting is critical not just because it gives your team advance warning, but also because you can pair AI-based forecasting models with automated remediations to enable autonomous resolution of problems. Thus, even if your DevOps teams can't respond in time, your automated remediation systems may be able to correct issues for them.
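The Platform's forecasts come from a deep learning model, which is beyond the scope of a short example. To show the basic idea of predicting a breach before it happens, the sketch below substitutes a much simpler technique: an ordinary least-squares trend line fitted to evenly spaced samples of a metric, extrapolated to the point where it crosses the SLA threshold. The function names are hypothetical, and a straight-line fit is an assumption that will not suit every metric.

```python
from typing import List, Optional, Tuple


def linear_fit(values: List[float]) -> Tuple[float, float]:
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def steps_until_breach(values: List[float], threshold: float) -> Optional[float]:
    """Estimate how many sampling intervals remain until the trend drops below threshold.

    Returns 0.0 if the fitted value at the latest sample is already below the
    threshold, and None if the trend is flat or improving.
    """
    slope, intercept = linear_fit(values)
    latest = intercept + slope * (len(values) - 1)
    if latest < threshold:
        return 0.0
    if slope >= 0:
        return None
    return (threshold - latest) / slope
```

With daily samples, a small positive result means the threshold is projected to be crossed within days, which is the kind of signal that could trigger an automated remediation before any canary actually fails.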

Compliance Reporting

Finally, being able to prove that you meet compliance requirements is important in enterprise contexts. With the Aviary Platform, you can generate granular compliance reports easily to demonstrate that you met SLAs, while also showing where any compliance problems occurred:
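A report of this kind boils down to two questions: how often was the target met, and when was it missed? The following sketch answers both from a list of daily success rates by grouping consecutive non-compliant days into violation periods. The function name `compliance_report`, the input layout, and the report fields are assumptions for illustration, not the Platform's report format.

```python
from typing import Any, Dict, List, Optional, Tuple


def compliance_report(daily: List[Tuple[str, float]], target: float) -> Dict[str, Any]:
    """Summarize SLA compliance and group consecutive missed days into periods.

    `daily` is a chronologically ordered list of (date, success_rate) pairs.
    """
    violations: List[Dict[str, Any]] = []
    current: Optional[Dict[str, Any]] = None
    for day, rate in daily:
        if rate < target:
            if current is None:
                # A new violation period starts on this day.
                current = {"start": day, "end": day, "worst": rate}
                violations.append(current)
            else:
                # Extend the open period and track its lowest rate.
                current["end"] = day
                current["worst"] = min(current["worst"], rate)
        else:
            current = None  # a compliant day closes any open period
    days_met = sum(1 for _, rate in daily if rate >= target)
    return {
        "target": target,
        "days_total": len(daily),
        "days_met": days_met,
        "violations": violations,
    }
```

Recording the start, end, and worst value of each period is what lets a report show not only that an SLA was missed but where the problem occurred.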

Conclusion: A Universal Approach to Canary-Based Performance Management

To make the most of canary applications, enterprises need a platform that allows them to manage canaries and leverage the insights that canaries generate across any cloud, any environment, and any workload. In other words, they need a universal platform for working with canaries.

That's exactly what the Aviary Platform does. No matter where you need to deploy canaries or what you need them to do – whether in a Kubernetes cluster or on a VM, and whether you're running some simple health checks or tracking a complex series of SLA metrics – Aviary makes it simple to build, configure, operate, and analyze canaries at scale. Meanwhile, the automations and predictive AI-based forecasting features built into the platform make it easy for overstretched DevOps teams to run as efficiently as possible.

So, say goodbye to tedious workload monitoring. Say hello to scalable, automated, canary-based health and compliance management through the Aviary Platform.