A Guide to Canary Testing: How It Works, Benefits, and Importance

Canary testing works much like the canary in a coal mine, but for your software: instead of releasing the whole flock (your users) into a potentially toxic or buggy environment, you release just one canary (a small subset of users) to test the waters.

What’s Canary Testing?

Canary testing is a deployment strategy where a new release is rolled out to a small subset of users before a full launch. Similar to miners using canaries to detect danger, this method helps identify bugs, performance issues, or risks early without affecting all users. Typically, about 5% of traffic is routed to the canary version, while the rest continues using the stable one. If the canary performs well, the rollout expands gradually; if not, it’s rolled back quickly.

Why Canary?

In the coal mining days, miners used canaries to detect dangerous gases. If the canary stopped singing, it was a sign that something was wrong. In our tech world, our "canaries" are the early adopters who help us sniff out any issues before a full-scale release.

Developers create automated tests for their software’s new features and modifications. The changes are deployed to a testing environment where others can explore and interact with the new features. If everything goes smoothly, the new software update is rolled out to the production environment, allowing end users to benefit from the newly added feature.

However, given the nature of software, bugs still tend to slip into production. It is impossible to anticipate every potential edge case, and deadlines and budget constraints only add to the pressure.

Setting Up Your Canary Test

Two approaches are commonly used to run a canary test reliably. We’ll be using Go for our examples because, well, Go is awesome. Here are those two approaches:

Feature Flags

Start by incorporating feature flags into your code. These nifty toggles allow you to enable or disable certain features at runtime. Perfect for controlling who gets to see your new changes.

With the feature flag, you can limit the release to 5% of the users and monitor the key metrics. This approach is handy for business stakeholders who need to test new features before rolling them out to everyone. If any issue is detected during the canary deployment, you can disable the new behaviour simply by turning the feature flag off.

go
package main

import (
    "fmt"
    "feature" // Assuming a feature package for checking feature status
)

func main() {
    if feature.IsFeatureEnabled("new-feature") {
        // User Input and Processing
        var userInput string
        fmt.Println("Enter your name:")
        _, err := fmt.Scanln(&userInput) // Read user input
        if err != nil {
            fmt.Println("Error reading input:", err)
        } else {
            fmt.Println("Hello,", userInput + "!") // Process input
        }
    } else {
        // Code for the old behaviour
        fmt.Println("Hello, User!")
    }
}

In this code, the IsFeatureEnabled function is expected to perform the actual check for the feature’s status. Depending on the result, the program executes either the code intended for the new feature or the code representing the old behaviour.
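
The feature package imported above is hypothetical, so here is one rough sketch of how such a check could work: users are bucketed by hashing an identifier, which keeps the same user inside (or outside) the canary group across requests. The rolloutPercent map and the CANARY_USER_ID lookup are assumptions made for illustration, not part of any existing library.

go
// feature/feature.go - a hypothetical sketch of a percentage-based feature flag.
package feature

import (
    "hash/fnv"
    "os"
)

// rolloutPercent maps a flag name to the percentage of users who should see it.
var rolloutPercent = map[string]uint32{
    "new-feature": 5, // canary: roughly 5% of users
}

// IsFeatureEnabled reports whether the flag is on for the current user. The user
// is identified here via a CANARY_USER_ID environment variable purely for this
// sketch; a real service would use the authenticated user's ID instead.
func IsFeatureEnabled(flag string) bool {
    percent, ok := rolloutPercent[flag]
    if !ok {
        return false
    }
    h := fnv.New32a()
    h.Write([]byte(flag + ":" + os.Getenv("CANARY_USER_ID")))
    return h.Sum32()%100 < percent
}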

Time to deploy! Let’s push our changes but only to a small fraction of our users. In Go, we can use something like this:

go
// deployment.go
package main

import "github.com/my/deployment/package"

func main() {
    // Deploy to 5% of users
    deployment.Rollout("new-feature", 5)
}

Blue-Green Deployment

Blue-green deployment is a software release strategy that minimizes downtime and risk by running two identical production environments, often referred to as "blue" and "green."

Once the new version is deployed to the green environment, a series of comprehensive tests are conducted to ensure its functionality, performance, and integration meet expectations. This meticulous testing phase helps identify potential issues before exposing the new version to users. If the green environment proves stable and successful in the tests, traffic is seamlessly switched from the blue to the green environment, making the latter the new production environment.

To implement blue-green deployment successfully, organisations often leverage automated deployment tools and adhere to continuous integration practices to streamline the process and ensure a smooth transition between environments.

go
package main

import (
    "fmt"
    "net/http"
    "sync"
)

var (
    mu     sync.RWMutex
    isBlue = true // start with the blue (stable) environment serving traffic
)

func main() {
    http.HandleFunc("/", handler)
    http.HandleFunc("/switch", switchHandler) // manual blue/green switch
    http.ListenAndServe(":8080", nil)
}

// handler serves the old or new behaviour depending on which environment is live.
func handler(w http.ResponseWriter, r *http.Request) {
    mu.RLock()
    defer mu.RUnlock()
    if isBlue {
        fmt.Fprint(w, "Blue Environment - Old Behaviour\n")
    } else {
        fmt.Fprint(w, "Green Environment - New Feature\n")
    }
}

// switchHandler lets an operator flip traffic between blue and green.
func switchHandler(w http.ResponseWriter, r *http.Request) {
    switchEnvironment()
    fmt.Fprintln(w, "environment switched")
}

// switchEnvironment toggles which environment serves traffic.
func switchEnvironment() {
    mu.Lock()
    defer mu.Unlock()
    isBlue = !isBlue
    fmt.Printf("Switched to %s environment\n", map[bool]string{true: "blue", false: "green"}[isBlue])
}

The handler function serves the old or the new behaviour depending on whether the "blue" or "green" environment is currently live, and the switchEnvironment function (reachable via the /switch endpoint) lets you flip traffic between the two manually. In a real blue-green setup, blue and green are separate environments behind a load balancer or router; this single-process example only simulates the switch.

Monitoring Like a Hawk

Once your canaries are out there singing, you need to listen carefully. Monitoring is key. Keep a close eye on your system’s performance, error rates, and user feedback.

Logging and Metrics

Enhance your logging game. Log relevant information about the new feature and monitor metrics that matter. Go’s standard library makes this a breeze.

go
// logger.go
package main

import "log"

func LogFeatureUsage(userID string) {
    log.Printf("User %s is using the new feature", userID)
}
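
For metrics, one lightweight option is the standard library’s expvar package, which publishes counters as JSON at /debug/vars once the application serves HTTP on the default mux. A minimal sketch (the counter name and the TrackFeatureUsage helper are our own illustrative choices):

go
// metrics.go
package main

import "expvar"

// featureUsage counts how often the new feature is used; expvar exposes it
// automatically at /debug/vars alongside memory and runtime stats.
var featureUsage = expvar.NewInt("new_feature_usage")

// TrackFeatureUsage increments the counter; call it wherever LogFeatureUsage is called.
func TrackFeatureUsage() {
    featureUsage.Add(1)
}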

Error Tracking

Integrate error-tracking tools. Identify and squash those bugs before they spread.
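
One option is Sentry’s Go SDK; the sketch below initialises it with a placeholder DSN and reports a simulated failure from the canary. Treat it as a starting point rather than a full integration.

go
// errors.go
package main

import (
    "errors"
    "log"
    "time"

    "github.com/getsentry/sentry-go"
)

func main() {
    // Initialise the SDK with your project's DSN (placeholder value here).
    if err := sentry.Init(sentry.ClientOptions{
        Dsn: "https://examplePublicKey@o0.ingest.sentry.io/0",
    }); err != nil {
        log.Fatalf("sentry.Init: %s", err)
    }
    defer sentry.Flush(2 * time.Second) // make sure buffered events are sent

    // Report an error observed while serving the canary.
    if err := doSomethingNew(); err != nil {
        sentry.CaptureException(err)
    }
}

// doSomethingNew simulates the new feature failing.
func doSomethingNew() error {
    return errors.New("new feature failed")
}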

Gradual Rollouts

Once the canary deployment has proven successful, we can gradually widen the rollout until the feature flag is enabled for every user.

go
// deployment.go
package main

import "github.com/your/deployment/package"

func main() {
    // Increase to 20% of users
    deployment.Rollout("new-feature", 20)
}

We can then monitor the usage of the new feature with our logs or analytics dashboards to see if users are adopting the feature and how they’re using it. The release of the feature has been completely independent of the app’s deployment!

Rollbacks

Oops, something went wrong. No worries! Quickly roll back to the previous version using your deployment tool.

go
// deployment.go
package main

import "github.com/your/deployment/package"

func main() {
    // Rollback to the previous version
    deployment.Rollback("new-feature")
}

Monitoring and Observability Tools

Monitoring and observability are crucial components in overseeing canary releases and ensuring effective testing. These tools help teams gain insights into the performance, health, and behaviour of applications during canary deployments. Here are some notable monitoring and observability tools that can aid in this process:

  1. Prometheus:

    Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It excels in collecting metrics, making it suitable for tracking the performance of canary releases. Prometheus is particularly valuable for its support of multi-dimensional data collection and querying.

  2. Grafana:

    Grafana is a popular open-source platform for monitoring and observability that works seamlessly with data sources like Prometheus. It provides a customizable and interactive dashboard, allowing teams to visualise key metrics, logs, and traces. Grafana is valuable for gaining insights into application behaviour during canary releases.

  3. Datadog:

    Datadog is a cloud-based monitoring and analytics platform that offers comprehensive observability solutions. It provides real-time monitoring, logs, and traces, enabling teams to correlate data and troubleshoot issues effectively. Datadog supports canary releases by offering end-to-end visibility into application performance.

  4. New Relic:

    New Relic is a cloud-based observability platform that provides insights into application performance, errors, and infrastructure. It offers APM (Application Performance Monitoring) capabilities and supports distributed tracing, making it valuable for monitoring canary releases and identifying any performance regressions.

  5. Istio:

    Istio is an open-source service mesh that enhances visibility and control over microservices-based applications. It includes features like traffic management, security, and observability. Istio’s observability tools, such as Prometheus integration and distributed tracing, are beneficial for monitoring canary releases in complex, microservices architectures.

When implementing canary releases, a combination of monitoring and observability tools ensures that teams have comprehensive insights into the impact of changes on application performance, allowing for effective testing and rapid identification of any issues that may arise during the release process.
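
To make the Prometheus and Grafana options above concrete, here is a minimal sketch of instrumenting a Go service with the official client library (github.com/prometheus/client_golang). The metric name and label are illustrative choices; adapt them to your own conventions.

go
package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// canaryRequests counts requests served by the canary, labelled by outcome.
var canaryRequests = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "canary_requests_total",
        Help: "Number of requests served by the canary version.",
    },
    []string{"status"},
)

func main() {
    prometheus.MustRegister(canaryRequests)

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        canaryRequests.WithLabelValues("ok").Inc()
        w.Write([]byte("canary response"))
    })

    // Prometheus scrapes /metrics; Grafana can then chart the counter.
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}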

How to do Canary Testing?

Canary testing isn’t complicated once you break it down into steps. You can think of it as a careful rollout where you start small, watch closely, and then move forward if things look good. Here’s how you can do it:

1. Plan the Rollout Strategy

Determine what percentage of traffic to direct to the canary release. A good starting point is 1–5% of users, which keeps the risk exposure low while still offering valuable insights.

2. Set Up Your Infrastructure

  • Containerized apps or microservices → use Kubernetes with Istio, Argo Rollouts, or Flagger to route traffic.

  • Legacy applications → use load balancers (NGINX, HAProxy, AWS ALB, etc.) for weighted routing (see the Go sketch after this list).

  • Feature-flagged apps → use feature flags to switch new features on and off at runtime.
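
To illustrate the weighted-routing idea, here is a hedged sketch of a tiny reverse proxy in plain Go that sends roughly 5% of requests to a canary upstream and the rest to the stable one. The upstream URLs are placeholders; in practice you would configure this in NGINX, HAProxy, or your cloud load balancer rather than hand-rolling a proxy.

go
// proxy.go
package main

import (
    "math/rand"
    "net/http"
    "net/http/httputil"
    "net/url"
)

func main() {
    stableURL, _ := url.Parse("http://stable.internal:8080") // placeholder stable upstream
    canaryURL, _ := url.Parse("http://canary.internal:8080") // placeholder canary upstream

    stable := httputil.NewSingleHostReverseProxy(stableURL)
    canary := httputil.NewSingleHostReverseProxy(canaryURL)

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Send roughly 5% of requests to the canary, the rest to stable.
        if rand.Intn(100) < 5 {
            canary.ServeHTTP(w, r)
        } else {
            stable.ServeHTTP(w, r)
        }
    })
    http.ListenAndServe(":9090", nil)
}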

3. Deploy the Canary Version

Deploy your new build to a small, isolated environment or set of servers. Make sure your deployment pipeline can handle partial rollouts.

Example with Kubernetes & Istio traffic routing:

yaml
http:
- route:
  - destination:
      host: my-app
      subset: stable
    weight: 95
  - destination:
      host: my-app
      subset: canary
    weight: 5

Here, only 5% of traffic goes to the canary version.

4. Monitor Closely

This is the most important step. Monitor both technical metrics and business KPIs:

  • Error rates (HTTP 500s, crash loops, failed requests)

  • Latency & throughput

  • CPU and memory usage

  • User-facing KPIs (checkout success, signups, conversion rates)

Tools: Prometheus + Grafana, Datadog, New Relic, or Cloud provider dashboards.

5. Analyze and Decide

  • If everything looks good, gradually increase traffic, for example 10% → 25% → 50% → 100%.

  • If issues show up, roll back quickly to the stable release.

6. Gradual Rollout & Rollback Automation

Manual rollouts and rollbacks are error-prone. With tools such as Argo Rollouts, Spinnaker, or Flagger, you can specify success conditions (such as error-rate tolerances) that trigger an automatic rollback when they are violated.

Example (pseudo-code):

go
deployment.Rollout("new-feature", 20) // increase to 20%
deployment.Rollback("new-feature")    // rollback if failure detected
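
As a rough sketch of what that automation could look like in plain Go, the loop below steps traffic up and rolls back as soon as a (hypothetical) error-rate check crosses a threshold. The rollout, rollback, and errorRate functions are stand-ins for your deployment tooling and metrics backend; in practice Argo Rollouts or Flagger expresses the same logic declaratively.

go
// canary_guard.go - hypothetical automated canary analysis.
package main

import (
    "fmt"
    "time"
)

const maxErrorRate = 0.01 // tolerate at most 1% errors at each step

func main() {
    steps := []int{5, 10, 25, 50, 100}
    for _, percent := range steps {
        rollout("new-feature", percent)
        time.Sleep(5 * time.Minute) // let metrics accumulate at this step

        if rate := errorRate("new-feature"); rate > maxErrorRate {
            fmt.Printf("error rate %.2f%% too high, rolling back\n", rate*100)
            rollback("new-feature")
            return
        }
    }
    fmt.Println("canary promoted to 100% of traffic")
}

// The functions below are placeholders for real integrations.
func rollout(feature string, percent int) { fmt.Printf("rolling out %s to %d%%\n", feature, percent) }
func rollback(feature string)             { fmt.Printf("rolling back %s\n", feature) }
func errorRate(feature string) float64    { return 0.002 } // e.g. queried from Prometheus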

Pros and Cons of Canary Testing

Cons of Canary Testing

Canary testing presents challenges such as the risk of false positives or negatives, where issues in the canary group may not accurately mirror the broader user base, leading to an incomplete understanding of the impact of changes. Managing multiple environments demands meticulous coordination, adding operational overhead.

Striking a balance between the canary group’s size and its representativeness is crucial, and requires constant attention. Improper execution can yield skewed results, hindering accurate assessments of changes in the overall system.

Pros of Canary Testing

Canary testing minimizes software deployment risks by releasing changes to a limited user subset before wider adoption. This controlled exposure enables early identification and resolution of potential issues, reducing the impact on a larger audience and enhancing overall system stability.

Valuable insights into real-world performance are gained, informing decisions based on actual user interactions. Aligned with continuous delivery practices, canary testing cultivates an iterative and cautious approach, fostering a culture of continuous improvement and innovation within development teams.

Conclusion

The concept of canary testing is a strategic approach to introducing changes in a controlled manner by initially deploying them to a limited subset of users or systems.

The successful implementation of canary testing hinges on careful planning. A well-thought-out strategy is crucial to ensure a smooth rollout process, encompassing considerations for the deployment mechanism, monitoring protocols, and contingency plans in case issues arise during the limited release.

Additionally, effective communication is paramount. Transparent and clear communication among development, operations, and other involved teams establishes a collaborative environment where stakeholders are informed about the canary testing strategy, progress, and any necessary adjustments.

FAQs

How is canary testing different from A/B testing?

Canary testing involves deploying changes incrementally to a small subset of users or systems before rolling them out to the entire user base.

In contrast, A/B testing compares two or more versions of a feature or webpage to determine which one performs better based on predefined metrics, often involving larger user segments in a randomized manner to assess differences in user behaviour or preferences.

What are some successful examples of canary testing in real-world applications?

Google employs a sophisticated canary analysis system to gradually roll out changes across its vast infrastructure, allowing for early detection of issues before they affect a significant portion of users. By carefully monitoring key metrics such as latency, error rates, and user engagement, Google can automatically scale back or halt deployments if abnormalities are detected during the canary phase.

How do you set up a canary deployment pipeline?

To set up a canary deployment pipeline, first, define criteria for the canary release, such as metrics for performance and stability. Next, configure deployment tools like Spinnaker or Kubernetes to automate the deployment process. Integrate monitoring tools like Prometheus and Grafana to collect and analyze metrics during the canary release, enabling informed decisions about proceeding or rolling back changes.

Author

  • Keploy Team

    Keploy is a developer-centric API testing tool that creates tests along with built-in mocks, faster than unit tests. Keploy not only records API calls, but also records database calls and replays them during testing, making it easy to use, powerful, and extensible.

