Novel Deployment Strategy? Active Blue/Green

Andy Waller | February 21, 2024 | June 19, 2024

Active Blue/Green is a type of Canary Blue/Green deployment strategy in which two deployments of a service are ALWAYS running. For a service that takes several minutes to spin up a new deployment, Active Blue/Green provides a way to fall back rapidly to the previous version without waiting for that version to be redeployed.

Active Blue/Green deployments have the following traits:

- Two deployments of the service are running at all times (Blue and Green).
- Production traffic is directed to the ‘Active’ deployment (either Blue or Green).
- Preview and testing traffic is directed to the ‘Inactive’ deployment.
- Switching to a new version of the service just swaps which deployment is the ‘Active’ deployment.
- Fallback to the previous version doesn’t require a redeployment of that version; it is simply another swap of the ‘Active’ deployment. Rapid fallback, post-validation, is the requirement this strategy is meant to address.

The diagram below shows the deployment process as we move from v1 to v2 of a service:

Step 1: The service is running as version v1 on the Green deployment, which is the active deployment. We then deploy a new version of the service, v2, to the inactive deployment (Blue, in this case).

Step 2: As part of deploying the new version to the inactive deployment, we run automated conformance and smoke tests. The v2 service itself can be viewed through a special ‘preview’ URL that directs traffic to the inactive deployment.

Step 3: The new version of the service is scaled up to production levels (if not there already) and load tested.

Step 4: Switch the active deployment to Blue, diverting all production traffic from Green to Blue.

Pending Time-Wait Period: For a configurable period, we leave the inactive deployment scaled to production levels as a fail-safe, so that production traffic can be swapped back to the previous version with little or no wait time.

Step 5: The inactive deployment (now Green) is scaled back to minimal resourcing (not zero) in preparation for deployment of the next version. This can happen automatically when the ‘Pending Time-Wait’ expires, or not at all*, depending on your organization’s resource management strategy.

* Note: In the most sensitive of environments, the inactive deployment would be left allocated at production levels, so fallback could happen instantly at any time.

Standard Blue/Green Canary Deployment Method: Flagger

For the sake of comparison, below is a diagram of the standard Canary Blue/Green deployment methodology, as provided by the amazing Flagger project:

Standard Blue/Green Deployment Traits:

- Blue is always considered the ‘Active’ deployment, where production traffic is directed.
- Green is always considered the ‘Canary’ deployment, where new versions are first pushed.
- The new version is pushed to Green, which scales up when changes are detected.
- Conformance and load tests are run against the Canary (Green) pods.
- The Canary is promoted over Blue for production traffic.
- Blue is updated with the new version of the service.
- Blue is promoted back to being the primary for production traffic.
- The Canary (Green) is scaled down.

The key here is that once the Canary has been promoted and the new version is being rolled out to Blue, there is no longer a running deployment of the previous version. If it is determined after this point that v2 has a problem the automated tests did not detect, rolling back to v1 means redeploying v1 through the full process above.

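To make the contrast concrete, the Active Blue/Green steps can be modelled as a small state machine. This is only an illustrative sketch, not an implementation from the post or from Flagger: the class names, method names, and replica counts are hypothetical, and a real setup would drive Kubernetes resources rather than in-memory objects.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

BLUE, GREEN = "blue", "green"


@dataclass
class Deployment:
    """One of the two always-running deployments (hypothetical model)."""
    version: Optional[str] = None
    replicas: int = 1  # minimal resourcing, never zero


@dataclass
class ActiveBlueGreen:
    """Tracks which deployment is active; replica counts are assumptions."""
    production_replicas: int = 10
    minimal_replicas: int = 1
    active: str = GREEN
    slots: Dict[str, Deployment] = field(
        default_factory=lambda: {BLUE: Deployment(), GREEN: Deployment()}
    )

    @property
    def inactive(self) -> str:
        return BLUE if self.active == GREEN else GREEN

    def deploy_to_inactive(self, version: str) -> None:
        # Steps 1-3: push the new version to the inactive deployment
        # and scale it to production levels for testing.
        slot = self.slots[self.inactive]
        slot.version = version
        slot.replicas = self.production_replicas

    def swap(self) -> None:
        # Step 4, and also fallback: flip which deployment is active.
        # Nothing is redeployed, so the swap itself is fast.
        self.active = self.inactive

    def expire_time_wait(self) -> None:
        # Step 5: after the Pending Time-Wait, scale the inactive
        # deployment back to minimal (not zero) resourcing.
        self.slots[self.inactive].replicas = self.minimal_replicas

    def production_version(self) -> Optional[str]:
        return self.slots[self.active].version


if __name__ == "__main__":
    svc = ActiveBlueGreen()
    svc.slots[GREEN] = Deployment("v1", svc.production_replicas)
    svc.deploy_to_inactive("v2")  # v2 lands on Blue
    svc.swap()                    # production now served by Blue
    print(svc.active, svc.production_version())
    svc.swap()                    # fallback: Green still runs v1
    print(svc.active, svc.production_version())
```

In this model, fallback is the same `swap()` call as promotion, which is the property the strategy is built around. Under the standard flow described above, the previous version is overwritten once Blue is updated, so there is no equivalent one-step swap back.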
Conclusion

While it may not be entirely novel, Active Blue/Green Canary deployment provides the ability to fall back rapidly to the previously deployed version of a service, at the cost of increased resource allocation within the Kubernetes cluster. In environments where uptime SLAs are critical, Active Blue/Green may be a great alternative to the standard Blue/Green Canary methodologies.