
Umbraco Cloud Failover with Azure

We are believers in the Umbraco Cloud offering and use it for all of our public and internal websites (five Umbraco Cloud projects with about 20 sites). Even at our relatively small scale, the service pays for itself about four times over in time saved. We think of Umbraco Cloud as our DevOps robot: it handles patches and upgrades, keeps our codebase in sync across projects (hello Baselines!), and ensures we follow a well-defined deployment process. It hosts our sites too.

Even with all we love about Umbraco Cloud, it is notably missing a few key features, namely geo-redundancy for the service (failover) and load-balancing for individual projects. We have seen an external solution for load-balancing sites using Azure services but, until recently, had not come across a solution for geo-redundancy, in this case failover during an Umbraco Cloud outage.

Spurred by a few days of intermittent Umbraco Cloud outages in early 2019, we worked with our development partner ProWorks (an Umbraco Gold Partner) to create a failover environment for our sites on Umbraco Cloud. At the time, it was not clear when the outages would be resolved. On a positive note, the outages ceased within 24 hours of our deciding to implement this plan, even before our failover environment was in place. While our sites were not reliably available, we realized how important they are to our business: our lead generation flow dropped off dramatically, as did our revenue 😞.

We considered various options, including moving the sites to a different host, but in the end decided that creating a failover environment and continuing to use Umbraco Cloud was the best plan. Our requirements were: 1) no on-premises dependencies, 2) a failover environment that is always up to date with the live Umbraco Cloud environment, 3) running within Azure but in a different data-region than Umbraco Cloud (West Europe), 4) hands-off updates, and 5) automatic failover if Umbraco Cloud is unavailable. We were able to accomplish all of this, and the resulting environment can also be used for zero-downtime upgrades or blue-green deployments.

One caveat before we get rolling with the details. We believe this is a real-world solution that gives you geo-redundancy by way of failover while still letting you take advantage of what Umbraco Cloud offers. Of course, this solution can always be improved and optimized, and we hope you do optimize it and then share that with us and the community. It is not, however, a low-cost solution relative to Umbraco Cloud: the lowest-cost configuration we found was about three times the cost of Umbraco Cloud alone. We use the Umbraco Cloud Starter tier with two environments, and the lowest tiers of the Azure services that meet our requirements.

The Components

The general approach is to create an Azure AppService environment that does not depend directly on any Umbraco Cloud services or resources, and then to create a set of services that keeps the Azure AppService synchronized with the Umbraco Cloud project. Simple concept!

The solution we created uses a number of Azure services, a GitHub repository, and specific Umbraco configuration. Note that this solution was built for Umbraco 7; it will likely work as-is with Umbraco 8, provided you make the corresponding Umbraco 8 configuration changes:

  • Azure Blob Storage for Umbraco Media, Forms, and any other file-based assets that need to be synced
  • Azure Functions for database backup and restore, file sync, and other utility tasks
  • Azure Batch for database restore with larger database sizes
  • Azure AppService for hosting the Umbraco sites and handling deployments
  • Azure SQL Database for the failover database
  • Azure Traffic Manager to auto-failover if Umbraco Cloud is unavailable
  • GitHub repository kept in sync with the live site from Umbraco Cloud (this can also be the Umbraco Cloud Live site repository or another git repository)
  • TLS/SSL certificates (we used Let’s Encrypt initially but now use CloudFlare)
  • DNS records
  • UmbracoFileSystemProviders.Azure for all sites including those on Umbraco Cloud
  • ImageProcessor.Web.Config for all sites including those on Umbraco Cloud
  • Umbraco Examine specific configuration for use in a Cloud environment

If you’ve ever set up any one of these items, you already know that each comes with its own peculiarities and optimizations. We are quite sure the solution we describe here can be further optimized.

Umbraco Bits

UmbracoFileSystemProviders.Azure

With Umbraco Cloud, using UmbracoFileSystemProviders.Azure for media, forms, etc. makes sense and is a natural fit. You can also fairly easily configure UmbracoFileSystemProviders.Azure to use the Azure CDN to give your current Umbraco Cloud site a CDN boost. In any case, to set up and configure UmbracoFileSystemProviders.Azure for Umbraco Cloud, follow the excellent and detailed documentation here: https://github.com/umbraco-community/UmbracoFileSystemProviders.Azure

Note that if you are already running sites on Umbraco Cloud, you will need to manually upload existing media and forms data to Azure Blob Storage; otherwise it will appear to have disappeared 😮. Once this is complete, you can remove the media items from the file system of your Umbraco Cloud site using the Kudu tools. We like Azure Storage Explorer (http://storageexplorer.com), which has good docs, for managing Azure Blob Storage files.
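If you have a lot of existing media, it can help to script the upload rather than drag folders around. The minimal Python sketch below only plans the upload: it walks a local copy of the site's media folder (for example, one downloaded through Kudu) and maps each file to a blob name that preserves Umbraco's folder structure. The function name and the assumption that the container root corresponds to the site's /media folder are ours; the actual upload would still be done with Azure Storage Explorer or an Azure Storage SDK.

```python
from __future__ import annotations

import os
from pathlib import Path


def plan_media_upload(local_media_root: str) -> dict[str, str]:
    """Map each local file under local_media_root to a blob name.

    Hypothetical helper. Blob names are relative to the media folder and
    always use forward slashes, e.g. "1051/photo.jpg" (assumption: the
    "media" container root corresponds to the site's /media folder).
    """
    root = Path(local_media_root)
    plan: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = Path(dirpath) / filename
            # as_posix() turns Windows separators into "/" for blob names
            blob_name = full_path.relative_to(root).as_posix()
            plan[str(full_path)] = blob_name
    return plan
```

Reviewing the plan before uploading is a cheap way to catch files that should not be migrated, such as thumbnails or cache folders.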

ImageProcessor.Web.Config

You will also need to install and configure ImageProcessor.Web.Config using the steps in the ImageProcessor documentation (https://imageprocessor.org/imageprocessor-web/#config). Additional configuration lets you take advantage of ImageProcessor caching with Azure. While we won’t cover that here, we highly recommend exploring those options.

Umbraco Examine

You will also need to ensure that the configuration for Umbraco Examine is set correctly for Azure AppService in examineSettings.config. The short version is it should be set to:

...directoryFactory="Examine.LuceneEngine.Directories.TempEnvDirectoryFactory,Examine"

Azure Bits

In Azure we will use Storage, Function Apps, SQL Database, AppService, and Traffic Manager (and optionally Azure Monitor and Azure Batch) to create the failover environment.

Blob Storage

As noted, one of the requirements is that the Umbraco Cloud site uses Azure Blob Storage for media, forms, and other file-based content. We create one storage account in the West Europe data-region, where Umbraco Cloud is also located, to keep latency as low as possible, and another storage account in our failover data-region. In our case that is Central US, but it can be any other data-region.

In each storage account, create a blob container for media and one for forms. We find it easiest to use the same names in both accounts: media and forms. Then configure the FileSystemProvider and ImageProcessor for your Umbraco Cloud site to use the blob containers in the West Europe data-region as their file system.

We will use Azure Functions to handle the media and forms synchronization between the Umbraco Cloud and failover data-regions — see below for details.

SQL Database

Create an Azure SQL server with a SQL database in your failover region; this is where the Umbraco Cloud database will be restored. We opted for the Basic database tier with 5 DTUs (i.e., the lowest cost), with a plan to scale up in the event of a failover. Given the low cost of Azure SQL Database DTUs, you may opt for a higher tier.

Azure Function Apps

File Sync

In the West Europe data-region create a Function App, then add two functions. Starting with the Azure Blob Storage trigger template is an easy way to set these up. For the Function App, create an Application setting:

destinationStorageConnectionString = [YourStorageConnectionString]

Then create the functions:

Media

  • Function name: SyncMedia
  • Trigger: Azure Blob Storage
  • Blob parameter name: myBlob
  • Storage account connection: destinationStorageConnectionString
  • Path: media/{name}

Forms

  • Function name: SyncForms
  • Trigger: Azure Blob Storage
  • Blob parameter name: myBlob
  • Storage account connection: destinationStorageConnectionString
  • Path: form-data/{name}

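Both functions do the same job: when a blob is created or updated in a source container, copy it to the container of the same name in the failover storage account. The sketch below models only that logic in plain Python, with a dictionary standing in for the failover storage account; `split_blob_path`, `sync_blob`, and the in-memory store are hypothetical, and a real function would write through an Azure Storage SDK or an output binding using the connection in destinationStorageConnectionString.

```python
from __future__ import annotations

# Containers watched by the SyncMedia and SyncForms functions.
SYNCED_CONTAINERS = {"media", "form-data"}


def split_blob_path(path: str) -> tuple[str, str]:
    """Split a trigger path like "media/1051/photo.jpg" into (container, name)."""
    container, sep, name = path.partition("/")
    if not sep or not name:
        raise ValueError(f"not a container/name path: {path!r}")
    return container, name


def sync_blob(path: str, content: bytes,
              destination: dict[str, dict[str, bytes]]) -> str:
    """Copy one blob to the same container and name in the destination.

    `destination` is an in-memory stand-in for the failover storage
    account, shaped as {container_name: {blob_name: content}}.
    """
    container, name = split_blob_path(path)
    if container not in SYNCED_CONTAINERS:
        raise ValueError(f"container {container!r} is not synced")
    # Overwrite unconditionally so the latest source version wins.
    destination.setdefault(container, {})[name] = content
    return f"{container}/{name}"
```

One design consequence worth knowing: blob triggers fire for new and updated blobs, not deletions, so media deleted on Umbraco Cloud will remain in the failover account unless you clean it up separately.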
Database Backup

Due to a regulatory requirement, we have been running a nightly SQL backup, created for us by ProWorks, since mid-2018. Because we run multiple Umbraco Cloud projects, we modified the original ProWorks implementation to accept an array of Azure SQL connection strings (one for each Umbraco Cloud project). The nightly backup creates a .bacpac file for each database and stores it in an Azure Blob Storage container in our failover region. This is a known location with a predictable naming scheme, so our restore function can find the corresponding database backup.
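The article does not spell out the naming scheme, so the following is a hypothetical one that shows why predictability matters. Names are built from the BackupFilePrefix setting, the database name, and a UTC timestamp whose format sorts lexicographically in time order, so the restore side can find the newest backup with a simple `max`. The format `PREFIX-DbName-YYYYMMDDTHHMMZ.bacpac` is our assumption, not the ProWorks implementation.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone
from typing import Optional


def backup_blob_name(prefix: str, db_name: str, taken_at: datetime) -> str:
    """Build a predictable blob name, e.g. "LIVE-mydb-20190314T0700Z.bacpac"."""
    if taken_at.tzinfo is None:
        raise ValueError("taken_at must be timezone-aware")
    stamp = taken_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%MZ")
    return f"{prefix}-{db_name}-{stamp}.bacpac"


def latest_backup(blob_names: list[str], prefix: str, db_name: str) -> Optional[str]:
    """Return the newest backup name for one database, or None if none exist."""
    # Require the timestamp right after the name so "mydb" never matches
    # a different database such as "mydb-old".
    pattern = re.compile(re.escape(f"{prefix}-{db_name}-") + r"\d{8}T\d{4}Z\.bacpac")
    candidates = [name for name in blob_names if pattern.fullmatch(name)]
    # The timestamp format sorts lexicographically, so max() is the newest.
    return max(candidates, default=None)
```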

In the failover data-region create a Function App, then add two functions. Starting with the Timer Trigger template is an easy way to set these up. For the Function App, create the following Application settings. All of these, except the storage connection string, can be found in your Umbraco Cloud project’s portal:

  • AzureWebJobsStorage: [YourStorageConnectionString]
  • BackupFilePrefix: LIVE (or STAGE or DEV)
  • BlobContainer: webbacpacs
  • DbName: [YourDbName]
  • DbPassword: [YourDbPassword]
  • DbServer: [YourDbServer]
  • DbUsername: [YourDbUser@YourDbServer]

Alternatively, if you choose to use the multiple-database backup version, also create this Application setting:

DbConnStrings: [array of connection strings]

The value is a JSON object whose DBs property holds one entry per database, in this format:

{"DBs":[{"DbName":"YourDbName","DbServer":"YourDbServer","DbUserName":"YourDbUser@YourDbServer","DbPassword":"YourDbPassword"},...]}
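Because a single malformed entry in DbConnStrings could silently skip a project's backup, it is worth validating the setting when the function starts. This minimal Python sketch parses the JSON shown above into one record per database and fails loudly if a key is missing; `DbConfig` and `parse_db_conn_strings` are hypothetical names, and the keys match the format above (note `DbUserName` here versus the single-database `DbUsername` setting).

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

REQUIRED_KEYS = ("DbName", "DbServer", "DbUserName", "DbPassword")


@dataclass(frozen=True)
class DbConfig:
    name: str
    server: str
    user: str
    password: str = field(repr=False)  # keep the password out of logs


def parse_db_conn_strings(raw: str) -> list[DbConfig]:
    """Parse the DbConnStrings application setting into one entry per database."""
    doc = json.loads(raw)
    configs = []
    for index, entry in enumerate(doc["DBs"]):
        missing = [key for key in REQUIRED_KEYS if not entry.get(key)]
        if missing:
            # Fail loudly rather than quietly skipping a project's backup.
            raise ValueError(f"DBs[{index}] is missing {', '.join(missing)}")
        configs.append(DbConfig(entry["DbName"], entry["DbServer"],
                                entry["DbUserName"], entry["DbPassword"]))
    return configs
```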

Then create the functions:

Backup

  • Function name: NightlyWebBackup
  • Trigger: Timer
  • Timestamp parameter name: myTimer
  • Schedule: 0 0 7 * * * (once per day at 07:00)

Restore

  • Function name: NightlyWebRestore
  • Trigger: Timer
  • Timestamp parameter name: myTimer
  • Schedule: 0 0 10 * * * (once per day at 10:00)
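Azure Functions timer schedules are six-field NCRONTAB expressions (second, minute, hour, day, month, day-of-week), so the fields must be separated by spaces. The small Python sketch below handles only the fixed-time daily form used here and computes the next run, treating times as UTC (the Azure Functions default). It makes the three-hour gap between backup and restore easy to check; `next_daily_run` is a hypothetical helper, not part of the Functions runtime.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(schedule: str, after: datetime) -> datetime:
    """Next occurrence of a daily "sec min hour * * *" schedule after `after`.

    Only the fixed-time daily form is supported; anything else raises
    ValueError. `after` must be timezone-aware; results are in UTC.
    """
    fields = schedule.split()
    if len(fields) != 6 or fields[3:] != ["*", "*", "*"]:
        raise ValueError(f"expected 'sec min hour * * *', got {schedule!r}")
    if after.tzinfo is None:
        raise ValueError("after must be timezone-aware")
    second, minute, hour = (int(f) for f in fields[:3])
    after = after.astimezone(timezone.utc)
    candidate = after.replace(hour=hour, minute=minute, second=second, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate
```

With the schedules above, a backup starting at 07:00 has three hours to finish before the 10:00 restore looks for it.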

We added a check to the restore function that cancels the restore if the database backup is more than 24 hours old. This avoids regressing the failover environment to an older backup, which could happen if the Umbraco Cloud environment was unavailable and the failover environment was the active one.
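The guard itself is small. This Python sketch shows the decision the restore function has to make, assuming the backup time comes from something like the blob's last-modified timestamp (how the real function obtains it is not described here). A backup from the future is also rejected, since that points to a clock or naming problem rather than a fresh backup.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_BACKUP_AGE = timedelta(hours=24)


def should_restore(backup_taken_at: datetime,
                   now: Optional[datetime] = None,
                   max_age: timedelta = MAX_BACKUP_AGE) -> bool:
    """Return True only if the backup is recent enough to restore.

    A stale backup usually means the nightly backup did not run, e.g.
    because Umbraco Cloud was down and the failover site was live;
    restoring it would roll the failover database back.
    Both datetimes must be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    age = now - backup_taken_at
    return timedelta(0) <= age <= max_age
```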

Alternatively, Azure Batch for Database Restore

As you may already know, on the Consumption plan an Azure Function can run for only five minutes by default (configurable up to ten) before the Functions runtime stops it. This is intentional but, as you can imagine, could be an issue if your database is large and takes longer than that to back up or restore. For this, ProWorks has a solution (of course) that uses Azure Batch kicked off by a function. See the shared repository for details.

Azure AppService

Create an Azure AppService in your failover region. Make sure to add a connection string named umbracoDbDSN in the AppService’s application settings and set its value to the database you created earlier. Because we are building an auto-failover solution with Azure Traffic Manager, we need to select an AppService tier that supports it; at the time of writing, that is the Standard tier and above.

Create a deployment for the AppService from the Deployment Center in the Azure Portal selecting the following settings:

  • Deployment Type: External
  • Build Provider: App Service Kudu build server
  • Source Control: Your Git repository endpoint OR Umbraco Cloud repository endpoint (see below)
  • Branch: master

If you use the Umbraco Cloud git repository, your URL will look similar to:

https://username:password@mycloudsite.scm.s1.umbraco.io/some-guid.git

Since Umbraco Cloud usernames are generally email addresses, you will need to percent-encode the "@" character (as %40) and any other special characters in the username and password.
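Encoding by hand is easy to get wrong, especially for passwords. This Python sketch uses the standard library's `urllib.parse.quote` with `safe=""` so that every reserved character in the credentials is percent-encoded; the `git_url_with_credentials` name and the example values are illustrative only.

```python
from urllib.parse import quote


def git_url_with_credentials(username: str, password: str,
                             host: str, repo_path: str) -> str:
    """Build an https git URL with percent-encoded credentials.

    quote(..., safe="") encodes every reserved character, so "@" becomes
    "%40" and ":" becomes "%3A", keeping the user:password part unambiguous.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"https://{user}:{pwd}@{host}/{repo_path}"
```

For example, the username me@example.com with the password p@ss:word becomes https://me%40example.com:p%40ss%3Aword@mycloudsite.scm.s1.umbraco.io/some-guid.git.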

If you use a GitHub (or other) git repository, you will first need to create a Deployment Credential from the Azure AppService Deployment Center. Then use that Deployment Credential, along with the git repository endpoint, to configure the build service source. We successfully set this up using a private GitHub repository, though it took a few tries to get it right.

But That’s Not All

A side note for you adventurous types: Kudu supports incoming webhooks and Umbraco Cloud supports outgoing webhooks. Putting the two together could be great, like peanut butter and chocolate or whiskey and ice: every time a deployment completes on your Umbraco Cloud site, an outgoing webhook notifies Kudu on your failover site, which then starts a deployment so the sites stay in sync. If that sounds exciting, here are the guides. Let us know how it works!

https://github.com/projectkudu/kudu/wiki/Continuous-deployment#setting-up-continuous-deployment-usin…

https://our.umbraco.com/Documentation/Umbraco-Cloud/Deployment/Deployment-Webhook/

Azure Traffic Manager

Traffic Manager can be used in a variety of ways (think global-scale load-balancing, or a widely dispersed app where you want to direct each visitor to the closest node), but in this case we use it to automatically direct web traffic to the failover environment when the Umbraco Cloud environment is not available. In theory, Traffic Manager could also keep your visitors’ experience uninterrupted while your Umbraco Cloud site is undergoing an upgrade or a deployment that makes it unavailable, or is just slow for other reasons.

Create a Traffic Manager Profile with the following configuration settings:

  • Routing method: Priority
  • DNS TTL: 60 seconds (can be adjusted as needed)
  • Protocol: HTTPS (in testing, this may need to be HTTP)
  • Port: 443 (in testing, this may need to be 80)
  • Path: / (but perhaps you have a health page like /im-happy/)
  • Custom Header: empty (we did not use this)
  • Expected status codes: 200-299, 300-399
  • Probing interval: 30 seconds (can be adjusted as needed)
  • Tolerated number of failures: 3 (can be adjusted as needed)
  • Probe timeout: 10 seconds (can be adjusted as needed)
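Including the 300-399 range matters if your site answers the probe path with a redirect (for example, from / to a language root), since otherwise a healthy site could be treated as failing. This Python sketch models how an expected-status-codes setting like the one above classifies a probe response; it is our illustration of the setting's meaning, not Traffic Manager code, and the helper names are hypothetical.

```python
from __future__ import annotations


def parse_status_ranges(spec: str) -> list[tuple[int, int]]:
    """Parse "200-299, 300-399" into [(200, 299), (300, 399)]."""
    ranges = []
    for part in spec.split(","):
        low, _, high = part.strip().partition("-")
        low_code = int(low)
        high_code = int(high) if high else low_code  # also accept a single code
        if not 100 <= low_code <= high_code <= 599:
            raise ValueError(f"bad status range: {part.strip()!r}")
        ranges.append((low_code, high_code))
    return ranges


def probe_succeeded(status: int, ranges: list[tuple[int, int]]) -> bool:
    """A probe counts as healthy if its status falls inside any range."""
    return any(low <= status <= high for low, high in ranges)
```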

Then create an endpoint for your Umbraco Cloud site and one for your failover site. Probe requests to the endpoints use the protocol set in the profile configuration. With Priority routing, traffic goes to the available endpoint with the highest priority, where a lower priority number means a higher priority. We set the Umbraco Cloud endpoint’s priority to 1 and the failover endpoint’s to 2.

The Umbraco Cloud endpoint will be an External Endpoint and the failover endpoint will be an Azure Endpoint of the type AppService.
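To reason about when failover kicks in, it helps to model Priority routing explicitly. The Python sketch below is a simplified model, not Traffic Manager's implementation: each endpoint counts consecutive failed probes and is treated as degraded once that count exceeds the tolerated number (our assumption about the exact boundary), and routing picks the healthy endpoint with the lowest priority number. If every endpoint is degraded, the model falls back to all of them, which is how we understand Traffic Manager to behave.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    priority: int                 # 1 is the highest priority
    tolerated_failures: int = 3
    consecutive_failures: int = 0

    def record_probe(self, ok: bool) -> None:
        # Any successful probe resets the failure streak.
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

    @property
    def healthy(self) -> bool:
        # Assumption: degraded once failures exceed the tolerated count.
        return self.consecutive_failures <= self.tolerated_failures


def route(endpoints: list[Endpoint]) -> Endpoint:
    """Priority routing: the healthy endpoint with the lowest priority number."""
    healthy = [e for e in endpoints if e.healthy]
    candidates = healthy or endpoints  # all degraded: consider every endpoint
    return min(candidates, key=lambda e: e.priority)
```

In this model, once the Umbraco Cloud endpoint recovers and passes a probe, traffic moves back to it automatically, which matches the hands-off behavior we wanted.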

Finally, note the DNS Name for your Traffic Manager profile. This will be used as we update the hostnames for the Umbraco Cloud site, the Azure failover site, and for DNS record updates.

Azure Monitor for Notifications and Auto Scaling

Adding alerts that notify an administrator when a failover event occurs is a recommended way to monitor your environment, especially when Traffic Manager unexpectedly detects a condition that sends traffic to the failover site.

Azure Monitor alert rules can also be used to scale Azure resources up or down based on the condition of the monitored endpoints or other conditions. While we have not yet configured these rules, we believe this post details the required steps:

https://blogs.msdn.microsoft.com/waws/2017/12/01/scale-up-appserviceplan-from-azure-function-app/

TLS/SSL

Perhaps the most straightforward approach is to obtain certificates from a commercial vendor (we like CloudFlare, but certificates are vendor-agnostic, so choose whomever you like). One reason to use such a certificate is that the same certificate can be used by both the Umbraco Cloud site and the Azure failover site, which makes configuration somewhat simpler.

Both Umbraco Cloud and Azure AppService use the .pfx format for certificates, and both require that a password be set when saving the .pfx file. Once you have the .pfx file, add the certificate to Umbraco Cloud from the portal and then bind it to a hostname. For Azure AppService, you add and bind a certificate from the Custom domains section.

Another option is to use a TLS/SSL proxy provider (e.g., CloudFlare) together with some specific Azure AppService configuration settings. Here is a guide that worked for us:

https://kvaes.wordpress.com/2017/12/07/combining-azure-traffic-manager-cloudflare-azure-app-service-for-geographic-scale/

With Traffic Manager and CloudFlare in the mix, we found that setting SSL to Strict resulted in periodic errors on page load, while setting it to Full in CloudFlare and leaving Strict = Off in Traffic Manager gave the most consistent results.

Yet another option is to use the built-in Umbraco Latch certificate on Umbraco Cloud and configure an auto-renewing Let’s Encrypt certificate for your Azure AppService. There are many guides on how to do this; one of our favorites is:

https://www.hanselman.com/blog/SecuringAnAzureAppServiceWebsiteUnderSSLInMinutesWithLetsEncrypt.aspx

DNS

With the DNS name of the Traffic Manager profile from above in hand, add this as a new Hostname to your Umbraco Cloud site from the Umbraco Cloud portal. Do the same for your Azure failover site using the Custom domains configuration option.

The final configuration task for your failover environment is to add the DNS name for your Traffic Manager profile as a CNAME record to your domain’s DNS. Once that DNS update has propagated and all the other bits are ready, your site is ready to provide failover support in the case of an outage.

Nice work!

Of course, a little testing is in order to verify that your specific configuration and settings behave as you expect.

Publishing Changes

In this hot-failover environment, content updates will be present once the database restore has completed; however, they will not be published. We created a scheduled task that calls an ApiController method that publishes all content using PublishWithChildrenWithStatus(). A shared secret, stored in web.config, is passed on the scheduled task URL and checked in the controller.
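The controller itself is C# inside Umbraco, but the shared-secret pattern around it is language-neutral. This Python sketch shows both halves: building the URL the scheduled task calls, and checking the secret on the receiving side with a constant-time comparison. The route /umbraco/api/PublishAll/Run and the query parameter name secret are hypothetical placeholders, not the names used in our implementation.

```python
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical route for the ApiController action that publishes all content.
PUBLISH_PATH = "/umbraco/api/PublishAll/Run"


def publish_url(base_url: str, secret: str) -> str:
    """URL the scheduled task calls, carrying the shared secret."""
    return f"{base_url.rstrip('/')}{PUBLISH_PATH}?{urlencode({'secret': secret})}"


def is_authorized(url: str, expected_secret: str) -> bool:
    """Check the secret the way the controller would against web.config."""
    if not expected_secret:
        return False  # never accept requests if no secret is configured
    supplied = parse_qs(urlsplit(url).query).get("secret", [""])[0]
    # compare_digest avoids revealing how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), expected_secret.encode())
```

Keep in mind that query strings often end up in web server logs; passing the secret in a custom request header is a reasonable alternative if that is a concern.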

Wrap up

In the end the solution works very well, and ProWorks has since implemented it, with similar success, for other clients that run at a much higher scale than we do. After a few months of running the failover environment as a hot failover (i.e., always on), we decided to scale back to a warm failover that, in the event of an outage, requires some manual tasks to bring online.

Those tasks are:

  • Scale up the Azure SQL database
  • Enable the Azure AppService
  • Start a Kudu Deployment for the AppService
  • Update the DNS records to resolve to the Azure AppService
  • Publish all content from the Umbraco backoffice

It is a relatively small list that, in practice, takes about 20 minutes to complete once started. The largest potential weakness of the warm approach is that the time between an outage and the start of the failover activation tasks depends wholly on a person being notified and being available to complete them. It is a risk to consider.

One other complexity can arise while the failover environment is active: how to carry content or site updates made in the failover environment back to the Umbraco Cloud environment. We can imagine a variety of solutions, but have opted for the "don't make updates in the failover environment" model. 🤷‍♂

We have shared code samples for the bits noted in this article; please feel free to use them, improve or optimize them, and submit a pull request to the repository. Note that ProWorks uses Azure Key Vault to store sensitive information such as connection strings and passwords, which is highly recommended. If you need a hand or have a requirement for a specific solution, I highly recommend you get in touch with the fine folks at ProWorks.

https://bitbucket.org/proworks/proworks.azure.public/

For the future, we hope to see Umbraco Cloud offer an equivalent option for projects that require this sort of redundancy.

Paul Sterling

Paul has been working with Umbraco since 2006, is an Umbraco MVP, and is now CTO for Mold Inspection Sciences, an environmental inspection and testing organization dedicated to helping people live and work in healthy environments. He lives in Bellingham, WA (USA) where he spends most of his spare time outdoors, usually in the rain.

Benjamin Carleski

Benjamin is a soccer dad first, Umbraco developer second. An Umbraco Certified Expert, he has been working with ProWorks for the past 4 years, only recently coming on as a full-time employee. When not chilling in the office, you'll likely find him in the attic of his house, trying to wire up his latest gadget, or refereeing out on the soccer fields, keeping his 7 future soccer stars in check.
