In Prod, Everyone Can Hear You Scream

Posted by Claude Remillard - December 18, 2012

Really, they will.

Over the last few years of providing ALM services and eventually creating InRelease, I discovered a few fundamentals around which we are building our product. The first one I call the “First Law of Deployments”, and here it goes:

Dev deployments ≠ Ops deployments

Where Dev deployments are deployments made to intermediary test environments and Ops deployments are, well, Prod deployments.

Okay, any ops person will agree with this readily (probably telling themselves – can he be more obvious?). But it does not seem so clear to everyone in my experience.

Here are some additional requirements for production deployments:

  • In Prod, you cannot take the system down for long, if at all. In Dev, you can.
  • In Prod, if a deployment fails, you need to fail elegantly and be able to roll back quickly. You can’t just re-create the environment as you would in Dev.
  • In Prod, you need to preserve and protect the existing data. Not so much in Dev.
  • In Prod, you need to configure the infrastructure. In Dev, you can re-create it.
  • In Prod, you have server farms, mirroring mechanisms, and more dependencies. In Dev, you can keep it simple (simpler?).
  • In Prod, you need to keep logs on everything so that you can investigate or diagnose if needed.
  • In Prod, you need to limit who can deploy what, to protect against fraudulent activities.
  • And last but not least, failing in Prod is always waaayyy more public – everyone will definitely hear you scream.
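A few of these requirements (limiting who can deploy, logging everything, and rolling back on failure) can be sketched in a few lines of Python. This is only an illustration, not how InRelease works: the `ALLOWED_DEPLOYERS` table, the role names, and the `deploy` function are all hypothetical, and each step is assumed to come with an `undo` action.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("deploy")

# Hypothetical allow-list; a real system would ask its identity provider.
ALLOWED_DEPLOYERS = {
    "prod": {"release-manager"},
    "dev": {"release-manager", "developer"},
}


def deploy(environment, role, steps):
    """Run (name, apply, undo) steps; on failure, undo completed steps in reverse.

    Returns True if every step succeeded, False if the deployment was rolled back.
    """
    if role not in ALLOWED_DEPLOYERS.get(environment, set()):
        raise PermissionError(f"role {role!r} may not deploy to {environment!r}")

    completed = []
    for name, apply, undo in steps:
        log.info("applying %s on %s", name, environment)
        try:
            apply()
        except Exception:
            # Log the failure with its traceback, then unwind what was done.
            log.exception("step %s failed, rolling back", name)
            for done_name, done_undo in reversed(completed):
                log.info("undoing %s", done_name)
                done_undo()
            return False
        completed.append((name, undo))
    return True
```

The point of the sketch is that rollback is part of the deployment itself rather than an afterthought, so it gets exercised every time the same code runs in an earlier environment.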

Build once, deploy many times

The typical release cycle, at a very high level, is to build, test and then deploy to production (build-deploy-test is also emerging but I’ll talk about that separately). More or less like this:

 

image

And while this is true at 10,000 feet, it is a bit of an oversimplification.

If we take the example of releasing through 3 stages (e.g., dev, integration, staging) prior to hitting production, with one bug fix discovered in the first validation stage, the process, starting from TFS, often looks more like this:

 

image

 

This process stems from the fact that in order to make sure that the version that goes to prod is the same code as was tested in previous environments, you do not re-build the code between stages. Instead, you want to promote the same binaries to all the environments - ensuring traceability from the original build to production.
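One way to check that the binaries being promoted really are the ones that were built and tested is to record a checksum at build time and verify it before each stage. The following is a minimal sketch under that assumption; the `promote` function and its return value are hypothetical, and a real pipeline would deploy the artifact where the comment indicates.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def promote(artifact, recorded_digest, stage):
    """Refuse to promote an artifact whose digest differs from the build-time one."""
    actual = sha256_of(artifact)
    if actual != recorded_digest:
        raise ValueError(
            f"{artifact} changed since build; refusing to deploy to {stage}"
        )
    # A real pipeline would copy or deploy the artifact here.
    return f"{Path(artifact).name}@{actual[:12]} -> {stage}"
```

Calling `promote` with the same file and digest for dev, integration, staging and prod succeeds each time; a rebuilt or modified file fails the check before it reaches any environment.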

Again, that is with just one bug in the cycle. When you consider the life story of a given production release, you find a lot of deployments (or a lot of bugs if you don’t!).

Deploy Early, Deploy Often

In fact, what we propose is to use the relevant portions of your Prod deployment as early as you can, so that it is tested over and over before hitting Prod. Of course, to do that, you need to automate the deployments and you need to be able to transform the config files for each server. That’s what InRelease and other release automation products do (they also automate tasks to run and sometimes the approval workflow). You get something like this instead:

image

 

You still do different things in Dev than in Prod, mostly regarding provisioning and test data management, but the core deployment of each component on each server should be shared with Prod. Approaching the problem this way elevates the deployment mechanism to a first-class constituent of your application, one that is tested at a similar level to the other aspects of your app. It will also bring awareness of production requirements to the dev team, which makes it much more likely that they will create Ops-friendly applications over time.
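The idea of one shared deployment path with per-environment config transformation can be sketched roughly as follows. This is not InRelease's mechanism; the template, the setting names, and the server names are all made up for illustration. The only thing that varies between stages is the table of values.

```python
from string import Template

# Hypothetical config template shipped with the build.
CONFIG_TEMPLATE = Template("db_server=$db_server\nlog_level=$log_level\n")

# Hypothetical per-environment values; only these differ between stages.
ENVIRONMENTS = {
    "dev": {"db_server": "dev-sql01", "log_level": "DEBUG"},
    "staging": {"db_server": "stg-sql01", "log_level": "INFO"},
    "prod": {"db_server": "prod-sql-farm", "log_level": "WARNING"},
}


def render_config(environment):
    """Fill the shared template with one environment's values.

    Template.substitute raises KeyError when a placeholder has no value,
    so a missing setting fails at render time rather than on the server.
    """
    return CONFIG_TEMPLATE.substitute(ENVIRONMENTS[environment])
```

Because `render_config("dev")` and `render_config("prod")` go through the same template and the same code, a missing or misnamed setting tends to show up in Dev first, which is the whole point of deploying early and often.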

The Deployment Reverse Frequency Corollary

The one thing that has always puzzled me is the following: on one hand, you have Dev deployments that are simpler and have a lot less failure impact. On the other hand, you have Prod deployments that are more complex and have a lot more failure impact. And yet, strangely, in many environments the frequency of deployments is the reverse of their impact. In other words, we practice the easy, less impactful deployments a whole lot, and we neglect practicing the complex, dangerous ones.

image

 

Of course, the cost of performing one type of deployment versus the other is not the same, and that’s where automation comes in. But still, who has never heard the famous last words:

“But it worked in dev?”
