The concept of process isolation has been around for a long time in the Unix/Linux space and, more recently, on Windows. With it come new challenges that force developers and engineers to change the processes they have relied on for the last few years. To run our applications in containers, we need to create an image that contains a lightweight OS and our application. Sounds simple enough, right? If you are building an application by yourself it probably is, but we don’t build applications in isolation. We work in teams and often need to integrate with other teams. This creates challenges such as:
- How do you handle versioning the container image?
- How do you test and verify a container image is stable?
- How do you handle application configuration and/or secrets in the container?
The questions above are just some of the challenges that working with containers brings. While some may sound scary on the surface, a solid container image management strategy will help to mitigate or resolve them. Let’s take a closer look at some of these challenges and at some ways to overcome them.
Versioning Container Images
Throughout the history of developing software, versioning has always been a pain point. I can remember learning the hard way, early in my career, trying to troubleshoot a production issue locally only to find out the version of code I was using wasn’t what was running in production. If you have ever felt this pain of losing hours of work because of not having the right version, then you most likely can appreciate this topic.
There are varying styles of versioning. One uses the date the assembly was built; there is a good chance that you have seen the format “YYYY.MM.DD.[BuildNumber]” or “YYYY.MM.DD.HHMM”. Its advantage is that the consumer knows when the assembly was released and potentially how far out of date they are from the latest. There is also a slew of sequence-based versioning techniques that signal when something has changed. One of the most popular is Semantic Versioning (https://semver.org/), which has the format “Major.Minor.Patch”.
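As a small illustration of the two date-based formats mentioned above, here is a minimal sketch. The function names and the sample build date are made up for the example; a real build pipeline would take the timestamp and build number from the CI system.

```python
from datetime import datetime, timezone


def date_version(built_at: datetime, build_number: int) -> str:
    """Format a build timestamp as YYYY.MM.DD.[BuildNumber]."""
    return f"{built_at:%Y.%m.%d}.{build_number}"


def date_time_version(built_at: datetime) -> str:
    """Format a build timestamp as YYYY.MM.DD.HHMM."""
    return f"{built_at:%Y.%m.%d.%H%M}"


# Hypothetical build time, used only to show the output shape.
built = datetime(2020, 6, 15, 14, 5, tzinfo=timezone.utc)
print(date_version(built, 7))    # 2020.06.15.7
print(date_time_version(built))  # 2020.06.15.1405
```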
So why should you care about the format of the version number? When you create container images and share them with developers who are not actively contributing to the code base, those developers need to understand what has changed. Semantic Versioning communicates the level of change that took place. Seeing a version jump from “3.1.2” to “3.3.1” shouldn’t worry the consuming team, because minor and patch releases are meant to stay backward compatible. Seeing a jump from “3.1.2” to “4.5.2” is an indication they have work to do in order to consume the latest version, because a major version bump signals breaking changes.
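To make the comparison concrete, here is a minimal sketch of how a consuming team could classify the jump between two versions. It assumes plain “Major.Minor.Patch” strings and ignores the pre-release and build metadata that the full Semantic Versioning specification allows; the function names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain "Major.Minor.Patch" string into three integers."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def change_level(current: str, candidate: str) -> str:
    """Report the most significant part that differs between two versions."""
    cur, new = parse_version(current), parse_version(candidate)
    if new[0] != cur[0]:
        return "major"  # breaking changes: the consumer has work to do
    if new[1] != cur[1]:
        return "minor"  # backward-compatible features
    if new[2] != cur[2]:
        return "patch"  # backward-compatible fixes
    return "none"


print(change_level("3.1.2", "3.3.1"))  # minor
print(change_level("3.1.2", "4.5.2"))  # major
```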
For consistency, the version of your application assemblies should match the version of the container. Nothing stops you from versioning your assemblies using a date format and your containers using Semantic Versioning, but this disconnect can lead to confusion and potential issues when you try to debug a container after something goes wrong. To help resolve this, you will want to leverage a solution such as GitVersion (https://gitversion.net/docs/) to drive consistency and tie the container back to the source code it came from.
Developers who need to consume a container need to know what version to target (i.e. what version is running in production). Docker allows us to tag a container image to tell consumers what version the image is. Simply relying on the “latest” tag won’t do for external consumers; you need to be able to give them a version number to target. There is no public standard for how and what you tag; just look at Docker Hub. You will see lots of groups doing similar things, but rarely do you see identical patterns. Let’s say you have five applications, each broken out into three containers, which gives you 15 containers. If each container has its own versioning strategy, you have 15 different ways to target a version of a container. How easy would it be to accidentally target the wrong version during a sprint? How much time would you lose if a team spent a sprint or two developing against the wrong version of a container?
Hopefully the picture I painted shows you why it is so crucial to take versioning your container images seriously. Having a standard strategy for versioning and tagging across the organization will help to mitigate or prevent confusion when consuming the containers. It quickly becomes a foundational pillar of using containers successfully in your organization.
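One way to standardize tagging is to derive every tag from the same version number. The sketch below shows one possible convention, which is an assumption for illustration and not something this article prescribes: publish the exact version plus floating “Major.Minor” and “Major” tags. The registry and repository name are hypothetical.

```python
from typing import List


def image_tags(repository: str, version: str) -> List[str]:
    """Derive a consistent set of image tags from one Major.Minor.Patch version."""
    major, minor, patch = version.split(".")
    return [
        f"{repository}:{major}.{minor}.{patch}",  # exact, immutable version
        f"{repository}:{major}.{minor}",          # floats across patch releases
        f"{repository}:{major}",                  # floats across minor releases
    ]


# Hypothetical repository name used only for the example.
for tag in image_tags("registry.example.com/orders-api", "3.1.2"):
    print(tag)
```

If all 15 containers in the scenario above used a helper like this, a consumer would only have to learn one pattern to target the right version.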
Testing Container Images
I’m sure we are all aware of the value of testing the software or systems we produce. Practices like test-driven development (TDD) or “Shift-Left” testing have emerged to help us focus on an aspect of software development that was historically deprioritized or squeezed into truncated timetables. It is not simply about running some automated tests and hoping for the best; the focus is now on the accuracy and quality of the tests we run. To ensure this, we need to test our containerized applications the right way.
Let’s look at this scenario:
We have a team of developers who are working on a web application using Visual Studio 2019. The solution contains three projects:
- Blazor Server project running .NET Core 3.1
- Web API application running .NET Core 3.1
- Unit Test project running .NET Core 3.1 using the xUnit framework
The team has containerized the Blazor Server and the Web API applications using Linux as the base. They are using Docker Compose to orchestrate the two containers locally. The team has written about 50 unit tests that do a fairly good job of exercising the code in the two applications.
On the surface, this seems like a rather straightforward setup. A developer makes a change locally, writes or updates a few unit tests, and then runs the unit tests from Visual Studio or with “dotnet test” at the command line. The tests pass, so they run the application code locally in containers, do some spot checking, commit the changes, and push to the remote. The CI process begins, which builds and runs the unit tests first and then creates the containers. Those containers get pushed to a registry, and the CD process delivers the updated container so someone can pick up testing the change. A tester begins validating the change, and after about half an hour of testing an issue is found. The developer and tester start to dig into the problem and, after about an hour of back and forth, discover it was an issue related to .NET Core running on Linux that the unit tests should have caught. A simple, quick fix to the code and an update to the unit test later, the work resumes.
With the exception of the root cause, what I described is a rather typical sequence that occurs often when building software. It seems harmless and almost expected, but it exposes a flaw in the team’s testing process: the unit tests never ran in the container. The change the developer made should have been caught by the unit tests, but it wasn’t because of a slight difference between developing on Windows and running on Linux.
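The scenario does not say what the actual difference was. As one illustration of the kind of platform difference that can slip past tests run only on Windows, the sketch below uses Python’s pathlib to show how the same path string is interpreted under Windows and POSIX rules. The file names are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical relative path written with Windows-style separators.
relative = "config\\appsettings.json"

# On Windows the backslash is a separator, so the file name is what you expect.
print(PureWindowsPath(relative).name)  # appsettings.json

# On Linux the backslash is an ordinary character, so the whole string is one name.
print(PurePosixPath(relative).name)  # config\appsettings.json

# File name comparison is case-insensitive under Windows rules, case-sensitive on Linux.
print(PureWindowsPath("AppSettings.json") == PureWindowsPath("appsettings.json"))  # True
print(PurePosixPath("AppSettings.json") == PurePosixPath("appsettings.json"))      # False
```

A test that passes with the first interpretation can fail with the second, which is why the tests need to run on the same platform as the container.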
When using containers, it is important to test them. Moving your unit tests to run inside your containers ensures you are testing under the same conditions the application will see in production. If your container image management strategy is solid, the image you test should be exactly what runs in production (provided it passes all the quality checks). The only difference should be the amount of underlying compute, I/O, and memory that is available for the container to consume.
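As a rough sketch of what running the tests in a container can look like, the helper below builds a “docker run” command that executes “dotnet test” inside a Linux SDK container. The image name and the mounted source path are assumptions for this example. This variant runs the tests on the same Linux base rather than inside the final runtime image; a test stage in the image build is another common option.

```python
import shlex
from typing import List


def container_test_command(image: str, source_dir: str) -> List[str]:
    """Build a docker command that runs `dotnet test` against mounted source code."""
    return [
        "docker", "run", "--rm",           # remove the container when the tests finish
        "--volume", f"{source_dir}:/src",  # mount the solution into the container
        "--workdir", "/src",               # run from the mounted solution folder
        image,
        "dotnet", "test",
    ]


# Assumed image name for the .NET Core 3.1 SDK on Linux; adjust to your base image.
command = container_test_command("mcr.microsoft.com/dotnet/sdk:3.1", "/home/dev/my-solution")
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run in a CI step so that a test failure fails the build before the image is published.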
This concept can be expanded to include things like integration and functional tests, as you no longer need to run dedicated environments for them. Incorporating practices like these into your container image management strategy will help to support the shift-left mentality and, more importantly, will help to ensure that the images you publish are stable and ready for consumption by other team members and by production.
Application Configuration in a Container
A simple search of the web on this topic will get you A LOT of results. As you read through them, you will also find a wide variety of ways to address it, some better than others. A lot of people’s first instinct would be to generate a settings file for the target environment and then copy the result into the container image. This could work, but it immediately ties the container image to an environment. You will need multiple images with the same code, just different configuration, to support the application. Managing containers this way quickly breaks down and causes more confusion and problems as the application evolves.
Most people’s next course of action would be to say, let’s use volumes. The idea is to take an environment-specific configuration file and mount it into the container at a well-known place. The volumes get defined when the container is run, making the image environment independent. While it is certainly a step in the right direction, it introduces new problems. Volume mappings bind the container to a host or a specific set of hosts. The complexity increases depending on how storage is shared across hosts, if at all. This makes it harder to start a container on just any host. While this is certainly a better solution, it still isn’t an ideal one.
To make configuring an application in a container truly independent, we need to look at the guidelines from The Twelve-Factor App. Factor III (Config) says we should store configuration in environment variables. Some people get scared of this because other applications could view and modify your application’s configuration. They also worry that they would need to add logic to detect when a variable has changed, adding another layer of complexity. These are valid concerns, but containers address both of them in a very simple way. First, a container should run a single process, i.e. your application is the only thing running in it. There should be no other applications running that could interfere with your application. Second, environment variables are defined at runtime for a container. This means that if variables need to change, you simply stop the container and start a new one in its place with the updated values.
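A minimal sketch of this approach follows: the application reads its settings once at startup from environment variables. The variable names, defaults, and the Settings type are hypothetical and only illustrate the pattern.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_base_url: str
    log_level: str
    request_timeout_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration once at startup; the values never change while running."""
    return Settings(
        api_base_url=env["API_BASE_URL"],  # required: raises KeyError if missing
        log_level=env.get("LOG_LEVEL", "Information"),
        request_timeout_seconds=int(env.get("REQUEST_TIMEOUT_SECONDS", "30")),
    )


# In a container these values would come from the runtime, for example
# `docker run --env API_BASE_URL=...`; a plain dict stands in for them here.
settings = load_settings({"API_BASE_URL": "https://api.example.com", "LOG_LEVEL": "Debug"})
print(settings)
```

Because the settings are frozen after startup, no change-detection logic is needed; a new value means a new container.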
Leveraging environment variables for your configuration not only makes the image fully independent of an environment, it also allows you to run it on any host. Some people may call out a security concern with regard to secrets, such as passwords and API keys. You do need to take care over who has access to the running container instance. All the big orchestration tools (such as Docker and Kubernetes) have mechanisms that allow you to inject secrets into the container. Your security requirements and the sensitivity of the data should dictate how far you need to go to secure your applications.
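As a sketch of consuming an injected secret, the helper below first looks for a file under /run/secrets, which is where Docker mounts secrets for a service, and then falls back to an environment variable. The fallback naming rule (upper-casing the secret name) and the secret name itself are assumptions for this example.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Return a secret from a mounted file, or from the environment as a fallback."""
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        return secret_file.read_text().strip()
    # Assumed fallback convention: db_password -> DB_PASSWORD
    return os.environ.get(name.upper())


# A temporary directory stands in for the secrets mount in this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "db_password").write_text("s3cr3t\n")
    print(read_secret("db_password", secrets_dir=tmp))  # s3cr3t
```

Kubernetes can expose secrets either as mounted files or as environment variables, so the same helper would work there if the mount path is passed in.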
I’ve always found it exciting to try out a new piece of technology on the projects I’ve been working on. It makes work fun and keeps things from getting routine. I’m sure many of you reading this would agree. The question is, once I’m done trying out the shiny new toy, what do I do with it? Tools like Docker and Kubernetes are still relatively new. There have been success stories and there have been failure stories. The real trick is how well your organization can take the shiny new toy that only a few people understand and turn it into an operationalized tool that everyone can use. Managing your container images is that crucial stepping stone from toy to tool.