YAML Pipelines: A History Lesson

Posted by Daniel Mann - April 27, 2020


YAML Pipelines - Background & Context

Before we talk about YAML, let's talk a bit about the history of CI/CD in TFS / Azure DevOps. Get ready, because we're going to get in our time machines and go back to the start of the 21st century for this one.

TFS 2008 - MSBuild

All the way back in TFS 2008, Microsoft introduced a build system that was based on MSBuild targets. It made sense at the time; JSON and YAML hadn't yet risen to prominence, and TFS was targeted at people working exclusively in the .NET ecosystem. Why not specify your CI in the same language you specified your C# project files in? Unfortunately, MSBuild isn't the easiest build language to work with. If you've ever tried to do anything complex in MSBuild, I'm sorry. If you haven't, save your sanity and don't. You could write deployment capabilities, but it wasn't particularly easy to do and was a bit of a maintenance nightmare. I've seen older deployment setups based on this, and my #1 imperative was to migrate them to something more modern.

TFS 2010 - XAML workflows

To address all of that, in TFS 2010, Microsoft deprecated that build system and released a build system that was based on Windows Workflow (XAML build). You had a couple of out-of-the-box build templates that were pretty good for about 90% of users, because, still, the expectation was that the users were mostly doing C# applications. If you had to do anything not supported in the OOB templates, you had to customize the workflow in a visual designer. Building and maintaining custom workflows was... not fun. Especially since each major release of TFS tended to introduce breaking changes, requiring custom templates to be reworked in advance of an upgrade. Still no deployment mechanisms, either.

TFS 2013 - Release Management

The deployment story changed in 2013, when Microsoft bought InRelease from InCycle and rebranded it as Release Management Server. This gave a similar workflow-based visual designer experience to the deployment experience, and for the first time, there was a really good end-to-end continuous delivery story in the TFS ecosystem. Release Management Server had some problems with scaling, though -- it expected you to build a deployment workflow through copious amounts of copy/paste and composing deployments out of a big library of small, granular release activities. It worked great for organizations with relatively small portfolios of simple applications, but it was very hard to maintain a large, diverse library of releases.

TFS 2015 - JSON builds/releases

This brings us to 2015. In TFS 2015, the slate was wiped clean. Microsoft deprecated XAML build and Release Management Server. To replace them, they introduced what you probably know as the CI/CD system that's present in TFS and Azure DevOps today (JSON builds). It's based on JSON files and has robust visual designers that allow you to create build and release pipelines. However, there are still some problems with this approach...

First, JSON pipelines aren't source controlled. They are versioned, but that versioning is separate from source control. This leads to fun scenarios like this: You have a feature branch where you're upgrading an app from, say, .NET Core 2.2 to .NET Core 3.0. This requires making some changes to your build process. You have a single build for this application that's used as part of a PR workflow via branch policies. How do you manage your builds? The answer is, "it ain't pretty". The best way I've found is with a bunch of conditional steps based on the branch the build is originating from. But this still involves going back and updating that build later.

Second, JSON pipelines aren't really intended to be human-readable and modifiable. Although you certainly can extract them with REST APIs, modify them by hand (or by scripts), and update them, it's not a first-class experience and it requires a fair bit of trial-and-error to get right. Basically, there's no mechanism for quickly doing a search-and-replace or otherwise updating JSON pipelines en masse. Visual designers are great when you're building something out, but once you have the core structure laid out, sometimes it's easier to just fall back into your text editor of choice and blast problems with regexes. You can't really do this in JSON pipelines without jumping through hoops.

Which brings me to the third point, maintainability. It's still tough to maintain a big library of pipelines. Task and variable groups help, but don't entirely solve the problem. As an example: If you have 20 pipelines and you create a new task group to implement some new behavior that has to be applied to all 20 pipelines, guess what? Someone is going to have to go and click through the UI for all 20 pipelines to add that task group in. It's slow, monotonous, and error prone.

Azure DevOps Server 2019 / Azure DevOps - Enter YAML pipelines.

What are YAML pipelines?

YAML pipelines are the latest iteration of continuous integration and continuous delivery. They address the remaining problems with the JSON pipeline system, while maintaining a high degree of backwards compatibility. Nothing needs to be massively redesigned from the ground up, because YAML pipelines use the exact same engine as JSON pipelines, which means the same pipeline tasks work.

The idea is that your pipeline steps are specified in YAML and versioned in source control alongside the application code that the YAML builds and deploys. YAML is a whitespace- and newline-sensitive format that's less verbose than JSON (which, in turn, is less verbose than XML...) and is well-suited to creating human-writeable configuration specifications. This also lets you split definitions across multiple files or even multiple repositories, enabling easy reuse and standardization. Why copy the same pipeline 9 times, when you can just point 9 different pipelines at the same YAML file?
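As a rough sketch of what that looks like, a pipeline file checked in next to the code can pull its steps from a shared template. The file names, the `buildConfiguration` parameter, and the project glob here are made up for illustration, not taken from a real project:

```yaml
# azure-pipelines.yml: lives in the repo next to the application code
trigger:
  - master

pool:
  vmImage: ubuntu-latest

steps:
  # Reuse steps defined in another file (hypothetical path)
  - template: templates/build-steps.yml
    parameters:
      buildConfiguration: Release
```

```yaml
# templates/build-steps.yml: hypothetical shared template
parameters:
  - name: buildConfiguration
    type: string
    default: Release

steps:
  # The same DotNetCoreCLI task that JSON pipelines use
  - task: DotNetCoreCLI@2
    displayName: Build
    inputs:
      command: build
      projects: '**/*.csproj'
      arguments: --configuration ${{ parameters.buildConfiguration }}
```

Pointing several pipelines at the same template means a change to the shared steps goes through one pull request instead of one edit per pipeline. A template can also live in a separate repository declared under `resources.repositories`, in which case it's referenced as `file.yml@alias`.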

Jobs, Stages and Steps, Oh My!

This is where things start to get tricky. Tasks remain the core "unit" of pipelines -- these are the same build/release tasks that we used in JSON-world. However, YAML pipelines can have a lot of different scopes in which tasks run. You can have jobs. And stages. A lot of these concepts remain the same as the equivalently-named thing from JSON pipelines, but they're a lot more front-and-center now. The documentation is simply not great for any of this, which can make developing pipelines difficult. There are the rudiments of a visual designer, but it really only works for configuring tasks.

Build pipelines are usually easy; you can totally ignore the concepts of jobs and stages and environments and just have a bunch of tasks. However, releasing is where things start to get ugly.
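To make the nesting concrete, here's a minimal sketch. The stage and job names are placeholders, and the `echo` scripts stand in for real tasks. The short form is all a simple build needs:

```yaml
# Short form: no stages or jobs, just steps (implicitly one job in one stage)
pool:
  vmImage: ubuntu-latest

steps:
  - script: echo "build and test"
    displayName: Build
```

The fully spelled-out form wraps the same kind of steps in jobs, and jobs in stages:

```yaml
stages:
  - stage: Build
    jobs:
      - job: Compile
        pool:
          vmImage: ubuntu-latest
        steps:
          - script: echo "build and test"
            displayName: Build

  - stage: Deploy
    dependsOn: Build        # stages run in order by default; this makes it explicit
    jobs:
      - job: Publish
        pool:
          vmImage: ubuntu-latest
        steps:
          - script: echo "deploy"
            displayName: Deploy
```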

There is the concept of multi-stage pipelines and deployment jobs (versus "regular" jobs). However, this is not as straightforward as just setting up a release definition.
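For reference, a deployment job differs from a regular job in that it uses the `deployment` keyword, targets an environment, and wraps its steps in a strategy. A minimal sketch, assuming a hypothetical environment named `test`:

```yaml
jobs:
  - deployment: DeployWeb
    environment: test       # hypothetical environment name
    pool:
      vmImage: ubuntu-latest
    strategy:
      runOnce:
        deploy:
          steps:
            - script: echo "deploy to test"
              displayName: Deploy
```

Approvals and checks are configured on the environment itself rather than in the YAML.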

Problem 1: Getting artifacts from a build is difficult, unless you have build and deployment in one pipeline.

Yes, that's right. You can have build and deployment in a single pipeline. You just break it down into multiple jobs, some of which are deployment jobs. However, in JSON releases, you can easily link in one or more build artifacts and have the ability to specify a build version at time of creating the release.

In YAML pipelines, you can't do that. You can either have everything in a single pipeline, or you can have a separate build and a separate release. The release pipeline needs special "build completion" triggers defined so that it knows to run when the build completes, and you have to explicitly write YAML to tell it which build definition to download from. You also have to jump through some pretty significant hoops to specify anything other than "latest", which makes it tricky to redeploy older versions.
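As a sketch of the separate-release setup, something along these lines declares the build as a pipeline resource and downloads its artifact. The alias `appBuild`, the pipeline name `MyApp-CI`, and the artifact name `drop` are all hypothetical:

```yaml
# Release pipeline, kept in its own YAML file
trigger: none               # don't run on pushes to this repo

resources:
  pipelines:
    - pipeline: appBuild    # hypothetical alias, used by the download step
      source: MyApp-CI      # hypothetical name of the build pipeline
      trigger: true         # run this pipeline when that build completes

pool:
  vmImage: ubuntu-latest

steps:
  - download: appBuild      # fetch artifacts from the resolved build run
    artifact: drop          # hypothetical artifact name
  - script: ls "$(Pipeline.Workspace)/appBuild/drop"
    displayName: List downloaded files
```

Downloaded files land under `$(Pipeline.Workspace)/<alias>/<artifact>`, which is why the listing step uses the alias rather than the build pipeline's name.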

Problem 2: Multi-stage pipelines don't have feature parity with JSON releases.

As I said above, you can't easily deploy different versions. You also can't skip stages or manually deploy stages. There's also no support for tweaking the parallelism; releases simply run in parallel.

Problem 3: UI/UX isn't there

The JSON release pipelines have a great visualization. YAML, not so much. It's hard to see what's released where, when it was released, etc. The lack of a robust visual designer that generates YAML is a disappointment, as well. I'd love for the designer to let you design more complex YAML, with there being some eventual complexity cut-off point where it's up to you to write YAML without the designer.

You can't control the parallelism, so every release runs in parallel if there are available agents. You also can't control the behavior so that it cancels prior pipelines if a newer pipeline has already been promoted to that stage.

Probably the most annoying thing is that approvals are defined with a timeout, after which the approval fails. While a stage is waiting on approval, the release still shows a blue spinning "in progress" icon with a timer, as if it were actively running. The only time it's green is if the entire pipeline is complete. This makes it hard to know at a glance which stage has been deployed most recently.

Problem 4: No TFVC support.

I'm one of the holdouts in the industry who still thinks that TFVC is a fine alternative to Git, and that Git isn't appropriate for every team. That said, I'm almost completely alone in that belief. So this is a minor quibble at this point, but it's worth noting: YAML pipelines don't work with TFVC repos. YAML pipelines will never work with TFVC repos.

Summary

All of that said, YAML pipelines are awesome and are clearly the path forward. I'd start investing in them sooner rather than later. There are a ton of advantages, and relatively few downsides.

The situations where I'd say "definitely start using them immediately":

  • You have a large number of builds and/or deployments that are exactly the same or very similar (barring things like variables, of course). YAML makes these situations a lot easier.
  • You are building and deploying to Kubernetes or otherwise heavily invested in containers. YAML has much better features around building in containers, and the Environments section makes monitoring K8S clusters easy.
  • You have problems with auditing/change control of pipelines, or you simply want to be able to better integrate them into your PR process.

The situations where I'd say: "proceed with caution":

  • You're using TFVC for version control. YAML doesn't support TFVC, end of story. Move to Git first. That's a whole different blog post.
  • You have relatively few, very complex pipelines that don't have a lot (if any) commonalities. YAML is just going to increase complexity for you without giving you a lot of benefit.
  • You need to be able to skip release stages or manually deploy them.

That said, you still need to be looking at YAML sooner rather than later. Microsoft is investing heavily in YAML pipelines across the board; GitHub Actions (GitHub, of course, being owned by Microsoft) is YAML-based as well and shares some commonalities with Azure DevOps YAML pipelines. Look for a blog post about GitHub Actions in the not-too-distant future!

Topics: DevOps, CI/CD, Pipelines

