One codebase tracked in revision control, many deploys
This principle seems obvious to any seasoned developer. And yet: do you have shell scripts pasted into a
Jenkins server? What about ad-hoc jobs? Those little edit boxes which encourage you to paste in code are nefarious
little traps that undermine the robustness of your CI system.
Even if the code in those edit boxes is technically under revision control, how is it versioned?
Can you re-create any build that you ran in the past? Or is it likely that some little edit box has been
changed in an un-trackable way? Do you have a catch-all “CI” scripts repository, which is used by many jobs?
This is a no-no, because combining code for different jobs into a single repo makes it impossible to version
Test code should be versioned alongside the code it is testing.
Job configuration should also be tracked in source control. Committing to a repository on job config change is good;
defining the jobs as code is better.
Setting up job configurations by hand in user interfaces is error-prone, and a history of changes is
important for rebuilding past artifacts.
Only paste a single command into code boxes.
Document the operation of the job.
Store the job configuration in a source code repository.
For example, if you're using Jenkins, use the Job DSL Plugin or a project like DotCI to define your jobs.
Explicitly declare and isolate dependencies
Ah yes. Dependencies. CI jobs have all kinds of dependencies:
- Languages like Python and Ruby
- Tools like Docker, Vagrant, Selenium and AWS
- Backing services like PostgreSQL
Without the right dependencies, the CI tools break. Worse yet, dependencies generally drift over time.
When you install a dependency a year from now (after a CI server crashes, or you’re rebuilding or updating it),
you’ll likely get something quite different from today.
So, it’s not enough to carefully manage the CI code; you have to manage the dependencies just as carefully.
Otherwise, the result is a system that works today, but:
- Won’t work tomorrow
- Makes non-reproducible artifacts
The tools for specifying dependencies are getting better, but it’s still very easy to leave a
dependency underspecified and then experience a system failure when it’s unexpectedly upgraded in a
The only way to completely lock down dependencies on a full system (e.g. a Linux machine) is to capture an image
and tie the build job to that specific image ID.
This used to be quite difficult; thanks to Docker, it’s now gotten a lot easier.
Containers are great for build jobs because:
- The image has all of its dependencies built in: OS, libraries, application code, everything.
- The image is locked with a specific version (SHA). It’s guaranteed to work just the same in the future as it does today.
Just keep in mind, container technology is not a panacea.
For example, you can’t containerize your Windows and OS X build jobs (yet).
When you can't use a container, fall back on a traditional virtualized image.
If you can't do that either, manage the build machine as strictly as you can using configuration management tools.
Sometimes dependencies cannot be locked to a specific version, for example when you are writing a library. If you
pin your dependencies to specific versions, you are forcing the library's user to use those same versions. This is
often not desirable or possible. In this case, pin the major version of each dependency and run your library's tests on
a schedule to ensure that updated dependencies do not break functionality.
Explicitly declare all dependencies.
Dependencies must be locked to specific versions (e.g. Gemfile.lock).
Run builds on strictly controlled containers or virtual machines.
Build servers (bare metal or base images) should have the minimum possible set of installed packages and configuration.
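As a rough illustration of the "locked to specific versions" recommendation, the sketch below flags dependency lines that are not pinned with `==`. It assumes a simplified pip-style requirements format; the function name `unpinned_requirements` and the sample file contents are hypothetical, and real requirements files support more syntax (environment markers, hashes, URLs) than this check understands.

```python
import re

# Matches simple "name==1.2.3" lines. Anything else is reported as unpinned.
# Assumption: a simplified requirements format, not the full pip grammar.
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\]]+==[^\s=<>~!]+$")


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned to an exact version."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


# Hypothetical file contents, for illustration only.
example = """
requests==2.31.0
flask>=2.0      # floating: may resolve differently next year
pyyaml
"""
print(unpinned_requirements(example))
```

A check like this could run as an early build step, so that an underspecified dependency fails the job immediately instead of surfacing later as an unexpected upgrade.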
Store config in the environment
Externalize the configuration of the job to the environment. This makes the job maximally flexible, ensures
that the job is not tied to a specific file arrangement, and makes it easy to move secrets out of the source code.
This includes sensitive configuration. Secrets should not be checked into source control or entered into your CI system
without encryption. Ephemeral or one-time credentials are best. Externalizing your secrets ensures that the code
is not tied to a specific environment or user. It also prevents accidental leakage of secrets.
Finally, this pattern ensures that jobs can be run only by specifically privileged individuals and systems.
Don’t rely on hard-coded paths to configuration files unless the whole file can be safely committed to the source control repo.
Externalize secrets out of the source code and out of images.
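One way this can look in job code is sketched below: settings are read from environment variables, missing ones fail loudly, and the secret is never echoed. The variable names (`ARTIFACT_BUCKET`, `DEPLOY_TOKEN`, `BUILD_WORKDIR`) and the helper names are assumptions made for illustration, not part of any particular CI system.

```python
import os


class MissingConfig(Exception):
    """Raised when a required setting is absent from the environment."""


def load_config(environ=None):
    """Build the job configuration from environment variables only."""
    env = os.environ if environ is None else environ
    required = ["ARTIFACT_BUCKET", "DEPLOY_TOKEN"]  # hypothetical names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise MissingConfig("missing settings: " + ", ".join(missing))
    return {
        "bucket": env["ARTIFACT_BUCKET"],
        "token": env["DEPLOY_TOKEN"],
        # Optional setting with a default; no hard-coded absolute path.
        "workdir": env.get("BUILD_WORKDIR", "."),
    }


def describe(config):
    """Return a log-safe summary that never includes the secret itself."""
    return "bucket=%s workdir=%s token=***" % (config["bucket"], config["workdir"])
```

Because `load_config` accepts any mapping, the same job code can be exercised on a developer's laptop with a plain dictionary and on the CI server with the real environment.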
Treat backing services as attached resources
Build jobs may depend on backing services such as a database, message queue, or caching service.
Jobs should be sufficiently decoupled from these backing services that the physical location of the
service (local or remote) does not matter to the job.
This enables backing services to be swapped out and reconfigured without requiring modification, re-test and
re-deployment of the job.
Things get really interesting when your CI system is building and testing a service-oriented application.
Then, for acceptance testing, services will have dependencies on other upstream services.
If your services have hard-coded assumptions about the location of upstream services, it makes it very hard
to verify code along different development streams and branches.
Mocking out calls to backend services is an option to be considered carefully. Mocking is fine for unit tests, but then
smoke tests should also be run to ensure the real backend behaves as expected. The above guidelines apply for smoke
A job should work equally well with local or remote backing services.
The location of backing services should be provided to the job as configuration.
Any secrets needed to connect to a backing service should be provided to the job as configuration.
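The three recommendations above can be sketched with a single connection URL supplied through the environment. The variable name `DATABASE_URL`, the function name, and the example hosts are assumptions for illustration; the job code is the same whether the URL points at a local or a remote database.

```python
from urllib.parse import urlsplit


def database_settings(environ):
    """Parse connection details from a URL provided as configuration.

    Assumption: the URL lives in a variable called DATABASE_URL.
    """
    url = urlsplit(environ["DATABASE_URL"])
    return {
        "host": url.hostname,
        "port": url.port or 5432,  # 5432 is the default PostgreSQL port
        "user": url.username,
        "password": url.password,
        "database": url.path.lstrip("/"),
    }


# Same code path, different attached resource (both URLs are made up).
local = database_settings(
    {"DATABASE_URL": "postgresql://ci:secret@localhost:5432/app_test"}
)
remote = database_settings(
    {"DATABASE_URL": "postgresql://ci:secret@db.example.internal/app_test"}
)
print(local["host"], remote["host"])
```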
Strictly separate build and run stages
Testing is an integral part of building, releasing and running software.
The jobs that make up the CI system are software. Therefore, they should be coded, tested, packaged, and released.
In addition, job code must be meaningfully versioned, because the requirements of the product being built change
As the job code changes in response to the product code, it’s important that the association between the
older product code and the older job code is not lost.
Otherwise, it becomes very difficult to rebuild historical packages and to effectively test and deliver hot fixes.
Create a release artifact for the job code.
Job code should have test cases, and a build and release process of its own.
Maintain version relationships between the product code and the job code.
Execute the app as one or more stateless processes
Stateful CI jobs are a big problem.
It’s very common for a job to rely on some specific filesystem configuration
or for a sequence of jobs to use a common filesystem to pass information down the pipeline.
Not only are stateful jobs fragile, they are also not scalable. When build jobs are written with the
assumption that they are operating on a shared filesystem, it’s impossible to scale the CI system out across a
pool of worker machines.
In practice, this means that the CI server becomes more and more stressed until a
breakdown ensues. The build team tries to scale out the CI server to multiple machines, only to find that the
jobs won’t run because they are stateful.
Job code should not rely on the file system to persist data across runs.
Job code should be portable across build nodes.
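A minimal sketch of a stateless stage follows, assuming inputs and outputs travel through an external artifact store instead of a shared filesystem. Here a plain dictionary stands in for that store, and `run_stage` and `uppercase_step` are hypothetical names invented for the example.

```python
import tempfile
from pathlib import Path


def run_stage(inputs, build, artifact_store):
    """Run one pipeline stage in a throwaway workspace.

    inputs: mapping of file name -> bytes fetched from the artifact store
    build: callable taking the workspace Path, returning output file names
    artifact_store: dict standing in for external artifact storage
    """
    with tempfile.TemporaryDirectory(prefix="ci-job-") as tmp:
        workspace = Path(tmp)
        for name, data in inputs.items():
            (workspace / name).write_bytes(data)
        for name in build(workspace):
            artifact_store[name] = (workspace / name).read_bytes()
    # The workspace is deleted here, so nothing persists across runs.
    return artifact_store


def uppercase_step(workspace):
    """Hypothetical build step: transform src.txt into out.txt."""
    data = (workspace / "src.txt").read_bytes()
    (workspace / "out.txt").write_bytes(data.upper())
    return ["out.txt"]


store = run_stage({"src.txt": b"hello"}, uppercase_step, {})
print(sorted(store))
```

Since the stage only sees what it is handed and only leaves behind what it publishes, it can run on any node in a worker pool.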
Maximize robustness with fast startup and graceful shutdown
Jobs can sometimes take a long time to run. Track how long jobs take to run so you can take action when new changes
cause an unexpected spike in build time.
It’s common for a build manager to realize that a build job is going to fail, and terminate it prematurely. When this
happens, any resources created for the job should be released. Cloud instances should always be terminated and Docker
containers should be run with the
Job code should handle shutdown gracefully and not leave orphaned resources.
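One possible shape for graceful shutdown in job code is sketched below: SIGTERM is turned into an exception so that cleanup code runs, and a context manager releases whatever the job created. The `create` and `destroy` callables are hypothetical stand-ins for launching and terminating a cloud instance or container. Note that a job killed with SIGKILL gets no chance to clean up, so an external reaper may still be needed.

```python
import signal
from contextlib import contextmanager


class Terminated(Exception):
    """Raised inside the job when the build manager asks it to stop."""


def _raise_terminated(signum, frame):
    raise Terminated("received signal %d" % signum)


def install_sigterm_handler():
    """Turn SIGTERM into an exception so `finally` blocks get to run.

    Must be called from the main thread of the job process.
    """
    signal.signal(signal.SIGTERM, _raise_terminated)


@contextmanager
def cleanup_on_exit(create, destroy):
    """Create a resource and always release it, even on early termination."""
    resource = create()
    try:
        yield resource
    finally:
        destroy(resource)


# Illustration with stand-in callables instead of a real cloud API.
released = []
try:
    with cleanup_on_exit(lambda: "instance-1", released.append) as instance:
        raise Terminated("simulated early termination")
except Terminated:
    pass
print(released)
```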
Keep development, staging, and production as similar as possible
This is very important for CI, and very often overlooked.
This manifests most often in CI as job code which will only run on the CI system.
This makes it very difficult for developers to run, test and modify the job code.
Automation is often mistakenly identified as the objective of CI / CD.
Actually, the objectives are reliability and predictability, followed by speed.
Automation is a technique which can help improve predictability and speed, but it’s far too easy to build a system
that’s automated, but unreliable. When automation breaks down (and it will, frequently), it’s important that:
- Troubleshooting is efficient
- Manual workarounds can be applied to keep important code moving through the pipeline, without compromising the quality of the output.
For these reasons, there shouldn’t be anything special about the CI environment that can’t be easily
reproduced on a developer’s laptop (or on a cloud server).
One of the trickiest aspects of this is build credentials, aka secrets. For example, suppose a build job is going
to push an artifact to an S3 bucket for official releases. If the credential for that bucket is tied to the build machine, then there’s
no way to produce an official release from any other environment if the build environment is broken or the
job starts to fail.
Base environments and dependencies used by the CI system should be easy for developers to provision and use.
Trusted developers should have access to secrets used by the CI system, in a controlled and audited way.
Manually built artifacts should be valid input for downstream automated jobs.
The CI system should be built from images that can be pulled and run by developers without the need to set up a complex build environment.
Treat logs as event streams
Robust log capture is pretty well handled by standard CI systems.
This is largely due to the fact that the CI pipeline jobs are often shell-based, and shells have robust support for
When a build job is a more complex piece of software, such as a Java program, it should log verbosely
in order to assist with troubleshooting.
Finally, if you need to keep build logs for historical reasons, store them externally to your CI system.
Build jobs should log verbosely.
Send command results to stdout and logging to stderr.
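The stdout/stderr split can be sketched with the standard `logging` module: diagnostics go to stderr, and only the command's result is printed to stdout, so a downstream step can capture it cleanly. The logger name `ci.job` and the `build_version` helper are made up for the example.

```python
import logging
import sys


def configure_logging(verbose=True):
    """Send all log output to stderr, leaving stdout free for results."""
    logger = logging.getLogger("ci.job")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def build_version(logger, major, minor, build_number):
    """Hypothetical build step that logs verbosely and returns a result."""
    logger.debug(
        "computing version from major=%s minor=%s build=%s",
        major, minor, build_number,
    )
    return "%d.%d.%d" % (major, minor, build_number)


log = configure_logging()
# Only the result goes to stdout.
print(build_version(log, 1, 4, 27))
```

With this split, a shell pipeline can capture the version string from stdout while the CI system still records the verbose log from stderr.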
Build steps and artifacts written to an audit system independent of any single CI component
The audit is the definitive record of the CI process.
Trusted developers can perform CI / CD operations manually, as long as they generate the necessary audit records.
Audit is used for detailed provenance of artifacts.
Write CI job metadata to an audit record, not just a log file.
Support manual / override workflows in addition to automation.
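As a sketch of what an audit record might contain, the example below captures the job, the job code version, the product commit, a hash of the artifact, and who ran the step. The field names are assumptions, and an in-memory stream stands in for the independent audit system. The same function could be called by an automated job or by a trusted developer doing a manual release.

```python
import hashlib
import io
import json
from datetime import datetime, timezone


def audit_record(job_name, job_version, product_commit, artifact_bytes, operator):
    """Build a provenance record for one artifact (field names are assumed)."""
    return {
        "job": job_name,
        "job_version": job_version,
        "product_commit": product_commit,
        "artifact_sha256": hashlib.sha256(artifact_bytes).hexdigest(),
        "operator": operator,  # a CI system or a trusted developer
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def append_record(stream, record):
    """Append one record as a JSON line to the audit stream."""
    stream.write(json.dumps(record, sort_keys=True) + "\n")


# io.StringIO stands in for the external audit system in this sketch.
audit_log = io.StringIO()
record = audit_record("package", "2.3.0", "abc1234", b"", "manual:alice")
append_record(audit_log, record)
print(audit_log.getvalue())
```

Recording the job code version next to the product commit also preserves the relationship between the two, which helps when an older artifact has to be rebuilt.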