The cloud is an exciting technology. One of the main reasons is that almost every aspect can be automated. Traditional data centres have continued to automate infrastructure tasks and push the boundaries of what is possible, but the cloud brings new opportunities with enormous potential.
It is November 2018. We have an Amazon Web Services (AWS) cloud platform and about 30 accounts and five applications in production. When we started setting up the platform, one of our key objectives was to automate the provision of the infrastructure as much as possible. The most logical way to achieve this was to use AWS CloudFormation.
With growing experience and constant use of the tool, we often wondered whether there were more efficient ways to provision infrastructure in AWS. What followed was inevitable: an analysis of the Infrastructure-as-Code (IaC) tool Terraform. An important aspect of Terraform is that it supports multiple clouds, a major advantage in light of our upcoming multi-cloud initiative.
We compared Terraform 0.11 with CloudFormation. The results are described in detail below. Two notes: (1) We only evaluated Terraform’s capabilities in connection with AWS. (2) At the time of writing, Terraform 0.12 with HCL 2.0 had already been announced, but it will not be available for the next few months, so some of the tool’s characteristics may need to be reassessed in the future. AWS has also recently introduced some new CloudFormation features, which we can only cover superficially in this blog.
|Criterion||Terraform||CloudFormation|
|Support of AWS services||(+) Terraform supports almost all AWS services. Another highlight is its fast support of new functions for existing services. Completely new services are supported some time after their release.||CloudFormation supports almost all AWS services. It is somewhat slow in supporting new features for existing services. Completely new services are occasionally supported at the time of their release.|
|User experience||The open source version of Terraform is a CLI-based tool and does not offer a user interface.||(+) CloudFormation provides a user-friendly user interface with important information for tracking changes.|
|Programming language||(+) The HCL-based templates are more efficient.||The JSON and YAML-based templates are somewhat clumsier.|
|Modularisation||(+) Terraform modules help create a reproducible infrastructure.||CloudFormation does not use modules. Nested stacks and cross-stack references can be used to achieve modularisation.|
|Administrative overheads||Terraform offers well-established ways to create an infrastructure, but does not specify a process. The user is responsible for managing this process and all necessary artefacts.||(+) CloudFormation is an AWS-managed service that makes some decisions for the user and makes it easier to use. Though this limits the possibilities, it reduces administrative overheads.|
|Locking of resources to prevent parallel processing||Terraform does not have an integrated mechanism to prevent parallel runs by different users. This can lead to an inconsistent infrastructure state.||(+) CloudFormation has integrated state management and stack locking to prevent parallel processing.|
|Importing of infrastructures||(+) Terraform supports the import and management of resources created outside of Terraform.||CloudFormation cannot manage resources created outside of CloudFormation.|
|State verification, change overview and change management||(+) The Terraform plan command can be used to determine the differences between the known state and target state based on planned changes and can display a visual representation of the differences. Terraform also recognises historical infrastructure changes that it did not make.||CloudFormation can use change sets to create an overview of how proposed stack changes in the most recently submitted stack template will impact your resources. This is only possible for resources created with CloudFormation. CloudFormation does not offer a way to check the actual state of the infrastructure. Changes to resources not made by CloudFormation will not be recognised.|
|Error handling and rollback||Terraform does not offer automatic rollback when errors occur. However, errors are isolated to dependent resources. Non-dependent resources will continue to be created, updated or destroyed as usual. The user has to implement or manually restore the state.||(+) When errors occur, CloudFormation will automatically roll back to the last working state.|
|Rolling updates of Auto Scaling groups||Terraform does not support rolling updates of Auto Scaling groups.||(+) CloudFormation supports rolling updates of Auto Scaling groups.|
|External wait conditions (e.g. for the termination of a shell script)||Terraform does not support external wait conditions.||(+) External wait conditions can be defined in CloudFormation. Resources that are dependent on wait conditions can only be created or updated after the conditions have been met.|
|Multi-cloud support||(+) Terraform supports multiple cloud providers.||CloudFormation is AWS-specific.|
|Tool support and licensing||The open source version is free of charge. Even though there are no support SLAs, problems are generally resolved quickly. The paid Terraform business version offers different support options as SaaS or private install.||CloudFormation is a free AWS service. AWS provides support within the scope of the selected Support plan.|
Support of AWS services
CloudFormation supports almost all AWS services. CloudFormation often supports completely new services either immediately or shortly after their release. However, it can take some time before it supports new features for existing services.
Terraform also supports almost all AWS services. Since it is developed by a large open source community and the developers often want to use new features themselves, completely new services and new features for existing services are quickly available in Terraform. However, since the Terraform community can only implement entirely new services after they have been officially introduced by AWS, it can happen that CloudFormation supports these new services before Terraform.
User experience
AWS CloudFormation provides a user interface where users can create, modify and graphically display resource dependencies. It also offers a way to validate the template syntax. The console provides an overview of all activities and lots of information to track them. This is especially helpful in case of an error.
Terraform does not have a user interface. The open source version is controlled exclusively via the command line interface (CLI). This can be challenging for inexperienced users. Terraform SaaS and Enterprise provide a user interface.
Programming language
In CloudFormation, the infrastructure is defined in JSON or YAML files known as CloudFormation templates. They can become quite long and confusing for more complex resources. As far as conditions, branches and loops go, CloudFormation supports explicit “if” and “and” constructs, but it is not possible to use “for-loop” or “if-else” constructs.
Terraform has its own domain-specific language, the HashiCorp Configuration Language (HCL). Terraform templates are easy to read and efficient. On average, you need less code than with CloudFormation to achieve the same result. One reason is the “count” meta-parameter, which can be used to build constructs like “if”, “if-else” and “for-loop”. But this requires some creativity, as can be seen in the article Terraform tips & tricks: loops, if-statements, and gotchas.
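The “count” trick can be sketched as follows. This is a minimal illustration in Terraform 0.11 syntax, not taken from our platform; the variable names, the bucket name and the user names are assumptions made up for the example.

```hcl
# Hypothetical input variables; names and defaults are illustrative only.
variable "create_bucket" {
  default = true
}

variable "user_names" {
  type    = "list"
  default = ["alice", "bob"]
}

# "if": the bucket is created only when the flag is true
# (count = 1), and skipped otherwise (count = 0).
resource "aws_s3_bucket" "logs" {
  count  = "${var.create_bucket ? 1 : 0}"
  bucket = "example-logs-bucket" # placeholder; bucket names must be globally unique
}

# "for-loop": one IAM user is created per entry in the list.
resource "aws_iam_user" "team" {
  count = "${length(var.user_names)}"
  name  = "${element(var.user_names, count.index)}"
}
```

An “if-else” is typically built from two resources with opposite conditions, which is where the creativity mentioned above comes in.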
Modularisation
Terraform modules support a reproducible infrastructure that can be controlled with parameters. Modules support the export of template values in output parameters that can be used as input parameters for another module.
Terraform supports a multitude of data sources for input parameters. If the desired parameter is not available as an internal data source, scripts can be used to dynamically calculate the parameter as an external data source.
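An external data source might look roughly like this in Terraform 0.11. The script name, the query key and the “version” result key are hypothetical; the only fixed part is the contract of the “external” data source, which passes the query to the program as JSON on standard input and expects a JSON object of string values on standard output.

```hcl
# Hypothetical script that calculates a value Terraform has no
# built-in data source for. It must print something like {"version": "1.4.2"}.
data "external" "build_info" {
  program = ["python", "${path.module}/build_info.py"]

  query = {
    environment = "dev"
  }
}

# The result map can then be used like any other parameter.
output "build_version" {
  value = "${data.external.build_info.result["version"]}"
}
```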
Terraform also offers a module registry with verified community modules. A helpful registry feature is that modules can be accessed via their version number. This helps prevent unwanted changes.
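A minimal sketch of how modules, outputs and pinned versions fit together, again in Terraform 0.11 syntax. The registry address, the version number, the local module path and all input and output names are invented for illustration and do not refer to real modules.

```hcl
# Hypothetical registry module, pinned to an exact version so that
# later releases cannot change the infrastructure unnoticed.
module "network" {
  source  = "example-org/network/aws"
  version = "1.2.0"

  cidr_block = "10.0.0.0/16"
}

# Hypothetical local module that takes an output of the first
# module as one of its input parameters.
module "app" {
  source = "./modules/app"

  subnet_id = "${module.network.private_subnet_id}"
}
```

The “version” argument only applies to modules loaded from a registry; local modules are versioned together with the code that uses them.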
In CloudFormation, modularisation is limited to the use of nested stacks and cross-stack references. Cross-stack references make it possible to export values from a stack generated at runtime. These values can be referenced as parameters in subsequent stacks. Nested stacks consist of CloudFormation templates with hierarchical dependencies.
CloudFormation also allows users to use custom resources to define Lambda functions that dynamically provide input parameters.
There is no AWS equivalent to the Terraform module registry. However, there are many CloudFormation templates online that have been provided by AWS and users.
Administrative overheads
Terraform offers well-established ways to create an infrastructure, but does not specify a process. The user is responsible for managing this process and all necessary artefacts. Terraform stores the last updated state of the infrastructure in a JSON file called a state file. The user is responsible for managing this file. If this is not done properly, it can lead to simultaneous parallel updates to the same infrastructure, which will be discussed in the next point. The file is also somewhat hard to read, and there is a risk that it will be corrupted by manual changes, which can lead to related resources being deleted.
CloudFormation is an AWS-managed service that makes some decisions for the user and makes it easier to use. Though this limits the possibilities, it reduces administrative overheads. CloudFormation manages the current state directly in the AWS account – in the stack. The stack represents the infrastructure provided by CloudFormation. It is easy to view the content of the stack on the user-friendly CloudFormation console. The stack can only be changed by a regular CloudFormation update.
Locking of resources to prevent parallel processing
Since CloudFormation is a managed AWS service, there is no risk of a stack being updated or modified multiple times in parallel.
Since Terraform is not (necessarily) managed from a central location, parallel updates of the infrastructure can occur. Measures must be taken by the user to prevent this. Terraform SaaS or Terraform Enterprise offer established ways to accomplish this.
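One measure a user can take with the open source version is to keep the state in a remote backend that supports locking. The sketch below assumes an S3 bucket and a DynamoDB table (with a primary key named “LockID”) that already exist; the bucket, key, region and table names are placeholders.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"    # placeholder bucket name
    key            = "platform/terraform.tfstate" # placeholder path inside the bucket
    region         = "eu-central-1"               # placeholder region
    dynamodb_table = "example-terraform-locks"    # placeholder lock table
    encrypt        = true
  }
}
```

With this in place, a second run against the same state has to wait for or fail on the lock instead of writing in parallel. Setting up and protecting the bucket and table remains the user's job, which is part of the administrative overhead described above.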
Importing of infrastructures
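It is easy to import resources created outside Terraform into the Terraform state and manage them from then on. In Terraform 0.11 the import does not generate the template; the matching resource definition still has to be written by hand.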
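As a rough sketch of the workflow in Terraform 0.11, assuming a bucket that was created by hand (the resource name and bucket name are placeholders):

```hcl
# Step 1: describe the existing resource in a template. The import
# needs a resource block to attach the imported state to.
resource "aws_s3_bucket" "legacy" {
  bucket = "example-legacy-bucket" # placeholder for the real bucket name
}

# Step 2: on the command line, bind the real bucket to that block:
#   terraform import aws_s3_bucket.legacy example-legacy-bucket
#
# Step 3: check what still differs between the template and reality:
#   terraform plan
```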
CloudFormation cannot be used to manage or integrate resources that were not created with CloudFormation.
State verification, change overview and change management
This is perhaps the most important function. The Terraform plan command lets the user view and compare the current state of the infrastructure with the planned new state and display a visual representation of the differences. It also detects previous changes to the existing infrastructure that were not made with Terraform. This allows the user to check and confirm planned changes. It facilitates the traceability of changes and creates a high degree of confidence that they will have the desired effect.
Terraform will always change the infrastructure to the state defined in the template. This allows unwanted changes made by others to be undone. On the other hand, Terraform offers no general way to keep changes that were made outside of Terraform. Depending on the application, either behaviour may be needed.
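For individual attributes there is a partial exception: the “lifecycle” block accepts an “ignore_changes” list, which tells Terraform to leave externally modified values alone. A minimal sketch in Terraform 0.11 syntax; the instance, the AMI ID and the choice of ignored attribute are illustrative assumptions.

```hcl
resource "aws_instance" "app" {
  ami           = "ami-00000000" # placeholder AMI ID
  instance_type = "t2.micro"

  lifecycle {
    # Tags changed outside Terraform are neither reported as
    # differences nor reset on the next apply.
    ignore_changes = ["tags"]
  }
}
```

This has to be declared per resource and per attribute in advance, so it does not replace a general mechanism for adopting external changes.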
The drift status function recently introduced by AWS compares the stack with the actual state of the infrastructure to detect changes made outside of CloudFormation. However, drift status does not help undo changes.
When making a change with CloudFormation, there is no way to check it against the actual state of the infrastructure. Changes outside CloudFormation can only be overwritten by replaying a template, provided the template and stack are different. This has the disadvantage that external changes cannot simply be undone. On the other hand, it has the advantage that it is somewhat tolerant to external changes.
Error handling and rollback
If an error occurs while handling a resource, Terraform isolates the error to dependent resources. Non-dependent resources will continue to be created, updated or destroyed as usual. This can result in the infrastructure being in an unusable and unstable state. If the user wants to reset the entire infrastructure to the previous state, this has to be done manually or with the user's own automation. It is helpful that Terraform backs up the state each time the state is changed.
CloudFormation always (or mostly) tries to keep the infrastructure in a usable and stable state. If a change fails, CloudFormation rolls the infrastructure back to the last stable state. There are some situations where this does not happen, but most of the time the user still has a working infrastructure despite errors.
Rolling updates of Auto Scaling groups
CloudFormation supports rolling updates of Auto Scaling Groups with a rollback should the update fail. Terraform does not support this.
Configuration of wait conditions
CloudFormation allows wait conditions to be defined. This means that all dependent resources are only handled after the resource with the wait condition is created or updated. This makes it possible, for example, to implement a workflow that waits until certain software is installed and available before continuing with the next steps.
Terraform does not inherently support wait conditions, but it is possible to implement them with an external data source.
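One way such a workaround could look in Terraform 0.11 is sketched below. The polling script, its “status” result key, the parameter name and the instance settings are all hypothetical; the sketch only relies on the documented behaviour that an “external” data source fails when its program exits with a non-zero code.

```hcl
# Hypothetical instance; AMI ID and instance type are placeholders.
resource "aws_instance" "app" {
  ami           = "ami-00000000"
  instance_type = "t2.micro"
}

# Hypothetical script: polls the instance until the software is ready,
# then prints a JSON object such as {"status": "ready"} and exits with 0.
# Because the query references the instance ID, Terraform cannot run
# the script before the instance exists.
data "external" "wait_for_app" {
  program = ["bash", "${path.module}/wait_for_app.sh"]

  query = {
    instance_id = "${aws_instance.app.id}"
  }
}

# Handled only after the script has returned successfully, because
# its value depends on the script's result.
resource "aws_ssm_parameter" "app_status" {
  name  = "/example/app/status"
  type  = "String"
  value = "${data.external.wait_for_app.result["status"]}"
}
```

Unlike a CloudFormation wait condition, the waiting logic and its timeout live entirely in the user's script.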
Multi-cloud support
Terraform currently supports more than 90 providers. This means that a single tool can be used to provide infrastructures for all major public cloud providers such as AWS, Azure, Google Cloud Platform and a variety of cloud native tools such as Kubernetes and Docker.
The use of CloudFormation is limited to AWS.
Tool support and licensing
AWS CloudFormation is free of charge. AWS provides support through its Support Plans.
Terraform is available either as a free open source tool or as a paid business tool. The open source version has no support SLA, but most requests will eventually be answered.
Terraform Enterprise is an integrated IaC tool and features built-in integration with code repositories such as Git. Each commit triggers the plan command, which can be shared and applied. It has well-established methods for managing user and resource states.
Terraform Enterprise is available as a Pro version (SaaS install) and a Premium version (private install). The Pro version comes with 9-to-5 support and the Premium version with 24/7 support. The Premium version includes an unlocked HashiCorp Sentinel for precise control of resource provisioning. It is possible, for example, to specify when and under what circumstances changes can be made.
HashiCorp recently announced a free version of the Terraform SaaS solution that is ideal for small teams and developers. It provides a solution for managing the state file, locking the state, and providing a change history on a user interface.
Something that perhaps only I ran into with Terraform is how easy it is to accidentally delete or modify resources. This is not due to a bug or a lack of documentation in Terraform. In my case, it was due to my initial inexperience with the tool. Terraform provides a resource type called “aws_iam_policy_attachment” that creates exclusive attachments of IAM policies. I used this resource to give an AWS user “admin rights”. It took me a while to figure out why all other AWS users had lost their admin rights, until I read the documentation, which clearly describes the use and implications of this resource type. This is by no means a shortcoming or problem. It is simply something to remember, because CloudFormation does not provide features that grant exclusivity, and because such a resource can also affect other resources outside the tool.
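The difference can be sketched as follows in Terraform 0.11 syntax. The user name and resource names are made up, and the two blocks are alternatives shown side by side for comparison, not meant to be applied together.

```hcl
# Exclusive: this resource claims ALL attachments of the policy.
# Any user, role or group that has the policy attached elsewhere
# and is not listed here loses it on the next apply.
resource "aws_iam_policy_attachment" "admin_exclusive" {
  name       = "admin-attachment"
  users      = ["alice"] # hypothetical user
  policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}

# Non-exclusive alternative: attaches the policy to one user and
# leaves every other attachment of the same policy untouched.
resource "aws_iam_user_policy_attachment" "alice_admin" {
  user       = "alice" # hypothetical user
  policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}
```

In my case, the second form would have been the right choice.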
It is hard to say which tool is better and can be considered the winner. Both tools have their disadvantages, but also workarounds to fix them. In a direct comparison like this, we think Terraform would be the right choice. But we also believe that we cannot use Terraform alone and do without CloudFormation completely. Since we also operate an OpenShift cluster as a private cloud, where we use Ansible, OpenShift templates and Helm for IaC, we already have a wide range of tools for similar tasks in the Kubernetes environment. This multitude of tools represents a certain burden for the different teams. Therefore, the next step is to examine whether we can reduce the number of tools. We need a holistic automation concept for all platforms that takes into account not only the tools, but also the teams and their capabilities. Only when we have such a strategy will we possibly introduce Terraform.
Cool tools on the horizon
There is hope for all those who are sceptical about infrastructure as code due to the use of JSON, YAML or HCL. Emerging tools such as Pulumi for Terraform and AWS CDK for CloudFormation allow infrastructures to be defined with common programming languages and object-oriented methods. Both tools are still new and have their teething problems, but they look promising.