Blog entry by surendra vsr
Deliver infrastructure and the software running on it rapidly and reliably at scale.
To successfully implement these principles, patterns, and practices, a certain level of organizational maturity is required. While this article doesn't focus on the cultural aspects, they are crucial for successful adoption.
The examples in this article use Terraform and AWS, but the principles, patterns, and practices are generally applicable to other IaC tools like Pulumi, CloudFormation, and to cloud providers like GCP and Azure, or even on-premise environments.
What is Infrastructure as Code?
Infrastructure as Code (IaC) is an approach that applies proven coding techniques to infrastructure. It is a key DevOps practice that enables teams to deliver infrastructure, and the software running on it, rapidly and reliably at scale.
For continuous delivery of your applications, a rapid and reliable provisioning mechanism for your infrastructure is essential.
In this article, we will explore various principles, patterns, and practices that have proven valuable in my experience and the organizations I have worked with over the years.
Key Principles
Before diving into patterns and practices, let's review the key principles for effective IaC.
Idempotency
Idempotency ensures that no matter how many times you run your IaC, you will achieve the same end state, regardless of the starting state. This principle simplifies infrastructure provisioning and reduces the risk of inconsistent results.
You can achieve idempotency by using a stateful tool with a declarative language, like Terraform. In Terraform, you define the desired end state of your infrastructure, and the tool ensures that the state is achieved. If it cannot reach the desired state, it will fail.
In Diagram #1 below, you can see that non-idempotent IaC may provision 6 VMs instead of the desired 3 if run twice. In contrast, idempotent IaC will provision only the 3 VMs, ensuring reliability and consistency.
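The difference can be sketched in a few lines of Python. This is a toy model only: the Cloud class and both provisioning functions are hypothetical stand-ins, not how Terraform works internally.

```python
from dataclasses import dataclass, field


@dataclass
class Cloud:
    """In-memory stand-in for a cloud account (hypothetical, for illustration)."""
    vms: list[str] = field(default_factory=list)


def provision_imperative(cloud: Cloud, count: int) -> None:
    """Non-idempotent: every run creates `count` additional VMs."""
    start = len(cloud.vms)
    for i in range(count):
        cloud.vms.append(f"vm-{start + i}")


def provision_declarative(cloud: Cloud, desired: int) -> None:
    """Idempotent: converges on exactly `desired` VMs from any starting state."""
    while len(cloud.vms) < desired:
        cloud.vms.append(f"vm-{len(cloud.vms)}")
    del cloud.vms[desired:]  # remove any surplus VMs


imperative, declarative = Cloud(), Cloud()
for _ in range(2):  # run each approach twice
    provision_imperative(imperative, 3)
    provision_declarative(declarative, 3)

print(len(imperative.vms))   # 6
print(len(declarative.vms))  # 3
```

The declarative version compares the current state with the desired state before acting, which is the behaviour a stateful, declarative tool gives you.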
Immutability
Configuration drift occurs when changes to infrastructure are not recorded, causing environments to diverge in ways that are not easily reproducible. This issue is common with long-lived mutable infrastructure, which can become brittle over time due to problems like slow memory leaks or disk space exhaustion.
Immutable infrastructure addresses this issue by replacing existing infrastructure with new versions instead of modifying the existing infrastructure. This approach ensures reproducibility and prevents configuration drift.
Immutable infrastructure also supports scalability in cloud environments. In Diagram #2 below, you can see that for mutable infrastructure, version 2 of an application is deployed on the same servers as version 1, while immutable infrastructure provisions new VMs for version 2.
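As a rough illustration, immutable deployment can be modelled as building a new fleet instead of editing the old one. The VM type, the ID counter, and the function names below are assumptions made up for this sketch.

```python
from dataclasses import dataclass
from itertools import count

_ids = count(1)  # hypothetical ID generator for newly launched VMs


@dataclass(frozen=True)  # frozen: a VM cannot be modified after creation
class VM:
    vm_id: int
    app_version: str


def launch_fleet(app_version: str, size: int) -> tuple[VM, ...]:
    """Launch `size` brand-new VMs running `app_version`."""
    return tuple(VM(next(_ids), app_version) for _ in range(size))


def deploy_immutable(old_fleet: tuple[VM, ...], app_version: str) -> tuple[VM, ...]:
    """Build a fresh fleet for the new version; the old one is never patched."""
    new_fleet = launch_fleet(app_version, len(old_fleet))
    # A real rollout would health-check new_fleet and shift traffic to it
    # before terminating old_fleet; those steps are omitted here.
    return new_fleet


v1 = launch_fleet("1.0", 3)
v2 = deploy_immutable(v1, "2.0")
```

Because every VM in the new fleet starts from a known image, nothing accumulated on the version 1 servers carries over to version 2.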
Patterns and Practices
Everything in Source Control
All IaC code, including occasional scripts and the pipeline used to provision infrastructure and deploy software (pipeline as code), should be in source control. This practice ensures that the code is accessible to everyone in the company, even developers who don't make changes to the IaC codebase.
Having all code in source control provides visibility and understanding for those who run applications on the infrastructure. It also prevents situations where critical scripts are unknown or inaccessible.
Modularize and Version
Modularizing IaC, like software code, aids in maintenance, readability, and ownership. It also allows for smaller, independently deployable changes. Refactoring IaC is challenging, especially for critical components like DNS records, CDN, network, and databases, so some abstraction upfront is beneficial, as long as it does not tip into over-abstraction.
In organizations with separate teams for networking, security, and platform engineering, separating infrastructure layers and assigning ownership to appropriate teams can provide better control. Diagram #3 below shows an example of modular deployment to Amazon Elastic Kubernetes Service (EKS), with different modules for each infrastructure layer.
Versioning modules is crucial so that a breaking change to a module does not reach production until its consumers explicitly upgrade. The exception is a monorepo, where the latest version is always used.
Documentation
While IaC codifies most information, some documentation is still necessary. Good documentation benefits both the team maintaining the IaC and the consumers of the infrastructure.
Documentation should be concise and kept close to the code, such as in a README file within the same repository. This proximity increases the likelihood of updates and can serve as a reminder during the pull request process.
Testing
Testing IaC is as important as testing software, and various levels of testing are necessary. Here is a test pyramid for IaC:
- Static Analysis: Run as often as possible, even locally. Tools like terraform validate and TFLint can automate static analysis.
- Unit Testing: While often unnecessary for declarative IaC, unit tests can be useful for conditionals or loops. Bash scripts can use bats, and Pulumi supports testing with frameworks in languages like TypeScript, Python, Go, or C#.
- Integration Testing: Verify resource provisioning in an environment and ensure requirements are met. Do not test the declarative tool's functionality but focus on aspects like security group rules. Use tools like Chef InSpec and Goss, and consider using an ephemeral environment for testing.
- Smoke Test with Dummy Application: Deploy a dummy application to verify infrastructure provisioning. This test should mirror the real application's environment and can be run after provisioning and periodically.
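To make the integration-testing level concrete, here is a minimal Python sketch of a security group check. It assumes the group has already been fetched as a dictionary shaped like an EC2 DescribeSecurityGroups response; the group ID and rules are invented for illustration, and IPv6 ranges are ignored for brevity.

```python
def open_to_world(security_group: dict, port: int) -> bool:
    """Return True if `port` is reachable from 0.0.0.0/0 in this group."""
    for rule in security_group.get("IpPermissions", []):
        if rule.get("IpProtocol") == "-1":  # "-1" means all protocols and ports
            covers_port = True
        else:
            covers_port = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        cidrs = {r.get("CidrIp") for r in rule.get("IpRanges", [])}
        if covers_port and "0.0.0.0/0" in cidrs:
            return True
    return False


web_sg = {
    "GroupId": "sg-example",  # hypothetical ID
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "10.0.0.0/8"}]},
    ],
}

assert open_to_world(web_sg, 443)     # HTTPS is meant to be public
assert not open_to_world(web_sg, 22)  # SSH must stay internal
```

The test asserts a requirement of the environment (SSH is not public), not whether the provisioning tool created the resource it was told to create.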
Security and Compliance
Security and compliance are critical aspects often overlooked in IaC. Automating these checks can enhance security and streamline compliance processes.
- Identity and Access Management (IAM): Implement robust IAM for IaC and the infrastructure it provisions, using Role-Based Access Control (RBAC) to minimize the attack surface.
- Secrets Management: Use reliable secrets management tools like HashiCorp Vault or AWS Secrets Manager. Avoid storing secrets in the state file, but if necessary, ensure they are encrypted.
- Security Scanning: Run security scans in lower or ephemeral environments to detect vulnerabilities and ensure best practices are followed. Standards like the CIS Benchmarks and tools like Amazon Inspector can help.
- Compliance: Automate compliance checks using tools like Chef InSpec or HashiCorp Sentinel, especially for industries with strict requirements like healthcare or finance. These checks can be run on each change to the IaC, using an ephemeral environment to catch issues early.
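A compliance check that runs on each change can be as simple as a script over the planned changes. The sketch below assumes the plan has been exported as JSON (for example with terraform show -json) and loaded into a dictionary. The required tag names are a hypothetical policy, and the check naively assumes every resource type supports tags.

```python
REQUIRED_TAGS = {"owner", "cost-center"}  # hypothetical organisational policy


def untagged_resources(plan: dict) -> list[str]:
    """Return addresses of planned resources that lack any required tag."""
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        if "delete" in actions and "create" not in actions:
            continue  # resources being destroyed need no tags
        tags = (change.get("after") or {}).get("tags") or {}
        if not REQUIRED_TAGS.issubset(tags):
            violations.append(rc["address"])
    return violations


# Trimmed-down, invented plan data for illustration.
plan = {
    "resource_changes": [
        {"address": "aws_instance.api",
         "change": {"actions": ["create"],
                    "after": {"tags": {"owner": "platform", "cost-center": "42"}}}},
        {"address": "aws_instance.worker",
         "change": {"actions": ["create"], "after": {"tags": {"owner": "platform"}}}},
        {"address": "aws_instance.legacy",
         "change": {"actions": ["delete"], "after": None}},
    ]
}

print(untagged_resources(plan))  # ['aws_instance.worker']
```

A pipeline step could fail the build whenever the returned list is non-empty. Dedicated policy tools offer far richer rules, so treat this only as an outline of the idea.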
Automate Execution from a Shared Environment
Bringing all the steps together, execute IaC with appropriate checks in sequence to provision infrastructure confidently across various environments. There are two main approaches:
- Infrastructure as Code Pipeline: An example pipeline sequence is provided below, using CircleCI. Any pipeline tool can be used to execute this. The pipeline offers visibility and alerts relevant teams in case of failure.
- GitOps: Extending IaC, GitOps adds a workflow (pull request process) for applying changes to production or other environments. It includes a control loop to ensure the actual infrastructure state matches the desired state. GitOps can replace the IaC pipeline. For more details, see the documentation on the Weaveworks website.
GitOps = IaC + (Workflow + Control Loop)
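The control loop in that equation can be sketched as a reconcile step that repeatedly compares desired and actual state. In this toy Python model both states are plain dictionaries, and the resource names and values are invented. A real GitOps operator would read the desired state from the repository and call the cloud or cluster API.

```python
def diff(desired: dict[str, str], actual: dict[str, str]) -> list[tuple[str, str]]:
    """List the (action, resource) steps needed to make `actual` match `desired`."""
    steps = []
    for name, spec in desired.items():
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != spec:
            steps.append(("update", name))
    steps += [("delete", name) for name in actual if name not in desired]
    return steps


def reconcile(desired: dict[str, str], actual: dict[str, str]) -> list[tuple[str, str]]:
    """One pass of the control loop: apply the diff and report what was done."""
    steps = diff(desired, actual)
    for action, name in steps:
        if action == "delete":
            del actual[name]
        else:
            actual[name] = desired[name]
    return steps


# Desired state as merged through a pull request (hypothetical values).
desired = {"vpc": "10.0.0.0/16", "cluster": "eks-1.29"}
# Actual state that has drifted: an old cluster version and a stray VM.
actual = {"vpc": "10.0.0.0/16", "cluster": "eks-1.28", "debug-vm": "t3.micro"}

print(reconcile(desired, actual))  # [('update', 'cluster'), ('delete', 'debug-vm')]
print(reconcile(desired, actual))  # []
```

The second pass finds nothing to do, which shows the loop is idempotent: it only acts when the actual state has diverged from what is in source control.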