Beyond Terraform: Building Scalable SaaS Infrastructure
We think Terraform is an excellent IaC tool if you want to set up your cloud infrastructure once. However, when it comes to SaaS, we feel that it falls short on several fronts.
- Not tenant-aware: Terraform has no concept of a tenant. If you need to deploy infrastructure per tenant, you have to manage a separate state file for each one yourself, which is a big challenge.
- Not ACID compliant:
  - Not Atomic: If an error occurs during the apply phase, Terraform does not automatically roll back to the previous state. It can leave the infrastructure partially provisioned and leaves manual recovery to DevOps. For SaaS with thousands or millions of tenants, this can be quite challenging.
  - Not Consistent: Because it lacks basic recovery mechanisms, Terraform can leave the underlying infrastructure in an inconsistent state.
  - Not Isolated: Terraform does not support running commands concurrently against the same state file. If two team members try to apply changes at the same time, they might face conflicts or undesirable outcomes. To avoid this, you need a backend that supports state locking, or you have to follow strict operational practices.
  - Not Durable: By default, state files are kept on local disk, so you need an explicit mechanism (such as a remote backend) to store them durably for each tenant.
- No Versioning: In the real world, your customers will be updating the infrastructure all the time, but Terraform doesn't have any native versioning.
- No Day-2 Infra support: Terraform is limited to provisioning the infrastructure and leaves the bulk of operating it (patching, monitoring, alerting, capacity planning, failure handling, and evolution) to the SaaS provider.
- No Multi-cloud support: Terraform requires manually creating and maintaining scripts for every cloud provider. Every time anything changes, you have to manually update all the Terraform scripts, run them across thousands of state files, handle any issues, and fix them by hand. At scale, this becomes unmanageable. In our previous experience, we gave up on Terraform within a few months once we realized that it doesn't work at scale for the SaaS use case.
- Not Cloud-native:
- SaaS capabilities: There is no native support for managing different accounts, VPCs, and networks for deployments. As a result, there is no native way to support BYOA (Bring Your Own Account) models in Terraform.
- Drift Detection: Terraform struggles with drift detection, that is, detecting whether the actual state of resources has changed outside of Terraform since the last terraform apply. Terraform can refresh its state before making changes to help mitigate this, but unexpected changes can still cause problems.
- Error handling: Terraform can be somewhat vague in the errors it produces, making it hard to debug complex scripts.
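To make the per-tenant bookkeeping and drift-checking burden described above concrete, here is a minimal Python sketch of the glue code a team ends up writing around the Terraform CLI. It assumes one Terraform workspace per tenant inside a single working directory, which is only one possible layout. The function names, status labels, and tenant names are hypothetical. The only Terraform behavior it relies on is `terraform workspace select` and the documented exit codes of `terraform plan -detailed-exitcode` (0 for no changes, 1 for an error, 2 for a non-empty plan).

```python
import subprocess
from typing import Callable, Dict, List

# Exit codes of `terraform plan -detailed-exitcode`:
# 0 = plan succeeded with no changes, 1 = error, 2 = plan succeeded with changes.
# The status labels on the right are our own naming, not Terraform's.
PLAN_STATUS = {0: "in_sync", 1: "error", 2: "changes_pending"}

# A runner takes terraform arguments and a working directory, returns an exit code.
Runner = Callable[[List[str], str], int]


def run_terraform(args: List[str], workdir: str) -> int:
    """Run one terraform command in workdir and return its exit code."""
    return subprocess.run(["terraform", *args], cwd=workdir).returncode


def check_tenant(tenant: str, workdir: str, run: Runner = run_terraform) -> str:
    """Select the tenant's workspace (assumed to exist), then plan against it."""
    if run(["workspace", "select", tenant], workdir) != 0:
        return "error"
    code = run(
        ["plan", "-detailed-exitcode", "-input=false", "-lock-timeout=60s"],
        workdir,
    )
    return PLAN_STATUS.get(code, "error")


def check_all(
    tenants: List[str], workdir: str, run: Runner = run_terraform
) -> Dict[str, str]:
    """Check tenants one at a time; the selected workspace is shared per directory."""
    return {tenant: check_tenant(tenant, workdir, run) for tenant in tenants}
```

Calling `check_all(["acme", "globex"], "./infra")` would return a status per tenant. With unchanged configuration, a `changes_pending` result suggests the real resources have drifted from the recorded state. Everything beyond that (retries, recovery after a failed apply, running tenants in parallel) still has to be built by hand, which is the gap the list above describes.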
Omnistrate aims to address the IaC gaps in SaaS by providing a native solution that also enhances your existing Terraform setup with these capabilities, with no modifications needed.
Now, IaC (or infrastructure management) is just one piece of the bigger SaaS puzzle. For a SaaS control plane, you have to handle many other pieces.
For more details on what we do, please see this page