Posts tagged as

30 posts

Lifecycle Management and Infrastructure Cleanup: The Overlooked Challenge of Efficiently Decommissioning Resources

In today's fast-paced world of cloud computing, the importance of full lifecycle management in cloud infrastructure cannot be overstated. Many organizations are quick to provision resources in cloud platforms like AWS, GCP, and Azure, but all too often, the critical decommissioning phase is overlooked. This oversight carries both financial and security implications that can be detrimental to a company's bottom line and data integrity.

Simplifying Multi-Cloud, Multi-Tenant SaaS Deployments with Infrastructure as Code (IaC)

The cloud landscape has evolved significantly in recent years, moving businesses beyond the confines of single cloud providers. Enterprises are increasingly embracing multi-cloud strategies to leverage the strengths of various cloud platforms. As they journey into this multi-cloud world, the intricacies of managing their infrastructure through Infrastructure as Code (IaC) become apparent, especially in the context of Software as a Service (SaaS) platforms operating in multi-tenant environments. In this blog post, we will explore the rise of multi-cloud deployments, the role of IaC as the glue that holds it together, the unique challenges posed by SaaS in a multi-cloud world, and best practices and solutions to navigate these complex scenarios.

Day 2 Operations - SaaS Maintenance

I’ve likely performed thousands of operational tasks during my career, but one sticks out clearly for me over a decade after I performed it. It was a major version upgrade of a decently-large Cassandra cluster while I was working at Signal, with between 8 and 64 nodes in each of 4 separate geographic regions in AWS - two in the US, one in Europe, and one in Japan. I think in total the cluster had about 160 nodes.

The upgrade included a data format change, which meant each node had to go through a lengthy step-by-step process including a drain, clean shutdown, data migration, software upgrade, startup, re-sync, and finally after passing health checks I could do the next node that shared the same token space. Nodes which didn’t share the same token space could in theory be upgraded at the same time, but we had to keep a specific percentage of nodes working in each region at the same time to maintain operational functionality.

SaaS - What Does it Really Entail?

What does it really mean for software to be available as a service? Sure, there are plenty of dry definitions out there, but I think a lot of us like to adhere to the ideology of “I know it when I see it” even if we wouldn’t readily admit it. What I’d like to explore in this post is the makeup of what I personally would consider to be the table stakes features of a modern SaaS; and spend a bit of extra time going over the areas which you’re probably far more likely to gloss over. If you think I’ve missed anything, be sure to leave your thoughts in the comments below!

Challenges in Building SaaS Billing

By far the most consistent area of focus I’ve had in my career is monitoring and observability. Way back in 2005 while at Orbitz, I had the opportunity to learn from and work alongside Chris Davis, the original author of Graphite. It was an enormous success both at Orbitz and within the industry as a whole, and it’s been an honor and privilege to essentially ride that initial wave of monitoring innovation my whole career.

6 years later, another adjacent area emerged as my second-most common area of focus, and that is the collection of specific metrics used to quantify end-user usage, which are typically used to generate the bills for SaaS companies which have any sort of pay-per-use cost dimensions. Even including Omnistrate, I’ve now had direct involvement with the design and development of the metrics systems and/or billing integrations for the past 5 companies I’ve worked at over the past decade, and it’s incredible to me how similar the problems and solutions have been even when the industry or design of each business’ technical architecture has been so different from each other.

In today’s post I’d like to explore the internal complexities of SaaS billing systems and what challenges and design patterns for addressing them keep showing up. Hopefully by the end of this you’ll have a better understanding of how SaaS billing works and a blueprint for how to go about implementing it yourself if you need to.

Provisioning and Deployments - Your SaaS Foundation

Before the cloud, getting fresh hardware and deploying your new software was always done in big budget events involving procurement, finance, several IT teams and the whole process included no small amount of arguing most of the time.

Now that we have the ability to just press a button or make an API call to have a trove of shiny, powerful VMs added to our fleet at a moments’ notice, surely all the other problems regarding provisioning have been simplified as well, right?

Why Enterprises are Shifting to SaaS

In today's fast-paced digital landscape, the way enterprises consume and leverage software is undergoing a profound transformation. No longer confined to the traditional bounds of on-premises solutions and cumbersome software updates, businesses worldwide are pivoting towards Software as a Service (SaaS) platforms.

Beyond the obvious allure of cost savings and scalability, what deeper transformations are driving this shift? As we delve deeper into this transformative journey, we'll uncover the compelling reasons why enterprises are increasingly putting their trust in SaaS solutions and how this paradigm shift is reshaping the future of business technology.

Diving Deep into Control and Data Plane Responsibilities

Building upon our initial introduction of the control and data planes, this post delves deep into the interplay between these two planes. Shedding light on their individual roles and their combined impact on the overall SaaS experience. As we navigate through this post, we'll uncover the magic behind how these components work in tandem to ensure seamless operations, swift data transfers, and a scalable environment.

Everything about Scaling

Scaling - it’s the reason we’re all using this cloud thing anyway, right? Surely all of your applications have been tested to effortlessly scale from 0 to 1,000 in milliseconds, and your databases can rebalance after scaling within minutes with zero impact to anything important, correct?

Don’t worry, I don’t think anyone has fully cracked this nut. But why is this? What makes it so difficult to actually get ALL the benefits of the infinitely flexible cloud?

Cloud Platform Monitoring and Auto-Recovery Challenges - Part 2

The Complications and Strategies

In the first post of this two-part series, we introduced primary topics under the umbrella of cloud platform monitoring and went into a bit of detail for how they present specific challenges. In this follow-up post we’ll explore some of the state-of-the-art strategies for dealing with these issues and the additional complications that will arise when utilizing these techniques.