Any system starts out with a small capacity to provide value. As demand on the system goes up, its capacity must be scaled accordingly to maintain good performance. In the context of software, the topic of scalability comes up when a developer needs to increase the capacity of their system.
Scalability is one of the three important aspects of designing good cloud infrastructure; the other two, reliability and maintainability, we will touch upon later in this series.
Let's zoom out of software for a moment and understand scalability through a real-world parallel.
Suppose you're an architect tasked with building homes for two families. You start out by building one apartment block with two homes. Now, six more families want to buy a property designed by you. How would you accommodate those six families' needs? Would you add more floors to your existing building, or would you rather build two more apartment blocks with three homes in each?
You get the drift?
That’s somewhat close to what a cloud architect would think of except in terms of scaling software applications.
So what exactly has to scale here?
Anything that helps a system handle its workload, such as processing, memory, network and I/O resources, has to be scaled, although scaling is most often discussed in reference to databases.
What was scaling IT resources like a few years back?
Back then, scaling an on-premise application meant buying costly hardware that would take a long time to arrive at the organisation's (or individual's) location. These costly additional resources, meant to handle peaks in load such as a Black Friday sale, would sit under-utilised during other times when demand was low. This over-provisioning of IT resources meant high fixed costs for businesses.
The consequences of not scaling a business's IT resources are dire: lost revenue opportunities precisely when the business is generating the most demand, and customers facing longer wait times and a poor experience with your app.
Many of these challenges of on-premise applications have been addressed by the cloud, which offers better scalability, reliability and maintainability.
Do you know how your system is going to be used?
The component most critical to a business is its database, as it handles ever-growing volumes of customer data. How your app is going to be used plays a big role in your choice of database technology and its ability to scale.
Is your app mostly used for browsing (read-heavy), or for doing something on it, like purchasing or commenting (write-heavy)?
If you're going to build a read-heavy system like a blog or an online shopping site, you can scale your database vertically with a relational database such as MySQL or PostgreSQL along with a good caching layer. Frequently read data can be served from the cache, reducing the number of reads that hit your database. When caching is no longer enough, you can move to database hardware with more CPUs or faster disks. Read-heavy systems are easier to scale and in general follow vertical scaling.
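To make the caching idea concrete, here is a minimal cache-aside sketch. The in-memory dict stands in for a real cache such as Redis or Memcached, and `fetch_from_db` is a hypothetical placeholder for an actual database query:

```python
cache = {}

def fetch_from_db(key):
    # Placeholder for a real (slow) database lookup.
    return f"value-for-{key}"

def get(key):
    # 1. Try the cache first.
    if key in cache:
        return cache[key]          # cache hit: no database read
    # 2. On a miss, read from the database and populate the cache.
    value = fetch_from_db(key)
    cache[key] = value
    return value

print(get("post:42"))  # first call: misses the cache, hits the database
print(get("post:42"))  # second call: served from the cache
```

In a read-heavy app most requests follow the first branch, so the database sees only a small fraction of the total read traffic.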
If you're going to build a write-heavy system like an events registration app, a user survey, or an analytics pipeline, databases such as Cassandra, MongoDB, HBase and Riak are preferred, as they are designed for horizontal scaling.
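The core idea these databases rely on is partitioning (sharding): each write is routed to one of many nodes based on a partition key, so write load spreads across machines. Here is a toy illustration; the node names and the hash-mod routing are simplifications of what a real cluster does:

```python
import hashlib

# Hypothetical nodes in a horizontally scaled cluster.
NODES = ["node-a", "node-b", "node-c"]

def route(partition_key: str) -> str:
    # Hash the key and pick a node deterministically, so the same
    # key always lands on the same node while writes spread out.
    digest = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

for user in ["alice", "bob", "carol", "dave"]:
    print(user, "->", route(user))
```

Adding capacity then means adding nodes, rather than buying a bigger machine (real systems use consistent hashing so that adding a node moves only a small fraction of keys).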
On a high-level, these are the differences between vertical and horizontal scaling.
| Vertical Scaling | Horizontal Scaling |
| --- | --- |
| Increasing the CPU, GPU, RAM, storage and other resources of an existing server or machine. E.g.: adding more floors to a building. | Increasing CPU, GPU, RAM, storage and other resources by adding new servers or machines of the same capacity. E.g.: adding more buildings beside the first one. |
| Typically costly, as the additional resources become fixed costs. | Cost-effective, as resources are utilised on a need basis. |
| May involve downtime, as you might have to restart your system. | Scales dynamically with no downtime. |
Let's imagine I have hosted a nice product-market-fit app on AWS and it goes viral (for good reasons). It receives a lot of hits, way more than I expected. These visits might degrade the performance of the app as a whole, since the servers have to attend to many requests at a time from people across the globe. I would have to manage the incoming load better to provide an optimal experience for all my visitors.
I can accomplish this with vertical scaling, where I opt for a bigger AWS instance size. But this could burn a hole in my pocket in the long term: a bigger instance is only justified when there is steady, high demand for my app.
To give you some idea of how expensive increasing server capacity can get, let's refer to the on-demand pricing of AWS EC2.
| AWS EC2 type | vCPUs | RAM | Cost |
| --- | --- | --- | --- |
| m4.2xlarge | 8 | 32 GB | $307 per month |
| m4.4xlarge | 16 | 64 GB | $613.20 per month |
| m4.10xlarge | 40 | 160 GB | $1,533 per month |
If I need, say, an additional 16 GB of RAM, my only option is to scale up from the m4.2xlarge to the m4.4xlarge, and my costs almost double! Besides the cost, relying on a single server makes my app vulnerable to any failure of that server: it becomes a single point of failure.
So another method I can try is horizontal scaling, where I use several simple servers rather than relying on one powerful one. On AWS, I can simply increase the number of EC2 instances. This way, no single server has to handle all the traffic; it gets distributed across the additional servers. With fewer requests per second arriving at each server, each one runs fewer threads, which in turn improves performance.
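The distribution step above is the job of a load balancer. Here is a minimal round-robin sketch of the idea, with hypothetical server names; in practice AWS's Elastic Load Balancing does this for you:

```python
from itertools import cycle

# A pool of identical, hypothetical EC2 instances.
servers = ["ec2-a", "ec2-b", "ec2-c"]
next_server = cycle(servers)

def handle(request_id: int) -> str:
    # Each incoming request goes to the next server in rotation,
    # so no single instance absorbs all the traffic.
    server = next(next_server)
    return f"request {request_id} -> {server}"

for i in range(5):
    print(handle(i))
```

Because the pool is just a list, scaling out is a matter of adding another entry (another instance) rather than replacing the whole machine with a bigger one.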
In general, horizontal scaling is considered a best practice in the tech industry, as it allows servers to be added or removed automatically on the fly. This ensures continuous availability of your app, better distribution of load, and optimal cost (pay-per-use).
Pic: Google's four-storey data center in Oklahoma, photographed by Google.