Published on: October 18, 2023
In the world of distributed systems, making the right design choices is crucial for performance and reliability. The CAP theorem provides a framework to understand the trade-offs between Consistency, Availability, and Partition Tolerance. In this article, we'll explore what each pillar means, why you can only pick two, and how Crudly fits into this landscape.
Let's start with a brief definition of each of the 3 pillars:
High consistency means that all parts of the system see the same data all of the time. A high consistency system would produce the same result for the same read operation made at the same time, no matter which part of the system answers it. This might seem obvious if you've not worked with databases before, but in larger systems we can often gain performance by embracing “eventual consistency”: the idea that a read may briefly return out-of-date data, as long as every part of the system catches up eventually.
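To make eventual consistency concrete, here is a minimal sketch of a store with one primary and one follower. Everything in it (the `Replica` and `EventuallyConsistentStore` classes, the `replicate` step) is invented for illustration and does not describe any particular database. It only shows how a read from the follower can lag behind a write to the primary.

```python
from collections import deque


class Replica:
    """A toy in-memory replica holding key/value pairs."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class EventuallyConsistentStore:
    """Writes land on the primary immediately and reach the follower later."""

    def __init__(self):
        self.primary = Replica()
        self.follower = Replica()
        self.pending = deque()  # writes not yet applied to the follower

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply queued writes to the follower (a real system would do this asynchronously)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.follower.data[key] = value


store = EventuallyConsistentStore()
store.write("balance", 100)
stale = store.follower.read("balance")  # follower has not caught up yet
store.replicate()
fresh = store.follower.read("balance")  # follower now agrees with the primary
print(stale, fresh)  # None 100
```

A strongly consistent version of this toy would have to run `replicate()` before acknowledging each write, which is exactly where the performance cost comes from.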
High availability simply means the system is always able to produce a response. A high availability system will always be able to provide users with a response (not necessarily the most up-to-date data) whereas a low availability one may sometimes fail. Again, high availability sounds like something you always want, but in some systems we can get away with sacrificing it.
Partition tolerance is a tricky one to get your head around. When we scale systems horizontally (i.e. adding more instances rather than increasing resources on a single instance) we end up with a partitioned system where no one instance is responsible for all the data or operations. These partitions rely on the network connections between them to keep the system running, and those connections can fail, cutting one group of instances off from another. Partition tolerance concerns whether the system keeps working when that happens. Most of the time this is just a yes/no question: either the system can be split into partitions that may lose contact with each other, or it can't.
Here's the crux of CAP theorem: you only get to choose 2 of the 3! For example, if you want availability and consistency you can't have partition tolerance. This is not some magic rule imposed by the elders of software development; it is a fundamental limitation in the construction of distributed systems. Let's look at why with some real world examples:
Single node SQLite database: when running a single database node we have no partitions so partition tolerance is not a concern. SQLite will make sure all reads get the latest up-to-date data so we have consistency. We will also always be able to read from the database so we have availability. If we were to introduce partitions to the system, we would have to account for network failures which would involve sacrificing consistency or availability.
HBase: HBase is a distributed (partition-able) database from Apache. During a network failure between partitions the database will shut off incoming connections, sacrificing availability. Connections won't be accepted again until network conditions are restored and the data is synced, so we keep consistency. If we wanted to increase the availability of such a system, we would lose consistency, since partitions might not be up to date.
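The "refuse rather than risk stale data" behaviour can be sketched with a simple majority check. This is a generic illustration of the CP trade-off, not a description of HBase internals. The `CPNode` class, the `Unavailable` error and the `reachable_peers` counter are all hypothetical names.

```python
class Unavailable(Exception):
    """Raised when the node cannot guarantee an up-to-date answer."""


class CPNode:
    """Toy node that only serves requests while it can reach a majority of the cluster."""

    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        self.reachable_peers = cluster_size - 1  # peers currently reachable over the network
        self.data = {}

    def has_quorum(self):
        # Count this node plus the peers it can still reach.
        return (self.reachable_peers + 1) > self.cluster_size // 2

    def read(self, key):
        if not self.has_quorum():
            raise Unavailable("cut off from the majority; refusing a possibly stale read")
        return self.data.get(key)

    def write(self, key, value):
        if not self.has_quorum():
            raise Unavailable("cut off from the majority; refusing the write")
        self.data[key] = value


node = CPNode(cluster_size=3)
node.write("user:1", "alice")

node.reachable_peers = 0  # simulate a network failure isolating this node
try:
    node.read("user:1")
    outcome = "served"
except Unavailable:
    outcome = "refused"
print(outcome)  # refused
```

The isolated node still holds the data, but it cannot know whether a newer write happened on the other side of the failure, so it gives up availability to stay consistent.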
Amazon DynamoDB: DynamoDB is a distributed database that allows incoming reads even when there are network failures between partitions, giving us availability. This sacrifices consistency: a write could land on one partition while a later read is served by another partition that is disconnected from it, so the read returns out-of-sync data. If we wanted to increase the consistency of this system, we would lose availability, since during a network failure some requests would have to be refused rather than answered from a partition that might be out-of-sync.
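The opposite trade-off can be sketched the same way: every replica always answers, and the replicas reconcile once the network recovers. The example below uses last-write-wins on a logical timestamp, which is one simple reconciliation strategy chosen for illustration. It is not presented as how DynamoDB resolves conflicts, and the `APReplica` class is hypothetical.

```python
class APReplica:
    """Toy replica that always answers, tagging each value with a logical timestamp."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge_from(self, other):
        """Last-write-wins: keep whichever entry carries the higher timestamp."""
        for key, (ts, value) in other.data.items():
            if key not in self.data or self.data[key][0] < ts:
                self.data[key] = (ts, value)


a, b = APReplica(), APReplica()
a.write("cart", ["book"], timestamp=1)
b.merge_from(a)  # replicas are in sync before the failure

# Network failure: the next write reaches only replica a.
a.write("cart", ["book", "pen"], timestamp=2)
during_partition = b.read("cart")  # b still answers, but with stale data

# The network recovers and the replicas exchange state.
b.merge_from(a)
a.merge_from(b)
after_heal = b.read("cart")
print(during_partition, after_heal)  # ['book'] ['book', 'pen']
```

Nobody was turned away during the failure, but the reader talking to `b` saw an old cart. That is the availability-for-consistency swap in miniature.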
When using a cloud database, the concept of partitions is often abstracted away. Internally within Crudly we use a distributed Postgres cluster. Postgres clusters have a strong consistency protocol which will not accept reads in the event of network failures between partitions. With Crudly you're therefore getting CP (Consistency and Partition Tolerance): you'll always get the latest up-to-date data for your application, but if it's not available, we'll politely ask you to come back later… The advantage of using Crudly, however, is that you're not responsible for managing partitions. We'll do all the work there and make sure your system sees the highest availability possible.
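On the application side, "come back later" usually translates into retrying with a growing delay. Here is a minimal sketch of that pattern. The `TemporarilyUnavailable` error, the `read_with_retry` helper and the `flaky_fetch` stand-in are assumptions made up for this example and are not part of any Crudly client or API.

```python
import time


class TemporarilyUnavailable(Exception):
    """Hypothetical error a CP service might return during a network failure."""


def read_with_retry(fetch, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except TemporarilyUnavailable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Simulated service that fails twice, then recovers.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise TemporarilyUnavailable()
    return {"id": 1, "name": "example"}


delays = []  # record the waits instead of actually sleeping
result = read_with_retry(flaky_fetch, sleep=delays.append)
print(result, delays)  # {'id': 1, 'name': 'example'} [0.1, 0.2]
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; in an application you would leave it at the default `time.sleep`.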
When building any distributed system, there will always be trade-offs to consider in the design. CAP theorem gives us a great framework for understanding those trade-offs and building systems accordingly. With Crudly, we offer a CP (Consistency and Partition Tolerance) system, allowing you to focus on your application while we handle the data complexities. Understanding CAP theorem is not just theoretical; it's practical wisdom for making informed decisions in today's world of distributed systems.