ARCHITECTING -️ January 9, 2023

When should you use SQL vs NoSQL?

You're the architect on a new project x and you need to store data y, which one would you pick?

Obviously SQL right? It's straightforward, been around longer, you got constraints, relationships, joins on data, the language is declarative and easy to use. If you have more demand for queries amp up the resources, and if that fails put write requests in a queue and have many read replicas.

The last two sentences of the prior paragraph says a lot about the answer to this question. You need to do a lot more to scale a relational database system (RDBMS) and even then you become limited.

There is a tradeoff as you go, and its easily explained by the CAP theorem. Consistency, Availability, and Partition Tolerance. The theory is you can only have two out of the three, and as you move from one to the other, you lose "something" in return.

SQL/RDBMS have been the goto database for a long time. They are convenient because they are both available and consistent. The issue comes with scaling horizontally since the data is tightly coupled together from their inherent relationships. There are strategies to combat this, but they all come with tradeoffs.

For example lets say you have users and you partition there data by zip/postal codes. One machine hosts users from a range to another, and so on. In this case you need to proactively shard the database, you need to have x machines running at all times, and you have little control on how to modify each machines resources. It adds a lot of overhead and maintenance to your system.

NoSQL/Non Relational/Not Only SQL, whatever you want to call it was designed to combat this limitation. You say tightly coupled data is your issue? Well lets just remove all the relationships and just have data. Lets call each piece of data a document, we'll still have tables, and indexes. Now since all the pieces of data are generic, and essentially "stateless" it doesn't matter what underlying machine its hosted on, so we can add more servers/computing power as demand is increased vs decreased. Scaling horizontally is easier but what did we lose? As the CAP theorem states if you gain one thing you will lose another.

We still have availability, in fact we are even more available now. We now have partition tolerance since each piece of data is a generic document. It has to be that we lost consistency.

This can be explained from two acronyms ACID which SQL complies by, and BASE which NoSQL typically follows.

A - Atomicity - Transactions consist of many read/writes, if one fails they all do. All incomplete transactions are rolled back to preserve state.

C - Consistency - Constraints, referential integrity and rules are adhered to, only data that fits the mold will be saved.

I - Isolation - One transaction cannot see any other transaction, it is a completely separate entity.

D - Durability - Stored in non-volatile memory. If the power goes out and it comes back online all data is preserved.

SQL dbs comply to the ACID acronym while generally NoSQL complies to BASE.

Basically Available - Guarantees response from the data either success/fail.

Soft State - State of the system changes over time. As resources are needed machines will be scaled up or down.

Eventually Consistent - Updates come into the database and eventually this data is spread to all nodes. Since data is partitioned to many nodes. When you query a NoSQL database depending on the setup you may run into stale data, which means its not the latest updated data.

With all open ended questions the right answer is, "It depends".

You need to be making decisions with different heuristics based on company values.

  • Is frugality one of the main tenants?
  • Is future problem avoidance another?

As a developer you will build your own heuristics over time.

Once you decide on values, then you can look at micro decisions like these, SQL vs NoSQL.

  • How many users are you building for?
  • What is the use case?
  • Internal vs External facing?
  • How much data are you working with?

What are your availability needs? How many nines of availability? How much uptime throughout a year is needed? What are your consistency needs? Is accurate data important for your use case? How important? Is it a necessity, or can it wait? What are your partitioning/scaling needs? Is vertical scaling enough, are you working with big data?

After you answer the above questions and more you will have a clearer picture of which to decide.

Which one do you prefer SQL or NoSQL?

This has been a letter from a coder.

Best Regards, Tyler Farkas

Before you leave

We know, not another newsletter pitch! - hear us out. Most developer newsletters lack value. Our newsletter will help you build the mental resilience and emotional intelligence you need to succeed in your career. When’s the last time you enjoyed reading a newsletter? Even worse, when’s the last time you actually read one?

We call it Letters, but others call it their favorite newsletter.

Delivered to developers every end of the Month
Back to Blog