October 26, 2023 (1y ago)

Why Normalize?

We all read about data normalization and why is it important at some point during our computer science studies.

For the uninitiated, data normalization is the process of reducing data redundancy and improving data integrity. There are different levels of normalization as well, called 1NF(Normal Form), 2NF, BCNF etc.

The main goal of normalization is to reduce data redundancy and inconsistency. It also has the added advantage of addressing the insert, update and delete anomalies if the normalization / database design is done properly.

But, why? We know why it's important and what advantages it gives, but why did we come up with this whole concept of normalization in the first place?

Cost. It was always about the money!!

Earlier, data storage was really expensive and people's time was not. Hence we as engineers decided to optimize for data storage. We wanted to store the least amount of data which would suffice and in the process came up with data normalization.

When data is normalized, it's generally broken up into multiple parts and stored in separate tables. And when it's required, we process that data(filter, fetch, join etc.) and then show it to the user. This takes time, hence highly normalized databases while saving time, cost us more time to fulfill the same query.

Data storage is no longer that expensive. But now people's time has become really expensive and as a result we started optimizing for that instead.

By denormalizing(or not normalizing) the data, we are essentially saving up of the processing(filter, fetch, join etc.) time and showing the data to the user faster, even if it consumes more storage.

This is the same reason we have seen a lot of NoSQL databases come up in the past decade. Be it DynamoDB, MongoDB, Redis etc., all of them have some level of denormalized data present.

So, how does these new databases take the insert, update and deleting anomalies into account?

NoSQL databases generally have some or all of the following properties to help them out:

  1. Schema Flexibility
  2. Distributed Updates
  3. Atomic Operations
  4. Tombstone Records
  5. Conflict Resolution
  6. Versioning

Also, it's very important to keep in mind, the why and the philosophy of SQL and NoSQL databases.

SQL databases were and are meant for more consistent workloads, like banking systems where there are huge consequences for any kind of anomalies. On the other hand, NoSQL databases were meant for speed and gets away with eventual consistency, like the number of likes you have on your social media post.

This was just a quick peak into why we started normalizing databases and how it helps us out still today!