UUIDs Unveiled: Why Modern Data Relies on Unique Identifiers

Published on: Nov 4, 2023

When working with data in digital applications, we often need a unique identifier to refer to individual records. The unique identifier allows us to distinguish between two records even if they might have identical data.

In some scenarios we can simply re-use some existing data for this. In a table of users, for example, we might use the email address as the unique identifier. In other cases, we need to generate a unique identifier ourselves.

The Traditional Approach: Enumerable IDs

The simplest way to generate a unique identifier is to count up from 1. This is how many databases work by default. The first record created in the database is given the ID 1, the second 2, and so on.

There are a couple of drawbacks to this approach. Firstly, it doesn't scale well. As your system grows, you might end up creating thousands or even millions of records per second. Your system will also likely be spread out geographically, with different servers handling different requests. This makes maintaining a single counter extremely difficult.

The second issue is security. If your system has a vulnerability that lets an attacker fetch records by ID, the attacker can pull every record just by requesting ID 1, 2, 3 and so on. All they have to do is iterate over the numbers. This is called an enumeration attack.
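To make the enumeration attack concrete, here is a minimal sketch. The `records` dictionary and the `fetch_record` function are hypothetical stand-ins for a database and a vulnerable endpoint. They are not taken from any real system.

```python
from typing import Optional

# Hypothetical in-memory "database" keyed by sequential integer IDs.
records = {
    1: "alice@example.com",
    2: "bob@example.com",
    3: "carol@example.com",
}


def fetch_record(record_id: int) -> Optional[str]:
    """Stand-in for a vulnerable endpoint that returns any record by ID."""
    return records.get(record_id)


def enumerate_records(max_id: int) -> list:
    """Walk every ID from 1 to max_id and collect whatever comes back."""
    found = []
    for record_id in range(1, max_id + 1):
        record = fetch_record(record_id)
        if record is not None:
            found.append(record)
    return found


leaked = enumerate_records(max_id=1000)
print(f"Recovered {len(leaked)} of {len(records)} records")
```

The attacker needs no knowledge of the data at all, only an upper bound on the counter.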

Enter UUIDs: A Modern Solution

Having seen the issues with simple integer IDs, let's explore the more modern standard.

UUID stands for Universally Unique Identifier. It is a 128-bit number, typically written as 32 hexadecimal digits split into 5 groups (of 8, 4, 4, 4 and 12 digits) separated by hyphens. For example:

9c0b58fc-b00d-402c-9675-692c3659bd64
39fe434d-7a32-4c9d-9e9b-785ed1ced88b
9d81b76d-dad2-4652-b4fb-34bdd434834f
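As a quick check of that layout, the sketch below parses the first example with Python's standard `uuid` module and counts its groups, hex digits and bits. The variable names are ours and chosen only for illustration.

```python
import uuid

# One of the example UUIDs from above.
text = "9c0b58fc-b00d-402c-9675-692c3659bd64"
u = uuid.UUID(text)

groups = text.split("-")
group_lengths = [len(g) for g in groups]  # hex digits in each group
hex_digits = u.hex                        # the digits without hyphens
bit_length = len(u.bytes) * 8             # 16 bytes of 8 bits each

print(group_lengths)    # [8, 4, 4, 4, 12]
print(len(hex_digits))  # 32
print(bit_length)       # 128
print(u.int)            # the same value as one 128-bit integer
```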

In fact, we have our very own random UUID generator you can play with here!

We'll go over the specific algorithms used to generate UUIDs later, but for now let's just say they're generated randomly. Being 128 bits long means the probability of ever picking two identical UUIDs is almost zero. This means we can easily scale our system, as IDs for objects can be generated anywhere at any time without coordinating with a central counter.
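Here is a small sketch of why that helps with scaling. It simulates two hypothetical servers that never talk to each other. In one case each keeps its own local counter, in the other each generates random UUIDs with Python's `uuid.uuid4()`.

```python
import uuid


def counter_ids(count: int) -> set:
    """IDs from a server that keeps its own local counter starting at 1."""
    return set(range(1, count + 1))


def random_ids(count: int) -> set:
    """IDs from a server that generates random UUIDs locally."""
    return {uuid.uuid4() for _ in range(count)}


# Two independent servers, each issuing 1000 IDs.
counter_clashes = counter_ids(1000) & counter_ids(1000)
uuid_clashes = random_ids(1000) & random_ids(1000)

print(len(counter_clashes))  # 1000: every counter ID was issued twice
print(len(uuid_clashes))     # 0 in practice
```

With counters, the servers would need a shared source of truth to avoid clashes. With random UUIDs they need nothing from each other.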

Not generating the IDs sequentially also means we don't have to worry about enumeration attacks. Even if an attacker finds a way to fetch records by ID, they would have to guess the UUID of each record, which is practically impossible for randomly generated UUIDs.

Different Versions of UUIDs

Let's take a look at how UUIDs are generated and how that has evolved over the years.

To keep track of the different UUID versions, the standard reserves the 13th hexadecimal digit (the first digit of the 3rd group, not counting hyphens) for a number representing the version. 4 bits are reserved for this. A further 2 bits, at the start of the 4th group, are reserved for the variant.
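The sketch below reads the version and variant out of one of the earlier example UUIDs. The version is the 13th hex digit when the hyphens are ignored. The variant sits in the top bits of the first byte of the 4th group. We also compare against the `version` and `variant` properties that Python's `uuid.UUID` already provides.

```python
import uuid

text = "9c0b58fc-b00d-402c-9675-692c3659bd64"
u = uuid.UUID(text)

# Version: the 13th hex digit once the hyphens are removed (index 12).
version_digit = u.hex[12]

# Variant: the top two bits of byte 8, the first byte of the 4th group.
# The bit pattern 10 marks the layout used by the versions in this article.
top_two_bits = u.bytes[8] >> 6

print(version_digit)      # 4
print(u.version)          # 4
print(bin(top_two_bits))  # 0b10
print(u.variant)          # the module's name for this variant
```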

Version 1

V1 UUIDs were introduced in the late 1980s and were not actually random! They were generated by combining the system's current time with a "node identifier", i.e. something that identifies the specific device or server the ID was generated on. This was usually just the MAC address.
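To see how much a V1 UUID gives away, here is a sketch that generates one with Python's `uuid.uuid1()` and then recovers the time and node from it. The node value `0x001122334455` is made up for the example and stands in for a real MAC address.

```python
import uuid
from datetime import datetime, timedelta, timezone

# V1 timestamps count 100-nanosecond intervals since 15 October 1582.
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

# A made-up 48-bit node value standing in for a MAC address.
example_node = 0x001122334455

u = uuid.uuid1(node=example_node)

# Convert 100 ns ticks to microseconds and add them to the epoch.
created_at = UUID_EPOCH + timedelta(microseconds=u.time // 10)
node_hex = f"{u.node:012x}"

print(u)
print(created_at)  # roughly the current time, in UTC
print(node_hex)    # 001122334455
```

Anyone holding the UUID can run the same two steps, which is where the privacy concerns around V1 come from.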

Version 2

V2 came in the 1990s and was very similar to V1 with minor technical changes, so we won't discuss it here.

Version 3

V3 came in the late 1990s and is deterministic, based on 2 parameters: the namespace and the name. The namespace is itself a UUID and the name is just a string. The V3 UUID is generated by taking the MD5 hash of the namespace combined with the name, so the same inputs always produce the same UUID.
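Here is a sketch of that recipe done by hand. It assumes that combining means the namespace's 16 bytes followed by the name's UTF-8 bytes, and it checks the result against Python's built-in `uuid.uuid3()`. The namespace `uuid.NAMESPACE_DNS` is one of the predefined namespaces in the module, and the name `example.com` is just a sample input.

```python
import hashlib
import uuid

namespace = uuid.NAMESPACE_DNS  # a predefined namespace UUID
name = "example.com"            # sample name, chosen for illustration

# Hash the namespace's 16 bytes followed by the name's bytes.
digest = hashlib.md5(namespace.bytes + name.encode("utf-8")).digest()

# Passing version=3 overwrites the version and variant bits for us.
manual_v3 = uuid.UUID(bytes=digest, version=3)
library_v3 = uuid.uuid3(namespace, name)

print(manual_v3)
print(manual_v3 == library_v3)  # True
```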

Version 4

V4 came in the early 2000s and is arguably the most popular way to generate UUIDs. V4 UUIDs are just generated randomly! All 122 remaining bits (the total 128 bits minus the 4 version bits and 2 variant bits) are generated randomly.
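In practice you would call a library function such as Python's `uuid.uuid4()`. The sketch below builds one by hand instead, to show where the 6 fixed bits go. It uses `secrets.token_bytes` as the random source, which is our choice for the example.

```python
import secrets
import uuid

raw = bytearray(secrets.token_bytes(16))  # 128 random bits

# Overwrite the 4 version bits (high half of byte 6) with 0100.
raw[6] = (raw[6] & 0x0F) | 0x40
# Overwrite the 2 variant bits (top of byte 8) with 10.
raw[8] = (raw[8] & 0x3F) | 0x80

manual_v4 = uuid.UUID(bytes=bytes(raw))
random_bits = 128 - 4 - 2

print(manual_v4)
print(manual_v4.version)  # 4
print(random_bits)        # 122
```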

Version 5

V5 is also from the early 2000s and is the most recent of the versions covered here. It is a revision of V3 that uses the SHA-1 hash instead of MD5.
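A SHA-1 hash is 20 bytes long, which is more than fits in a UUID. The sketch below assumes the first 16 bytes are kept and checks that assumption against Python's `uuid.uuid5()`. The namespace and name are sample inputs.

```python
import hashlib
import uuid

namespace = uuid.NAMESPACE_DNS  # a predefined namespace UUID
name = "example.com"            # sample name, chosen for illustration

sha1_digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()

# SHA-1 yields 20 bytes, so only the first 16 are used.
manual_v5 = uuid.UUID(bytes=sha1_digest[:16], version=5)
library_v5 = uuid.uuid5(namespace, name)

print(len(sha1_digest))                           # 20
print(manual_v5 == library_v5)                    # True
print(library_v5 == uuid.uuid3(namespace, name))  # False
```

The last line shows that V3 and V5 give different UUIDs for the same namespace and name, so the two versions should not be mixed for the same set of records.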

Motivations Behind Different UUID Versions

Version 1 and 2 UUIDs make it easy to guarantee uniqueness across multiple machines. They also mean that a UUID can be traced back not only to the time it was generated but also to the machine it was generated on. This is useful for debugging, but in many scenarios it raises security concerns.

Version 3 and 5 UUIDs are deterministic: the same namespace and name always produce the same UUID. This makes it easy to detect duplicates of records you don't want repeated, such as users identified by their email address.
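As a sketch of that idea, the example below derives a user's ID from their email with `uuid.uuid5()` and rejects a second insert of the same address. The `USERS_NAMESPACE` value, the `users` dictionary and the `add_user` function are all hypothetical. Lower-casing the email before hashing is our assumption about what should count as a duplicate.

```python
import uuid

# Hypothetical application-specific namespace. Any fixed UUID works.
USERS_NAMESPACE = uuid.UUID("39fe434d-7a32-4c9d-9e9b-785ed1ced88b")

users = {}


def add_user(email: str) -> bool:
    """Insert the user and return True, or return False if already present."""
    user_id = uuid.uuid5(USERS_NAMESPACE, email.strip().lower())
    if user_id in users:
        return False
    users[user_id] = email
    return True


first = add_user("alice@example.com")
duplicate = add_user("Alice@Example.com")  # same address after normalising
second = add_user("bob@example.com")

print(first, duplicate, second)  # True False True
```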

Version 4 UUIDs are the simplest to generate and are the most popular. Generating them randomly means collisions are extremely unlikely. The birthday paradox tells us that you'd have to generate around 2.7 quintillion (2,700,000,000,000,000,000) random UUIDs before you'd have a 50% chance of a single collision!
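The birthday estimate for 122 random bits can be worked out in a few lines. The sketch below uses the standard approximation for the collision probability after generating n values, and solves it for a 50% chance. The function name `collision_probability` is ours.

```python
import math

RANDOM_BITS = 122
space = 2 ** RANDOM_BITS  # number of possible V4 UUIDs

# Birthday approximation: p is about 1 - exp(-n^2 / (2 * space)).
# Solving for p = 0.5 gives n = sqrt(2 * ln(2) * space).
n_half = math.sqrt(2 * math.log(2) * space)


def collision_probability(n: float) -> float:
    """Approximate chance of at least one collision among n random UUIDs."""
    return -math.expm1(-(n * n) / (2 * space))


print(f"{n_half:.3e}")                         # 2.715e+18
print(f"{collision_probability(n_half):.2f}")  # 0.50
print(collision_probability(1e12))             # chance after a trillion UUIDs
```

Even after a trillion UUIDs, the estimated chance of any collision is still vanishingly small.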

UUIDs in Crudly

In Crudly, we use V4 UUIDs for all our unique identifiers. Every entity in a table will automatically get a random UUID generated that can be used to identify it. This means you can easily scale your system without worrying about collisions.

Conclusion

There are many ways to uniquely identify records in your database. V4 UUIDs are arguably the easiest and best scaling. By using Crudly to manage your data, you'll have an easy-to-use and scalable way to identify your entities, ensuring both security and efficiency in your application's backend.

