![]() Many of the world's biggest companies generated UUIDs at speeds that made all of these deficiencies a problem.Ī cambrian explosion of UUIDs resulted, as noticed by the IETF - this resulted in the new UUID formats (v6,v7,v8) being published in 2021. UUIDv4s index terribly, as they're essentially random values (obviously, they SORT terribly as well).UUIDv1s are less random than UUIDv4, and can collide/overlap in close enough time intervals, at large scale.UUIDv1s contain a time element but they're not lexicographically sortable (this means they SORT terribly, relative to integer or a timestamp column).UUIDs are twice the size of bigint/ bigserial.There are a couple shortcomings that plague both v1 and v2: UUIDv1 and v4 were a start, but weren't enough for many companies out there. The Post-UUIDv1/v4 era: A Cambrian explosion of identifiers This is a workable solution, but as you might expect, it's not that easy. Since Postgres would catch a collision on a PRIMARY KEY or UNIQUE INDEX column, we're done right? If we want to generate UUIDs all we need to do is choose UUID v1 or V4, and we won't leak any schema structure information to the outside world, right? MAC addresses uniquely (usually) identify network cards - which is a security risk - and those bits can be made random. a 60 bit date-time (at nanosecond precision)īut where's the randomness? Well v1s assume that you won't generate a ton of values in the same nanosecond (and there are some extra bits reserved for differentiating even when you do), but another source is the MAC address.Version 1 UUIDs have three two components: There are a lot of versions of UUID, but let's discuss the ones we're more likely to use/see day to day. More importantly, UUIDs introduce methodology to the madness - different versions of UUID are derived different ways - combined with other sources of randomness or known values. They're very random (almost always generated with secure random sources), and while they're even worse for remembering, they're near impossible to practically guess - the search space is just too large! (Secure) Random UUIDsĪlong comes UUIDs - you're probably used to seeing them now, values like this UUIDv4: People can still technically check them all (the guessing space is 1 to MAX_RANDOM_USER_ID!).The keyspace is fairly small (maybe good for comments on a popular website, but not for IDs!).These numbers are random and quite inscrutable.But having values like 5832916 get generated are cool and all, but there are a few problems: Now we have a secure random value coming in for our user IDs. This means we can happily go back to using integer/ biginteger: Postgres 10 added support for the IDENTITY column syntax in CREATE TABLE (EDB has a great writeup on the addition). not SQL standards compliant)ĭon't be too put off by these reasons - serial is still the go-to for most use-cases.Įven the last point about serial not being standards compliant is solved in Postgres 10+ by using. When used from outside code serial may leak some data or give attackers an edge (e.x., if /users/50 works, how about /users/51?).if an INSERT was rolled back - sequences live outside transactions). When writing automation that simply iterates through id values, note that serial columns can have gaps, even if you never DELETE (e.x.We've taken a somewhat meandering path to get here, but this is the standard solution for most reasonable database schemas. ![]() Multiple application instances don't need to coordinate what ID to use - they just don't, and find out from the database. What's the first way you might think of identifying things? Assigning them numbers!ĪLTER SEQUENCE tablename_colname_seq OWNED BY lname īack in application land, the INSERT statement returns, and provides the new id the database assigned our new row. Let's think about identifying rows of data from first principles. A brief history of identifiers and why we use them integer/ biginteger How can we get these UUIDs into postgres?īut first, a quick history lesson.If we choose to use/add UUIDs, which ones should we choose?.Turns out the question of which identifier (and in this case, UUID) to use is complicated - we're going to dive into some of the complexity and inherent trade-offs, and figure things out: Sometimes it makes sense to use a “natural key” (like an email column in a users table) and sometimes it's better to use a “ surrogate key”, a value made for the purpose of identifying a row (and has no other meaning).Īt first glance, the question of which primary key to use is easy! Just throw a integer/ serial on there, right? Numeric IDs are cool, but what about random value IDs or Universally Unique IDentifiers (UUIDs)? The job of a database is to archive and recall data and you're going to have a hard time finding data without a good primary key or a good index. They uniquely identify rows of data in tables, and make it easy to fetch data. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |