Deciding between an artificial primary key and a natural key for a Products table

Accepted answer
Score: 47

This is a choice between surrogate and natural primary keys.

IMHO, always favour surrogate primary keys. Primary keys shouldn't have meaning, because that meaning can change. Even country names can change, and countries can come into existence and disappear, let alone products. Changing primary keys, which can happen with natural keys, is definitely not advised.

More on surrogate vs natural keys:

So surrogate keys win, right? Well, let’s review and see if any of the cons of natural keys apply to surrogate keys:

  • Con 1: Primary key size – Surrogate keys generally don't have problems with index size since they're usually a single column of type int. That's about as small as it gets.
  • Con 2: Foreign key size - They don't have foreign key or foreign index size problems either for the same reason as Con 1.
  • Con 3: Aesthetics - Well, it’s an eye-of-the-beholder thing, but they certainly don’t involve writing as much code as compound natural keys do.
  • Con 4 & 5: Optionality & Applicability – Surrogate keys have no problems with people or things not wanting to or not being able to provide the data.
  • Con 6: Uniqueness - They are 100% guaranteed to be unique. That’s a relief.
  • Con 7: Privacy - They have no privacy concerns should an unscrupulous person obtain them.
  • Con 8: Accidental Denormalization – You can’t accidentally denormalize non-business data.
  • Con 9: Cascading Updates - Surrogate keys don't change, so no worries about how to cascade them on update.
  • Con 10: Varchar join speed - They're generally ints, so they're generally as fast to join over as you can get.
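
To make Con 9 concrete, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical Products/OrderLines schema keyed on the SKU. With the natural key as primary key, renaming one SKU must cascade into every referencing row; with a surrogate key, only one column of one Products row would change.

```python
import sqlite3

def natural_key_db() -> sqlite3.Connection:
    """Hypothetical schema where the SKU (a natural key) is the primary key
    and is copied into every referencing OrderLines row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE Products (sku TEXT PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE OrderLines (
            lineId INTEGER PRIMARY KEY,
            sku    TEXT NOT NULL REFERENCES Products(sku) ON UPDATE CASCADE,
            qty    INTEGER NOT NULL);
        INSERT INTO Products VALUES ('ABC123', 'Widget');
        INSERT INTO OrderLines (sku, qty) VALUES ('ABC123', 2), ('ABC123', 5);
    """)
    return conn

if __name__ == "__main__":
    conn = natural_key_db()
    # One logical change (the vendor renames the SKU)...
    conn.execute("UPDATE Products SET sku = 'ABC123-R2' WHERE sku = 'ABC123'")
    # ...rewrites the key in every order line that references it.
    print(conn.execute("SELECT lineId, sku FROM OrderLines").fetchall())
```

Without `ON UPDATE CASCADE` the same UPDATE would be rejected as a foreign key violation, which is the "worry" Con 9 refers to.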

And there's also the related question "Surrogate Keys vs Natural Keys for Primary Key?"

Score: 10

In all but the simplest internal situations, I recommend always going for the surrogate key. It gives you options in the future and protects you from unknowns.

There's no reason why additional keys, like an SKU, couldn't be made non-null (and unique, where appropriate) to enforce them, but by removing your reliance on third parties you are at least giving yourself the option to choose, rather than having the choice taken from you and enduring a painful rewrite at a later stage.

Whether you go for the auto-incremented integer or determine the next primary key yourself, there will be complications. With the auto-incremented method, you can insert the record easily and let the database assign its key, but you may have trouble identifying exactly which key your record was given (and selecting the max key isn't guaranteed to return yours).
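
In SQL Server the usual way around the "max key" problem is `SCOPE_IDENTITY()` or an `OUTPUT INSERTED.<column>` clause on the INSERT. The same idea as a sketch with Python's built-in sqlite3 module and a hypothetical Products table: `cursor.lastrowid` gives the key assigned to the row that this connection just inserted.

```python
import sqlite3

def insert_product(conn: sqlite3.Connection, sku: str, name: str) -> int:
    """Insert one product and return the surrogate key assigned to that row."""
    cur = conn.execute("INSERT INTO Products (sku, name) VALUES (?, ?)", (sku, name))
    # lastrowid is captured from this connection right after the INSERT runs,
    # so rows inserted by other connections cannot change it (unlike MAX()).
    return cur.lastrowid

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Products ("
        " productId INTEGER PRIMARY KEY AUTOINCREMENT,"
        " sku TEXT UNIQUE,"
        " name TEXT NOT NULL)"
    )
    print(insert_product(conn, "ABC123", "Widget"))  # 1
    print(insert_product(conn, "XYZ789", "Gadget"))  # 2
```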

I tend to go for the self-assigned key because you have more control and, in SQL Server, you can retrieve your key from a central keys table and ensure nobody else gets the same key, all in one statement:


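DECLARE @Key int;
-- Increment the 'Product' counter and capture the new value in one statement.
UPDATE  KeyTable WITH (ROWLOCK)
SET     @Key = LastKey = LastKey + 1
WHERE   KeyType = 'Product';
SELECT  @Key AS NewProductKey;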

The table records the last key used. The SQL above increments that key directly in the table and returns the new key; because the UPDATE holds an exclusive lock on that row, two concurrent callers cannot receive the same value.
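
SQLite has no row-lock hint, so this sketch of the same keys-table pattern (assuming Python's built-in sqlite3 module, a hypothetical `KeyTable(KeyType, LastKey)` with the same layout, and a connection opened with `isolation_level=None`) takes SQLite's write lock with `BEGIN IMMEDIATE` and then increments and reads the counter inside one transaction.

```python
import sqlite3

def next_key(conn: sqlite3.Connection, key_type: str) -> int:
    """Allocate the next key for key_type from a central keys table.

    Assumes a hypothetical KeyTable(KeyType TEXT PRIMARY KEY, LastKey INTEGER)
    and a connection opened with isolation_level=None, so transactions are
    controlled explicitly here.
    """
    # BEGIN IMMEDIATE takes the write lock up front, so no other connection
    # can increment the counter between our UPDATE and SELECT.
    conn.execute("BEGIN IMMEDIATE")
    try:
        cur = conn.execute(
            "UPDATE KeyTable SET LastKey = LastKey + 1 WHERE KeyType = ?",
            (key_type,),
        )
        if cur.rowcount != 1:
            raise KeyError(f"no counter row for {key_type!r}")
        (key,) = conn.execute(
            "SELECT LastKey FROM KeyTable WHERE KeyType = ?", (key_type,)
        ).fetchone()
        conn.execute("COMMIT")
        return key
    except Exception:
        conn.execute("ROLLBACK")
        raise

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE KeyTable (KeyType TEXT PRIMARY KEY, LastKey INTEGER NOT NULL)")
    conn.execute("INSERT INTO KeyTable VALUES ('Product', 0)")
    print(next_key(conn, "Product"), next_key(conn, "Product"))  # 1 2
```

Raising on a missing counter row, rather than silently creating one, is a design choice of this sketch: it keeps key types an explicit, reviewed list.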

Why you should avoid alphanumeric primary keys:

Three main problems: performance, collation and space.

Performance - there is a performance cost. Like Razzie below, I can't quote any numbers, but it is less efficient to index alphanumerics than numbers.

Collation - your developers may create the same key with different collations in different tables (it happens), which leads to constantly using COLLATE clauses when joining these tables in queries, and that gets old really quickly.

Space - a nine-character SKU like David's takes nine bytes as char(9), but an int takes only four (2 for smallint, 1 for tinyint). Even a bigint takes only 8 bytes.
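
To put the space argument in numbers, here is a small sketch; the SKU `ABC123456` and the one-million-row count are made up for illustration. If the primary key is also the clustered index (the SQL Server default), every nonclustered index repeats it, so the saving applies again per index.

```python
import struct

# Storage size of each SQL Server integer type, reproduced with struct's
# standard-size, little-endian format codes.
INT_KEY_BYTES = {
    "tinyint": struct.calcsize("<B"),   # 1 byte (unsigned, 0-255)
    "smallint": struct.calcsize("<h"),  # 2 bytes
    "int": struct.calcsize("<i"),       # 4 bytes
    "bigint": struct.calcsize("<q"),    # 8 bytes
}

def sku_bytes(sku: str, unicode: bool = False) -> int:
    """Bytes for the SKU's characters: 1 per ASCII character (char/varchar)
    or 2 per BMP character in UTF-16 (nchar/nvarchar).
    varchar's 2-byte length prefix is deliberately ignored."""
    return len(sku.encode("utf-16-le" if unicode else "ascii"))

def bytes_saved(rows: int, sku: str, int_type: str = "int") -> int:
    """Key bytes saved across `rows` entries of one index by using an
    integer key instead of a same-length ASCII SKU."""
    return rows * (sku_bytes(sku) - INT_KEY_BYTES[int_type])

if __name__ == "__main__":
    sku = "ABC123456"  # hypothetical nine-character SKU
    print(sku_bytes(sku), sku_bytes(sku, unicode=True))  # 9 18
    print(bytes_saved(1_000_000, sku))                    # 5000000
```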

Score: 4

The ever-present danger with natural keys is that either your initial assumptions will be proven wrong, now or in the future, when some change is made outside your control, or at some point you'll need to reference a record where passing a meaningful field is not desired (e.g. a web application that uses an employee's Social Security number as the primary key and then has to use URLs like /employee.php?ssn=xxxxxxx).

From my own personal experience with "unique" SKUs and vendor data feeds: are you absolutely sure they are sending you a feed with complete, unique, well-formed SKUs?

I've personally had to deal with all of the following when getting feeds from vendors with varying levels of IT and clerical competence:

  • Products are missing their SKU entirely ("")
  • Clerks have used placeholder SKUs in their database like 999999999 and 00000000 and never corrected them
  • Those doing the data entry or importation have confused various product numbers, mixing up things like UPC with SCC, or even finding ways to mangle them together (I've seen SCC codes with impossible check digits at the end, because they just copied the UPC and added 01 or 10 without correcting the check digit)
  • For special reasons, or just incompetence, the vendor has entered the same product twice in their database (for example, rev. 1 and rev. 2 of the same motherboard have the same SKU but exist as two records in the vendor's database and data feed, because rev. 2 has new features)
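
A pre-import check along these lines can catch most of the problems listed above before they reach a natural-key column. It is only a sketch: the "one repeated character, six or more long" placeholder rule is a hypothetical heuristic, and the check-digit test assumes the SKU field sometimes carries GTIN-family codes (GTIN-8, UPC-A, EAN-13, SCC-14/GTIN-14), which share the standard mod-10 check digit with alternating weights 3 and 1.

```python
from collections import Counter

GTIN_LENGTHS = (8, 12, 13, 14)  # GTIN-8, UPC-A, EAN-13, GTIN-14/SCC-14
MIN_PLACEHOLDER_LEN = 6         # hypothetical threshold for "999999"-style SKUs

def gtin_check_digit_ok(code: str) -> bool:
    """True if code is an all-digit GTIN whose last digit is a valid mod-10 check digit."""
    if not (code.isascii() and code.isdigit()) or len(code) not in GTIN_LENGTHS:
        return False
    payload, check = code[:-1], int(code[-1])
    # Weights alternate 3, 1, 3, ... starting with the digit next to the check digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(payload)))
    return (10 - total % 10) % 10 == check

def audit_feed(skus: list[str]) -> dict[str, list[str]]:
    """Sort a vendor feed's SKUs into the problem categories described above."""
    problems: dict[str, list[str]] = {
        "missing": [], "placeholder": [], "bad_check_digit": [], "duplicate": []}
    for sku in skus:
        s = sku.strip()
        if not s:
            problems["missing"].append(sku)
        elif len(set(s)) == 1 and len(s) >= MIN_PLACEHOLDER_LEN:
            problems["placeholder"].append(sku)
        elif (s.isascii() and s.isdigit() and len(s) in GTIN_LENGTHS
              and not gtin_check_digit_ok(s)):
            problems["bad_check_digit"].append(sku)
    # Same SKU on more than one record (e.g. two board revisions);
    # blanks are already reported as missing.
    counts = Counter(sku.strip() for sku in skus)
    problems["duplicate"] = sorted(s for s, n in counts.items() if s and n > 1)
    return problems

if __name__ == "__main__":
    feed = ["036000291452", "", "999999999", "00000000",
            "10036000291452",  # "10" + UPC with the old check digit kept
            "ABC-1", "ABC-1"]
    print(audit_feed(feed))
```

`036000291452` is a valid UPC-A; prefixing it with `10` without recomputing the check digit produces exactly the kind of impossible SCC described above.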

Score: 2

I'd also go with an auto-increment primary key. The performance impact of having an alphanumeric primary key is there, though I don't dare name any numbers. However, if performance is important in your application, that's all the more reason to go with the auto-increment primary key column.

Score: 1

I'd advise having an auto-incremented "meaningless" integer as the primary key. Should someone come up with the idea of reorganizing product IDs, at least your DB won't get into trouble.

Score: 1

Pretty similar to my question from a few months ago...

Should I have a dedicated primary key field?

I went with an auto-incrementing PK in the end.

Score: 1

Since you're dealing with data from multiple vendors outside of your control, I would use a surrogate key. You don't want to have to rearchitect your database design one day when one of them happens to send you a duplicate.

Score: 1

A surrogate key (an auto-increment INT field) will uniquely identify a row in the table. On the other hand, a unique natural key (productName) will prevent duplicate product data from entering the table.

With a unique natural key field, two or more rows can never have the same data.

With a surrogate key field, rows are unique because of the auto-increment INT field, but the data in those rows need not be, because the surrogate key has no relation to the data.

Take the example of a User table: the table's natural key field (userName) will prevent the same user from registering twice, but the auto-increment INT field (userId) will not.
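
The two approaches combine: keep the surrogate key as the primary key and add a UNIQUE NOT NULL constraint on the natural key. A minimal sketch with Python's built-in sqlite3 module and a hypothetical Users table shows that the surrogate key alone lets a duplicate userName in, while the constraint rejects it.

```python
import sqlite3

def make_users(unique_username: bool) -> sqlite3.Connection:
    """Create a hypothetical Users table with an auto-increment surrogate key,
    optionally adding a UNIQUE NOT NULL constraint on the natural key."""
    conn = sqlite3.connect(":memory:")
    constraint = " UNIQUE NOT NULL" if unique_username else ""  # fixed literal, not user input
    conn.execute(
        "CREATE TABLE Users ("
        " userId INTEGER PRIMARY KEY AUTOINCREMENT,"
        f" userName TEXT{constraint})"
    )
    return conn

def register(conn: sqlite3.Connection, name: str) -> bool:
    """Try to add a user; return False if a constraint rejects the row."""
    try:
        conn.execute("INSERT INTO Users (userName) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False

if __name__ == "__main__":
    for unique in (False, True):
        conn = make_users(unique)
        print(unique, register(conn, "alice"), register(conn, "alice"))
```

The first run prints `False True True` (two "alice" rows with different userIds); the second prints `True True False`.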

Score: 0

If every product will have a SKU and the SKU is unique to each product, I don't see why you wouldn't want to use it as the primary key.

Score: 0

You could always take a hash of the SKU, which would get rid of the alphas. You'd have to code for possible collisions (which should be very rare), which is an added complication.

I'd use the hash to populate the primary key and make the initial import easy, but when using it in the DB, always treat it as if it were a random number. That way the primary key will lose its meaning (and have all the advantages of an auto-incremented key), allowing flexibility in the future.
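
A sketch of that import step, under stated assumptions: SHA-256 (from Python's hashlib) is a swapped-in choice of hash, its first 8 bytes are shifted down to 63 bits so the key fits a signed BIGINT, and collisions are resolved by probing to the next free value, which is a hypothetical policy rather than anything from the answer.

```python
import hashlib

KEY_SPACE = 1 << 63  # non-negative values that fit a signed 64-bit BIGINT

def sku_to_key(sku: str) -> int:
    """Derive a 63-bit integer from an SKU via SHA-256 (first 8 bytes, top bit dropped)."""
    digest = hashlib.sha256(sku.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> 1

def assign_keys(skus: list[str]) -> dict[str, int]:
    """Assign a hash-derived key to each distinct SKU during the initial import.

    Collisions are resolved by probing to the next free value (hypothetical
    policy). That makes a key depend on import order, which is one more reason
    to treat the stored key as an opaque number afterwards.
    """
    keys: dict[str, int] = {}
    owner: dict[int, str] = {}
    for sku in skus:
        if sku in keys:          # repeated SKU in the feed: reuse its key
            continue
        key = sku_to_key(sku)
        while key in owner:      # collision with a different SKU
            key = (key + 1) % KEY_SPACE
        owner[key] = sku
        keys[sku] = key
    return keys

if __name__ == "__main__":
    print(assign_keys(["ABC123", "XYZ789", "ABC123"]))
```

After the import, new products would get their keys from a sequence or keys table like any other surrogate key, so nothing downstream relies on the hash.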
