[ACCEPTED]-Why is it not good to have a primary key on a join table?-junction-table

Accepted answer
Score: 57

Some notes:

  1. The combination of category_id and post_id is unique in of itself, so an additional ID column is redundant and wasteful
  2. The phrase "not good to have a primary key" is incorrect in the screencast. You still have a Primary Key -- it is just made up of the two columns (e.g. CREATE TABLE foo( cid, pid, PRIMARY KEY( cid, pid ) ). For people who are used to tacking on ID values everywhere this may seem odd but in relational theory it is quite correct and natural; the screencast author would better have said it is "not good to have an implicit integer attribute called 'ID' as the primary key".
  3. It is redundant to have the extra column because you will place a unique index on the combination of category_id and post_id anyway to ensure no duplicate rows are inserted
  4. Finally, although common nomenclature is to call it a "composite key" this is also redundant. The term "key" in relational theory is actually the set of zero or more attributes that uniquely identify the row, so it is fine to say that the primary key is category_id, post_id
  5. Place the MOST SELECTIVE column FIRST in the primary key declaration. A discussion of the construction of b(+/*) trees is out of the scope of this answer ( for some lower-level discussion see: http://www.akadia.com/services/ora_index_selectivity.html ) but in your case, you'd probably want it on post_id, category_id since post_id will show up less often in the table and thus make the index more useful. Of course, since the table is so small and the index will be, essentially, the data rows, this is not very important. It would be in broader cases where the table is wider.


Score: 4

It is a bad idea not to have a primary key 9 on any table, period (if the DBMS is a relational 8 DBMS - or an SQL DBMS). Primary keys are 7 a crucial part of the integrity of your 6 database.

I suppose if you don't mind your 5 database being inaccurate and providing 4 incorrect answers every so often, then you 3 could do without...but most people want 2 accurate answers from their DBMS and for 1 such people, primary keys are crucial.

Score: 3

A DBA would tell you that the primary key 4 in this case is actually the combination 3 of the two FK columns. Since Rails/ActiveRecord 2 doesn't play nice with composite PKs (by 1 default, at least), that may be the reason.

Score: 3

The combination of foreign keys can be a 10 primary key (called a composite primary 9 key). Personally I favour using a technical 8 primary key instead of that (auto number 7 field, sequence, etc). Why? Well, it makes 6 it much easier to identify the record, which 5 you may need to do if you're going to delete 4 it.

Think about it: if you're going to present 3 a Webpage of all the linkages, having a 2 primary key to identify the record makes 1 it much easier.

Score: 3

Basically because there's no need for it. The 16 combination of the two foreign key field 15 adequately uniquely identifies any row.

But 14 that merely says why it's not a Good Idea.... but 13 why would it be a Bad Idea?

Consider the 12 overhead adding a identity column would 11 add. The table would take up 50% more disk 10 space. Worse is the index situation. With 9 a identity field, you have to maintain the 8 identity count, plus a second index. You'll 7 be tripling the disk space and tripling 6 the work the needs to be performed on every 5 insert. With the only advantage being 4 a slightly shorter WHERE clause in a DELETE 3 command.

On the other hand, If the composite 2 key fields are the entire table, then the 1 index can be the table.

Score: 2

Placing the most selective column first 14 should only be relevant in the INDEX declaration. In 13 the KEY declaration, it should not matter 12 (because, as has been correctly pointed 11 out, the KEY is a SET, and inside a set, order 10 doesn't matter - the set {a1,a2} is the 9 same set as {a2,a1}).

If a DBMS product is 8 such that ordering of attributes inside 7 a KEY declaration makes a difference, then 6 that DBMS product is guilty of not properly 5 distinguishing between the logical design 4 of a database (the part where you do the 3 KEY declaration) and the physical design 2 of the database (the part where you do the 1 INDEX declaration).

Score: 2

I wanted to comment on the following comment 21 : "It is not correct to say zero or more".

I 20 wanted to remark that the text to which 19 this comment was added simply did not contain 18 the text "zero or more", so the author of 17 the comment I wanted to comment on was criticizing 16 someone else for something that hadn't been 15 said.

I also wanted to comment that it is 14 not correct to say that it is not correct 13 say "zero or more". Relational theory as 12 commonly known today among the few people 11 who still bother to study the details of 10 that theory, actually REQUIRES the possibility 9 of a key with no attributes.

But when I pressed 8 the button "comment", the system responded 7 to me that commenting requires a reputation 6 score of 50 (or some such).

A sad illustration 5 of how the world seems to have forgotten 4 that science is not democracy, and that 3 in science, the truth is not determined 2 by whoever happens to be the majority, nor 1 by whoever happens to have "enough reputation".

Score: 1

Pros of having a single PK

  • Uniquely identifies a row with a single value
  • Makes it easy to reference the relationship from elsewhere if needed
  • Some tools want you to have a single integer value pk

Cons of having a single PK

  • Uses more disk space
  • Need 3 indexes rather than 1
  • Without a unique constraint you could end up with multiple rows for the same relationship


  • You need to define a unique constraint if you want to avoid duplicates
  • In my opinion don't use the single pk if you're table is going to be huge, otherwise trade off some disk space for the convenience. Yes it's wasteful, but who cares about a few MB on disk in real world applications.


More Related questions