[SQL] Joinning V.S. Inner Select

This reminds me of those days, when I was taking DB course in NTU. One of the thing we learnt was inner joining. I still remmeber how my lecturer manages to came out with all sorts of tricky questions. Solutions aren’t straight forward, and they ofter require >1 layer of inner selections.

Suppose I are have a database scheme that stores your Blog application (popular “Hello World” tutorials for web frameworks). There is a Posts table and a Tags table. Post and Tag have HABTM relationship. I have also created a 3rd table called Posts_Tags, which stores paris of post’s id and tag’s id.

Give a tag_id, i want to count the number of posts that are tagged with this tag. Using SQL, there are more than 1 way of doing it. Today I am going to compare Joining against Inner Select.

This is how I will do a join:

This is how I will do an inner select:

Let’s start with the inner select. Only 1 record is selected from Posts_Tags, the query can be very fast if tag_is is indexed. Suppose there are 1 million posts. 1 million Posts records will be retrieved and compared against a single Posts_Tags.post_id. Hmm… Comparing 1 millions records doesn’t sound good. Can we do better?

I used EXPLAIN to check what MySQL is doing. I have found out that joining is actually more efficient. MySQL is smart enough to reduce the number of records been retrieved for comparision. In my experiment, <100 records were retrieved from Posts and Posts_tags. This is definitely much faster then inner select.

When will inner select be faster than joining? Lazy to find out… Maybe next time ;)

Through this experiment, I have learned 2 features: EXPLAIN and BENCHMARK. These 2 tools are very helpful in analyzing what the MySQL is doing.

Posted in Tao Of Programming at February 27th, 2009. No Comments.