There are a whole bunch of articles, blog entries and tutorials that seek to explain how SQL JOINS work. Some of them are excellent, and others are just confusing. I am adding my own attempt at clarifying JOINS to the mix to highlight how proper use of the tools available to you can seriously reduce the chances of getting the JOIN syntax or type wrong. Since JOINS are all about relating data in one table to data in another, I thought it appropriate to illustrate the subject using Parents and Children (who may or may not be related to each other). So let's meet the families.
Peter has 2 children: Jo and Roger
Mary has 1 child: Susan
Alice has 2 children: Kevin and Rachel
John has 1 child: David
Jimmy has no children. We don't know why. Perhaps he enjoys the bachelor's life too much. Or maybe he's a Jaffa (a seedless fruit of the orange variety).
Sam and Ian have no parents. They are orphans. Aaaaah.
This is how they appear in two database tables, Parents and Children. Children are related to Parents by the AdultID column.
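The original table contents are not reproduced here, so the following is a minimal sketch that rebuilds the two tables in an in-memory database using Python's built-in `sqlite3` module (SQLite, not SQL Server). Only the family relationships come from the article: the AdultID and ChildID values are hypothetical, and the orphans are assumed to have a NULL AdultID.

```python
import sqlite3

# Hypothetical IDs; the orphans Sam and Ian are assumed to have a NULL AdultID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parents (AdultID INTEGER PRIMARY KEY, AdultName TEXT);
    CREATE TABLE Children (ChildID INTEGER PRIMARY KEY, ChildName TEXT,
                           AdultID INTEGER REFERENCES Parents (AdultID));
    INSERT INTO Parents VALUES
        (1, 'Peter'), (2, 'Mary'), (3, 'Alice'), (4, 'John'), (5, 'Jimmy');
    INSERT INTO Children VALUES
        (1, 'Jo', 1), (2, 'Roger', 1), (3, 'Susan', 2), (4, 'Kevin', 3),
        (5, 'Rachel', 3), (6, 'David', 4), (7, 'Sam', NULL), (8, 'Ian', NULL);
""")

parent_count = conn.execute("SELECT COUNT(*) FROM Parents").fetchone()[0]
child_count = conn.execute("SELECT COUNT(*) FROM Children").fetchone()[0]
# Children with no AdultID have no parent row to relate to.
orphans = [name for (name,) in conn.execute(
    "SELECT ChildName FROM Children WHERE AdultID IS NULL ORDER BY ChildID")]
print(parent_count, child_count, orphans)  # 5 8 ['Sam', 'Ian']
```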
INNER JOIN - All parents with children
If we open up the Query Designer in SQL Server Management Studio and add the two tables, they are joined on AdultID by default using an INNER JOIN. If they are not joined automatically (shown by a line between the tables), AdultID in the Children table has not been set as a foreign key. In that case, you can simply drag AdultID from the Children table and drop it onto AdultID in the Parents table to join them.
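The join line comes from the foreign key defined in the database. As a rough analogue only (SQLite rather than SQL Server, with hypothetical column types), this sketch declares Children.AdultID as a foreign key and reads the relationship back from the schema metadata with SQLite's `PRAGMA foreign_key_list`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declaring AdultID as a foreign key records the Parents-Children
# relationship in the schema, which is what a designer tool can read.
conn.executescript("""
    CREATE TABLE Parents (AdultID INTEGER PRIMARY KEY, AdultName TEXT);
    CREATE TABLE Children (ChildID INTEGER PRIMARY KEY, ChildName TEXT,
                           AdultID INTEGER REFERENCES Parents (AdultID));
""")

# Each row is (id, seq, table, from, to, on_update, on_delete, match).
fks = conn.execute("PRAGMA foreign_key_list(Children)").fetchall()
relationship = [(row[3], row[2], row[4]) for row in fks]
print(relationship)  # [('AdultID', 'Parents', 'AdultID')]
```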
Selecting AdultName and ChildName in their respective tables generates the following SQL:
And when the SQL is executed, all parents with children are returned:
So who's missing? Bachelor Jimmy is nowhere to be seen (because he has no children), and neither are our orphans, Sam and Ian. INNER JOIN only returns rows where the two tables match on the JOIN-predicate, which in this case means rows where Parents.AdultID has a matching value in Children.AdultID.
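A minimal sketch of the same INNER JOIN, run against SQLite through Python's `sqlite3` module rather than SQL Server. The query is hand-written and may be formatted differently from what the Query Designer generates; the IDs are hypothetical and the orphans are assumed to have a NULL AdultID. The ORDER BY is there only to make the output predictable.

```python
import sqlite3

# Hypothetical IDs; the orphans Sam and Ian are assumed to have a NULL AdultID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parents (AdultID INTEGER PRIMARY KEY, AdultName TEXT);
    CREATE TABLE Children (ChildID INTEGER PRIMARY KEY, ChildName TEXT,
                           AdultID INTEGER REFERENCES Parents (AdultID));
    INSERT INTO Parents VALUES
        (1, 'Peter'), (2, 'Mary'), (3, 'Alice'), (4, 'John'), (5, 'Jimmy');
    INSERT INTO Children VALUES
        (1, 'Jo', 1), (2, 'Roger', 1), (3, 'Susan', 2), (4, 'Kevin', 3),
        (5, 'Rachel', 3), (6, 'David', 4), (7, 'Sam', NULL), (8, 'Ian', NULL);
""")

rows = conn.execute("""
    SELECT Parents.AdultName, Children.ChildName
    FROM Parents INNER JOIN Children ON Parents.AdultID = Children.AdultID
    ORDER BY Parents.AdultID, Children.ChildID
""").fetchall()

# Only matched pairs survive the JOIN-predicate.
for adult, child in rows:
    print(adult, child)
```

This prints six parent/child pairs; Jimmy, Sam and Ian do not appear, because none of them has a match on the other side.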
LEFT OUTER JOIN - All Parents, and their children if they have any
Right-clicking on the diamond in the middle of the JOIN line in the Query Designer reveals a context menu with some options for the JOIN. If we select "Select All Rows from Parents", the left-hand side of the diamond fills in, and some new SQL is generated:
When this is executed, the LEFT OUTER JOIN returns all the parents (including Jimmy), irrespective of whether they have children or not:
It also returns all the children that have parents. Jimmy has NULL in the ChildName column, because he has no children.
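The same LEFT OUTER JOIN as a minimal SQLite sketch (hypothetical IDs, orphans assumed to have a NULL AdultID, query hand-written rather than copied from the Query Designer). The only change from the INNER JOIN version is the join type.

```python
import sqlite3

# Hypothetical IDs; the orphans Sam and Ian are assumed to have a NULL AdultID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parents (AdultID INTEGER PRIMARY KEY, AdultName TEXT);
    CREATE TABLE Children (ChildID INTEGER PRIMARY KEY, ChildName TEXT,
                           AdultID INTEGER REFERENCES Parents (AdultID));
    INSERT INTO Parents VALUES
        (1, 'Peter'), (2, 'Mary'), (3, 'Alice'), (4, 'John'), (5, 'Jimmy');
    INSERT INTO Children VALUES
        (1, 'Jo', 1), (2, 'Roger', 1), (3, 'Susan', 2), (4, 'Kevin', 3),
        (5, 'Rachel', 3), (6, 'David', 4), (7, 'Sam', NULL), (8, 'Ian', NULL);
""")

# Every Parents row is kept; unmatched parents get NULL (None) for ChildName.
rows = conn.execute("""
    SELECT Parents.AdultName, Children.ChildName
    FROM Parents LEFT OUTER JOIN Children ON Parents.AdultID = Children.AdultID
    ORDER BY Parents.AdultID, Children.ChildID
""").fetchall()

print(len(rows))  # 7
print(rows[-1])   # ('Jimmy', None)
```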
RIGHT OUTER JOIN - All children, and their parents if they have any
We deselect "Select All Rows from Parents" and select "Select All Rows from Children" instead. The right-hand side (or perhaps the RIGHT OUTER side?) of the diamond is now filled in, and the SQL changes to reflect this:
When executed, our orphans appear in the ChildName column, with NULL in the AdultName column. Jimmy is nowhere to be seen. I think he just hates kids.
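A minimal SQLite sketch of the RIGHT OUTER JOIN (hypothetical IDs, orphans assumed to have a NULL AdultID). SQL Server accepts `Parents RIGHT OUTER JOIN Children` directly, but SQLite only supports RIGHT JOIN from version 3.39.0, so the sketch uses the equivalent form: a LEFT OUTER JOIN with the two tables swapped. Where the running SQLite is new enough, it also runs the native RIGHT JOIN for comparison.

```python
import sqlite3

# Hypothetical IDs; the orphans Sam and Ian are assumed to have a NULL AdultID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parents (AdultID INTEGER PRIMARY KEY, AdultName TEXT);
    CREATE TABLE Children (ChildID INTEGER PRIMARY KEY, ChildName TEXT,
                           AdultID INTEGER REFERENCES Parents (AdultID));
    INSERT INTO Parents VALUES
        (1, 'Peter'), (2, 'Mary'), (3, 'Alice'), (4, 'John'), (5, 'Jimmy');
    INSERT INTO Children VALUES
        (1, 'Jo', 1), (2, 'Roger', 1), (3, 'Susan', 2), (4, 'Kevin', 3),
        (5, 'Rachel', 3), (6, 'David', 4), (7, 'Sam', NULL), (8, 'Ian', NULL);
""")

# "Parents RIGHT JOIN Children" == "Children LEFT JOIN Parents":
# every Children row is kept, unmatched ones get NULL for AdultName.
rows = conn.execute("""
    SELECT Parents.AdultName, Children.ChildName
    FROM Children LEFT OUTER JOIN Parents ON Parents.AdultID = Children.AdultID
    ORDER BY Children.ChildID
""").fetchall()
print(len(rows))   # 8
print(rows[-2:])   # [(None, 'Sam'), (None, 'Ian')]

# Native RIGHT JOIN, only where the SQLite library supports it.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    native = conn.execute("""
        SELECT Parents.AdultName, Children.ChildName
        FROM Parents RIGHT OUTER JOIN Children ON Parents.AdultID = Children.AdultID
        ORDER BY Children.ChildID
    """).fetchall()
    print(native == rows)  # True
```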
FULL OUTER JOIN - All Parents and all children
Selecting both options from the JOIN's context menu gives us a FULL OUTER JOIN:
This returns all records from both sides of the relationship, regardless of whether there are matching records:
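A minimal SQLite sketch of the FULL OUTER JOIN (hypothetical IDs, orphans assumed to have a NULL AdultID). SQL Server supports `FULL OUTER JOIN` natively; SQLite only does from 3.39.0, so the sketch swaps in a common emulation: all parents via a LEFT OUTER JOIN, plus (UNION ALL) the children that have no parent. The native form is compared where available.

```python
import sqlite3
from collections import Counter

# Hypothetical IDs; the orphans Sam and Ian are assumed to have a NULL AdultID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parents (AdultID INTEGER PRIMARY KEY, AdultName TEXT);
    CREATE TABLE Children (ChildID INTEGER PRIMARY KEY, ChildName TEXT,
                           AdultID INTEGER REFERENCES Parents (AdultID));
    INSERT INTO Parents VALUES
        (1, 'Peter'), (2, 'Mary'), (3, 'Alice'), (4, 'John'), (5, 'Jimmy');
    INSERT INTO Children VALUES
        (1, 'Jo', 1), (2, 'Roger', 1), (3, 'Susan', 2), (4, 'Kevin', 3),
        (5, 'Rachel', 3), (6, 'David', 4), (7, 'Sam', NULL), (8, 'Ian', NULL);
""")

# Emulated FULL OUTER JOIN: matched pairs and unmatched parents,
# then only the unmatched children (so no pair is counted twice).
rows = conn.execute("""
    SELECT Parents.AdultName, Children.ChildName
    FROM Parents LEFT OUTER JOIN Children ON Parents.AdultID = Children.AdultID
    UNION ALL
    SELECT Parents.AdultName, Children.ChildName
    FROM Children LEFT OUTER JOIN Parents ON Parents.AdultID = Children.AdultID
    WHERE Parents.AdultID IS NULL
""").fetchall()
print(len(rows))  # 9

# Native FULL OUTER JOIN, only where the SQLite library supports it;
# row order is unspecified without ORDER BY, so compare as multisets.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    native = conn.execute("""
        SELECT Parents.AdultName, Children.ChildName
        FROM Parents FULL OUTER JOIN Children ON Parents.AdultID = Children.AdultID
    """).fetchall()
    print(Counter(native) == Counter(rows))  # True
```

UNION ALL with the IS NULL filter is used instead of plain UNION, because UNION would also collapse any genuinely duplicated rows.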
CROSS JOIN - Cartesian product
If we remove the JOIN line:
we get a CROSS JOIN which, when executed, returns the Cartesian product of both tables: 5 parents x 8 children = 40 rows. In other words, every row in one table is paired with every row in the other table:
Notice the absence of a JOIN-predicate. There are a few niche uses for CROSS JOINS, including the creation of test data, but in general they are rarely needed.
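A minimal SQLite sketch of the CROSS JOIN (hypothetical IDs, orphans assumed to have a NULL AdultID). Note that there is no ON clause at all, so AdultID plays no part in the result.

```python
import sqlite3

# Hypothetical IDs; the orphans Sam and Ian are assumed to have a NULL AdultID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parents (AdultID INTEGER PRIMARY KEY, AdultName TEXT);
    CREATE TABLE Children (ChildID INTEGER PRIMARY KEY, ChildName TEXT,
                           AdultID INTEGER REFERENCES Parents (AdultID));
    INSERT INTO Parents VALUES
        (1, 'Peter'), (2, 'Mary'), (3, 'Alice'), (4, 'John'), (5, 'Jimmy');
    INSERT INTO Children VALUES
        (1, 'Jo', 1), (2, 'Roger', 1), (3, 'Susan', 2), (4, 'Kevin', 3),
        (5, 'Rachel', 3), (6, 'David', 4), (7, 'Sam', NULL), (8, 'Ian', NULL);
""")

# No JOIN-predicate: every parent is paired with every child.
rows = conn.execute("""
    SELECT Parents.AdultName, Children.ChildName
    FROM Parents CROSS JOIN Children
""").fetchall()
print(len(rows))  # 40, i.e. 5 parents x 8 children
```

Even unrelated pairs such as Jimmy and Sam appear, which is why an accidental CROSS JOIN on large tables can produce a huge result.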
Using JOINS to Find Unmatched records
If you study the results of the LEFT OUTER JOIN, for example, you will notice that Jimmy was returned with NULL on the Children side of the resultset. Filtering a JOIN query on that NULL is a very useful way to find records in one table that have no related records in the other.
If we type "IS NULL" in the Filter column for the Children table's AdultID column in the LEFT OUTER JOIN example, we end up with the following diagram and SQL:
and executing this returns Jimmy on his own, as the only record in the Parents table without a matching record in the Children table:
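A minimal SQLite sketch of the same "unmatched records" query (hypothetical IDs, orphans assumed to have a NULL AdultID, query hand-written). The filter works because a matched row always has a non-NULL Children.AdultID: it equals Parents.AdultID.

```python
import sqlite3

# Hypothetical IDs; the orphans Sam and Ian are assumed to have a NULL AdultID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parents (AdultID INTEGER PRIMARY KEY, AdultName TEXT);
    CREATE TABLE Children (ChildID INTEGER PRIMARY KEY, ChildName TEXT,
                           AdultID INTEGER REFERENCES Parents (AdultID));
    INSERT INTO Parents VALUES
        (1, 'Peter'), (2, 'Mary'), (3, 'Alice'), (4, 'John'), (5, 'Jimmy');
    INSERT INTO Children VALUES
        (1, 'Jo', 1), (2, 'Roger', 1), (3, 'Susan', 2), (4, 'Kevin', 3),
        (5, 'Rachel', 3), (6, 'David', 4), (7, 'Sam', NULL), (8, 'Ian', NULL);
""")

# LEFT OUTER JOIN keeps every parent; IS NULL on the Children side
# then keeps only the parents that found no child.
rows = conn.execute("""
    SELECT Parents.AdultName, Children.ChildName
    FROM Parents LEFT OUTER JOIN Children ON Parents.AdultID = Children.AdultID
    WHERE Children.AdultID IS NULL
""").fetchall()
print(rows)  # [('Jimmy', None)]
```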
Conversely, doing the same thing to AdultID in the Parents table in the RIGHT OUTER JOIN example:
leads to this SQL and diagram:
which, when executed, returns both orphans, because again, they have no matching record in the Parents table:
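A minimal SQLite sketch of finding the orphans (hypothetical IDs, orphans assumed to have a NULL AdultID). As in the RIGHT OUTER JOIN sketch, the RIGHT JOIN is written as the equivalent LEFT OUTER JOIN with the tables swapped, since SQLite only supports RIGHT JOIN from 3.39.0; SQL Server accepts the RIGHT JOIN form directly.

```python
import sqlite3

# Hypothetical IDs; the orphans Sam and Ian are assumed to have a NULL AdultID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parents (AdultID INTEGER PRIMARY KEY, AdultName TEXT);
    CREATE TABLE Children (ChildID INTEGER PRIMARY KEY, ChildName TEXT,
                           AdultID INTEGER REFERENCES Parents (AdultID));
    INSERT INTO Parents VALUES
        (1, 'Peter'), (2, 'Mary'), (3, 'Alice'), (4, 'John'), (5, 'Jimmy');
    INSERT INTO Children VALUES
        (1, 'Jo', 1), (2, 'Roger', 1), (3, 'Susan', 2), (4, 'Kevin', 3),
        (5, 'Rachel', 3), (6, 'David', 4), (7, 'Sam', NULL), (8, 'Ian', NULL);
""")

# Every child is kept; IS NULL on the Parents side keeps only the
# children for which no parent row matched.
rows = conn.execute("""
    SELECT Parents.AdultName, Children.ChildName
    FROM Children LEFT OUTER JOIN Parents ON Parents.AdultID = Children.AdultID
    WHERE Parents.AdultID IS NULL
    ORDER BY Children.ChildID
""").fetchall()
print(rows)  # [(None, 'Sam'), (None, 'Ian')]
```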
From the questions posted to newsgroups and forums, it seems to me that plenty of people are either unaware of the existence of the Query Designer, or don't use it because they consider it some kind of cheating. For the former group, hopefully this article will show you something new. For the latter group, there is nothing wrong with using the tools available to you. It speeds up development and reduces your chance of errors.
Date Created: 28/11/2007 13:19:01
Date Last Amended: 30/11/2007 09:15:49
Posted By: Mikesdotnetting