Combining Information
Terms defined: cross join, full outer join, join, join condition, left outer join, right outer join
select *
from work cross join job;
| person | job | name | billable |
|--------|-----------|-----------|----------|
| mik | calibrate | calibrate | 1.5 |
| mik | calibrate | clean | 0.5 |
| mik | clean | calibrate | 1.5 |
| mik | clean | clean | 0.5 |
| mik | complain | calibrate | 1.5 |
| mik | complain | clean | 0.5 |
| po | clean | calibrate | 1.5 |
| po | clean | clean | 0.5 |
| po | complain | calibrate | 1.5 |
| po | complain | clean | 0.5 |
| tay | complain | calibrate | 1.5 |
| tay | complain | clean | 0.5 |
- A join combines information from two tables
- cross join constructs their cross product
- All combinations of rows from each
- Result isn't particularly useful:
jobandnamevalues don't match- I.e., the combined data has records whose parts have nothing to do with each other
Inner Join
select *
from work inner join job
on work.job = job.name;
| person | job | name | billable |
|--------|-----------|-----------|----------|
| mik | calibrate | calibrate | 1.5 |
| mik | clean | clean | 0.5 |
| po | clean | clean | 0.5 |
- Use
table.columnnotation to specify columns- A column can have the same name as a table
- Use
on conditionto specify join condition - Since
complaindoesn't appear injob.name, none of those rows are kept
Exercise
Re-run the query shown above using where job = name instead of the full table.name notation.
Is the shortened form easier or harder to read
and more or less likely to cause errors?
Aggregating Joined Data
select
work.person,
sum(job.billable) as pay
from work inner join job
on work.job = job.name
group by work.person;
| person | pay |
|--------|-----|
| mik | 2.0 |
| po | 0.5 |
- Combines ideas we've seen before
- But Tay is missing from the table
- No records in the
jobtable withtayas name - So no records to be grouped and summed
- No records in the
Left Join
select *
from work left join job
on work.job = job.name;
| person | job | name | billable |
|--------|-----------|-----------|----------|
| mik | calibrate | calibrate | 1.5 |
| mik | clean | clean | 0.5 |
| mik | complain | | |
| po | clean | clean | 0.5 |
| po | complain | | |
| tay | complain | | |
- A left outer join keeps all rows from the left table
- Fills missing values from right table with null
Aggregating Left Joins
select
work.person,
sum(job.billable) as pay
from work left join job
on work.job = job.name
group by work.person;
| person | pay |
|--------|-----|
| mik | 2.0 |
| po | 0.5 |
| tay | |
- That's better, but we'd like to see 0 rather than a blank
Coalescing Values
select
work.person,
coalesce(sum(job.billable), 0.0) as pay
from work left join job
on work.job = job.name
group by work.person;
| person | pay |
|--------|-----|
| mik | 2.0 |
| po | 0.5 |
| tay | 0.0 |
coalesce(val1, val2, …)returns first non-null value
Full Outer Join
- Full outer join is the union of left outer join and right outer join
- Almost the same as cross join, but consider:
create table size (
s text not null
);
insert into size values ('light'), ('heavy');
create table weight (
w text not null
);
select * from size full outer join weight;
| s | w |
|-------|---|
| light | |
| heavy | |
- A cross join would produce empty result
Exercise
Find the least time each person spent on any job.
Your output should show that mik and po each spent 0.5 hours on some job.
Can you find a way to show the name of the job as well
using the SQL you have seen so far?