Aggregating and Grouping

Terms defined: aggregation, aggregation function, group

See also "Essential SQL Techniques"

select sum(body_mass_g) as total_mass
from penguins;
| total_mass |
|------------|
| 1437000.0  |

Common Aggregation Functions

select
    max(bill_length_mm) as longest_bill,
    min(flipper_length_mm) as shortest_flipper,
    avg(bill_length_mm) / avg(bill_depth_mm) as weird_ratio
from penguins;
| longest_bill | shortest_flipper |   weird_ratio    |
|--------------|------------------|------------------|
| 59.6         | 172.0            | 2.56087082530644 |

Exercise

What is the average body mass of penguins that weigh more than 3000.0 grams?

Counting

select
    count(*) as count_star,
    count(sex) as count_specific,
    count(distinct sex) as count_distinct
from penguins;
| count_star | count_specific | count_distinct |
|------------|----------------|----------------|
| 344        | 333            | 2              |

Exercise

How many different body masses are in the penguins dataset?

Grouping

select avg(body_mass_g) as average_mass_g
from penguins
group by sex;
|  average_mass_g  |
|------------------|
| 4005.55555555556 |
| 3862.27272727273 |
| 4545.68452380952 |

Behavior of Unaggregated Columns

select
    sex,
    avg(body_mass_g) as average_mass_g
from penguins
group by sex;
|  sex   |  average_mass_g  |
|--------|------------------|
|        | 4005.55555555556 |
| FEMALE | 3862.27272727273 |
| MALE   | 4545.68452380952 |

Arbitrary Choice in Aggregation

select
    sex,
    body_mass_g                   
from penguins
group by sex;
|  sex   | body_mass_g |
|--------|-------------|
|        |             |
| FEMALE | 3800.0      |
| MALE   | 3750.0      |

Exercise

Explain why the output of the previous query has a blank line before the rows for female and male penguins.

Write a query that shows each distinct body mass in the penguin dataset and the number of penguins that weigh that much.

Filtering Aggregated Values

select
    sex,
    avg(body_mass_g) as average_mass_g
from penguins
group by sex
having average_mass_g > 4000.0;
| sex  |  average_mass_g  |
|------|------------------|
|      | 4005.55555555556 |
| MALE | 4545.68452380952 |

Readable Output

select
    sex,
    round(avg(body_mass_g), 1) as average_mass_g
from penguins
group by sex
having average_mass_g > 4000.0;
| sex  | average_mass_g |
|------|----------------|
|      | 4005.6         |
| MALE | 4545.7         |

Filtering Aggregate Inputs

select
    sex,
    round(
        avg(body_mass_g) filter (where body_mass_g < 4000.0),
        1
    ) as average_mass_g
from penguins
group by sex;
|  sex   | average_mass_g |
|--------|----------------|
|        | 3362.5         |
| FEMALE | 3417.3         |
| MALE   | 3729.6         |

Exercise

Write a query that uses filter to calculate the average body masses of heavy penguins (those over 4500 grams) and light penguins (those under 3500 grams) simultaneously. Is it possible to do this using where instead of filter?

Check Understanding

box and arrow diagram of concepts related to aggregation in SQL
Figure 1: Aggregation Concepts