Reshaping data with PostgreSQL pivot tables
Originally posted on Medium
Image credit: Midjourney
I have been using PostgreSQL for a while and have grown fond of it over time. As a part of the Rails app I was working with, we have been logging API requests to a table. The target systems which call the API do so in bursts, and we wanted to find the frequency of these calls based on the hour of the day.
All the API calls were logged in a table with the following structure.
| id | xml_content | created_at | updated_at |
|---|---|---|---|
| 1 | <somecontent> | timestamp | timestamp |
| 2 | <somecontent> | timestamp | timestamp |
| 3 | <somecontent> | timestamp | timestamp |
| 4 | <somecontent> | timestamp | timestamp |
To better understand when we receive these requests, I wanted to segregate them by the hour for a given month. This is fairly simple, and we can write a query as below.
select (date_part('year',created_at)*100 + date_part('month',created_at)) yearmonth,
date_part('hour',created_at) hour_of_day,
count(1) no_of_requests
from call_logs
group by date_part('year',created_at)*100 + date_part('month',created_at),
date_part('hour',created_at)
order by yearmonth desc, hour_of_day
The results of the query look a bit like this.
| yearmonth | hour_of_day | no_of_requests |
|---|---|---|
| 201504 | 00 | 2500 |
| 201504 | 01 | 2200 |
| 201504 | 02 | 2123 |
| … | … | … |
| 201503 | 11 | 1282 |
| … | … | … |
Although this gives the necessary data that we need, it would be easier if the hours appeared as columns rather than rows. Our Postgres query will have to change the shape of the result set based on the number of hours returned in the query.
Postgres has an extension that provides crosstab functionality to create pivot table-like structures called tablefunc. The syntax is a bit verbose, but essentially the crosstab function has the following signature. There are other crosstab signatures, but I will leave it as an exercise for the reader to explore the other signatures.
crosstab(dataset_query::text, category_query::text)
-- returns setof records
We have already written our query to return the dataset. The dataset query should always produce rows with 3 columns. The first column takes the row key. The values in the second column become actual columns on which that table is pivoted. The values in the third column are associated with the values in the second column and appear as values under the column bearing the same value in the crosstab record set.
As this returns a set of records, we will have to typecast the returned records set as a table. If you have a lot of categories returned by the second query, it becomes a bit of a pain to describe it. In our case, since the second column is hour numbers, it can take values from 0 to 23. So we use generate_seriesfunction to generate the values.
Since our query has single quotes, we need to escape them properly. Postgres has dollar-quoted strings that let us express queries within $tag$ tags and it auto-escapes the quotes for us. This makes the query quite readable. So the resulting final query looks a bit like this.
select * from crosstab(
-- dataset query
$q$select (date_part('year',created_at)*100 +
date_part('month',created_at)) yearmonth, date_part('hour',created_at)
hour_of_day, count(1) no_of_requests
from call_logs
group by date_part('year',created_at)*100 + date_part('month',created_at),
date_part('hour',created_at) order by yearmonth desc, hour_of_day$q$,
-- category query
$q$ select * from generate_series(0,23)$q$)
as (datepart float, h00 bigint, h01 bigint, h02 bigint,
h03 bigint, h04 bigint, h05 bigint, h06 bigint,
h07 bigint, h08 bigint, h09 bigint, h10 bigint,
h11 bigint, h12 bigint, h13 bigint, h14 bigint,
h15 bigint, h16 bigint, h17 bigint, h18 bigint,
h19 bigint, h20 bigint, h21 bigint, h22 bigint,
h23 bigint);
This produces a result that looks like this.
| yearmonth | h00 | h01 | h02 | h03 | … | h22 | h23 |
|---|---|---|---|---|---|---|---|
| 201504 | 2312 | 2323 | 2323 | 2311 | … | 2223 | 1132 |
| 201503 | 2322 | 2323 | 2263 | 2311 | … | 2223 | 1132 |
This extension makes it very easy to visualize tabular data with PostgreSQL using pivot tables.