Reshaping data with PostgreSQL pivot tables

Image credit: Midjourney

I have been using PostgreSQL for a while and have grown fond of it over time. As a part of the Rails app I was working with, we have been logging API requests to a table. The target systems which call the API do so in bursts, and we wanted to find the frequency of these calls based on the hour of the day.

All the API calls were logged in a table with the following structure.

id	xml_content	created_at	updated_at
1	<somecontent>	timestamp	timestamp
2	<somecontent>	timestamp	timestamp
3	<somecontent>	timestamp	timestamp
4	<somecontent>	timestamp	timestamp

To better understand when we receive these requests, I wanted to segregate them by the hour for a given month. This is fairly simple, and we can write a query as below.

select (date_part('year',created_at)*100 + date_part('month',created_at)) yearmonth,
       date_part('hour',created_at) hour_of_day,
       count(1) no_of_requests
from call_logs
group by date_part('year',created_at)*100 + date_part('month',created_at),
         date_part('hour',created_at)
order by yearmonth desc, hour_of_day

The results of the query look a bit like this.

yearmonth	hour_of_day	no_of_requests
201504	00	2500
201504	01	2200
201504	02	2123
…	…	…
201503	11	1282
…	…	…

Although this gives the necessary data that we need, it would be easier if the hours appeared as columns rather than rows. Our Postgres query will have to change the shape of the result set based on the number of hours returned in the query.

Postgres has an extension that provides crosstab functionality to create pivot table-like structures called tablefunc. The syntax is a bit verbose, but essentially the crosstab function has the following signature. There are other crosstab signatures, but I will leave it as an exercise for the reader to explore the other signatures.

crosstab(dataset_query::text, category_query::text)
-- returns setof records

We have already written our query to return the dataset. The dataset query should always produce rows with 3 columns. The first column takes the row key. The values in the second column become actual columns on which that table is pivoted. The values in the third column are associated with the values in the second column and appear as values under the column bearing the same value in the crosstab record set.

As this returns a set of records, we will have to typecast the returned records set as a table. If you have a lot of categories returned by the second query, it becomes a bit of a pain to describe it. In our case, since the second column is hour numbers, it can take values from 0 to 23. So we use generate_seriesfunction to generate the values.

Since our query has single quotes, we need to escape them properly. Postgres has dollar-quoted strings that let us express queries within $tag$ tags and it auto-escapes the quotes for us. This makes the query quite readable. So the resulting final query looks a bit like this.

select * from crosstab(
  -- dataset query
  $q$select (date_part('year',created_at)*100 +
    date_part('month',created_at)) yearmonth, date_part('hour',created_at)
  hour_of_day, count(1) no_of_requests 
  from call_logs
  group by date_part('year',created_at)*100 + date_part('month',created_at),
  date_part('hour',created_at) order by yearmonth desc, hour_of_day$q$,
  -- category query
  $q$ select * from generate_series(0,23)$q$) 
as (datepart float, h00 bigint, h01 bigint, h02 bigint, 
                   h03 bigint, h04 bigint, h05 bigint, h06 bigint, 
                   h07 bigint, h08 bigint, h09 bigint, h10 bigint, 
                   h11 bigint, h12 bigint, h13 bigint, h14 bigint,
                   h15 bigint, h16 bigint, h17 bigint, h18 bigint, 
                   h19 bigint, h20 bigint, h21 bigint, h22 bigint, 
                   h23 bigint);

This produces a result that looks like this.

yearmonth	h00	h01	h02	h03	…	h22	h23
201504	2312	2323	2323	2311	…	2223	1132
201503	2322	2323	2263	2311	…	2223	1132

This extension makes it very easy to visualize tabular data with PostgreSQL using pivot tables.