How to Pivot or Unpivot Data in SQL?
1. Introduction
Different people may have different idea on "pivot data":
If you are an Excel expert, you may think it as creating an excel pivot table and dragging data tab with mouse.
If you use R, you may think it as making a long table to wide or making a wide table to long. This process can be done by pivot_wider() or pivot_longer().
If you use Python, this can be done by pd.pivot_table() function. I think they also have a function called pd.melt() but I don't use it much.
"Pivot" is only a terminology. What actually happens here is we want to change a single column of values into table's column names. Or vice versa, change a part of the table's column names into a single column.
For example, our data has a column names "day", and it can only range from "Monday" to "Sunday".
Then this is a vaild column waitting to be pivoted. All data in this "day" column can be presented as column's name, after the changing process.
We will naturally think, this column of data should be finite countable, otherwise how can they later be presented as column names? (you won't expect a table with infinite columns, will you?)
This is why only those column are string or label or categorical should be pivoted but not those numeric columns.
It is a very true idea about "pivot" and our job in this article will all base on it.
2. The problem
Suppose we have a table like below. Let's call it Products:
The other table is like below. Let's call it Results:
Two problem is:
1) How can we change Products to Results?
2) How can we change Results to Products?
3. Pivot
How can we change Products into Results? That is to say, we want to spread values in column "strore" into table's column name.
3.1 conditional summary
A pivot table is nothing miracle. It is only a few independent "group by" summarise results join together form left to right.
We can repeat this process step by step.
select product_id, sum(price) as store1 from Products where store = 'store1' group by product_id
Now we will have:
We can repeat this process for each store one by one. This is what actually a pivot program or a pivot function from some library will do under the hood.
A "group by" summarise with "where" clause is so call "conditional summarise". Another way of writting "conditional summarise" equally is using "case when" of "if".
For example, the first query, we can rewrite it as:
select product_id, sum(case store when 'store1' then price else null end) as store1 from Products group by product_id
Now we will have a same result:
This method is a little bit more useful because it can help us avoid using "where" clause for each case independently. So that we can write them in one query.
select product_id, sum(case store when 'store1' then price end) as store1, sum(case store when 'store2' then price end) as store2, sum(case store when 'store3' then price end) as store3 from Products group by product_id
As we talked at the beginning, the column can be pivoted is finite countable, so we can always manually do this conditional summary process without any worry.
3.2 pivot() function from MS SQL Server
As long as the process can be done with same logic, people will try to write down it as a function.
There is function call pivot() in MS SQL Server, it can be used like below:
select * from Products pivot(sum(price) for store in (store1, store2, store3)) pt
We will have our expecting result as well:
What pivot() actually do is just like what we will do manually. In the function it requires a for-loop at finite countable range to repeat the peocess.
# notice the for-loop in pivot() pivot(sum(price) for store in (store1, store2, store3))
This is a useful function but do not use it before you really understand what pivot is doing, otherwise you will confuse youself easily.
You can see how this short function is hidding big infomation to us.
4. Unpivot
The backward question is: how can we change Results to Products?
That is to say, we want to retrieve part of the column names(store1, store2, store3) into a single column.
4.1 union and union and union
It is tricky but after you know the process you will fell easy to reuse it in future.
In Results table, what actually stored in column "store1" is price.
We can pick "store1" column and rename it back to what they should be named. Besides, we want to show this is price from "store1" so we can manually make a new column with a dumb string.
select product_id, "store1" as store, store1 as price from Results where store1 is not null
We can repeat the same process for "store2" and "store3". As we have discussed at the beginning, a table's column number is finite so we can do it manually.
select product_id, "store1" as store, store1 as price from Results where store1 is not null union select product_id, "store2" as store, store2 as price from Results where store2 is not null union select product_id, "store3" as store, store3 as price from Results where store3 is not null
4.2 unpivot() function from MS SQL Server
There is function call unpivot() in MS SQL Server, it can be used like below:
select * from Results unpivot(price for store in (store1, store2, store3)) as t
This is all the same as we will do manually. It requires a for-loop inside, we can pick which part of columns we want to retrieve.
Unlike pivot(), we additionally need to specify a new name to store the retrieved column names. In our example, it is "price".