Adding A Summary Row For Table

 

 

Introduction

In daily analysis we will face a simple chanllenge: how to add a summary row for our table?

# A tibble: 5 × 2
  cut       avg_pri
  <ord>       <dbl>
1 Fair        4359.
2 Good        3929.
3 Very Good   3982.
4 Premium     4584.
5 Ideal       3458.

In most of my experience, I just export them to Excel and manually add a summary row. This is not too bad, because after all I still have to export them to Excel for further communication, like preparing PowerPoint or deliver them with mail.

Do we have a more easy and robust solution with R?

 

How About add_row()?

As an example, we will use diamonds dataset. If you installed and import tidyverse package you can use it directly.

library(tidyverse)

diamonds %>%
  group_by(cut) %>%
  summarise(avg_pri = mean(price))

 

# A tibble: 5 × 2
  cut       avg_pri
  <ord>       <dbl>
1 Fair        4359.
2 Good        3929.
3 Very Good   3982.
4 Premium     4584.
5 Ideal       3458.

A The first plain idea is using a function called add_row() because we want to add a row indeed. This function allows you to build tibble row by row, so that we can add a summary row as we want.

When you use add_row(), you are not able to access the original dataframe columns. Instead, you need to use dataset$columname.

# use it separately
cut_analysis <-
diamonds %>%
  group_by(cut) %>%
  summarise(avg_pri = mean(price))

cut_analysis %>%
  add_row(cut = "All", avg_pri = mean(diamonds$price))

# or, just use it in our pipline data flow
diamonds %>%
  group_by(cut) %>%
  summarise(avg_pri = mean(price)) %>%
  add_row(cut = "All", avg_pri = mean(diamonds$price)) 
  # avg_pri = mean(price) is not working

 

# A tibble: 6 × 2
  cut       avg_pri
  <chr>       <dbl>
1 Fair        4359.
2 Good        3929.
3 Very Good   3982.
4 Premium     4584.
5 Ideal       3458.
6 All         3933.

This is a little step foreward compare to "Excel method" I daily used. But still it can be frustrating when the output columns are more than handful numbered.

For example:

# using add_row() can be frustrating when the output columns are more than handful
diamonds %>%
  group_by(cut) %>%
  summarise(n = n(),
            avg_carat = mean(carat),
            sum_depth = sum(depth),
            sum_table = sum(table),
            avg_price = mean(price),
            sd_x = sd(x),
            sd_y = sd(y),
            sd_z = sd(z))

  

# A tibble: 5 × 9
  cut           n avg_carat sum_depth sum_table avg_price  sd_x  sd_y  sd_z
  <ord>     <int>     <dbl>     <dbl>     <dbl>     <dbl> <dbl> <dbl> <dbl>
1 Fair       1610     1.05    103107.    95077.     4359. 0.964 0.956 0.652
2 Good       4906     0.849   305967    287956.     3929. 1.06  1.05  0.655
3 Very Good 12082     0.806   746888.   700226.     3982. 1.10  1.10  0.730
4 Premium   13791     0.892   844901.   810167.     4584. 1.19  1.26  0.731
5 Ideal     21551     0.703  1329899.  1205814.     3458. 1.06  1.07  0.658

  

How about bind_rows()?

Another idea is to use bind_rows(). It can add rows to dataframe as well.

Not like in add_row() we have to specify columns by ourselves. bind_row() gives us a chance to copy-paste our aggregation code.

# do it separately
summary_row <-
diamonds %>%
  summarise(avg_pri = mean(price))

cut_analysis %>%
  bind_rows(summary_row)

# or, just use it as functional
bind_rows(
  diamonds %>% group_by(cut) %>% summarise(avg_pri = mean(price)),
  diamonds %>% summarise(avg_pri = mean(price))
)

 

# A tibble: 6 × 2
  cut       avg_pri
  <ord>       <dbl>
1 Fair        4359.
2 Good        3929.
3 Very Good   3982.
4 Premium     4584.
5 Ideal       3458.
6 NA          3933.

It is a great improvment becuase even when the summary columns are massive, we can use it by copying our summary code without much effort.

bind_rows(
  # grouped summary
  diamonds %>%
  group_by(cut) %>%
  summarise(n = n(),
            avg_carat = mean(carat),
            sum_depth = sum(depth),
            sum_table = sum(table),
            avg_price = mean(price),
            sd_x = sd(x),
            sd_y = sd(y),
            sd_z = sd(z)),
    
  # total summary
  # it is just copy-and-paste and delete group_by() clause
  diamonds %>%
  summarise(n = n(),
            avg_carat = mean(carat),
            sum_depth = sum(depth),
            sum_table = sum(table),
            avg_price = mean(price),
            sd_x = sd(x),
            sd_y = sd(y),
            sd_z = sd(z)),
)

  

# A tibble: 6 × 9
  cut           n avg_carat sum_depth sum_table avg_price  sd_x  sd_y  sd_z
  <ord>     <int>     <dbl>     <dbl>     <dbl>     <dbl> <dbl> <dbl> <dbl>
1 Fair       1610     1.05    103107.    95077.     4359. 0.964 0.956 0.652
2 Good       4906     0.849   305967    287956.     3929. 1.06  1.05  0.655
3 Very Good 12082     0.806   746888.   700226.     3982. 1.10  1.10  0.730
4 Premium   13791     0.892   844901.   810167.     4584. 1.19  1.26  0.731
5 Ideal     21551     0.703  1329899.  1205814.     3458. 1.06  1.07  0.658
6 NA        53940     0.798  3330763.  3099240.     3933. 1.12  1.14  0.706

  

One deeper layer problem

One more layer deeper problem is that, when we grouped more than just one variable, how can we add summary rows?

For example, how can we add summary rows to each cut group, or to each color group?

diamonds %>%
  group_by(cut, color) %>%
  summarise(avg_pri = mean(price))

 

# A tibble: 35 × 3
# Groups:   cut [5]
   cut   color avg_pri
   <ord> <ord>   <dbl>
 1 Fair  D       4291.
 2 Fair  E       3682.
 3 Fair  F       3827.
 4 Fair  G       4239.
 5 Fair  H       5136.
 6 Fair  I       4685.
 7 Fair  J       4976.
 8 Good  D       3405.
 9 Good  E       3424.
10 Good  F       3496.
# … with 25 more rows

Using bind_rows() like above can solve this problem as well. We just need to choose which group are we want to summary.

For example, I want to add summary rows about variable cut:

bind_rows(
diamonds %>%
  group_by(cut, color) %>%
  summarise(avg_pri = mean(price)),

diamonds %>%
  group_by(cut) %>%
  summarise(avg_pri = mean(price))
)

Or, I just want to add summary rows about variabe color:

bind_rows(
diamonds %>%
  group_by(cut, color) %>%
  summarise(avg_pri = mean(price)),

diamonds %>%
  group_by(color) %>%
  summarise(avg_pri = mean(price))
)

  

What if I don't want to copy-and-paste my code?

Another more refreshing method is using bind_rows() but first with a transformation to our dataset.

We can bind the dataset to itself, but with the column we want to group renamed. This is making our data redundancy but the process of output will directly contains a total summary row.

bind_rows(
  diamonds,
  diamonds %>% mutate(cut = "All")
) %>%
  group_by(cut) %>%
  summarise(avg_pri = mean(price))

  

# A tibble: 6 × 2
  cut       avg_pri
  <chr>       <dbl>
1 All         3933.
2 Fair        4359.
3 Good        3929.
4 Ideal       3458.
5 Premium     4584.
6 Very Good   3982.

There is a open source function written by camille, making a lot of works underhood. You can use it directly in the pipline data flow. This is creator's web page:

https://github.com/camille-s/camiller/blob/main/R/bind_self.R

# using bind_self() function by camille
diamonds %>%
  group_by(cut) %>%
  bind_self(group = "cut", new_value = "All") %>%
  summarise(avg_pri = mean(price))

# if you are facing more than just one variable is grouped, you
# can still use bind_self
diamonds %>%
  group_by(cut, color) %>%
  bind_self(group = "cut", new_value = "All") %>%
  summarise(avg_pri = mean(price)) 

I think this is a useful function towords this problem. However, still it can be risky, especially when your dataset is very large. Remember you are making double data redundancy and this may make your computer's memroy overflow.

 

Final thought

Sometimes I will think, add a summary row to the result table is not making our view more clear. Especially when the summary table is grouped under more than only one variable, like in section One deeper layer problem.

One solution is that we can just use two smaller tables to show a separated result, or we can use written words to describle a total result.

Regardless, knowing how to use bind_rows()/ bind_self()/ add_row() is like a tool always packed at our back pocket.

  

  

  

  

  

    

posted @ 2022-03-13 11:16  DrVonGoosewing  阅读(54)  评论(0编辑  收藏  举报