Lots of columns? Use this gt trick!
Creating 538-style captions with intuitive multi-column tables
The What
Dealing with multiple tables in gt
is a pain, whether it be with gt_two_column_layout or some messy htmltools hacking. A clever way to address this is to transform your data into a wide format before you pass it to gt
. This tutorial covers the latter and teaches you how to add what I’m calling a “538-style” caption (two captions separated by a black line at the bottom of the table).
Using these ideas, we will be creating the table below – average team performance vs. top 100 opponents over the past five seasons in men’s college basketball.
The How
For this table, we will need:
library(tidyverse)
library(cbbdata)
library(cbbplotR)
library(gt)
library(gtExtras)
library(glue)
library(magick)
The Data
Grabbing the data
For this visualization, we will be pulling data from Barttorvik using the cbbdata package.
data <- cbd_torvik_game_stats() %>%
filter(year >= 2020) %>%
left_join(cbd_torvik_ratings_archive() %>% select(opp = team, date, barthag, rank)) %>%
filter(rank <= 100) %>%
summarize(mean_score = mean(game_score),
games = n(),
.by = team) %>%
filter(games >= 25) %>%
slice_max(mean_score, n = 100) %>%
arrange(desc(mean_score)) %>%
left_join(cbd_teams() %>% select(team = common_team, logo))
Let’s break this down, we’re:
Pulling game-level data for every contest over the past five seasons
Appending the opponent’s Barttorvik rank entering the game
Filtering to include games vs. opponents inside the top 100
Averaging each team’s game score (you can think of this as game-level performance) and summing the number of contests played
Only considering teams who have played at least 25 games
Arranging by highest average score
Joining on team logos
Creating an HTML column
Next, we’re going to create two important columns: group
and formatted_score
.
group is an indicator column on which we will split our data. For our table, we want to show the top 100 performing teams – and we’re going to do so with five 20-team columns. A simple way to indicate which team will be in which column is to create a new variable where teams 1-20 will be 1, teams 21-40 will be 2, and so on (any differentiation works). We can do this with rep(1:5, each = 20). The rep function will replicate the indicated set of values, in this case the sequence of [1, 2, 3, 4, 5], and the each parameter repeats each previous element 20 times. You can think of using the general form, rep(1:number_of_columns, each = total_rows_in_each_column).
formatted_score is an HTML column which will be rendered in using gt. Per usual, I’m not going to dive too deep into inline CSS or HTML. The basic functionality of this column is to show the team’s ranking, logo, average game score, and number of games played in a single cell. Our inline CSS forces the average game score to be bold and larger than the games text. The mean_score change simply rounds the average performance and formats it to show trailing zeros.
data <- data %>%
mutate(rank = row_number(),
group = rep(1:5, each = 20),
mean_score = round(mean_score, 1),
mean_score = ifelse(nchar(mean_score) == 2, glue("{mean_score}.0"), mean_score),
formatted_score = glue(
"<div style='display: flex; align-items: center;'>
{rank}.
<img src='{logo}' alt='Logo' style='height:25px; margin-right: 8px;'>
<span style='font-weight: bold; font-size: 1.2em; color: black; vertical-align: middle;'>{mean_score}</span>
<span style='font-size: 0.8em; color: gray; vertical-align: middle;'> ({games})</span>
</div>"
))
Splitting our data
Now that we have our group and formatted_score column, we can pivot our data wider by splitting on our group number and binding the resulting lists together. The group_split function returns a list of tibbles that are separated by the associated group (i.e., the group number). The map_dfc function iterates over each list, selects columns containing “formatted_score,” and column-binds them into a single tibble – leaving us with a 20-row, 5-column data frame as intended.
data <- data %>%
group_split(group) %>%
map_dfc(~ select(.x, contains("formatted_score")))
Onto the table!
The Table
The base
The base of our table uses the gt_theme_pl function from my cbbplotR package. The fmt_markdown function renders our HTML columns, cols_label removes all column labels, and cols_align center-aligns all columns. If you run this as-is and slap on a table header and caption, you’re honestly off to a pretty good start as the formatted_score columns and the gt_theme_pl function does much of the heavy lifting.
data %>%
gt(id = "table") %>%
gt_theme_pl() %>%
fmt_markdown(contains("formatted_score")) %>%
cols_label(everything() ~ "") %>%
cols_align(columns = everything(), "center")
Adding dividers and adjusting fonts
To spice things up a bit, we will use gt_add_dividers to mimic row borders and the gt_set_font function from cbbplotR to choose a new table font. We can make our table a bit more compact by dropping the data_row.padding option to 1. By default, the table font in gt_theme_pl is a dark purple, and we can adjust this by setting our font color inside rows to black with tab_style and the cells_body location helper.
... %>%
gt_add_divider(columns = -last_col(), color = "black", weight = px(1.5), include_labels = FALSE) %>%
gt_set_font("Barlow") %>%
tab_options(data_row.padding = 1) %>%
tab_style(locations = cells_body(), cell_text(color = "black"))
Annotations
You’re going to have to bear with me here. The way that I create the 538-style captions is by adding the top-level caption line, the one above the bottom border in the final table, as a footnote and later include some necessary CSS to make it work. Perhaps there is a way to do this by just using tab_source_note, and I bet there is, but I’m assuming that is a bit messier than my solution.
Importantly, if you do want this caption design, you’re going to have to sacrifice the ability to use table footnotes. You can easily write your footnote information in this top-level caption, but you won’t see the footnote marks.
... %>%
tab_footnote(locations = cells_column_labels(), footnote = md("Opponent rank is determined by Barttorvik T-Rank on game date. Game score is game-level Barthag performance (the<br>probability that you beat an average team on a neutral floor). Total games played is next to mean score.")) %>%
tab_header(
title = "Performance against top 100 teams over the past five seasons",
subtitle = "Average T-Rank game score vs. then-top 100 opponents from 2020-2024; min. total 25 games"
) %>%
tab_source_note("Data by Barttorvik + cbbdata || Viz. + Analysis by @andreweatherman")
Necessary CSS
As promised, this table will rely on some CSS. I’m not really going to touch on anything but the footnote additions.
#table .gt_footnote adds a solid black line with width 1px below the table footnote, which we are treating as our top-level caption, and adjusts the font size.
#table .gt_footnote_marks hides the footnote marks in the column headers as they are irrelevant in this design.
#table .gt_sourcenote aligns our bottom-level caption, defined with tab_source_note, to the right of our table, mimicking the popular 538 table design.
... %>%
opt_css(
"
#table .gt_column_spanner {
border-bottom-style: none !important;
display: none !important;
}
#table .gt_subtitle {
line-height: 1.2;
padding-top: 0px;
padding-bottom: 0px;
}
#table .gt_footnote {
border-bottom-style: solid;
border-bottom-width: 1px;
border-bottom-color: #000;
font-size: 12px;
}
#table .gt_footnote_marks {
display: none !important;
}
#table .gt_sourcenote {
text-align: right;
}
#table .gt_row {
border-top-color: black;
}
"
)