Game performance tables with gt and cbbplotR!

Create a 'Four Factor' game performance table using gt and new features in cbbplotR!

Jan 05, 2024

Welcome to the third installment of Buckets & Bytes — a (hopefully) weekly series where we use R to create engaging visualizations using college basketball data. I want to preface: I hope that this code can provide a general framework for creating similar visualizations, so if college basketball isn’t your thing, you can easily adapt it to different data!

Today, we will be using my new package, {cbbplotR}, which provides {ggplot2} and {gt} extensions for visualizing college basketball team and conference logos + player headshots.

Specifically, we will be creating this neat game performance table, highlighting team-wide four factors on a game-by-game basis — and underscoring the powerful combination of {cbbdata} and {cbbplotR} in the process!

Getting the data

This visualization uses data from {cbbdata}. If you have not yet installed the package and created an account, follow the steps outlined here. As a caveat: This visualization will utilize KenPom data. To access this data, you need to authorize your KenPom account on CBBData, which can be done using this function.

If you do not have an active KenPom subscription, don’t worry! The data that we need is public and front-facing, and I will include the necessary script to pull the data so you can produce the same table.

The process needed to grab our data involves using a few joins, so to simplify the process, we will walk through each “segment” individually. The full source code, included at the bottom of this post, will be more concise.

This code relies on v0.2 of both {cbbdata} and {cbbplotR}, which were released on January 5th. Please run this chunk to load the required libraries and update both packages.

pak::pak(c('andreweatherman/cbbdata', 'andreweatherman/cbbplotR'))
pkgs <- c('cbbdata', 'cbbplotR', 'gt', 'gtExtras', 'tidyverse', 'glue')
invisible(lapply(pkgs, library, character.only = TRUE))

Data: Game Stats and NET

The most important piece of our table, evidently, is game data. We are plotting “Four Factors,” which is a group of statistics defined by Dean Oliver as integral to winning basketball games. It’s a decades-old concept that has held true against the test of time. Using {cbbdata}, we can quickly get these data on a per-game level (caveat: D-1 vs. D-1 games only).

We also want to include the score of the game, and we can use the {glue} package to build the score by concatenate two columns. We are also going to adjust our game date to be more readable.

Finally, we want to include NET rankings and quadrant boundaries. As hinted earlier, the new update to {cbbdata} includes a function to do this for us. If you are not familiar with the NET or quadrants, you can learn more here.

Since we are plotting Duke’s performance, we only need to request the Blue Devils’ game results. If you wish to plot another team (or year), remember to switch out all instances of Duke for your own team (or year). The process is the same.

factors_net <- cbd_torvik_game_stats(year = 2024, team = 'Duke') %>% 
    arrange(date) %>%
    cbd_add_net_quad() %>%
    mutate(score = glue('{pts_scored}-{pts_allowed}'),
           date = format(date, '%b. %e'))  %>% 
    select(date, result, opp, score, off_ppp, off_efg,
           off_to, off_or, off_ftr, def_ppp, def_efg, def_to,
           def_or, def_ftr, game_score, net, quad)

Data: Relative Performance

Barttorvik includes a neat stat called Game Score, which is denoted as game_score in our data above. Game score can be thought of as a composite look at how well your team played in a given game. In a nutshell, game score is a per-game Barthag rating, which — while typically viewed across the aggregate where it down-weights mismatches — is an estimation of a team’s winning chances vs. the average team on a neutral floor. Game score is judged on a [0, 100] scale and is positive (scores closer to 100 are better).

With this information, we can create a data column that says, “In this game, my team played similar to how the T-Rank #X team would be expected to play against the same opponent.” Another appropriate interpretation would be a quick view of team consistency; this can be judged by looking at both the game_score column and our new one. Obviously, game score is pretty volatile at a per-game level, but it’s still nice to look at.

To do this, we need to create a function that pulls the closest current national ranking that corresponds to that game score.

find_closest_rank <- function(scores) {
  map_int(scores, function(score) {
    differences <- abs(ratings$barthag - score / 100)
    closest_index <- which.min(differences)
    ratings$barthag_rk[closest_index]
  })
}

ratings <- cbd_torvik_ratings(year = 2024) # get current rankings

Now, we can apply it to our data.

factors_net <- factors_net %>% 
  mutate(closest_rank = find_closest_rank(game_score))

Data: KenPom Rankings

Finally, we want to include KenPom rankings in our table. If you have an active KenPom subscription, I recommend authorizing it using {cbbdata} to make the process much more streamlined. But if you do not have one, current rankings are public information, and I will include the necessary code below.

KenPom Rankings through {cbbdata}

If you have authorized your account, here is how you would grab current rankings and join them over.

current_kp <- cbd_kenpom_ratings(year = 2024) %>% 
  select(opp = team, rank = rk)

factors_net <- left_join(factors_net, current_kp, by = 'opp')

KenPom Rankings with {rvest}

If you do not have a KenPom account, you can still grab current rankings. Since KenPom is a static site and current rankings are public-facing (no paywall), we can use a combination of {rvest}, {janitor}, and {tidyverse} to retrieve and clean them.

Since not all KenPom team names natively match over to conventions found in {cbbdata}, we need to create a matching dictionary. Then, we use {rvest} to parse the KenPom index page (current data) and pull the static HTML table. html_table will natively return a list, even if only one table is found, so we use the pluck function from {purrr} to “lift” the first element from the nested list. We then use a few functions from {janitor} to clean the data and later employ our matching dictionary created at the start of the chunk.

teams <- cbd_teams() %>% select(team = common_team, kp = kp_team)

team_matching <- teams %>% pull('team') %>% 
  rlang::set_names(cbbdata::cbd_teams()$kp)

current_kp <- rvest::read_html('https://kenpom.com') %>% 
  rvest::html_table() %>% 
  pluck(1) %>% 
  janitor::row_to_names(1) %>% 
  janitor::clean_names() %>% 
  select(opp = team, rank = rk) %>% 
  mutate(opp = team_matching[opp]) %>% 
  filter(!is.na(opp))

factors_net <- left_join(factors_net, current_kp, by = 'opp')

This isn’t terribly complicated code, but you can see the evident advantage to using {cbbdata} — which also includes daily KenPom rating archives back to 2011-12 for subscribers.

Visualizing in {gt}

{cbbplotR} utility functions

Our table will use a few functions from {cbbplotR} to aid in creating our table. Here is a quick overview of each.

Adding opponent logos

Our table includes logos of each Duke opponent, which is a nice way of quickly identifying any game of interest. This process involves mashing together some HTML, but {cbbplotR} includes a function called gt_cbb_teams that will do this for us!

Our table will have some dark fill colors for wins and losses, so let’s use dark mode logos for our opponents. By default, gt_cbb_teams pulls normal logos, but you can set logo_color = “dark” to get dark ones.

We also want to include the KenPom rank of each opponent, but we need to add it after we have called gt_cbb_teams. Then, let’s create a new frame called table_data.

table_data <- factors_net %>% 
  # we want to add HTML in the opp col. and rewrite it -> so: opp, opp
  gt_cbb_teams(opp, opp, logo_color = 'dark') %>% 
  mutate(opp = glue('{opp} (#{rank})'))

Creating the title

If you notice, our table header is pretty cool. It includes Duke’s logo, which is a nice way of quickly identifying the subject of our data. Making a header like this includes tinkering with some HTML — but luckily, {cbbplotR} ships with a function that will build the header for us! More specifically, it will allow us to include a team logo, conference logo, player headshot, or a custom image by passing through an external link. As you’ll see later in the code, we will eventually need to wrap this object in HTML.

To include a logo for Duke, we set the value to Duke and the type to “team”. We can set a table title + subtitle and adjust the fonts, weights, and line-heights of both as well (which we won’t do).

gt_title <- gt_cbb_logo_title(
  title = 'Game-by-game efficiency performance for Duke in 2023',
  subtitle = 'D1 vs. D1 only. Data, rankings, and quadrants are current
  as of Jan. 5.',
  value = 'Duke',
  type = 'team'
)

Coloring win/loss rows

{cbbplotR} ships with another utility function, gt_color_results, that will take a column of game results — either W/L characters or 1/0 binaries — and fill each row relative to the game result. It’s a tidy way of replicating two tab_style calls in a single line. You can also adjust the win/loss_color (fill) and the wins/loss_text_color. By default, the font color is white, which we will keep.

Setting the table font

The final {cbbplotR} utility function is gt_set_font, which is a quick and dirty way of changing the font in all customizable parts of your table. You’ll notice that we will still use tab_style to adjust the weights of our column labels, however, and it should be noted that gt_set_font does not yet offer customization aside from changing the font family. You can think of it as a nice way to test different fonts in your table.

Building the table

Now that we have briefly explored each {cbbplotR} function, created our table header, and finalized our data, let’s throw it over to {gt}!

Okay, there’s a lot going on here. If you’re relatively new to {gt}, I really recommend that you step through each line. It might seem overwhelming, but many of the functions are intuitively named, and running the code line-by-line should help you understand what’s happening.

Optional CSS

I’m not going to walk through every function, but I did want to briefly mention the opt_css line at the end. opt_css is a way of adding CSS to your tables, which really extends table possibilities. In fact, {gt} is just converting everything to HTML — which is why we can get some neat customization with our table header.

In our table, specifically, we use this CSS to decrease the spacing between each footnote and our caption lines. To make this work, we need to set a table ID, which we do in the second line with gt(id = 'duke') and then reference that ID as a selector.

table_data %>% 
  gt(id = 'duke') %>% 
  gt_theme_538() %>% 
  fmt_markdown(opp) %>% 
  cols_move(date, opp) %>% 
  cols_move_to_end(quad) %>% 
  cols_hide(c(result, rank, net)) %>% 
  cols_align(columns = everything(), 'center') %>% 
  cols_align(columns = opp, 'left') %>% 
  cols_label(opp = 'opponent (KenPom Rk.)', off_ppp = 'PPP', 
             def_ppp = 'PPP',off_efg = 'eFG%', off_to = 'TOV%', 
             off_or = 'Reb%',off_ftr = 'FTA/FGA', def_efg = 'eFG%',
             def_to = 'TOV%', def_or = 'Reb%', def_ftr = 'FTA/FGA',
             game_score = 'Eff. Score', quad = 'NET Quad',
             closest_rank = 'Like #') %>% 
  gt_color_results() %>% 
  tab_style(locations = cells_column_labels(),
            style = cell_text(font = 'Oswald', weight = 'bold')) %>% 
  tab_style(locations = cells_title(), style =
              cell_text(font = 'Oswald')) %>% 
  tab_options(table.font.names = 'Oswald', data_row.padding = 2) %>% 
  gt_add_divider(score, include_labels = FALSE, color = 'black') %>% 
  gt_add_divider(off_ftr, include_labels = FALSE, color = 'black') %>% 
  gt_add_divider(def_ftr, include_labels = FALSE, color = 'black') %>% 
  tab_spanner(off_ppp:off_ftr, label = 'Offensive Performance',
              id = 'offense') %>% 
  tab_spanner(def_ppp:def_ftr, label = 'Defensive Performance',
              id = 'defense') %>% 
  tab_footnote(cells_column_spanners(spanner = c('offense', 'defense')),
               footnote = "Points per possession + 'Four Factors'
               (effective FG%, turnover rate, off/def rebound rate,
               and FTA per 100 FGA)") %>% 
  tab_footnote(cells_column_labels(columns = game_score),
               footnote = 'This value is used to calculate the
               proceeding column and can be viewed as a [0-100]
               composite game performance score') %>%
  tab_footnote(cells_column_labels(columns = closest_rank),
               footnote = 'This game performance is roughly
               equivalent to how #X would be expected to play in the
               same game (Barttorvik)') %>% 
  tab_header(title = html(gt_title)) %>% 
  tab_source_note(md('Data by cbbdata + cbbplotR
                     <br>Viz. + Analysis by @andreweatherman')) %>% 
  opt_css(
    '
      #duke .gt_sourcenote{
        line-height: 1.2;
        padding-top: 9px !important;
      }
      #duke .gt_footnote {
        padding-top: 7px !important;
        padding-bottom: 7px !important;
        line-height: 0.2;
      }
      '
  )

Full Code

Loading libraries

pak::pak(c('andreweatherman/cbbdata', 'andreweatherman/cbbplotR'))
pkgs <- c('cbbdata', 'cbbplotR', 'gt', 'gtExtras', 'tidyverse', 'glue')
invisible(lapply(pkgs, library, character.only = TRUE))

Finding closest rank function

find_closest_rank <- function(scores) {
  map_int(scores, function(score) {
    differences <- abs(ratings$barthag - score / 100)
    closest_index <- which.min(differences)
    ratings$barthag_rk[closest_index]
  })
}
ratings <- cbd_torvik_ratings(year = 2024)

KenPom data

If you have authorized your KenPom account on {cbbdata}.

current_kp <- cbd_kenpom_ratings(year = 2024) %>% 
  select(opp = team, rank = rk)

If you have not authorized your KenPom account.

teams <- cbd_teams() %>% select(team = common_team, kp = kp_team)

team_matching <- teams %>% pull('team') %>%
  rlang::set_names(cbbdata::cbd_teams()$kp)

current_kp <- read_html('https://kenpom.com') %>%
  html_table() %>%
  pluck(1) %>%
  row_to_names(1) %>%
  clean_names() %>%
  select(opp = team, rank = rk) %>% 
  mutate(opp = team_matching[opp]) %>%
  filter(!is.na(opp))

Table data

table_data <- cbd_torvik_game_stats(year = 2024, team = 'Duke') %>% 
  arrange(date) %>%
  cbd_add_net_quad() %>% 
  mutate(score = glue('{pts_scored}-{pts_allowed}'),
         closest_rank = find_closest_rank(game_score),
         date = format(date, '%b. %e'),
         location = NULL) %>% 
  select(date, result, opp, score, off_ppp, off_efg,
         off_to, off_or, off_ftr, def_ppp, def_efg, def_to,
         def_or, def_ftr, game_score, closest_rank, net, quad) %>% 
  left_join(current_kp, by = 'opp') %>% 
  gt_cbb_teams(opp, opp, logo_color = 'dark') %>%
  mutate(opp = glue('{opp} (#{rank})'))

Table header

gt_title <- gt_cbb_logo_title(
  title = 'Game-by-game efficiency performance for Duke in 2023',
  subtitle = 'D1 vs. D1 only. Data, rankings, and quadrants are current
  as of Jan. 5.',
  value = 'Duke',
  type = 'team'
)

Creating the table

table <- table_data %>% 
  gt(id = 'duke') %>% 
  gt_theme_538() %>% 
  fmt_markdown(opp) %>% 
  cols_move(date, opp) %>% 
  cols_move_to_end(quad) %>% 
  cols_hide(c(result, rank, net)) %>% 
  cols_align(columns = everything(), 'center') %>% 
  cols_align(columns = opp, 'left') %>% 
  cols_label(opp = 'opponent (KenPom Rk.)', off_ppp = 'PPP', 
             def_ppp = 'PPP',off_efg = 'eFG%', off_to = 'TOV%', 
             off_or = 'Reb%',off_ftr = 'FTA/FGA', def_efg = 'eFG%',
             def_to = 'TOV%', def_or = 'Reb%', def_ftr = 'FTA/FGA',
             game_score = 'Eff. Score', quad = 'NET Quad',
             closest_rank = 'Like #') %>% 
  gt_color_results() %>% 
  tab_style(locations = cells_column_labels(),
            style = cell_text(font = 'Oswald', weight = 'bold')) %>% 
  tab_style(locations = cells_title(), style =
              cell_text(font = 'Oswald')) %>% 
  tab_options(table.font.names = 'Oswald', data_row.padding = 2) %>% 
  gt_add_divider(score, include_labels = FALSE, color = 'black') %>% 
  gt_add_divider(off_ftr, include_labels = FALSE, color = 'black') %>% 
  gt_add_divider(def_ftr, include_labels = FALSE, color = 'black') %>% 
  tab_spanner(off_ppp:off_ftr, label = 'Offensive Performance',
              id = 'offense') %>% 
  tab_spanner(def_ppp:def_ftr, label = 'Defensive Performance',
              id = 'defense') %>% 
  tab_footnote(cells_column_spanners(spanner = c('offense', 'defense')),
               footnote = "Points per possession + 'Four Factors'
               (effective FG%, turnover rate, off/def rebound rate,
               and FTA per 100 FGA)") %>% 
  tab_footnote(cells_column_labels(columns = game_score),
               footnote = 'This value is used to calculate the
               proceeding column and can be viewed as a [0-100]
               composite game performance score') %>%
  tab_footnote(cells_column_labels(columns = closest_rank),
               footnote = 'This game performance is roughly
               equivalent to how #X would be expected to play in the
               same game (Barttorvik)') %>% 
  tab_header(title = html(gt_title)) %>% 
  tab_source_note(md('Data by cbbdata + cbbplotR
                     <br>Viz. + Analysis by @andreweatherman')) %>% 
  opt_css(
    '
      #duke .gt_sourcenote{
        line-height: 1.2;
        padding-top: 9px !important;
      }
      #duke .gt_footnote {
        padding-top: 7px !important;
        padding-bottom: 7px !important;
        line-height: 0.2;
      }
      '
  )

Saving the table

Sometimes, gtsave_extra can be a bit finicky. If your table is not saving, try restarting your R Session (you’ll then need to re-load the libraries).

gtsave_extra(table, 'team_performance.png')

If you found this walkthrough and code useful, please consider subscribing below and sharing the post!