Lecture 27 – Visualizing Two Numerical Variables

Data 94, Spring 2021

In [1]:
from datascience import *
import numpy as np
Table.interactive_plots()

Our first dataset today comes from Basketball Reference. It contains per-game averages of players in the 2019-2020 NBA season.

Run the cell below to load it in, select the relevant columns, and do some data cleaning.

Note: Most of the interesting data comes from the "better" players in the league, so we will only look at players who averaged more than 10 points per game in the season. This filter isn't perfect, since plenty of good players averaged fewer than 10 points per game.

In [2]:
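# Load the data, keep the columns we need, and drop players
# who attempted no three-point shots.
nba = Table.read_table('data/nba-2020.csv') \
           .select('Player', 'Pos', 'Tm', 'PTS', 'TRB', 'AST', '3PA', '3P%') \
           .where('3PA', are.not_equal_to(0))

def remove_code(name):
    # Keep only the text before the first backslash in the raw 'Player' value.
    return name[:name.index('\\')]

def get_court(pos):
    # Any position containing 'G' is labeled a guard; everything else is labeled a forward.
    if 'G' in pos:
        return 'Guard'
    else:
        return 'Forward'

# Clean the 'Player' and 'Pos' columns, then keep players who averaged more than 10 points per game.
nba = nba.with_columns('Player', nba.apply(remove_code, 'Player'),
                       'Pos', nba.apply(get_court, 'Pos')) \
         .where('PTS', are.above(10))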
In [3]:
nba
Out[3]:
Player                | Pos     | Tm  | PTS  | TRB  | AST | 3PA | 3P%
Bam Adebayo           | Forward | MIA | 15.9 | 10.2 | 5.1 | 0.2 | 0.143
LaMarcus Aldridge     | Forward | SAS | 18.9 | 7.4  | 2.4 | 3   | 0.389
Jarrett Allen         | Forward | BRK | 11.1 | 9.6  | 1.6 | 0.1 | 0
Giannis Antetokounmpo | Forward | MIL | 29.5 | 13.6 | 5.6 | 4.7 | 0.304
Carmelo Anthony       | Forward | POR | 15.4 | 6.3  | 1.5 | 3.9 | 0.385
OG Anunoby            | Forward | TOR | 10.6 | 5.3  | 1.6 | 3.3 | 0.39
D.J. Augustin         | Guard   | ORL | 10.5 | 2.1  | 4.6 | 3.5 | 0.348
Deandre Ayton         | Forward | PHO | 18.2 | 11.5 | 1.9 | 0.3 | 0.231
Marvin Bagley III     | Forward | SAC | 14.2 | 7.5  | 0.8 | 1.7 | 0.182
Lonzo Ball            | Guard   | NOP | 11.8 | 6.1  | 7   | 6.3 | 0.375

... (163 rows omitted)
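
To see what the two helper functions in the cleaning cell do, here is a minimal sketch that calls them on a few sample inputs. The raw CSV values are not shown in this notebook, so the raw name below (a player name followed by a backslash and a code) and the list of position labels are assumptions made for illustration.

```python
def remove_code(name):
    # Keep only the text before the first backslash.
    return name[:name.index('\\')]

def get_court(pos):
    # Any position containing 'G' is a guard; everything else is a forward.
    if 'G' in pos:
        return 'Guard'
    else:
        return 'Forward'

# Hypothetical raw 'Player' value: the code after the backslash is illustrative only.
raw_name = 'Bam Adebayo\\adebaba01'
print(remove_code(raw_name))  # Bam Adebayo

# Hypothetical raw 'Pos' values.
for pos in ['PG', 'SG', 'SF', 'PF', 'C']:
    print(pos, '->', get_court(pos))
```

Note that under this rule a center ('C') is labeled 'Forward', because the label 'C' does not contain the letter 'G'.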

A description of each column:

  • 'Player': name
  • 'Pos': general position (either Forward or Guard)
  • 'Tm': abbreviated team
  • 'PTS': average number of points scored per game
  • 'TRB': average number of rebounds per game (a player is credited with a rebound when they grab the ball after a missed shot)
  • 'AST': average number of assists per game (a player is credited with an assist when they pass the ball to a teammate who then scores)
  • '3PA': average number of three-point shots attempted per game (a three-point shot is one taken from behind a line that is between 22 and 24 feet from the basket)
  • '3P%': proportion of three-point shots attempted that go in
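
One way to see how '3PA' and '3P%' relate: multiplying attempts per game by the proportion that go in gives a rough estimate of three-pointers made per game. The sketch below does this in plain Python for two rows copied from the table displayed above. The function name `made_threes_per_game` is hypothetical, and the product is only an approximation, not a column in the dataset.

```python
# Two rows copied from the displayed table: (player, 3PA, 3P%).
rows = [
    ('LaMarcus Aldridge', 3.0, 0.389),
    ('Lonzo Ball', 6.3, 0.375),
]

def made_threes_per_game(attempts, proportion):
    # Attempts per game times the proportion of attempts that go in.
    return attempts * proportion

for player, attempts, proportion in rows:
    print(player, round(made_threes_per_game(attempts, proportion), 2))
```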

Review – bar charts and histograms

Bar charts

In [4]:
nba.group('Pos', np.mean).select('Pos', 'PTS mean', 'TRB mean', 'AST mean')
Out[4]:
Pos     | PTS mean | TRB mean | AST mean
Forward | 15.6297  | 6.68901  | 2.41099
Guard   | 16.7463  | 4.00244  | 4.45244
In [5]:
nba.group('Pos', np.mean).select('Pos', 'PTS mean', 'TRB mean', 'AST mean').barh('Pos')
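
For intuition about what grouping by 'Pos' and averaging does, here is a minimal plain-Python sketch of the same idea: collect the values for each group, then take the mean of each collection. It is not the datascience implementation, and the helper name `group_mean` is hypothetical. It uses only the 'Pos' and 'PTS' values of the ten rows displayed earlier, so its results will not match the means computed from the full table.

```python
from collections import defaultdict
from statistics import mean

# (Pos, PTS) pairs for the ten rows displayed in the table preview.
rows = [
    ('Forward', 15.9), ('Forward', 18.9), ('Forward', 11.1),
    ('Forward', 29.5), ('Forward', 15.4), ('Forward', 10.6),
    ('Guard', 10.5), ('Forward', 18.2), ('Forward', 14.2),
    ('Guard', 11.8),
]

def group_mean(pairs):
    # Collect the values belonging to each group label.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce each group's values to a single number: their mean.
    return {key: mean(values) for key, values in groups.items()}

pts_mean = group_mean(rows)
for pos in sorted(pts_mean):
    print(pos, round(pts_mean[pos], 3))
```

A horizontal bar chart of these results would draw one bar per position, with the bar's length equal to that group's mean.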