Lecture 1 – Introduction

Data 94, Spring 2021

This is a Jupyter notebook. In this class, we'll write all of our code in Jupyter notebooks.

Today, don't worry about how any of this works. Throughout the semester, we'll learn how each of these pieces works.

Note: The maps in this notebook will not load correctly in Safari if you're on a Mac; use Chrome.

In [1]:
!pip install googlemaps
Requirement already satisfied: googlemaps in /opt/miniconda3/lib/python3.8/site-packages (4.4.2)
Requirement already satisfied: requests<3.0,>=2.20.0 in /opt/miniconda3/lib/python3.8/site-packages (from googlemaps) (2.24.0)
Requirement already satisfied: certifi>=2017.4.17 in /opt/miniconda3/lib/python3.8/site-packages (from requests<3.0,>=2.20.0->googlemaps) (2020.6.20)
Requirement already satisfied: chardet<4,>=3.0.2 in /opt/miniconda3/lib/python3.8/site-packages (from requests<3.0,>=2.20.0->googlemaps) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/miniconda3/lib/python3.8/site-packages (from requests<3.0,>=2.20.0->googlemaps) (1.25.11)
Requirement already satisfied: idna<3,>=2.5 in /opt/miniconda3/lib/python3.8/site-packages (from requests<3.0,>=2.20.0->googlemaps) (2.10)
In [2]:
from datascience import *
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.graph_objects as go
import googlemaps

# KEY is a Google Maps API key. It's sensitive (almost like a password).
# I've removed mine in this posted version of the notebook, so the cells that call Google Maps won't run for you.
KEY = 'xxx'
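One common way to keep a key out of a notebook you share is to read it from an environment variable instead of typing it into a cell. Below is a minimal sketch, assuming a variable named `GOOGLE_MAPS_API_KEY` (a hypothetical name; any name works as long as you set it before starting Jupyter). The helper `load_api_key` is also hypothetical, not part of any library.

```python
import os


def load_api_key(var_name="GOOGLE_MAPS_API_KEY"):
    """Return the API key stored in the environment variable `var_name`.

    GOOGLE_MAPS_API_KEY is a hypothetical default name chosen for this sketch.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending an empty key to Google.
        raise RuntimeError(f"Please set the {var_name} environment variable.")
    return key


# Usage, once the variable is set in the shell that launches Jupyter:
# KEY = load_api_key()
```

The rest of the notebook would stay the same, since it only ever refers to `KEY`.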

California universities

Here, we'll load in data about all public universities in California. The data comes from this Wikipedia article.

In [3]:
uni = Table.read_table('data/california_universities.csv')

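# Convert two text columns to integers: remove the thousands separators
# from 'Enrollment' (e.g. '42,519' -> 42519) and any '*' markers from 'Founded'.
uni = uni.with_columns(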
    'Enrollment', uni.apply(lambda s: int(s.replace(',', '')), 'Enrollment'),
    'Founded', uni.apply(lambda s: int(s.replace('*', '')), 'Founded')
)
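The two `lambda` functions above turn text such as `'42,519'` into the integer `42519`, and strip `*` markers from founding years before converting them. Here is the same cleaning in plain Python. The raw CSV isn't shown in this notebook, so the sample strings (in particular where a `*` appears) are hypothetical.

```python
def clean_enrollment(s):
    """Remove thousands separators, then convert to int: '42,519' -> 42519."""
    return int(s.replace(',', ''))


def clean_founded(s):
    """Remove any '*' markers, then convert to int: '1857*' -> 1857."""
    return int(s.replace('*', ''))


# Hypothetical raw values, written the way the CSV might store them.
raw_enrollments = ['42,519', '8,544', '1,017']
raw_years = ['1869', '1857*', '2005']

print([clean_enrollment(s) for s in raw_enrollments])  # [42519, 8544, 1017]
print([clean_founded(s) for s in raw_years])           # [1869, 1857, 2005]
```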

Data is often stored in tables. In about a month, we'll become very, very familiar with how tables work. But for now, let's just observe.

In [4]:
uni.show(15)
Name | City | County | Enrollment | Founded
University of California, Berkeley | Berkeley | Alameda | 42519 | 1869
University of California, Davis | Davis | Yolo | 39152 | 1905
University of California, Irvine | Irvine | Orange | 35220 | 1965
University of California, Los Angeles | Los Angeles | Los Angeles | 45428 | 1882
University of California, Merced | Merced | Merced | 8544 | 2005
University of California, Riverside | Riverside | Riverside | 23278 | 1954
University of California, San Diego | San Diego | San Diego | 38798 | 1960
University of California, Santa Barbara | Santa Barbara | Santa Barbara | 24346 | 1891
University of California, Santa Cruz | Santa Cruz | Santa Cruz | 19700 | 1965
California State University Maritime Academy | Vallejo | Solano | 1017 | 1929
California Polytechnic State University | San Luis Obispo | San Luis Obispo | 21812 | 1901
California State Polytechnic University, Pomona | Pomona | Los Angeles | 26443 | 1938
California State University, Bakersfield | Bakersfield | Kern | 10493 | 1965
California State University Channel Islands | Camarillo | Ventura | 7095 | 2002
California State University, Chico | Chico | Butte | 17488 | 1887

... (17 rows omitted)

Let's start asking questions.

What's the largest public university in California?

In [5]:
uni.sort('Enrollment', descending = True)
Out[5]:
Name | City | County | Enrollment | Founded
University of California, Los Angeles | Los Angeles | Los Angeles | 45428 | 1882
University of California, Berkeley | Berkeley | Alameda | 42519 | 1869
California State University, Fullerton | Fullerton | Orange | 39774 | 1957
University of California, Davis | Davis | Yolo | 39152 | 1905
University of California, San Diego | San Diego | San Diego | 38798 | 1960
California State University, Northridge | Northridge | Los Angeles | 38716 | 1958
California State University, Long Beach | Long Beach | Los Angeles | 36846 | 1949
University of California, Irvine | Irvine | Orange | 35220 | 1965
San Diego State University | San Diego | San Diego | 34881 | 1897
San Jose State University | San Jose | Santa Clara | 32828 | 1857

... (22 rows omitted)
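`sort('Enrollment', descending = True)` reorders the rows from largest to smallest enrollment, so the first row answers the question. The same idea in plain Python is `sorted` with `reverse=True`. This sketch uses a handful of rows copied from the table above, stored as (name, enrollment) pairs.

```python
rows = [
    ('University of California, Berkeley', 42519),
    ('University of California, Merced', 8544),
    ('University of California, Los Angeles', 45428),
    ('California State University, Fullerton', 39774),
]

# Sort by enrollment (position 1 of each pair), largest first.
by_enrollment = sorted(rows, key=lambda row: row[1], reverse=True)
print(by_enrollment[0])  # ('University of California, Los Angeles', 45428)
```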

In [6]:
uni.sort('Enrollment', descending = True).barh('Name', 'Enrollment')

What's the oldest public university in California? 🤔

In [7]:
uni.sort('Founded')
Out[7]:
Name | City | County | Enrollment | Founded
San Jose State University | San Jose | Santa Clara | 32828 | 1857
University of California, Berkeley | Berkeley | Alameda | 42519 | 1869
University of California, Los Angeles | Los Angeles | Los Angeles | 45428 | 1882
California State University, Chico | Chico | Butte | 17488 | 1887
University of California, Santa Barbara | Santa Barbara | Santa Barbara | 24346 | 1891
San Diego State University | San Diego | San Diego | 34881 | 1897
San Francisco State University | San Francisco | San Francisco | 29586 | 1899
California Polytechnic State University | San Luis Obispo | San Luis Obispo | 21812 | 1901
University of California, Davis | Davis | Yolo | 39152 | 1905
California State University, Fresno | Fresno | Fresno | 24995 | 1911

... (22 rows omitted)
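Sorting by `'Founded'` in increasing order puts the oldest university first. If only that single row is needed, Python's built-in `min` (or `max`) with a `key` finds it without sorting everything. A sketch using a few (name, founded) pairs from the table:

```python
rows = [
    ('University of California, Berkeley', 1869),
    ('San Jose State University', 1857),
    ('University of California, Los Angeles', 1882),
    ('University of California, Merced', 2005),
]

oldest = min(rows, key=lambda row: row[1])  # smallest founding year
newest = max(rows, key=lambda row: row[1])  # largest founding year
print(oldest)  # ('San Jose State University', 1857)
print(newest)  # ('University of California, Merced', 2005)
```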

In [8]:
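# Sort by founding year, then number the rows 1, 2, ..., n. Row k is the k-th university founded,
# so 'Total Universities' is the running count of universities as of each founding year.
uni_copy = uni.sort('Founded').with_columns('Total Universities', np.arange(1, uni.num_rows + 1))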
uni_copy.plot('Founded', 'Total Universities')
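After sorting by `'Founded'`, numbering the rows 1 through n (which is what `np.arange(1, uni.num_rows + 1)` does) gives the running total of universities at each founding year. A plain-Python sketch, where `enumerate(..., start=1)` plays the role of `np.arange`. The years come from the table, and include two universities founded in 1965 (Irvine and Santa Cruz).

```python
founded = [1869, 1965, 1857, 1965]  # founding years, unsorted

years_sorted = sorted(founded)
# Pair each year with its position in the sorted order, counting from 1.
running_total = [(year, k) for k, year in enumerate(years_sorted, start=1)]
print(running_total)  # [(1857, 1), (1869, 2), (1965, 3), (1965, 4)]
```

When several universities share a founding year, each still gets its own row, so the plotted line rises by more than one within that year.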

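Let's add some spice: an interactive version of the same plot, made with Plotly. Hover over a point to see which university it is.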

In [9]:
fig = go.Figure()

fig.add_trace(
    go.Scatter(x = uni_copy.column('Founded'), 
               y = uni_copy.column('Total Universities'), 
               hovertext = uni_copy.column('Name'),
               mode = 'markers',
              )
)

fig.add_trace(
    go.Scatter(x = uni_copy.column('Founded'), 
               y = uni_copy.column('Total Universities'),
               line = dict(color = 'blue'),
              )
)

fig.update_layout(title = 'Total Number of Public Universities in California by Year',
                  xaxis_title = 'Year',
                  yaxis_title = 'Total Universities')

fig.show()

Where are the public universities in California located?

In [10]:
uni
Out[10]:
Name | City | County | Enrollment | Founded
University of California, Berkeley | Berkeley | Alameda | 42519 | 1869
University of California, Davis | Davis | Yolo | 39152 | 1905
University of California, Irvine | Irvine | Orange | 35220 | 1965
University of California, Los Angeles | Los Angeles | Los Angeles | 45428 | 1882
University of California, Merced | Merced | Merced | 8544 | 2005
University of California, Riverside | Riverside | Riverside | 23278 | 1954
University of California, San Diego | San Diego | San Diego | 38798 | 1960
University of California, Santa Barbara | Santa Barbara | Santa Barbara | 24346 | 1891
University of California, Santa Cruz | Santa Cruz | Santa Cruz | 19700 | 1965
California State University Maritime Academy | Vallejo | Solano | 1017 | 1929

... (22 rows omitted)

What if we want to plot these on a map?

It turns out that we can use Google Maps to do it, using code! The googlemaps library can look up a place's latitude and longitude from its name.

In [11]:
gmaps = googlemaps.Client(key = KEY)

Let's try a location that we all know and love...

In [12]:
geocode_result = gmaps.geocode('taco bell southside berkeley')
geocode_result
Out[12]:
[{'address_components': [{'long_name': '2528',
    'short_name': '2528',
    'types': ['street_number']},
   {'long_name': 'Durant Avenue',
    'short_name': 'Durant Ave',
    'types': ['route']},
   {'long_name': 'Southside',
    'short_name': 'Southside',
    'types': ['neighborhood', 'political']},
   {'long_name': 'Berkeley',
    'short_name': 'Berkeley',
    'types': ['locality', 'political']},
   {'long_name': 'Alameda County',
    'short_name': 'Alameda County',
    'types': ['administrative_area_level_2', 'political']},
   {'long_name': 'California',
    'short_name': 'CA',
    'types': ['administrative_area_level_1', 'political']},
   {'long_name': 'United States',
    'short_name': 'US',
    'types': ['country', 'political']},
   {'long_name': '94704', 'short_name': '94704', 'types': ['postal_code']}],
  'formatted_address': '2528 Durant Ave, Berkeley, CA 94704, USA',
  'geometry': {'location': {'lat': 37.8678161, 'lng': -122.2576611},
   'location_type': 'ROOFTOP',
   'viewport': {'northeast': {'lat': 37.86916508029149,
     'lng': -122.2563121197085},
    'southwest': {'lat': 37.8664671197085, 'lng': -122.2590100802915}}},
  'place_id': 'ChIJ743dES98hYARPpAuXjAmG64',
  'plus_code': {'compound_code': 'VP9R+4W Berkeley, CA, USA',
   'global_code': '849VVP9R+4W'},
  'types': ['establishment',
   'food',
   'meal_takeaway',
   'point_of_interest',
   'restaurant']}]
In [13]:
def school_to_lat_lon(school):
    # geocode returns a list of matches; keep the first one.
    res = gmaps.geocode(school)[0]
    lat = res['geometry']['location']['lat']
    lng = res['geometry']['location']['lng']
    return lat, lng
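`gmaps.geocode` returns a *list* of matches, each a nested dictionary shaped like the Taco Bell output above. That is why `school_to_lat_lon` takes element `[0]` and then drills into `['geometry']['location']`. If a query matches nothing, the list is empty and `[0]` raises an `IndexError`. Below is a sketch of a more defensive variant that works on an already-fetched result list, so it runs without an API key. The function name `lat_lon_from_results` is hypothetical.

```python
def lat_lon_from_results(results):
    """Return (lat, lng) of the first geocoding match, or None if there are no matches.

    `results` is assumed to have the same shape as the list returned by
    gmaps.geocode: a list of dicts with ['geometry']['location'] inside.
    """
    if not results:
        return None
    location = results[0]['geometry']['location']
    return location['lat'], location['lng']


# A trimmed copy of the Taco Bell result shown above.
sample = [{'formatted_address': '2528 Durant Ave, Berkeley, CA 94704, USA',
           'geometry': {'location': {'lat': 37.8678161, 'lng': -122.2576611}}}]

print(lat_lon_from_results(sample))  # (37.8678161, -122.2576611)
print(lat_lon_from_results([]))      # None
```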

Given the latitude and longitude of a location, we can plot it on a map!

In [14]:
# This calls the Google Maps API once per school and needs a valid KEY, so it's commented out.
# lat_lon = uni.apply(school_to_lat_lon, 'Name')
In [15]:
# uni_locations = uni.with_columns(
#     'Latitude', lat_lon[:, 0],
#     'Longitude', lat_lon[:, 1]
# ).select('Latitude', 'Longitude', 'Name').relabeled('Name', 'labels')
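In the commented-out cells, `uni.apply` calls `school_to_lat_lon` once per name and collects the (lat, lng) pairs into `lat_lon`. Then `lat_lon[:, 0]` and `lat_lon[:, 1]` take the first and second entry of every pair. The final table puts latitude first, longitude second, and the name relabeled as `labels`. As we understand the `datascience` documentation, this is the layout `Marker.map_table` expects: coordinates in the first two columns and marker text in a `labels` column. Here is a plain-Python sketch of that rearrangement. `to_map_rows`, the school names, and the coordinates are all hypothetical placeholders.

```python
def to_map_rows(names, pairs):
    """Combine names with (lat, lng) pairs into (latitude, longitude, label) rows."""
    if len(names) != len(pairs):
        # zip would silently drop extra items, so check the lengths first.
        raise ValueError("names and pairs must have the same length")
    # Latitude is position 0 of each pair (like lat_lon[:, 0]),
    # longitude is position 1 (like lat_lon[:, 1]).
    rows = [(pair[0], pair[1], name) for name, pair in zip(names, pairs)]
    header = ('Latitude', 'Longitude', 'labels')
    return header, rows


# Placeholder names and coordinates, for illustration only.
names = ['School A', 'School B']
pairs = [(1.0, 2.0), (3.0, 4.0)]
header, rows = to_map_rows(names, pairs)
print(header)  # ('Latitude', 'Longitude', 'labels')
print(rows)    # [(1.0, 2.0, 'School A'), (3.0, 4.0, 'School B')]
```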
In [16]:
uni_locations = Table.read_table('data/uni_locations.csv')
In [17]:
Marker.map_table(uni_locations, 
                 marker_icon='info-sign')
Out[17]:

It would be nice if this were color-coded based on UC vs. CSU. We can do that!

In [18]:
# CSU campuses are red and UC campuses are blue. This relies on every CSU name containing 'State'.
uni_locations_separate = uni_locations.with_columns(
    'colors', uni_locations.apply(lambda s: 'red' if 'State' in s else 'blue', 'labels')
)
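The color rule is a string test on each name: CSU campus names here all contain the word "State" (for example, "San Jose State University" and "California Polytechnic State University"), and UC names don't. The same rule as a standalone function (the name `campus_color` is hypothetical), checked on a few names from the table:

```python
def campus_color(name):
    """Return 'red' for CSU campuses and 'blue' for UC campuses.

    Like the cell above, this assumes every CSU name contains 'State'
    and no UC name does.
    """
    return 'red' if 'State' in name else 'blue'


names = [
    'University of California, Berkeley',
    'San Jose State University',
    'California Polytechnic State University',
    'University of California, Merced',
]
print([campus_color(n) for n in names])  # ['blue', 'red', 'red', 'blue']
```

Because the rule depends on naming, a campus whose name lacked "State" would be colored blue by mistake. A column recording each campus's system would avoid that, if the data had one.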
In [19]:
Marker.map_table(uni_locations_separate, marker_icon='info-sign')
Out[19]: