Lecture 30 – Perception, Case Study

Data 94, Spring 2021

In [1]:
from datascience import *
import numpy as np
Table.interactive_plots()
import plotly.express as px
In [2]:
sky = Table.read_table('data/skyscrapers.csv') \
           .where('status.current', are.contained_in(['completed', 'under construction'])) \
           .select('name', 'location.city', 'location.latitude', 'location.longitude', 
                   'statistics.floors above', 'statistics.height', 'status.completed.year') \
           .relabeled(['location.city', 'location.latitude', 'location.longitude',
                       'statistics.floors above', 'statistics.height', 'status.completed.year'],
                      ['city', 'latitude', 'longitude', 'floors', 'height', 'year']) \
           .where('height', are.above(0)) \
           .where('floors', are.above(0))

sky
Out[2]:
name city latitude longitude floors height year
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019
Empire State Building New York City 40.7484 -73.9856 102 381 1931
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973

... (2145 rows omitted)

Perception

In [3]:
sky.group('city') \
   .where('count', are.above_or_equal_to(40)) \
   .sort('count', descending = True) \
   .barh('city', title = 'Number of Skyscrapers Per City')
In [4]:
# Remember, you're not responsible for the code here.
px.pie(sky.group('city').where('count', are.above_or_equal_to(40)).to_df(),
       values = 'count',
       names = 'city',
       title = 'Number of Skyscrapers Per City (Top 10 Only)'
)

Case Study – Skyscrapers

In [5]:
sky.shuffle()
Out[5]:
name city latitude longitude floors height year
Sheraton Denver Downtown Hotel Denver 39.7419 -104.989 22 66.45 1960
52 East End Avenue New York City 40.7723 -73.9471 40 125 1987
Kaleida Health GVI and UB CRTC Buffalo 42.9016 -78.866 10 57.53 2012
Opera Tower Miami 25.7924 -80.1872 56 165.5 2007
Austin Hilton Covention Center Hotel Austin 30.2654 -97.7379 31 114.91 2004
Paraiso Bayviews Miami 25.8069 -80.1874 44 152.4 2018
West Ocean Condominiums I Long Beach 33.7667 -118.197 29 105.16 2007
North Harbor Tower Chicago 41.8854 -87.6154 55 169.47 1988
One Rockefeller Plaza New York City 40.758 -73.9789 34 149 1937
Museum Tower Chicago 41.8661 -87.6213 38 124.64 2006

... (2145 rows omitted)

Which cities have the most skyscrapers?

In [6]:
sky.group('city') \
   .where('count', are.above_or_equal_to(20)) \
   .sort('count', descending = True)
Out[6]:
city count
New York City 558
Chicago 376
Miami 107
San Francisco 65
Houston 61
Seattle 60
Honolulu 56
Los Angeles 51
Las Vegas 43
Minneapolis 39

... (9 rows omitted)

In [7]:
sky.group('city') \
   .where('count', are.above_or_equal_to(20)) \
   .sort('count', descending = True) \
   .barh('city', title = 'Number of Skyscrapers Per City (Min. 20)')

Do any of the above cities stick out to you?

What is the distribution of skyscraper heights?

In [8]:
sky.column('height').min()
Out[8]:
35.970001220703004
In [9]:
sky.column('height').max()
Out[9]:
541.29998779297
In [10]:
sky.hist('height', density = False, bins = np.arange(0, 600, 25),
        title = 'Distribution of Skyscraper Heights')

Let's zoom in a little more.

In [11]:
sky.where('height', are.below(300)) \
   .hist('height', density = False, bins = np.arange(0, 310, 10),
        title = 'Distribution of Skyscraper Heights Below 300m')

What's the distribution of short vs. tall skyscrapers in each city?

In [12]:
sky
Out[12]:
name city latitude longitude floors height year
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019
Empire State Building New York City 40.7484 -73.9856 102 381 1931
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973

... (2145 rows omitted)

Let's say a skyscraper is "short" if its height is less than or equal to 150 meters; otherwise, it's "tall".

In [13]:
def height_cat(height):
    if height <= 150:
        return 'short'
    return 'tall'
In [14]:
sky.apply(height_cat, 'height')
Out[14]:
array(['tall', 'tall', 'tall', ..., 'short', 'short', 'short'],
      dtype='<U5')
In [15]:
sky = sky.with_columns('height category', sky.apply(height_cat, 'height'))
sky
Out[15]:
name city latitude longitude floors height year height category
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014 tall
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019 tall
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974 tall
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018 tall
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015 tall
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009 tall
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019 tall
Empire State Building New York City 40.7484 -73.9856 102 381 1931 tall
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009 tall
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973 tall

... (2145 rows omitted)

We can use pivot to draw a bar chart of the number of short and tall skyscrapers per city.

Quick Check 1

Fill in the blanks to create the table short_and_tall, which has two columns, 'short' and 'tall', and one row for each city with at least 5 short and 5 tall skyscrapers. The first five rows of short_and_tall are shown below.

city short tall
New York City 341 217
Chicago 268 108
Miami 58 49
Houston 34 27
San Francisco 43 22
short_and_tall = sky.pivot(__(a)__, __(b)__) \
                    .where(__(c)__, are.above_or_equal_to(5)) \
                    .where('tall', are.above_or_equal_to(5)) \
                    .sort('tall', descending = True)
In [16]:
# short_and_tall = sky.pivot(__(a)__, __(b)__) \
#                     .where(__(c)__, are.above_or_equal_to(5)) \
#                     .where('tall', are.above_or_equal_to(5)) \
#                     .sort('tall', descending = True)
In [17]:
# short_and_tall.barh('city', title = 'Number of Short and Tall Skyscrapers Per City (Min. 5 Each)')

It seems like most cities have roughly twice as many "short" skyscrapers as they do "tall" skyscrapers.

What if we want to look at the distribution of the number of floors per skyscraper, separated by height category?

In [36]:
sky.hist('floors', group = 'height category', 
         density = False,
         bins = np.arange(0, 150, 5),
         title = 'Distribution of Number of Floors Per Skyscraper')

Since there is overlap between the two histograms, we have that there are some short skyscrapers (below 150m) with more floors than some tall skyscrapers!

What's the relationship between height and number of floors?

In [19]:
sky
Out[19]:
name city latitude longitude floors height year height category
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014 tall
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019 tall
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974 tall
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018 tall
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015 tall
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009 tall
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019 tall
Empire State Building New York City 40.7484 -73.9856 102 381 1931 tall
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009 tall
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973 tall

... (2145 rows omitted)

In [20]:
sky.scatter('height', 'floors',
           s = 30,
           group = 'height category',
           title = 'Number of Floors vs. Height', 
           yaxis_title = 'Number of Floors')
In [21]:
sky.where('height', are.above(300)) \
   .scatter('height', 'floors',
            s = 50,
            labels = 'name',
            title = 'Number of Floors vs. Height (Min. 300m)')

How many skyscrapers were built per year?

In [22]:
sky
Out[22]:
name city latitude longitude floors height year height category
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014 tall
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019 tall
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974 tall
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018 tall
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015 tall
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009 tall
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019 tall
Empire State Building New York City 40.7484 -73.9856 102 381 1931 tall
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009 tall
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973 tall

... (2145 rows omitted)

In [23]:
sky.group('year')
Out[23]:
year count
0 7
1861 1
1888 1
1889 1
1892 1
1893 1
1895 2
1896 2
1897 1
1898 1

... (111 rows omitted)

This is obviously an error in our data.

In [24]:
sky.where('year', 0)
Out[24]:
name city latitude longitude floors height year height category
Miami River Miami 0 0 57 192.03 0 tall
461 Dean Street New York City 40.682 -73.9755 32 105.77 0 short
640 North Wells Street Chicago 41.8936 -87.6344 22 85.34 0 short
SkyHouse Channelside Tampa 27.9479 -82.4469 23 83.82 0 short
SkyHouse Dallas Dallas 0 0 24 80 0 short
2950 North Sheridan Road Chicago 41.9359 -87.6398 19 65.2 0 short
Anthem Blue Cross Los Angeles 34.1806 -118.599 13 57.91 0 short
In [25]:
sky.where('year', are.not_equal_to(0)) \
   .group('year') \
   .plot('year', title = 'Number of Skyscrapers Built Per Year')

What if we want to look at the number of skyscrapers per year built in different cities?

In [26]:
sky.where('city', are.contained_in(['New York City', 'Chicago'])) \
   .where('year', are.not_equal_to(0)) \
   .pivot('city', 'year')
Out[26]:
year Chicago New York City
1888 1 0
1889 1 0
1893 1 0
1895 1 1
1896 0 1
1899 1 2
1902 0 2
1903 1 2
1904 1 0
1905 0 1

... (100 rows omitted)

In [27]:
sky.where('city', are.contained_in(['New York City', 'Chicago'])) \
   .where('year', are.not_equal_to(0)) \
   .pivot('city', 'year') \
   .plot('year',
        title = 'Number of Skyscrapers Built Per Year in NYC and Chicago')

Where on a map are most skyscrapers located?

In [28]:
sky
Out[28]:
name city latitude longitude floors height year height category
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014 tall
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019 tall
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974 tall
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018 tall
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015 tall
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009 tall
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019 tall
Empire State Building New York City 40.7484 -73.9856 102 381 1931 tall
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009 tall
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973 tall

... (2145 rows omitted)

In [29]:
Circle.map_table(sky.select('latitude', 'longitude'),
                line_color = None,
                fill_opacity = 0.65,
                area = 75,
                color = 'orange')
Out[29]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Let's look at a map of tall skyscrapers in New York City.

In [30]:
ny_tall = sky.where('city', 'New York City') \
             .where('height category', 'tall') \
             .select('latitude', 'longitude', 'name', 'height') \
             .relabeled(['name', 'height'], ['labels', 'color_scale'])

ny_tall
Out[30]:
latitude longitude labels color_scale
40.7131 -74.0134 One World Trade Center 541.3
40.7664 -73.9809 Central Park Tower 541.02
40.7648 -73.9775 111 West 57th Street 438.3
40.7616 -73.9719 432 Park Avenue 425.5
40.754 -74.0008 30 Hudson Yards 386.61
40.7484 -73.9856 Empire State Building 381
40.7554 -73.9844 Bank of America Tower 365.8
40.7109 -74.0116 3 World Trade Center 328.88
40.7618 -73.9782 53 West 53rd 320.04
40.7516 -73.9753 Chrysler Building 318.9

... (207 rows omitted)

In [31]:
Circle.map_table(ny_tall,
                line_color = None,
                fill_opacity = 0.65,
                area = 150,
                color_scale = None)
Out[31]:
Make this Notebook Trusted to load map: File -> Trust Notebook

It seems like most skyscrapers in NYC are either in the financial district or in Midtown. The circles for One World Trade Center and the Empire State Building are bright.

Lastly, what if we want to look at where short and tall skyscrapers are throughout the country?

In [32]:
sky
Out[32]:
name city latitude longitude floors height year height category
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014 tall
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019 tall
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974 tall
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018 tall
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015 tall
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009 tall
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019 tall
Empire State Building New York City 40.7484 -73.9856 102 381 1931 tall
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009 tall
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973 tall

... (2145 rows omitted)

There are two solutions here.

  1. Create a function that takes in 'short' or 'tall' and returns the desired color. (We did this in Lecture 28.)
  2. Create a table with two columns, one with 'short' and 'tall' and the other with the desired colors, and join this table with sky.

We will use the second approach here.

In [33]:
sky_to_color = Table().with_columns(
    'category', np.array(['short', 'tall']),
    'colors', np.array(['orange', 'green'])
)

sky_to_color
Out[33]:
category colors
short orange
tall green
In [34]:
sky_with_colors = sky.join('height category', sky_to_color, 'category') \
                     .select('latitude', 'longitude', 'colors')

sky_with_colors
Out[34]:
latitude longitude colors
40.7624 -73.9718 orange
25.8459 -80.1197 orange
39.9516 -75.1602 orange
37.7918 -122.396 orange
44.9736 -93.2763 orange
40.7792 -73.9878 orange
25.7628 -80.1926 orange
40.7439 -73.9926 orange
33.765 -84.3881 orange
40.7612 -73.9788 orange

... (2145 rows omitted)

In [35]:
Circle.map_table(sky_with_colors,
                line_color = None,
                fill_opacity = 0.7)
Out[35]:
Make this Notebook Trusted to load map: File -> Trust Notebook

While there seem to be short skyscrapers (orange) throughout the country, tall skyscrapers generally seem to be concentrated in larger cities.