We've seen a few built-in Python functions so far.
int('-14') # Evaluates to -14
abs(-14) # Evaluates to 14
max(-14, 15) # Evaluates to 15
print('zoology') # Prints zoology, evaluates to None
We don't currently have a good way to prevent our code from getting repetitive. For example, if we want to determine whether or not different students are ready to graduate:
units_1 = 104
year_1 = 'sophomore'
ready_to_graduate_1 = (year_1 == 'senior') and (units_1 >= 120)
ready_to_graduate_1
units_2 = 121
year_2 = 'senior'
ready_to_graduate_2 = (year_2 == 'senior') and (units_2 >= 120)
ready_to_graduate_2
units_3 = 125
year_3 = 'junior'
ready_to_graduate_3 = (year_3 == 'senior') and (units_3 >= 120)
ready_to_graduate_3
Here's a better solution:
def ready_to_graduate(year, units):
    return (year == 'senior') and (units >= 120)
ready_to_graduate(year_1, units_1)
ready_to_graduate(year_2, units_2)
ready_to_graduate(year_3, units_3)
By using a function, we only had to write out the logic once, and could easily call it any number of times.
Other function examples:
# This function has one parameter, x.
# When we call the function, the value we pass in
# as an argument will replace x in the computation.
def triple(x):
    return x*3
triple(15)
triple(-1.0)
# Functions can have zero parameters!
def always_true():
    return True
# The body of a function can be
# longer than one line.
def pythagorean(a, b):
    c_squared = a**2 + b**2
    return c_squared**0.5
always_true()
# Good
def square(x):
    return x**2
# Bad
def square(x):
return x**2
def eat(zebra):
    return 'ate ' + zebra
eat('lionel')
zebra
N = 15
def half(N):
    return N/2
half(0)
half(12)
half(N)
N = 15
def addN(x):
    return x + N
addN(0)
addN(3)
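The two cells above differ in where the name N comes from. The sketch below restates them side by side, with comments on which N each function sees. It reuses the names from the lecture cells; the comments describe standard Python name lookup rather than anything specific to this notebook.

```python
N = 15  # a global name

def half(N):
    # This N is the parameter. Inside the function it hides the global N.
    return N / 2

def addN(x):
    # There is no parameter or local variable named N here,
    # so Python looks N up in the global scope.
    return x + N

print(half(12))  # 6.0: uses the argument 12, not the global 15
print(half(N))   # 7.5: the global value 15 is passed in as the argument
print(addN(3))   # 18: reads the global N
```

If N were reassigned before a later call, addN would pick up the new value, while half would still depend only on the argument it is given.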
triple(15)
triple(1/0)
triple(3, 4)
print('my', 'name', 'is', 300)
def add_and_print(a, b):
    total = a + b
    print(total)
total = add_and_print(3, 4)
total
print(total)
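The cells above show that a function which only prints evaluates to None. As a rough sketch of the contrast, here is add_and_print next to a hypothetical counterpart, add_and_return, which is not part of the lecture code and is introduced only for comparison.

```python
def add_and_print(a, b):
    total = a + b
    print(total)  # displays the value, but the function itself returns None

def add_and_return(a, b):  # hypothetical counterpart, for illustration only
    total = a + b
    return total

printed = add_and_print(3, 4)    # prints 7
returned = add_and_return(3, 4)  # prints nothing

print(printed)   # None
print(returned)  # 7
```

Only the returned value can be used in later computations, such as returned + 1. Trying printed + 1 would raise a TypeError because printed is None.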
Nothing after the return keyword is run.
def odd(n):
    return n % 2 == 1
    print('this will never be printed!')
odd(15)
total = 8
odd(2)
'isaac'.upper()
s = 'JuNiOR12'
s.upper()
s.lower()
s.replace('i', 'iii')
Let's load in the same Wikipedia countries data from this week's earlier lectures. But this time, we will write some of the data cleaning functions ourselves.
from datascience import *
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import numpy as np
data = Table.read_table('data/countries.csv')
data = data.take(np.arange(0, data.num_rows - 1))
data = data.relabeled('Country(or dependent territory)', 'Country') \
           .relabeled('% of world', '%') \
           .relabeled('Source(official or UN)', 'Source')
data = data.with_columns(
    'Country', data.apply(lambda s: s[:s.index('[')].lower() if '[' in s else s.lower(), 'Country'))
def first_letter(s):
    return s[0]
def last_letter(s):
    return s[-1]
data
Let's look at the 'Population' column.
# ignore
china_pop = data.column('Population').take(0)
china_pop
We want these numbers to be integers so that we can do arithmetic with them or plot them. Right now, however, they are strings with commas in them.
Let's write a function that takes in a string in that format and returns the corresponding integer. But first, proof that the int function doesn't work here (it doesn't like the commas):
int(china_pop)
china_pop
def clean_population_string(pop):
    no_comma = pop.replace(',', '')
    return int(no_comma)
china_pop_clean = clean_population_string(china_pop)
china_pop_clean
Cool!
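To try the same idea without loading the table, here is a minimal self-contained sketch. The sample string is a hypothetical value, not one taken from the data, and the try/except is used only to show the error message without stopping the cell.

```python
def clean_population_string(pop):
    no_comma = pop.replace(',', '')
    return int(no_comma)

sample = '1,234,567'  # hypothetical value, not taken from the table

# int cannot parse a string that contains commas, so this raises ValueError.
try:
    int(sample)
except ValueError as err:
    error_message = str(err)
    print('int failed:', error_message)

cleaned = clean_population_string(sample)
print(cleaned)      # 1234567
print(cleaned + 1)  # 1234568, arithmetic now works
```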
Using techniques we haven't yet learned, we can apply this function to every element of the 'Population' column, so that when we visualize it, things work.
# ignore
data = data.with_columns('Population', data.apply(clean_population_string, 'Population'))
data
The '%' column is also a little fishy.
china_pct = data.column('%').take(0)
china_pct
Percentages should be floats, but here they're strings.
Let's suppose we want to have the proportion of the total global population that lives in a given country as a column in our table. Proportions are decimals/fractions between 0 and 1. We can do this two ways:
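1. By writing a function, similar to clean_population_string, that correctly extracts the proportion we need from the '%' column.
2. By dividing a country's population by the sum of the 'Population' column.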
Let's do... both!
def clean_pct_string(pct):
    no_symbol = pct.replace('%', '')
    prop = float(no_symbol) / 100
    return prop
clean_pct_string(china_pct)
Nice! The other way requires adding together all of the values in the 'Population' column. We haven't covered how to do that just yet, so ignore the code for it and assume it does what it should.
total_population = data.column('Population').sum()
total_population
Assume this is the total population of the world. How would you calculate the proportion of people living in one country?
def compute_proportion(population):
    return population / total_population
china_pop_clean
compute_proportion(china_pop_clean)
Pretty close to clean_pct_string(china_pct). The difference is likely due to some countries not being included in one column or the other.
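One way to make "pretty close" concrete is to compare the two results within a tolerance. The sketch below uses made-up numbers, not values from the table. Its compute_proportion takes the total as a second argument, a hypothetical variant of the lecture's function that lets the example run on its own, and the tolerance of 0.005 is an arbitrary choice.

```python
import math

def clean_pct_string(pct):
    no_symbol = pct.replace('%', '')
    return float(no_symbol) / 100

def compute_proportion(population, total_population):
    # Hypothetical two-argument variant, so the sketch does not rely on a global.
    return population / total_population

# Hypothetical numbers, chosen only for illustration.
sample_pct = '25.0%'
sample_pop = 251
sample_total = 1000

from_pct = clean_pct_string(sample_pct)                  # 0.25
from_pop = compute_proportion(sample_pop, sample_total)  # 0.251

# The two proportions differ slightly, so compare them within a tolerance.
close_enough = math.isclose(from_pct, from_pop, abs_tol=0.005)
print(from_pct, from_pop, close_enough)
```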
Hopefully this gives you a glimpse of the power of functions!