Table of contents
- Communication 💬
- Logistics 📆
- About 🧐
- Course Structure 🍎
- Technology 🖥
- Policies ✏️
- Acknowledgements 🙏
Note: This page is still being finalized, and the information here is tentative. Also, see here for information about enrollment.
This semester, we’ll be using Ed, a new communication tool. Ed is where you will see all announcements and get help from staff and other students on assignments and concepts. You will be added to Ed automatically; email Suraj if you’re not sure how to access it.
We will not be using bCourses at all in this class; this website and Ed serve as replacements.
Lecture: Mondays, Wednesdays, and Fridays, 11AM-12PM
Lab: Fridays, 12PM-1PM
Office Hours: Throughout the week; see Ed
Lectures, labs, and office hours will be hosted on Zoom. See this post on Ed for the link. (We won’t make the Zoom link public, so that we don’t get Zoom-bombed 💣.)
From the course catalog: This course is an introduction to computational thinking and quantitative reasoning, designed to prepare students for further coursework in data science, computer science, and statistics (in particular, Foundations of Data Science, Data C8). This course emphasizes the use of computation to gain insight about quantitative problems with real data from the social sciences.
Data 94 uses the Python language to teach computation. It also uses the Jupyter Notebook environment, which is very popular in data science applications and makes it easy to get started with programming without a separate text editor and terminal.
This class serves a different purpose than several other classes that may sound similar. Specifically:
- Data 8: Data 94 does not cover nearly as much statistics and inference as Data 8. Instead, it dives deeper into Python and its applications in data science. After taking this class, you will be well-equipped to take Data 8 and focus on the inference.
- CS 10: While CS 10 is also an introductory computing class, it focuses less on Python and data science, and more on abstract ideas in computing. It is a fantastic alternative to Data 94.
- CS 61A and CS 88: While these courses also teach Python, they serve a slightly different purpose - namely, they are designed to introduce students to computer science, not to computing in data science. They cover the Python language in far greater detail than we will, but they do not cover how to work with real-world data. They are also substantially more fast-paced than this course.
If you have already taken any of these courses, Data 94 is not the right course for you. But if you haven’t – welcome! You’re in the right place 😎. What also makes Data 94 different is its small size – Data 94 will have just 30 students in Spring 2021.
The rough topic breakdown is as follows:
- Weeks 1-4: Python basics in the Jupyter notebook environment.
- Weeks 4-8: Working with real-world tabular data using datascience (the library used in Data 8).
- Weeks 9-11: Data visualization.
- Weeks 13-15: Probability and simulation. Special topics, as time permits.
Slides and code will be posted after each lecture, and they will cover everything you are required to know for the course. There is no one textbook that covers the content of this course the way we intend on covering it, though we will link supplementary readings.
Also, note that the course will emphasize the use of real-world data. Some possible datasets include
- Data from media markets in Pennsylvania and data on Congress members’ ages
- California housing prices data
- COVID cases
- Bay Area bike sharing usage data
- Vehicle fuel efficiency data
- Sports data
You will leave the course being able to independently apply the skills you’ve learned to datasets of your own choosing.
Course Structure 🍎
There will be three lectures a week. In lecture, we’ll introduce you to new ideas and concepts in programming and data science. Lecture attendance is a part of your grade; the specifics are explained in the Policies section below. However, lectures will be recorded and posted after class for you to review in the future. All lecture resources (slides, code, supplemental readings) will be linked on the course website. Note: Lecture recordings will only be accessible to students in the course; if you click the “Lecture recordings” link on the course homepage you’ll be brought to a page on Ed that only enrolled students can view.
During each lecture, there will be a few points at which we stop and ask you to answer a short question. We call these questions Quick Checks. They serve two purposes:
- For us to get a gauge of how well the class understands the material we’re currently covering
- For you to get a gauge of how well you understand the material we’re currently covering
Quick Checks are hosted on Ed using its “Lessons” feature, and will also be linked on the course website under each lecture. Quick Checks are graded on completion, not correctness. It’s not important to get these questions right on your first try – but it’s important to try them. You will be given time in lecture to answer them. If you have to miss a lecture for whatever reason, just answer that lecture’s Quick Check whenever you catch up on lecture.
Additionally, in some lecture notebooks, we will post optional practice problems. These are not required, but we recommend that you complete them.
There is also one lab section a week, which follows immediately after the Friday lecture. In lab, we’ll spend the first ~15 minutes going over some demos that are relevant to that week’s material. While there may be a notebook accompanying this demo (which we will post on the course website), there is no lab assignment. You’ll spend the remaining ~35 minutes working on that week’s homework with the help of your peers and course staff. The hope is that by participating in lab, you will be able to finish your homework more quickly.
Lab attendance is a part of your grade.
You learn data science by doing data science, not by listening or reading about it. As such, homework assignments will be your primary source of learning in this class.
Homeworks primarily consist of programming problems. You will apply the skills you learned in recent lectures to accomplish tasks involving real data. Autograder tests in your notebook will tell you whether you’re on the right track. Most homeworks will also include a few “written” problems, where you have to type your answer in text. These problems will be graded manually.
Homeworks, like all course material, can be accessed by clicking the correct link on the course website. Clicking on the “Homework 3” link, for example, will bring you to a copy of the Homework 3 notebook in your own DataHub. This is where you will work on the assignment. Once you’re done, you will run the very last cell in the assignment to generate a .zip file, which you will then upload to Gradescope so that we can grade it. This process will be walked through in lecture and in the first assignment.
There will be 11 homework assignments, which corresponds to roughly one per week. In general, homework assignments will be released on Thursday afternoon, and will be due the following Wednesday at 11:59PM. See the Policies section for our extensions and late submissions policy, as well as our homework drop policy.
Homework assignments are meant to be completed individually, but we encourage you to discuss approaches with others; see our Academic Honesty policy below. We may have a couple of group-based or presentation-based homework assignments; this is TBD.
Office Hours and Ed
In addition to lecture and lab, we will host three office hours per week. In office hours, you’ll get a chance to ask questions about and (hopefully) work with your peers on assignments. You’ll also be able to ask conceptual questions about lecture material.
While office hours are not mandatory, we highly recommend attending them regularly as they’ll very likely cut down on the time you’ll need to spend on homeworks.
Furthermore, you’re encouraged to ask and answer questions about assignments and concepts on Ed.
Quizzes and Exams
In lieu of a midterm, we will have three small quizzes held during lecture, each worth 5% of your grade (this way, one bad quiz will not significantly impact your grade). Each quiz will focus on the material that was not assessed on the previous quiz. The scheduling for these is on the course homepage; the tentative dates are
- Quiz 1: Friday, February 12th [UPDATED]
- Quiz 2: Friday, March 19th [UPDATED]
- Quiz 3: Friday, April 16th [UPDATED]
We will have a final exam during the campus-assigned slot: Tuesday May 11th, 7-10PM. Unlike the quizzes, the final exam will be cumulative.
More relevant logistics for quizzes and exams will be announced on Ed.
A note on units
The official Berkeley policy is that a 3-unit class should consist of an average of 9 hours of work per week, including class time (source). The breakdown in our class looks like this:
- 3 hours of lecture
- 1 hour of lab
- 5 hours of homework
In some weeks you will have to spend time studying for quizzes; in those weeks we will try to keep homework assignments short. We really want to make sure you don’t exceed this 9-hour mark on average; if you do, please let us know.
We will be using several websites this semester. Here’s what they’re all used for:
- Course Website: where all content will be posted.
- Ed: discussion forum where all announcements will be sent, and where all student-staff and student-student communication will occur. Also where Quick Checks are hosted and submitted.
- DataHub: where all assignments will be hosted. (You will not usually have to navigate here manually; assignment links on the course homepage bring you to the right place automatically.)
- Gradescope: where all homeworks are submitted and all grades live. (Not bCourses! 🙅)
Here’s how we will compute your grade.
| Component | Weight | Details |
|---|---|---|
| Quick Checks | 5% | no drops |
| Weekly Surveys | 5% | no drops |
| Homeworks | 50% | 11, with 1 drop (5% each) |
| Quizzes | 15% | 3, 5% each |
Starting in the second week of class, attendance will be taken. Each week, there are four class sessions – three lectures and one lab. Each class session you attend earns you 1 point.
There are a total of 52 available attendance points (4 points/week x 13 weeks, not including Week 1 or Spring Break). 39 attendance points are required for full credit, and attaining more than 39 attendance points will give you extra credit. So for instance:
- A student who earns exactly 39 points will have an attendance score of 5% * (39 / 39) = 5%.
- A student who earns 20 attendance points will have an attendance score of 5% * (20/39) = 2.56%.
- A student who earns 52 attendance points will have an attendance score of 5% * (52 / 39) = 6.67%.
This means that you can miss one class session per week on average and still receive full credit for attendance. We expect you to attend all class sessions; this policy is meant to provide leniency for the times that you’re unable to make it.
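Since this is a Python course, here is a minimal sketch of the attendance formula described above. The function name attendance_score and its parameter names are made up for illustration; only the 39-point requirement and the 5% weight come from the policy.

```python
def attendance_score(points, required=39, weight=5.0):
    """Return the attendance contribution to the overall grade, in percent.

    points   -- attendance points earned (1 per class session attended)
    required -- points needed for full credit (39, per the policy above)
    weight   -- percent of the overall grade given for full credit (5%)
    """
    # Earning more than `required` points yields more than `weight`,
    # which is how the extra credit in the policy arises.
    return weight * points / required


# The three examples from the policy above.
for points in (39, 20, 52):
    print(points, "points ->", round(attendance_score(points), 2), "%")
```

Rounded to two decimal places, this reproduces the 5%, 2.56%, and 6.67% figures listed above.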
Given the state of our universe right now, we want to check in with you each week to hear how you’re doing, both academically and personally. Furthermore, since this is a new class, we’re very interested in receiving your feedback as to how it’s going and how we can improve.
As such, we will have feedback surveys for you to fill out roughly each week. These will coincide with homework assignments, e.g. Survey 2 and Homework 2 will come out and be due at the same time. These will be hosted on Google Forms, and will be posted on both the course homepage and on Ed. They will generally not be anonymous, so that we can reach out to you if we feel the need to based on your responses. However, there will be a few points in the semester for you to provide us with anonymous feedback about the course.
There are no drops for these (so you need to do them all for full credit), but we will be lenient with their deadlines.
There will be 11 homework assignments. We will drop your lowest homework assignment score, meaning your top 10 homework assignments will be graded. This means each homework is worth 5% of your overall grade in this class.
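As an illustration only, the drop policy could be expressed in Python roughly as follows. The function homework_component is hypothetical, and it assumes each homework score is a fraction between 0 and 1; it is a sketch of the arithmetic, not the staff's actual grading code.

```python
def homework_component(scores, drops=1, weight_each=5.0):
    """Return the homework contribution to the overall grade, in percent.

    scores      -- list of homework scores, each a fraction in [0, 1]
                   (assumed format; a missing homework counts as 0.0)
    drops       -- number of lowest scores to drop (1, per the policy)
    weight_each -- percent of the overall grade per counted homework (5%)
    """
    # Sort ascending and skip the lowest `drops` scores.
    kept = sorted(scores)[drops:]
    return sum(score * weight_each for score in kept)


# A student with ten perfect homeworks and one missed homework
# still earns the full homework component, because the 0 is dropped.
example_scores = [1.0] * 10 + [0.0]
print(homework_component(example_scores))
```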
Late Policy and Extensions
Homework assignments are due to Gradescope at 11:59PM on the day that they are due, which will typically be Wednesday. We will have a small, undisclosed grace period to account for any technical difficulties; if you face any issues while submitting, please post on Ed ASAP (ideally before the deadline).
If you submit your homework late and do not have an extension (see below), we will still accept your submission, but you will lose 20% of the credit you earned for each day it is late, up to a maximum of two days. So if you scored a 90% on a homework and submitted it a day late, your score will drop to a 72%, and if you submitted it two days late, your score will drop to a 54%. We will not accept homeworks more than two days after the submission deadline.
Extensions: We know this is a stressful time, and we don’t want to penalize you because of circumstances that are out of your control. To request an extension on a homework, please email both Suraj and Isaac with the reason for your request and the number of days you’re requesting an extension for (1 or 2). As long as your request is within reason, there’s a good chance of it being granted. Students with DSP accommodations that allow for late assignment submissions will still need to email Suraj and Isaac for extensions, but do not need to give a reason.
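The late penalty arithmetic can be sketched in Python as below. The function name late_adjusted_score is hypothetical. Treating a submission more than two days late as a score of 0 is an assumption for illustration (the policy only says such submissions are not accepted), and the undisclosed grace period is ignored.

```python
def late_adjusted_score(score, days_late):
    """Return a homework score (in percent) after the late penalty.

    score     -- score earned before any penalty, in percent (e.g. 90)
    days_late -- whole number of days late (0 means on time)
    """
    if days_late < 0:
        raise ValueError("days_late cannot be negative")
    if days_late > 2:
        # Assumption: a submission that is not accepted counts as 0.
        return 0.0
    # Lose 20% of the earned credit for each day late.
    return score * (100 - 20 * days_late) / 100


# The example from the policy: a 90% submitted on time, 1 day late, 2 days late.
for days in (0, 1, 2):
    print(days, "day(s) late ->", late_adjusted_score(90, days))
```

With a score of 90, one and two days late give 72 and 54, matching the example in the policy.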
This class does not satisfy any requirements for any program (other than that it counts towards the 120 unit minimum needed to graduate). As such, you’re not taking it to get a good grade – you’re taking it to learn!
Data science is a collaborative activity. As such, we encourage you to discuss homework assignments at a high level with others, and we even give you class time to do this in lab. With that said, we ask that you write your solutions individually in your own words. Rather than copying someone else’s work, ask for help. You are not alone in this course! We’re here to help you succeed. If you invest the time to learn the material and complete the assignments, you won’t need to copy any answers (taken from 61A). If you use code you found online, please cite it in a comment.
A note on letter grades
The following is adapted from CSE 160 at the University of Washington.
Grading for this class is not curved in the sense that the average is set at (say) a B+ and half of the class must receive a grade lower than that. If everyone does well and shows mastery of the material, everyone can receive an A (this would be awesome!). If no one does well (this is unlikely), then everyone can receive a C.
Grading for this class is curved in the sense that we do not have a pre-defined mapping from homework and exam scores to a final GPA. There is no pre-determined score (e.g., 90% of all possible points) that earns an A or a B or a C or any other grade. To determine the final grade, we will ask questions like “Did this student master the material?”.
Try your best not to worry about grades, and we’ll reciprocate by being fair and lenient. We’re in this together 😎.
This class is loosely based on Data C6, taught by Ian Castro in Summer 2020 at UC Berkeley. That class was based on Data 8R, taught by Henry Milner in Summer 2017, also at UC Berkeley. Both classes were based on Data 8 at UC Berkeley.
When creating Data 94, we’ve referred to the materials of several other courses:
- Data 8, CS 10, and CS 61A at UC Berkeley
- CS 106A at Stanford
- CSE 160 at the University of Washington
The website uses Just the Class.