layout | title | nav_exclude | permalink | seo |
---|---|---|---|---|
home | CS 506 | true | index.html | |
- Join Piazza and Discord
- Create a GitHub account
- Create a Kaggle account
- Fill out this form with your GitHub and Kaggle usernames
- Install Python and Jupyter Notebook
- Sign up for Gradescope (code: NXZ56X)
The goal of this course is to give students a hands-on understanding of classical data analysis techniques and proficiency in applying them in a modern programming language (Python), while also examining the social and ethical challenges of collecting and mining data through real-world examples.
The course introduces students to a wide range of techniques that are commonly used in the analysis of data, such as clustering, classification, regression, and network analysis. Broadly speaking, the course breaks down into three main components, which we will take in order of increasing complexity: (a) unsupervised methods; (b) supervised methods; and (c) methods for structured data.
Lectures will present the fundamentals of each technique and aim to help students understand the practical settings in which these theoretical/analytical methods are useful. In class, we will also study use cases and go over relevant Python packages that will enable students to perform hands-on experiments with their data. Class discussion will, for the most part, be extended office hours, review, or extra coding exercises. However, this is not a Python course, so self-study will be necessary for those students who do not already know the language.
Students taking this class must have some prior familiarity with programming at the level of CS 105, 108, or 111, or equivalent. CS 132 or equivalent (MA 242, MA 442) is required. CS 112 is also helpful.
If emailing the CS506 staff, or creating a private Piazza post, please always CC or include the instructor, the CF, and all TAs.
BU Spark! offers students an opportunity to work on technical projects provided by companies or organizations in the Greater Boston area through our experiential learning lab (X-Lab). For this semester, Spark! has partnered with CS506 to offer a diverse selection of external data science projects scoped to support the course’s learning outcomes and enhance the student experience. To learn more about Spark!, please visit their website.
Spark! projects are a great opportunity for students to get real-world project experience to highlight on their GitHub and CV. These projects have already been curated and will be presented during “Pitch Day”. Project descriptions will be made available at the start of the semester. Each BU Spark! project will be led by one of the Spark! project managers and will be assigned a Spark! Technical Engineer to review PRs, review code, and provide technical support.
Teams will have 3-5 students. Students who take on the role of team lead will receive extra credit. Teams will be formed based on availability and a project preference form that you will be asked to submit after Pitch Day.
Please read the following guide for details regarding roles and responsibilities as well as best practices for navigating BU Spark! projects.
Expectations of Spark! project deliverables are outlined here.
Homework assignments will be due throughout the semester as the relevant material is covered. There will be 7 assignments in total. Clarifying questions (in class or on Piazza) are encouraged; otherwise, make reasonable assumptions and justify your decisions.
Late homeworks will be accepted up to 48h after the due date. Late homeworks will incur a 12% penalty for the first 24h and a 20% penalty for the next 24h. No points will be awarded after 48h. The lowest homework grade will be dropped at the end of the semester.
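As an illustration only, the late policy and the dropped lowest grade could be sketched in Python as follows. This is not the official grading script. It assumes that the penalty is a flat 12% of the earned score for submissions up to 24h late and a flat 20% for submissions between 24h and 48h late (rather than the two penalties stacking), and that homework scores are on a 0-100 scale. The function names `late_penalty`, `penalized_score`, and `homework_average` are hypothetical.

```python
def late_penalty(hours_late: float) -> float:
    """Fraction of the earned score deducted for a late submission.

    Assumption: 12% for the first 24h, 20% for the next 24h (not cumulative),
    and no points at all after 48h.
    """
    if hours_late <= 0:
        return 0.0
    if hours_late <= 24:
        return 0.12
    if hours_late <= 48:
        return 0.20
    return 1.0


def penalized_score(raw_score: float, hours_late: float) -> float:
    """Apply the late penalty to a raw homework score (assumed 0-100 scale)."""
    return raw_score * (1.0 - late_penalty(hours_late))


def homework_average(scores: list[float]) -> float:
    """Average the homework scores after dropping the single lowest one."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop the lowest")
    kept = sorted(scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # A homework scored 90 and submitted 30h late falls in the 20% bracket.
    print(round(penalized_score(90, 30), 2))
    # Seven homework scores; the lowest (60) is dropped before averaging.
    print(round(homework_average([95, 88, 60, 92, 100, 85, 90]), 2))
```

Under a different reading of the policy (for example, penalties that stack to 32% on the second day), only `late_penalty` would need to change.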
You may discuss questions but you must submit individual code. You must list your collaborators in the homework.
If you notice an issue with a grade you’ve received, you must submit a regrade request on Gradescope within 48h of receiving the grade. Regrade requests submitted after 48h will not be accepted.
The midterm will be a Kaggle Data Science competition among the students in the class with a live leaderboard. Students will need to submit predictions based on a training dataset and a report detailing the methods used and decisions made.
Every lecture is accompanied by a worksheet. You may submit worksheets up to 24h after the lecture. Worksheets are not mandatory but will contribute to extra credit. To receive credit, you need to complete them reasonably well. There will be no partial credit for worksheets. These are meant to help you develop practical skills learned in class.
Grading |
---|
20% midterm |
40% assignments |
40% final project |
5% extra credit |
Letter | Grade |
---|---|
A | 95% + |
A- | 90% - 95% |
B+ | 87% - 90% |
B | 83% - 87% |
B- | 80% - 83% |
C+ | 77% - 80% |
C | 73% - 77% |
C- | 70% - 73% |
D | 60% - 70% |
F | below 60% |
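The weights and letter cutoffs in the tables above can be combined in a short Python sketch. It is a hedged illustration, not the course's actual grade calculation. It assumes each component is scored on a 0-100 scale, that extra credit is added on top of the weighted total and capped at 5 percentage points, and that each letter's lower bound is inclusive (so exactly 95% is an A and exactly 90% is an A-). The names `WEIGHTS`, `LETTER_CUTOFFS`, `course_percentage`, and `letter_grade` are hypothetical.

```python
# Component weights from the grading table (they sum to 100%).
WEIGHTS = {"midterm": 0.20, "assignments": 0.40, "final_project": 0.40}

# Assumption: extra credit adds at most 5 percentage points to the total.
MAX_EXTRA_CREDIT = 5.0

# Lower bounds from the letter table, highest first (assumed inclusive).
LETTER_CUTOFFS = [
    (95, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (60, "D"),
]


def course_percentage(midterm: float, assignments: float,
                      final_project: float, extra_credit: float = 0.0) -> float:
    """Weighted course percentage plus capped extra credit."""
    base = (WEIGHTS["midterm"] * midterm
            + WEIGHTS["assignments"] * assignments
            + WEIGHTS["final_project"] * final_project)
    bonus = min(max(extra_credit, 0.0), MAX_EXTRA_CREDIT)
    return base + bonus


def letter_grade(percentage: float) -> str:
    """Map a course percentage to a letter using the cutoffs above."""
    for cutoff, letter in LETTER_CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    total = course_percentage(midterm=90, assignments=85, final_project=95,
                              extra_credit=2)
    print(round(total, 2), letter_grade(total))
```

Keeping the cutoffs in a single ordered list means a change to the letter table only requires editing `LETTER_CUTOFFS`.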
Extra credit can be earned by consistently:
- Attending class.
- Submitting completed worksheets.
- Asking and answering questions on Piazza.
- Submitting PRs to our class repository with code or class notes.
- Contributing to our class repository or course website (for example, by fixing typos or making clarifying edits).