Capstone Project: You and Data Science

Due: Tuesday, July 27, 2021 at 6 pm CDT

Throughout this semester, you have grown into an amazing Data Scientist! You are analyzing datasets in Python, performing advanced statistical tests, and finding the answers to complex questions using data. You have seen dozens of datasets we have provided. For the final project, we want you to teach us something about your Chicago community - we want to learn about something you are passionate about!

For this final project in Discovering Data Science, you will use Data Science to explore something you are passionate about or interested in learning more about in your Chicago community. At the end, you will write a small paper telling us about what you found and teaching us something! We only have a few minimal requirements:

  • You must use a non-trivial dataset. The dataset must have at least 200 data points (this could be 20 rows with 10 columns, 50 rows with 4 columns, etc).
  • You must do some analysis using Python, and you will turn in your code. Some analysis is required, but the kind of analysis is up to you.
  • You must submit a paper/report that summarizes what you found and teaches us about your passion/interest. The paper must be at least 1 page (single-spaced), but up to half of that page can be figures/graphs.

Full details are below. We are excited for everything we are going to learn from you! :)
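The data-point count in the dataset requirement above is simply rows times columns. Here is a minimal sketch of that check, using only Python's standard library. The sample data and the `count_data_points` helper are hypothetical, made up for illustration; with a real dataset you would open your own file instead.

```python
import csv
import io

MIN_DATA_POINTS = 200  # minimum stated in the project requirements


def count_data_points(csv_file):
    """Return (rows, columns, data_points) for a CSV with one header row."""
    reader = csv.reader(csv_file)
    header = next(reader)                    # first line holds the column names
    rows = sum(1 for row in reader if row)   # skip completely blank lines
    columns = len(header)
    return rows, columns, rows * columns


# Hypothetical stand-in for a real file: 50 rows x 4 columns of made-up values.
sample = io.StringIO(
    "park,ward,acres,visits\n"
    + "".join(f"Park {i},{i % 10},{i * 1.5},{i * 100}\n" for i in range(50))
)

rows, columns, points = count_data_points(sample)
print(f"{rows} rows x {columns} columns = {points} data points")
print("Meets the minimum:", points >= MIN_DATA_POINTS)
```

To check a file on disk, something like `with open("my_data.csv", newline="") as f: count_data_points(f)` should work, where `my_data.csv` is a placeholder name.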

Setting Up Your Project Workspace

There is no starter code for this project: you are building it from scratch! However, we do want to see your work, so keep your code, dataset, and other files together in one place so you can share or submit your project when you turn it in during Week 5.


Dataset

Our hope is that you will use a dataset you are passionate about that relates to your Chicago community. It can be anything: a dataset from another class (e.g., data you were given in Excel), a dataset you found online, or a dataset you gather yourself. Some ideas include:

  • A dataset about a hobby you’re interested in (e.g., vacation destinations, best beaches, fashion trends, Instagram, etc.)
  • A dataset about something you enjoy doing or watching (e.g., swimming, volleyball, Rocket League, Chicago Bears, etc.)
  • A dataset about a topic related to your major (economics, communications, political science, etc.)
  • Any dataset that means something to you.

The dataset must relate to Chicago. A great place to find lots of Chicago data is the Chicago Data Portal. You should be able to find plenty of datasets there, but if you want to look elsewhere for Chicago-related datasets, you can search through these other free resources that contain millions of datasets:


Project Report

The major deliverables for this project are a short paper or report on what you found and your code. We want to learn something from you about your interest/passion in the Chicago community, so tell us a story about what you discovered!

The only requirements are:

  1. Your report must be at least one page. (It can be more, use enough space to tell us what amazing things you found.)
  2. Your report must be single-spaced. (The default settings in Word or Google Docs are fine, with line spacing of up to 1.15; the real world is not double-spaced.)
  3. Your font size should not be greater than 12. (The default in most applications is 11, which works well.) Feel free to include images, diagrams, figures, etc.! The only requirement is that we want at least half a page of text in your report (you can have 3 pages of diagrams so long as there’s at least half a page of text somewhere in it all).
  4. Your audience is going to be the class. You do not need to explain Python or Data Science to us, but you should not assume we know anything about your specific interest/passion.
  5. Your report must include four sections (the whole report still only needs to be at least one page, with at least half a page of text):
    1. Introduction: Why is the dataset important to you? What do you want to discover about it?
      • Must include source of the dataset.
    2. Methods: Briefly summarize what steps you took to analyze the data. This would include processing, cleaning, modifying, grouping, using algorithms, etc. - write about what you did but keep it concise.
    3. Results: A summary of the exciting discovery you made! This is where you write about your discovery, and you can show any interesting plots here as well.
    4. Conclusion: What would you like to do next? Did this analysis inspire you to go and discover new things? Tell us about the next analysis you’d like to do! (It can be about the same topic or even a new one that this project got you thinking about!)
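As an illustration of the kind of steps a Methods section might describe (processing, cleaning, grouping, summarizing), here is a minimal sketch using only Python's standard library. The column names and values are hypothetical and made up for this example; they are not real Chicago data, and your own analysis may look quite different.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical data: made-up neighborhoods and values, for illustration only.
raw = io.StringIO(
    "neighborhood,trips\n"
    "Loop,120\n"
    "Loop,80\n"
    "Pilsen,40\n"
    "Pilsen,\n"        # missing value, dropped during cleaning
    "Hyde Park,60\n"
)

# Processing: read each row as a dictionary keyed by column name.
records = list(csv.DictReader(raw))

# Cleaning: keep rows that have a value for 'trips' and convert it to a number.
clean = [
    {"neighborhood": r["neighborhood"], "trips": float(r["trips"])}
    for r in records
    if r["trips"].strip()
]

# Grouping: collect the trip counts for each neighborhood.
groups = defaultdict(list)
for r in clean:
    groups[r["neighborhood"]].append(r["trips"])

# Summarizing: average trips per neighborhood.
averages = {name: mean(values) for name, values in groups.items()}
print(averages)
```

In a report, each commented step here would become a sentence or two in Methods, and the resulting averages (or a plot of them) would go in Results.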

Project Presentation

You need to prepare a short 2-5 minute presentation (e.g., Google Slides, Prezi, or any other presentation tool you want to use) to share your exciting discovery with your class! You don’t have to explain any of the Python or Data Science to us, but you should not assume we know anything about your specific interest/passion.

Presentations will take place during the last two days of classes (i.e., Wednesday and Thursday, July 28-29). Presentation times will be announced later.


Submission

When you are ready to submit, you will turn in three things:

  1. Dataset Source Link
  2. Code
  3. Report

You can submit or share your deliverables by any method you prefer (e.g., Google Drive link, GitHub, or other methods), but you must make a submission in Gradescope that indicates the method and includes your files or link, if applicable. If the submission fails, email your files or link to Jonas at wreger2@illinois.edu.

We can’t wait to read your project and see your presentation! :)


LinkedIn

Once you have submitted your project and received feedback, you can upload your project description and link to your LinkedIn profile! Also, make sure the link is viewable by anyone so your future employers can see it! :)


Modified from Wade Fagen-Ulmschneider & Karle Flanagan’s STAT 107 - Fall 2019/Spring 2020 guides with permission.