msartortt / project-week-6

This project forked from ta-data-lis/project-week-6

Hypothesis testing for a movie database

hypothesis-testing webscraping-data data-visualization data-cleaning data-formatting

project-week-6's Introduction

Welcome to Your Own Project!

This project is completely up to you!*

*Terms and conditions may apply. Consult your TA or lead teacher for full details of this limited offer.

Project Description

In this project, you will think of a topic and problem, collect experimental data, complete an end-to-end analysis and present the results, all by yourself.

First, choose a topic of interest to you and understand what research has already been done in that area. What are some interesting questions that remain? Can you turn those questions into a product (i.e. can you extract value out of answering those questions)?

You will then collect some data you think could help answer those questions. Choose your main source of data wisely, since in this project you have a restriction that tries to emulate a common corporate setting: you won't have access to a census of the universe of your choice. You must collect the data yourself in such a way that the universe of datapoints available to you is limited. For example, you may be limited by time (e.g. watching and categorizing YouTube videos or Instagram pictures), by cost (e.g. querying Google Maps for public transport routes via the GUI, without paying for the API access) or by access (e.g. surveying people on their preferences). In the end, you should aim at collecting between 30 and 100 observations (rows) and between 5 and 10 features (columns) per observation.
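As a quick sanity check on the size targets above, here is a minimal sketch using only the Python standard library. The function name `check_dataset_shape`, the column names and the sample rows are hypothetical; only the 30 to 100 row and 5 to 10 column targets come from the project description.

```python
import csv
import io


def check_dataset_shape(csv_text, min_rows=30, max_rows=100, min_cols=5, max_cols=10):
    """Return (n_rows, n_cols, ok) for CSV text that starts with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # count non-empty data rows
    n_cols = len(header)
    ok = min_rows <= n_rows <= max_rows and min_cols <= n_cols <= max_cols
    return n_rows, n_cols, ok


# Hypothetical example: 40 observations with 6 features each.
sample = "title,year,genre,rating,votes,runtime\n" + "\n".join(
    f"movie_{i},2000,drama,7.0,1000,90" for i in range(40)
)
print(check_dataset_shape(sample))
```

Running a check like this before starting the analysis makes it obvious whether more data collection is needed.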

Once you have your data, complete an analysis that answers your original question and/or related ancillary questions. Please make sure that the main observations you make hold up to scientific scrutiny at some level of significance. You can and should supplement your analysis with visual intuition and highlights of hypotheses that the data seems to support, even if you are not necessarily able to hold those insights to the same level of scrutiny as your main question.
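One possible way to check a main observation at a chosen significance level is a two-sided permutation test for a difference in means, sketched below with the standard library only. This is an illustration, not a technique the project prescribes: the function name `permutation_test`, the ratings and the 0.05 threshold are all assumptions, and a t-test or another test may suit your data better.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign observations to the two groups at random.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up ratings for two hypothetical groups of movies.
ratings_a = [7.1, 6.8, 7.5, 8.0, 6.9, 7.3, 7.7, 7.2]
ratings_b = [6.2, 6.5, 5.9, 6.8, 6.1, 6.4, 6.0, 6.6]

alpha = 0.05  # significance level, chosen before looking at the data
diff, p = permutation_test(ratings_a, ratings_b)
print(f"difference in means = {diff:.2f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

A permutation test makes few distributional assumptions, which can be convenient with samples as small as 30 observations.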

You can enrich your limited dataset with information from richer sources that you can obtain through any means you've learned before (e.g. you may web scrape the weights of car models if that is one of your observations).
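The enrichment step can be thought of as a left join of the extra information onto your hand-collected rows. The sketch below assumes the extra values have already been obtained (for example by scraping) and stored in a dictionary; the function name `enrich`, the field names and the values are hypothetical.

```python
def enrich(observations, lookup, key):
    """Left-join extra fields from `lookup` onto each observation.

    Observations without a match are kept unchanged, so no
    hand-collected rows are lost.
    """
    enriched = []
    for row in observations:
        extra = lookup.get(row[key], {})
        enriched.append({**row, **extra})
    return enriched


# Hypothetical hand-collected observations.
cars = [
    {"model": "A", "color": "red"},
    {"model": "B", "color": "blue"},
]
# Hypothetical values that would come from a richer source, e.g. a scrape.
weights = {"A": {"weight_kg": 1200}}

for row in enrich(cars, weights, key="model"):
    print(row)
```

Keeping unmatched rows makes missing enrichment data visible instead of silently shrinking an already small dataset.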

Like in the previous project, package your results with a product or service mindset. You will present your findings in a presentation (possibly supported by an interactive visualization) where you should evidence principles of dashboarding and storytelling.

Project Goals

  • Research, collect and analyse data on a topic of interest to you.
  • Feel free to use additional data to enrich your dataset, maybe using an API or web scraping.
  • Apply the statistical techniques we have learned, along with techniques from EDA.
  • Create useful and easily-interpretable plots.
  • Prepare a presentation keeping in mind the finer points of storytelling.
  • Communicate the results of your analysis clearly, accurately and engagingly.

Requirements

  • You must plan your project. That is why creating a Kanban or Trello Board is mandatory. You have a template for Trello here.
  • You CAN'T CODE until your project is planned.
  • Create a .gitignore file and include it in your repository.

Deliverables

  • All the scripts you used for your analysis.
  • Slides and a 5-minute presentation in the classroom.
  • Repository with your workflow + documentation + code. Even if you are working alone, you need to maintain good practices!
  • A short report including your motivation, methodology and results.

Mentoring

One of the TAs will be your mentor! Your mentor will:

  • Follow your project in general.
  • Check whether you are keeping up with your tasks, what your blockers are, etc.
  • Help/support you in specific questions.

Schedule

Monday

  • Think about a topic and propose some core questions.
  • Choose data that is relevant to your questions and devise ways of collecting such data.
  • Choose ancillary data that would allow you to achieve your stretch goals.
  • Look for documentation to give context to your project.
  • Write the README file in your repository.
  • Get approval for your project.
  • DO NOT START CODING.
  • Start collecting the data for your core questions.

NO CODE UNTIL HERE

Tuesday - Thursday morning

  • Data entry, cleaning and transformation.
  • Start the analysis. Remember all the techniques you have learned!
  • Prepare a draft of your first slides presentation (no analysis or conclusions yet): title, motivation, context, ...

Thursday afternoon

  • Rehearsal. Take the feedback and use it!
  • Finish the analysis. Finish the slides.
  • Final improvements!

Friday

  • Presentation!

Presentation

Presentations for this project will be in the classroom! Presentations will be EXACTLY 5 minutes long, with 2 additional minutes for questions. We will stop you!

Tips & Tricks

  • Organize yourself (don't get lost!).
  • Ask for help, but remember that Google is your friend.
  • Define a simple approach first. You never know how the data can betray you ;)
  • Learn about your subject and understand what other research has been done before you.
  • You can use data from the projects your partners did in the last weeks.
  • Before making a graph, think about what you want to represent.

Resources

Here are some data sources that could be interesting to you:
