Project: Statistical Analysis

Overview

The goal of this project is for you to practice statistical analysis using the iterative data analysis process. For this project, you will use the Housing Prices dataset we have chosen for you. Download the train.csv dataset, then use your statistical analysis skills to analyze it. The goal of your analysis is to identify the features of houses that most strongly affect their sale prices.

You will be working individually for this project, but we'll be guiding you along the process and helping you as you go.


Technical Requirements

The technical requirements for this project are as follows:

  • Try to apply everything you have learned so far about data analysis (in creative ways if you can) such as data cleaning, data manipulation, data visualization, and various statistical analysis methods.

  • Apply the iterative data analysis process -- setting expectations, collecting information, and reacting to data / revising expectations.

  • Conduct your analysis in Jupyter Notebook using Pandas, NumPy, SciPy, Matplotlib, Seaborn, Plotly, and other Python libraries you have learned, as necessary.
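
One round of the iterative process (set an expectation, collect information, react) might look like the minimal sketch below. The helper name `check_expectation` and its arguments are hypothetical and not part of the project brief. Pearson correlation is used only as one simple way to "collect information".

```python
import pandas as pd


def check_expectation(df: pd.DataFrame, feature: str, target: str,
                      expected_sign: int) -> dict:
    """Compare an expected direction of association with the observed one.

    expected_sign is +1 if you expect the feature to rise with the target,
    -1 if you expect it to fall. (Hypothetical helper for illustration.)
    """
    # Collect information: Pearson correlation between feature and target.
    observed = df[feature].corr(df[target])
    # React: does the observed direction agree with the expectation?
    matches = bool((observed > 0) == (expected_sign > 0))
    return {
        "feature": feature,
        "observed_corr": float(observed),
        "matches_expectation": matches,
    }
```

In a notebook you would typically load the data first with `pd.read_csv("train.csv")` and then call the helper with a numeric feature column and the sale price column. When the result contradicts your expectation, revise the expectation and look closer, for example with a scatter plot.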

Necessary Deliverables

The following deliverables should be pushed to your GitHub repo for this project.

  • A Jupyter Notebook (statistical-analysis.ipynb) containing your Python code, outputs, and data visualizations. Make sure to include explanations for each of your steps in Markdown cells or Python comments.

  • [optional] A README.md file containing any additional information.

Suggested Ways to Get Started

  1. Explore the data and understand what the fields mean.

  2. Examine the relationships between the sale price and other features in the dataset. Use data visualization techniques to help you gain an intuitive understanding of the relationships.

  3. Make an informed guess about which features should be investigated in depth.

  4. Data cleaning & manipulation. Apply the following techniques as appropriate:

    • Adjust skewed data distributions.
    • Remove columns with a high proportion of missing values.
    • Remove records with missing values.
    • Feature reduction.
    • Convert categorical data to numerical.
  5. Compute relationship scores between the fields (for example, correlation coefficients) using your chosen statistical method.

  6. Present your findings as a statistical summary and/or data visualizations.
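
The cleaning and scoring steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a prescribed solution: the target column name `SalePrice`, the helper name `clean_and_score`, and the 50% missing-value threshold are all assumptions. Pearson correlation stands in for whichever statistical method you choose.

```python
import numpy as np
import pandas as pd


def clean_and_score(df: pd.DataFrame, target: str = "SalePrice",
                    max_missing: float = 0.5) -> pd.Series:
    """Clean a housing DataFrame and rank features by |correlation| with target.

    Assumptions (not from the project brief): the target column name and the
    missing-value threshold are placeholders you should adapt.
    """
    # Remove columns whose share of missing values exceeds the threshold.
    keep = df.columns[df.isna().mean() <= max_missing]
    out = df[keep].dropna()  # then remove records that still have missing values

    # Adjust the (typically right-skewed) target with a log(1 + x) transform.
    out[target] = np.log1p(out[target])

    # Convert categorical columns to numerical indicator columns.
    out = pd.get_dummies(out, drop_first=True, dtype=float)

    # Score each feature by its Pearson correlation with the target,
    # strongest absolute correlation first.
    scores = out.corr()[target].drop(target)
    order = scores.abs().sort_values(ascending=False).index
    return scores.reindex(order)
```

Calling `clean_and_score(pd.read_csv("train.csv")).head(10)` would then list the ten features most strongly correlated with the log-transformed price, provided the column name matches your copy of the dataset. Dropping every record with a missing value can discard many rows, so compare the row counts before and after and consider imputing instead.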

Project Feedback + Evaluation

  • Technical Requirements: Did you deliver a project that met all the technical requirements? Given what the class has covered so far, did you build something that was reasonably complex?

  • Creativity: Did you add a personal spin or creative element to your project submission? Did you incorporate domain knowledge or a unique perspective into your analysis?

  • Code Quality: Did you follow code style guidance and best practices covered in class?

  • Total: Your instructors will give you a total score for your project on the following scale:

    Score  Expectations
    0      Does not meet expectations
    1      Meets expectations, good job!
    2      Exceeds expectations, you wonderful creature, you!

This will be useful as an overall gauge of whether you met the project goals, but the more important scores are described in the specs above, which can help you identify where to focus your efforts for the next project!
