Statistical Analysis

An exploration of different datasets using statistical methods and Python data science libraries.

Contents

  1. Descriptive Statistics
  2. Simple Linear Regression
  3. Multiple Linear Regression
  4. Logistic Regression
  5. Time Series Decomposition
  6. Clustering

Descriptive Statistics

An exploration, using the Python pandas library, of a star dataset used for stellar classification.
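
As a minimal sketch of this kind of exploration, the snippet below summarises a small table with pandas. The column names and values are invented stand-ins, not the actual star dataset.

```python
import pandas as pd

# Hypothetical stand-in for the star dataset: column names and values are invented.
stars = pd.DataFrame({
    "temperature_k": [3068, 3042, 2600, 25000, 33750],
    "luminosity": [0.0024, 0.0005, 0.0003, 0.056, 220000.0],
    "star_type": ["Red Dwarf", "Red Dwarf", "Brown Dwarf", "White Dwarf", "Supergiant"],
})

# Count, mean, standard deviation, min, quartiles and max for each numeric column.
summary = stars.describe()

# Number of stars in each class.
type_counts = stars["star_type"].value_counts()

# Median temperature, which is less sensitive to extreme values than the mean.
median_temp = stars["temperature_k"].median()
```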

Descriptive Statistics - A Sample of the Stars

Simple Linear Regression (SLR)

A small dataset containing salary information of employees at a company is explored in this section using simple linear regression.

I make my initial explorations using pandas and then plot a scatter graph and line of best fit using matplotlib.

I then create a statistical model using Ordinary Least Squares (OLS) from statsmodels, and validate it using the machine learning library scikit-learn (sklearn).

Finally, I use this model to make a prediction on salary based on number of years worked at the company.

Simple Linear Regression - Salaries

Multiple Linear Regression (MLR)

Here, I use MLR to analyse a dataset containing information about cars.

Using pandas, matplotlib and seaborn, I isolate the independent variables which most strongly influence the price of a car.

I then test for collinearity and use the results from these tests to refine my statistical model.

Finally, I use sklearn to validate my model, concluding that engine size and horsepower are the two most influential factors when predicting price.

Multiple Linear Regression - Cars

Logistic Regression

In this analysis, I look at the effects of sleep and hours spent studying on student pass rates.

I first explore the data using seaborn, pandas and matplotlib. I then create a logistic statistical model using statsmodels and test it with sklearn.

I further validate the model using the following methods:

  • Confusion matrix
  • True positive and true negative rate
  • Receiver Operating Characteristic (ROC) curve
  • Area Under the ROC Curve (AUC) method

Finally, I use the model to make a prediction.
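
The validation methods listed above can be sketched with sklearn as follows. The data is synthetic and, for brevity, the model is fitted with sklearn's `LogisticRegression` rather than statsmodels; both choices are assumptions, not the notebook's actual code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Synthetic student data (assumption): more study and sleep raise the pass probability.
rng = np.random.default_rng(1)
n = 300
study_hours = rng.uniform(0, 10, n)
sleep_hours = rng.uniform(4, 10, n)
logit = -8 + 1.0 * study_hours + 0.5 * sleep_hours
passed = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([study_hours, sleep_hours])
model = LogisticRegression().fit(X, passed)
predicted = model.predict(X)
pass_probability = model.predict_proba(X)[:, 1]

# Confusion matrix, flattened as: true neg, false pos, false neg, true pos.
tn, fp, fn, tp = confusion_matrix(passed, predicted).ravel()
true_positive_rate = tp / (tp + fn)   # share of actual passes predicted as passes
true_negative_rate = tn / (tn + fp)   # share of actual fails predicted as fails

# ROC curve points and the area under that curve.
fpr_points, tpr_points, thresholds = roc_curve(passed, pass_probability)
auc = roc_auc_score(passed, pass_probability)
```

The metrics here are computed on the training data to keep the sketch short; a held-out test set gives a more honest estimate.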

Logistic Regression - Student Pass Rates

Time Series Decomposition

This section breaks down data taken from the French stock market over time. To provide more insight, the data is decomposed into four underlying components: level, trend, seasonality and noise.

Time Series Decomposition - French Stock Exchange

Clustering

In this section, I explore the relationship between different species of iris flowers, using clustering to group the data based on characteristics such as sepal length and petal width.

I first explore the dataset using pandas. I then set up a KMeans object using the sklearn library.

Using the KMeans object, seaborn and matplotlib, I then plot the data to visualise how the iris flowers can be grouped together.

Next, I perform a silhouette analysis to find the optimal number of clusters for the dataset. With this information, I create a new KMeans object and plot the result on another scatter graph.
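
A minimal sketch of the silhouette analysis, assuming the copy of the iris dataset bundled with sklearn and a candidate range of 2 to 6 clusters (the range and the `random_state` are arbitrary choices for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

# Four measurements per flower: sepal length/width and petal length/width.
X = load_iris().data

# Mean silhouette score for each candidate cluster count; higher means
# tighter, better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Pick the cluster count with the highest score and refit the final model.
best_k = max(scores, key=scores.get)
final_model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)
```

The fitted `final_model.labels_` can then be passed as the colour argument of a scatter plot to visualise the groups.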

Clustering - Iris Flowers


Licensed under MIT.
