lenamax2355 / data-explorer

This project forked from mbrady4/data-explorer

A collection of functions to accelerate exploration of a Dataset

License: MIT License

Data Explorer

A class with helper methods to accelerate exploration of a Dataset

An instance of the explorer class is initialized with a pandas DataFrame of features (i.e., independent variables) and a pandas Series containing the target values (i.e., the dependent variable).
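
As an illustration of the two inputs described above, here is a minimal sketch of separating a DataFrame into a feature DataFrame and a target Series. The toy data and the column names (sqft, bedrooms, price) are hypothetical, and whether the class expects the target column to be removed from the features is not stated in this README, so treat the split as one reasonable option rather than a requirement.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset
df = pd.DataFrame({
    "sqft": [1180, 2570, 770, 1960],
    "bedrooms": [3, 3, 2, 4],
    "price": [221900, 538000, 180000, 604000],
})

# Features: every column except the target (a DataFrame)
X = df.drop(columns=["price"])

# Target: a single column (a Series)
y = df["price"]

# Per the description above, these two objects are what the
# explorer class takes at initialization, e.g. explorer(X, y).
```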

The following helper methods are available:

  • data_dict(self): Creates a DataFrame intended to orient the user to a dataset. For each column of the input DataFrame, it summarizes the following: data type, cardinality, null values, correlation with the target, skew, outlier count, list of outlier values, count of suspicious values, and list of suspicious values.
  • get_cardinality(self, include_numeric): Determines the cardinality (number of unique values) of each column.
  • corr_with_target(self): Determines the correlation of each numeric column with the specified target.
  • detect_suspicious(self, custom_vals): Searches the DataFrame for values that should possibly be treated as missing (np.nan).
  • detect_outliers(self, method, threshold): Identifies outliers in numeric columns using the specified outlier detection method.
  • outlier_modified_z_score(self, threshold): Identifies outliers in numeric columns using the modified z-score method. A modified z-score uses the median rather than the mean.
  • outlier_z_score(self, threshold): Identifies outliers in numeric columns using the z-score method.
  • outlier_iqr(self, threshold): Identifies outliers in numeric columns using the interquartile range method.
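
To make the three outlier rules listed above concrete, the following is a rough sketch of each one applied to a plain list of numbers. It is not the package's implementation: the function names, the default thresholds (3.0, 3.5, 1.5), and the 0.6745 scaling constant in the modified z-score are common conventions assumed here for illustration, and the package may compute these differently.

```python
import statistics


def z_score_outliers(values, threshold=3.0):
    """Values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]


def modified_z_score_outliers(values, threshold=3.5):
    """Same idea, but centered on the median and scaled by the
    median absolute deviation (MAD), which is less affected by extremes."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # more than half the values equal the median
    # 0.6745 is a conventional scaling constant (assumed, not from the package)
    return [v for v in values if abs(0.6745 * (v - median) / mad) > threshold]


def iqr_outliers(values, threshold=1.5):
    """Values more than `threshold` * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - threshold * iqr, q3 + threshold * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: 19 ordinary values and one extreme value
sample = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13,
          12, 10, 11, 12, 13, 11, 12, 10, 11, 95]
print(z_score_outliers(sample))
print(modified_z_score_outliers(sample))
print(iqr_outliers(sample))
```

One design note: the plain z-score depends on the mean and standard deviation, both of which are pulled toward extreme values, so in small samples a genuine outlier may fail to cross the threshold. The median-based and IQR-based rules are less sensitive to this.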

Installation:

pip install -i https://test.pypi.org/simple/ data-explorer

Implementation:

The following code block returns a useful summary that can inform further data exploration, cleaning, and engineering:

# Import dataset to use for demonstration purposes
import pandas as pd
king = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/kc_house_data.csv')

# Assumes the data-explorer package has already been installed
from Data_Explorer.explorer import explorer

# Initialize an instance of the explorer class 
ex = explorer(king, king['price'])

# Call the data_dict() method
ex.data_dict()

Data_Dict Output
