themrityunjaypathak / featureengineering

Feature Engineering with Python

License: MIT License

Jupyter Notebook 100.00%
iqr outlier-removal zscore data-standardization dummy-variables imbalanced-data modified-zscore

Feature Engineering

  • Feature engineering is a machine learning technique that uses existing data to create new variables that aren’t in the original training set.

  • It can produce new features for both supervised and unsupervised learning, with the goal of simplifying and speeding up data transformations while also enhancing model accuracy.

  • Feature engineering is a key step when working with machine learning models.

  • Regardless of the data or architecture, a poor feature will directly hurt your model's performance.

Importance of Feature Engineering

  • Feature engineering is a very important step in machine learning.

  • Feature engineering refers to the process of designing artificial features for an algorithm.

  • These artificial features are then used by that algorithm in order to improve its performance.

Getting Started

  • Clone the repository to your local machine using the following command:
git clone https://github.com/TheMrityunjayPathak/FeatureEngineering.git

Different Feature Engineering Techniques

Dummy Variable

  • Dummy variables are binary variables that represent categorical data. Each one takes the value 0 or 1 to indicate the absence or presence of a specified attribute, respectively.
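
The encoding described above can be sketched in plain Python. This is a minimal illustration, not the repository's own code: the helper name `dummy_encode`, the `is_` column prefix, and the sample data are all assumptions made for the example. In practice, `pandas.get_dummies` serves the same purpose.

```python
def dummy_encode(values):
    """Return one 0/1 column per distinct category found in `values`.

    Hypothetical helper for illustration only.
    """
    categories = sorted(set(values))
    return {
        f"is_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }


colors = ["red", "green", "red", "blue"]  # illustrative data
dummies = dummy_encode(colors)

for name, column in dummies.items():
    print(name, column)
# is_blue [0, 0, 0, 1]
# is_green [0, 1, 0, 0]
# is_red [1, 0, 1, 0]
```

Each row has exactly one column set to 1, which marks the category that row belongs to.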

Interquartile Range

  • In descriptive statistics, the interquartile range tells you the spread of the middle half of your distribution.

  • Quartiles segment any distribution that’s ordered from low to high into four equal parts.

  • The interquartile range (IQR) contains the second and third quartiles, or the middle half of your data set.

The interquartile range is found by subtracting the Q1 value from the Q3 value:

  • IQR = Q3 - Q1

  • Q3 = 3rd quartile or 75th percentile

  • Q1 = 1st quartile or 25th percentile

  • Q1 is the value below which 25 percent of the distribution lies, while Q3 is the value below which 75 percent of the distribution lies.
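
As a rough sketch, the IQR can be computed with Python's standard `statistics` module and then used to flag outliers. Two things here are assumptions that the text above does not state: the multiplier `k=1.5` (a common convention for outlier fences) and the `"inclusive"` quartile method (other methods give slightly different quartiles). The helper name `iqr_bounds` and the sample data are hypothetical.

```python
import statistics


def iqr_bounds(data, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is a conventional choice, assumed here for illustration.
    """
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # IQR = Q3 - Q1
    return q1 - k * iqr, q3 + k * iqr


values = [10, 12, 13, 14, 15, 16, 18, 95]  # illustrative data
lower, upper = iqr_bounds(values)
outliers = [v for v in values if v < lower or v > upper]

print(lower, upper)  # 7.125 22.125
print(outliers)      # [95]
```

Values outside the two fences are treated as candidate outliers. Whether to remove them remains a judgment call for the dataset at hand.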

Z-Score

  • Z-score is a statistical measurement that describes a value's relationship to the mean of a group of values.

  • Z-score is measured in terms of standard deviations from the mean.

  • Z-scores may be positive or negative, with a positive value indicating the score is above the mean and a negative score indicating it is below the mean.

  • A value's z-score is calculated using the following formula:

z = ( x - μ ) / σ

where:

  • z = Z-score
  • x = the value being evaluated
  • μ = the mean
  • σ = the standard deviation
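
The formula can be applied to a small sample as follows. This sketch assumes the population standard deviation (`statistics.pstdev`). The text above does not say whether σ is the population or sample value, so treat that choice, the helper name `z_scores`, and the data as illustrative.

```python
import statistics


def z_scores(data):
    """Return (x - mean) / std for every x, using the population std (assumed)."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return [(x - mu) / sigma for x in data]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data: mean 5, std 2
z = z_scores(scores)

print([round(value, 2) for value in z])
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

A z-score of 2.0 for the value 9 means that it lies two standard deviations above the mean.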

Modified Z-Score

  • A Modified Z-Score is more robust because it uses the median to calculate z-scores as opposed to the mean, which is known to be influenced by outliers.

Modified Z-Score = 0.6745 ( xi - x̃ ) / MAD

where:

  • xi = A single data value

  • x̃ = The median of the dataset

  • MAD = The median absolute deviation of the dataset

  • Values with modified z-scores less than -3.5 or greater than 3.5 are labeled as potential outliers.
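
A minimal sketch of the modified z-score and the ±3.5 threshold, using only the standard library. The helper name `modified_z_scores` and the sample data are hypothetical, and the explicit check for a zero MAD is an added safeguard rather than part of the formula above.

```python
import statistics


def modified_z_scores(data):
    """Return 0.6745 * (x - median) / MAD for every x in `data`."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    if mad == 0:
        # Over half the values equal the median, so the score is undefined.
        raise ValueError("MAD is zero; modified z-score is undefined")
    return [0.6745 * (x - med) / mad for x in data]


values = [10, 12, 13, 14, 15, 16, 18, 95]  # illustrative data
scores = modified_z_scores(values)
potential_outliers = [v for v, s in zip(values, scores) if abs(s) > 3.5]

print(potential_outliers)  # [95]
```

Because the median and the MAD barely move when 95 is added, the extreme value stands out clearly instead of inflating the scale it is measured against.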

Data Standardization

  • Standardization is a scaling method where the values are centered around the mean with a unit standard deviation.

  • This means that the mean of the attribute becomes zero, and the resultant distribution has standard deviation equal to 1.
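
To illustrate, the sketch below standardizes two features that sit on very different scales and checks that each ends up with mean 0 and standard deviation 1. The feature names and values are made up, and the population standard deviation is an assumption. Libraries such as scikit-learn provide a `StandardScaler` for the same job.

```python
import statistics


def standardize(column):
    """Center a column on its mean and scale it to unit (population) std."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    return [(x - mu) / sigma for x in column]


# Illustrative features on very different scales.
features = {
    "age": [20, 30, 40, 50],
    "income": [20000, 40000, 60000, 80000],
}
scaled = {name: standardize(column) for name, column in features.items()}

for name, column in scaled.items():
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    print(f"{name}: mean={abs(mean):.3f} std={std:.3f}")
```

After scaling, both columns hold the same values, so neither dominates a distance-based or gradient-based model because of its units alone.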

Handling Imbalanced Datasets

  • Imbalanced data refers to those types of datasets where the target class has an uneven distribution of observations.

  • In an imbalanced dataset, one class label has a very high number of observations and the other has a very low number of observations.
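
The text above does not name a specific remedy. One common option is random oversampling, which duplicates minority-class rows until the classes are the same size. The sketch below assumes that technique purely for illustration. The helper name `random_oversample`, the fixed seed, and the data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, row_label in zip(rows, labels) if row_label == label]
        extra = rng.choices(pool, k=target - count)  # sample with replacement
        out_rows.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_rows, out_labels


rows = list(range(10))        # illustrative row identifiers
labels = [0] * 8 + [1] * 2    # 8 majority vs 2 minority observations

print(Counter(labels))        # class counts before: 8 vs 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))  # class counts after: 8 vs 8
```

Oversampling should be applied only to the training split. Duplicated rows that leak into validation data make the evaluation look better than it is.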
