  • 👋 Hi, I’m @Abhik35
  • 👀 Interested in Data Science, Machine Learning and Artificial Intelligence
  • 🌱 I’m currently learning Python, Tableau, R, MySQL, machine learning, artificial intelligence and deep learning
  • 💞️ I’m looking to collaborate on all topics related to Data Science, Machine Learning and Artificial Intelligence.
  • 📫 How to reach me: email [email protected], LinkedIn www.linkedin.com/in/sourajit-dey-3774661

Sourajit Dey's Projects

abhik35

Config files for my GitHub profile.

arya.ai-binary-classification

This is the assignment solution for the data science role at Arya.ai. I attempted a binary classification problem on the given data, covering feature selection, training (with validation), and presentation of the predictions.

assignment-04-simple-linear-regression-2

Salary_hike: build a prediction model for Salary_hike. Build a simple linear regression model by performing EDA, applying the necessary transformations, and selecting the best model using R or Python.
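As a rough illustration of the simple linear regression step, the sketch below fits a least-squares line and reports R², which is one way to compare candidate models (for example, raw versus transformed inputs). The data values and variable names are hypothetical and are not taken from the assignment's dataset.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical, perfectly linear data: salary = 30 + 10 * years_experience
years_experience = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [40.0, 50.0, 60.0, 70.0, 80.0]

intercept, slope = fit_simple_linear_regression(years_experience, salary)
r2 = r_squared(years_experience, salary, intercept, slope)
print(intercept, slope, r2)
```

To compare transformations, the same two functions can be applied to a transformed predictor (for example, its logarithm) and the model with the higher R² kept.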

assignment-association-rules-books

Association rule mining on the books data with the Apriori algorithm:

  • Association rules with 10% support and 70% confidence
  • Association rules with 20% support and 60% confidence
  • Association rules with 5% support and 80% confidence
  • Visualization of the obtained rules

assignment-association-rules-my_movies

Association rule mining on the my_movies data with the Apriori algorithm:

  • Association rules with 10% support and 70% confidence
  • Association rules with 5% support and 90% confidence
  • A lift ratio > 1 marks a good, influential rule for selecting the associated transactions
  • Visualization of the obtained rules
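To make the support, confidence and lift thresholds concrete, here is a minimal sketch that computes the three metrics for a single rule. It does not implement the full Apriori search, and the transactions and item names are hypothetical placeholders.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_both = support(transactions, set(antecedent) | set(consequent))
    s_antecedent = support(transactions, antecedent)
    s_consequent = support(transactions, consequent)
    confidence = s_both / s_antecedent
    lift = confidence / s_consequent
    return s_both, confidence, lift


# Hypothetical transactions; each set is one basket of items.
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A"},
    {"B", "C"},
    {"C"},
]

rule_support, rule_confidence, rule_lift = rule_metrics(transactions, {"A"}, {"B"})
print(rule_support, rule_confidence, rule_lift)
```

A rule would be kept only if its support and confidence meet the chosen minimums; a lift above 1 indicates that the antecedent and consequent occur together more often than they would if independent.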

assignment-clustering-hierarchical-airlines

Perform hierarchical clustering on the airlines data to obtain the optimum number of clusters, and draw inferences from the clusters obtained.

Data description: the file EastWestAirlines contains information on passengers who belong to an airline’s frequent flier program. For each passenger, the data include information on their mileage history and on the different ways they accrued or spent miles in the last year. The goal is to identify clusters of passengers with similar characteristics, so that different segments can be targeted with different types of mileage offers.

  • ID: unique ID
  • Balance: number of miles eligible for award travel
  • Qual_mile: number of miles counted as qualifying for Topflight status
  • cc1_miles: number of miles earned with the frequent flyer credit card in the past 12 months
  • cc2_miles: number of miles earned with the Rewards credit card in the past 12 months
  • cc3_miles: number of miles earned with the Small Business credit card in the past 12 months
  • Coding of the credit card miles fields: 1 = under 5,000; 2 = 5,000 - 10,000; 3 = 10,001 - 25,000; 4 = 25,001 - 50,000; 5 = over 50,000
  • Bonus_miles: number of miles earned from non-flight bonus transactions in the past 12 months
  • Bonus_trans: number of non-flight bonus transactions in the past 12 months
  • Flight_miles_12mo: number of flight miles in the past 12 months
  • Flight_trans_12: number of flight transactions in the past 12 months
  • Days_since_enrolled: number of days since enrolling in the flier program
  • Award: whether the person had an award flight (free flight) or not

assignment-dbscan-clustering-crimes-

Perform clustering on the crime data, identify the number of clusters formed, and draw inferences.

Data description:

  • Murder: murder rate in different places of the United States
  • Assault: assault rate in different places of the United States
  • UrbanPop: urban population in different places of the United States
  • Rape: rape rate in different places of the United States

assignment-decision_tree-company_data

About the data: consider a Company dataset with around 10 variables and 400 records. The attributes are as follows:

  • Sales: unit sales (in thousands) at each location
  • Competitor Price: price charged by the competitor at each location
  • Income: community income level (in thousands of dollars)
  • Advertising: local advertising budget for the company at each location (in thousands of dollars)
  • Population: population size in the region (in thousands)
  • Price: price the company charges for car seats at each site
  • Shelf Location at stores: a factor with levels Bad, Good and Medium indicating the quality of the shelving location for the car seats at each site
  • Age: average age of the local population
  • Education: education level at each location
  • Urban: a factor with levels No and Yes indicating whether the store is in an urban or rural location
  • US: a factor with levels No and Yes indicating whether the store is in the US or not

Problem statement: a cloth manufacturing company wants to know which segments or attributes drive high sales.

Approach: build a decision tree with Sales as the target variable (first converted into a categorical variable) and all other variables as independent variables.

assignment-decision_trees-fraudcheck

Use decision trees to build a model on the fraud data, treating those with taxable_income <= 30000 as "Risky" and all others as "Good".

Data description:

  • Undergrad: whether the person is an undergraduate or not
  • Marital.Status: marital status of the person
  • Taxable.Income: the portion of an individual's income on which tax is owed to the government
  • Work Experience: work experience of the individual
  • Urban: whether the person belongs to an urban area or not
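The target labelling described above can be sketched as a small helper. The function name and the sample incomes are hypothetical; only the 30000 threshold and the two labels come from the description.

```python
def label_taxable_income(taxable_income, threshold=30000):
    """Map a taxable income to the class label used as the decision tree target."""
    return "Risky" if taxable_income <= threshold else "Good"


# Hypothetical incomes, including both sides of the boundary.
incomes = [25000, 30000, 30001, 98000]
labels = [label_taxable_income(value) for value in incomes]
print(labels)  # ['Risky', 'Risky', 'Good', 'Good']
```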

assignment-k-means-clustering-airlines-

Perform k-means clustering on the airlines data to obtain the optimum number of clusters, and draw inferences from the clusters obtained.

Data description: the file EastWestAirlines contains information on passengers who belong to an airline’s frequent flier program. For each passenger, the data include information on their mileage history and on the different ways they accrued or spent miles in the last year. The goal is to identify clusters of passengers with similar characteristics, so that different segments can be targeted with different types of mileage offers.

  • ID: unique ID
  • Balance: number of miles eligible for award travel
  • Qual_mile: number of miles counted as qualifying for Topflight status
  • cc1_miles: number of miles earned with the frequent flyer credit card in the past 12 months
  • cc2_miles: number of miles earned with the Rewards credit card in the past 12 months
  • cc3_miles: number of miles earned with the Small Business credit card in the past 12 months
  • Coding of the credit card miles fields: 1 = under 5,000; 2 = 5,000 - 10,000; 3 = 10,001 - 25,000; 4 = 25,001 - 50,000; 5 = over 50,000
  • Bonus_miles: number of miles earned from non-flight bonus transactions in the past 12 months
  • Bonus_trans: number of non-flight bonus transactions in the past 12 months
  • Flight_miles_12mo: number of flight miles in the past 12 months
  • Flight_trans_12: number of flight transactions in the past 12 months
  • Days_since_enrolled: number of days since enrolling in the flier program
  • Award: whether the person had an award flight (free flight) or not
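One common way to pick the number of clusters is the elbow method: run k-means for several values of k and look for the point where the within-cluster sum of squares (inertia) stops dropping sharply. The sketch below is a simplified pure-Python k-means under stated assumptions: the points are hypothetical 2-D values, features are assumed to be already scaled, and the centroids are initialised from the first k points to keep the run deterministic (a real analysis would normally use a library implementation with a better initialisation).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100):
    """Return (centroids, inertia) after running Lloyd's algorithm."""
    centroids = [points[i] for i in range(k)]  # simplification: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if a cluster empties
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


# Hypothetical data: two well-separated groups of four points each.
points = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (10, 10), (10, 11), (11, 10), (11, 11),
]

inertia_by_k = {k: kmeans(points, k)[1] for k in range(1, 4)}
for k, inertia in inertia_by_k.items():
    print(k, inertia)
```

On this toy data the inertia falls steeply from k=1 to k=2, which is the kind of elbow that suggests two clusters.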

assignment-knn-glass

Prepare a model for glass classification using KNN.

Data description:

  • RI: refractive index
  • Na: Sodium (unit of measurement: weight percent in the corresponding oxide, as are attributes 4-10)
  • Mg: Magnesium
  • Al: Aluminum
  • Si: Silicon
  • K: Potassium
  • Ca: Calcium
  • Ba: Barium
  • Fe: Iron
  • Type: type of glass (class attribute):
    1 -- building_windows_float_processed
    2 -- building_windows_non_float_processed
    3 -- vehicle_windows_float_processed
    4 -- vehicle_windows_non_float_processed (none in this database)
    5 -- containers
    6 -- tableware
    7 -- headlamps
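For reference, a k-nearest-neighbours classifier can be sketched in a few lines: find the k training samples closest to the query and take a majority vote over their labels. The two-feature samples below are hypothetical and only loosely imitate the RI and Na columns; real use would include all features, ideally standardised so that no single column dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict the label of query by majority vote among its k nearest neighbours."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (RI, Na) samples with glass types 1 and 7.
train_features = [
    (1.52, 13.0), (1.52, 13.2), (1.51, 13.1),
    (1.51, 14.5), (1.52, 14.8), (1.51, 14.6),
]
train_labels = [1, 1, 1, 7, 7, 7]

prediction_a = knn_predict(train_features, train_labels, (1.52, 13.1))
prediction_b = knn_predict(train_features, train_labels, (1.51, 14.7))
print(prediction_a, prediction_b)
```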

assignment-pca-data-mining-wine-

Perform principal component analysis, then cluster on the first 3 principal component scores using both hierarchical and k-means clustering (with a scree plot or elbow curve) to obtain the optimum number of clusters. Check whether this matches the number of clusters in the original data (the class column, which was ignored at the beginning, shows 3 clusters).

  • PCA implementation
  • Checking with other clustering algorithms: 1. hierarchical clustering, 2. k-means clustering
  • Build the clustering algorithm using K=3

assignment-random_forest-company_data

Random Forest assignment. About the data: consider a Company dataset with around 10 variables and 400 records. The attributes are as follows:

  • Sales: unit sales (in thousands) at each location
  • Competitor Price: price charged by the competitor at each location
  • Income: community income level (in thousands of dollars)
  • Advertising: local advertising budget for the company at each location (in thousands of dollars)
  • Population: population size in the region (in thousands)
  • Price: price the company charges for car seats at each site
  • Shelf Location at stores: a factor with levels Bad, Good and Medium indicating the quality of the shelving location for the car seats at each site
  • Age: average age of the local population
  • Education: education level at each location
  • Urban: a factor with levels No and Yes indicating whether the store is in an urban or rural location
  • US: a factor with levels No and Yes indicating whether the store is in the US or not

Problem statement: a cloth manufacturing company wants to know which segments or attributes drive high sales.

Approach: build a random forest with Sales as the target variable (first converted into a categorical variable) and all other variables as independent variables.

assignment-recommendation-system-data-mining-books-

Recommend the best book based on the ratings:

  • Sort by user IDs
  • Count the number of unique users in the dataset
  • Count the number of unique books in the dataset
  • Convert long data into wide data using a pivot table
  • Replace the index values with unique user IDs
  • Impute the NaNs with 0 values
  • Calculate cosine similarity between users on the array data
  • Store the results in a dataframe
  • Set the index and column names to user IDs
  • Nullify the diagonal values
  • Find the most similar users
  • Extract the books which userId 162107 & 276726 have read
  • Extract the books which userId 276729 & 276726 have read
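The user-to-user similarity steps can be sketched without any dataframe library. The example below assumes a tiny, hypothetical wide-format ratings table (one row per user, one column per book, 0 for unrated books), computes cosine similarity between users, zeroes the diagonal, and picks the most similar user. The user IDs and ratings are placeholders, not values from the dataset.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical wide-format ratings: 0 means the book was not rated (imputed NaN).
ratings = {
    "user_1": [5, 0, 3],
    "user_2": [5, 0, 3],
    "user_3": [0, 4, 0],
}
users = list(ratings)

# Similarity matrix with the diagonal nullified so a user never matches themselves.
similarity = {
    a: {
        b: 0.0 if a == b else cosine_similarity(ratings[a], ratings[b])
        for b in users
    }
    for a in users
}

most_similar = {a: max(similarity[a], key=similarity[a].get) for a in users}
print(most_similar["user_1"])
```

Books read by the most similar user, but not yet by the target user, are then natural candidates for recommendation.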

assignments-forecasting-airlines-data

Forecast the Airlines Passengers data set. Prepare a document for each model explaining how many dummy variables you created and the RMSE value for each model. Finally, state which model you will use for forecasting.
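As a small sketch of the two ingredients mentioned here, the code below builds one-hot seasonal dummy variables and compares models by RMSE. Monthly seasonality, the model names and all numbers are assumptions made for illustration only.

```python
import math

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_dummies(month):
    """One-hot encode a month name into 12 dummy variables."""
    return [1 if month == m else 0 for m in MONTHS]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical hold-out values and predictions from two hypothetical models.
actual = [100, 110, 120]
predictions = {
    "linear_trend": [102, 108, 123],
    "seasonal_dummies": [101, 110, 119],
}

scores = {name: rmse(actual, predicted) for name, predicted in predictions.items()}
best_model = min(scores, key=scores.get)
print(month_dummies("Mar"))
print(scores, best_model)
```

In a regression model, one of the dummy columns is usually dropped to avoid perfect collinearity with the intercept.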

assignments-forecasting-coca_cola_sales_rawdata

Forecast the CocaCola prices data set. Prepare a document for each model explaining how many dummy variables you created and the RMSE value for each model. Finally, state which model you will use for forecasting.

assignments-hypothesis-costomer-orderform

TeleCall uses 4 centers around the globe to process customer order forms. They audit a certain % of the customer order forms. Any error in an order form renders it defective, and the form has to be reworked before processing. The manager wants to check whether the defective % varies by centre. Analyze the data at the 5% significance level and help the manager draw appropriate inferences.
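One common choice for this kind of question is a chi-square test of independence on a 2 x 4 table of defective versus error-free forms per centre; the description does not name the test, so treat this as an assumption. The counts below are hypothetical. With (2 - 1) x (4 - 1) = 3 degrees of freedom, the critical value at the 5% level is about 7.815.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical audit counts for the 4 centres (300 forms audited per centre).
observed = [
    [20, 30, 25, 35],      # defective forms
    [280, 270, 275, 265],  # error-free forms
]

CRITICAL_VALUE_DF3_ALPHA_05 = 7.815  # chi-square critical value, df = 3, alpha = 0.05

statistic = chi_square_statistic(observed)
reject_null = statistic > CRITICAL_VALUE_DF3_ALPHA_05
print(round(statistic, 4), reject_null)
```

For these made-up counts the statistic stays below the critical value, so the null hypothesis that the defective % is the same across centres would not be rejected.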

assignments-hypothesis-testing-cutlets

An F&B manager wants to determine whether there is any significant difference in the diameter of the cutlet between two units. A randomly selected sample of cutlets was collected from both units and measured. Analyze the data and draw inferences at the 5% significance level. State the assumptions and the tests you carried out to check the validity of those assumptions.

assignments-hypothesis-testing-labtat

A hospital wants to determine whether there is any difference in the average Turn Around Time (TAT) of reports from the laboratories on its preferred list. It collected a random sample and recorded the TAT for reports from 4 laboratories. TAT is defined as the time from sample collection to report dispatch. Analyze the data and determine whether there is any difference in average TAT among the laboratories at the 5% significance level.

assignments-neural-networks-gas-turbines

The dataset contains 36733 instances of 11 sensor measures aggregated over one hour (by means of average or sum) from a gas turbine. The dataset includes gas turbine parameters (such as turbine inlet temperature and compressor discharge pressure) in addition to the ambient variables.

Problem statement: predict turbine energy yield (TEY) using ambient variables as features.

Attribute information: the sensor measurements and their brief statistics are given below.

Variable (Abbr.)                         Unit    Min       Max       Mean
Ambient temperature (AT)                 C       -6.23     37.10     17.71
Ambient pressure (AP)                    mbar    985.85    1036.56   1013.07
Ambient humidity (AH)                    (%)     24.08     100.20    77.87
Air filter difference pressure (AFDP)    mbar    2.09      7.61      3.93
Gas turbine exhaust pressure (GTEP)      mbar    17.70     40.72     25.56
Turbine inlet temperature (TIT)          C       1000.85   1100.89   1081.43
Turbine after temperature (TAT)          C       511.04    550.61    546.16
Compressor discharge pressure (CDP)      mbar    9.85      15.16     12.06
Turbine energy yield (TEY)               MWH     100.02    179.50    133.51
Carbon monoxide (CO)                     mg/m3   0.00      44.10     2.37
Nitrogen oxides (NOx)                    mg/m3   25.90     119.91    65.29

assignments-text-mining-elon_musk

Text mining assignment:

  • ONE: 1) Perform sentiment analysis on the Elon Musk tweets (Exlon-musk.csv)
  • TWO: 1) Extract reviews of any product from an e-commerce website such as Amazon 2) Perform emotion mining
