

A Full-Stack Data Professional, experienced: 🛠️ Data Engineer | 👨🏻‍💻 Developer | 🕵🏻 Data Analyst | 🧬 Data Scientist & 🤖 AI+ML

As a creative tech enthusiast, I love working with and learning new software, tools, technologies, and platforms: ChatGPT, MLOps

👨  My Background

I am a postgraduate student of Business Analytics with over a year of professional experience in the eCommerce and Internet Services industry.

I started in 2020 with Python, building simple data exploration projects and expanding my knowledge over time. Around mid-to-late 2021, I started learning Machine Learning and Deep Learning concepts, using Python libraries such as scikit-learn, Keras, and TensorFlow to create predictive models. During this time I also began my postgraduate Analytics program and learned Big Data tools like Apache Hadoop with Hive and Pig for web scraping, and Business Intelligence tools like Tableau, Power BI, and IBM Cognos. I am currently working at Tucows as a Customer Intelligence Researcher, building a strong foundation in data analytics and reporting.

Over the last year, my knowledge of and experience with Business Intelligence tools have expanded, as has my interest. I am proficient in using Tableau and Power BI alongside Python and SQL, as well as Google Cloud Platform. I also have a solid understanding of Mathematics and Statistics, and am able to work with large and complex datasets. My goal with data analytics, visualization, and reporting is to help others: I enjoy creating something that stakeholders can use to make their decisions easier and data-driven.

✨  My Portfolio
  • Data Visualization and Dashboarding: Tableau Power BI Google Analytics Looker Alteryx

    • E-Commerce Sales Analysis | Minimal Overview Dashboard
    • Modern Retail Sales Dashboard | Aesthetic Light and Dark Themes -
      This Tableau dashboard presents a modern and aesthetic analysis of retail sales, with light and dark themes for user preference. Key performance indicators (KPIs) are displayed with current and previous year sparklines and min-max indicators, and users can customize the dashboard with global filters. An interactive text summary of sales by region allows for a quick and easy view of performance by location.
    • A 100 Years of Earthquakes - Analysis of a century of Earthquakes | Story Book using Tableau -
      This Tableau dashboard provides a comprehensive analysis of 100 years of earthquakes, presenting a visual representation of the data by year and magnitude, as well as a distribution of the earthquakes by class and magnitude. The dashboard also features an interactive earthquake map with filters for magnitude, damages, injuries, number of houses destroyed, number of missing, and number of deaths, allowing users to gain deeper insights into the impact of earthquakes over the past century.
    • Bank and Credit Card Complaints Analysis using Tableau -
      Built a dashboard using Tableau that analyzes credit card complaints data. The dashboard allows for a comprehensive analysis of the data through the use of custom calculations and parameters. This enables users to identify patterns and trends in the data, and make data-driven decisions. The visualizations in the dashboard are interactive and visually appealing, making it easy to understand and interpret the data. The purpose of the project is to improve customer satisfaction and reduce complaints by gaining a better understanding of the complaints data.
    • Employee Attrition - What makes employees quit? | Futuristic Tableau and Power BI Dashboards -
      This is an in-depth project that utilizes Tableau, Power BI, Python, Pig Latin, and Hadoop to gain a deeper understanding of IBM's workforce. The project meticulously investigates the Key Risk Indicators (KRIs) that influence employee attrition by leveraging the power of big data analysis. The project's results, in the form of recommendations, aim to aid IBM in enhancing employee retention and minimizing turnover rates. The project exemplifies the capability of advanced big data tools and visualization techniques to unveil actionable insights from large datasets.
  • Predictive Analytics and Machine Learning: Python TensorFlow PyTorch Pandas SAS SKLearn Keras R

    • Artificial Neural Networks for Fraud Detection in Supply Chain Analytics: A Study on MLPClassifier and Keras -
      This study aimed to detect fraudulent activities in the supply chain using neural networks. It built two models in Python: one using the MLPClassifier algorithm from the scikit-learn library and one custom neural network using the Keras library. Both models were trained and tested on the DataCo Supply Chain dataset. The results showed that the custom neural network achieved an accuracy of 97.67% in detecting fraudulent transactions, demonstrating its potential to minimize financial losses for organizations.
    • US Flight Delays Prediction Models based on Naïve Bayes, Regression Tree, and Logistic Regression Algorithms -
      This project uses Python and the scikit-learn library to predict flight delays in the United States using three machine learning algorithms (Naive Bayes, Regression Tree, and Logistic Regression). The data was collected, preprocessed, and divided into training and test sets to train and evaluate the prediction models. The Logistic Regression algorithm achieved the highest accuracy, 85.14%, in predicting flight delays. The project is intended as a tool for airlines and airport management to improve flight schedules and reduce the number of flight delays for passengers.
    • Predicting Housing Prices Using Multiple Linear Regression and k-Nearest Neighbours (kNN) -
      The objective of this project was to predict housing prices using two modeling techniques: multiple linear regression and k-Nearest Neighbours (kNN). The project aimed to construct accurate models to estimate real estate values by identifying relevant factors and their impact on the property's price. The multiple linear regression model was deemed the more suitable for prediction, with low Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The kNN model with 10 nearest neighbours also performed well, with a low RMSE.
    • Supermarket Organic Product Purchase Prediction - Data Mining and Modeling with SAS -
      This project aimed to predict customer purchasing behavior for a supermarket's new line of organic products. Using data mining techniques, the customer loyalty program data was analyzed to identify factors affecting organic product purchases. The data was modeled using SAS Enterprise Miner to create predictive models. The results of this study could assist the supermarket in understanding its customer base and targeting its marketing efforts effectively.
  • Database Scripting, Querying and Analysis: SQL SQLite MariaDB Cassandra Neo4j NoSQL PostgreSQL

    • RDBMS to GraphDB - Big Data Analytics using Neo4j -
      This project involves migration from a traditional RDBMS to Neo4j for big data analytics. Using graph database technology, various business-critical questions are addressed, including identifying the employees who sold Tofu, the products sold with Tofu, the total number of products, the top 5 products by sales, and the category with the highest sales. Neo4j's efficiency and effectiveness in managing big data provide valuable insights for decision making.
    • Data Analysis for Digital Music Store using SQL -
      This project analyzes data from the Chinook Digital Music Store using SQL queries on a PostgreSQL database. It aimed to identify and optimize business opportunities by analyzing customer and sales data, answering questions such as top-selling genres, top-selling artists, and total value of sales by country. Data visualization techniques were used to present the results in an easy-to-understand format.
  • Big Data Analytics and Cloud: Azure AWS Docker Hadoop GCP

    • Worldwide Sales Data Analysis and Exploration using Zeppelin, HDFS and Spark -
      This project aimed to analyze and understand worldwide sales data through the use of Zeppelin and HDFS. The primary objective was to utilize Spark's basic Scala commands and SQL to query and manipulate the data, providing valuable insights and findings for the customer.
    • User, Occupation and Movies, Ratings Data Exploration using Apache Hive -
      In this project, the objective was to analyze the "User, Occupation, Movies, and Ratings" dataset using Apache Hive. The data was processed and analyzed using Hive's SQL-like query language and MapReduce framework, making it easier to handle large datasets. The focus of the analysis was to provide a comprehensive breakdown of the data and uncover key insights into user preferences and trends.
  • Advanced Excel, IBM SPSS Modeler, IBM Cognos Analytics and Others: Excel SPSS Cognos

    • MoneyBall: Sports Predictive Analytics | Advanced Excel and Data Analysis ToolPak -
      This project used advanced Excel tools such as Solver and the Data Analysis ToolPak to optimize a baseball team's lineup and maximize the expected return-to-risk ratio while adhering to a set salary budget. Data on over 500 players was collected, cleaned, and analyzed to identify the best players and positions. Data visualization techniques were used to present the results in an easy-to-understand format. The project provided valuable insights into building a winning team within a budget constraint.
    • IBM SPSS - A Comprehensive Guide to Data Analysis and Data Modeling -
      IBM SPSS Modeler is a comprehensive data analysis and modeling tool. This repository is a compilation of exercises outlined in the "Introduction to IBM SPSS Modeler" document by IBM. It covers the essential steps of data import, preparation, visualization, and model building. The repository includes building decision trees and linear regression models, demonstrating the tool's modeling capabilities.
    • Telecom Customer Churn - Data Modeling and Finding Main Drivers with IBM Cognos Analytics -
      In this project, IBM Cognos Analytics was used to analyze telecom customer churn data and determine the main drivers of churn. Answering questions such as "what were the top three key drivers affecting churn?" yielded insights on customer tenure with fiber optic, payment method, and internet service type. The results showed that customers with a tenure of less than three months and fiber optic service, paying by electronic check, had the highest churn rate.
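The housing price project above compares its models by Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). As a minimal sketch of how those two metrics are computed, assuming made-up prices rather than the project's data:

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors, in the target's units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: like MAE, but penalises large errors more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical house prices (in thousands); not taken from the project.
actual = [200.0, 350.0, 275.0, 410.0]
predicted = [210.0, 340.0, 275.0, 450.0]

print(f"MAE:  {mae(actual, predicted):.2f}")
print(f"RMSE: {rmse(actual, predicted):.2f}")
```

Because RMSE squares each error before averaging, a single large miss (the last house here) pulls RMSE well above MAE, which is why the two are usually reported together.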

⏩   and many more

🛠️  My Stack  
  • 🛢 Databases || Db2, Redis, Dynamo, MongoDB, Postgres, Cassandra

  • 🧑🏻‍💻 Programming || Python, SQL, HiveQL, SAS, Scala, Shell/UNIX, R, C

  • 📶 BI Tools || Tableau, Power BI, Looker, Cognos, Alteryx, SAS BI, GA4

  • 🔢 Big Data || Spark, Hadoop, Hive, Sqoop, HBase, Kafka, Impala, Hue

  • 💭 Azure Stack || ADLS, Databricks, Visual Studio, Synapse, ADF, AKS

  • 💭 AWS Stack || Glue, EC2, S3, Athena, Redshift, Lambda, IAM, RDS

  • 💭 GCP Stack || BigQuery, Looker, Pub/Sub, Cloud Storage, Dataproc

  • 🔗 DevOps || Docker, Kubernetes, Jenkins, Git, Azure, YAML, JSON

  • 🤖 AI/ML || Sklearn, PyTorch, TF, Keras, AzureML, SageMaker, AutoML

  • 🎯 SDLC || SAFe® Agile, Kanban, Jira, Confluence, Scrum, Waterfall

  • 📝 Code Management || GitHub, BitBucket, GitLab, AWS CodeCommit

  • 🧮 Mainframe || COBOL, JCL, VSAM, DB2, TSO/ISPF, TSYS TS2®, z/OS

🔏  My Certifications 
🔬  My Publications 

⏩   and many more

Tableau, Power BI, Python, MySQL, Java, Hadoop, Hive, Scala, SQLite, PyTorch, TensorFlow, IBM Cloud

👨‍💻 All of my projects are available on GitHub, Tableau Public, and Kaggle


📄 To learn about my experience, have a look at my resume


🔗  Connect with me

subhanjansd subhanjan-das subhanjan33

Handy: Tableau, Python, SQL, Databricks, C++, Apache Spark, Hadoop, Hive, Azure, Kafka, DynamoDB, Kotlin, Flask

Subhanjan Das's Projects

-chess-queens---eda-using-plotly

Introduction. About the Game: Chess is a two-player strategy board game played on a checkered board with 64 squares arranged in an 8×8 grid. Governing Body: The International Chess Federation (FIDE) governs international chess competition. FIDE uses the Elo rating system to calculate the relative skill levels of players. Dataset Details: The dataset contains details of the top women chess players in the world, sorted by their Standard FIDE rating (highest to lowest, above 1800 Elo). The data includes all active and inactive players, which can be identified by the Inactive_flag column. Note: All ratings are as published by FIDE in August 2020.
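Since the dataset is built around Elo ratings, a short sketch of the standard Elo formulas may help when reading the numbers. The function names are illustrative, and the K-factor of 20 is an assumption for the example; the value FIDE applies depends on the player.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def updated_rating(rating: float, expected: float, actual: float, k: float = 20.0) -> float:
    """New rating after one game.

    actual is 1 for a win, 0.5 for a draw and 0 for a loss.
    k is the K-factor; 20 is an illustrative assumption.
    """
    return rating + k * (actual - expected)

# A 2000-rated player facing an 1800-rated player.
e = expected_score(2000, 1800)
print(f"Expected score for the 2000-rated player: {e:.3f}")
print(f"Rating after a win: {updated_rating(2000, e, 1.0):.1f}")
```

A 200-point gap corresponds to an expected score of roughly 0.76 for the stronger player, so beating a much lower-rated opponent moves the rating only slightly.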

credit-eda-case-study

This case study aims to give us an idea of applying EDA in a real business scenario. In this case study, we develop a basic understanding of risk analytics in banking and financial services and understand how data is used to minimise the risk of losing money while lending to customers.

data-cleaning-and-preparation-of-boston-housing-dataset---python-pandas

This project involves analysis of the Boston Housing Dataset using Python's Pandas library. Data cleaning is performed by dropping genuine outliers, resetting the index, and imputing missing values with the median of the columns. It is substituted with NaN for further analysis. The objective of this project is to clean and prepare the data
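The median imputation step mentioned above can be sketched as follows. This is a simplified illustration in plain Python rather than Pandas, so it runs without dependencies; the helper name and the sample rows are hypothetical, and only the column names (RM, MEDV) follow the usual Boston Housing abbreviations.

```python
from statistics import median

def impute_with_median(rows, columns):
    """Return a copy of rows with None replaced by each column's median."""
    medians = {}
    for col in columns:
        # The median is computed from observed (non-missing) values only.
        observed = [row[col] for row in rows if row[col] is not None]
        medians[col] = median(observed)
    return [
        {col: (row[col] if row[col] is not None else medians[col]) for col in columns}
        for row in rows
    ]

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"RM": 6.5, "MEDV": 24.0},
    {"RM": None, "MEDV": 21.6},
    {"RM": 7.1, "MEDV": None},
    {"RM": 6.0, "MEDV": 33.4},
]

cleaned = impute_with_median(rows, ["RM", "MEDV"])
for row in cleaned:
    print(row)
```

The median is often preferred over the mean for this step because it is less sensitive to the outliers that the same cleaning pass is trying to handle.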

defcon27

:octocat: Hello! This repository holds the welcome message for my GitHub profile. ⭐ Star it if you like it!

exploring-the-space-missions-eda-

Introduction: This dataset was scraped from https://nextspaceflight.com/launches/past/?page=1 and includes all space missions since the beginning of the Space Race (1957).

rdbms-to-graphdb---big-data-analytics-using-neo4j

This project involves migration from a traditional RDBMS to Neo4j for big data analytics. Using graph database technology, various business-critical questions are addressed, including identifying the employees who sold Tofu, the products sold with Tofu, the total number of products, top 5 products by sales, and the category with the highest sales.
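The "products sold with Tofu" question is a co-occurrence query. As a rough sketch of what it looks like on the relational side, here is a self-join using Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical and are not the project's schema; in a graph model the same question is typically expressed as a short path through a shared order node.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (order_id INTEGER, product TEXT);
    INSERT INTO order_items VALUES
        (1, 'Tofu'), (1, 'Chai'),
        (2, 'Tofu'), (2, 'Chai'), (2, 'Ikura'),
        (3, 'Ikura');
""")

# Self-join: pair every Tofu line with the other lines of the same order.
query = """
    SELECT other.product, COUNT(*) AS times_together
    FROM order_items AS tofu
    JOIN order_items AS other
      ON other.order_id = tofu.order_id AND other.product <> tofu.product
    WHERE tofu.product = 'Tofu'
    GROUP BY other.product
    ORDER BY times_together DESC, other.product
"""
co_sold = conn.execute(query).fetchall()
print(co_sold)
conn.close()
```

Order 3 contains no Tofu, so its Ikura line does not count toward the result.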

simple-aged-cache

This project presents my implementation of a simple cache system that supports automatic expiration of entries.
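For readers unfamiliar with the idea, a cache with expiring entries can be sketched as below. This is an illustrative Python version, not the repository's implementation: the class name, method names, and the lazy eviction on read are all assumptions made for the example.

```python
import time

class AgedCache:
    """Key-value cache whose entries expire a fixed time after insertion."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so tests can control time
        self._entries = {}    # key -> (value, expiry timestamp)

    def put(self, key, value, ttl_seconds):
        """Store value under key for ttl_seconds."""
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        """Return the value, or None if the key is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # evict lazily on access
            return None
        return value

    def __len__(self):
        """Number of entries that have not yet expired."""
        now = self._clock()
        return sum(1 for _, expires_at in self._entries.values() if now < expires_at)

# Usage with a fake clock so that expiry is deterministic.
now = [0.0]
cache = AgedCache(clock=lambda: now[0])
cache.put("a", 1, ttl_seconds=5)
cache.put("b", 2, ttl_seconds=10)
now[0] = 6.0
print(cache.get("a"), cache.get("b"), len(cache))
```

Passing the clock in as a parameter keeps the expiry logic testable without sleeping; a production cache might also purge expired entries in the background instead of only on access.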

soccer-match-predictor-end-to-end

The Match Predictor is a web application that predicts the outcomes of soccer matches using various machine learning models. The backend is written in Python with Flask, and the frontend is built using TypeScript and React.

user-occupation-and-movies-ratings-data-exploration-using-apache-hive

In this project, the objective was to analyze the "User, Occupation, Movies, and Ratings" dataset using Apache Hive. The data was processed and analyzed using Hive's SQL-like query language and MapReduce framework, making it easier to handle large datasets. The focus of the analysis was to provide a comprehensive breakdown of the data
