vikrantdeshpande09876 / masterize_hospital_entities

The goal was to maintain a ‘single version of truth’ for associated entities across the organization’s data sources. The RecordLinkage package is wrapped in a recursive data pipeline that de-duplicates records and generates a master set. The similarity between two text strings determines whether they are a probabilistic match.
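
As a rough illustration of the last sentence, the sketch below scores two strings with a normalized Levenshtein similarity and treats them as a probable match above a cutoff. The function names, the lowercasing step, and the 0.85 threshold are assumptions made for this example; the repository's actual comparison method and cutoff may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or exact match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after normalization."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_probable_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical decision rule: match when similarity reaches the cutoff."""
    return similarity(a, b) >= threshold


print(is_probable_match("St. Mary Hospital", "St Mary Hospital"))
print(is_probable_match("Mercy Clinic", "City General"))
```

A higher threshold yields fewer false merges at the cost of missing more true duplicates, so the cutoff is usually tuned against a labelled sample.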

Home Page: https://github.com/vikrantdeshpande09876/Masterize_Hospital_Entities/blob/master/Documentation/Research_Paper_Work/Masterize_data_Text_similarity_scores.pdf

Jupyter Notebook 45.93%, R 26.85%, C 3.38%, Python 23.84%
data-wrangling divide-and-conquer-approach etl-pipeline machine-learning pyspark python r text-similarity visualization master-data-management

masterize_hospital_entities's Introduction

  • 🌱 Always learning and growing as a Data Scientist and Software Engineer.
  • 💬 Ask me about the world of data or astrophysics, and I promise something interesting.
  • 👯 Reach out if you're looking to collaborate on Machine Learning side projects, or need a partner for Leetcode problem-solving.
  • ✨ Insert random inspirational quote: "The sooner you start to code, the longer the program will take".
  • ⚡ I'm not always fun at parties:
programming_languages = [
                  🐍 Python: (Pandas, Numpy, Tensorflow, Scikit, Flask, PySpark, Airflow, BeautifulSoup, etc.),
                  📊 R (ggplot, Tidyverse, etc.),
                  🧮 SQL,
                  ☕ Java (SpringBoot),
                  🐧 Linux Bash Scripting,
                  🇯🇸 JavaScript
]
predictive_models = [LLMs, Prompt Engineering, NLP, Timeseries Forecasting, Classification, Regression, Clustering, Ensembling, Transformers, ...]
statistics_and_exploratory_analytics = [Hypothesis Testing, Power Analyses, Mixed Effect Modeling, Regression Analyses, A/B Testing, ANOVA, ...]
databases = [SQL Server, Postgres, MySQL, MongoDB, ...]
cloud_services = [Azure Machine Learning, Azure Functions, Azure Blob Storage, GCP Cloud Functions, Google Cloud Composer, Google Cloud Storage, AWS S3, RDS, Sagemaker, ...]
tools = [
                  🏷️ Git,
                  🐳 Docker,
                  ☸️ Kubernetes,
                  😊 HuggingFace,
                  🏗️ Tensorflow,
                  Tableau,
                  Heroku,
                  Kafka,
                  Airflow,
                  Informatica Workflows,
                  Jira,
                  Bitbucket,
                  Postman,
                  JMeter,
                  ...
]
cloud_certifications = [
                  Azure Certified Data Scientist,
                  AWS Cloud Certified Practitioner,
                  Deep Learning Specialization,
                  ...
]

  • A Microsoft Teams Chatbot with a highly scalable backend for 90+ DAU using Azure OpenAI GPT4, HuggingFace gte4, intent-detection, advanced Retrieval Augmented Generation, and hybrid-search on a vector store (internal company project).
  • An NLP-based package to recommend cell-type annotations and help establish a controlled vocabulary in scRNA-seq datasets, using HuggingFace sentence-transformer models.
  • A multiple timeseries MLOps system for demand forecasting using ARIMA and FB Prophet models, and exogenous variables like discounts, price-hikes, number of housing-permits, consumer sentiment index, etc. (internship project).
  • A Credit Card fraud detection Random Forest based system deployed on GCP to monitor source-file changes using Cloud Functions and Airflow (Cloud Composer).
  • A distributed weather-reporting system deployed using microservices architecture in Kubernetes clusters for real-time streaming of inference-reports over Kafka topics.
  • An in-house master-data-management tool to recursively parse and merge subsets of data while tracking transitive dependencies, using PySpark, SQL Views, and Levenshtein similarity for string-comparisons.
  • An AI product to detect weapons in CCTV/webcam footage and immediately notify authorities using YOLOv3, Docker, Kubernetes, and Kafka.
  • Bronze medal for Revenue prediction via a Stacked ensemble-model of Gradient Boosting methods.
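
The master-data-management item in the list above mentions merging subsets of data while tracking transitive dependencies. One common way to handle that is a union-find structure: if record A matches B and B matches C, all three end up in one master group even though A and C were never compared directly. This is a sketch under that assumption, not the tool's actual implementation; `group_transitive_matches`, `record_ids`, and `matched_pairs` are hypothetical names.

```python
from collections import defaultdict


def group_transitive_matches(record_ids, matched_pairs):
    """Group records so that matches chain transitively (A~B, B~C => {A, B, C})."""
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Walk up to the group's root, shortening the path along the way.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for left, right in matched_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two groups

    groups = defaultdict(list)
    for rid in record_ids:
        groups[find(rid)].append(rid)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in groups.values())


# Records 1-2 and 2-3 were flagged as matches; 4 and 5 matched nothing.
print(group_transitive_matches([1, 2, 3, 4, 5], [(1, 2), (2, 3)]))
```

Each resulting group would then be collapsed into a single master record by whatever survivorship rule the pipeline applies.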

masterize_hospital_entities's People

Contributors

dependabot[bot], prachighalsasi, vikrantdeshpande09876
