Hi 👋, I'm Yu Hsin

✨ A passionate Data Scientist

🎓 Data Science Graduate @ University of San Francisco

💼 Currently working as a Data Science intern @ Salk Institute

🤖 Machine Learning: Deep Learning, Regression, Decision Tree, Clustering, Gradient Boosting, XGBoost, Random Forest

⚒ MLOps: MLflow, Weights & Biases, DVC, Great Expectations, Metaflow, Airflow, Evidently, Streamlit

📫 How to reach me: message me on LinkedIn or by email

Languages and Tools

Python, Google Cloud, Apache Airflow, MongoDB, MySQL, Postgres, NumPy, Pandas, Plotly, PyTorch, scikit-learn, SciPy, Docker


Yuhsin Wang's Projects

adaboost-and-gradient-boosting

This repository contains two Python files that implement the AdaBoost and Gradient Boosting algorithms. Both are popular ensemble methods in machine learning; here AdaBoost is applied to classification and Gradient Boosting to regression.

alumni-profile-matching

Alumni Profile Matching is a project that facilitates networking between graduate students and alumni with similar backgrounds and career goals. Using machine learning techniques and data processing pipelines, it gives graduate students personalized recommendations of alumni profiles to connect with.

data-translation-pipeline

This repository contains a set of Python scripts that allow you to convert CSV (Comma-Separated Values) files to different formats such as HTML, JSON, and XML.
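As a rough illustration of this kind of conversion, the sketch below parses CSV text with the standard library and emits JSON, XML, and HTML. The function names (`read_rows`, `to_json`, `to_xml`, `to_html`) and the `rows`/`row` element names are hypothetical and are not taken from the repository; the XML writer also assumes the CSV header names are valid XML element names.

```python
import csv
import html
import io
import json
from xml.sax.saxutils import escape


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def to_json(rows):
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def to_xml(rows, root="rows", item="row"):
    """Serialize the rows as XML (assumes headers are valid element names)."""
    lines = [f"<{root}>"]
    for row in rows:
        lines.append(f"  <{item}>")
        for key, value in row.items():
            lines.append(f"    <{key}>{escape(value or '')}</{key}>")
        lines.append(f"  </{item}>")
    lines.append(f"</{root}>")
    return "\n".join(lines)


def to_html(rows):
    """Serialize the rows as a single-line HTML table."""
    if not rows:
        return "<table></table>"
    header = "".join(f"<th>{html.escape(k)}</th>" for k in rows[0])
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(v or '')}</td>" for v in row.values()) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"
```

For example, `to_json(read_rows("name,age\nAda,36\n"))` yields a JSON array with one object whose values are strings, since CSV carries no type information.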

decision-tree

This repository contains a Python implementation of a decision tree algorithm. The decision tree is a popular machine learning algorithm used for both classification and regression tasks. This implementation provides classes for building decision trees for classification and regression purposes.

feature-importance-and-selection

Feature importance refers to a measure of how important each feature/variable is in a dataset to the target variable or the model performance. It can be used to understand the relationships between variables and can also be used for feature selection to optimize the performance of machine learning models.

hashtable

HashTable is a Python class that implements a basic hash table data structure. A hash table, also known as a hash map, is a data structure that provides efficient storage and retrieval of key-value pairs. It is commonly used when values need to be looked up quickly by key.
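To make the idea concrete, here is a minimal sketch of a hash table that resolves collisions by separate chaining. The class name `ChainedHashTable` and its `put`/`get` methods are hypothetical; the repository's own class may use a different interface or collision strategy.

```python
class ChainedHashTable:
    """Hash table sketch: a fixed list of buckets, each a list of (key, value) pairs."""

    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        # Map the key's hash onto one of the buckets.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default
```

Lookups stay fast as long as the buckets stay short; this sketch never resizes, so a real implementation would typically grow the bucket list as the table fills.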

kmeans-algorithm

This repository contains a Python implementation of the K-Means algorithm. The K-Means algorithm is an unsupervised machine learning algorithm used for clustering data points into groups or clusters. It is a popular algorithm for data analysis, pattern recognition, and image compression.
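The clustering loop can be sketched in a few lines of plain Python. This is Lloyd's algorithm with random initial centroids, written under the assumption that points are tuples of numbers; the function names and parameters are hypothetical and the repository's implementation may differ (for example, in how it initializes centroids).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iters=100, seed=0):
    """Return (centroids, clusters) after running Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as starting centroids
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            mean(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable, so stop
            break
        centroids = new_centroids
    return centroids, clusters
```

The result depends on the initial centroids, which is why the sketch takes a `seed`; restarting from several seeds and keeping the best run is a common safeguard.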

log_analytics

This repository contains two Python scripts for log analytics: kafka_producer.py and spark_stream.py. These scripts are designed to work together to process log data using Apache Kafka and Apache Spark.

matrix-factorization

This repository contains a Python script mf.py that implements Matrix Factorization for collaborative filtering. Collaborative filtering is a technique used in recommendation systems to predict user preferences by collecting information from many users. Matrix Factorization is one of the popular methods used in collaborative filtering.
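One common way to fit such a model is stochastic gradient descent on the observed ratings. The sketch below is an illustration of that approach and is not the contents of mf.py: the `factorize` signature, the `(user, item, rating)` triple format, and the hyperparameter defaults are all assumptions.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rmse(ratings, P, Q):
    """Root mean squared error of the predictions P[u] . Q[i] on the given ratings."""
    total = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=300, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q
```

A missing rating for user `u` and item `i` is then predicted as `dot(P[u], Q[i])`.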

mini-python-projects

This repository contains a collection of six mini Python projects. Each project is a standalone script that demonstrates a different aspect of Python programming.

naive-bayes-classifier

This repository contains a Python implementation of the Naive Bayes classifier. The classifier is trained on a collection of documents and can predict the class of new documents based on their word features.
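A minimal sketch of how such a classifier can work, assuming a multinomial model with add-one (Laplace) smoothing over documents that are already tokenized into word lists. The class name `NaiveBayesSketch` and its methods are hypothetical; the repository may use different features or smoothing.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Multinomial Naive Bayes over tokenized documents, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, words):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for w in words:
                if w in self.vocab:  # words never seen in training are ignored
                    count = self.word_counts[label][w]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.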

performance-analysis-top-k-frequent-words

This project measures the performance of different approaches to finding the top-k frequent words in text, such as sorting, a max heap, and bucket sort. It provides insights into the runtime, CPU usage, and memory usage of these approaches when applied to tokenizing and processing text data.
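The three strategies can be sketched side by side as follows. These functions are illustrative and their names are hypothetical; the heap variant uses `heapq.nsmallest` on negated counts in place of a hand-written max heap, and all three break ties alphabetically so that they return the same answer.

```python
import heapq
from collections import Counter


def top_k_sort(words, k):
    """Sort every distinct word by descending count: O(n log n)."""
    counts = Counter(words)
    return sorted(counts, key=lambda w: (-counts[w], w))[:k]


def top_k_heap(words, k):
    """Heap-based selection of the k most frequent words."""
    counts = Counter(words)
    return heapq.nsmallest(k, counts, key=lambda w: (-counts[w], w))


def top_k_bucket(words, k):
    """Bucket sort: bucket c holds the words that occur exactly c times."""
    counts = Counter(words)
    buckets = [[] for _ in range(len(words) + 1)]
    for word, count in counts.items():
        buckets[count].append(word)
    result = []
    for count in range(len(buckets) - 1, 0, -1):  # highest counts first
        result.extend(sorted(buckets[count]))
        if len(result) >= k:
            break
    return result[:k]
```

Timing each function with `time.perf_counter` on the same token list is one simple way to compare them.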

random-forest

This repository contains a Python implementation of the Random Forest Regressor and Classifier. Random Forest is an ensemble learning method that combines multiple decision trees to make predictions. It is a powerful and widely used machine learning algorithm that can be applied to both regression and classification tasks.

regression

This repository contains a Python implementation of linear regression, logistic regression, and ridge regression algorithms. These algorithms are commonly used in machine learning and statistical modeling for various tasks such as predicting numerical values, classifying data into categories, and handling multicollinearity in regression models.

search-application

This repository contains a search application implemented in Python that allows you to search for specific terms within a collection of text files. The search application offers three different search algorithms: linear search, indexed search, and hashtable-based search.
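To show the trade-off between scanning and indexing, the sketch below contrasts a linear scan with a dict-based inverted index. It assumes documents are held in memory as a mapping from file name to text and are tokenized by lowercased whitespace splitting; the function names are hypothetical and it does not reproduce the repository's three variants exactly.

```python
from collections import defaultdict


def linear_search(docs, term):
    """Scan every document on each query: O(total text) per search."""
    return sorted(
        name for name, text in docs.items() if term in text.lower().split()
    )


def build_index(docs):
    """Build an inverted index once: word -> set of document names."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def index_search(index, term):
    """Answer a query with a single dictionary lookup."""
    return sorted(index.get(term, ()))
```

Building the index costs one pass over all documents, after which each query avoids rereading the text.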

tfidf-text-summarization

This repository contains Python scripts for performing TF-IDF (Term Frequency-Inverse Document Frequency) based text summarization. TF-IDF is a widely used technique in natural language processing and information retrieval to identify the most important words or phrases in a document collection.
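As a small worked example of the scoring behind this technique, the sketch below computes TF-IDF for tokenized documents and picks the highest-scoring terms. It assumes the plain unsmoothed form, tf = count / document length and idf = log(N / document frequency); libraries and the repository's scripts may use other variants, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {word: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        )
    return scores


def top_terms(score, k=3):
    """The k highest-scoring words, ties broken alphabetically."""
    return sorted(score, key=lambda w: (-score[w], w))[:k]
```

A word that appears in every document gets an idf of log(1) = 0, so it never ranks as important regardless of how often it occurs.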
