wangyuhsin

followers: 1 following: 0 repos: 19 gists: 0

Name: Yuhsin Wang

Type: User

Company: University of San Francisco

Bio: ✨ A passionate Data Scientist

Location: San Francisco, CA

Hi 👋, I'm Yu Hsin

✨ A passionate Data Scientist

🎓 Data Science Graduate @ University of San Francisco

💼 Currently working as a Data Science intern @ Salk Institute

🤖 Machine Learning: Deep Learning, Regression, Decision Tree, Clustering, Gradient Boosting, XGBoost, Random Forest

⚒ MLOps: MLflow, Weights & Biases, DVC, Great Expectations, Metaflow, Airflow, Evidently, Streamlit

📫 How to reach me: You can message me on LinkedIn or by email

Yuhsin Wang's Projects

adaboost-and-gradient-boosting

This repository contains two Python files that implement the AdaBoost and Gradient Boosting algorithms, two popular ensemble methods in machine learning. AdaBoost is used here for classification and Gradient Boosting for regression.

alumni-profile-matching

Alumni Profile Matching is a project aimed at facilitating networking between graduate students and alumni with similar backgrounds and career goals. By leveraging machine learning techniques and data processing pipelines, the project aims to provide graduate students with personalized recommendations of alumni profiles to connect with.

data-translation-pipeline

This repository contains a set of Python scripts that allow you to convert CSV (Comma-Separated Values) files to different formats such as HTML, JSON, and XML.
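As an illustration of the kind of conversion described above, here is a minimal sketch of a CSV-to-JSON step using only the standard library. It is not taken from the repository; the function name `csv_to_json` is hypothetical, and it assumes the first CSV row is a header.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (with a header row) to a JSON array of objects.

    Hypothetical helper for illustration; not the repository's own code.
    """
    # DictReader uses the first row as field names for every later row.
    reader = csv.DictReader(io.StringIO(csv_text))
    # All values stay strings because CSV carries no type information.
    return json.dumps(list(reader), indent=2)


if __name__ == "__main__":
    print(csv_to_json("name,age\nAda,36\nAlan,41\n"))
```

Keeping every value as a string avoids guessing column types; a real pipeline might add explicit type handling per column.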

decision-tree

This repository contains a Python implementation of a decision tree algorithm. The decision tree is a popular machine learning algorithm used for both classification and regression tasks. This implementation provides classes for building decision trees for classification and regression purposes.

feature-importance-and-selection

Feature importance measures how much each feature (variable) in a dataset contributes to predicting the target variable or to model performance. It can be used to understand the relationships between variables and to select features that improve the performance of machine learning models.

food-recognition-recipe-app

hashtable

HashTable is a Python class that implements a basic hash table data structure. A hash table, also known as a hash map, is a data structure that provides efficient storage and retrieval of key-value pairs. It is commonly used when values need to be looked up quickly by key.
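To make the idea concrete, here is a minimal sketch of a hash table that resolves collisions by separate chaining. It is an assumption about one common design, not the repository's implementation; the class name `ChainedHashTable` and its `put`/`get` methods are hypothetical.

```python
class ChainedHashTable:
    """Hash table with separate chaining (illustrative sketch only)."""

    def __init__(self, nbuckets=8):
        # Each bucket is a list of (key, value) pairs.
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        # Map the key's hash onto one of the buckets.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


if __name__ == "__main__":
    table = ChainedHashTable()
    table.put("apple", 3)
    table.put("apple", 5)
    print(table.get("apple"), table.get("pear"))
```

Lookups stay fast as long as the buckets remain short; this sketch never resizes, so a production version would grow the bucket array as the table fills.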

kmeans-algorithm

This repository contains a Python implementation of the K-Means algorithm. The K-Means algorithm is an unsupervised machine learning algorithm used for clustering data points into groups or clusters. It is a popular algorithm for data analysis, pattern recognition, and image compression.
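The clustering loop described above can be sketched in plain Python as follows. This is a minimal version of Lloyd's algorithm written for illustration, not the repository's code; the function name `kmeans`, its parameters, and the random initialization from the data points are all assumptions.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster tuples of numbers into k groups (illustrative sketch).

    Returns (centroids, clusters), where clusters[j] holds the points
    assigned to centroids[j] in the last assignment step.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k random data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster
            else centroids[j]  # keep the old centroid for an empty cluster
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(data, k=2)[0])
```

K-Means only finds a local optimum, so results can depend on the initial centroids; the fixed `seed` here just makes the sketch repeatable.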

log_analytics

This repository contains two Python scripts for log analytics: kafka_producer.py and spark_stream.py. These scripts are designed to work together to process log data using Apache Kafka and Apache Spark.

matrix-factorization

This repository contains a Python script mf.py that implements Matrix Factorization for collaborative filtering. Collaborative filtering is a technique used in recommendation systems to predict user preferences by collecting information from many users. Matrix Factorization is one of the popular methods used in collaborative filtering.
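As a rough illustration of the technique, the sketch below factorizes a sparse rating list into user and item factor matrices with stochastic gradient descent. It is not the contents of mf.py; the function names `factorize` and `predict`, the hyperparameter defaults, and the choice of SGD with L2 regularization are assumptions.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def factorize(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.02,
              epochs=500, seed=0):
    """Learn factors P (users) and Q (items) from (user, item, rating) triples.

    Illustrative sketch: plain SGD on squared error with L2 regularization.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the regularized squared error.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


if __name__ == "__main__":
    observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
    P, Q = factorize(observed, n_users=3, n_items=3)
    # Estimate a rating that was not observed (user 0, item 2).
    print(round(predict(P, Q, 0, 2), 2))
```

Only observed ratings enter the loss, which is what lets the learned factors fill in the missing entries of the user-item matrix.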

mini-python-projects

This repository contains a collection of 6 mini Python projects. Each project is a standalone script that demonstrates different aspects of Python programming.

naive-bayes-classifier

This repository contains a Python implementation of the Naive Bayes classifier. The classifier is trained on a collection of documents and can predict the class of new documents based on their word features.

performance-analysis-top-k-frequent-words

This project measures the performance of different text processing algorithms such as sorting, maxHeap, and bucketSort. It provides insights into the runtime, CPU usage, and memory usage of these algorithms when applied to tokenizing and processing text data.
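For readers unfamiliar with the three approaches named above, here is a minimal sketch of each applied to finding the k most frequent words. It is written for illustration and is not the project's benchmark code; the function names are hypothetical, ties are broken alphabetically by assumption, and the heap variant uses `heapq.nsmallest` on negated counts in place of a hand-written max-heap.

```python
import heapq
from collections import Counter


def top_k_sorting(words, k):
    """Sort all distinct words by (count descending, word ascending)."""
    counts = Counter(words)
    return sorted(counts, key=lambda w: (-counts[w], w))[:k]


def top_k_heap(words, k):
    """Select the k best words with a heap instead of a full sort."""
    counts = Counter(words)
    return heapq.nsmallest(k, counts, key=lambda w: (-counts[w], w))


def top_k_bucket(words, k):
    """Bucket sort: bucket c holds every word that occurs exactly c times."""
    counts = Counter(words)
    buckets = [[] for _ in range(len(words) + 1)]
    for word, count in counts.items():
        buckets[count].append(word)
    result = []
    # Walk buckets from the highest possible count down to 1.
    for count in range(len(buckets) - 1, 0, -1):
        for word in sorted(buckets[count]):
            result.append(word)
            if len(result) == k:
                return result
    return result


if __name__ == "__main__":
    tokens = "a b a c b a d".split()
    print(top_k_sorting(tokens, 2), top_k_heap(tokens, 2), top_k_bucket(tokens, 2))
```

All three return the same answer; they differ in how much work they do beyond the top k, which is the kind of trade-off a runtime and memory comparison would expose.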

random-forest

This repository contains a Python implementation of the Random Forest Regressor and Classifier. Random Forest is an ensemble learning method that combines multiple decision trees to make predictions. It is a powerful and widely used machine learning algorithm that can be applied to both regression and classification tasks.

regression

This repository contains a Python implementation of linear regression, logistic regression, and ridge regression algorithms. These algorithms are commonly used in machine learning and statistical modeling for various tasks such as predicting numerical values, classifying data into categories, and handling multicollinearity in regression models.

search-application

This repository contains a search application implemented in Python that allows you to search for specific terms within a collection of text files. The search application offers three different search algorithms: linear search, indexed search, and hashtable-based search.

tfidf-text-summarization

This repository contains Python scripts for performing TF-IDF (Term Frequency-Inverse Document Frequency) based text summarization. TF-IDF is a widely used technique in natural language processing and information retrieval for identifying the words or phrases that are most distinctive of a document within a collection.
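One simple way to turn TF-IDF into a summarizer is to score each sentence by the TF-IDF weight of its words and keep the highest-scoring sentences. The sketch below follows that idea; it is an assumption about the general approach rather than the repository's scripts, and the function name `summarize`, the regex-based tokenizer, and the treatment of each sentence as a "document" are all illustrative choices.

```python
import math
import re
from collections import Counter


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n = len(sentences)
    # Document frequency: in how many sentences does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))

    def score(tokens):
        if not tokens:
            return 0.0
        tf = Counter(tokens)
        # Sum of term frequency times inverse document frequency.
        return sum((tf[w] / len(tokens)) * math.log(n / df[w]) for w in tf)

    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    print(summarize("Cats sleep. Cats sleep a lot. Dogs bark loudly at strangers."))
```

Words that appear in every sentence get an IDF of zero, so sentences made up only of common words score low and are dropped first.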
