Yuhsin Wang's Projects
This repository contains two Python files that implement the AdaBoost and Gradient Boosting algorithms. Both are popular ensemble methods in machine learning; here AdaBoost is used for classification tasks and Gradient Boosting for regression tasks.
Alumni Profile Matching is a project that facilitates networking between graduate students and alumni with similar backgrounds and career goals. Using machine learning techniques and data processing pipelines, it gives graduate students personalized recommendations of alumni profiles to connect with.
This repository contains a set of Python scripts that allow you to convert CSV (Comma-Separated Values) files to different formats such as HTML, JSON, and XML.
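As a rough illustration of this kind of conversion, here is a minimal sketch that uses only the standard library. The function names (`read_rows`, `to_json`, `to_xml`) and the `file`/`record` element names are hypothetical and are not taken from the repository's scripts. The sketch also assumes that the CSV has a header row and that the column names are valid XML tag names.

```python
import csv
import io
import json
from xml.sax.saxutils import escape


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def to_json(rows):
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def to_xml(rows, root="file", record="record"):
    """Serialize the rows as simple XML, one child element per column.

    Assumption: column names are valid XML tag names (no spaces, etc.).
    """
    lines = [f"<{root}>"]
    for row in rows:
        lines.append(f"  <{record}>")
        for key, value in row.items():
            # escape() protects &, < and > inside cell values.
            lines.append(f"    <{key}>{escape(value)}</{key}>")
        lines.append(f"  </{record}>")
    lines.append(f"</{root}>")
    return "\n".join(lines)


rows = read_rows("name,age\nAda,36\nAlan,41\n")
print(to_json(rows))
print(to_xml(rows))
```

Parsing once into a list of dicts and writing one small serializer per output format keeps each format independent, so an HTML table writer could be added in the same way.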
This repository contains a Python implementation of a decision tree algorithm. The decision tree is a popular machine learning algorithm used for both classification and regression tasks. This implementation provides classes for building decision trees for classification and regression purposes.
Feature importance measures how much each feature (variable) in a dataset contributes to predicting the target variable or to a model's performance. It helps in understanding the relationships between variables and can guide feature selection to improve the performance of machine learning models.
HashTable is a Python class that implements a basic hash table data structure. A hash table, also known as a hash map, stores key-value pairs and supports efficient insertion and retrieval. It is commonly used when values need to be looked up quickly by key.
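To make the lookup idea concrete, here is a minimal sketch of a hash table that resolves collisions by chaining. The class name `ChainedHashTable`, its methods, and the fixed bucket count are assumptions made for illustration; the repository's HashTable class may be organized differently.

```python
class ChainedHashTable:
    """Hypothetical sketch: fixed number of buckets, collisions kept in lists."""

    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        # hash() maps the key to an integer; the modulo picks a bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


table = ChainedHashTable(nbuckets=2)
table.put("apple", 1)
table.put("pear", 2)
table.put("apple", 3)
print(table.get("apple"), table.get("pear"), table.get("plum"))
```

Lookups stay fast as long as the buckets remain short, which is why production hash tables grow the bucket array as entries are added; this sketch omits resizing.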
This repository contains a Python implementation of the K-Means algorithm. The K-Means algorithm is an unsupervised machine learning algorithm used for clustering data points into groups or clusters. It is a popular algorithm for data analysis, pattern recognition, and image compression.
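The clustering loop can be sketched in plain Python as follows. This is an illustrative version of the standard assign-then-update iteration (Lloyd's algorithm), not the repository's code. The function names are hypothetical, and seeding the centroids with the first k points is a simplifying assumption; practical implementations usually choose the initial centroids randomly or with k-means++.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, max_iterations=100):
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # Move the centroid to the mean of its assigned points.
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, assign(points, centroids)


centroids, labels = kmeans([(0, 0), (10, 10), (0, 1), (10, 11)], k=2)
print(centroids, labels)
```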
This repository contains two Python scripts for log analytics: kafka_producer.py and spark_stream.py. These scripts are designed to work together to process log data using Apache Kafka and Apache Spark.
This repository contains a Python script mf.py that implements Matrix Factorization for collaborative filtering. Collaborative filtering is a technique used in recommendation systems to predict user preferences by collecting information from many users. Matrix Factorization is one of the popular methods used in collaborative filtering.
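One common way to fit such a factorization is stochastic gradient descent over the observed ratings. The sketch below illustrates that idea under stated assumptions and does not reproduce mf.py: the function names, the (user, item, rating) triple format, and the hyperparameter values (`k`, `lr`, `reg`, `epochs`) are all hypothetical.

```python
import math
import random


def predict_rating(p, q):
    """Predicted rating: dot product of a user vector and an item vector."""
    return sum(a * b for a, b in zip(p, q))


def rmse(ratings, P, Q):
    """Root mean squared error over the observed (user, item, rating) triples."""
    total = sum((r - predict_rating(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    rng = random.Random(seed)
    # Random starting factors: one k-dimensional vector per user and per item.
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict_rating(P[u], Q[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(round(rmse(ratings, P, Q), 3))
```

Once the factors are learned, `predict_rating(P[u], Q[i])` gives an estimate for a user-item pair that has no observed rating, which is what makes the factorization useful for recommendation.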
This repository contains a collection of six mini Python projects. Each project is a standalone script that demonstrates a different aspect of Python programming.
This repository contains a Python implementation of the Naive Bayes classifier. The classifier is trained on a collection of documents and can predict the class of new documents based on their word features.
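For readers unfamiliar with the method, a multinomial Naive Bayes text classifier can be sketched as below. The function names, the (label, text) input format, whitespace tokenization, and add-one (Laplace) smoothing are assumptions chosen for illustration; the repository's implementation may differ on each point.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs is a list of (label, text) pairs; returns the counts the model needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in docs:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train([
    ("spam", "win money now"),
    ("spam", "win prize"),
    ("ham", "meeting at noon"),
    ("ham", "lunch at noon"),
])
print(classify(model, "win money"), classify(model, "noon meeting"))
```

Working in log space turns the product of many small word probabilities into a sum, which avoids numerical underflow on long documents.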
This project measures the performance of different text processing algorithms such as sorting, maxHeap, and bucketSort. It provides insights into the runtime, CPU usage, and memory usage of these algorithms when applied to tokenizing and processing text data.
This repository contains a Python implementation of the Random Forest Regressor and Classifier. Random Forest is an ensemble learning method that combines multiple decision trees to make predictions. It is a powerful and widely used machine learning algorithm that can be applied to both regression and classification tasks.
This repository contains a Python implementation of linear regression, logistic regression, and ridge regression algorithms. These algorithms are commonly used in machine learning and statistical modeling for various tasks such as predicting numerical values, classifying data into categories, and handling multicollinearity in regression models.
This repository contains a search application implemented in Python that allows you to search for specific terms within a collection of text files. The search application offers three different search algorithms: linear search, indexed search, and hashtable-based search.
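The difference between scanning and indexing can be illustrated with a short sketch. The function names and the in-memory `docs` mapping from file name to text are hypothetical, and matching is done on whole lowercase words split on whitespace, which is a simplifying assumption rather than the application's actual tokenization.

```python
def linear_search(docs, term):
    """Scan every document on each query: O(total words) per search."""
    term = term.lower()
    return [name for name, text in docs.items() if term in text.lower().split()]


def build_index(docs):
    """Build an inverted index once: word -> list of documents containing it."""
    index = {}
    for name, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(name)
    return index


def index_search(index, term):
    """Answer a query with a single dictionary lookup."""
    return index.get(term.lower(), [])


docs = {
    "a.txt": "The quick brown fox",
    "b.txt": "A lazy dog",
    "c.txt": "quick thinking",
}
index = build_index(docs)
print(linear_search(docs, "quick"), index_search(index, "quick"))
```

The index costs time and memory to build up front, but each later query no longer depends on the size of the collection. Python's `dict` is itself a hash table, so this sketch is closest in spirit to the hashtable-based variant.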
This repository contains Python scripts for performing TF-IDF (Term Frequency-Inverse Document Frequency) based text summarization. TF-IDF is a widely used technique in natural language processing and information retrieval to identify the most important words or phrases in a document collection.
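The scoring step behind this technique can be sketched as follows. The sketch assumes the plain definition, term frequency multiplied by log(N / document frequency), with whitespace tokenization; the function name `tfidf` is hypothetical, and the repository's scripts may use a smoothed or normalized variant.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {word: score} dict per document in docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


docs = ["the cat sat on the mat", "the dog chased the cat", "birds sing at dawn"]
for doc_scores in tfidf(docs):
    # The highest-scoring words are candidates for a summary.
    print(sorted(doc_scores, key=doc_scores.get, reverse=True)[:2])
```

A word that appears in every document gets a score of zero, because log(N / N) = 0, so common words drop out and terms distinctive to a single document rise to the top.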