Kaggle Machine Learning competition for Home Depot dataset
Competition link: https://www.kaggle.com/c/home-depot-product-search-relevance
- Full search text match
- Unigram search text match
- Combination of full search text and unigram search text match
- Synonym search using gensim
- Bigram search text match
- LDA topic model using NLTK
- Read data from MongoDB
- Tokenization using NLTK
- Feature engineering: define a similarity calculator for each model
- Train RandomForestRegressor and RandomForestClassifier models using PySpark
- Evaluate the models and make predictions
- Store the prediction results in MongoDB
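
The text-match features listed above (full, unigram, bigram, and the combination of full and unigram) could be computed along these lines. This is a minimal sketch, not the repository's code: the function names, the regex tokenizer (a stand-in for NLTK tokenization), and the equal 0.5 weights in the combined score are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens.

    Assumption: a simple regex stand-in for the NLTK tokenizer.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the list of contiguous n-grams (as tuples) in tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def full_match(query, text):
    """1.0 if the whole query appears as a contiguous token run in text."""
    q = tokenize(query)
    if not q:
        return 0.0
    return 1.0 if tuple(q) in set(ngrams(tokenize(text), len(q))) else 0.0


def ngram_match(query, text, n):
    """Fraction of the query's n-grams that also occur in text."""
    q = ngrams(tokenize(query), n)
    if not q:
        return 0.0
    t = set(ngrams(tokenize(text), n))
    return sum(1 for g in q if g in t) / len(q)


def match_features(query, text):
    """Build one feature dict per (search term, product text) pair."""
    full = full_match(query, text)
    unigram = ngram_match(query, text, 1)
    return {
        "full": full,
        "unigram": unigram,
        "bigram": ngram_match(query, text, 2),
        # Hypothetical equal weighting; the real combination may differ.
        "combined": 0.5 * full + 0.5 * unigram,
    }
```

For example, `match_features("angle bracket", "Steel Angle Bracket 2-pack")` scores 1.0 on every feature, while a title containing only "angle" gets a unigram score of 0.5 and 0.0 for the full and bigram matches. Feature dicts like these would then be turned into the vectors fed to the random forest models.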