Edmund's Data Science Portfolio

  • Developed models to classify whether a given news article about the 2016 US presidential election is fake or reliable. The positive class indicates fake news and the negative class indicates reliable news.
  • Trained the models on the Kaggle Fake News Dataset, which contains 20,800 labelled news articles, about half of them fake and the other half reliable.
  • Used unsupervised learning (K-means clustering) to check for inherent patterns that separate fake from reliable articles without reference to the given labels.
  • Applied both Hashing Vectorisation and TF-IDF Vectorisation to the text data to determine which form of NLP vectorisation produced the best-performing model.
  • Tested a Passive Aggressive Classifier, Logistic Regression and an XGBoost Classifier.
  • The XGBoost Classifier on TF-IDF Vectorisation was the best-performing model, with an accuracy of 0.9978 on the test data.
  • Built a client-facing UI using Flask, WTForms and Jinja2.
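
The two vectorisation schemes differ in how a token is mapped to a column. The sketch below shows the idea in plain Python; it is not the project's code, which the source does not show. The toy corpus, the bucket count and the helper names `hashing_vector` and `tfidf_vectors` are illustrative assumptions, and the IDF uses one common smoothed form.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def hashing_vector(text, n_buckets=16):
    """Hashing trick: map each token to a bucket with a stable hash.

    No vocabulary is stored, so unseen words need no refit, but distinct
    tokens can collide in the same bucket.
    """
    vec = [0] * n_buckets
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % n_buckets] += 1
    return vec


def tfidf_vectors(corpus):
    """TF-IDF: term counts scaled by a smoothed inverse document frequency."""
    docs = [Counter(tokenize(d)) for d in corpus]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = {tok: sum(1 for doc in docs if tok in doc) for tok in vocab}
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    return vocab, [[doc[tok] * idf[tok] for tok in vocab] for doc in docs]


# Toy corpus (hypothetical headlines, not taken from the Kaggle dataset).
corpus = [
    "candidate wins the election",
    "shocking secret the candidate hides",
    "election results announced",
]
vocab, tfidf = tfidf_vectors(corpus)
hashed = [hashing_vector(doc) for doc in corpus]
```

In the TF-IDF vectors, a word that appears in a single document ("wins") gets a larger weight than one shared across documents ("the"), while the hashed vectors keep raw counts in a fixed number of columns.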

XGBoost Classifier with TF-IDF Vectorisation (best performance)

Accuracy: 0.9978365

    class  precision    recall  f1-score   support

        0       1.00      1.00      1.00      2046
        1       1.00      1.00      1.00      2114
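
The report's columns all derive from confusion-matrix counts. The supports sum to 4160, and 4151 / 4160 = 0.9978365, so the reported accuracy corresponds to 9 misclassified articles. The source does not give the confusion matrix, so the 5/4 split of those errors in the sketch below is hypothetical, as is the helper name `binary_report`.

```python
def binary_report(tp, fp, fn, tn):
    """Return accuracy plus precision, recall and F1 for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Supports from the report: 2046 reliable (class 0) and 2114 fake (class 1).
# The 5/4 split of the 9 errors is hypothetical; only their total is implied
# by the reported accuracy.
fp, fn = 5, 4
tp = 2114 - fn  # fake articles correctly flagged
tn = 2046 - fp  # reliable articles correctly passed
report = binary_report(tp, fp, fn, tn)
```

With so few errors, precision, recall and F1 all round to 1.00 at two decimal places, which is why the table shows 1.00 throughout even though the accuracy is below 1.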

Predictor Interface


  • Developed a classifier to identify whether a given cell is uninfected or parasitised by malaria, based on an image of the cell.
  • Utilised the Malaria dataset from the TensorFlow Datasets package.
  • Constructed tensors representing the cell images.
  • Trained a Deep Learning model containing dense and convolutional neural network layers.
  • Accuracy on test set: 0.9427
  • AUC Score on test set: 0.9843
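
Accuracy and AUC measure different things: accuracy depends on a decision threshold, while AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A plain-Python sketch of both follows. The labels and scores are made up, label 1 is treated as the positive class by assumption, and `roc_auc` and `accuracy_at` are illustrative names.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at(labels, scores, threshold=0.5):
    """Accuracy after thresholding the scores into 0/1 predictions."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Hypothetical model outputs for six cell images.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
auc = roc_auc(labels, scores)
acc = accuracy_at(labels, scores)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; library implementations use a sort-based computation instead.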

Example

  • Developed an estimator for the monthly salary of data-related jobs in Singapore (MAE ~ S$1345) to help jobseekers in the data field negotiate their income when landing a job.
  • Scraped about 300 job descriptions from MyCareersFuture using Python and Selenium.
  • Created features from the open-text job descriptions to quantify the value companies place on technologies relevant to data science (AWS, Python, SQL, R, Tableau, Excel, Power BI, Spark, Hadoop, TensorFlow).
  • Built models from common job parameters and the job-description features using 5 regression models (Multiple Linear, Lasso, Random Forest, Gradient Boosting and Bootstrap Aggregation), tuned with GridSearchCV.
  • Built a client-facing UI using Flask, WTForms and Jinja2.
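
One simple way to turn open-text job descriptions into features is a 0/1 flag per technology, sketched below together with the MAE metric quoted above. The source does not show how its features were built, so the regular expression, the `mentions_*` feature names and the example salaries are all assumptions.

```python
import re

# Technologies named in the project description.
TECHNOLOGIES = ["AWS", "Python", "SQL", "R", "Tableau", "Excel",
                "Power BI", "Spark", "Hadoop", "TensorFlow"]


def tech_features(description):
    """Return a 0/1 flag per technology mentioned in a job description."""
    features = {}
    for tech in TECHNOLOGIES:
        # Allow optional whitespace inside multi-word names
        # ("Power BI" and "PowerBI" both match).
        body = r"\s*".join(re.escape(part) for part in tech.split())
        # Lookarounds stop "R" matching inside "Spark" or "Excel"
        # matching inside "excellent".
        pattern = rf"(?<![A-Za-z0-9]){body}(?![A-Za-z0-9])"
        name = "mentions_" + tech.lower().replace(" ", "_")
        found = re.search(pattern, description, flags=re.IGNORECASE)
        features[name] = int(found is not None)
    return features


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted salaries."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical job description and salaries (S$ per month).
example = ("Build dashboards in PowerBI and Tableau; strong SQL and "
           "Python required. Spark is a plus.")
flags = tech_features(example)
mae = mean_absolute_error([5000, 7000, 9000], [5500, 6000, 9300])
```

Keyword flags are crude: a standalone "R" also matches strings such as "R&D", so short names would need extra context checks in practice.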

Predictor Interface


  • Developed a forecasting tool using Auto-ARIMA and FBProphet to forecast future closing prices of the S&P 500.
  • The forecasting tool is trained on the past 5 years of closing prices.
  • The Auto-ARIMA model has a Mean Absolute Percentage Error (MAPE) of 0.004912, meaning its forecasts deviated from the actual closing prices by about 0.49% on average over the validation period.
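
MAPE is the mean of |actual - forecast| / |actual| over the validation period, so it measures the average relative size of the forecast error, not how often a forecast is exactly right. A minimal sketch with made-up prices (the function name `mape` and all numbers are illustrative):

```python
def mape(actual, forecast):
    """Mean absolute percentage error as a fraction (0.01 means 1%)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return total / len(actual)


# Hypothetical closing prices and forecasts, for illustration only.
actual = [4000.0, 4020.0, 3990.0, 4050.0]
forecast = [4020.0, 4000.0, 4010.0, 4030.0]
error = mape(actual, forecast)
```

Here every forecast misses by 20 points, so none is exactly correct, yet the MAPE is only about 0.005 (0.5%).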

Predictor Interface


  • Developed an LSTM Deep Learning model to classify the sentiment of a given movie review (positive/negative).
  • Built a vocabulary using a Word2Vec model to identify relationships between words.
  • Built a Sequential Deep Learning model containing Embedding, Dropout, Convolutional, MaxPooling, LSTM and Dense layers.
  • Model accuracy on test data: 0.904
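
An Embedding layer takes fixed-length sequences of integer token ids, so reviews have to be encoded before they reach the model. The source does not show its preprocessing; the plain-Python sketch below is one common approach and does not reproduce the Word2Vec step. The reserved ids, `max_len` and the helper names `build_vocab` and `encode` are assumptions.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids: padding and out-of-vocabulary tokens


def build_vocab(reviews, max_words=10000):
    """Map the most frequent tokens to integer ids starting at 2."""
    counts = Counter(tok for review in reviews for tok in review.lower().split())
    most_common = counts.most_common(max_words)
    return {tok: idx + 2 for idx, (tok, _) in enumerate(most_common)}


def encode(review, vocab, max_len=6):
    """Convert a review to ids, truncated or right-padded to max_len."""
    ids = [vocab.get(tok, UNK) for tok in review.lower().split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical reviews, for illustration only.
reviews = ["a great movie", "a terrible boring movie"]
vocab = build_vocab(reviews)
encoded = encode("a great unseen film", vocab)
```

Words missing from the vocabulary ("unseen", "film") map to the `UNK` id, and short reviews are padded with `PAD` so that every sequence has the same length.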

Contributors

edologgerbird
