aakashks / gmail_organizer

Organize your mailbox using this ML tool

Jupyter Notebook 70.07% Python 29.93%
classification machine-learning python text-classification gmail-api mailbox-automation restful-api

gmail_organizer's Introduction

GMail Organizer

Multi-Output Classification of Gmail Messages

A menu-based command-line application that automatically applies labels to messages in the user's Gmail account using classification.
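
Applying a label through the Gmail API comes down to a `users.messages.modify` request whose body lists the label IDs to add or remove. The sketch below only builds that request body. The helper name `build_label_request` and the label IDs are hypothetical, and the commented-out call assumes an already authorized `service` object from google-api-python-client; it is an illustration, not this project's actual code.

```python
from typing import Dict, Iterable, List


def build_label_request(add: Iterable[str], remove: Iterable[str] = ()) -> Dict[str, List[str]]:
    """Build the JSON body for a Gmail users.messages.modify call.

    Hypothetical helper for illustration; not taken from this project.
    """
    add_ids = sorted(set(add))
    remove_ids = sorted(set(remove))
    # Reject requests that try to add and remove the same label.
    overlap = set(add_ids) & set(remove_ids)
    if overlap:
        raise ValueError(f"label IDs both added and removed: {sorted(overlap)}")
    return {"addLabelIds": add_ids, "removeLabelIds": remove_ids}


# Placeholder label IDs; real IDs come from the user's own label list.
body = build_label_request(add=["Label_1", "Label_2"], remove=["INBOX"])
print(body)

# With an authorized client (assumed, not shown here) the call would look like:
# service.users().messages().modify(userId="me", id=message_id, body=body).execute()
```

Keeping the body construction separate from the network call makes the labeling step easy to check without credentials.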

Description

The application uses the Gmail API to read and modify Gmail messages. A Support Vector Classifier with a one-vs-rest (OvR) strategy performs the classification, although other classifiers such as RandomForestClassifier also work well. For natural language processing of the email text, I used a term frequency-inverse document frequency (TF-IDF) vectorizer to vectorize the corpus. It provides the advantage that

a high weight of the tf-idf calculation is reached when we have a high term frequency (tf) in the given document and a low document frequency of the term in the whole collection.
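
To make the quoted property concrete, here is a minimal sketch of the textbook weighting tf × log(N / df) in plain Python. The function name `tf_idf` and the toy corpus are invented for illustration. Library vectorizers such as scikit-learn's `TfidfVectorizer` use a smoothed and normalized variant by default, so their numbers will differ from these.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(corpus: List[List[str]]) -> List[Dict[str, float]]:
    """Return a {term: weight} mapping for each tokenized document.

    Uses the textbook form tf * log(N / df); libraries often use
    smoothed or normalized variants, so exact values will differ.
    """
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (invented for illustration, not the project's data).
emails = [
    "invoice payment due invoice".split(),
    "meeting schedule due monday".split(),
    "meeting notes due friday".split(),
]
scores = tf_idf(emails)
print(scores[0])
```

In the first document, "invoice" gets the highest weight because it is frequent there and absent elsewhere, while "due" gets a weight of zero because it occurs in every document.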

The goal of this application is to let users sort their emails quickly and save time. Model evaluation and other details are covered in this notebook.

Usage

The application can be set up very quickly; the only tricky part is generating the Gmail API key. For usage, please see the user guide.

Challenges faced

I tried to use KMeans clustering to cluster the emails, but the performance was poor. For this reason, I also discarded my plan to use semi-supervised ML.

All my attempts at using different ML techniques are described in the Jupyter notebooks.

Also, because the training dataset was small, I avoided complex NLP and classification techniques, as they would not have yielded a large improvement in performance.

Limitations

The models were trained on a small dataset of ~1200 emails that I labeled manually. I did not label every mail category, only a few categories that seemed important to me. As a result, the classifier leaves many emails unlabeled, since most of them resemble the ones I did not consider important.
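
Why an email can end up with no label at all follows from how one-vs-rest prediction typically works in a multi-label setting: each label has its own binary classifier, and a label is applied only when that classifier's decision score is positive. The sketch below assumes that setup. The function name, label names, and scores are invented stand-ins for real model output, not values from this project.

```python
from typing import Dict, List


def labels_from_scores(scores: Dict[str, float], threshold: float = 0.0) -> List[str]:
    """Return every label whose one-vs-rest decision score exceeds the threshold."""
    return sorted(label for label, score in scores.items() if score > threshold)


# Hypothetical per-label decision scores for two emails.
newsletter = {"finance": -1.2, "academics": -0.8, "travel": -0.3}
fee_notice = {"finance": 0.9, "academics": 0.4, "travel": -1.1}

print(labels_from_scores(newsletter))  # no classifier fires, so no label is applied
print(labels_from_scores(fee_notice))  # two classifiers fire, so two labels are applied
```

An email that resembles none of the labeled categories scores negative everywhere and is left untouched, which matches the behavior described above.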

The whole application has limited features because of time constraints.

Data Privacy

Your emails and personal data, whenever used, are stored only on your local storage.
Never upload sensitive files, especially credentials.json, token.json, and your Gmail API key.

References

GMail API reference
Rich library for colorful display
TFIDF vectorizer
