
nlp-classify-bank-transactions

NLP Experiments: How much do I spend on groceries?

What started as an intro to learning SpaCy has turned into a comparison of text classification methods on my bank transactions.

Not happy with how my bank was attempting to classify transactions, I thought I'd do it myself. I want to classify my transactions to figure out how much we spend on groceries, which is a variable amount at a variable frequency. The only signal we have is the transaction description, and that is quite variable too: we grocery shop at whichever shopping centre is closest to our other errands, so the descriptions change over time.

Turns out this is quite hard: bank transactions aren't full sentences, they carry very little context, and the words often aren't full words but concatenations of abbreviated words.

  • Existing pretrained language models are close to useless on this kind of text.
  • Word2vec has a lot of Out Of Vocabulary (OOV) issues.
  • Keyword matching on vocab seems to be useful.
  • Similar subwords should be predictors even when concatenated with a brand, e.g. "chemist", "chem", "pharmacy", "pharma", "pharm".
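
As a rough sketch of the subword idea in the last bullet: break a description into character n-grams and count how many it shares with a small keyword list per category. The function names, the keyword lists and the choice of 3-character n-grams are all assumptions made for illustration; none of them come from the notebooks in this repo.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of lowercase character n-grams in text (alphanumerics only)."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum())
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


# Hypothetical keyword lists per category, for illustration only.
CATEGORY_KEYWORDS = {
    "pharmacy": ["chemist", "pharmacy", "pharma"],
    "groceries": ["supermarket", "grocer"],
}


def match_categories(description: str, n: int = 3) -> dict[str, int]:
    """Count the n-grams each category's keywords share with the description."""
    desc_grams = char_ngrams(description, n)
    scores = {}
    for category, keywords in CATEGORY_KEYWORDS.items():
        # Pool the n-grams of every keyword in this category.
        keyword_grams = set().union(*(char_ngrams(k, n) for k in keywords))
        scores[category] = len(desc_grams & keyword_grams)
    return scores


# The misspelt, concatenated description still overlaps with "pharma".
print(match_categories("PRICELINEPHARAMA"))  # {'pharmacy': 2, 'groceries': 0}
```

The overlap survives both the brand prefix and the misspelling because "pha" and "har" are still present, which is the property plain keyword matching on whole words lacks.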

So I want to try to build a classifier that uses word2vec-style vectors to group similar words into concept vectors, with subword vectors helping to approximate those words inside something like "PRICELINEPHARAMA".
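
To make the subword-vector idea concrete, here is a minimal sketch of how a token that was never seen in training can still get a vector: hash each character n-gram into a fixed table and average the rows. This is only the general hashing trick, not the fastText or floret implementation. The table is filled with random numbers as a stand-in for trained weights, so the vectors carry no meaning, and `DIM`, `BUCKETS` and all function names are assumptions.

```python
import hashlib
import math
import random

DIM = 16        # vector width (illustrative value)
BUCKETS = 1000  # number of rows in the hash table (illustrative value)

# Random stand-in for a trained embedding table; a real model learns these rows.
_rng = random.Random(0)
TABLE = [[_rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(BUCKETS)]


def subwords(token: str, min_n: int = 3, max_n: int = 5) -> list[str]:
    """Character n-grams of the token, padded with < and > boundary markers."""
    padded = f"<{token.lower()}>"
    return [
        padded[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(padded) - n + 1)
    ]


def bucket(ngram: str) -> int:
    """Map an n-gram to a table row with a stable hash."""
    digest = hashlib.md5(ngram.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % BUCKETS


def embed(token: str) -> list[float]:
    """Average the table rows of all subwords; works for unseen tokens too."""
    grams = subwords(token)
    if not grams:
        return [0.0] * DIM
    vec = [0.0] * DIM
    for gram in grams:
        row = TABLE[bucket(gram)]
        for i in range(DIM):
            vec[i] += row[i]
    return [v / len(grams) for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


vector = embed("PRICELINEPHARAMA")  # no vocabulary lookup is needed
print(len(vector))
```

Because "PRICELINEPHARAMA" and "pharmacy" share n-grams such as "pha" and "phar", they share table rows, and with trained rows that shared signal is what would pull their vectors together.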

fastText has made a lot of progress in this space, and the team behind spaCy has forked fastText to create floret for subword embeddings.

Getting started

Download example data into data/

Then set up your dev environment:

python3 tasks.py init && . ./.venv/bin/activate
invoke lab

This will open jupyter-lab where experiments can be found in notebooks/

Project Lifecycle Tasks

 inv --list
Available tasks:

  format    Autoformat code, notebooks and sort imports.
  lint      Linting and formatting checks for quality control.
  test      Run test suite with dependency on lint task running first.
  lab       Launch jupyter lab instance.
  publish   Clean, format and run all notebooks.

TODO

References:
