nlp-practicum2021's Introduction

Machine Learning Guild - NLP Practicum

Textbook:

LESSONS

0. Configuration (Pre-work)

  • Topics: course overview, git bash, Python config.ini files, conda virtual environments
  • Technology: git bash, configparser, conda
  • Homework: use the command line to search for data across thousands of server configuration files
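
As a rough illustration of the configparser topic, the sketch below parses INI-style settings with Python's standard library. The section names, keys, and values are hypothetical and are not taken from the course materials.

```python
import configparser

# Hypothetical config.ini contents; sections and keys are illustrative only.
CONFIG_TEXT = """
[paths]
data_dir = data/raw
output_dir = data/processed

[extraction]
ocr_language = eng
max_pages = 50
"""


def load_settings(text):
    """Parse INI-formatted text and return the settings as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "data_dir": parser.get("paths", "data_dir"),
        "output_dir": parser.get("paths", "output_dir"),
        "ocr_language": parser.get("extraction", "ocr_language"),
        # getint converts the stored string to an int
        "max_pages": parser.getint("extraction", "max_pages"),
    }


settings = load_settings(CONFIG_TEXT)
print(settings)
```

In a real project the text would normally live in a config.ini file and be loaded with `parser.read("config.ini")` instead of `read_string`.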

1. Text Extraction

  • Topics: extract text from docx, pdf, and image files
  • Technology: docx, PyPDF2, pdfminer.six, subprocess, pytesseract
  • Homework: structure the annual reports into sections
  • Supplementary Material: watch lesson_databases videos

2. Text Preprocessing

  • Topics: POS tagging, dependency parsing, rule-based matching, phrase detection
  • Technology: spaCy, gensim
  • Prework: Read sections 2.1-2.4 SLP and/or 2.1-2.5 SLP videos, sections 8.1-8.3 SLP, and chapter 5 Collocations
  • Supplementary Material: watch lesson_automation videos

3. Text Vectorization (count-based methods)

  • Topics: vector space model, TF-IDF, BM25, co-occurrence matrix
  • Technology: scikit-learn
  • Prework: Read sections 6.1-6.6 SLP
  • Supplementary Material: watch lesson_object_oriented_python
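
To make the count-based idea concrete, here is a minimal pure-Python sketch of one common TF-IDF variant (raw term count multiplied by log(N / df)). The toy documents and the function name `tfidf` are assumptions for illustration; scikit-learn, which the lesson uses, applies a smoothed and normalized formula by default, so its numbers will differ.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


weights = tfidf(docs)
for doc, w in zip(docs, weights):
    print(doc, "->", sorted(w.items(), key=lambda kv: -kv[1])[:3])
```

Terms that appear in only one document ("cat", "mat") receive a higher idf factor than terms shared across documents ("sat", "on").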

4. Dimensionality Reduction

  • Topics: PCA, latent semantic indexing (LSI), latent Dirichlet allocation (LDA), topic coherence metrics
  • Technology: scikit-learn, gensim
  • Prework: Read TamingTextwiththeSVD

5. Word Embeddings

6. Deep Learning for NLP 1

7. Deep Learning for NLP 2

8. Text Similarity

  • Topics: cosine similarity, distance metrics, L1 and L2 norms, recommendation engines
  • Technology: scikit-learn, spaCy, gensim
  • Prework: Read section 2.5 SLP and/or 2.1-2.5 SLP videos
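
The norms and cosine similarity listed above can be sketched in a few lines of plain Python. This is an illustrative version with hypothetical function names, not the course's implementation; in practice the lesson's libraries provide optimized equivalents.

```python
import math


def l1_norm(v):
    """Sum of absolute values (Manhattan length)."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Square root of the sum of squares (Euclidean length)."""
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their L2 norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    denom = l2_norm(a) * l2_norm(b)
    if denom == 0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denom


print(l1_norm([3, -4]))                    # 7
print(l2_norm([3, -4]))                    # 5.0
print(cosine_similarity([1, 0], [0, 1]))   # 0.0
```

Cosine similarity depends only on the angle between vectors, so two documents with the same term proportions but different lengths score close to 1.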

SUPPLEMENTARY MATERIAL

Automation

  • Topics: automate the process to collect data from https://www.annualreports.com
  • Technology: requests, Jupyter Notebooks, BeautifulSoup, Scrapy
  • Homework: automate the process to identify and download company 10-K annual reports

Databases

  • Topics: use sqlalchemy to create and populate a database, locally and on AWS
  • Technology: sqlalchemy, SQLite, AWS RDS (MySQL)
  • Homework: create and populate a database with sqlalchemy

Object Oriented Python

  • Topics: reconstruct scikit-learn's CountVectorizer codebase
  • Technology: scikit-learn, object-oriented Python

CONTRIBUTORS

anhvinhdoanvo
