This project forked from rstudio-conf-2020/text-mining

Text Mining with Tidy Data Principles ✨📖✨

License: Creative Commons Attribution Share Alike 4.0 International

Text Mining with Tidy Data Principles Workshop

rstudio::conf 2020

by Julia Silge


๐Ÿ—“๏ธ January 27 and 28, 2020
โฐ 09:00 - 17:00
๐Ÿจ Franciscan Rooms A-B (Ballroom Level)
โœ๏ธ bit.ly/silge-rstudioconf-1 and bit.ly/silge-rstudioconf-2


Overview

Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight? Are you familiar with dplyr and ggplot2, and ready to learn how unstructured text data can be analyzed within the tidyverse ecosystem? Do you need a flexible framework for handling text data that allows you to engage in tasks from exploratory data analysis to supervised predictive modeling?

Text data is increasingly important in many domains, and tidy data principles and tidy tools can make text mining easier and more effective. In this workshop, learn how to manipulate, summarize, and visualize the characteristics of text using these methods and R packages from the tidy tool ecosystem. These tools are highly effective for many analytical questions and allow analysts to integrate natural language processing into effective workflows already in wide use. Explore how to implement approaches such as sentiment analysis of texts, measuring tf-idf, network analysis of words, and building both supervised and unsupervised text models.
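As a rough illustration of the tidy approach described above, the sketch below tokenizes two toy documents into a one-word-per-row table and then measures tf-idf. The data frame `docs` and its contents are invented for this example and are not taken from the workshop materials; the functions used (`unnest_tokens()`, `count()`, `bind_tf_idf()`) come from the tidytext and dplyr packages.

```r
library(dplyr)
library(tidytext)

# Toy corpus (hypothetical, for illustration only)
docs <- tibble(
  doc  = c("physics", "cooking"),
  text = c("the star emits light and the star burns",
           "the soup needs salt and the soup simmers")
)

# Tidy the text: one row per word per document, then count words
word_counts <- docs %>%
  unnest_tokens(word, text) %>%
  count(doc, word, sort = TRUE)

# Add tf, idf, and tf_idf columns; words shared by every document get idf = 0
doc_tf_idf <- word_counts %>%
  bind_tf_idf(word, doc, n) %>%
  arrange(desc(tf_idf))

doc_tf_idf
```

Words such as "the" and "and" appear in both documents, so their tf-idf is zero, while words specific to one document (here "star" and "soup") rise to the top.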

Learning objectives

At the end of this course, participants will be able to:

  • Perform exploratory data analyses of text datasets, including summarization and data visualization
  • Understand and implement both tf-idf and sentiment analysis
  • Build classification models for text using tidy data principles
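To give a flavor of the sentiment analysis objective, here is a minimal sketch that joins tokenized text against a sentiment lexicon. The `reviews` data frame is a made-up example, not workshop data, and the choice of the Bing lexicon via `get_sentiments("bing")` is an assumption for illustration; the workshop may use other lexicons.

```r
library(dplyr)
library(tidytext)

# Hypothetical example texts
reviews <- tibble(
  id   = c(1, 2),
  text = c("What a wonderful, happy day",
           "This was a terrible and awful film")
)

# Tokenize, keep only words found in the lexicon, and count by sentiment
sentiment_counts <- reviews %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(id, sentiment)

sentiment_counts
```

Because the text is already in a one-word-per-row format, the lexicon lookup is an ordinary `inner_join()`, and the result can be summarized or plotted with the usual dplyr and ggplot2 tools.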

Is this course for me?

This course will be appropriate for you if you answer yes to these questions:

  • Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight?
  • Are you familiar with dplyr and ggplot2, and ready to learn how unstructured text data can be analyzed within the tidyverse ecosystem?
  • Do you need a flexible framework for handling text data that allows you to engage in tasks from exploratory data analysis to supervised predictive modeling?

Prework

During this workshop, we'll share code and slides via a GitHub repo and code interactively together using an RStudio Cloud project. You can log in to RStudio Cloud via Google credentials, GitHub credentials, or email. Go ahead and log in with your choice of method before we meet in person so you see what the platform looks like. Bring your laptop to the course and we'll explore text mining together!

Schedule

Day 1

Time Activity
09:00 - 10:30 Tidying text
10:30 - 11:00 Coffee break
11:00 - 12:30 Sentiment analysis
12:30 - 13:30 Lunch break
13:30 - 15:00 tf-idf
15:00 - 15:30 Coffee break
15:30 - 17:00 N-grams, and wrap-up!

Day 2

Time Activity
09:00 - 10:30 Topic modeling
10:30 - 11:00 Coffee break
11:00 - 12:30 Finish up topic modeling
12:30 - 13:30 Lunch break
13:30 - 15:00 Text classification
15:00 - 15:30 Coffee break
15:30 - 17:00 Finish text classification, and wrap-up! ✨

Instructor

Julia Silge is a data scientist and software engineer. She is both an international keynote speaker and a real-world practitioner focusing on data analysis and machine learning practice. She coauthored Text Mining with R with David Robinson and has a PhD in astrophysics. She loves making beautiful charts and communicating about technical topics with diverse audiences.


This work is licensed under a Creative Commons Attribution 4.0 International License.

Contributors

juliasilge, mine-cetinkaya-rundel
