Coder Social home page Coder Social logo

saturiotdv / twitterclassification Goto Github PK

View Code? Open in Web Editor NEW

This project forked from hazelbasil/twitterclassification

0.0 1.0 0.0 2.1 MB

Python code that cleans, collects, and prepares Twitter user profile classification

Python 61.79% Jupyter Notebook 38.21%

twitterclassification's Introduction

TwitterClassification

TwitterClassification is a set of scripts and methods that collect and analyze Twitter user data. This repository includes descipriton for a Twitter user sampling method that specifically helps filter outs institutional users (mostly media, journalism professionals). The further description of the methods and the background of the study are found in

K. Hazel Kwon, J. Hunter Priniski, & Monica Chadha (2018). Disentangling user samples: A machine learning approach to proxy-population mismatch in Twitter research. Communication Methods and Measures. Online before print: http://www.tandfonline.com/doi/full/10.1080/19312458.2018.1430755

CITATION: Please cite the journal publication above when using this repository.

Files:

  1. 18CMM: The method paper (published)
  2. ManuallyLabeled_All.csv : The csv file includes 8945 user profile description that human coders labeled as either general public user profile (1) or not. The profile texts were collected using StreamAPI in the context of Mesa Shooting in Metro-Phoenix area (2015); Boston Bombing (2013); Bruessls Airport Bombing (2016); Quebect Mosque Shooting (2017); and the random sampling of profile texts in 2017.

Dependicies:

  1. Twython
  2. Scikit-learn : bag-of-words model, Random Forest Model.
  3. Natural Language Toolkit: Snowball stemmer, English Stop-words dictionary

Scripts:

  1. get_twitter_user_descriptions.py : collect information on up to 300 Twitter User Descriptions outputted to a .csv file.

  2. get_user_information.py: General version of getTwitterUserDescriptions.py: collect more data than just descriptions and timezone information (which is the functionality of get_Twitter_User_Descriptions.py).

  3. account_classification.py: Classes and methods that implement supervised machine learning protocols that classify Twitter profiles as whether a profile is a general public user account or not (such as a business', public figure's, news reporter's, promotional accounts). For references on how to use this code, see below. Also, references the example.txt document.

  4. concreteness.py. To understand the concreteness of a tweet (how concrete of language a user employs in their tweets gives us a proximal coefficient, which informs us about how close the tweeter is to the event). This function will calculate the average concreteness rating of a sentence (total concreteness score divided by the amount of concrete words). The concreteness ratings are taken from Marc Brysbaert, Amy Beth Warriner, and Victor Kuperman's work (2014) found here (http://crr.ugent.be/papers/Brysbaert_Warriner_Kuperman_BRM_Concreteness_ratings.pdf)

Additional information:

To use account_classification.py, the data must be structured as bellow.

training data:

user_ID'\t'Description'\t'Personal some id'\t'some description'\t'1 some id'\t'some description'\t'0 some id'\t'some description'\t'1 ...

testing data:

user_ID'\t'Description
some id'\t'some description
some id'\t'some description
some id'\t'some description

Where 1 = general public users, 0 = others (mainly media professional, inistitutional, and promotional accounts). user_ID can either be a string or an int data type. Furthermore, both the training and testing data should be saved as a .txt file with a tab ('\t') separator. **Make sure all data is saved in utf-8 format. To encode entire data frames we recommend using R.

twitterclassification's People

Contributors

hazelbasil avatar jpriniski avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.