
girsigit / tweets-collector


This project forked from motazsaad/tweets-collector


Collect tweets (a tweet corpus) using the Twitter API. Collection can be based on hashtags, keywords, or geographical location.

License: Apache License 2.0

Python 100.00%


Install requirements

pip install -r requirements.txt

Getting your API keys from Twitter

  1. Go to https://apps.twitter.com and create a new app.

  2. Provide a name and description for the app, then specify its permissions.

  3. Go to the keys and access management tab.

  4. Put the keys and tokens in the credentials.txt and api_keys.py files.

query_tweets.py Usage

usage: query_tweets.py [-h] -k KEYWORDS_FILE -o OUTFILE -n NUMBER

collect tweets based on keywords

optional arguments:
  -h, --help            show this help message and exit
  -k KEYWORDS_FILE, --keywords-file KEYWORDS_FILE
                        keywords or hashtags file. The file should contain one
                        keyword/hashtag per line
  -o OUTFILE, --outfile OUTFILE
                        the output json file path and prefix.
  -n NUMBER, --number NUMBER
                        the number of tweets that you want to collect
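
The help text above says the keywords file holds one keyword or hashtag per line. As a minimal sketch of that format (not the script's actual parsing code), the helper below reads such text, trims whitespace, and skips blank lines and repeats. The names `parse_keywords` and `load_keywords` are hypothetical, and dropping duplicates is an assumption rather than documented behaviour.

```python
from pathlib import Path


def parse_keywords(text):
    """Return stripped, non-empty lines in order, without repeats.

    Hypothetical helper: the one-entry-per-line format comes from the
    help text; skipping duplicates is an assumption.
    """
    keywords = []
    seen = set()
    for line in text.splitlines():
        keyword = line.strip()
        if keyword and keyword not in seen:
            seen.add(keyword)
            keywords.append(keyword)
    return keywords


def load_keywords(path):
    """Read a UTF-8 keywords file and parse it with parse_keywords."""
    return parse_keywords(Path(path).read_text(encoding="utf-8"))
```

With a file such as keywords.txt prepared this way, a run would look like `python query_tweets.py -k keywords.txt -o tweets -n 1000`, where the file name, output prefix, and count are placeholder values.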


json2text.py Usage

usage: json2text.py [-h] -i JSON_DIR -o OUT_DIR [--exclude-redundant]
                    [--include-id] [-n] [--remove-repeated-letters]
                    [--keep-only-arabic]

extract tweet texts from json

optional arguments:
  -h, --help            show this help message and exit
  -i JSON_DIR, --json-dir JSON_DIR
                        tweets json directory
  -o OUT_DIR, --out-dir OUT_DIR
                        the output directory.
  --exclude-redundant   exclude redundant tweets
  --include-id          include tweet id
  -n, --normalize       normalize text
  --remove-repeated-letters
                        remove repeated letters (+2 consecutive) from text
  --keep-only-arabic    only keep Arabic words
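
The options above only name the cleaning steps, so the following is a rough sketch of what such steps could look like, not the code in json2text.py. It assumes that "repeated letters" means a run of three or more identical letters, collapsed here to two, and that an "Arabic word" is a whitespace-separated token made only of characters from the basic Arabic Unicode block (U+0600 to U+06FF). All function names are hypothetical.

```python
import re

# Assumption: a run of 3+ identical letters is collapsed to 2.
# [^\W\d_] matches letters only, so digit runs such as "1000" are untouched.
_REPEATED_LETTERS = re.compile(r"([^\W\d_])\1{2,}")

# Assumption: an Arabic word uses only the basic Arabic block U+0600..U+06FF.
_ARABIC_WORD = re.compile(r"[\u0600-\u06FF]+")


def remove_repeated_letters(text):
    """Collapse letter runs longer than two into exactly two letters."""
    return _REPEATED_LETTERS.sub(r"\1\1", text)


def keep_only_arabic(text):
    """Keep whitespace-separated tokens made only of Arabic characters."""
    return " ".join(
        token for token in text.split() if _ARABIC_WORD.fullmatch(token)
    )


def exclude_redundant(texts):
    """Drop exact duplicate texts while preserving first-seen order."""
    seen = set()
    unique = []
    for text in texts:
        if text not in seen:
            seen.add(text)
            unique.append(text)
    return unique
```

Collapsing to two letters rather than one keeps legitimately doubled letters intact; whether the script makes the same choice is not stated in the help text.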

stream_geolocation.py Usage

Get the geolocation coordinates from http://boundingbox.klokantech.com/

usage: stream_geolocation.py [-h] -l GEO_LOCATIONS -j JSON -n NUMBER

collect tweets based on geographic location

optional arguments:
  -h, --help            show this help message and exit
  -l GEO_LOCATIONS, --geo-locations GEO_LOCATIONS
                        geo location coordinates from
                        http://boundingbox.klokantech.com copy and paste using
                        csv option
  -j JSON, --json JSON  the json output file.
  -n NUMBER, --number NUMBER
                        the number of tweets that you want to collect
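
Before passing coordinates to the script, it can help to check that the copied text is a well-formed bounding box. The sketch below assumes the CSV option of the bounding box tool yields four comma-separated numbers in the order west, south, east, north (longitude, latitude, longitude, latitude); that ordering is an assumption here, and `parse_bounding_box` is a hypothetical helper, not part of stream_geolocation.py.

```python
def parse_bounding_box(csv_text):
    """Parse 'west,south,east,north' into a list of four floats.

    Assumed order: south-west corner (lon, lat), then north-east
    corner (lon, lat). Raises ValueError on malformed input.
    """
    parts = [part.strip() for part in csv_text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated values, got {len(parts)}")
    west, south, east, north = (float(part) for part in parts)
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return [west, south, east, north]
```

For example, `parse_bounding_box("34.2, 31.2, 34.6, 31.6")` accepts an illustrative box, while a string with only three values raises `ValueError`. The check does not require west to be smaller than east, so boxes that cross the antimeridian are not rejected.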


stream_users.py Usage

Get user IDs from https://tweeterid.com

usage: stream_users.py [-h] -u USERS -j JSON -n NUMBER

collect tweets based on following twitter users

optional arguments:
  -h, --help            show this help message and exit
  -u USERS, --users USERS
                        twitter user ids file. Get ids from tweeterid.com
  -j JSON, --json JSON  the json output file.
  -n NUMBER, --number NUMBER
                        the number of tweets that you want to collect
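
The users file is expected to contain numeric Twitter user IDs rather than screen names. As a small sketch for catching mistakes early, the hypothetical helper below validates such text, assuming one ID per line; the help text does not state the file layout, so that layout is an assumption.

```python
def parse_user_ids(text):
    """Return numeric user IDs as strings, one per non-empty line.

    Assumption: one ID per line. Raises ValueError for anything that
    is not made of ASCII digits (for example a screen name).
    """
    user_ids = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        value = line.strip()
        if not value:
            continue  # ignore blank lines
        if not (value.isascii() and value.isdigit()):
            raise ValueError(
                f"line {line_number}: {value!r} is not a numeric user ID"
            )
        user_ids.append(value)
    return user_ids
```

IDs are kept as strings so that very large values are passed through unchanged.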

user_tweets_history.py Usage

Get the most recent tweets of a user.

usage: user_tweets_history.py [-h] -u USER

emoji list

The positive/negative emoji list is obtained from https://emojipedia.org/
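
The README does not say how the emoji list is used. One plausible use, given the sentiment analysis section that follows, is to assign a rough label to a tweet from the emoji it contains. The sketch below illustrates that idea only; the function name and the two tiny emoji sets are hypothetical samples, not the project's actual list or method.

```python
# Hypothetical sample sets; the project's real list comes from emojipedia.org.
POSITIVE_EMOJI = frozenset({"\U0001F600", "\U0001F60D", "\u2764"})
NEGATIVE_EMOJI = frozenset({"\U0001F620", "\U0001F622", "\U0001F494"})


def label_by_emoji(text, positive=POSITIVE_EMOJI, negative=NEGATIVE_EMOJI):
    """Return 'positive', 'negative', or None based on emoji counts.

    None means no emoji from either set was found, or the counts tied.
    """
    positive_count = sum(text.count(emoji) for emoji in positive)
    negative_count = sum(text.count(emoji) for emoji in negative)
    if positive_count > negative_count:
        return "positive"
    if negative_count > positive_count:
        return "negative"
    return None
```

Returning `None` for ties and for emoji-free text leaves ambiguous tweets unlabeled instead of forcing a guess.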

Sentiment Analysis in Arabic tweets

Please check the article https://mksaad.wordpress.com/2018/12/07/sentiment-analysis-in-arabic-tweets-with-python/

Contributors

motazsaad
