Tweets-Analyzer

Tweets Dataset

This repository includes three datasets under the tweets folder, each stored as a txt file.

  • date=2022-08-29/hour=20/tweets.txt: contains 51,676 tweets
  • date=2022-09-01/hour=10/tweets.txt: contains 50,033 tweets
  • date=2022-09-02/hour=10/tweets.txt: contains 889 tweets

Sample format of tweets.txt

[
    {
        "id": "1565655773933457408",
        "author_id": "1510724946657517580",
        "text": "RT @dos_xyz: Hello Solana, We are pleased to announce our Hackathon Submission: DreamOS \ud83d\udcab\n\nA new user experience for crypto - it is the fir\u2026",
        "created_at": "2022-09-02T10:59:59.000Z"
    },
    {
        "id": "1565655773488812033",
        "author_id": "70647170",
        "text": "RT @joshtokitaaaaa: HELLO P-POP KINGS!! \ud83d\udc51\n\n@SB19Official #SB19\n#WYAT #WhereYouAtSB19 https://t.co/ZoXd2Ihhgi",
        "created_at": "2022-09-02T10:59:59.000Z"
    }
]
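
Judging by the sample above, each tweets.txt holds a single JSON array of tweet objects, so the standard json module should be enough to read it. The following is a minimal sketch under that assumption; load_tweets is a hypothetical helper and not part of this repository.

```python
import json
from pathlib import Path


def load_tweets(path):
    """Read a tweets.txt file and return its tweets as a list of dicts."""
    # Assumption: the whole file is one JSON array, as in the sample format.
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Inline sample in the same shape as tweets.txt (shortened text).
SAMPLE = """
[
    {
        "id": "1565655773488812033",
        "author_id": "70647170",
        "text": "HELLO P-POP KINGS!! \\ud83d\\udc51",
        "created_at": "2022-09-02T10:59:59.000Z"
    }
]
"""

tweets = json.loads(SAMPLE)
# json decodes the \\ud83d\\udc51 surrogate pair into a single emoji character.
print(len(tweets), tweets[0]["created_at"], tweets[0]["text"])
```

To read a real dataset, pass one of the paths listed above, for example load_tweets("tweets/date=2022-09-02/hour=10/tweets.txt").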

Usage

downloader.py

Downloads tweets from the Twitter API and saves them to local files (tweets/date=YYYY-MM-DD/hour=HH/tweets.txt).

python downloader.py -h
usage: downloader.py [-h] --query-type {keyword,user} --date DATE --hour HOUR --max-results MAX_RESULTS

## Example
# Download tweets using the 'keyword' query type
python downloader.py --query-type keyword --date 2022-08-30 --hour 10 --max-results 200
> keyword: hello

# Download tweets using the 'user' query type
python downloader.py --query-type user --date 2022-08-29 --hour 20 --max-results 200
> username: nasa
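
The usage line above has the shape that Python's argparse prints for four required options. The sketch below shows one way such an interface and the output path could be declared. It is an illustration only: build_parser and output_path are hypothetical names, the real downloader.py may be organized differently, and the interactive keyword/username prompt and the Twitter API call are left out.

```python
import argparse
from pathlib import Path


def build_parser():
    """Declare the four required options shown in the usage line."""
    parser = argparse.ArgumentParser(prog="downloader.py")
    parser.add_argument("--query-type", choices=["keyword", "user"], required=True)
    parser.add_argument("--date", required=True, help="YYYY-MM-DD")
    parser.add_argument("--hour", required=True, help="HH")
    parser.add_argument("--max-results", type=int, required=True)
    return parser


def output_path(date, hour, root="tweets"):
    """Build tweets/date=YYYY-MM-DD/hour=HH/tweets.txt for one download."""
    return Path(root) / f"date={date}" / f"hour={hour}" / "tweets.txt"


# Parse the same arguments as the 'keyword' example above.
args = build_parser().parse_args(
    ["--query-type", "keyword", "--date", "2022-08-30",
     "--hour", "10", "--max-results", "200"]
)
print(args.query_type, args.max_results)
print(output_path(args.date, args.hour).as_posix())
```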

extractor.py

  • Extract and clean the text from tweets.txt and save it to text.txt.
  • Extract and count hashtags from tweets.txt and save them to hashtags_count.txt.
  • Extract and count mentioned usernames from tweets.txt and save them to username_count.txt.

python extractor.py -h
usage: extractor.py [-h] --date DATE --hour HOUR

python extractor.py --date 2022-09-01 --hour 10
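
Counting hashtags and mentions can be sketched with a regular expression and collections.Counter. The patterns below are assumptions ('#' or '@' followed by word characters) and may not match the rules extractor.py actually applies. count_tokens is a hypothetical helper, and the second tweet is made up for illustration.

```python
import re
from collections import Counter

# Assumed patterns: '#' or '@' followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def count_tokens(tweets, pattern):
    """Count every match of `pattern` across the text of all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(pattern.findall(tweet["text"]))
    return counts


tweets = [
    {"text": "RT @joshtokitaaaaa: HELLO P-POP KINGS!!\n\n@SB19Official #SB19\n#WYAT #WhereYouAtSB19"},
    {"text": "@SB19Official see you soon #SB19"},  # made-up tweet for illustration
]

hashtags = count_tokens(tweets, HASHTAG_RE)
mentions = count_tokens(tweets, MENTION_RE)

# Write one "token count" pair per line, most frequent first.
for token, count in hashtags.most_common():
    print(token, count)
```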

Output

There are three output files for each tweets dataset.

  • text.txt: clean text of tweets
  • hashtags_count.txt: hashtags and their counts
  • username_count.txt: mentioned usernames and their counts

text.txt sample

"RT @dos_xyz: Hello Solana, We are pleased to announce our Hackathon Submission: DreamOS A new user experience for crypto - it is the fir"
"RT @joshtokitaaaaa: HELLO P-POP KINGS!! @SB19Official #SB19 #WYAT #WhereYouAtSB19 https://t.co/ZoXd2Ihhgi"
"RT @humansdotai: Hello, humans! Meet @lucian2k, our Head of User Product! #HumansOfHumansDotAI #Team #Blockchain #AI https://t.co/dq1z5Py"
"@GeorgeBeany978 Hello George, I've had a look on railcam data and it's a ballast machine Hope this helps. JASON https://t.co/ngobSTtM21"
"@DVTLJH96 Hello ang cute mo naman"
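
Comparing these lines with the raw sample in tweets.txt, the cleaning step appears to drop emoji and the trailing ellipsis character and to fold line breaks into single spaces. The sketch below reproduces that effect by removing all non-ASCII characters and collapsing whitespace. This is an assumption about the behavior, not the repository's actual code, and clean_text is a hypothetical name.

```python
import re


def clean_text(text):
    """Drop non-ASCII characters and collapse whitespace runs to one space."""
    # Assumption: everything outside ASCII (emoji, the ellipsis character) is removed.
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    return re.sub(r"\s+", " ", ascii_only).strip()


raw = (
    "RT @joshtokitaaaaa: HELLO P-POP KINGS!! \U0001F451\n\n"
    "@SB19Official #SB19\n#WYAT #WhereYouAtSB19 https://t.co/ZoXd2Ihhgi"
)
print(clean_text(raw))
```

Note that an ASCII-only filter would also strip text written in non-Latin scripts, so a real implementation may want a narrower rule.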

hashtags_count.txt sample

#HAPPYJKDAY 1119
#HappyBirthdayJungkook 1030
#JUNGKOOKDAY 768
#Happy_JungKook_Day 524
#TinyTAN 524

username_count.txt sample

@SamsungLevant 1558
@janusrose 502
@Genshin_7 349
@kevboucher 184
@DUALIPA 184
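
Both count files use one "token count" pair per line, so they are easy to read back for further analysis. A minimal sketch, assuming the token and the count are separated by whitespace as in the samples above; parse_counts is a hypothetical helper.

```python
def parse_counts(lines):
    """Turn 'token count' lines into a list of (token, count) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the count is the final field.
        token, count = line.rsplit(maxsplit=1)
        pairs.append((token, int(count)))
    return pairs


sample = ["@SamsungLevant 1558", "@janusrose 502", "@Genshin_7 349"]
pairs = parse_counts(sample)
total = sum(count for _, count in pairs)
print(pairs[0], total)
```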
@caroandlace 178
