It would be awesome if there was a built in utility to convert twitter's archive .js f

Utility to convert twitter archive .js files to twitter_ebooks compatible json about twitter_ebooks HOT 6 CLOSED

mispy-archive commented on July 28, 2024

from twitter_ebooks.

Comments (6)

elad661 commented on July 28, 2024

I've made something in Python and it seems to work, but I guess you'd want to rewrite it in Ruby if you want to include it in twitter_ebooks.

Here's the code if anyone is interested:

#!/usr/bin/env python3
# coding=utf8
import argparse
import json
import os
import os.path
from operator import itemgetter


def main():
    parser = argparse.ArgumentParser(description='Parse twitter-generated tweet archive to a format twitter_ebooks can understand')
    parser.add_argument('path', metavar='path', type=str,
                        help='path to the archive')
    args = parser.parse_args()
    args.path = os.path.expanduser(args.path)
    tweets_dir = os.path.join(args.path, 'data', 'js', 'tweets')
    all_tweets = []
    for month in sorted(os.listdir(tweets_dir)):
        # Read as UTF-8 explicitly so the result does not depend on the locale
        with open(os.path.join(tweets_dir, month), 'r', encoding='utf-8') as f:
            contents = f.read()
            if not contents.startswith('['):
                # Remove js variable assignment line, if exists
                contents = contents[contents.index('\n')+1:]
            this_month_tweets = json.loads(contents)
            for tweet in this_month_tweets:
                if 'retweeted_status' not in tweet:  # remove retweets
                    all_tweets.append(tweet)

    # Sort approximately the same way `ebooks archive` would sort
    # (close enough, at least)
    all_tweets.sort(key=itemgetter('created_at'))
    all_tweets.reverse()

    # Write the result
    # ensure_ascii=False emits raw non-ASCII characters, so write as UTF-8
    with open('archive.json', 'w', encoding='utf-8') as f:
        json.dump(all_tweets, f, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    main()

felinira commented on July 28, 2024

It crashes for me when wanting to consume the archive. Any idea how I can find out why? It works for another account.

└[~/var/ebooks/farthen_ebooks]> ebooks consume corpus/farthen.json
Reading json corpus from corpus/farthen.json
Removing commented lines and sorting mentions
[1] 22214 terminated ebooks consume corpus/farthen.json

elad661 commented on July 28, 2024

I don't know, I only tried it on one account and it worked.

daveschumaker commented on July 28, 2024

twitter_ebooks should actually be able to read the csv file that's included with your Twitter archive. Copy the tweets.csv file into whatever folder you're working in (or just make sure you properly point to it) and then run: ebooks consume tweets.csv

Tada! No need to try and parse all the individual months that are found in the /data/js/tweets/ directory of your Twitter archive.

brighid commented on July 28, 2024

@RockBandit's solution worked for me. Also: the README.txt that comes with your Twitter archive suggests "To consume the export in a generic JSON parser in any language, strip the first and last lines of each file." So you can push each of the js files through cat myfile | sed '$d' | perl -ne 'print if $. != 1' > newfile and wind up with a usable JSON file. (The $d has to be quoted, otherwise the shell expands it as a variable before sed sees it.)
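
For anyone who would rather do this step in Python than with sed and perl, here is a rough sketch. It uses a different technique from the README's "strip the first and last lines" advice: it skips ahead to the first `[` and parses a single JSON array from there, ignoring whatever follows. It assumes each file holds exactly one JSON array after an optional JavaScript assignment; the function name `parse_archive_js` and the variable name in the sample string are made up for illustration.

```python
import json


def parse_archive_js(text):
    """Parse the contents of one archive .js file into a Python object.

    Assumption: the file is a single JSON array, optionally preceded by
    a JavaScript assignment such as `var something =`.
    """
    # Skip the assignment prefix, if any; raises ValueError if no '[' exists
    start = text.index('[')
    # raw_decode parses one JSON value starting at `start` and
    # ignores any trailing text after it
    value, _end = json.JSONDecoder().raw_decode(text, start)
    return value


# Hypothetical file contents, only to show the call
sample = 'var tweets_2013_01 =\n[ {"text": "hello"} ]\n'
print(parse_archive_js(sample))
```

This would break if the assignment prefix itself contained a `[`, so treat it as a starting point rather than a tested converter.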

commented on July 28, 2024

As rockbandit notes, you can consume the csv from twitter archives directly. If you want to convert the csv file to json, "ebooks jsonify" in 3.0.9 will do that too :)
