

elad661 commented on July 28, 2024

I've written something in Python and it seems to work, but I guess you'd want to rewrite it in Ruby if you want to include this in twitter_ebooks.

Here's the code if anyone is interested:

#!/usr/bin/env python3
# coding=utf8
import argparse
import json
import os
import os.path
from operator import itemgetter


def main():
    parser = argparse.ArgumentParser(description='Parse twitter-generated tweet archive to a format twitter_ebooks can understand')
    parser.add_argument('path', metavar='path', type=str,
                        help='path to the archive')
    args = parser.parse_args()
    args.path = os.path.expanduser(args.path)
    tweets_dir = os.path.join(args.path, 'data', 'js', 'tweets')
    all_tweets = []
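    for month in sorted(os.listdir(tweets_dir)):
        if not month.endswith('.js'):
            continue  # skip stray files such as .DS_Store
        # Explicit UTF-8 so non-ASCII tweets decode the same on every platform
        with open(os.path.join(tweets_dir, month), 'r', encoding='utf-8') as f: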
            contents = f.read()
            if not contents.startswith('['):
                # Remove js variable assignment line, if exists
                contents = contents[contents.index('\n')+1:]
            this_month_tweets = json.loads(contents)
            for tweet in this_month_tweets:
                if 'retweeted_status' not in tweet:  # remove retweets
                    all_tweets.append(tweet)

    # Sort approximately the same way `ebooks archive` would sort
    # (close enough, at least)
    all_tweets.sort(key=itemgetter('created_at'))
    all_tweets.reverse()

    # Write the result
    # Explicit UTF-8 because ensure_ascii=False writes raw non-ASCII characters
    with open('archive.json', 'w', encoding='utf-8') as f:
        json.dump(all_tweets, f, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    main()
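The script drops the first line of each monthly file, which works when the JavaScript assignment sits on its own line. If a file instead opened the array on the same line as the assignment, that whole line (including the start of the array) would be lost. A slightly more defensive variant, sketched here under the assumption that each file is a single assignment of the form variable = [ ... ], parses from the first "[" instead. parse_archive_month is a hypothetical helper name, not part of twitter_ebooks.

```python
import json


def parse_archive_month(contents: str) -> list:
    """Parse one monthly tweets file from an old Twitter archive.

    Assumption: the file looks like `<variable> = [ ... ]`, with the array
    starting either on the assignment line or on the next line. Everything
    before the first '[' is discarded, so both layouts parse the same way.
    """
    start = contents.find('[')
    if start == -1:
        raise ValueError('no JSON array found in archive file')
    return json.loads(contents[start:])
```

Inside the loop it would replace the startswith/index logic: this_month_tweets = parse_archive_month(f.read()).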

from twitter_ebooks.

felinira commented on July 28, 2024

It crashes for me when I try to consume the archive. Any idea how I can find out why? It works for another account.

└[~/var/ebooks/farthen_ebooks]> ebooks consume corpus/farthen.json
Reading json corpus from corpus/farthen.json
Removing commented lines and sorting mentions
[1] 22214 terminated ebooks consume corpus/farthen.json
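One way to narrow it down is to check that the generated file has the shape the consumer expects before digging into twitter_ebooks itself. The sketch below assumes the corpus should be a JSON array of objects that each carry a non-empty string "text" field; that is an assumption about what ebooks consume reads, not something confirmed in this thread, and find_corpus_problems is a hypothetical helper.

```python
def find_corpus_problems(tweets) -> list:
    """Return human-readable problems found in an already-parsed JSON corpus.

    Assumption: a valid corpus is a list of objects with a non-empty string
    "text" field. These checks are illustrative and do not reproduce
    twitter_ebooks' own parsing.
    """
    if not isinstance(tweets, list):
        return ['top level is %s, expected a list' % type(tweets).__name__]
    problems = []
    for i, tweet in enumerate(tweets):
        if not isinstance(tweet, dict):
            problems.append('entry %d is not an object' % i)
        elif not isinstance(tweet.get('text'), str):
            problems.append('entry %d has no string "text" field' % i)
        elif not tweet['text'].strip():
            problems.append('entry %d has empty text' % i)
    return problems
```

To use it, load corpus/farthen.json with json.load (opening the file with encoding='utf-8') and print the first few entries of the returned list; an empty list means the file at least has the expected structure.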


elad661 commented on July 28, 2024

I don't know, I only tried it on one account and it worked.


daveschumaker commented on July 28, 2024

twitter_ebooks should actually be able to read the CSV file that's included with your Twitter archive. Copy the tweets.csv file into whatever folder you're working in (or just make sure you point to it correctly) and then run: ebooks consume tweets.csv

Tada! No need to try and parse all the individual months that are found in the /data/js/tweets/ directory of your Twitter archive.
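If you would rather post-process the CSV yourself (for example to drop retweets, as the Python script earlier in the thread does for the JSON files), a small sketch follows. It assumes tweets.csv has a header row containing "text" and "retweeted_status_id" columns, with the latter non-empty for retweets; those column names are an assumption here, so compare them with your own file. tweet_texts_from_csv is a hypothetical helper, not a twitter_ebooks API.

```python
import csv
import io


def tweet_texts_from_csv(csv_text: str, skip_retweets: bool = True) -> list:
    """Extract tweet text from the contents of an archive's tweets.csv.

    Assumption: the header includes "text" and "retweeted_status_id";
    a non-empty retweeted_status_id marks a retweet.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    texts = []
    for row in reader:
        if skip_retweets and row.get('retweeted_status_id'):
            continue  # retweets are someone else's words, so leave them out
        texts.append(row['text'])
    return texts
```

Read the file with open('tweets.csv', newline='', encoding='utf-8') and pass f.read(); the csv module handles quoted fields that contain commas or line breaks.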


brighid commented on July 28, 2024

@RockBandit's solution worked for me. Also, the README.txt that comes with your Twitter archive says: "To consume the export in a generic JSON parser in any language, strip the first and last lines of each file." So you can push each of the .js files through cat myfile | sed '$d' | perl -ne 'print if $. != 1' > newfile and wind up with a usable JSON file. (Quote the '$d'; unquoted, the shell expands $d as a variable and sed receives no script.)
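For a Python-only equivalent of that pipeline (handy where sed or perl isn't available), the sketch below drops the first and last lines of each .js file and writes the result as a .json file in another directory. The helper names and the output naming scheme are hypothetical. Like the shell version, it follows the README's advice literally, so it is worth parsing one output file to confirm the result is valid JSON for your archive.

```python
from pathlib import Path


def strip_first_and_last_lines(text: str) -> str:
    """Drop the first and last lines of text, like sed '$d' plus the perl filter."""
    lines = text.splitlines(keepends=True)
    return ''.join(lines[1:-1])


def convert_directory(src: Path, dst: Path) -> None:
    """Write a stripped copy of every .js file in src to dst as <name>.json."""
    dst.mkdir(parents=True, exist_ok=True)
    for js_file in sorted(src.glob('*.js')):
        stripped = strip_first_and_last_lines(js_file.read_text(encoding='utf-8'))
        (dst / (js_file.stem + '.json')).write_text(stripped, encoding='utf-8')
```

For example, convert_directory(Path('data/js/tweets'), Path('stripped')) would process every monthly file in one go instead of running the pipeline per file.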


(username missing) commented on July 28, 2024

As RockBandit notes, you can consume the CSV from Twitter archives directly. If you want to convert the CSV file to JSON, "ebooks jsonify" in version 3.0.9 will do that too :)
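To illustrate the general idea of that conversion outside twitter_ebooks, the sketch below turns each CSV row into a JSON object keyed by the header names. The exact output format of "ebooks jsonify" is not described in this thread, so this is an assumption-laden illustration rather than a description of what jsonify produces; csv_to_json is a hypothetical helper.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text into a JSON array with one object per row.

    Keys come from the header row; all values stay strings, because the
    csv module does not infer types.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

Keeping ensure_ascii=False mirrors the Python script earlier in the thread, so non-ASCII tweets stay readable in the output file (write it with encoding='utf-8').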

