Comments (6)
I've made something with Python and it seems to be working, but I guess you'd want to rewrite it in Ruby if you want to include it in twitter_ebooks.
Here's the code if anyone is interested:
#!/usr/bin/env python3
# coding=utf8
import argparse
import json
import os
import os.path
from operator import itemgetter


def main():
    parser = argparse.ArgumentParser(description='Parse twitter-generated tweet archive to a format twitter_ebooks can understand')
    parser.add_argument('path', metavar='path', type=str,
                        help='path to the archive')
    args = parser.parse_args()
    args.path = os.path.expanduser(args.path)
    tweets_dir = os.path.join(args.path, 'data', 'js', 'tweets')

    all_tweets = []
    for month in sorted(os.listdir(tweets_dir)):
        # Read explicitly as UTF-8 so the result does not depend on the locale
        with open(os.path.join(tweets_dir, month), 'r', encoding='utf-8') as f:
            contents = f.read()
        if not contents.startswith('['):
            # Remove js variable assignment line, if exists
            contents = contents[contents.index('\n')+1:]
        this_month_tweets = json.loads(contents)
        for tweet in this_month_tweets:
            if 'retweeted_status' not in tweet:  # remove retweets
                all_tweets.append(tweet)

    # Sort approximately the same way `ebooks archive` would sort
    # (close enough, at least)
    all_tweets.sort(key=itemgetter('created_at'))
    all_tweets.reverse()

    # Write the result
    with open('archive.json', 'w', encoding='utf-8') as f:
        json.dump(all_tweets, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main()
from twitter_ebooks.
It crashes for me when I try to consume the archive. Any idea how I can find out why? It works for another account.
└[~/var/ebooks/farthen_ebooks]> ebooks consume corpus/farthen.json
Reading json corpus from corpus/farthen.json
Removing commented lines and sorting mentions
[1] 22214 terminated ebooks consume corpus/farthen.json
I don't know, I only tried it on one account and it worked.
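One way to narrow it down might be to sanity-check the JSON file before handing it to ebooks. Below is a minimal sketch, not anything twitter_ebooks ships: `summarize_corpus` is a hypothetical helper, and the idea that every entry should be an object with a string `text` field is my assumption about what the corpus needs to look like.

```python
import json


def summarize_corpus(text):
    """Parse a JSON corpus and report entries that look malformed.

    Hypothetical helper; the 'text' field requirement is an assumption.
    """
    # json.loads raises json.JSONDecodeError if the file is not valid JSON
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError('expected a JSON array, got ' + type(data).__name__)
    # Collect indexes of entries that are not objects with a string 'text'
    bad = [i for i, tweet in enumerate(data)
           if not isinstance(tweet, dict)
           or not isinstance(tweet.get('text'), str)]
    return {'total': len(data), 'bad_indexes': bad}
```

You could call it with the contents of corpus/farthen.json and compare the result against the account that works. If the file parses cleanly and no entries are flagged, the problem is probably somewhere else (memory, for instance, given that the process was terminated rather than raising an error).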
twitter_ebooks should actually be able to read the CSV file that's included with your Twitter archive. Copy the tweets.csv file into whatever folder you're working in (or just make sure you point to it properly) and then run: ebooks consume tweets.csv
Tada! No need to try and parse all the individual months that are found in the /data/js/tweets/ directory of your Twitter archive.
@RockBandit's solution worked for me. Also, the README.txt that comes with your Twitter archive suggests: "To consume the export in a generic JSON parser in any language, strip the first and last lines of each file." So you can push each of the js files through cat myfile | sed '$d' | perl -ne 'print if $. != 1' > newfile
and wind up with a usable JSON file.
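If you'd rather stay in Python, the same first-and-last-line stripping could look roughly like this. It's only a sketch of what the README sentence describes: `strip_first_and_last_line` is a name I made up, and I haven't checked it against every archive layout.

```python
def strip_first_and_last_line(text):
    """Return text without its first and last lines.

    Hypothetical helper mirroring the shell pipeline above.
    """
    lines = text.splitlines()
    # lines[1:-1] drops the first and the last line, keeping the middle
    return "\n".join(lines[1:-1])
```

The result can then be passed to json.loads, provided what remains really is valid JSON for your files.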
As @RockBandit notes, you can consume the CSV from Twitter archives directly. If you want to convert the CSV file to JSON, "ebooks jsonify" in 3.0.9 will do that too :)
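For anyone curious what a CSV-to-JSON conversion involves in general, here is a minimal Python sketch. To be clear, this is not how "ebooks jsonify" is implemented and its output format may well differ; `csv_to_json` is a hypothetical helper that just turns each CSV row into a JSON object keyed by whatever column names the header line contains.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text (with a header row) into a JSON array of objects.

    Hypothetical helper; not the twitter_ebooks implementation.
    """
    # DictReader uses the first row as keys; all values stay strings
    rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

When reading from a real file, open it with newline='' as the csv module documentation recommends, so that line breaks inside quoted tweet text survive.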