Comments (9)
These lines:
2009-08-25 09:05:08 | 1.20 MB | swirlee | Crawler:More than Twitter cap of 3200 already in system, moving on.
2009-08-25 09:05:08 | 1.20 MB | swirlee | Crawler:238 in system; 6435 by owner
Tell me the crawler thinks you have 6,435 tweets in the system already but you don't. That's definitely a bug.
Try this: go into your instances table, and in the swirlee row, change the value for the 'total_tweets_in_system' from 6,435 to 236, and recrawl.
Now we have to figure out why it got screwed up.
This value gets set in the Instance->save method. It's a giant UPDATE statement, but the relevant part sets total_tweets_in_system = (select count(*) from tweets where author_user_id=".$i->twitter_user_id.")
If you run that update statement (and replace the twitter_user_id with swirlee's), does it set the value correctly?
from thinkup.
total_tweets_in_system in the instances table says 239 already (up a few since I made a few new tweets today). That matches the number returned by SELECT COUNT(*) FROM tweets WHERE author_user_id='49833'.
mysql> SELECT twitter_username, total_tweets_by_owner, total_tweets_in_system
FROM instances WHERE twitter_username='swirlee';
+------------------+-----------------------+------------------------+
| twitter_username | total_tweets_by_owner | total_tweets_in_system |
+------------------+-----------------------+------------------------+
| swirlee | 6435 | 239 |
+------------------+-----------------------+------------------------+
mysql> SELECT COUNT(*) FROM tweets WHERE author_username='swirlee';
+----------+
| COUNT(*) |
+----------+
| 239 |
+----------+
from thinkup.
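Ah-HA! Ok, this is awesome--a use case I haven't seen before. You have 6k+ tweets on Twitter, just not in the system, so the Crawler's test is incorrect: it checks your total tweet count on Twitter against the archive limit instead of the number of tweets already in the system.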
In common/class.Crawler.php, change line 93 (after this comment):
//if you've got more than the Twitter API archive limit, stop looking for more tweets
if ( $this->owner_object->tweet_count >= $cfg->archive_limit ) {
to:
//if you've got more than the Twitter API archive limit, stop looking for more tweets
if ( $this->instance->total_tweets_in_system >= $cfg->archive_limit ) {
And give it a recrawl. That should fix it, lemme know how it goes!
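For anyone following along, here is a minimal sketch of why the original check misfires. The two function names and the default `$archiveLimit` parameter are hypothetical and only for illustration (the real test is inline in class.Crawler.php); the 3200 cap and the 6435 / 239 figures come from the log and query output above.

```php
<?php
// Hypothetical helpers for illustration; not the actual Crawler code.

// Original test: compares the owner's lifetime tweet count on Twitter to the cap.
function archiveLoadedOld(int $ownerTweetCount, int $archiveLimit = 3200): bool
{
    return $ownerTweetCount >= $archiveLimit;
}

// Proposed test: compares the number of tweets actually stored locally to the cap.
function archiveLoadedNew(int $totalTweetsInSystem, int $archiveLimit = 3200): bool
{
    return $totalTweetsInSystem >= $archiveLimit;
}

// Numbers from the thread: 6435 tweets by owner, 239 in system.
var_dump(archiveLoadedOld(6435)); // bool(true): crawler gives up with only 239 tweets stored
var_dump(archiveLoadedNew(239));  // bool(false): crawler keeps fetching
```

Any account with more tweets on Twitter than the archive limit would hit the old branch on its very first crawl, which is presumably why this only shows up for heavy tweeters.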
from thinkup.
Nope, no change. Same numbers all around.
Here's the log:
2009-08-25 11:38:50 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/account/rate_limit_status.xml
2009-08-25 11:38:50 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:Parsing XML data from https://twitter.com/account/rate_limit_status.xml
2009-08-25 11:38:50 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:42 of 150 API calls left this hour; 23 for crawler until 12:05:05
2009-08-25 11:38:51 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/users/show/swirlee.xml
2009-08-25 11:38:51 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:41 of 150 API calls left this hour; 22 for crawler until 12:05:05
2009-08-25 11:38:51 | 1.16 MB | swirlee | Crawler:Owner info set.
2009-08-25 11:38:52 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&since_id=3536563209&
2009-08-25 11:38:52 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:40 of 150 API calls left this hour; 21 for crawler until 12:05:05
2009-08-25 11:38:52 | 1.16 MB | swirlee | Crawler:0 tweet(s) found and 0 saved
2009-08-25 11:38:52 | 1.16 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:38:58 | 1.55 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:38:58 | 1.55 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:39 of 150 API calls left this hour; 20 for crawler until 12:05:05
2009-08-25 11:38:59 | 2.59 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:38:59 | 2.59 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:01 | 2.98 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:01 | 2.98 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:38 of 150 API calls left this hour; 19 for crawler until 12:05:05
2009-08-25 11:39:01 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:01 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:04 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:04 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:37 of 150 API calls left this hour; 18 for crawler until 12:05:05
2009-08-25 11:39:04 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:04 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:07 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:07 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:36 of 150 API calls left this hour; 17 for crawler until 12:05:05
2009-08-25 11:39:08 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:08 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:10 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:10 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:35 of 150 API calls left this hour; 16 for crawler until 12:05:05
2009-08-25 11:39:10 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:10 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:12 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:12 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:34 of 150 API calls left this hour; 15 for crawler until 12:05:05
2009-08-25 11:39:13 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:13 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:14 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:14 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:33 of 150 API calls left this hour; 14 for crawler until 12:05:05
2009-08-25 11:39:14 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:14 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:17 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:17 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:32 of 150 API calls left this hour; 13 for crawler until 12:05:05
2009-08-25 11:39:18 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:18 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:19 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:19 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:31 of 150 API calls left this hour; 12 for crawler until 12:05:05
2009-08-25 11:39:21 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:21 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:26 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:26 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:30 of 150 API calls left this hour; 11 for crawler until 12:05:05
2009-08-25 11:39:26 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:26 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:27 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:27 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:29 of 150 API calls left this hour; 10 for crawler until 12:05:05
2009-08-25 11:39:29 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:29 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:30 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:30 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:28 of 150 API calls left this hour; 9 for crawler until 12:05:05
2009-08-25 11:39:31 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:31 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:33 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:33 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:27 of 150 API calls left this hour; 8 for crawler until 12:05:05
2009-08-25 11:39:33 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:33 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:35 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:35 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:26 of 150 API calls left this hour; 7 for crawler until 12:05:05
2009-08-25 11:39:35 | 2.72 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:35 | 2.72 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:37 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:37 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:25 of 150 API calls left this hour; 6 for crawler until 12:05:05
2009-08-25 11:39:37 | 2.72 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:37 | 2.72 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:39 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:39 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:24 of 150 API calls left this hour; 5 for crawler until 12:05:05
2009-08-25 11:39:39 | 2.72 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:39 | 2.72 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:40 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:40 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:23 of 150 API calls left this hour; 4 for crawler until 12:05:05
2009-08-25 11:39:41 | 2.72 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:41 | 2.72 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:42 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:42 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:22 of 150 API calls left this hour; 3 for crawler until 12:05:05
2009-08-25 11:39:43 | 2.72 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:43 | 2.72 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:44 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:44 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:21 of 150 API calls left this hour; 2 for crawler until 12:05:05
2009-08-25 11:39:45 | 2.72 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:45 | 2.72 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:47 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:47 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:20 of 150 API calls left this hour; 1 for crawler until 12:05:05
2009-08-25 11:39:48 | 2.73 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:48 | 2.73 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:49 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:49 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:19 of 150 API calls left this hour; 0 for crawler until 12:05:05
2009-08-25 11:39:50 | 2.73 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:50 | 2.73 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:Crawler API call limit exceeded.
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:158 friends in system, 244 friends according to Twitter; Friend archive is not loaded
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:Follower archive marked as loaded
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:New follower count is 604 and system has 592; 12 new follows to load
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:Fetching follows via IDs
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:0 stray replied-to tweets to load.
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:0 unloaded follower details to load.
2009-08-25 11:39:51 | 1.39 MB | swirlee | InstanceDAO:Updated swirlee's system status.
from thinkup.
Ok, thank you so much for helping me debug this.
Good news: the crawler no longer thinks Twitalytic has gotten 3200 tweets for you.
Bad news: you're caught in an infinite loop of just requesting the 1st page of tweets and never paging back.
That's because the if statement in common/class.Crawler.php line 57 is also using the same wrong conditional as above. Change
$this->owner_object->tweet_count < $cfg->archive_limit
to
$this->instance->total_tweets_in_system < $cfg->archive_limit
And try a recrawl. Hopefully this will do it. (So sorry for all this back and forth...)
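Roughly, the paging behavior we want looks like the sketch below. This is a simulation under assumptions, not the code in class.Crawler.php: `crawlArchive`, `$fetchPage`, and `$fakeApi` are hypothetical names, the fake API just hands back integer IDs, and it assumes every fetched tweet is new to the system.

```php
<?php
// Illustrative simulation only; $fetchPage stands in for the user_timeline API call.
function crawlArchive(callable $fetchPage, int $inSystem, int $archiveLimit, int $apiCallBudget): int
{
    $page = 1;
    // Keep paging while the SYSTEM (not the owner) is under the cap and calls remain.
    while ($inSystem < $archiveLimit && $apiCallBudget > 0) {
        $tweets = $fetchPage($page); // one API call per page
        $apiCallBudget--;
        if (count($tweets) === 0) {
            break; // nothing older left to fetch
        }
        $inSystem += count($tweets); // assumes every fetched tweet is new
        $page++;                     // page back instead of re-requesting page 1
    }
    return $inSystem;
}

// Fake API: 200 tweets per page, and only the newest 3200 are retrievable.
$fakeApi = function (int $page): array {
    $perPage = 200;
    $available = 3200;
    $start = ($page - 1) * $perPage;
    if ($start >= $available) {
        return [];
    }
    return range($start + 1, min($start + $perPage, $available));
};

echo crawlArchive($fakeApi, 0, 3200, 20), "\n"; // prints 3200 (16 pages of 200)
```

With a smaller call budget the loop simply stops early and picks up on the next crawl, which matches the "Crawler API call limit exceeded" line in the log above.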
from thinkup.
Now I'm getting these errors:
Notice: Trying to get property of non-object in /home/.matelda/swirlee/jordanrunning.com/twitalytic/common/class.Crawler.php on line 47
Notice: Trying to get property of non-object in /home/.matelda/swirlee/jordanrunning.com/twitalytic/common/class.Crawler.php on line 120
Notice: Trying to get property of non-object in /home/.matelda/swirlee/jordanrunning.com/twitalytic/common/class.Crawler.php on line 394
Notice: Trying to get property of non-object in /home/.matelda/swirlee/jordanrunning.com/twitalytic/common/class.Crawler.php on line 314
Notice: Trying to get property of non-object in /home/.matelda/swirlee/jordanrunning.com/twitalytic/common/class.Crawler.php on line 315
Notice: Trying to get property of non-object in /home/.matelda/swirlee/jordanrunning.com/twitalytic/crawler/crawl.php on line 47
It took an inordinate amount of time as well, 10-15 minutes.
Log:
2009-08-25 14:05:03 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/account/rate_limit_status.xml
2009-08-25 14:05:03 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:Parsing XML data from https://twitter.com/account/rate_limit_status.xml
2009-08-25 14:05:03 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:112 of 150 API calls left this hour; 87 for crawler until 14:40:37
2009-08-25 14:05:04 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/users/show/swirlee.xml
2009-08-25 14:05:04 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:111 of 150 API calls left this hour; 86 for crawler until 14:40:37
2009-08-25 14:05:04 | 1.16 MB | swirlee | Crawler:Owner info set.
2009-08-25 14:05:05 | 1.17 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&since_id=3540483581&
2009-08-25 14:05:05 | 1.17 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:110 of 150 API calls left this hour; 85 for crawler until 14:40:37
2009-08-25 14:05:05 | 1.17 MB | swirlee | Crawler:1 tweet(s) found and 1 saved
2009-08-25 14:05:05 | 1.17 MB | swirlee | Crawler:241 in system; 6437 by owner
2009-08-25 14:05:16 | 1.18 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&page=2&
2009-08-25 14:05:16 | 1.18 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:109 of 150 API calls left this hour; 84 for crawler until 14:40:37
2009-08-25 14:05:16 | 1.18 MB | swirlee | Crawler:0 tweet(s) found and 0 saved
2009-08-25 14:05:16 | 1.18 MB | swirlee | Crawler:241 in system; 6437 by owner
from thinkup.
I believe that's caused by Twitter downtime. It's been intermittently out for me this afternoon too. Any time the XML from the API comes back malformed (or is anything other than what the crawler expects), Twitalytic gets wonky, and that happens when Twitter's down or returning HTTP codes other than 200. Twitalytic needs to handle this more gracefully; it's difficult to reproduce Twitter's behavior when it's down for proper tests, and I haven't spent the time I should on that bit.
Give it a few more hours and check later on. In theory, when things stabilize, we should be able to see if that fix worked.
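As a rough idea of what handling this more gracefully could look like, here is a sketch that rejects non-200 responses and malformed XML before the crawler touches any properties. `parseTimeline` is a hypothetical helper, not something that exists in Twitalytic, and the `<statuses><status><id>` layout is an assumption about the timeline XML.

```php
<?php
// Hypothetical helper: returns a list of status IDs, or null if the response is unusable.
function parseTimeline(int $httpStatus, string $body): ?array
{
    if ($httpStatus !== 200) {
        return null; // Twitter is down or erroring; skip this step instead of parsing
    }

    // Collect XML parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumed root element name; anything else is not the document we expected.
    if ($xml === false || $xml->getName() !== 'statuses') {
        return null;
    }

    $ids = [];
    foreach ($xml->status as $status) {
        $ids[] = (string) $status->id;
    }
    return $ids;
}

var_dump(parseTimeline(502, '<html>Over capacity</html>')); // NULL
```

A caller that checks for null before reading the result would avoid the "Trying to get property of non-object" notices when the API returns an error page.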
from thinkup.
I got errors in the last crawl as well but I think you're right--it appears to be working!
System Progress
44% of Your Tweets Loaded
(2,840 of 6,437)
100% of Your Followers Loaded
(646 loaded)
65% of Your Friends Loaded
(158 loaded)
(I'm going to be really depressed when it gets stuck at 50% (3,200) because of Twitter's stupid archive limit.)
Thanks for your help!
from thinkup.
Awesome! Pushed fix to master and closing this issue. (Thanks again for your patience and debugging help!)
from thinkup.