Comments (9)

ginatrapani commented on July 20, 2024

These lines:

2009-08-25 09:05:08 | 1.20 MB | swirlee | Crawler:More than Twitter cap of 3200 already in system, moving on.
2009-08-25 09:05:08 | 1.20 MB | swirlee | Crawler:238 in system; 6435 by owner

Tell me the crawler thinks you have 6,435 tweets in the system already but you don't. That's definitely a bug.

Try this: go into your instances table and, in the swirlee row, change the value of 'total_tweets_in_system' from 6,435 to 238 (the in-system count the log reports), then recrawl.

Now we have to figure out why it got screwed up.

This value gets set in the Instance->save method. It's a giant UPDATE statement, but the relevant part sets total_tweets_in_system = (select count(*) from tweets where author_user_id=".$i->twitter_user_id.")

If you run that update statement (and replace the twitter_user_id with swirlee's), does it set the value correctly?
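
A minimal sketch of what that part of the UPDATE does, written in Python with an in-memory SQLite database rather than the project's PHP and MySQL. The trimmed schema, the user id 1001, and the helper name refresh_in_system_count are assumptions for illustration; only the column names come from this thread.

```python
import sqlite3

# Hypothetical, trimmed-down schema: only the columns named in this thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (id INTEGER PRIMARY KEY, author_user_id INTEGER);
    CREATE TABLE instances (
        twitter_user_id INTEGER PRIMARY KEY,
        total_tweets_by_owner INTEGER,
        total_tweets_in_system INTEGER
    );
""")

# One instance whose stored in-system count is wrong on purpose.
conn.execute("INSERT INTO instances VALUES (1001, 6435, 6435)")
conn.executemany(
    "INSERT INTO tweets (author_user_id) VALUES (?)",
    [(1001,)] * 3 + [(1,)] * 2,  # three tweets by the owner, two by someone else
)

def refresh_in_system_count(conn, twitter_user_id):
    """Recompute total_tweets_in_system from the tweets table."""
    conn.execute(
        """
        UPDATE instances
        SET total_tweets_in_system = (
            SELECT COUNT(*) FROM tweets WHERE author_user_id = ?
        )
        WHERE twitter_user_id = ?
        """,
        (twitter_user_id, twitter_user_id),
    )

refresh_in_system_count(conn, 1001)
row = conn.execute(
    "SELECT total_tweets_by_owner, total_tweets_in_system FROM instances"
).fetchone()
print(row)  # (6435, 3)
```

If the real statement behaves like this, the in-system column should end up equal to a plain COUNT(*) over the tweets table for that user, which is the check being asked for.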

from thinkup.

jrunning commented on July 20, 2024

total_tweets_in_system in the instances table says 239 already (up a few since I made a few new tweets today). That matches the number returned by SELECT COUNT(*) FROM tweets WHERE author_user_id='49833'.

mysql> SELECT twitter_username, total_tweets_by_owner, total_tweets_in_system 
FROM instances WHERE twitter_username='swirlee';
+------------------+-----------------------+------------------------+
| twitter_username | total_tweets_by_owner | total_tweets_in_system |
+------------------+-----------------------+------------------------+
| swirlee          |                  6435 |                    239 |
+------------------+-----------------------+------------------------+

mysql> SELECT COUNT(*) FROM tweets WHERE author_username='swirlee';
+----------+
| COUNT(*) |
+----------+
|      239 |
+----------+

ginatrapani commented on July 20, 2024

Ah-HA! Ok, this is awesome--a use case I haven't seen before. You have 6k+ tweets on Twitter, just not in the system, so the Crawler's test is incorrect.

In common/class.Crawler.php, change line 93 (after this comment):
//if you've got more than the Twitter API archive limit, stop looking for more tweets
if ( $this->owner_object->tweet_count >= $cfg->archive_limit ) {

to:

//if you've got more than the Twitter API archive limit, stop looking for more tweets
if ( $this->instance->total_tweets_in_system >= $cfg->archive_limit  ) {

And give it a recrawl. That should fix it, lemme know how it goes!
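
To make the difference between the two tests concrete, here is a small Python sketch (not the project's PHP). The function names are hypothetical. The 3200 cap and the counts 6435 and 239 are the figures quoted in this thread.

```python
ARCHIVE_LIMIT = 3200  # Twitter's archive cap, per the crawler log message

def archive_is_full_buggy(owner_tweet_count, archive_limit=ARCHIVE_LIMIT):
    # Original test: looks at how many tweets the owner has posted on Twitter.
    return owner_tweet_count >= archive_limit

def archive_is_full_fixed(total_tweets_in_system, archive_limit=ARCHIVE_LIMIT):
    # Proposed test: looks at how many tweets are actually stored locally.
    return total_tweets_in_system >= archive_limit

# Figures from the thread: 6435 tweets by owner, 239 in the system.
print(archive_is_full_buggy(6435))  # True  -> crawler stops looking for tweets
print(archive_is_full_fixed(239))   # False -> crawler keeps fetching
```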

jrunning commented on July 20, 2024

Nope, no change. Same numbers all around.

Here's the log:

2009-08-25 11:38:50 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/account/rate_limit_status.xml
2009-08-25 11:38:50 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:Parsing XML data from https://twitter.com/account/rate_limit_status.xml
2009-08-25 11:38:50 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:42 of 150 API calls left this hour; 23 for crawler until 12:05:05
2009-08-25 11:38:51 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/users/show/swirlee.xml
2009-08-25 11:38:51 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:41 of 150 API calls left this hour; 22 for crawler until 12:05:05
2009-08-25 11:38:51 | 1.16 MB | swirlee | Crawler:Owner info set.
2009-08-25 11:38:52 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&since_id=3536563209&
2009-08-25 11:38:52 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:40 of 150 API calls left this hour; 21 for crawler until 12:05:05
2009-08-25 11:38:52 | 1.16 MB | swirlee | Crawler:0 tweet(s) found and 0 saved
2009-08-25 11:38:52 | 1.16 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:38:58 | 1.55 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:38:58 | 1.55 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:39 of 150 API calls left this hour; 20 for crawler until 12:05:05
2009-08-25 11:38:59 | 2.59 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:38:59 | 2.59 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:01 | 2.98 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:01 | 2.98 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:38 of 150 API calls left this hour; 19 for crawler until 12:05:05
2009-08-25 11:39:01 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:01 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:04 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:04 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:37 of 150 API calls left this hour; 18 for crawler until 12:05:05
2009-08-25 11:39:04 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:04 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:07 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:07 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:36 of 150 API calls left this hour; 17 for crawler until 12:05:05
2009-08-25 11:39:08 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:08 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:10 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:10 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:35 of 150 API calls left this hour; 16 for crawler until 12:05:05
2009-08-25 11:39:10 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:10 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:12 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:12 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:34 of 150 API calls left this hour; 15 for crawler until 12:05:05
2009-08-25 11:39:13 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:13 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:14 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:14 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:33 of 150 API calls left this hour; 14 for crawler until 12:05:05
2009-08-25 11:39:14 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:14 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:17 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:17 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:32 of 150 API calls left this hour; 13 for crawler until 12:05:05
2009-08-25 11:39:18 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:18 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:19 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:19 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:31 of 150 API calls left this hour; 12 for crawler until 12:05:05
2009-08-25 11:39:21 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:21 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:26 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:26 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:30 of 150 API calls left this hour; 11 for crawler until 12:05:05
2009-08-25 11:39:26 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:26 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:27 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:27 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:29 of 150 API calls left this hour; 10 for crawler until 12:05:05
2009-08-25 11:39:29 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:29 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:30 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:30 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:28 of 150 API calls left this hour; 9 for crawler until 12:05:05
2009-08-25 11:39:31 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:31 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:33 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:33 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:27 of 150 API calls left this hour; 8 for crawler until 12:05:05
2009-08-25 11:39:33 | 2.71 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:33 | 2.71 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:35 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:35 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:26 of 150 API calls left this hour; 7 for crawler until 12:05:05
2009-08-25 11:39:35 | 2.72 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:35 | 2.72 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:37 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:37 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:25 of 150 API calls left this hour; 6 for crawler until 12:05:05
2009-08-25 11:39:37 | 2.72 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:37 | 2.72 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:39 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:39 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:24 of 150 API calls left this hour; 5 for crawler until 12:05:05
2009-08-25 11:39:39 | 2.72 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:39 | 2.72 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:40 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:40 | 3.10 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:23 of 150 API calls left this hour; 4 for crawler until 12:05:05
2009-08-25 11:39:41 | 2.72 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:41 | 2.72 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:42 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:42 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:22 of 150 API calls left this hour; 3 for crawler until 12:05:05
2009-08-25 11:39:43 | 2.72 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:43 | 2.72 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:44 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:44 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:21 of 150 API calls left this hour; 2 for crawler until 12:05:05
2009-08-25 11:39:45 | 2.72 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:45 | 2.72 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:47 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:47 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:20 of 150 API calls left this hour; 1 for crawler until 12:05:05
2009-08-25 11:39:48 | 2.73 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:48 | 2.73 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:49 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&
2009-08-25 11:39:49 | 3.11 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:19 of 150 API calls left this hour; 0 for crawler until 12:05:05
2009-08-25 11:39:50 | 2.73 MB | swirlee | Crawler:200 tweet(s) found and 0 saved
2009-08-25 11:39:50 | 2.73 MB | swirlee | Crawler:239 in system; 6435 by owner
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:Crawler API call limit exceeded.
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:158 friends in system, 244 friends according to Twitter; Friend archive is not loaded
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:Follower archive marked as loaded
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:New follower count is 604 and system has 592; 12 new follows to load
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:Fetching follows via IDs
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:0 stray replied-to tweets to load.
2009-08-25 11:39:50 | 1.38 MB | swirlee | Crawler:0 unloaded follower details to load.
2009-08-25 11:39:51 | 1.39 MB | swirlee | InstanceDAO:Updated swirlee's system status.

ginatrapani commented on July 20, 2024

Ok, thank you so much for helping me debug this.

Good news: the crawler no longer thinks Twitalytic has gotten 3200 tweets for you.

Bad news: you're caught in an infinite loop of just requesting the 1st page of tweets and never paging back.

That's because the if statement on line 57 of common/class.Crawler.php uses the same wrong conditional as above. Change

$this->owner_object->tweet_count < $cfg->archive_limit

to

$this->instance->total_tweets_in_system < $cfg->archive_limit

And try a recrawl. Hopefully this will do it. (So sorry for all this back and forth...)
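
For anyone following along, this is roughly the paging behavior the fix is meant to restore, sketched in Python rather than the crawler's PHP. Everything here is an assumption for illustration: crawl_timeline, fake_fetch, the 450-tweet timeline, and the call budget are hypothetical and are not taken from class.Crawler.php.

```python
def crawl_timeline(fetch_page, stored, archive_limit=3200, max_calls=20):
    """Page backwards through a timeline until the archive cap, an empty
    page, or the call budget is reached. Returns the number of API calls."""
    page, calls = 1, 0
    # Loop on what is stored locally, not on the owner's total on Twitter.
    while len(stored) < archive_limit and calls < max_calls:
        tweets = fetch_page(page)
        calls += 1
        if not tweets:
            break  # nothing older left to fetch
        stored.update(tweets)
        page += 1  # advance, otherwise page 1 is requested over and over
    return calls

# Fake timeline of 450 tweet ids, served 200 per page, newest first.
timeline = list(range(450, 0, -1))

def fake_fetch(page, per_page=200):
    start = (page - 1) * per_page
    return timeline[start:start + per_page]

stored = set(timeline[:39])  # pretend the 39 newest tweets are already saved
calls = crawl_timeline(fake_fetch, stored)
print(len(stored), calls)  # 450 4
```

The fourth call is the empty page that ends the loop. A real crawler would also need to respect the hourly API budget shown in the log.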

jrunning commented on July 20, 2024

Now I'm getting these errors:

Notice: Trying to get property of non-object in /home/.matelda/swirlee/jordanrunning.com/twitalytic/common/class.Crawler.php on line 47

Notice: Trying to get property of non-object in /home/.matelda/swirlee/jordanrunning.com/twitalytic/common/class.Crawler.php on line 120

Notice: Trying to get property of non-object in /home/.matelda/swirlee/jordanrunning.com/twitalytic/common/class.Crawler.php on line 394

Notice: Trying to get property of non-object in /home/.matelda/swirlee/jordanrunning.com/twitalytic/common/class.Crawler.php on line 314

Notice: Trying to get property of non-object in /home/.matelda/swirlee/jordanrunning.com/twitalytic/common/class.Crawler.php on line 315

Notice: Trying to get property of non-object in /home/.matelda/swirlee/jordanrunning.com/twitalytic/crawler/crawl.php on line 47

It took an inordinate amount of time as well, 10-15 minutes.

Log:

2009-08-25 14:05:03 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/account/rate_limit_status.xml
2009-08-25 14:05:03 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:Parsing XML data from https://twitter.com/account/rate_limit_status.xml
2009-08-25 14:05:03 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:112 of 150 API calls left this hour; 87 for crawler until 14:40:37
2009-08-25 14:05:04 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/users/show/swirlee.xml
2009-08-25 14:05:04 | 1.16 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:111 of 150 API calls left this hour; 86 for crawler until 14:40:37
2009-08-25 14:05:04 | 1.16 MB | swirlee | Crawler:Owner info set.
2009-08-25 14:05:05 | 1.17 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&since_id=3540483581&
2009-08-25 14:05:05 | 1.17 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:110 of 150 API calls left this hour; 85 for crawler until 14:40:37
2009-08-25 14:05:05 | 1.17 MB | swirlee | Crawler:1 tweet(s) found and 1 saved
2009-08-25 14:05:05 | 1.17 MB | swirlee | Crawler:241 in system; 6437 by owner
2009-08-25 14:05:16 | 1.18 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:API request: https://twitter.com/statuses/user_timeline/swirlee.xml?count=200&page=2&
2009-08-25 14:05:16 | 1.18 MB | swirlee | CrawlerTwitterAPIAccessorOAuth:109 of 150 API calls left this hour; 84 for crawler until 14:40:37
2009-08-25 14:05:16 | 1.18 MB | swirlee | Crawler:0 tweet(s) found and 0 saved
2009-08-25 14:05:16 | 1.18 MB | swirlee | Crawler:241 in system; 6437 by owner

ginatrapani commented on July 20, 2024

I believe that is caused by Twitter downtime. It's been intermittently out for me this afternoon too. Any time the XML from the API comes back malformed (or is anything other than what the crawler expects), Twitalytic gets wonky, and that happens when Twitter is down or returning HTTP codes other than 200. Twitalytic needs to handle this more gracefully, but Twitter's behavior when it's down is difficult to reproduce for proper tests, and I haven't spent the time I should on that bit.

Give it a few more hours and check later on. In theory, when things stabilize, we should be able to see if that fix worked.
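
One possible shape for that more graceful handling, sketched in Python with the standard library XML parser rather than the project's PHP. The function name parse_api_response and the sample payloads are hypothetical. The idea is only to treat a non-200 status or malformed XML as "no data" instead of reading properties off a non-object.

```python
import xml.etree.ElementTree as ET

def parse_api_response(status_code, body):
    """Return the parsed XML root, or None if the response is unusable."""
    if status_code != 200:
        return None  # service is down or returned an error page
    try:
        return ET.fromstring(body)
    except ET.ParseError:
        return None  # body was not well-formed XML

good = parse_api_response(200, "<user><statuses_count>6437</statuses_count></user>")
print(good.findtext("statuses_count"))                      # 6437
print(parse_api_response(502, "<html>Bad Gateway</html>"))  # None
print(parse_api_response(200, "<user><oops></user>"))       # None
```

Callers would then check for None and skip that crawl step, which also makes the failure cases easy to test without waiting for a real outage.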

jrunning commented on July 20, 2024

I got errors in the last crawl as well but I think you're right--it appears to be working!

System Progress

44% of Your Tweets Loaded
(2,840 of 6,437)

100% of Your Followers Loaded
(646 loaded)
65% of Your Friends Loaded
(158 loaded)

(I'm going to be really depressed when it gets stuck at 50% (3,200) because of Twitter's stupid archive limit.)
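
The arithmetic behind those percentages, as a quick Python check. It assumes the dashboard rounds to the nearest whole percent, which the thread does not confirm.

```python
def percent_loaded(loaded, total):
    """Whole-number percentage, rounded to the nearest integer."""
    return round(100 * loaded / total)

print(percent_loaded(2840, 6437))  # 44
print(percent_loaded(3200, 6437))  # 50
```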

Thanks for your help!

ginatrapani commented on July 20, 2024

Awesome! Pushed fix to master and closing this issue. (Thanks again for your patience and debugging help!)
