Comments (4)
For some reason the Craigslist server responds with 404 Not Found to GoRead:
2013/07/07 19:23:56 INFO: adding feed http://sfbay.craigslist.org/sad/index.rss to user 185804764220139124118
2013/07/07 19:23:56 WARNING: fetch feed error: status code: 404 Not Found
2013/07/07 19:23:56 ERROR: add sub error (http://sfbay.craigslist.org/sad/index.rss): could not add feed http://sfbay.craigslist.org/sad/index.rss
from goread.
It looks like Craigslist checks the User-Agent header.
This is the result with curl's default user agent:
$ curl -v http://sfbay.craigslist.org/sad/index.rss > /dev/null
* About to connect() to sfbay.craigslist.org port 80 (#0)
* Trying 208.82.236.225... connected
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connected to sfbay.craigslist.org (208.82.236.225) port 80 (#0)
> GET /sad/index.rss HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8x zlib/1.2.5
> Host: sfbay.craigslist.org
> Accept: */*
>
< HTTP/1.1 200 OK
< Connection: close
< Cache-Control: max-age=900, public
< Last-Modified: Sun, 07 Jul 2013 19:28:14 GMT
< Transfer-Encoding: chunked
< Date: Sun, 07 Jul 2013 19:28:14 GMT
< Vary: Accept-Encoding
< Content-Type: application/xml; charset=iso-8859-1
< Server: Apache
< Expires: Sun, 07 Jul 2013 19:43:14 GMT
<
{ [data not shown]
100 98573    0 98573    0     0  89976      0 --:--:--  0:00:01 --:--:--  106k
* Closing connection #0
And now with the user agent that GAE uses:
$ curl -v -A 'AppEngine-Google; (+http://code.google.com/appengine)' http://sfbay.craigslist.org/sad/index.rss > /dev/null
* About to connect() to sfbay.craigslist.org port 80 (#0)
* Trying 208.82.236.225... connected
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connected to sfbay.craigslist.org (208.82.236.225) port 80 (#0)
> GET /sad/index.rss HTTP/1.1
> User-Agent: AppEngine-Google; (+http://code.google.com/appengine)
> Host: sfbay.craigslist.org
> Accept: */*
>
< HTTP/1.1 404 Not Found
* no chunk, no close, no size. Assume close to signal end
<
{ [data not shown]
100     2    0     2    0     0      5      0 --:--:-- --:--:-- --:--:--    11
* Closing connection #0
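The same comparison can be sketched in Go. This is only a rough illustration, not GoRead's code: `statusFor` and `newBlockingServer` are hypothetical helpers, and the local test server merely imitates the behaviour seen in the curl output above (404 whenever the user agent contains "AppEngine-Google"), so the real Craigslist rule may well differ.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// statusFor sends a GET request to url with the given User-Agent header
// and returns the HTTP status code of the response.
func statusFor(client *http.Client, url, userAgent string) (int, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("User-Agent", userAgent)
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// newBlockingServer is a hypothetical local stand-in for the remote server.
// Assumption: it answers 404 when the User-Agent contains "AppEngine-Google"
// and 200 otherwise, which matches the two curl runs but is not confirmed.
func newBlockingServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(r.UserAgent(), "AppEngine-Google") {
			http.NotFound(w, r)
			return
		}
		fmt.Fprintln(w, "<rss/>")
	}))
}

func main() {
	srv := newBlockingServer()
	defer srv.Close()

	agents := []string{
		"curl/7.21.4",
		"AppEngine-Google; (+http://code.google.com/appengine)",
	}
	for _, ua := range agents {
		code, err := statusFor(srv.Client(), srv.URL, ua)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		fmt.Printf("%q -> %d\n", ua, code)
	}
}
```

Pointing `statusFor` at the real feed URL instead of the test server should reproduce the curl results, as long as the site still behaves the way it did in the logs above.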
From what I can see, if the Craigslist server detects the "AppEngine-Google" substring in the user agent, it returns 404 Not Found.
The GAE documentation says the following about setting the User-Agent string:
In addition, the User-Agent header can be modified but App Engine will append an identifier string to allow servers to identify App Engine requests. The appended string has the format "AppEngine-Google; (+http://code.google.com/appengine; appid: APPID)", where APPID is your app's identifier.
I tried setting the user agent string to GoRead, and this is what my dummy web server received:
GET / HTTP/1.1
Host: localhost:8011
Accept-Encoding: gzip
User-Agent: GoRead AppEngine-Google; (+http://code.google.com/appengine)
The AppEngine-Google substring is still there, so the Craigslist server still blocks the request.
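For anyone who wants to reproduce this without deploying, here is a rough local sketch in Go. Everything in it is an assumption for illustration: `appendingTransport` is a hypothetical stand-in that imitates the appending described in the GAE documentation (it is not App Engine's actual fetch layer), `gaeSuffix` is copied from the request my dummy server captured, and `receivedUserAgent` is a made-up helper.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// gaeSuffix is the identifier seen in the captured request above.
const gaeSuffix = "AppEngine-Google; (+http://code.google.com/appengine)"

// appendingTransport is a hypothetical local imitation of the documented
// App Engine behaviour: it appends gaeSuffix to the caller's User-Agent.
type appendingTransport struct {
	base http.RoundTripper
}

func (t appendingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not modify the caller's request, so work on a clone.
	clone := req.Clone(req.Context())
	ua := clone.Header.Get("User-Agent")
	if ua == "" {
		clone.Header.Set("User-Agent", gaeSuffix)
	} else {
		clone.Header.Set("User-Agent", ua+" "+gaeSuffix)
	}
	return t.base.RoundTrip(clone)
}

// receivedUserAgent starts a throwaway server, sends one request with the
// requested User-Agent through appendingTransport, and returns the header
// value the server actually saw.
func receivedUserAgent(requested string) (string, error) {
	seen := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.UserAgent()
	}))
	defer srv.Close()

	req, err := http.NewRequest("GET", srv.URL, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("User-Agent", requested)

	client := &http.Client{Transport: appendingTransport{base: http.DefaultTransport}}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return <-seen, nil
}

func main() {
	got, err := receivedUserAgent("GoRead")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("User-Agent:", got)
}
```

The point of the sketch is only that a custom prefix does not remove the identifier: whatever the application sets, a server matching on the "AppEngine-Google" substring still sees it.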
@fajran Hi, did you ever figure out a way to work around this?