I have a very strange use case that I would like to submit for consideration because I …

Working with very strange redirects about curl (closed)

briatte commented on August 21, 2024

Working with very strange redirects

Comments (8)

hadley commented on August 21, 2024

You probably need to set the Accept header to prioritise HTML over JSON.
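
A sketch of what "prioritise" can look like: the Accept header may list several media types with quality weights (q-values), and a server that honours content negotiation should prefer the highest-weighted one. `build_accept()` is a hypothetical helper (not part of httr) that only builds the header string.

```r
# Hypothetical helper: build an Accept header value from media types and
# q-values, listing the most preferred type first.
build_accept <- function(types, q) {
  stopifnot(length(types) == length(q), all(q >= 0 & q <= 1))
  ord <- order(q, decreasing = TRUE)  # highest weight first
  types <- types[ord]
  q <- q[ord]
  # q = 1 is the default weight, so it can be left out
  parts <- ifelse(q == 1, types, paste0(types, ";q=", q))
  paste(parts, collapse = ", ")
}

accept_value <- build_accept(c("application/json", "text/html"), c(0.1, 1))
accept_value  # "text/html, application/json;q=0.1"
```

The string could then be sent with httr as `add_headers(Accept = accept_value)`; `accept("text/html")`, used below, sends a single media type instead.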

briatte commented on August 21, 2024

That unfortunately did not work:

library(httr)
GET("http://www.camera.it/leg17/29?tipoAttivita=&tipoVisAtt=&tipoPersona=&shadow_deputato=300453&idLegislatura=17", accept("text/html"))

still returns the "JSON" version of the page.

jeroen commented on August 21, 2024

They are running a misconfigured cache server, so you are getting false cache hits. Try this:

library(httr)
# Ask for HTML and tell caches not to serve a stored copy older than 0 seconds
req <- GET("http://www.senato.it/loc/link.asp?tipodoc=CAM.DEP&leg=17&id=30350",
  add_headers(Accept = "text/html", "Cache-Control" = "max-age=0"))
content(req, "text")

Sometimes it helps if you just add an arbitrary parameter to the URL to bypass the cache:

library(httr)
# Append a random query parameter so the cache sees a URL it has not stored
url <- paste0("http://www.senato.it/loc/link.asp?tipodoc=CAM.DEP&leg=17&id=30350&_random=", runif(1))
req <- GET(url, accept("text/html"))
content(req, "text")
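
The same trick as a reusable function: a minimal sketch, assuming the cache keys its entries on the full URL including the query string. `cache_bust()` and its default parameter name `"_random"` are hypothetical; the function picks `?` or `&` depending on whether the URL already has a query string.

```r
# Hypothetical helper: append a throwaway query parameter so that a cache
# keyed on the full URL sees an address it has not stored yet.
cache_bust <- function(url, param = "_random", value = runif(1)) {
  # Start a query string if there is none, otherwise extend the existing one
  sep <- if (grepl("?", url, fixed = TRUE)) "&" else "?"
  paste0(url, sep, param, "=", URLencode(as.character(value), reserved = TRUE))
}

# Random value differs on every call
cache_bust("http://www.senato.it/loc/link.asp?tipodoc=CAM.DEP&leg=17&id=30350")
# Fixed value, for a predictable result
cache_bust("http://example.com/page", value = 42)  # "http://example.com/page?_random=42"
```

The result would then be passed on as in the snippet above, e.g. `GET(cache_bust(url), accept("text/html"))`.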

briatte commented on August 21, 2024

Still no luck: I get a false hit whichever URL I use.

On top of that, I've just discovered that curl -v always returns HTML, but the content of the page is often faulty ("page temporarily inaccessible, return later").

jeroen commented on August 21, 2024

I think this works:

library(httr)
url <- "http://www.camera.it/leg17/29?tipoAttivita=&tipoVisAtt=&tipoPersona=&shadow_deputato=300453&idLegislatura=17"
# Append a random parameter so the cache server treats this as a new page
url <- paste0(url, "&_random=", rnorm(1))
req <- GET(url, accept("text/html"))
# Check that the response was fetched from the origin, not served from the cache
stopifnot(req$headers[["x-cache"]] == "MISS")
stopifnot(req$headers$age == "0")
content(req, "text")

Their server is really poorly configured: not only does it give false hits, it also ignores the Cache-Control: no-cache request header. But slightly changing the URL usually forces the cache server to fetch a new copy.
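
The two `stopifnot()` checks can be wrapped into one function. This is a sketch assuming the server reports cache status through `X-Cache` and `Age` headers as in this thread; `is_cache_miss()` is a hypothetical helper. It returns `FALSE` when a header is missing, whereas comparing a missing (`NULL`) header with `==` yields a zero-length result rather than `FALSE`.

```r
# Hypothetical helper: TRUE only if the headers say the body came from the
# origin server (X-Cache: MISS) and has not been sitting in a cache (Age: 0).
# Assumes the server uses these two headers; a missing header counts as FALSE.
is_cache_miss <- function(hdrs) {
  names(hdrs) <- tolower(names(hdrs))  # header names are case-insensitive
  identical(hdrs[["x-cache"]], "MISS") && identical(hdrs[["age"]], "0")
}

is_cache_miss(list(`X-Cache` = "MISS", Age = "0"))     # TRUE
is_cache_miss(list(`X-Cache` = "HIT", Age = "86400"))  # FALSE
```

With httr it could be called as `is_cache_miss(headers(req))`.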

briatte commented on August 21, 2024

It seems to work indeed!

Thank you very much to both of you for your help.

How did you come up with the &_random= part?

jeroen commented on August 21, 2024

It's just something arbitrary that you add to the URL in order to trick the cache server into thinking that you are fetching a different page, so it cannot serve you a cached copy. It's a common trick for bypassing caches.
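
A toy model of why this works, assuming a cache that stores responses under the full request URL, query string included. `make_url_cache()` is a hypothetical illustration, not how the camera.it cache server is actually implemented: the same URL gets the stored copy, while a throwaway parameter produces a new key and forces a fetch from the origin.

```r
# Toy cache keyed on the full URL string; fetch_origin stands in for the real server.
make_url_cache <- function(fetch_origin) {
  store <- new.env()
  function(url) {
    if (exists(url, envir = store, inherits = FALSE)) {
      # Same key as an earlier request: serve the stored copy
      list(body = get(url, envir = store, inherits = FALSE), x_cache = "HIT")
    } else {
      # Unseen key: fetch from the origin and remember the result
      body <- fetch_origin(url)
      assign(url, body, envir = store)
      list(body = body, x_cache = "MISS")
    }
  }
}

cached_get <- make_url_cache(function(url) paste("origin response for", url))

cached_get("http://example.com/page")$x_cache              # "MISS"
cached_get("http://example.com/page")$x_cache              # "HIT"
cached_get("http://example.com/page?_random=0.42")$x_cache # "MISS"
```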

briatte commented on August 21, 2024

Excellent. Thanks again and enjoy your days.