
Comments (17)

judell avatar judell commented on June 27, 2024

Please ping Michael Boudreau [email protected] when this is done. (See: https://hypothesis.zendesk.com/agent/tickets/751). Thanks!

from product-backlog.

judell avatar judell commented on June 27, 2024

Correspondence with Marty Picco at Atypon:

Marty Picco
Hi Jon,

Following up on this.

As I mentioned in our last call, any uniquely identifiable pattern (such as you propose) would work as a user agent. If we go with this method, the reverse DNS on the crawler IP addresses must resolve to your domain. Alternatively we can use an IP range directly if you could provide that.

Please let me know what you'd prefer so we can sort out UChicago.

Regards,

Marty


Hi Marty,

It's in the queue, and I've told Michael Boudreau that we'll ping him when it's done.

Thanks,

Jon
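The reverse-DNS condition Marty describes can be sketched roughly as below. This is only an illustration of the idea, not Atypon's actual check: the function names, the example IP address (from the documentation range 192.0.2.0/24), and the hostname `crawler.hypothes.is` are all assumptions made for the example.

```python
import socket


def hostname_in_domain(hostname, domain):
    """Return True if hostname equals domain or is a subdomain of it."""
    hostname = hostname.rstrip(".").lower()
    domain = domain.rstrip(".").lower()
    return hostname == domain or hostname.endswith("." + domain)


def crawler_ip_matches_domain(ip, domain, reverse_lookup=socket.gethostbyaddr):
    """Reverse-resolve ip and check that the PTR name falls under domain.

    reverse_lookup is injectable so the check can be exercised without
    touching the network; by default it performs a real reverse DNS lookup.
    """
    try:
        hostname, _aliases, _addresses = reverse_lookup(ip)
    except OSError:
        # No PTR record, or the lookup failed: treat as not verified.
        return False
    return hostname_in_domain(hostname, domain)


if __name__ == "__main__":
    # Hypothetical resolver standing in for a real PTR lookup.
    def fake_lookup(ip):
        return ("crawler.hypothes.is", [], [ip])

    print(crawler_ip_matches_domain("192.0.2.10", "hypothes.is", fake_lookup))
```

A stricter check would also resolve the returned hostname forward again and confirm that it maps back to the same IP address.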


judell avatar judell commented on June 27, 2024

Bump from Marty on Dec 5:

Hi Jon,

I hope you're doing well.

Curious minds are asking about how this is progressing. Any update from your end?

What needs doing here, @ajpeddakotla: Simply the addition of Hypothesis-Via (or some other suitable identifier) to the User-Agent string that via sends. (And then notification to Marty Picco [email protected] that it's done.)


ajpeddakotla avatar ajpeddakotla commented on June 27, 2024

@judell we'll need to scope this card out before we can figure out how to make this change.

We'll timebox the scoping work to 2 days. Let's research the possible solutions and discuss, as a team, the best approach to solving this problem.


sean-roberts avatar sean-roberts commented on June 27, 2024

Requesting some input on a path forward from the maintainer of pywb (the service behind this whole system): webrecorder/pywb#202


robertknight avatar robertknight commented on June 27, 2024

We haven't had a response from upstream yet so I suggest what we do for now is investigate this ourselves, in our fork of pywb, and create an upstream PR afterwards.


ajpeddakotla avatar ajpeddakotla commented on June 27, 2024

We can research whether Squid could be used to implement this, keeping the timebox to 2 days.


robertknight avatar robertknight commented on June 27, 2024

While reviewing a different PR, an alternative approach came up - https://hypothes-is.slack.com/archives/public/p1484674499002511

Via is a Python WSGI HTTP server consisting of several layers of middleware, where the innermost layer is pywb. We might be able to add/modify the headers in a middleware layer (in Via's app.py) instead of modifying pywb.
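A minimal sketch of that middleware idea, for illustration only. It assumes the inner application (pywb, in Via's case) forwards the incoming `User-Agent` header to the origin site, which this thread does not confirm, and the class name and the `echo_user_agent` stand-in app are hypothetical rather than taken from Via's app.py.

```python
class UserAgentTagMiddleware:
    """WSGI middleware that appends an identifying token to the User-Agent."""

    def __init__(self, app, token="Hypothesis-Via"):
        self.app = app
        self.token = token

    def __call__(self, environ, start_response):
        # WSGI exposes the request's User-Agent header as HTTP_USER_AGENT.
        original = environ.get("HTTP_USER_AGENT", "")
        if self.token not in original:
            environ["HTTP_USER_AGENT"] = (original + " " + self.token).strip()
        return self.app(environ, start_response)


def echo_user_agent(environ, start_response):
    """Hypothetical inner app that echoes the User-Agent it receives."""
    body = environ.get("HTTP_USER_AGENT", "").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


# Wrap the inner app; every request passing through now carries the token.
app = UserAgentTagMiddleware(echo_user_agent)
```

Keeping the change in a wrapper like this leaves the inner application untouched, which is the appeal of the approach over patching the pywb fork.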


robertknight avatar robertknight commented on June 27, 2024

Hmm, I suppose we should really be using the Via header here instead of User-Agent. However, as noted in https://community.akamai.com/community/web-performance/blog/2015/05/06/beware-the-via-header-disabled-compression-can-have-a-performance-impact that may have other undesirable consequences.

This also assumes that our users can filter on the Via header.


robertknight avatar robertknight commented on June 27, 2024

This is now on staging. You can see it in action by visiting, through Via, one of the sites that displays your user agent string, e.g. https://qa-via.hypothes.is/http://www.whatismyuseragent.net/


chdorner avatar chdorner commented on June 27, 2024

@judell this is now on production as well, as of January 30th.


judell avatar judell commented on June 27, 2024

A note from Marty Picco today:

Hi Jon,

It looks like things are configured on our end. Can you confirm that things are working as expected on yours?

This link http://via.hypothes.is/http:/www.journals.uchicago.edu/doi/pdf/10.1086/682050 still points to a cookies required page. Are you caching responses on your end? If so, you'll need to recrawl.

Please let me know...

Thanks,

Marty

I said we'd check and let him know.


chdorner avatar chdorner commented on June 27, 2024

We strip the cookies due to security concerns, and I believe we always have; it's just that this is now implemented in via instead of in the nginx configuration on the old infrastructure.

To be exact, it looks like we started stripping cookies 10 months ago, and recently moved the code into via itself with the move to Skyliner.


nickstenning avatar nickstenning commented on June 27, 2024

Thanks, @chdorner. Just to confirm:

  1. Adding Hypothesis-Via to the User-Agent string is working as expected.

  2. Via does not (and will not, due to security concerns) support making cookie authenticated requests.


judell avatar judell commented on June 27, 2024

Follow-up with Marty:

Hi folks,

According to https://via.hypothes.is/https://httpbin.org/user-agent, the string "Hypothesis-Via" is included in the User-Agent header, as per hypothesis/via@46e84af#diff-d7a39c0d6fdaa37450167e35b2dbec97.

Note that we do not transmit cookies or auth headers, though: https://github.com/hypothesis/via/blob/81f950e04c2c787641b7141eaef0beb82937c093/via/security.py#L12-L17. I see that http://www.journals.uchicago.edu/doi/full/10.1086/692829, for example, which is openly available, leads to https://via.hypothes.is/http://www.journals.uchicago.edu/action/cookieAbsent when sent through via.

Not sure what to do about "Cookies required" -- what do you think?

On Mon, Sep 25, 2017 at 2:40 PM, Marty Picco [email protected] wrote:
Hi Jon,

I hope you're doing well.

We've had another inquiry from UChicago as it still appears that this isn't working.

I checked our logs and couldn't find any instance of UA matching '.*Hypothesis-Via.*' regex.

Could you check on your end?

Copying Nohar Wahnishe who's chasing the issue on our end.

Regards,

Marty


chdorner avatar chdorner commented on June 27, 2024

We strip the Cookie header from the request to the origin, and then the Set-Cookie header from the response back to the browser. This is done for security reasons: every proxied page is served from via's own domain, so cookies would otherwise be shared between all the websites a user visits through via.
I'm guessing that journals.uchicago.edu requires cookies for some reason? Unfortunately, I don't think there is much we can do about that.
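The stripping described above could look roughly like the following WSGI wrapper. This is a sketch under assumptions, not the code in via/security.py: the class name and the `inner_app` stand-in are hypothetical, and the real implementation may handle additional headers (such as auth headers).

```python
class StripCookiesMiddleware:
    """WSGI middleware that removes cookies in both directions."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Request side: drop the browser's Cookie header so the inner app
        # never forwards it to the origin site.
        environ.pop("HTTP_COOKIE", None)

        def filtered_start_response(status, headers, exc_info=None):
            # Response side: drop every Set-Cookie header so the origin
            # cannot set cookies on the proxy's domain.
            kept = [(name, value) for name, value in headers
                    if name.lower() != "set-cookie"]
            return start_response(status, kept, exc_info)

        return self.app(environ, filtered_start_response)


def inner_app(environ, start_response):
    """Hypothetical inner app that reports whether it received a cookie."""
    saw_cookie = "HTTP_COOKIE" in environ
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Set-Cookie", "JSESSIONID=abc123"),
    ])
    return [b"cookie seen" if saw_cookie else b"no cookie"]


app = StripCookiesMiddleware(inner_app)
```

With a wrapper like this in place, a site that requires a session cookie on every request never receives or sets one, so it cannot establish a session through the proxy.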


judell avatar judell commented on June 27, 2024

> I'm guessing that journals.uchicago.edu requires cookies for some reason?

They do. More from Marty today:

Every article access requires a session, even from crawlers. Session data includes (but might not be limited to) JSESSIONID cookie, which is used by our load balancer among other things. No cookies -> no session -> "Cookies Required"

I guess we should just encourage them to embed H, eh?

