Comments (17)
Please ping Michael Boudreau [email protected] when this is done. (See: https://hypothesis.zendesk.com/agent/tickets/751). Thanks!
from product-backlog.
Correspondence with Marty Picco at Atypon:
Marty Picco
Hi Jon,
Following up on this.
As I mentioned in our last call, any uniquely identifiable pattern (such as you propose) would work as a user agent. If we go with this method, the reverse DNS on the crawler IP addresses must resolve to your domain. Alternatively we can use an IP range directly if you could provide that.
Please let me know what you'd prefer so we can sort out UChicago.
Regards,
Marty
Hi Marty,
It's in the queue, and I've told Michael Boudreau that we'll ping him when it's done.
Thanks,
Jon
Bump from Marty on Dec 5:
Hi Jon,
I hope you're doing well.
Curious minds are asking about how this is progressing. Any update from your end?
What needs doing here, @ajpeddakotla: Simply the addition of Hypothesis-Via (or some other suitable identifier) to the User-Agent string that via sends. (And then notification to Marty Picco [email protected] that it's done.)
@judell we'll need to scope this card out before we can figure out how to make this change.
We'll timebox the scoping work here to 2 days. Let's research the possible solutions and discuss, as a team, the best approach to solving this problem.
Requesting some input on a path forward from the maintainer of pywb (the service behind this whole system): webrecorder/pywb#202
We haven't had a response from upstream yet so I suggest what we do for now is investigate this ourselves, in our fork of pywb, and create an upstream PR afterwards.
We can research whether squid could be used to implement this, and keep the timebox to 2 days.
While reviewing a different PR, an alternative approach came up - https://hypothes-is.slack.com/archives/public/p1484674499002511
Via is a Python WSGI HTTP server consisting of several layers of middleware, where the innermost layer is pywb. We might be able to add/modify the headers in a middleware layer (in Via's app.py) instead of modifying pywb.
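For discussion, here is a minimal sketch of what such a middleware layer could look like. It is not the code that ended up in Via. The class name `UserAgentMiddleware` and the `echo_app` stand-in are hypothetical, and the sketch assumes the inner layer (pywb) forwards the incoming User-Agent header to the origin unchanged.

```python
class UserAgentMiddleware:
    """Hypothetical WSGI middleware that appends an identifying token
    to the User-Agent header before the request reaches the inner app."""

    def __init__(self, app, token="Hypothesis-Via"):
        self.app = app
        self.token = token

    def __call__(self, environ, start_response):
        # WSGI exposes request headers as HTTP_* keys in the environ dict.
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if self.token not in user_agent:
            # strip() covers the case where the client sent no User-Agent.
            environ["HTTP_USER_AGENT"] = f"{user_agent} {self.token}".strip()
        return self.app(environ, start_response)


def echo_app(environ, start_response):
    """Stand-in for the inner app (assumption): echoes the User-Agent it sees."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["HTTP_USER_AGENT"].encode("utf-8")]


app = UserAgentMiddleware(echo_app)
```

Wrapping the existing app this way would leave pywb untouched, which is the appeal of doing it in the middleware stack.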
Hmm, I suppose we should really be using the Via header here instead of User-Agent. However, as noted in https://community.akamai.com/community/web-performance/blog/2015/05/06/beware-the-via-header-disabled-compression-can-have-a-performance-impact, that may have other undesirable consequences.
This also assumes that our users can filter on the Via header.
This is now on staging. You can see it in action by visiting, through Via, one of the sites that display your user agent string, e.g. https://qa-via.hypothes.is/http://www.whatismyuseragent.net/
@judell this is now on production as well, as of January 30th.
A note from Marty Picco today:
Hi Jon,
It looks like things are configured on our end. Can you confirm that things are working as expected on yours?
This link http://via.hypothes.is/http:/www.journals.uchicago.edu/doi/pdf/10.1086/682050 still points to a cookies required page. Are you caching responses on your end? If so, you'll need to recrawl.
Please let me know...
Thanks,
Marty
I said we'd check and let him know.
We strip the cookies due to security concerns, and I believe we always have; it's just that this is now implemented in via instead of in the nginx configuration on the old infrastructure.
To be exact, it looks like we started stripping cookies 10 months ago, and recently moved the code into via itself with the move to Skyliner.
Thanks, @chdorner. Just to confirm:
- Adding Hypothesis-Via to the User-Agent string is working as expected.
- Via does not (and will not, due to security concerns) support making cookie authenticated requests.
Follow-up with Marty:
Hi folks,
According to https://via.hypothes.is/https://httpbin.org/user-agent, the string "Hypothesis-Via" is included in the User-Agent header, as per hypothesis/via@46e84af#diff-d7a39c0d6fdaa37450167e35b2dbec97.
Note that we do not transmit cookies or auth headers, though: https://github.com/hypothesis/via/blob/81f950e04c2c787641b7141eaef0beb82937c093/via/security.py#L12-L17. I see that http://www.journals.uchicago.edu/doi/full/10.1086/692829, for example, which is openly available, leads to https://via.hypothes.is/http://www.journals.uchicago.edu/action/cookieAbsent when sent through via.
Not sure what to do about "Cookies required" -- what do you think?
On Mon, Sep 25, 2017 at 2:40 PM, Marty Picco [email protected] wrote:
Hi Jon,
I hope you're doing well.
We've had another inquiry from UChicago as it still appears that this isn't working.
I checked our logs and couldn't find any instance of a UA matching the '.*Hypothesis-Via.*' regex.
Could you check on your end?
Copying Nohar Wahnishe who's chasing the issue on our end.
Regards,
Marty
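To make the check Marty describes concrete, here is a small sketch of matching User-Agent strings against a pattern that looks for the Hypothesis-Via token. How Atypon actually searches its logs is not known; the function name `via_user_agents` and the sample strings are made up for illustration.

```python
import re

# Matches any User-Agent that contains the token anywhere in the string.
VIA_UA_PATTERN = re.compile(r"Hypothesis-Via")


def via_user_agents(user_agents):
    """Return only the User-Agent strings that carry the Via token."""
    return [ua for ua in user_agents if VIA_UA_PATTERN.search(ua)]


# Hypothetical sample values, not taken from any real log.
samples = [
    "Mozilla/5.0 (X11; Linux x86_64) Hypothesis-Via",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12)",
]
matches = via_user_agents(samples)
```

Using `search` rather than `match` means the pattern does not need leading or trailing wildcards.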
We strip the Cookie header from the request to the origin, and then the Set-Cookie header from the response back to the browser. This is done for security reasons, as the cookies would all be shared between the websites a user would visit through via.
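As a rough illustration of the behaviour described here (not the actual code in via/security.py), a WSGI middleware that drops cookies in both directions might look like the following. The class name `StripCookiesMiddleware` is hypothetical.

```python
class StripCookiesMiddleware:
    """Hypothetical WSGI middleware that removes the Cookie header from
    the request and any Set-Cookie headers from the response."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Request side: the Cookie header arrives as HTTP_COOKIE in environ.
        environ.pop("HTTP_COOKIE", None)

        def filtered_start_response(status, headers, exc_info=None):
            # Response side: header names are case-insensitive.
            kept = [(name, value) for name, value in headers
                    if name.lower() != "set-cookie"]
            return start_response(status, kept, exc_info)

        return self.app(environ, filtered_start_response)
```

Because every proxied site is served from the same Via hostname, a cookie set by one origin would otherwise be sent to all the others, which is why both directions are filtered.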
I'm guessing that journals.uchicago.edu requires cookies for some reason? Unfortunately, I don't think there is much we can do about that.
I'm guessing that journals.uchicago.edu requires cookies for some reason?
They do. More from Marty today:
Every article access requires a session, even from crawlers. Session data includes (but might not be limited to) JSESSIONID cookie, which is used by our load balancer among other things. No cookies -> no session -> "Cookies Required"
I guess we should just encourage them to embed H, eh?