
mirrorweb / pywb


This project forked from webrecorder/pywb


Core Python Web Archiving Toolkit for replay and recording of web archives

Home Page: https://pypi.python.org/pypi/pywb

License: GNU General Public License v3.0

Python 38.27% JavaScript 57.99% CSS 0.12% HTML 1.50% Shell 0.10% Arc 0.07% Dockerfile 0.02% Vue 1.92%
preservation-team

pywb's People

Contributors

anastasia, anjackson, ato, atomotic, danielbicho, dependabot[bot], devhercule, edsu, ekilfeather, humberthardy, ikreymer, jcushman, kaij, kngenie, krakan, kuechensofa, ldko, lukey3332, m4rk3r, machawk1, maeb, n0tan3rd, nlevitt, rajbot, rebeccacremona, robertknight, sebastian-nagel, tilgovi, tw4l, vanecat


pywb's Issues

Web archive won't replay unless CloudFront is bypassed

cc @themasonbanks

Example URLs
https://webarchive.nationalarchives.gov.uk/ukgwa/20031020010435/http://www.nationalarchives.gov.uk/news/stories/9.htm
https://tnaqa.mirrorweb.com/ukgwa/20211004151828/https://www.counterterrorism.police.uk/latest-news/page/20/

When trying to navigate to these pages, we are served an error like this: "The web page at https://tnaqa.mirrorweb.com/ukgwa/20211004151810mp_/https://www.counterterrorism.police.uk/latest-news/page/20/ might be temporarily down or it may have moved permanently to a new web address."

When viewing this page, we are served a pywb error (there are multiple examples of this error throughout):
https://tnaqa.mirrorweb.com/ukgwa/20211004151800/https://www.police.uk/

When CloudFront is bypassed, these pages load fine.
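
One detail that may help triage: the URL quoted in the error message carries a different timestamp (20211004151810) and an mp_ modifier compared with the URL that was requested. The following is a minimal Python sketch for splitting replay URLs like the ones above into their parts and measuring that gap. The regular expression, the ReplayUrl type, and both function names are illustrative assumptions, not part of pywb. The pattern assumes a single collection path segment, a 14-digit timestamp, and an optional two-letter modifier ending in an underscore.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ReplayUrl(NamedTuple):
    """Parts of a replay URL (hypothetical helper type, not a pywb class)."""
    prefix: str      # scheme, host, and collection, e.g. https://host/ukgwa
    timestamp: str   # 14-digit capture timestamp, YYYYMMDDhhmmss
    modifier: str    # e.g. "mp_", or "" when absent
    original: str    # the archived URL


# Assumed layout: <prefix>/<timestamp><optional modifier>/<original URL>
_REPLAY_RE = re.compile(
    r"^(?P<prefix>https?://[^/]+/[^/]+)/"
    r"(?P<timestamp>\d{14})(?P<modifier>[a-z]{2}_)?/"
    r"(?P<original>.+)$"
)


def parse_replay_url(url: str) -> ReplayUrl:
    """Split a replay URL into prefix, timestamp, modifier, and original URL."""
    match = _REPLAY_RE.match(url)
    if match is None:
        raise ValueError(f"not a recognised replay URL: {url}")
    return ReplayUrl(
        match["prefix"],
        match["timestamp"],
        match["modifier"] or "",
        match["original"],
    )


def timestamp_gap_seconds(a: ReplayUrl, b: ReplayUrl) -> float:
    """Absolute difference between two capture timestamps, in seconds."""
    fmt = "%Y%m%d%H%M%S"
    delta = datetime.strptime(a.timestamp, fmt) - datetime.strptime(b.timestamp, fmt)
    return abs(delta.total_seconds())


requested = parse_replay_url(
    "https://tnaqa.mirrorweb.com/ukgwa/20211004151828/"
    "https://www.counterterrorism.police.uk/latest-news/page/20/"
)
in_error = parse_replay_url(
    "https://tnaqa.mirrorweb.com/ukgwa/20211004151810mp_/"
    "https://www.counterterrorism.police.uk/latest-news/page/20/"
)
print(requested.timestamp, in_error.timestamp, in_error.modifier)
print(timestamp_gap_seconds(requested, in_error))
```

Comparing the parsed parts for a request sent through CloudFront and one sent directly to the origin would show whether the two paths resolve to the same capture. This is only a diagnostic aid and does not by itself explain the failure.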

Inconsistent behaviour in the archived site.

Describe the bug

When testing in a local pywb environment, the site works consistently with the help of the content scripts developed for it. However, behaviour is inconsistent when a completed crawl is accessed. Sometimes the instance shows a blank screen, and the site only loads once the page is refreshed or the link is reopened. Links that previously worked may then stop working. The console.log output from my code appears in the dev tools when the site runs locally in pywb, but the same output is missing from the new Marionette instances. Older instances are unaffected and work as expected; only the most recent crawls seem to be affected. Possibly a CDX/indexing issue?
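
To check the indexing theory, one option is to ask the CDX index which captures it holds for an affected page. The sketch below only builds the query URL. It assumes the deployment exposes pywb's CDXJ server API at /<collection>/cdx with the url, output, and limit parameters, which may not hold for the hosted instances. The host, collection name, and page URL are placeholders, and build_cdx_query is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_cdx_query(host: str, collection: str, url: str, limit: int = 10) -> str:
    """Build a CDX lookup URL for one archived page.

    Assumption: the index is reachable at <host>/<collection>/cdx and
    accepts url, output, and limit query parameters.
    """
    params = {"url": url, "output": "json", "limit": limit}
    return f"{host}/{collection}/cdx?{urlencode(params)}"


# Placeholder values; substitute the real host, collection, and page URL.
print(build_cdx_query("http://localhost:8080", "my-collection", "https://www.example.org/"))
```

If the most recent crawl is missing from the results, or appears with an unexpected status or MIME type, that would point towards indexing rather than the content scripts.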

Steps to reproduce the bug

Access the most recent instance in Marionette:

  1. Open the archive at the latest instance.
  2. Navigate to News and Statements, via the News and Campaigns tab in the main navbar.
  3. Open any of the pages linked from that page.
  4. Some pages will appear blank but load when refreshed. Others will load as expected.

Expected behaviour

When a link is clicked, the corresponding page opens.

Screenshots

Screenshot 2023-01-20 at 10 59 29 (2)
Screenshot 2023-01-20 at 10 59 32 (2)

Environment

  • OS: macOS Monterey version 12.6 (21G115)
  • Browser: Google Chrome
  • Version: Version 108.0.5359.124 (Official Build) (arm64)

Additional context

Client profile link: https://app.mirrorweb.com/management/web/16cac3a5-4135-4750-b8fe-657a3faa7a7e/profile/aec04c0d-4c98-4eb1-a491-2e10f306c99c/
Jira ticket: https://mirrorweb.atlassian.net/browse/CS-1326

CS-753: Science & Technology Facilities Council (STFC) continuous refresh issue

Expected behavior

When navigated to, the page should replay as normal, with all images present and interactivity working.

What actually happened

The site refreshes continually whenever it is interacted with: when the page is scrolled, or when any image in the image carousel or the drop-down menu is clicked. This makes site navigation impossible. Disabling JavaScript in the dev tools stops the refreshing, which suggests a JavaScript problem, but no solution has been identified so far.

Steps to replicate:

  1. Go to https://webarchive.nationalarchives.gov.uk/ukgwa/20220601094243/https://stfccareers.co.uk/
  2. Click on the drop-down menu options, or scroll the page.
  3. The page will refresh continuously.

Specs of tech used:
MacBook M1 2021
macOS Monterey version 12.5.1
Google Chrome version: 105.0.5195.125 (Official Build) (arm64)
