Wayback-diff

This React project is a fork of EDGI's web-monitoring-ui project to enable analysts to quickly assess changes to monitored government websites. It works with EDGI's web-monitoring-processing server and the Internet Archive's Wayback Machine to render the differences between two captures of the same web page.

In addition, this project contains a Sunburst component to illustrate the differences of a web page capture compared to other captures for the same year at the Wayback Machine.

Table of Contents

Installation and Requirements
Usage
Build the project

Install node dependencies with: yarn install

The web-monitoring-processing server must be running with CORS enabled.

This component uses Bootstrap 3, so make sure to include it in your entry point HTML document.

You need a CORS-enabled browser for this component to work standalone.

Run the server with: yarn start

There are three types of URL calls (PORT defaults to 3000):

1) http://localhost:PORT/diff/WEBSITE

Example request: http://localhost:3000/diff/iskme.org

2) http://localhost:PORT/diff/TIMESTAMP_A/TIMESTAMP_B/WEBSITE

Example request: http://localhost:3000/diff/20170223193029/20171212125810/archive.org

3) http://localhost:PORT/diagram/WEBPAGE/YEAR/TIMESTAMP/

Example request: http://localhost:3000/diagram/iskme.org/2018/20180813072115
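The three URL shapes listed above can be told apart with plain regular expressions. The following is only a sketch to illustrate the path layout; the function name parsePath and the type labels are hypothetical and are not part of wayback-diff.

```javascript
// Patterns mirror the three URL shapes listed above. Order matters: the
// generic /diff/WEBSITE pattern is tried last because it matches anything
// that starts with /diff/.
const ROUTES = [
  { type: 'diff-pair', re: /^\/diff\/(\d{14})\/(\d{14})\/(.+)$/, keys: ['timestampA', 'timestampB', 'url'] },
  { type: 'diagram', re: /^\/diagram\/(.+)\/(\d{4})\/(\d{14})\/?$/, keys: ['url', 'year', 'timestamp'] },
  { type: 'diff', re: /^\/diff\/(.+)$/, keys: ['url'] },
];

// Returns an object with the route type and its named parts,
// or null when the path matches none of the patterns.
function parsePath(path) {
  for (const { type, re, keys } of ROUTES) {
    const match = re.exec(path);
    if (match) {
      const params = {};
      keys.forEach((key, i) => { params[key] = match[i + 1]; });
      return { type, ...params };
    }
  }
  return null;
}
```

For example, parsePath('/diff/iskme.org') yields an object of type 'diff' whose url is 'iskme.org'.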

Props

DiffContainer can receive up to eight props. All of them are optional.

The conf prop receives a JSON file that contains the configuration of the wayback-diff component.

The fetchCDXCallback prop is a callback function used to fetch the snapshots available from the CDX server.

The fetchSnapshotCallback prop is a callback function that is used to fetch the snapshots from the Wayback Machine.

  • If null is passed to either of the fetch callback props, a default fallback method is used instead.

  • The callback function should return a fetch Promise.

    If you use this prop, the limit conf option does not have any effect.
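A custom callback could look roughly like the sketch below. The arguments that wayback-diff passes to the callback are not documented here, so this version ignores them and closes over everything it needs. The factory name makeFetchCallback, the endpoint parameter, and the injectable fetchImpl are all assumptions made for illustration.

```javascript
// Hypothetical factory that produces a callback suitable for a fetch*Callback
// prop. endpoint is whatever URL your own backend exposes.
function makeFetchCallback(endpoint, fetchImpl = globalThis.fetch) {
  return function fetchCallback() {
    // Return the Promise itself (do not await it here), so the caller
    // receives a fetch Promise as the documentation above requires.
    return fetchImpl(endpoint, { headers: { Accept: 'application/json' } });
  };
}
```

Passing a stand-in for fetchImpl makes the callback easy to exercise in tests without touching the network.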

The loader prop is a React Component that displays when loading. If this is not set or it is null, a default loader is used instead.

The timestampA and timestampB props are the timestamps extracted from the URL.
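The timestamps in the example requests follow the Wayback Machine's 14-digit YYYYMMDDhhmmss form. As a minimal sketch, assuming the value is in UTC, such a string can be converted to a Date as follows; parseWaybackTimestamp is a hypothetical helper, not something wayback-diff exports.

```javascript
// Converts a 14-digit Wayback-style timestamp (YYYYMMDDhhmmss) to a Date.
// Only the shape is checked; calendar validity (e.g. month 13) is not.
function parseWaybackTimestamp(ts) {
  const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(ts);
  if (!match) return null;
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  // Date.UTC expects a zero-based month.
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}
```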

The url prop is the webpage for which the snapshots are shown.

The noTimestamps prop should be set to true only for the /diff///WEBPAGE path schema.

SunburstContainer can receive up to five props. All of them are optional.

The loader prop is a React Component that displays when loading. If this is not set or it is null, a default loader is used instead.

The timestamp prop is the timestamp whose simhash is compared to the others.

The url prop is the webpage whose simhashes are compared.

The conf prop is a JSON file that contains the configuration of the wayback-diff component.

The fetchSnapshotCallback prop is a callback function used to fetch the snapshots from the Wayback Machine. It is used to validate the timestamp in the URL.

  • If null is passed to either of the fetch callback props, a default fallback method is used instead.

  • The callback function should return a fetch Promise.

conf.json

The configuration file should have the following format:

{
  "webMonitoringProcessingURL": "http://localhost:8888",
  "limit": "1000",
  "snapshotsPrefix": "http://web.archive.org/web/",
  "urlPrefix": "/diff/",
  "diffgraphPrefix": "/diffgraph/",
  "cdxServer": "http://web.archive.org/cdx/",
  "sparklineURL": "http://web.archive.org/__wb/sparkline",
  "waybackDiscoverDiff": "http://localhost:4000",
  "maxSunburstLevelLength": "70",
  "compressedSimhash": true
}
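A small check before passing the configuration to the components can catch misspelled keys early. This is a sketch only: the key list is copied from the example above, and whether every key is actually required by wayback-diff is an assumption. The helper name missingConfKeys is hypothetical.

```javascript
// Keys taken from the example configuration above.
const EXPECTED_CONF_KEYS = [
  'webMonitoringProcessingURL',
  'limit',
  'snapshotsPrefix',
  'urlPrefix',
  'diffgraphPrefix',
  'cdxServer',
  'sparklineURL',
  'waybackDiscoverDiff',
  'maxSunburstLevelLength',
  'compressedSimhash',
];

// Returns the expected keys that are absent from the given conf object.
function missingConfKeys(conf) {
  return EXPECTED_CONF_KEYS.filter((key) => !(key in conf));
}
```

An empty result means every key from the example is present; it says nothing about whether the values are valid.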

Run yarn build:dev

Install the library

To install the component library, inside your new project directory you should run:

yarn add file:[PATH_TO]/wayback-diff

where [PATH_TO] is the path where you have wayback-diff saved.

If you are installing this component as a library from this GitHub repository:

yarn add https://github.com/ftsalamp/wayback-diff

Import the component

In the file where you want to use the wayback-diff component, import it with the following code:

import {DiffContainer, SunburstContainer} from 'wayback-diff';

Use the component

After importing the component, you can use it like any other React component:

  <Router>
    <Switch>
      <Route path='/diff/([0-9]{14})/([0-9]{14})/(.+)' render={({match, location}) =>
        <DiffContainer url={match.params[2] + location.search} timestampA={match.params[0]}
          loader={LOADER_COMPONENT}
          timestampB={match.params[1]} fetchCDXCallback={null} conf={this.conf} fetchSnapshotCallback={null} />
        } />
      <Route path='/diff/([0-9]{14})//(.+)' render={({match, location}) =>
        <DiffContainer url={match.params[1] + location.search} timestampA={match.params[0]}
          loader={LOADER_COMPONENT}
          fetchCDXCallback={null} conf={this.conf} fetchSnapshotCallback={null}/>
        } />
      <Route path='/diff//([0-9]{14})/(.+)' render={({match, location}) =>
        <DiffContainer url={match.params[1] + location.search} timestampB={match.params[0]}
          loader={LOADER_COMPONENT}
          fetchCDXCallback={null} conf={this.conf} fetchSnapshotCallback={null}/>
      } />
      <Route path='/diff///(.+)' render={({match, location}) =>
        <DiffContainer url={match.params[0] + location.search} conf={this.conf} noTimestamps={true} fetchCDXCallback={null}
          loader={LOADER_COMPONENT}/>
      } />
      <Route path='/diff/(.+)' render={({match, location}) =>
        <DiffContainer url={match.params[0] + location.search} fetchCDXCallback={null}
          loader={LOADER_COMPONENT} conf={this.conf}/>}
      />
      <Route path='/diffgraph/([0-9]{14})/(.+)' render={({match, location}) =>
        <SunburstContainer url={match.params[1] + location.search} timestamp={match.params[0]}
          loader={LOADER_COMPONENT}
          conf={this.conf} fetchSnapshotCallback={null}/>} 
      />
    </Switch>
  </Router>

Example project

If you need an example of how to use the component, check out this repository.

wayback-diff's Issues

Missing license

Thanks for the commit cfe567d expanding on the provenance of this code. It is also important to add web-monitoring-ui's GPL license to this repository. You may simply copy this file.

base64 decoding improvement

From my understanding, we implement base64 decoding in our own code. Can we use the standard atob function instead?
https://www.w3schools.com/jsref/met_win_atob.asp
(Browser support is fine; we don't need to support old browsers.)

Alternatively, there are many libraries on npm, such as https://www.npmjs.com/package/js-base64

Please also add a comment in the function explaining why we need base64 decoding.
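For reference, a decoder built on the standard atob function could look like the sketch below. It assumes the decoded data is wanted as raw bytes; the issue does not say what the existing code returns, and base64ToBytes is a hypothetical name.

```javascript
// Decodes a base64 string into a Uint8Array using the standard atob function.
function base64ToBytes(base64) {
  // atob returns a "binary string" in which each character holds one byte.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```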

Sunburst timestamp validation improvement

Before showing the sunburst, we validate the target timestamp.
Currently, we load the capture and, if that succeeds, conclude that the timestamp is valid.
This is very inefficient: the capture could be very large, and downloading all of it just to validate a timestamp is wasteful.
Please perform a CDX query instead, using the URL and timestamp, to check whether it is valid.

Please also add a short comment in this function to explain its purpose.
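One possible shape for such a check is sketched below. It assumes a Wayback-style CDX server that accepts the url, from, to, output, and limit query parameters and, with output=json, returns an array of rows whose first row is a header. The cdxSearchUrl parameter and all function names are hypothetical, and this is not the project's actual implementation.

```javascript
// Builds a CDX query restricted to exactly one timestamp.
// cdxSearchUrl is the search endpoint of the CDX server (an assumption).
function buildCdxCheckUrl(cdxSearchUrl, url, timestamp) {
  const params = new URLSearchParams({
    url,
    from: timestamp,
    to: timestamp,
    output: 'json',
    limit: '1',
  });
  return `${cdxSearchUrl}?${params}`;
}

// Assumes the first row of the JSON output is a header, so a capture
// exists only when at least one further row is present.
function hasCapture(rows) {
  return Array.isArray(rows) && rows.length > 1;
}

// Resolves to true when the CDX server lists a capture of url at timestamp.
// Only a small index response is downloaded, never the capture itself.
async function isValidTimestamp(cdxSearchUrl, url, timestamp, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(buildCdxCheckUrl(cdxSearchUrl, url, timestamp));
  if (!response.ok) return false;
  return hasCapture(await response.json());
}
```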

Drop d3-scale-cluster dependency

Reimplement the d3 scale cluster in a simpler way with d3, without any external dependencies.

We need a leaner package with fewer dependencies for better maintenance and efficiency.
