
web-archive-it-api's Introduction

Archive-It APIs Scripts

Overview

These scripts use the Archive-It web archiving service APIs (Partner API and WASAPI) to generate reports. They are used to prepare for quarterly downloads from Archive-It for preservation and to review and update metadata.

All reports are CSVs. The scripts that take arguments are described under Script Arguments.

Getting Started

Dependencies

  • pandas: edit and summarize API output
  • requests: download content from the APIs

Installation

Prior to using any of these scripts, create a file named configuration.py, modeled after configuration_template.py, and save it to your local copy of this repository. This defines a place for script output to be saved and includes your Archive-It login credentials.
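As a rough illustration of what such a file might hold, here is a minimal sketch. The variable names (script_output, username, password) and the helper missing_settings() are hypothetical and are not taken from configuration_template.py. Use the names the template actually defines.

```python
# configuration.py (hypothetical sketch; follow configuration_template.py for the real names)
import os

# Folder where the scripts save their CSV reports (assumed variable name).
script_output = os.path.join(os.path.expanduser("~"), "archive_it_reports")

# Archive-It login credentials (assumed variable names, placeholder values).
username = "your_archive_it_username"
password = "your_archive_it_password"


def missing_settings(settings):
    """Return the names of expected settings that are absent or empty.

    settings is a dictionary of setting name to value. The expected names
    are assumptions made for this sketch.
    """
    expected = ["script_output", "username", "password"]
    return [name for name in expected if not settings.get(name)]
```

Because the file contains credentials, it is usually worth keeping it out of version control.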

Script Arguments

collection_metadata_report.py

  • required (optional): add "required" to limit the report to UGA's required collection metadata fields. Otherwise, all fields are included.

preservation_download_tracker.py

  • warc_metadata_path (required): the location of the WARC metadata report, created using warc_metadata_report.py.

seed_metadata_report.py

  • required (optional): add "required" to limit the report to UGA's required seed metadata fields. Otherwise, all fields are included.
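The optional "required" argument accepted by collection_metadata_report.py and seed_metadata_report.py could be read along these lines. This is a sketch only: the function name limit_to_required() is hypothetical, and raising an error for any other value is an assumption, since the scripts may handle unexpected arguments differently.

```python
def limit_to_required(argv):
    """Return True if the optional "required" argument was supplied.

    argv is the full argument list, with the script name at index 0,
    in the same shape as sys.argv.
    """
    if len(argv) < 2:
        # No argument given: the report includes all metadata fields.
        return False
    if argv[1] == "required":
        # Limit the report to the required metadata fields.
        return True
    # Assumption for this sketch: any other value is treated as an error.
    raise ValueError(f'Unexpected argument "{argv[1]}"; the only accepted value is "required".')
```

In a script, this would be called as limit_to_required(sys.argv) after importing sys.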

warc_csv.py

  • Both date arguments are formatted YYYY-MM-DD and define the date range of WARCs to include.
  • start_date (required): first store date of WARCs to include.
  • end_date (required): first store date of WARCs NOT to include (last date included is the day before end_date).
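The date range described above is half-open: start_date is included and end_date is not. A minimal sketch of that logic follows. The function names are hypothetical and are not taken from warc_csv.py.

```python
from datetime import datetime, timedelta


def parse_date(text):
    """Parse a YYYY-MM-DD string, raising ValueError if it cannot be parsed."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def in_range(store_date, start_date, end_date):
    """True if store_date is on or after start_date and before end_date."""
    return start_date <= store_date < end_date


def last_included(end_date):
    """Return the last date that falls inside the range (the day before end_date)."""
    return end_date - timedelta(days=1)
```

With start_date 2023-01-01 and end_date 2023-04-01, a WARC stored on 2023-03-31 is included and one stored on 2023-04-01 is not, so consecutive quarterly ranges can share a boundary date without overlapping.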

Testing

Each script has unit tests for its individual functions and for the script as a whole, except for check_config() (Issue 21) and the API error case in get_metadata() (Issue 22). The tests for functions that call the API, and the tests for the full scripts, rely on UGA Archive-It data. For UGA, the expected results may need to be updated occasionally to stay in sync with our edits to that data. To use these tests with another account, all expected results must be changed to match data in that account.

Workflow

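These scripts are used for two different workflows at UGA: preparing for quarterly preservation downloads from Archive-It, and reviewing and updating metadata.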

The reports may also be created and used individually.

Author

Adriane Hanson, Head of Digital Stewardship, University of Georgia
