
uga-libraries / general-aip


This is the general workflow to make archival information packages (AIPs) that are ready for ingest into the UGA Libraries' digital preservation system (ARCHive). The workflow organizes files, extracts and formats metadata, and packages the files. It may be used for any combination of file formats.

License: Creative Commons Attribution Share Alike 4.0 International

XSLT 19.13% Python 70.19% HTML 10.68%


General Workflow for Creating AIPs

Overview

This script implements the general workflow to make archival information packages (AIPs) that are ready for ingest into the UGA Libraries' digital preservation system (ARCHive). It may be used for one or multiple files of any file format.

AIPs contain digital objects and metadata files, including a preservation.xml file required by ARCHive, and are bagged according to the Library of Congress standard. See the UGA Libraries AIP Definition.

Specialized AIP workflows for audiovisual materials:

Getting Started

Dependencies

Installation

Configuration File

Use configuration_template.py to make a file named configuration.py that defines the file path variables for your local machine.

FITS Configuration

FITS includes multiple identification tools, and we adjust which tools are used for particular formats (based on the file extension) to reduce the number of errors.

  1. Navigate to the "xml" folder in the FITS folder on your local machine.
  2. Open the "fits.xml" file.
  3. Edit the "exclude-exts" and "include-exts" for each tool as needed.
    1. Jhove: exclude "warc"
    2. FileUtility: exclude "warc"
  4. Comment out MediaInfo (start the comment with <!-- and end it with -->), which has a known issue of not running correctly in FITS.
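
The edits in step 3 could also be scripted. The following is a minimal sketch, assuming a simplified fits.xml in which each tool is a "tool" element with a "class" attribute and an optional comma-separated "exclude-exts" attribute. The sample XML and the helper add_excluded_ext are illustrative, not copied from a FITS release, so check them against your own fits.xml before adapting.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for fits.xml (an assumption, for illustration only).
SAMPLE_FITS_XML = """<fits_configuration>
  <tools>
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.fileutility.FileUtility"/>
  </tools>
</fits_configuration>"""


def add_excluded_ext(root, tool_name, ext):
    """Add ext to exclude-exts for every tool whose class ends with tool_name."""
    for tool in root.iter("tool"):
        if tool.get("class", "").endswith("." + tool_name):
            # Split the existing comma-separated list, ignoring empty entries.
            current = [e for e in tool.get("exclude-exts", "").split(",") if e]
            if ext not in current:
                current.append(ext)
            tool.set("exclude-exts", ",".join(current))


root = ET.fromstring(SAMPLE_FITS_XML)
for name in ("Jhove", "FileUtility"):
    add_excluded_ext(root, name, "warc")

# Map the short tool name to its updated exclude-exts value.
excluded = {t.get("class").rsplit(".", 1)[-1]: t.get("exclude-exts") for t in root.iter("tool")}
print(excluded)
```

ElementTree discards XML comments when parsing by default, so writing the tree back to disk would drop any tools that were commented out. Editing the file by hand, as described in the steps above, avoids that.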

7-Zip Path

On Windows, add 7-Zip to the system PATH: in Settings, go to Environment Variables > Path > Edit > New and add the 7-Zip installation folder.
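
One way to confirm the PATH change took effect is to look up the executable from Python. This is a small sketch that assumes 7-Zip is invoked as "7z" (7z.exe on Windows); the function name find_7zip is hypothetical.

```python
import shutil


def find_7zip():
    """Return the full path to the 7-Zip executable if it is on PATH, else None."""
    # Assumption: "7z" is the command-line executable name (7z.exe on Windows).
    return shutil.which("7z")


path = find_7zip()
print(path if path else "7-Zip not found on PATH")
```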

Metadata File

Create a file named metadata.csv in the AIPs directory (see the example metadata.csv). This file contains required information about each of the AIPs to be included in this batch. The header row is formatted Department,Collection,Folder,AIP_ID,Title,Version

For UGA, these values are:

  • Department: ARCHive group name
  • Collection: collection identifier
  • Folder: the current folder name of the AIP folder
  • AIP_ID: AIP identifier
  • Title: AIP title
  • Version: AIP version number, which must be a whole number
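
It can help to check metadata.csv before running the script. The sketch below is one possible check under stated assumptions, not the script's own validation: it confirms the header row and that each Version is a whole number. The function name check_metadata and the sample values are made up for illustration.

```python
import csv
import io

EXPECTED_HEADER = ["Department", "Collection", "Folder", "AIP_ID", "Title", "Version"]


def check_metadata(csv_file):
    """Return a list of error messages for a metadata.csv file object (empty if none found)."""
    rows = list(csv.reader(csv_file))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["Header row does not match the expected columns"]
    errors = []
    # Data rows start at line 2 of the file.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            errors.append(f"Row {number}: expected {len(EXPECTED_HEADER)} values, found {len(row)}")
        elif not row[5].isdecimal():
            errors.append(f"Row {number}: Version '{row[5]}' is not a whole number")
    return errors


# Made-up sample values, for illustration only.
sample = io.StringIO(
    "Department,Collection,Folder,AIP_ID,Title,Version\n"
    "dept,coll-001,folder-one,aip-001,First AIP,1\n"
    "dept,coll-001,folder-two,aip-002,Second AIP,1.5\n"
)
errors = check_metadata(sample)
print(errors)
```

For a real file, pass an open file object instead of the StringIO sample, for example open("metadata.csv", newline="").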

Script Arguments

To run the script via the command line: python /path/general_aip.py aips_directory [no-zip]

  • aips_directory (required) is the folder which contains all the folders to make into AIPs and the metadata.csv file
  • no-zip (optional) is included to only tar and not zip the AIP (for big formats like disk images).
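
The two arguments described above could be read along these lines. This is a hedged sketch, not the script's actual code: parse_arguments and its return values are hypothetical names chosen for the example.

```python
import os


def parse_arguments(argv):
    """Return (aips_directory, zip_aips, errors) from a sys.argv-style list."""
    errors = []
    aips_directory = None
    zip_aips = True
    if len(argv) < 2:
        errors.append("Missing required argument: aips_directory")
    else:
        aips_directory = argv[1]
        if not os.path.isdir(aips_directory):
            errors.append(f"aips_directory '{aips_directory}' is not a directory")
    if len(argv) > 2:
        if argv[2] == "no-zip":
            # Only tar the AIP; skip the zip step.
            zip_aips = False
        else:
            errors.append(f"Unexpected value '{argv[2]}' for the optional argument (expected no-zip)")
    return aips_directory, zip_aips, errors


print(parse_arguments(["general_aip.py", os.getcwd(), "no-zip"]))
print(parse_arguments(["general_aip.py"]))
```

Collecting all errors before exiting lets the user fix every problem with the command in one pass.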

Testing

The repository includes one test file per function and a test that runs the full script. Run the unit test scripts with the repository's "tests" folder as the current working directory.

Known issue: Tests that check the contents of XML may fail due to the inconsistent order of element attributes.
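
One possible way to make such comparisons independent of attribute order is to canonicalize both documents first. The sketch below uses xml.etree.ElementTree.canonicalize (Python 3.8 or later), which sorts attributes; it is a suggestion only and does not describe how the existing tests work. The helper same_xml and the sample elements are made up.

```python
import xml.etree.ElementTree as ET


def same_xml(first, second):
    """Compare two XML strings after canonicalization (C14N 2.0), which sorts attributes."""
    return ET.canonicalize(first, strip_text=True) == ET.canonicalize(second, strip_text=True)


# Made-up elements that differ only in attribute order.
a = '<object id="aip-001" version="1"><title>Example</title></object>'
b = '<object version="1" id="aip-001"><title>Example</title></object>'
print(a == b)          # plain string comparison
print(same_xml(a, b))  # comparison after canonicalization
```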

Workflow

The script organizes the files, extracts and formats technical metadata, and bags and zips the AIP folders. See AIP Creation Instructions for details.

Each AIP is fully processed before the next one is started. If a known error is encountered, such as a failed validation test or a regular expression that does not find a match, the AIP is moved to an error folder and the remaining steps are skipped for that AIP.
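
The pattern described above (process one AIP at a time, and on a known error move it aside and skip its remaining steps) can be sketched as follows. All names here (move_error, process_aip, has_files, and the "errors" folder layout) are hypothetical and chosen for the example; the script's own functions and folder names may differ.

```python
import shutil
import tempfile
from pathlib import Path


def move_error(aips_directory, aip_folder, error_name):
    """Move an AIP folder into aips_directory/errors/error_name/."""
    destination = Path(aips_directory) / "errors" / error_name
    destination.mkdir(parents=True, exist_ok=True)
    shutil.move(str(Path(aips_directory) / aip_folder), str(destination / aip_folder))


def process_aip(aips_directory, aip_folder, steps):
    """Run each step in order; on the first failure, move the AIP and skip the rest."""
    for step in steps:
        error_name = step(Path(aips_directory) / aip_folder)
        if error_name:
            move_error(aips_directory, aip_folder, error_name)
            return False
    return True


def has_files(aip_path):
    """Hypothetical step: return an error name if the folder is empty, else None."""
    return None if any(aip_path.iterdir()) else "empty_folder"


with tempfile.TemporaryDirectory() as aips_dir:
    # Two made-up AIP folders: one with a file, one empty.
    (Path(aips_dir) / "aip-001").mkdir()
    (Path(aips_dir) / "aip-001" / "file.txt").write_text("content")
    (Path(aips_dir) / "aip-002").mkdir()
    results = {name: process_aip(aips_dir, name, [has_files]) for name in ("aip-001", "aip-002")}
    moved = (Path(aips_dir) / "errors" / "empty_folder" / "aip-002").is_dir()

print(results, moved)
```

Grouping failed AIPs by error name keeps the batch moving and makes it easy to see afterward which AIPs need attention and why.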

Author

Adriane Hanson, Head of Digital Stewardship, December 2019.

History

These scripts were adapted from a set of two bash scripts that were used for making AIPs at UGA Libraries from 2017 to October 2019 (https://github.com/uga-libraries/aip-mac-bash-fits).
