uga-libraries / av-aip_russell

Creates AIPs from folders of Russell Library digital audiovisual objects that are ready for ingest into the UGA Libraries' digital preservation system (ARCHive).

License: Creative Commons Attribution Share Alike 4.0 International

Python 58.20% Perl 2.14% XSLT 39.66%

Russell Library Workflow for AV AIPs

Purpose and overview

This is the workflow to make archival information packages (AIPs) for Russell Library digital audiovisual objects that are ready for ingest into the UGA Libraries' digital preservation system (ARCHive). The workflow organizes files, extracts and formats metadata, and packages the files. The Brown Media Archives has its own workflow for audiovisual materials, which has specialized rules for the different formats it uses. UGA Libraries also has a general workflow for mixed formats and a specialized workflow for web archives.

The UGA Libraries has two types of AV AIPs: media and metadata. Media AIPs contain the AV files, while metadata AIPs contain supporting documentation such as OHMS XML, transcripts, and releases. The two are kept in separate AIPs so that the media, which is generally ready for preservation first, can be ingested into ARCHive without delay. Once the metadata is created, it is added to ARCHive as a separate AIP. If both were in the same AIP, a new version of the AIP containing an identical copy of the media would have to be ingested once the metadata was ready, which would waste preservation storage space.

As of August 2021, this script also works for creating Hargrett oral history AIPs.

Script approach

The script iterates over each folder to be made into an AIP, completing all steps for one folder before starting the next. If a known error is encountered, such as failing a validation test, the folder is moved to an error folder, and the rest of the steps are skipped for that folder.

Because this script can take some time to complete, particularly when tarring and zipping larger files, it prints to the terminal whenever it is starting a new AIP folder so staff can monitor the script's progress.

The script creates a log with the name of each AIP folder and its final status (either the known error it encountered or that it completed), so staff can quickly review the results of a batch of AIPs.
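The per-folder approach described above can be sketched roughly as follows. This is an illustration only, not the code in aip_av.py: the KnownError exception, the "errors" folder name, the "log.csv" file name, and the process_batch function are all hypothetical names chosen for the example.

```python
import csv
import shutil
from pathlib import Path


class KnownError(Exception):
    """Hypothetical exception for an anticipated failure, e.g. a failed validation test."""


def process_batch(aips_directory, steps, log_name="log.csv"):
    """Run every step on each AIP folder and return a list of (folder name, status)."""
    aips_directory = Path(aips_directory)
    error_root = aips_directory / "errors"  # hypothetical error folder name
    results = []

    # List the folders before the loop, since folders may be moved while it runs.
    folders = sorted(
        path for path in aips_directory.iterdir()
        if path.is_dir() and path != error_root
    )
    for folder in folders:
        # Print progress so staff can monitor a long-running batch.
        print(f"Starting {folder.name}")
        try:
            for step in steps:
                step(folder)
            status = "complete"
        except KnownError as error:
            # Move the folder aside; the remaining steps are skipped for it.
            error_root.mkdir(exist_ok=True)
            shutil.move(str(folder), str(error_root / folder.name))
            status = f"error: {error}"
        results.append((folder.name, status))

    # Save one row per AIP folder with its final status.
    with open(aips_directory / log_name, "w", newline="") as log:
        writer = csv.writer(log)
        writer.writerow(["AIP folder", "Status"])
        writer.writerows(results)
    return results
```

Each item in steps is assumed to be a function that takes the folder path and raises KnownError when it hits an anticipated problem. Any other exception is left to stop the batch, so unexpected failures are not silently recorded as known errors.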

Script usage

python3 'path/aip_av.py' 'path/aip-directory'

See "Script Input" (below) for details on the AIPs directory.

Dependencies

Installation

  1. Install the dependencies (listed above). Saxon may come with your OS.

  2. Download this repository and save to your computer.

  3. Use the configuration_template.py to make a file named configuration.py with file path variables for your local machine.

  4. Change permissions on the scripts so they are executable.

Script Input (AIPs Directory)

The content to be transformed into AIPs must be in a single folder, which is the AIPs directory. Within the AIPs directory, there is one folder for each AIP. Each folder must contain only media files or only metadata files.

Hargrett script input

The AIPs directory should be in a bag, since files are transferred over the network before they are transformed into AIPs. The AIP folders are named AIPID_Title. Example AIPs directory:

Screenshot of Hargrett AIPs Directory

Hargrett oral history AIP IDs are formatted har-ua##-###_####, for example, har-ua12-003_0001.

Hargrett title naming conventions are:

  • Firstname Lastname Interview Recording (for media AIPs)
  • Firstname Lastname Interview Metadata (for metadata AIPs)
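As a rough illustration of these conventions, a Hargrett folder name of the form AIPID_Title could be split with a regular expression like the one below. The function name is hypothetical, and inferring the AIP type from the end of the title is an assumption made for this sketch; the workflow itself is described as using the folder name together with the file formats.

```python
import re

# har-ua##-###_#### followed by an underscore and the title.
HARGRETT_FOLDER = re.compile(r"^(har-ua\d{2}-\d{3}_\d{4})_(.+)$")


def parse_hargrett_folder(folder_name):
    """Return (aip_id, title, aip_type), or raise ValueError if the name does not fit."""
    match = HARGRETT_FOLDER.match(folder_name)
    if match is None:
        raise ValueError(f"Unexpected folder name: {folder_name}")
    aip_id, title = match.groups()

    # Assumption: the title ending indicates whether this is a media or metadata AIP.
    if title.endswith("Interview Recording"):
        aip_type = "media"
    elif title.endswith("Interview Metadata"):
        aip_type = "metadata"
    else:
        raise ValueError(f"Unexpected title: {title}")
    return aip_id, title, aip_type
```

For example, a folder named "har-ua12-003_0001_Firstname Lastname Interview Recording" would yield the ID har-ua12-003_0001 and the type media.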

Use the hargrett-preprocessing.py script to validate the AIPs directory bag and remove the AIP folders from the bag prior to running this script.

Russell script input

Russell AIP folders should be named with the AIP title. The naming convention is identifier_lastname where the identifier is the AIP ID without the type (media or metadata) suffix.

Workflow Details

See also the graphical representation of this workflow.

  1. Deletes files that do not have any of the expected file extensions (.dv, .m4a, .mov, .mp3, .mp4, .wav, .pdf, or .xml) from the AIP folder.

  2. Determines the department, AIP ID, and AIP title from the AIP folder name and file formats.

  3. Structures the AIP folder contents:

    1. Makes a folder named objects and moves all files and folders into it.
    2. Makes a folder named metadata for script outputs.
    3. Renames the AIP folder to the AIP ID.

  4. Runs MediaInfo on the objects folder and saves the result in the metadata folder.

  5. Transforms the MediaInfo XML into the preservation.xml (PREMIS technical metadata) file using Saxon and XSLT.

  6. Validates the preservation.xml with xmllint and XSDs.

  7. Bags the AIP in place with MD5 and SHA-256 manifests using bagit.py.

  8. Validates the bag using bagit.py.

  9. Runs the Perl script prepare_bag on the AIP to tar and zip it and saves output to aips-ready-to-ingest.

  10. When all AIPs are processed, makes an MD5 manifest of the packaged AIPs in the aips-to-ingest folder using md5sum.
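The file-handling steps (1, 3, and 10) can be sketched with the Python standard library as below. This is a simplified illustration under stated assumptions, not the script's actual code: the function names and the manifest file name are hypothetical, extensions are matched case-insensitively as an assumption, and hashlib stands in for the md5sum command that the workflow uses.

```python
import hashlib
from pathlib import Path

EXPECTED_EXTENSIONS = {".dv", ".m4a", ".mov", ".mp3", ".mp4", ".wav", ".pdf", ".xml"}


def delete_unexpected_files(aip_folder):
    """Step 1: delete files whose extension is not in the expected list."""
    deleted = []
    for path in sorted(Path(aip_folder).rglob("*")):
        # Assumption: extensions are compared case-insensitively.
        if path.is_file() and path.suffix.lower() not in EXPECTED_EXTENSIONS:
            path.unlink()
            deleted.append(path.name)
    return deleted


def structure_aip(aip_folder, aip_id):
    """Step 3: make objects and metadata folders, then rename the folder to the AIP ID."""
    aip_folder = Path(aip_folder)
    contents = list(aip_folder.iterdir())  # list before creating the new folders
    objects = aip_folder / "objects"
    objects.mkdir()  # fails if the AIP already contains something named objects
    for item in contents:
        item.rename(objects / item.name)
    (aip_folder / "metadata").mkdir()
    renamed = aip_folder.with_name(aip_id)
    aip_folder.rename(renamed)
    return renamed


def md5_manifest(folder, manifest_name="manifest.txt"):
    """Step 10: write one "<md5>  <filename>" line per file, the layout md5sum prints."""
    folder = Path(folder)
    lines = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name != manifest_name:
            digest = hashlib.md5()
            with open(path, "rb") as file:
                # Read in chunks so large packaged AIPs are not loaded into memory at once.
                for chunk in iter(lambda: file.read(1024 * 1024), b""):
                    digest.update(chunk)
            lines.append(f"{digest.hexdigest()}  {path.name}")
    (folder / manifest_name).write_text("\n".join(lines) + "\n")
    return lines
```

The remaining steps rely on external tools (MediaInfo, Saxon, xmllint, bagit.py, and prepare_bag) and are not reproduced here.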

Initial Author

Adriane Hanson, Head of Digital Stewardship, January 2020

Acknowledgements

These scripts were adapted from bash scripts developed by Iva Dimitrova. These were used by the Russell Library for making AV AIPs from 2017 to 2019.

av-aip_russell's Issues

Work with Hargrett MS ids

Manuscript ids are formatted har-ms####_####, where ms#### is the collection id. Until now, we've only had university archives ids, which are formatted har-ua##-###_####.

Tests that determine the department will still work, since they look for starting with "har" and that is also true of manuscript ids. Need to update the pattern in aip_metadata() and in the variable collection-id in mediainfo-to-preservation.xslt.
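One possible shape for the updated pattern is sketched below. It is not the pattern currently in aip_metadata(); the function name is hypothetical, and treating ua##-### as the collection id for university archives is an assumption made by analogy with ms####.

```python
import re

# Accepts har-ua##-###_#### (university archives) or har-ms####_#### (manuscripts).
HARGRETT_ID = re.compile(r"^har-(?P<collection>ua\d{2}-\d{3}|ms\d{4})_(?P<item>\d{4})$")


def hargrett_collection_id(aip_id):
    """Return the collection id portion of a Hargrett AIP ID, or None if it does not match."""
    match = HARGRETT_ID.match(aip_id)
    return match.group("collection") if match else None
```

The collection-id variable in mediainfo-to-preservation.xslt would need an equivalent change, which this Python sketch does not cover.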
