Coder Social home page Coder Social logo

bags's Introduction

Quick scripts to work with Library of Congress bags

Overview

Scripts to perform simple actions on Library of Congress bags. Includes removing content from a bag or batch of bags and validating a batch of bags.

Related script: Unpack AIPs, used to get AIPs from the preservation system ready to share with users.

Getting Started

Dependencies

These scripts only use standard Python libraries and have no special dependencies.

Script Arguments

undo_all_bags.py

  • bag_directory (required): path to directory that contains the bags. Bag folder names should end with "_bag".

undo_one_bag.py

  • bag (required): path to the bag.

validate_bags.py

  • bag_directory (required): path to the directory that contains the bag. Bag folder names should end with "_bag".

Testing

There are unit tests for each script in the tests folder. The scripts do not have functions, so the only tests are for each entire script. The tests for undo_one_bag.py are incomplete: see Issue 1

Workflow

undo_all_bags.py and undo_one_bag.py

These scripts are used to removed files from all bags in a specified directory or a specified bag. They are most commonly used when files are bagged for storage and later need to be worked on. After validating the bag(s), the content is removed from the bags for further processing.

For each bag:

  1. Deletes the bag manifests.
  2. Moves the content from the data folder into the parent directory.
  3. Deletes the data folder.
  4. Renames the parent directory to remove "_bag".

validate_bags.py

This script validates all bags in a specified directory and prints the results to the terminal. Use this instead of bagit.py's validation because bagit.py prints a lot of extra text during validation.

For each file in the directory:

  1. Confirms it is a bag, based on naming convention of ending with "_bag".
  2. Validates the bag using bagit.py.
  3. Prints to the terminal the bag name, if it is valid or invalid, an any error message.

Author

Adriane Hanson, Head of Digital Stewardship, University of Georgia

bags's People

Contributors

amhanson9 avatar

Stargazers

 avatar

Watchers

 avatar  avatar

bags's Issues

Unit test for undo_one_bag.py if input folder has _bag

Unable to get the test to work if the input folder has _bag. Got a permissions error for renaming to remove "_bag" when running the unit test.

This did not happen with the undo_all_bags.py script. I couldn't find the difference between the two.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.