
uga-libraries / aspace_batch_export-cleanup-upload


A program to batch export and clean ArchivesSpace resources, and upload resources to XTF-built finding aid websites.

License: Creative Commons Attribution Share Alike 4.0 International

Python 98.30% Inno Setup 1.70%
archivesspace xtf

aspace_batch_export-cleanup-upload's Introduction

ArchivesSpace Batch Exporter

Overview

This application batch exports records from ArchivesSpace in EAD, MARCXML, Container Label, or PDF. Additionally, it can run exported EAD records through a series of cleanup processes. Lastly, a user can choose to connect to an XTF-based finding aid website server to upload .xml or .pdf files to their instance of XTF.

EAD_Export_Demo

Getting Started

Dependencies

  • ArchivesSnake - Library used for interacting with the ArchivesSpace API
  • cx_Freeze - Generates the executable file
  • Inno - Generates the Windows installer (for GitHub action only or local .exe generation)
  • loguru - Logging package
  • lxml - Parsing XML files for cleanup
  • paramiko - Connecting to XTF server for file upload/indexing/delete
  • PySimpleGUI - The GUI used
  • requests - Used for validating API link
  • scp - Manages client for uploading/indexing/deleting files from XTF server

Installation

For Windows Users

  1. Go to Releases and download the .exe file from the latest release.
  2. Follow the on-screen instructions.
  3. The User Manual walking you through the program and its features can be found on the Wiki page.

For Mac Users

  1. Install Python 3 on your computer. You can install Python using the following link: https://www.python.org/downloads/mac-osx/
  2. Download the GitHub repo using the Code button in the top right corner of the repo, then unzip the downloaded file.
  3. Open your terminal and go to the unzipped folder. Run the command: pip3 install -r requirements.txt.
  4. After installing requirements, run the command: python3 as_xtf_GUI.py. This will start the program.
  5. The User Manual walking you through the program and its features can be found on the Wiki page.

Script Arguments

Open your console of choice, navigate to the project directory, and run python3 as_xtf_GUI.py to start the program. See #Prerequisites for more info.

Prerequisites

  1. Install Python 3 on your computer. You can install Python using the following link: https://www.python.org/downloads/
  2. Install packages as specified in requirements.txt
  3. Your ArchivesSpace instance's API URL (default port 8089), plus your username and password
  4. (OPTIONAL) XTF hostname URL, XTF remote path for EAD files, XTF indexer path to re-index new and/or changed files, and XTF lazy index path to update the .lazy files with appropriate permissions for rw-rw-r.

Installing

  1. Clone/Download or Fork the Master branch
  2. Set up your virtual environment using the packages as specified in requirements.txt
  3. Run as_xtf_GUI.py. This will automatically create folders and a defaults.json file at the same directory
  4. If you need to reset the defaults or rerun setup, delete the folders within the repository and defaults.json file and rerun as_xtf_GUI.py.

Testing

There are currently no unit tests associated with this project.

Right now, the best way to test the program is to input resource identifiers and try uploading them to XTF. If you want to generate errors, input any string or random numbers, such as "hello world" or 42.

For UGA

For Hargrett and Russell Libraries, input the following to generate different results:

  • ms3000_2e - the biggest one, will take a long time to export and index
  • ms1170-series1
  • ms1376
  • RBRL/025/ACLU
  • RBRL/044/CFH
  • RBRL/112/JRR
  • HCTC001
  • HCTC021
  • UA97-121
  • UA20-004
  • hmap1640b55
  • hmap1792a7

You can also try using the following, which will generate more than 1 result in the Output Terminal:

  • ms1170
  • RBRL/220/ROGP

Workflow

See the User Manual for a complete walkthrough of the application

Author

  • Corey Schmidt - Project Management Librarian/Archivist at the University of Georgia Libraries

Acknowledgements:

  • Adriane Hanson - Head of Digital Stewardship at the University of Georgia Libraries
  • ArchivesSpace community
  • Kevin Cottrell - GALILEO/Library Infrastructure Systems Architect at the University of Georgia Libraries
  • PySimpleGUI
  • Shawn Kiewel
  • Tyler Brockmeyer

aspace_batch_export-cleanup-upload's People

Contributors

crugas, dependabot[bot], kco-uga, schmidtster

aspace_batch_export-cleanup-upload's Issues

Remove Hyphens from Russell XML Filename

Issue:
Add exception to remove hyphens from Russell input IDs when creating XML files.

Time:
06/11/2023

Resolution:
Add a check in as_export.py to remove hyphen(s) in the input_id specifically for Russell collections.

if type(input_id) is str and "/" in self.input_id:
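A minimal sketch of the check described above, assuming that Russell identifiers are the ones containing "/" (as in RBRL/025/ACLU). The function name strip_russell_hyphens is hypothetical and is not taken from as_export.py.

```python
def strip_russell_hyphens(input_id):
    """Remove hyphens from an identifier only when it looks like a Russell ID.

    Assumption: a Russell identifier is a string containing "/".
    Anything else (other collections, non-string input) is returned unchanged.
    """
    if isinstance(input_id, str) and "/" in input_id:
        return input_id.replace("-", "")
    return input_id
```

Identifiers without a slash, such as ms1170-series1, keep their hyphens, so only the Russell case is affected.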

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Resource Not Found - Progress Bar Doesn't Update

Issue:
When exporting files, if a resource is not found, the progress popup does not update and hangs, not allowing the user to exit out of the popup. Related to #12

Time:
20/04/2022

Resolution:
Handle the export error so that the progress window reduces the number of exports by 1. Also, if possible, enable the user to exit the progress meter if it hangs.

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Add Warning Popup for Export All Button

Issue:
When clicking the "Export All" button, there should be a warning popup to prevent users from accidentally starting to export all the records in a repository. This should be similar to the "Are you a system Admin" warning popup when doing an export across repositories.

Time:
08/02/2023

Resolution:
Add warning popup message to user when they click "Export All" button, before it starts exporting all records from a repository.

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

EAD3 Export Cleanup Option Fails

Issue:
When a user selects EAD3 as an export option and sets the Add Certainty Attribute cleanup option, the application fails. EAD3 has a unitdatestructured element that matched the if 'unitdate' in child.tag check in the add_certainty_attr function. However, unitdatestructured has no text, so its text is None and a NoneType error occurred. https://www.loc.gov/ead/EAD3taglib/index.html#elem-unitdatestructured

Time:
24/05/2022

Resolution:
Add regex to add_certainty_attr in cleanup.py to check for 'unitdate' whole word and not a partial match.
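The whole-word match can be sketched as follows. This is an illustration rather than the code in cleanup.py: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml, and the names UNITDATE_TAG and find_unitdates are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches a tag whose local name is exactly "unitdate", with or without
# a leading "{namespace}" prefix. "unitdatestructured" does not match.
UNITDATE_TAG = re.compile(r"(?:^|\})unitdate$")


def find_unitdates(root):
    """Return elements named exactly unitdate that also carry text."""
    return [
        el
        for el in root.iter()
        if isinstance(el.tag, str)
        and UNITDATE_TAG.search(el.tag)
        and el.text  # guards against None text before any string handling
    ]


sample = ET.fromstring(
    "<did><unitdate>circa 1900</unitdate>"
    "<unitdatestructured><daterange/></unitdatestructured></did>"
)
matches = find_unitdates(sample)
```

Checking el.text as well is a second line of defense, since an empty unitdate element would otherwise raise the same NoneType error.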

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Delete Files does not Disable Buttons when Indexing

Issue:
When using the delete feature on XTF, the Upload Files, Delete Files, and Index Changed Records buttons do not stay disabled when indexing records begins. They are disabled when running the delete command, however.

Time:
17/08/2021

Resolution:
Manage threads to keep buttons disabled until Indexing thread is finished.

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Export Popup Remains Open on EAD validation error

Issue:
When exporting an EAD.xml file, if that file has an XML validation error, it causes the Export progress popup to remain open and get stuck on 0 of 1. See screenshot below.

image

Time:
30/08/2022

Resolution:
Add check for if there is a validation error to close/complete the progress popup.

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Invalid filepaths in defaults.json Cause Crash

Issue:
When a user has an invalid filepath in their defaults.json file and tries to access that path through either the Open Folder buttons or the Upload to XTF button (which pulls from the clean_eads folder), the app crashes, and the logs say the clean_eads folder is not a valid directory.

Time:
14/09/2023

Resolution:
Add a check in defaults_setup.py to check that the filepaths in defaults.json file are valid and either do a complete reset of the defaults file or reset those filepaths in particular. The former is easier from a coding standpoint, the latter is easier from a user's standpoint so they don't have to fill in all their defaults again.
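The second option (resetting only the invalid filepaths) might look like the sketch below. The function name, the key names passed in, and the fallback directory are all assumptions for illustration; the actual structure of defaults.json is not shown here.

```python
import os


def reset_invalid_paths(defaults, path_keys, fallback_dir):
    """Return a copy of defaults with invalid directory paths replaced.

    defaults     -- dict loaded from the defaults file (structure assumed flat)
    path_keys    -- keys expected to hold directory paths (hypothetical names)
    fallback_dir -- directory to use when a stored path is not valid
    """
    fixed = dict(defaults)  # leave the caller's dict untouched
    for key in path_keys:
        if not os.path.isdir(str(fixed.get(key, ""))):
            fixed[key] = fallback_dir
    return fixed
```

Only the listed keys are touched, so other saved settings survive the repair.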

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Changing Login Credentials Doesn't Update Within GUI


name: Changing Login Credentials Doesn't Update Within GUI
about: Changing login settings from within GUI
labels: 'bug'

Is your proposal related to a problem?

When changing login settings from within the GUI (after login popups), the credentials don't actually update. I changed sclfind-dev to sclfind but it still uploaded a file and indexed to sclfind-dev.

Describe the solution you'd like

Default settings and variables should be updated after resetting the login credentials.

Describe alternatives you've considered

Additional context

Add resizable=True to Main GUI Window

Issue:
When users magnify their desktop text/screen or have smaller screen sizes than 1920x1080, the GUI does not fit within the screen and some parts are not visible.

Time:
06/12/2022

Resolution:
Add resizable=True to main GUI window to allow users to make the GUI window fullscreen/adjustable within their screen. This will not adjust the text/button/box sizes, but it should be a good first step to making it more adjustable.

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Screenshot??

Hi!

Was looking at your code and your GUI code looks like you've got a heck of a GUI. I see a lot of stuff there. I bet it looks great, but I can't tell.

I've found that repos that have a screenshot in their readme.md file get a LOT more visitors / people that will try the code than those without.

Adding one is simple if you paste an image into one of these GitHub issues and then paste the code GitHub generates into the readme.md file. This way you don't have to upload an image to your repo.

I would really like to see what you've made.

attributes["label"] key not found

Issue:
When the Remove Archivists' Toolkit IDs cleanup option is enabled, for some EAD exports, an error is generated, saying:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "threading.py", line 954, in _bootstrap_inner
  File "threading.py", line 892, in run
  File "as_xtf_GUI.py", line 818, in get_eads
  File "cleanup.py", line 422, in cleanup_eads
  File "cleanup.py", line 365, in clean_suite
  File "cleanup.py", line 252, in remove_at_leftovers
  File "src\lxml\etree.pyx", line 2479, in lxml.etree._Attrib.__getitem__
KeyError: 'label'

Time:
10/05/2021

Resolution:
Check for "label" key in attributes and if not found, then give generic error that regex did not match for 's attributes.
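One way to avoid the KeyError is to read the attribute with .get(), which returns None when the key is absent. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for lxml, and the element name container is only an example, not necessarily the element involved in this error.

```python
import xml.etree.ElementTree as ET


def label_or_none(element):
    """Return the element's "label" attribute, or None if it has none."""
    # attrib["label"] raises KeyError when the attribute is missing;
    # attrib.get("label") returns None instead.
    return element.attrib.get("label")


with_label = ET.fromstring('<container label="Mixed Materials" type="box">1</container>')
without_label = ET.fromstring('<container type="box">1</container>')
```

The caller can then log a generic message and skip the element whenever the result is None.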

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Progress Bar Doesn't Disappear If Export Error Occurs

Issue:
When exporting files, if an export error occurs, the progress popup does not update and hangs, not allowing the user to exit out of the popup.

Time:
15/09/2021

Resolution:
Handle the export error so that the progress window reduces the number of exports by 1. Also, if possible, enable the user to exit the progress meter if it hangs.

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Integrate File Permission Updates into Master Branch - 1 release per update

Issue:
Integrate file permission updates into the master branch - delete UGA_Version branch. Allow this as an option in the XTF Options window. This will mean only 1 release per version (as opposed to a "Regular" and "UGA Version" for every release).

Time:
24/05/2022

Resolution:
Add checkbox option in XTF Options popup to change behavior to allow/disallow changing file permissions.

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

ASpace Login Credentials Save Empty Strings and Crash

Issue:
When a user tries to change their ASpace login credentials and hits Save and Close, the warning popup will occur. If a user hits the exit button after that, the username and password will be empty strings and the program will crash upon exporting anything.

Time:
28/07/2020

Resolution:
Don't let the Save and Close button save empty username and password fields or make the original login credentials appear when opening popup.
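A minimal sketch of the first option, assuming the popup hands back the new values as plain strings. The function name credentials_to_save is hypothetical and not part of the application.

```python
def credentials_to_save(new_user, new_password, old_user, old_password):
    """Keep the previous credentials unless both new fields are filled in."""
    # A username of only whitespace is treated as empty.
    if new_user.strip() and new_password:
        return new_user, new_password
    return old_user, old_password
```

Falling back to the old pair means that closing the popup early never leaves empty strings behind.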

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Incorrect ASpace API URL Causes Crash

Issue:
When logging into ArchivesSpace in the ArchivesSpace login popup box, if a user enters an invalid API URL, the program will crash.

Time:
15/09/2021

Resolution:
Manage exception to invalid API URL and provide feedback to user to correct the API URL input.

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

MARCXML and PDF Option to Open Output not Working

Issue:
When a user checks the "Open output folder on export" option for MARCXML and PDF exports, after the export(s) is completed, the output folder does not open. This is caused by the threads being closed after the Window.Read() is done, so "OPEN_MARCXML_DEST" and "OPEN_PDF_DEST" are not read as an actionable event.

Time:
24/05/2022

Resolution:
Add check to see if option is True in defaults.json file after the MARCXML and PDF threads complete.

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Add a "Remove Archon IDs" EAD Export Cleanup Option


name: Add a "Remove Archon IDs" EAD Export Cleanup Option
about: Finding Archon internal IDs in EAD exports
labels: 'enhancement'

Is your proposal related to a problem?

Users can export EADs that contain Archon IDs in the EAD XML file, but right now the app only searches for Archivists' Toolkit IDs. This happens when a user chooses to export EADs with the Include Unpublished Components option set to true in the EAD Export Options menu. The result is related to issue #7, where Archon internal IDs look like this:

<unitid audience="internal" identifier="#######" type="Archon Instance::COLLECTION">#######</unitid>

Describe the solution you'd like

Add an EAD Export cleanup option like, "Remove Archon IDs", as well as fixing the issue in issue #7

Describe alternatives you've considered

Removing all legacy IDs (AT and Archon included). Might be better to separate them, though.

Additional context

ArchivesSpace Export Timeout Error

Issue:
A user was exporting a large EAD record and ArchivesSpace timed out, which caused the program to crash.

Time:
27/07/2020

Resolution:
Build in ArchivesSpace export error handling when export times out.

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Add Help Links to Upload Files and Delete Files Popups for XTF

Issue:
When opening the "Upload Files" or "Delete Files" popup, they should have a "Help" link just like the other popups that link to the User Manual.

Time:
06/12/2022

Resolution:
Add Help links to the "Upload Files" and "Delete Files" popups.

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program

Check for Other Files in clean_eads Directory

Issue:
When an unintended directory sneaks into the clean_eads directory, the app gave an error and couldn't delete any files older than 2 months, because the rogue directory couldn't be deleted using os.remove (only os.rmdir works for that).

Time:
06/12/2022

Resolution:
Add a check to see if the file in os.listdir in clean_eads is not a directory and if not, proceed with deleting files greater than 2 months old. Otherwise, skip the directory and leave it there until the user wants to delete or move it. Also, add a check when using the clear_exports() function to log when a rogue directory is in one of the source_<> or clean_eads directories.
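The directory check could be sketched like this. The function name, the 60-day approximation of "2 months", and the return values are assumptions for illustration, not the app's actual cleanup code.

```python
import os
import time


def delete_old_files(folder, max_age_days=60, now=None):
    """Delete files older than max_age_days; skip and report subdirectories.

    Returns (deleted_names, skipped_directory_names).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    deleted, skipped_dirs = [], []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isdir(path):
            # os.remove cannot delete a directory, so leave it for the user
            # and record it so the caller can log it.
            skipped_dirs.append(name)
            continue
        if os.path.getmtime(path) < cutoff:
            os.remove(path)
            deleted.append(name)
    return deleted, skipped_dirs
```

Returning the skipped directory names lets the caller write them to the log, as the resolution suggests for clear_exports().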

Issue tracker:

  • Looked through the User Manual and found no solutions
  • Found no closed or open issues on the Issues page
  • Have downloaded and running the latest version of the program
