The manx from legalizeadulthood

Change project links from codeplex to github

Change links in the code to the project and issue tracker from codeplex to github

Use PHPUnit's expectOutputString for render tests

Create relationship table between pub and pub_history

pub references pub_history and pub_history references pub.

This creates an awkward situation when you add a publication: you first add a row to the pub table and get its id, then add a row to the pub_history table using that id, and finally update the pub table again to reference the new pub_history row.

Fix this by creating a table that explicitly links publications to their edit history.

The pub table would have its pub_history reference removed. The new table would be used instead to obtain the current pub_history entry for a pub.

Adding a publication would be:

add a pub row
add a pub_history row using pub_id
add a pub_revision row using pub_id, pub_history_id
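The three steps above might look like the following sketch, written against an in-memory SQLite database with Python's sqlite3 module. The column names (pub_id, ph_id, ph_pub, ph_title) and the layout of pub_revision are assumptions made for illustration, not the actual manx schema. The inserts are wrapped in one transaction so a failure part-way through leaves no orphaned rows.

```python
import sqlite3

def add_publication(conn, title):
    """Insert a pub row, its first pub_history row, and the linking pub_revision row."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute("INSERT INTO pub DEFAULT VALUES")
        pub_id = cur.lastrowid
        cur = conn.execute(
            "INSERT INTO pub_history (ph_pub, ph_title) VALUES (?, ?)",
            (pub_id, title),
        )
        ph_id = cur.lastrowid
        conn.execute(
            "INSERT INTO pub_revision (pub_id, pub_history_id) VALUES (?, ?)",
            (pub_id, ph_id),
        )
    return pub_id, ph_id

# Hypothetical, heavily simplified schema: pub no longer references pub_history.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pub (pub_id INTEGER PRIMARY KEY);
CREATE TABLE pub_history (
    ph_id INTEGER PRIMARY KEY,
    ph_pub INTEGER NOT NULL REFERENCES pub(pub_id),
    ph_title TEXT
);
CREATE TABLE pub_revision (
    pub_id INTEGER NOT NULL REFERENCES pub(pub_id),
    pub_history_id INTEGER NOT NULL REFERENCES pub_history(ph_id)
);
""")
```

With this layout no row has to be updated after it is inserted; the current history entry for a pub is found through pub_revision.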

Measure code coverage from tests

Use php-coveralls to measure coverage in travis

Merge two publications

Sometimes two publications are entered that are duplicates.

Merge the two publications into a single publication and join all the copies together.

The user wants to specify amendments between documents

In the URL wizard, the user wants to specify which document is amended by another, similar to the way supersessions are indicated.

Allow the user to specify a source chain for a document copy

In the URL wizard, allow the user to specify the source chain for the digital copy, similar to the way live concert recordings specify a source chain for the resulting digital audio file.

Examples:
Original > HP Scanjet 4p 300dpi monochrome TIFF > ImageMagick 6.6.9-7 PDF > pdftk 1.44 PDF
Original > Photocopy > HP Scanjet 4p 300dpi monochrome TIFF > ImageMagick 6.6.9-7 PDF > pdftk 1.44 PDF

Extract this from the PDF metadata if possible; otherwise provide UI for entering it as above text.
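If the chain is entered as text in the form shown in the examples, it could be split into ordered steps as in this sketch. The function names are hypothetical, and the sketch assumes that no step contains a ">" character itself.

```python
def parse_source_chain(text):
    """Split an 'A > B > C' source chain into an ordered list of steps."""
    return [step.strip() for step in text.split(">") if step.strip()]

def format_source_chain(steps):
    """Join steps back into the textual form used in the examples."""
    return " > ".join(steps)

chain = parse_source_chain(
    "Original > HP Scanjet 4p 300dpi monochrome TIFF > "
    "ImageMagick 6.6.9-7 PDF > pdftk 1.44 PDF"
)
```

Storing the steps as an ordered list rather than one string would make it possible to search for every copy produced with a given scanner or tool.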

SSL is not enforced

If I strip the https and replace it with http after I'm logged in, manx still lets me do everything.

Instead, it should immediately redirect back to https.
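The check could be as small as the following sketch, which decides where to redirect from the scheme, host and request URI of the incoming request. The function name and the way those three values are obtained are assumptions; manx would take them from its own request handling.

```python
def https_redirect_target(scheme, host, request_uri):
    """Return the https URL to redirect to, or None if the request is already secure."""
    if scheme.lower() == "https":
        return None
    return "https://" + host + request_uri
```

Running this check on every request, rather than only at login, is what closes the hole described above.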

Browse new bitsavers/ChiClassicComp entries by directory

When working on incorporating new documents into manx from the IndexByDate.txt on bitsavers/chiclassiccomp, you can currently sort them by date in that file or by path.

However, the path form is inconvenient because you have to navigate clumsily around lots of pages to find the ones in which you're interested.

Instead, present the paths as a directory structure so you can drill down naturally to the part that you want.
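One way to get from the flat list of paths to something that can be drilled into is to nest them into a tree, as in this sketch. The function name is hypothetical, and it assumes that paths are slash-separated and that no name is used as both a file and a directory.

```python
def build_tree(paths):
    """Nest slash-separated paths into dicts; directories map to dicts, files to None."""
    root = {}
    for path in paths:
        parts = [part for part in path.split("/") if part]
        if not parts:
            continue  # skip empty lines
        node = root
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = None
    return root
```

Each page of the browser would then show only the keys of one node, with directory entries linking one level deeper.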

Pick file format from a dropdown

It's unclear what file formats are available in the URL wizard. Replace the textbox that is filled from a table based on filename extension with a dropdown that lets you pick from one of the known formats or create a new format.

Selecting new format exposes a field group named "Format" with fields "Format" and "File extension"

The user doesn't want to enter data repetitively into the URL Wizard

It's annoying to have to type the same input into manually entered fields.

For fields that were not populated by the URL wizard, remember user entered responses on a per-user basis. Perform a dynamic query as the user types to autocomplete these hand-entered fields.

Create pub_format table

Replace text strings of formats in copy and format_extension tables with identifier that is primary key into pub_format table.

Users want spell check on entered input

When typing in a search term or metadata, users want their manually entered input spell checked. They would like to see the ability to select a word flagged as misspelled and see a list of suggested corrections to pick from.

Find a spell-check service and consume it.

Provide spell checking on the keyword field on the search page.

Provide spell checking on all the manually editable text fields on the url wizard page.

Redirect to URLs

For online documents, urls should be given that redirect through the PHP application. The PHP application attempts to verify that the resource is located at the URL and then redirects the user to that URL.

For HTTP URLs, this implies doing a HEAD request and then sending a redirect header. I'm not sure how this would work for FTP URLs; perhaps the redirect would be done without any attempt to verify.

For URLs where the HTTP HEAD request fails, log them in the database and mark the document as possibly offline.

For offline copies with mirrors, check mirror copies and redirect to those in rank order.

For copies previously marked offline, display the last check date in the details view and a warning that the copy may be unavailable.
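The fallback from a copy to its mirrors could be sketched as below. The reachability check is passed in as a function so the selection logic stays independent of how the HEAD request is made. The names are hypothetical, and the sketch assumes that a lower rank number means a more preferred location.

```python
def choose_redirect(candidates, is_reachable):
    """Return the first reachable URL in rank order, or None if all checks fail.

    candidates is a list of (rank, url) pairs; is_reachable is a callable
    that takes a URL and returns True when the resource responds.
    """
    for _rank, url in sorted(candidates):
        if is_reachable(url):
            return url
    return None
```

A result of None is the case where every location failed, which is when the copy would be logged and marked as possibly offline.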

Delete a mirror

Create the ability to delete an existing mirror.

Administrator would select a site and a mirror. The proposed mirror would be validated against the site in order to determine applicability.

Recognize moved files through WhatsNew.txt

Sometimes files get moved on bitsavers (or on ChiClassicComp, which also has an IndexByDate.txt). Process the IndexByDate.txt file looking for filenames that remained the same, but which changed directory locations. Fetch the MD5 of the file at the new location and if it matches the existing entry for a copy at the old location, then adjust the copy to reside at the new location.
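The first half of that process, finding candidate moves by filename, could look like this sketch. The function name is hypothetical. It only proposes candidates; the MD5 comparison described above would still be needed before any copy is updated.

```python
import posixpath

def find_moved(known_paths, index_paths):
    """Return {old_path: new_path} for files whose name is unchanged but whose directory differs."""
    index_set = set(index_paths)
    by_name = {}
    for path in index_paths:
        by_name.setdefault(posixpath.basename(path), []).append(path)
    moved = {}
    for old in known_paths:
        if old in index_set:
            continue  # still at its old location
        candidates = by_name.get(posixpath.basename(old), [])
        if len(candidates) == 1:  # ambiguous filenames are left for a human to resolve
            moved[old] = candidates[0]
    return moved
```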

Do this processing in a cron job script which is run periodically.

Enter SGI manuals from techpubs

Use the URL wizard to add the documents from techpubs.sgi.com

Add ftp.digital.com mirror copies

Some files previously at ftp.digital.com are mirrored at different locations; add them as new copies for existing docs. None of the sites is a complete mirror for the corresponding site of the documents, so add them as copies, not mirrors.

See attached text file for URLs.

Export/Import data as JSON

An administrative page should allow data to be exported and imported as JSON. This would allow the data to be backed up in a form more convenient than a SQL table dump which also dumps SQL to create the tables as well as the data.

IndexByDate.txt should be fetched in a cron job

Fetching the IndexByDate.txt file can be a lengthy operation and cause a PHP page render timeout.

Instead of checking whether or not to fetch the file every time the corresponding IndexByDate page is instantiated, set up a cron job to check for updating the IndexByDate file once an hour for sites with IndexByDate files.

Import open issues from codeplex

Scrape the web page archive at codeplex and import the issues into github.

Delete copy for publication

Create the ability to delete online copies of an existing publication.

Existing URLs would be validated and URLs that didn't match data in the database would be shown as candidates for deletion.

Allow the user to review extracted metadata from the PDF file in the URL wizard

When a PDF file is supplied to the URL wizard, extract metadata from the PDF file and use this to populate fields in the URL wizard.

Metadata is extracted from the metadata in a PDF file as shown by pdftk's dump_data command.

Potential fields that can be filled in:
pub_history.ph_title,
pub_history.ph_keywords,
pub_history.ph_abstract,
copy.notes,
copy.credits
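As a sketch of the extraction step, the following parses the InfoKey/InfoValue pairs that pdftk's dump_data prints and maps some of them onto wizard fields. The mapping from PDF Info keys (Title, Keywords, Subject) to manx columns is an assumption for illustration, as are the function names.

```python
def parse_pdftk_info(dump):
    """Collect InfoKey/InfoValue pairs from pdftk dump_data output into a dict."""
    info = {}
    key = None
    for line in dump.splitlines():
        if line.startswith("InfoKey:"):
            key = line[len("InfoKey:"):].strip()
        elif line.startswith("InfoValue:") and key is not None:
            info[key] = line[len("InfoValue:"):].strip()
            key = None
    return info

# Hypothetical mapping from PDF Info keys to URL wizard fields.
FIELD_MAP = {
    "Title": "pub_history.ph_title",
    "Keywords": "pub_history.ph_keywords",
    "Subject": "pub_history.ph_abstract",
}

def wizard_defaults(dump):
    """Return {wizard field: value} for every mapped key present in the dump."""
    info = parse_pdftk_info(dump)
    return {field: info[key] for key, field in FIELD_MAP.items() if key in info}
```

Lines other than InfoKey and InfoValue, such as NumberOfPages, are ignored, so the parser tolerates output with or without InfoBegin markers.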

Sequential database modifications should be inside a transaction

Some encapsulated database operations in ManxDatabase perform more than one SQL statement to modify the database. These modifications should be encapsulated inside a transaction to ensure that the database remains consistent when being updated by multiple concurrent users.

Merge duplicate entries

Create a way to merge two duplicate document entries into a single entry, with one of the entries marked for redirect to the merged entry.

Delete duplicate publications

14,18470 is a duplicate of 14,17570
14,18471 is a duplicate of 14,17569
5,18496 is a duplicate of 5,17432

Change each duplicate to an additional online copy of the original publication, or eliminate the duplicate entirely.

Automatically create entries from IndexByDate.txt

Currently, files can only be added through human curation. This means that there is an ever-growing backlog of files from bitsavers that haven't been added to the database.

Write a cron script that fetches IndexByDate.txt and processes a certain number of entries every night. This involves automatically creating Company records for unrecognized companies and can possibly introduce duplicate entries for existing documents because their part numbers and dates may not be correctly deducible from the filename of the document.

Therefore, there should be in place a mechanism for a curator to resolve duplicates (see #1) and a mechanism for manually editing entries to correct errors. These mechanisms should be in place before automated IndexByDate.txt processing is performed.

Improve the styling of abstracts

Change the label from "Text" to "Abstract". Align the label with the top of the table cell.

Redo publication searches when company changes

When you change the company dropdown to something other than "new company", the publication and supersession searches should be re-run, since they are company specific.

Lists of maintainers/contacts useful for subsets of the index

It would be useful if subsets of the index (by manufacturer, probably) kept lists of people who were a) responsible for maintenance of that part of the index, and b) interested in being informed when a change occurred to that part of the index.

Point a) would allow casual site visitors to contact the relevant people directly if an error or omission were spotted.

Point b) would allow the site to automatically inform people when changes (corrections, additions etc.) occurred to a portion of the index with which they were registered.

Create user roles

Create two user roles: administrator and contributor

Administrators can change anything.

Contributors can run the URL Wizard and see the list of sites and mirrors, but can't edit any existing data.

RSS feed doesn't include updates

When new copies are added, or copies of known documents are added, they aren't included in the RSS feed.

Enhance the RSS feed to include these changes as well as new documents that were previously unknown to manx.

Add a mirror

Create the ability to add a mirror of a known site.

Administrator would enter a URL that would be validated and site contents analyzed for applicability.

Create foreign key relationships

Modify the schema to contain foreign key relationships between tables to make the relationships explicit

Redirect to login page when web service times out

When the json web service times out the cookie, we need to detect that and redirect to the login page.

Try to keep all our existing information intact in the fields.

Purge duplicate copies

The copy table contains duplicate URLs:

mysql> select count(distinct url) from copy;
10851
mysql> select count(*) from copy;
10926

Purge the duplicates.
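A purge along these lines could be sketched as follows, using Python's sqlite3 module. The primary key name copy_id is an assumption, and keeping the lowest id for each URL is an arbitrary choice; any rows in other tables that reference a deleted copy would need to be repointed first. On MySQL the same statement needs the subquery wrapped in a derived table, because MySQL does not allow a DELETE to select from its own target table directly.

```python
import sqlite3

def purge_duplicate_urls(conn):
    """Delete all but the lowest-id row for each URL; return the number of rows removed."""
    with conn:  # run the delete in a single transaction
        cur = conn.execute(
            "DELETE FROM copy WHERE copy_id NOT IN "
            "(SELECT MIN(copy_id) FROM copy GROUP BY url)"
        )
    return cur.rowcount
```

After the purge, count(*) and count(distinct url) should report the same number.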

Should be able to specify supersedes and superseded by

A single document may be in the middle of two other known documents in the supersession chain.

Currently you can only specify one of these relationships; it should be possible to specify both.

If the new document is interposed between two documents that are already related in the supersession chain, this will introduce some redundant relationships, but a separate supersession editor should be created to handle that situation.

Migrate to an ORM library?

Instead of hand-writing SQL and a hand-written database abstraction layer (IManxDatabase), consider migrating to an ORM library package such as propel.

Add teletype manuals

Add the teletype manuals listed here:

http://www.navy-radio.com/manuals-ttycorp.htm

Extract default data for sgi techpubs

Enhance the url wizard to know how to extract information for an SGI techpubs document.

Organize Antonio Carlini's scans

Organize the docs into folders according to mfr, like bitsavers.

Update URLs of existing documents.

Add new documents.

Add citation support to URL Wizard

Add the ability to select one or more publications cited by the publication being added.

Change MD5 processing

When adding documents, we currently fetch the document and compute the MD5 from the fetched document for insertion into the table.

However, MD5 processing on large documents fetched from slow links can cause the web server to think that PHP has crashed or hung, and it will time out the response.

Instead, always insert a blank MD5 for new documents and create a cronjob script to compute the missing MD5s for documents.
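The hashing part of such a script could be sketched as below. It reads the document in chunks so that a large file never has to be held in memory at once. The function name and chunk size are arbitrary choices; the stream can be any object with a read method, such as an open file or an HTTP response.

```python
import hashlib
import io

def md5_of_stream(stream, chunk_size=65536):
    """Compute the hex MD5 digest of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# Small in-memory stand-in for a fetched document.
example = md5_of_stream(io.BytesIO(b"example document bytes"), chunk_size=4)
```

The cron script would run this for each copy whose MD5 is still blank and store the result.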

Eliminate duplicate URL prefix

Every document stored on a site has a URL with a common prefix. The site table already lists a copy_base URL. This field is currently unused by the details page, but should be used as the prefix of the URL for an online copy of the publication.

Remove %xx encoding from human readable text on links

Example

http://www.computer.museum.uq.edu.au/pdf/EK-VT55E-TM-001%20VT55-E%2C%20F%2C%20H%2C%20J%20DECgraphic%20Scope%20User's%20Manual.pdf

The link should be rendered as

http://www.computer.museum.uq.edu.au/pdf/EK-VT55E-TM-001 VT55-E, F, H, J DECgraphic Scope User's Manual.pdf
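Decoding the display text is a one-line operation in most languages; in Python it could be sketched with urllib.parse.unquote. The function name is hypothetical. Only the human-readable text is decoded; the href of the link should keep the encoded URL.

```python
from urllib.parse import unquote

def link_text(url):
    """Return the human-readable form of a URL with %xx escapes decoded."""
    return unquote(url)

encoded = (
    "http://www.computer.museum.uq.edu.au/pdf/"
    "EK-VT55E-TM-001%20VT55-E%2C%20F%2C%20H%2C%20J%20DECgraphic%20Scope%20User's%20Manual.pdf"
)
readable = link_text(encoded)
```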

Edit copy for publication

Create the ability to edit known copies of an existing publication.

Administrator would edit a URL that would be validated and MD5 updated.

Approve or reject submission

Users with the editor role can approve submissions made by others. A list of unapproved submissions is displayed to an editor. The editor selects an item from the list and it is assigned to the editor. The details of the selected item are displayed in an editable form. The editor can approve, reject, defer or decline the item. Approving the item moves it into the publicly visible database and removes it from the approval queue. Rejecting the item sends email to the original contributor with comments from the editor about why the submission was rejected. The user can resubmit with corrections for approval later. Deferring the item leaves it assigned to the editor and in the approval queue, but makes no changes to the item. Declining the item returns it to the approval queue and removes its assignment to the editor.

The defer/decline mechanism allows an editor to "claim" an item for approval, so that other editors won't be attempting to review the same information. Once an item has been claimed by an editor, it is no longer shown in the queue of submissions to other editors.

Allow user to keep query parameters when redirecting to login

When being redirected to login from the bitsavers page due to session timeout, the query parameters (sort order and starting page) are lost.
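One common approach is to carry the originally requested URL through the login page as a parameter, as in this sketch. The page names (login.php, index.php) and the parameter name redirect are assumptions. A real implementation should also verify that the target is a local page before redirecting to it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def login_redirect(requested_url, login_path="login.php"):
    """Build the login URL, carrying the originally requested URL along."""
    return login_path + "?" + urlencode({"redirect": requested_url})

def after_login_target(login_url, default="index.php"):
    """Recover the original URL, query parameters included, after a successful login."""
    query = parse_qs(urlsplit(login_url).query)
    return query.get("redirect", [default])[0]
```

Because the whole original URL is encoded into a single parameter, the sort order and starting page survive the round trip.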

Users want the sort name for a company automatically populated

When entering fields for a new company, the sort name should be lower-case only and is most often just the short name for the company with punctuation and upper casing removed. Automatically populate this field with a filtered version of the company short name and stop autopopulating as soon as the user manually edits this field.
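The filter itself could be as simple as the following sketch, which lower-cases the short name and drops ASCII punctuation. The function name is hypothetical, and spaces are kept because the description above only mentions punctuation and upper casing.

```python
import string

def default_sort_name(short_name):
    """Lower-case a company short name and strip ASCII punctuation."""
    return "".join(ch for ch in short_name.lower() if ch not in string.punctuation)
```

The form would call this on each change to the short name until the user edits the sort name field by hand.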

Edit a mirror

Create the ability to edit an existing mirror.

Administrator would enter a URL that would be validated and mirror contents analyzed for applicability.
