
legalizeadulthood / manx


Source for the manx vintage computer documentation database.

Home Page: http://manx-docs.org

License: GNU General Public License v2.0

PHP 95.59% CSS 0.92% JavaScript 3.33% HTML 0.17%


manx's People

Contributors

dependabot[bot], legalizeadulthood, smrhoney


manx's Issues

Create relationship table between pub and pub_history

pub references pub_history and pub_history references pub.

This creates an awkward situation when you add a publication: you first add a row to the pub table and get its id, then add a row to the pub_history table, and then you have to update the pub table again to reference the new pub_history row.

Fix this by creating a table that explicitly links publications to their edit history.

The pub table would have its pub_history reference removed. The new table would be used instead to obtain the current pub_history entry for a pub.

Adding a publication would be:

  • add a pub row
  • add a pub_history row using pub_id
  • add a pub_revision row using pub_id, pub_history_id
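
The three steps above can be sketched as follows. This is only an illustration using Python and an in-memory SQLite database, not the manx schema: the table names pub, pub_history and pub_revision come from this issue, while the column names (ph_id, title) and the helper names create_schema and add_publication are assumptions.

```python
import sqlite3

def create_schema(conn):
    # Hypothetical, simplified schema; only the table names come from the issue.
    conn.executescript("""
        CREATE TABLE pub (pub_id INTEGER PRIMARY KEY);
        CREATE TABLE pub_history (
            ph_id INTEGER PRIMARY KEY,
            pub_id INTEGER NOT NULL REFERENCES pub(pub_id),
            title TEXT NOT NULL
        );
        CREATE TABLE pub_revision (
            pub_id INTEGER NOT NULL REFERENCES pub(pub_id),
            ph_id INTEGER NOT NULL REFERENCES pub_history(ph_id)
        );
    """)

def add_publication(conn, title):
    # Three plain INSERTs; the pub row never has to be updated afterwards.
    with conn:  # commit on success, roll back if any step fails
        pub_id = conn.execute("INSERT INTO pub DEFAULT VALUES").lastrowid
        ph_id = conn.execute(
            "INSERT INTO pub_history (pub_id, title) VALUES (?, ?)",
            (pub_id, title),
        ).lastrowid
        conn.execute(
            "INSERT INTO pub_revision (pub_id, ph_id) VALUES (?, ?)",
            (pub_id, ph_id),
        )
    return pub_id, ph_id
```

The current history entry for a publication is then found through pub_revision rather than through a column on pub.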

Merge two publications

Sometimes two publications are entered that are duplicates.

Merge the two publications into a single publication and join all the copies together.

Allow the user to specify a source chain for a document copy

In the URL wizard, allow the user to specify the source chain for the digital copy, similar to the way live concert recordings specify a source chain for the resulting digital audio file.

Examples:
Original > HP Scanjet 4p 300dpi monochrome TIFF > ImageMagick 6.6.9-7 PDF > pdftk 1.44 PDF
Original > Photocopy > HP Scanjet 4p 300dpi monochrome TIFF > ImageMagick 6.6.9-7 PDF > pdftk 1.44 PDF

Extract this from the PDF metadata if possible; otherwise provide UI for entering it as text in the form shown above.
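
As a rough sketch of how the text form above could be handled, the chain can be split on ">" into an ordered list of steps and joined back for display. The function names parse_source_chain and format_source_chain are hypothetical, and the sketch assumes that ">" never appears inside a step description.

```python
def parse_source_chain(text):
    """Split 'A > B > C' into ['A', 'B', 'C'], ignoring empty steps."""
    return [step.strip() for step in text.split(">") if step.strip()]

def format_source_chain(steps):
    """Join a list of steps back into the 'A > B > C' text form."""
    return " > ".join(steps)
```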

SSL is not enforced

If I strip the https and replace it with http after I'm logged in, manx still lets me do everything.

Instead, it should immediately redirect back to https.
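
A minimal sketch of the redirect decision, written in Python for illustration rather than as the application's PHP code. The function name https_redirect_target is hypothetical; it only computes where a plain-http request should be sent and leaves the actual redirect response to the caller.

```python
from urllib.parse import urlsplit

def https_redirect_target(url):
    """Return the https form of url if it uses plain http, otherwise None."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        return None
    # Keep host, path, query and fragment; change only the scheme.
    return parts._replace(scheme="https").geturl()
```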

Browse new bitsavers/ChiClassicComp entries by directory

When working on incorporating new documents into manx from the IndexByDate.txt on bitsavers/chiclassiccomp, you can currently sort them by date in that file or by path.

However, the path form is inconvenient because you have to navigate clumsily around lots of pages to find the ones in which you're interested.

Instead, present the paths as a directory structure so you can drill down naturally to the part that you want.
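
One possible way to turn the flat path list into a structure that can be drilled into is a nested dictionary, sketched below. The function name build_tree and the example paths in the usage note are hypothetical, and the sketch assumes that no path names both a file and a directory.

```python
def build_tree(paths):
    """Nest slash-separated paths into dicts; files map to None."""
    root = {}
    for path in paths:
        parts = [part for part in path.split("/") if part]
        if not parts:
            continue
        node = root
        for directory in parts[:-1]:
            # Assumes a directory name never collides with a file name.
            node = node.setdefault(directory, {})
        node[parts[-1]] = None
    return root
```

For example, the paths "dec/pdp11/a.pdf" and "dec/vax/b.pdf" would end up under a single "dec" entry with "pdp11" and "vax" children.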

Pick file format from a dropdown

It's unclear what file formats are available in the URL wizard. Replace the textbox that is filled from a table based on filename extension with a dropdown that lets you pick from one of the known formats or create a new format.

Selecting new format exposes a field group named "Format" with fields "Format" and "File extension"

Create pub_format table

Replace text strings of formats in copy and format_extension tables with identifier that is primary key into pub_format table.

Users want spell check on entered input

When typing in a search term or metadata, users want their manually entered input spell checked. They would like to see the ability to select a word flagged as misspelled and see a list of suggested corrections to pick from.

Find a spell-check service and consume it.

Provide spell checking on the keyword field on the search page.

Provide spell checking on all the manually editable text fields on the url wizard page.

Redirect to URLs

For online documents, urls should be given that redirect through the PHP application. The PHP application attempts to verify that the resource is located at the URL and then redirects the user to that URL.

For HTTP URLs, this implies doing a HEAD request and then sending a redirect header. I'm not sure how this would work for FTP URLs; perhaps the redirect would be done without any attempt to verify.

For URLs where the HTTP HEAD request fails, log them in the database and mark the document as possibly offline.

For offline copies with mirrors, check mirror copies and redirect to those in rank order.

For copies previously marked offline, display the last check date in the details view and a warning that the copy may be unavailable.
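
The check-then-redirect flow described above might look roughly like this. It is a sketch in Python, not the application's PHP code: the function names http_head_ok and pick_redirect are hypothetical, FTP URLs are passed through unverified as the issue suggests, and logging failures to the database is left to the caller, which receives the list of URLs that failed.

```python
import urllib.error
import urllib.request

def http_head_ok(url, timeout=10):
    """Return True if an HTTP HEAD request for url succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        return False

def pick_redirect(urls_in_rank_order, is_online=http_head_ok):
    """Return (target, failed): the first usable URL and the URLs that failed.

    urls_in_rank_order holds the primary copy first, then its mirrors.
    """
    failed = []
    for url in urls_in_rank_order:
        # FTP URLs are not verified in this sketch.
        if url.startswith("ftp://") or is_online(url):
            return url, failed
        failed.append(url)
    return None, failed
```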

Delete a mirror

Create the ability to delete an existing mirror.

Administrator would select a site and a mirror. The proposed mirror would be validated against the site in order to determine applicability.

Recognize moved files through WhatsNew.txt

Sometimes files get moved on bitsavers (or on ChiClassicComp, which also has an IndexByDate.txt). Process the IndexByDate.txt file looking for filenames that remained the same, but which changed directory locations. Fetch the MD5 of the file at the new location and if it matches the existing entry for a copy at the old location, then adjust the copy to reside at the new location.

Do this processing in a cron job script which is run periodically.
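
The first step, finding filenames that kept their name but changed directory, can be sketched as below. The function name find_moved and the example paths are hypothetical; the MD5 comparison described above would still have to be done on each candidate before a copy is updated.

```python
import posixpath

def find_moved(old_paths, new_paths):
    """Map old path -> new path for files whose name is unchanged but whose directory differs."""
    old_set, new_set = set(old_paths), set(new_paths)
    added_by_name = {}
    for path in new_set - old_set:
        added_by_name.setdefault(posixpath.basename(path), []).append(path)
    moved = {}
    for path in old_set - new_set:
        candidates = added_by_name.get(posixpath.basename(path), [])
        if len(candidates) == 1:  # ambiguous names are left for manual review
            moved[path] = candidates[0]
    return moved
```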

Add ftp.digital.com mirror copies

Some files previously at ftp.digital.com are mirrored at different locations; add them as new copies for existing docs. None of the sites is a complete mirror of the original site, so add them as copies, not mirrors.

See attached text file for URLs.

Export/Import data as JSON

An administrative page should allow data to be exported and imported as JSON. This would allow the data to be backed up in a more convenient form than a SQL table dump, which includes the SQL that creates the tables as well as the data.
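
A small sketch of what a JSON round trip could look like, using Python and SQLite purely for illustration. The function names export_tables and import_tables are hypothetical, and table and column names are interpolated into the SQL, so they are assumed to come from a trusted, fixed list rather than from user input.

```python
import json
import sqlite3

def export_tables(conn, tables):
    """Return a JSON string mapping each table name to a list of row objects."""
    data = {}
    for table in tables:
        cursor = conn.execute(f"SELECT * FROM {table}")  # trusted table names only
        columns = [description[0] for description in cursor.description]
        data[table] = [dict(zip(columns, row)) for row in cursor]
    return json.dumps(data, indent=2, sort_keys=True)

def import_tables(conn, text):
    """Insert the rows from a JSON export into tables that already exist."""
    with conn:  # all rows are imported, or none are
        for table, rows in json.loads(text).items():
            for row in rows:
                columns = ", ".join(row)
                marks = ", ".join("?" for _ in row)
                conn.execute(
                    f"INSERT INTO {table} ({columns}) VALUES ({marks})",
                    list(row.values()),
                )
```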

IndexByDate.txt should be fetched in a cron job

Fetching the IndexByDate.txt file can be a lengthy operation and cause a PHP page render timeout.

Instead of checking whether or not to fetch the file every time the corresponding IndexByDate page is instantiated, set up a cron job to check for updating the IndexByDate file once an hour for sites with IndexByDate files.

Delete copy for publication

Create the ability to delete online copies of an existing publication.

Existing URLs would be validated and URLs that didn't match data in the database would be shown as candidates for deletion.

Allow the user to review extracted metadata from the PDF file in the URL wizard

When a PDF file is supplied to the URL wizard, extract metadata from the PDF file and use this to populate fields in the URL wizard.

The values are taken from the PDF file's own metadata, as shown by pdftk's dump_data command.

Potential fields that can be filled in:
pub_history.ph_title,
pub_history.ph_keywords,
pub_history.ph_abstract,
copy.notes,
copy.credits
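
As a sketch of the extraction step, the function below parses text in the InfoKey/InfoValue line-pair form that pdftk's dump_data prints for document information. The function name parse_pdftk_info is hypothetical, and the mapping from PDF keys to manx fields is only an assumption about which key might suit which field.

```python
# Assumed mapping from PDF info keys to fields; not taken from manx.
FIELD_FOR_KEY = {
    "Title": "pub_history.ph_title",
    "Keywords": "pub_history.ph_keywords",
    "Subject": "pub_history.ph_abstract",
}

def parse_pdftk_info(dump_text):
    """Collect InfoKey/InfoValue pairs from dump_data output into a dict."""
    info = {}
    key = None
    for line in dump_text.splitlines():
        if line.startswith("InfoKey: "):
            key = line[len("InfoKey: "):]
        elif line.startswith("InfoValue: ") and key is not None:
            info[key] = line[len("InfoValue: "):]
            key = None
    return info

def suggested_fields(dump_text):
    """Return {field name: value} for the info keys that have a mapping."""
    info = parse_pdftk_info(dump_text)
    return {FIELD_FOR_KEY[k]: v for k, v in info.items() if k in FIELD_FOR_KEY}
```

The suggested values would only pre-fill the wizard, so the user can still review and correct them.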

Sequential database modifications should be inside a transaction

Some encapsulated database operations in ManxDatabase perform more than one SQL statement to modify the database. These modifications should be encapsulated inside a transaction to ensure that the database remains consistent when being updated by multiple concurrent users.
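
The idea can be illustrated with Python's sqlite3 module, where using the connection as a context manager commits if every statement succeeds and rolls back if any raises. This is a sketch of the technique, not ManxDatabase code; the helper name run_in_transaction is hypothetical.

```python
import sqlite3

def run_in_transaction(conn, statements):
    """Execute (sql, params) pairs atomically: all are applied or none are."""
    with conn:  # commit on success, roll back on any exception
        for sql, params in statements:
            conn.execute(sql, params)
```

If the second of two statements fails, the first is rolled back too, so other users never see a half-applied change.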

Merge duplicate entries

Create a way to merge two duplicate document entries into a single entry, with one of the entries marked for redirect to the merged entry.

Delete duplicate publications

14,18470 is a duplicate of 14,17570
14,18471 is a duplicate of 14,17569
5,18496 is a duplicate of 5,17432

Change the duplicate to an additional online copy of the original publication, or eliminate the duplicate entirely.

Automatically create entries from IndexByDate.txt

Currently, files can only be added through human curation. This means that there is an ever-growing backlog of files from bitsavers that haven't been added to the database.

Write a cron script that fetches IndexByDate.txt and processes a certain number of entries every night. This involves automatically creating Company records for unrecognized companies and can possibly introduce duplicate entries for existing documents because their part numbers and dates may not be correctly deducible from the filename of the document.

Therefore, there should be in place a mechanism for a curator to resolve duplicates (see #1) and a mechanism for manually editing entries to correct errors. These mechanisms should be in place before automated IndexByDate.txt processing is performed.

Lists of maintainers/contacts useful for subsets of the index

It would be useful if subsets of the index (by manufacturer, probably) kept lists of people who were a) responsible for maintenance of that part of the index, and b) interested in being informed when a change occurred to that part of the index.

Point a) would allow casual site visitors to contact the relevant people directly if an error or omission were spotted.

Point b) would allow the site to automatically inform people when changes (corrections, additions etc.) occurred to a portion of the index with which they were registered.

Create user roles

Create two user roles: administrator and contributor

Administrators can change anything.

Contributors can run the URL Wizard and see the list of sites and mirrors, but can't edit any existing data.

RSS feed doesn't include updates

When new copies are added, or copies of known documents are added, they aren't included in the RSS feed.

Enhance the RSS feed to include these changes as well as new documents that were previously unknown to manx.

Add a mirror

Create the ability to add a mirror of a known site.

Administrator would enter a URL that would be validated and site contents analyzed for applicability.

Purge duplicate copies

The copy table contains duplicate URLs:

mysql> select count(distinct url) from copy;
10851
mysql> select count(*) from copy;
10926

Purge the duplicates.
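
One way to do this is to keep the lowest-numbered row for each URL and delete the rest, sketched here with Python and SQLite. The url column and copy table come from the queries above; the copy_id column name and the function name purge_duplicate_urls are assumptions. MySQL does not allow a DELETE whose subquery selects from the table being deleted from, so the statement would need rewriting there, for example with a derived table.

```python
import sqlite3

def purge_duplicate_urls(conn):
    """Delete all but the lowest copy_id row for each URL; return rows removed."""
    with conn:
        cursor = conn.execute(
            "DELETE FROM copy WHERE copy_id NOT IN "
            "(SELECT MIN(copy_id) FROM copy GROUP BY url)"
        )
    return cursor.rowcount
```

Any rows in other tables that refer to a deleted copy would need to be repointed first; this sketch does not handle that.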

Should be able to specify supersedes and superseded by

A single document may be in the middle of two other known documents in the supersession chain.

Currently you can only specify one of these relationships; it should be possible to specify both.

If the new document is interposed between two documents that are already related in the supersession chain, this will introduce some redundant relationships, but a separate supersession editor should be created to handle that situation.

Migrate to an ORM library?

Instead of hand-writing SQL and a hand-written database abstraction layer (IManxDatabase), consider migrating to an ORM library package such as propel.

Change MD5 processing

When adding documents, we currently fetch the document and compute the MD5 from the fetched document for insertion into the table.

However, MD5 processing on large documents fetched from slow links can cause the web server to think that PHP has crashed or hung, and it will time out the response.

Instead, always insert a blank MD5 for new documents and create a cronjob script to compute the missing MD5s for documents.
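
For the cron job side, the digest can be computed in fixed-size chunks so that a large document never has to be held in memory at once. The sketch below uses Python's hashlib; the function name md5_of_stream is hypothetical, and fetching the document and updating the row are left to the caller.

```python
import hashlib

def md5_of_stream(stream, chunk_size=65536):
    """Return the hex MD5 of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```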

Eliminate duplicate URL prefix

Every document stored on a site has a URL with a common prefix. The site table already lists a copy_base URL. This field is currently unused by the details page, but should be used as the prefix of the URL for an online copy of the publication.
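
A sketch of how the prefix could be split off for storage and reattached for display. The function names strip_copy_base and full_copy_url and the example URLs in the usage note are hypothetical; copy_base is the site field named above.

```python
def strip_copy_base(copy_base, url):
    """Return the part of url after copy_base, or None if url is not under it."""
    prefix = copy_base.rstrip("/") + "/"
    if url.startswith(prefix):
        return url[len(prefix):]
    return None

def full_copy_url(copy_base, relative_path):
    """Rebuild the full URL of a copy from the site's copy_base and a relative path."""
    return copy_base.rstrip("/") + "/" + relative_path.lstrip("/")
```

With a copy_base of "http://example.org/pdf/", the URL "http://example.org/pdf/dec/a.pdf" would be stored as "dec/a.pdf" and expanded again on the details page.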

Edit copy for publication

Create the ability to edit known copies of an existing publication.

Administrator would edit a URL that would be validated and MD5 updated.

Approve or reject submission

Users with the editor role can approve submissions made by others. A list of unapproved submissions is displayed to an editor. The editor selects an item from the list and it is assigned to the editor. The details of the selected item are displayed in an editable form. The editor can approve, reject, defer or decline the item:

  • Approving the item moves it into the publicly visible database and removes it from the approval queue.
  • Rejecting the item sends email to the original contributor with comments from the editor about why the submission was rejected. The user can resubmit with corrections for approval later.
  • Deferring the item leaves it assigned to the editor and in the approval queue, but makes no changes to the item.
  • Declining the item returns it to the approval queue and removes its assignment to the editor.

The defer/decline mechanism allows an editor to "claim" an item for approval, so that other editors won't be attempting to review the same information. Once an item has been claimed by an editor, it is no longer shown in the queue of submissions to other editors.
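
The claim, approve, reject, defer and decline rules can be modeled as a small state machine. The sketch below is one possible reading of the description, with hypothetical names (Submission, claim, resolve, visible_queue); sending the rejection email and moving approved items into the public database are left out.

```python
class Submission:
    def __init__(self):
        self.status = "queued"  # "queued", "approved" or "rejected"
        self.editor = None      # editor who has claimed the item, if any

    def claim(self, editor):
        if self.status != "queued" or self.editor is not None:
            raise ValueError("submission is not available")
        self.editor = editor

    def resolve(self, editor, action):
        if self.status != "queued" or self.editor is None or self.editor != editor:
            raise ValueError("submission is not claimed by this editor")
        if action == "approve":
            self.status = "approved"
        elif action == "reject":
            self.status = "rejected"
        elif action == "decline":
            self.editor = None  # back in the queue for everyone
        elif action != "defer":  # defer changes nothing
            raise ValueError("unknown action: " + action)

def visible_queue(submissions, editor):
    """Queued items that are unclaimed or claimed by this editor."""
    return [s for s in submissions
            if s.status == "queued" and s.editor in (None, editor)]
```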

Users want the sort name for a company automatically populated

When entering fields for a new company, the sort name should be lower-case only and is most often just the short name for the company with punctuation and upper casing removed. Automatically populate this field with a filtered version of the company short name and stop autopopulating as soon as the user manually edits this field.
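
The filtering step might look like the sketch below. The function name default_sort_name is hypothetical, and keeping spaces between words (while collapsing repeated whitespace) is an assumption, since the description only mentions removing punctuation and upper casing. Stopping the autopopulation once the user edits the field would be handled in the page's own script.

```python
def default_sort_name(short_name):
    """Lower-case the short name and drop everything except letters, digits and spaces."""
    kept = "".join(
        ch for ch in short_name.lower() if ch.isalnum() or ch.isspace()
    )
    return " ".join(kept.split())  # collapse runs of whitespace
```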

Edit a mirror

Create the ability to edit an existing mirror.

Administrator would enter a URL that would be validated and mirror contents analyzed for applicability.
