manx
Source for the manx vintage computer documentation database.
Home Page: http://manx-docs.org
License: GNU General Public License v2.0
Change the links in the code that point to the project and issue tracker from CodePlex to GitHub.
See Testing Output
pub references pub_history and pub_history references pub.
This creates an awkward situation when you add a publication: first you insert into the pub table and get the id for that row, then you insert into the pub_history table, and then you have to update the pub table again to reference the new pub_history row.
Fix this by creating a table that explicitly links publications to their edit history.
The pub table would have its pub_history reference removed. The new table would be used instead to obtain the current pub_history entry for a pub.
Adding a publication would then be: insert into pub, insert into pub_history, then insert the linking row; no second update of pub is required.
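The proposed flow might look like the sketch below. sqlite3 stands in for manx's MySQL database, and the pub_current linking table plus all column names beyond pub and pub_history are guesses, not the actual schema.

```python
import sqlite3

# Simplified schema sketch with a hypothetical pub_current linking table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE pub (pub_id INTEGER PRIMARY KEY, pub_type TEXT);
    CREATE TABLE pub_history (ph_id INTEGER PRIMARY KEY,
                              ph_pub INTEGER REFERENCES pub(pub_id),
                              ph_title TEXT);
    -- New table: one row per pub, pointing at its current history entry.
    CREATE TABLE pub_current (pub_id INTEGER PRIMARY KEY REFERENCES pub(pub_id),
                              ph_id INTEGER REFERENCES pub_history(ph_id));
""")

def add_publication(con, title):
    # Insert pub, then pub_history, then the link; no second UPDATE of pub.
    cur = con.cursor()
    cur.execute("INSERT INTO pub (pub_type) VALUES ('D')")
    pub_id = cur.lastrowid
    cur.execute("INSERT INTO pub_history (ph_pub, ph_title) VALUES (?, ?)",
                (pub_id, title))
    ph_id = cur.lastrowid
    cur.execute("INSERT INTO pub_current (pub_id, ph_id) VALUES (?, ?)",
                (pub_id, ph_id))
    con.commit()
    return pub_id
```

The current pub_history row for a pub is then found by joining through pub_current instead of reading a back-reference out of pub.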
Use php-coveralls to measure coverage in travis
Sometimes two publications are entered that are duplicates.
Merge the two publications into a single publication and join all the copies together.
In the URL wizard, the user wants to specify which document is amended by another, similar to the way supersessions are indicated.
In the URL wizard, allow the user to specify the source chain for the digital copy, similar to the way live concert recordings specify a source chain for the resulting digital audio file.
Examples:
Original > HP Scanjet 4p 300dpi monochrome TIFF > ImageMagick 6.6.9-7 PDF > pdftk 1.44 PDF
Original > Photocopy > HP Scanjet 4p 300dpi monochrome TIFF > ImageMagick 6.6.9-7 PDF > pdftk 1.44 PDF
Extract this from the PDF metadata if possible; otherwise provide UI for entering it as above text.
If I strip the https and replace it with http after I'm logged in, manx still lets me do everything.
Instead, it should immediately redirect back to https.
When working on incorporating new documents into manx from the IndexByDate.txt on bitsavers/chiclassiccomp, you can currently sort them by date in that file or by path.
However, the path form is inconvenient because you have to navigate clumsily around lots of pages to find the ones in which you're interested.
Instead, present the paths as a directory structure so you can drill down naturally to the part that you want.
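One way to back such a drill-down UI is to fold the paths from IndexByDate.txt into a nested tree; this is a minimal sketch (parsing of the file's date column is omitted, and the sample paths are made up):

```python
# Build a nested dict from slash-separated paths so the UI can render
# each level as an expandable directory.
def build_tree(paths):
    tree = {}
    for path in paths:
        node = tree
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, {})
    return tree

paths = [
    "dec/pdp8/handbook1966.pdf",
    "dec/pdp11/processorHandbook.pdf",
    "hp/9845/operating.pdf",
]
tree = build_tree(paths)
```

The top level of the tree then holds one entry per manufacturer directory, and leaves are the files themselves.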
It's unclear what file formats are available in the URL wizard. Replace the textbox that is filled from a table based on filename extension with a dropdown that lets you pick from one of the known formats or create a new format.
Selecting new format exposes a field group named "Format" with fields "Format" and "File extension"
It's annoying to have to type the same input into manually entered fields.
For fields that were not populated by the URL wizard, remember user entered responses on a per-user basis. Perform a dynamic query as the user types to autocomplete these hand-entered fields.
Replace the text strings of formats in the copy and format_extension tables with an identifier that is the primary key of the pub_format table.
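The migration might run along these lines; sqlite3 stands in for MySQL, the column names are guesses at the manx schema, and format_extension would be rewritten the same way as copy:

```python
import sqlite3

# Build pub_format from the distinct format strings, then rewrite
# copy.format to hold the new primary key instead of the string.
def migrate_formats(con):
    con.executescript("""
        CREATE TABLE pub_format (
            pf_id INTEGER PRIMARY KEY,
            pf_name TEXT UNIQUE);
        INSERT INTO pub_format (pf_name) SELECT DISTINCT format FROM copy;
        -- Correlated subquery sees the old string value of each row.
        UPDATE copy SET format =
            (SELECT pf_id FROM pub_format WHERE pf_name = copy.format);
    """)
    con.commit()
```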
When typing in a search term or metadata, users want their manually entered input spell checked. They would like to see the ability to select a word flagged as misspelled and see a list of suggested corrections to pick from.
Find a spell-check service and consume it.
Provide spell checking on the keyword field on the search page.
Provide spell checking on all the manually editable text fields on the url wizard page.
For online documents, urls should be given that redirect through the PHP application. The PHP application attempts to verify that the resource is located at the URL and then redirects the user to that URL.
For HTTP URLs, this implies doing a HEAD and then a redirect header. I'm not sure how this would work for FTP URLs, perhaps the redirect would be done without any attempt to verify.
For URLs where the HTTP HEAD request fails, log them in the database and mark the document as possibly offline.
For offline copies with mirrors, check mirror copies and redirect to those in rank order.
For copies previously marked offline, display the last check date in the details view and a warning that the copy may be unavailable.
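The verify-then-redirect policy above can be sketched with the HEAD check injected as a callable, so the decision logic is testable without network access; the function and parameter names are illustrative, not actual manx code:

```python
def resolve_copy(url, mirrors, head_ok):
    """Return the URL to redirect to, or None if every copy seems offline.

    head_ok(url) -> True when an HTTP HEAD request for url succeeds.
    mirrors is the list of mirror URLs, ordered by rank.
    """
    if url.startswith("ftp://"):
        return url             # no cheap verification for FTP; redirect blindly
    if head_ok(url):
        return url
    for mirror_url in mirrors: # primary failed: try mirrors in rank order
        if head_ok(mirror_url):
            return mirror_url
    return None                # caller logs this and marks the copy offline
```

The real redirector would wrap this with the actual HEAD request, the failure log, and the Location header.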
Create the ability to delete an existing mirror.
Administrator would select a site and a mirror. The proposed mirror would be validated against the site in order to determine applicability.
Sometimes files get moved on bitsavers (or on ChiClassicComp, which also has an IndexByDate.txt). Process the IndexByDate.txt file looking for filenames that remained the same, but which changed directory locations. Fetch the MD5 of the file at the new location and if it matches the existing entry for a copy at the old location, then adjust the copy to reside at the new location.
Do this processing in a cron job script which is run periodically.
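The move-detection core could look like this; `known` and `md5_at` are hypothetical stand-ins for the database lookup and the remote MD5 fetch:

```python
import posixpath

# known maps filename -> (old_dir, md5) for copies already in the database.
# md5_at(path) fetches the MD5 of the file at its new location.
def find_moves(index_paths, known, md5_at):
    moves = []
    for path in index_paths:
        directory, name = posixpath.split(path)
        if name in known:
            old_dir, old_md5 = known[name]
            if directory != old_dir and md5_at(path) == old_md5:
                moves.append((posixpath.join(old_dir, name), path))
    return moves
```

Each (old path, new path) pair found this way is safe to apply automatically, since the MD5 match shows the file content is unchanged.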
Use the URL wizard to add the documents from techpubs.sgi.com
Some files previously at ftp.digital.com are mirrored at different locations; add them as new copies for the existing documents. None of the sites is a complete mirror of ftp.digital.com, so add them as copies, not as mirrors.
See attached text file for URLs.
An administrative page should allow data to be exported and imported as JSON. This would allow the data to be backed up in a form more convenient than a SQL table dump which also dumps SQL to create the tables as well as the data.
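The export half might be as small as the following sketch; sqlite3 stands in for MySQL, and the admin page would either shell this out or port the idea to PHP. Import would do the reverse with INSERTs per table.

```python
import json
import sqlite3

# Dump every user table as a list of row dicts, keyed by table name.
def export_json(con):
    con.row_factory = sqlite3.Row
    tables = [r["name"] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    return json.dumps({t: [dict(r) for r in con.execute(f"SELECT * FROM {t}")]
                       for t in tables}, indent=2)
```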
Fetching the IndexByDate.txt file can be a lengthy operation and cause a PHP page render timeout.
Instead of checking whether to fetch the file every time the corresponding IndexByDate page is rendered, set up a cron job that checks once an hour for an updated IndexByDate file on each site that has one.
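The hourly check could then be driven by a crontab entry along these lines; the script path and name are illustrative, not part of the current codebase:

```
# Refresh cached IndexByDate.txt files once an hour, on the hour.
0 * * * * /usr/bin/php /var/www/manx/cron/refresh-index-by-date.php
```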
Scrape the web page archive at codeplex and import the issues into github.
Create the ability to delete online copies of an existing publication.
Existing URLs would be validated and URLs that didn't match data in the database would be shown as candidates for deletion.
When a PDF file is supplied to the URL wizard, extract metadata from the PDF file and use this to populate fields in the URL wizard.
Metadata is extracted from the metadata in a PDF file as shown by pdftk's dump_data command.
Potential fields that can be filled in:
pub_history.ph_title,
pub_history.ph_keywords,
pub_history.ph_abstract,
copy.notes,
copy.credits
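pdftk's dump_data command emits Info fields as InfoKey/InfoValue line pairs, which can be folded into a dict before the wizard maps them onto the fields above; this parser is a sketch against that documented output shape:

```python
# Parse the Info section of `pdftk file.pdf dump_data` output into a dict.
def parse_pdftk_info(dump_text):
    info, key = {}, None
    for line in dump_text.splitlines():
        if line.startswith("InfoKey: "):
            key = line[len("InfoKey: "):]
        elif line.startswith("InfoValue: ") and key is not None:
            info[key] = line[len("InfoValue: "):]
            key = None
    return info
```

The wizard would then try, for example, info.get("Title") as the default for pub_history.ph_title.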
Some encapsulated database operations in ManxDatabase perform more than one SQL statement to modify the database. These modifications should be encapsulated inside a transaction to ensure that the database remains consistent when being updated by multiple concurrent users.
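In outline, each multi-statement operation would commit or roll back as a unit; sqlite3 stands in for manx's MySQL/PDO layer here, and the schema is a simplified guess:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE pub (pub_id INTEGER PRIMARY KEY, pub_current_ph INTEGER);
    CREATE TABLE pub_history (ph_id INTEGER PRIMARY KEY,
                              ph_pub INTEGER, ph_title TEXT);
    INSERT INTO pub (pub_current_ph) VALUES (NULL);
""")

def update_title(con, pub_id, new_title):
    # "with con" commits both statements on success and rolls both back
    # if either raises, so concurrent readers never see a half-applied change.
    with con:
        cur = con.execute(
            "INSERT INTO pub_history (ph_pub, ph_title) VALUES (?, ?)",
            (pub_id, new_title))
        con.execute("UPDATE pub SET pub_current_ph = ? WHERE pub_id = ?",
                    (cur.lastrowid, pub_id))
```

In PDO the equivalent bracketing is beginTransaction / commit with rollBack in the catch block.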
Create a way to merge two duplicate document entries into a single entry, with one of the entries marked for redirect to the merged entry.
14,18470 is a duplicate of 14,17570
14,18471 is a duplicate of 14,17569
5,18496 is a duplicate of 5,17432
Change each duplicate into an additional online copy of the original publication, or eliminate the duplicate entirely.
Currently, files can only be added through human curation. This means that there is an ever-growing backlog of files from bitsavers that haven't been added to the database.
Write a cron script that fetches IndexByDate.txt and processes a certain number of entries every night. This involves automatically creating Company records for unrecognized companies and can possibly introduce duplicate entries for existing documents because their part numbers and dates may not be correctly deducible from the filename of the document.
Therefore, there should be in place a mechanism for a curator to resolve duplicates (see #1) and a mechanism for manually editing entries to correct errors. These mechanisms should be in place before automated IndexByDate.txt processing is performed.
Change the label from "Text" to "Abstract". Align the label with the top of the table cell.
When you change the company dropdown to something other than "new company", the publication and supersession searches should be re-run, since they are company-specific.
It would be useful if subsets of the index (by manufacturer, probably) kept lists of people who were a) responsible for maintenance of that part of the index, and b) interested in being informed when a change occurred to that part of the index.
Point a) would allow casual site visitors to contact the relevant people directly if an error or omission were spotted.
Point b) would allow the site to automatically inform people when changes (corrections, additions etc.) occurred to a portion of the index with which they were registered.
Create two user roles: administrator and contributor
Administrators can change anything.
Contributors can run the URL Wizard and see the list of sites and mirrors, but can't edit any existing data.
When new copies are added, or copies of known documents are added, they aren't included in the RSS feed.
Enhance the RSS feed to include these changes as well as new documents that were previously unknown to manx.
Create the ability to add a mirror of a known site.
Administrator would enter a URL that would be validated and site contents analyzed for applicability.
Modify the schema to contain foreign key relationships between tables to make the relationships explicit
When the JSON web service's session cookie times out, we need to detect that and redirect to the login page.
Try to keep all our existing information intact in the fields.
The copy table contains duplicate URLs:
mysql> select count(distinct url) from copy;
10851
mysql> select count(*) from copy;
10926
purge the duplicates
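One way to purge them is to keep the lowest-id row per URL; this sketch uses sqlite3 in place of MySQL, and the copy schema is simplified:

```python
import sqlite3

# Delete every copy row whose URL already appears on an earlier row.
def purge_duplicate_copies(con):
    con.execute("""
        DELETE FROM copy
        WHERE rowid NOT IN (SELECT MIN(rowid) FROM copy GROUP BY url)
    """)
    con.commit()
```

A uniqueness constraint on copy.url afterwards would stop the duplicates from coming back.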
A single document may be in the middle of two other known documents in the supersession chain.
Currently you can only specify one of these relationships; it should be possible to specify both.
If the new document is interposed between two documents that are already related in the supersession chain, this will introduce some redundant relationships; a separate supersession editor should be created to handle that situation.
Instead of hand-written SQL and a hand-written database abstraction layer (IManxDatabase), consider migrating to an ORM library such as Propel.
Add the teletype manuals listed here:
Enhance the url wizard to know how to extract information for an SGI techpubs document.
Organize the docs into folders according to manufacturer, like bitsavers.
Update URLs of existing documents.
Add new documents.
Add the ability to select one or more publications cited by the publication being added.
When adding documents, we currently fetch the document and compute the MD5 from the fetched document for insertion into the table.
However, computing the MD5 of large documents fetched over slow links can make the web server think that PHP has crashed or hung, so it will time out the response.
Instead, always insert a blank MD5 for new documents and create a cronjob script to compute the missing MD5s for documents.
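The cron script's core is just a streaming MD5, so large files never have to be held in memory; the chunk source here is abstract, and the real script would feed it from the HTTP download:

```python
import hashlib

# Accumulate an MD5 over an iterable of byte chunks.
def md5_of_chunks(chunks):
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()
```

The script would select copies whose MD5 column is blank, stream each document through this, and write the result back.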
Every document stored on a site has a URL with a common prefix. The site table already lists a copy_base URL. This field is currently unused by the details page, but should be used as the prefix of the URL for an online copy of the publication.
Create the ability to edit known copies of an existing publication.
Administrator would edit a URL that would be validated and MD5 updated.
Users with the editor role can approve submissions made by others.
A list of unapproved submissions is displayed to an editor. The editor selects an item from the list, it is assigned to that editor, and the details of the selected item are displayed in an editable form. The editor can approve, reject, defer, or decline the item.
Approving the item moves it into the publicly visible database and removes it from the approval queue.
Rejecting the item sends email to the original contributor with comments from the editor about why the submission was rejected; the contributor can resubmit with corrections for approval later.
Deferring the item leaves it assigned to the editor and in the approval queue, but makes no changes to the item.
Declining the item returns it to the approval queue and removes its assignment to the editor.
The defer/decline mechanism allows an editor to "claim" an item for approval, so that other editors won't be attempting to review the same information. Once an item has been claimed by an editor, it is no longer shown in the queue of submissions to other editors.
When being redirected to login from the bitsavers page due to session timeout, the query parameters (sort order and starting page) are lost.
When entering fields for a new company, the sort name should be lower-case only and is most often just the short name for the company with punctuation and upper casing removed. Automatically populate this field with a filtered version of the company short name and stop autopopulating as soon as the user manually edits this field.
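The filtering step described above might look like this sketch (the stop-on-manual-edit behavior lives in the page's JavaScript and is not shown):

```python
import re

# Derive the default sort name: lower-case the short name, strip
# punctuation, and collapse runs of whitespace.
def default_sort_name(short_name):
    cleaned = re.sub(r"[^a-z0-9 ]", "", short_name.lower())
    return re.sub(r"\s+", " ", cleaned).strip()
```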
Create the ability to edit an existing mirror.
Administrator would enter a URL that would be validated and mirror contents analyzed for applicability.