princeton-cdh / ppa-django

Princeton Prosody Archive v3.x - Python/Django web application

Home Page: http://prosody.princeton.edu

License: Apache License 2.0

Python 78.68% HTML 7.16% CSS 0.79% JavaScript 5.20% TypeScript 2.06% SCSS 6.11%
python django hathitrust digital-humanities solr

ppa-django

Django web application for Princeton Prosody Archive version 3.x.

Code and architecture documentation for the current release is available at https://princeton-cdh.github.io/ppa-django/.

DOI: 10.5281/zenodo.2400705

This repo uses git-flow conventions; main contains the most recent release, and work in progress will be on the develop branch. Pull requests should be made against develop.

Python 3.11 / Django 5.0 / Node 18.12 / Postgresql 15 / Solr 8

Development instructions

Initial setup and installation:

  • recommended: create and activate a python 3.11 virtual environment, perhaps with virtualenv or venv
  • Use pip to install required python dependencies:

    pip install -r requirements.txt
    pip install -r dev-requirements.txt
  • Copy sample local settings and configure for your environment:

    cp ppa/settings/local_settings.py.sample ppa/settings/local_settings.py
  • Create a database, configure in local settings in the DATABASES dictionary, change SECRET_KEY, and run migrations:

    python manage.py migrate
  • Create a new Solr configset from the files in solr_conf :

    cp -r solr_conf /path/to/solr/server/solr/configsets/ppa
    chown solr:solr -R /path/to/solr/server/solr/configsets/ppa

    and configure SOLR_CONNECTIONS in local settings with your preferred core/collection name and the configset name you created.

    See developer notes for setup instructions for using docker with solr:8.4 image.

  • Bulk import (provisional): requires a local copy of HathiTrust data as pairtree provided by rsync. Configure the path in local_settings.py and then run:

    python manage.py hathi_import
  • Then index the imported content into Solr:

    python manage.py index -i work
    python manage.py index_pages
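
As a rough illustration of the local settings steps above, the fragment below sketches the three values they mention. The database name, credentials, Solr URL, and the URL / COLLECTION / CONFIGSET key names are assumptions made for this example, not taken from the project; check local_settings.py.sample for the keys the application actually expects.

```python
# Hypothetical excerpt of ppa/settings/local_settings.py.
# Every value below is a placeholder chosen for illustration.

# Replace the sample key with your own long random string.
SECRET_KEY = "replace-with-a-long-random-string"

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "ppa",            # assumed database name
        "USER": "ppa",            # assumed database user
        "PASSWORD": "change-me",  # placeholder
        "HOST": "localhost",
        "PORT": "5432",
    }
}

SOLR_CONNECTIONS = {
    "default": {
        "URL": "http://localhost:8983/solr/",  # assumed local Solr URL
        "COLLECTION": "ppa",   # your preferred core/collection name
        "CONFIGSET": "ppa",    # the configset copied from solr_conf
    }
}
```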

Frontend development setup:

This project uses the Fomantic UI library in addition to custom styles and javascript. You need to compile static assets before running the server.

  • To build all styles and js for production, including fomantic UI:

    npm install
    npm run build

Alternatively, you can rebuild just the custom files or fomantic independently. This is useful if you make small changes and need to recompile once:

npm run build:qa # just the custom files, with sourcemaps
npm run build:prod # just the custom files, no sourcemaps
npm run build:semantic # just fomantic UI

Finally, you can run a development server with hot reload if you'll be changing either set of assets frequently. These two processes are separate as well:

npm run dev # serve just the custom files from memory, with hot reload
npm run dev:semantic # serve just fomantic UI files and recompile on changes

Tests

Python unit tests are written with pytest but use Django fixture loading and convenience testing methods when that makes things easier. To run them, first install development requirements:

pip install -r dev-requirements.txt

To run all python unit tests, use: pytest

Some deprecation warnings for dependencies have been suppressed in pytest.ini; to see warnings, run with pytest -Wd.

Make sure you configure a test solr connection and set up an empty Solr core using the same instructions as for the development core.

Some python unit tests access rendered views, and therefore expect static files to be compiled; see "Frontend development setup" above for how to do this.

In a CI context, we use a fake webpack loader backend that ignores missing assets.

Javascript unit tests are written with Jasmine and run using Karma. To run them, you can use an npm command:

npm test

Automated accessibility testing is also possible using pa11y and pa11y-ci. To run accessibility tests, start the server with python manage.py runserver and then use npm:

npm run pa11y

The accessibility tests are configured to read options from the .pa11yci.json file and look for a sitemap at localhost:8000/sitemap.xml to use to crawl the site. Additional URLs to test can be added to the urls property of the .pa11yci.json file.

Setup pre-commit hooks

If you plan to contribute to this repository, please run the following command:

pre-commit install

This will add a pre-commit hook to automatically style and clean python code with black and ruff.

Because these styling conventions were instituted after multiple releases of development on this project, git blame may not reflect the true author of a given line. In order to see a more accurate git blame execute the following command:

git blame <FILE> --ignore-revs-file .git-blame-ignore-revs

Or configure your git to always ignore styling revision commits:

git config blame.ignoreRevsFile .git-blame-ignore-revs

Documentation

Documentation is generated using sphinx. To generate documentation, first install development requirements:

pip install -r dev-requirements.txt

Then build documentation using the customized make file in the sphinx-docs directory:

cd sphinx-docs
make html

To check documentation coverage, run:

make html -b coverage

This will create a file under _build/coverage/python.txt listing any python classes or methods that are not documented. Note that sphinx can only report on code coverage for files that are included in the documentation. If a new python file is created but not included in the sphinx documentation, it will be omitted.

Documentation will be built and published with GitHub Pages by a GitHub Actions workflow triggered on push to main.

The same GitHub Actions workflow will build documentation and check documentation coverage on pull requests.

License

This project is licensed under the Apache 2.0 License.

©2019-2024 Trustees of Princeton University. Permission granted via Princeton Docket #20-3624 for distribution online under a standard Open Source license. Ownership rights transferred to Rebecca Koeser provided software is distributed online via open source.

ppa-django's People

Contributors

code-factor, dependabot[bot], gissoo, jerielizabeth, kmcelwee, laurejt, meg-codes, quadrismegistus, rlskoeser, thatbudakguy, vineetbansal

ppa-django's Issues

As an admin, I want a way to search and select digitized items for bulk addition to a collection so that I can efficiently organize large groups of items.

Notes for testing

  • Select a few digitized works in the admin interface and then select (lower left) from the dropdown to add them to a collection. This will take you to an intermediate page where you can select collections and then finalize the add.
  • On successful add you should be returned to the digitized works listing with a success message and the collections should be set.
  • Try doing this by selecting all items on the page (check box in the top row) and then using the select all 5007 option in lower left and add everything to a collection. This will take several seconds, but should work.
  • Try searching the digitized works for something that returns more than 100 items (e.g. "elocution" returns 259) and add all items from the search to a collection.

Notes for development

As a user, I want to add a work to my Zotero library from the individual item page so that I can save it for research without having to go back to the list of results.

Notes for testing

You need to have Zotero installed and running, and use a browser with the Zotero plugin installed. On the list archive page, you should see a folder icon (indicating multiple items available). Click on the folder and after a delay (while it loads the metadata) you should see a dialog box that will let you select all or some of the items on the page for harvest into your Zotero library.

As an admin, I want to add and edit collection descriptions so that I can help site users understand the collection and find related materials.

Notes for testing

  • There should be a description on collections
  • You can edit it, and also apply bold and italic formatting to the text
  • These will display on the 'View Collection' page with formatting intact.

Notes for development

  • Needs to support basic formatting
  • Suggest tinymce with fairly minimal set of html fields/styles
  • make sure description displays in list view with formatting

As a user, I want to filter search results by publication year or range of years so that I can focus on works from a particular time period.

Notes for testing

  • placeholder text should display max/min values based on what's in the database
  • should be able to specify just a min (everything after date), just a max (everything before), or both
  • use same date in both to get items from a single year
  • entering a min greater than max should give an error message
  • entering a date outside the max/min in the db should give an error message
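
The testing notes above amount to a small set of validation rules. The sketch below is one way to express them, not the project's form code: the function name, argument names, and error wording are hypothetical, and db_min / db_max stand in for the earliest and latest publication years found in the database.

```python
def validate_year_range(start, end, db_min, db_max):
    """Return a list of error messages; an empty list means the input is valid.

    start and end are optional (None means "not specified").
    """
    errors = []
    # Each bound that was supplied must fall inside the range of years in the data.
    for label, value in (("start", start), ("end", end)):
        if value is not None and not (db_min <= value <= db_max):
            errors.append(f"{label} year {value} is outside {db_min}-{db_max}")
    # A minimum greater than the maximum is rejected; equal values select one year.
    if start is not None and end is not None and start > end:
        errors.append("start year must not be later than end year")
    return errors
```

Passing the same year for both bounds returns no errors, which matches the single-year case in the notes.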

Mockup: homepage

testing notes

  • should be mostly responsive
  • card styles are ugly but they'll come in the next wave since they're component styles

As an admin, I want a link from the digitized work list view to HathiTrust so that I can check the contents as I curate the archive.

Notes for testing

  • View the listing of all digitized works. The second column, still labeled Source id, is now clickable and links to the HathiTrust url, opening in a new tab/window depending on your browser's defaults.
  • View the change form for a particular digitized work. It should also have a similar uneditable link at the top of the form with the same functionality. (The source_url field has been suppressed since it is no longer needed and isn't intended to be editable from the admin).
  • All other fields appear as they did before (both editable and uneditable)

Notes for development

  • add a property to generate a link to Hathi using source_url and display the source_id
  • link should open in a new window (target="_blank")
  • adjust list field to make title the edit link, display source id second
  • display on the edit page as well (clickable version)

As a user, I want to filter search results by collection so that I can include or exclude groups of materials based on my interests.

Suggestions for Testing

  • add a collection
  • add a digitized work (on its edit screen) to that collection
  • the collection should appear on the main search form and allow you to use a checkbox to filter by collection

Notes for Development

  • add a solr collection field to use as a facet (consider text collection field and string collection_exact copy field)
  • add logic to reindex digitized works when they are updated (i.e. when associated with a collection)
  • modify the search form to provide a collection facet select field
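
To make the facet idea in the development notes concrete, here is a minimal sketch of turning the checked collections into a Solr filter query string. The field name collections_exact and the function name are assumptions for illustration; the notes only suggest a string copy field for exact matching.

```python
def collection_filter_query(selected, field="collections_exact"):
    """Build a Solr fq value matching works in any of the selected collections.

    Returns None when nothing is selected, meaning no filter is applied.
    """
    if not selected:
        return None
    quoted = []
    for name in selected:
        # Escape backslashes and double quotes so each name stays one phrase.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return f"{field}:({' OR '.join(quoted)})"
```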

As a user viewing an individual item from a keyword search, I want to see page image thumbnails and text snippets that match my search terms so I can see how many and what kind of pages match my search terms.

Notes for testing

  • when you do a keyword search on the main archive page, you should see a "view all N pages" link for any work that has more than two matching pages
  • link should take you to the detail view for that work, with a paginated list of matching pages; pages should display thumbnail image and highlighted text snippets
  • you should see your query text in a search box which you can refine and edit and resubmit

Notes for development

  • pass querystring from listview context
  • add a search form to detail view
  • update the view to search for pages in the book when there's a keyword
  • use existing pagination logic
  • refactor thumbnail and snippets as components

As a user, I want to change how my results are sorted so I can browse the results in multiple ways.

Notes for testing

Should be able to sort by:

  • relevance, but only if there is a keyword search
  • chronology (reversible)
  • alpha by title (reversible)
  • alpha by author (reversible)

Also check:

  • Sort should persist as you page through results
  • default sort is title a-z

Note: we currently don't have a way to make relevance the default for keyword searches without overriding user's sort selection when there is a keyword search, so consider that out of scope for this feature (I'm thinking about a way to do it and we'll try to add it; feel free to create a new issue to document this).
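
The sort rules in these notes can be sketched as a small lookup plus a fallback. The option keys and the Solr sort strings below are hypothetical names chosen for the example; only the rules themselves (relevance requires a keyword, default is title a-z) come from the notes.

```python
# Hypothetical mapping of form values to Solr sort clauses.
SORT_FIELDS = {
    "relevance": "score desc",
    "title_asc": "sort_title asc",
    "title_desc": "sort_title desc",
    "author_asc": "author_exact asc",
    "author_desc": "author_exact desc",
    "pub_date_asc": "pub_date asc",
    "pub_date_desc": "pub_date desc",
}
DEFAULT_SORT = "title_asc"


def resolve_sort(requested, has_keyword):
    """Return the sort option to use for a request."""
    # Unknown or missing values fall back to the default (title a-z).
    if requested not in SORT_FIELDS:
        return DEFAULT_SORT
    # Relevance is only meaningful when there is a keyword search.
    if requested == "relevance" and not has_keyword:
        return DEFAULT_SORT
    return requested
```

Because the chosen option travels with the request, passing it along in pagination links is enough for the sort to persist across pages.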

As an admin, I want to suppress items from the site so that I can pull content that should not be included or was wrongly added as I am going through and assigning collections to archive volumes.

Notes for testing

  • digitized works now have a status field; default is public, you can set manually to suppressed
  • should see an indicator on the list view if something is public or suppressed
  • should be able to filter the list view on status
  • status should be included in CSV export
  • when you set a record to suppressed the data should be deleted from the hathitrust pairtree data so we don't actually import and index it again (not sure how you can test this; you could ask us to run the hathi import script on the source id?)
  • if you try to switch a suppressed record back to public, you should get a validation error because it's not yet supported

Notes for development

We don't want to actually delete the record from the database; we'll want to keep a stub at least, to indicate the record was removed and track the history.

  • add a status field; options public/suppressed, default to public
  • make editable in admin
  • display status in the admin list view so removed items are obvious; also configure as a filter.
  • Include status field in CSV export
  • when status is changed to suppressed, delete rsync data so it won't be re-added/indexed on a full import
  • don't allow un-suppressing items (validation? pre-save hook?)

out of scope

  • We may eventually want a bulk removal option, but consider that out of scope for now.
  • Supporting "un-suppress" logic is out of scope for now.
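
The status rules above (default public, suppress allowed, un-suppress rejected) could be checked with something like the following. This is a standalone sketch rather than the project's model validation; the constant values, exception class, and function name are all hypothetical.

```python
PUBLIC = "public"
SUPPRESSED = "suppressed"


class StatusChangeError(ValueError):
    """Raised when a status change is not supported."""


def check_status_change(old_status, new_status):
    """Validate a status change.

    Returns True when the record has just been suppressed, which is the
    point at which the local rsync data should be deleted.
    """
    if old_status == SUPPRESSED and new_status == PUBLIC:
        # Un-suppressing is out of scope for now, so reject it.
        raise StatusChangeError("Unsuppressing records is not yet supported")
    return old_status != SUPPRESSED and new_status == SUPPRESSED
```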

Snippets: base template, header and footer

testing notes

zeplin links

nav L / M / M expanded / S / S expanded
footer L / M / S

notes

  • main navigation should have the correct "pitbar" behavior borrowed from cdh-web (it hides itself when you scroll down quickly and reappears when you scroll up quickly)
  • main navigation links should all work, except for editorial which isn't in this milestone
  • main navigation should be responsive
  • if the hamburger menu is clicked on mobile, it should make the main navigation not do the "pitbar" (it should always stay until the menu is dismissed)
  • footer should have the correct items with more or less correct alignment (e.g. the version number is floated right on large screens but stacked on mobile)

dev notes

  • add all the basic meta information to the base.html template
  • add blocks for css and js, including compress tags
  • add header and footer snippets that are rendered on every page

Mockup: single volume search

testing notes

zeplin links

results within work: L / M / S

notes

  • does searching for a term within the work function as you expect it to?
  • do the search controls (e.g. pagination) function the same way they do on the archive search?
  • is it responsive?

As a user, I should not see suppressed items in search results or item display so that my results are not cluttered by items not meant to be part of the archive.

Notes for testing

  • suppressed items should not be included in the public archive search
  • suppressed items should not be included in collection counts
  • suppressed items should not be included in sitemap.xml (maybe check by looking at the xml and then suppressing the first item listed?)
  • detail page for a suppressed item should return a 410 Gone page and should not show any item details

Notes for development

  • when a record is suppressed by an admin, delete from solr index (on save, when status has changed)
  • when reindexing, do not index suppressed/removed records (maybe index data returns nothing for suppressed items?)
  • detail view should return a 410 Gone status for items that have been suppressed
  • check behavior for changing a collection name that includes removed items
  • collection counts should ignore suppressed items
  • xml sitemaps should exclude suppressed items
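
A minimal sketch of the behavior described above, using plain dictionaries in place of database records. The helper names and the "status" key are hypothetical; only the 410 Gone response and the exclusion of suppressed items come from the notes.

```python
from http import HTTPStatus


def detail_status(item):
    """Pick the HTTP status for an item detail page.

    item is a dict with a "status" key, or None if no record exists.
    """
    if item is None:
        return HTTPStatus.NOT_FOUND
    if item["status"] == "suppressed":
        # The record existed but was removed: 410 Gone, with no item details.
        return HTTPStatus.GONE
    return HTTPStatus.OK


def public_items(items):
    """Filter used for search results, collection counts, and the sitemap."""
    return [item for item in items if item["status"] != "suppressed"]
```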

As an admin, I want to see the history of all edits to a digitized work, including import and updates via script, so that I can track the full history of contributions and changes to the record.

Notes for testing

  • view history for any record using the "history" button on the top right of the individual digitized work edit page
  • import script now creates a log entry when a record is created via import script
  • import script now creates a log entry when records are updated via import script; message should indicate if update was forced by person running the script or triggered by a hathi last modified date

  • update import script to create admin log entries on create and update
  • make author and pub date optional in admin edit
  • unit tests

As a user viewing keyword search results, I want to see a few text snippets from the full text of a work so that I can get an idea how my search terms are used in the work.

Notes for testing

  • you should not see page images or text highlighting if you don't enter search terms
  • you should see one or two page images with text highlighting when you enter search terms; most relevant pages are displayed first; for now, matching terms will be italicized (styles will be added later based on Xinyi's design)
  • you should see a link "view all N pages" only if there are more than two matches; link goes to the item detail view for now (actual functionality will be implemented and tested under #32)

reindex script

Need to be able to reindex content in Solr independently of importing content into the database.

  • needs an option for reindexing just books or just pages
  • needs multithreading to make the full reindex faster
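
One possible shape for the multithreaded part of this, sketched with the standard library only. The function names are hypothetical, and index_chunk stands in for whatever call actually sends a batch of documents to Solr; it is assumed to return the number of items it indexed. The books/pages option is left out of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def reindex(items, index_chunk, chunk_size=100, workers=4):
    """Index items in chunks across worker threads; return the total indexed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order and re-raises any worker exception.
        return sum(pool.map(index_chunk, chunked(items, chunk_size)))
```

Threads suit this job because the work is dominated by waiting on HTTP requests to Solr rather than by computation.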

Mockup: collections list page

testing notes

zeplin links

collections list: L / M / S

notes

  • cards are a component style that will come in the next iteration, so they won't look much like the spec for now.
  • is it responsive?
  • do the card links work?
  • is there anything that looks out of place?

Setup js pipeline with compressor

At a minimum I'd like to be able to use ES6 features in the source, so that will require a transpiler in addition to minification.

It might be overkill, but TypeScript could be nice too.

It looks like this plugin actually handles both the scss and ES6 js together, which could be a nice drop-in solution going forward.

As an admin, I want to edit user and group permissions so I can manage project team member access within the system.

Notes for testing

  • I've customized the user list display to show more information that I hope will make it easier to manage accounts
  • There are now two groups preloaded for you: archive manager and content editor (please let me know if you want different names). An archive manager can edit digitized works; content editor can use the content management functionality (create site pages).
  • To test the group permissions and get comfortable with managing user accounts, I recommend the following:
    • Login to the admin site with your own account (should have superuser permissions)
    • Create two new test users. They should have staff permissions (allows login to the admin site) and be active but should not have superuser (allows everything). Assign one group to each user.
    • In a different browser or in an incognito window in the same browser, login as your test user and check what that user sees and is able to do in the admin site.
  • FYI: I have also set the app to create a script user account so that the import script can create log entries for when records are created and updated. This account should not be removed. If you have ideas for a better name please let me know!

load partial html for page of search results via ajax

If we do, template needs to be broken into components so that the view can specify the full page template or just the reloaded portion if the request is made via AJAX.

Also note from the Zotero documentation on exposing metadata:

Websites for which metadata changes without a page reload should fire a ZoteroItemUpdated event to tell Zotero to re-detect metadata on the page. This is supported in Zotero 3.0 and later.

var ev = document.createEvent('HTMLEvents');
ev.initEvent('ZoteroItemUpdated', true, true);
document.dispatchEvent(ev);
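
On the server side, the template split mentioned above could be driven by a check like the one below. This is only a sketch: the template paths are hypothetical, and it assumes the client sends the conventional X-Requested-With: XMLHttpRequest header with its AJAX requests.

```python
def select_template(
    headers,
    full="archive/list.html",
    partial="archive/snippets/results.html",
):
    """Choose the full page template or just the reloadable results portion.

    headers is any mapping of request header names to values.
    """
    is_ajax = headers.get("X-Requested-With") == "XMLHttpRequest"
    return partial if is_ajax else full
```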

Setup scss pipeline with compressor

  • store page-specific .scss files in app /static directories
  • include them in the template's css block
  • compress the css block on every page
  • add dev info to README

Snippet: search result pagination controls

testing notes

zeplin links

components All
search page L

notes

  • check behavior with small and large sets of search results
  • check behavior when you are on a page at the beginning, middle, and end of search results
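
The cases listed above (small and large result sets; first, middle, and last pages) can be exercised against a helper like this one. It is an illustrative sketch, not the project's pagination code: the function name and the spread parameter are hypothetical, and 50 items per page is taken from the keyword search notes elsewhere on this page.

```python
import math


def page_window(current, total_items, per_page=50, spread=2):
    """Return (page numbers to display, total page count).

    Shows up to `spread` pages on either side of the current page.
    """
    total_pages = max(1, math.ceil(total_items / per_page))
    # Clamp out-of-range page requests to the nearest valid page.
    current = min(max(1, current), total_pages)
    first = max(1, current - spread)
    last = min(total_pages, current + spread)
    return list(range(first, last + 1)), total_pages
```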

As a user, I want to search and browse digitized volumes by keyword so that I can see what materials are in the archive.

Notes for testing

  • Simple keyword search on anything in basic metadata and full text
  • Basic display (no styles, limited formatting) provided for testing search functionality. Does not yet have a custom default sort or handle pagination.
  • Searches across book metadata and page contents, grouping all pages and book together; book should always be retrieved for display even if the only matches are in pages of the book.
  • Currently supports keyword, exact phrase (use quotes) and also boolean searching - default behavior is OR, but you can specify AND. I'm not sure if we want to preserve this behavior or not, but probably worth testing to get an idea how it works.
  • Updates 12/14:
    • results are sorted by title by default; sorted by relevance when a search term is entered
    • now has basic pagination displayed at the bottom of the page; for now, showing 50 items per page
    • now displaying the first handful of pages for each volume
    • when there is a keyword search, I'm also displaying relevance per page
    • I'm currently displaying relevance for the group, but it's always 1.0, so there's something off there but I'm not sure what yet
    • I looked up the near search syntax, if you want to try that: "grammar children"~4, where 4 is the maximum number of words apart
    • please also test searches with bad syntax to see how it's handled; I've tried e.g. "incomplete quote or incomplete boolean such as thing OR; you may be able to think of others.

  • default sort
  • pagination
  • display list of matching page numbers for testing
  • unit tests

Mockup: archive search page

testing notes

this is just an aggregator issue for the below issues - it will be marked complete when they are completed

dev notes

  • #63 (search form)
  • #59 (search result)
  • #64 (sort controls)

As an admin, I want to generate a CSV report of materials on the site so that I can do analysis with other tools such as OpenRefine to analyze collection assignment.

Notes for testing

  • Should see a download CSV button on the digitized works list page
  • Make sure it includes all the information you expect
  • Please test after adding some items to collections; test with at least one item in more than one collection

Notes for Development

  • view to export all works and all metadata fields from the database
  • Customize the Admin list view to add a link to download the data
  • Restricted to admins; possibly create a new permission for this?
  • Unit tests
  • include collections
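
A small sketch of the export, using the standard csv module and plain dictionaries in place of database records. The column names, the function name, and the choice to join multiple collections with "; " in one cell are assumptions for illustration.

```python
import csv
import io

COLUMNS = ["source_id", "title", "status", "collections"]


def works_to_csv(works):
    """Serialize works (dicts) to CSV text, one row per work."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    for work in works:
        writer.writerow([
            work["source_id"],
            work["title"],
            work["status"],
            # A work can belong to several collections; keep them in one cell.
            "; ".join(work["collections"]),
        ])
    return buffer.getvalue()
```

Keeping collections in a single delimited cell means a tool such as OpenRefine can split the column when needed, while each work still occupies exactly one row.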

Snippet: single search result

testing notes

zeplin links

list with results: L / M / S

notes

  • is it responsive?
  • is the important metadata there (e.g. publisher)?
  • do long titles or weird metadata distort it significantly?
