internetarchive / openlibrary

One webpage for every book ever published!

Home Page: https://openlibrary.org

License: GNU Affero General Public License v3.0

Makefile 0.08% Shell 0.73% HTML 17.34% Python 56.03% JavaScript 14.09% CSS 0.12% Dockerfile 0.04% Less 7.10% Vue 4.05% PLpgSQL 0.40% MDX 0.02%
internet-archive open-source books library-catalogue hacktoberfest hacktoberfest2021

openlibrary's Introduction

Open Library

Join the chat at https://gitter.im/theopenlibrary/Lobby

Open Library is an open, editable library catalog, building towards a web page for every book ever published.

Are you looking to get started? This is the guide you are looking for. You may also wish to learn more about Google Summer of Code (GSoC) or Hacktoberfest.

Overview

Open Library is an effort started in 2006 to create "one web page for every book ever published." It provides access to many public domain and out-of-print books, which can be read online.

Here's a quick public tour of Open Library (10 min) to familiarize you with the service and its offerings.

Installation

Run docker compose up and visit http://localhost:8080

Need more details? Check out the Docker instructions or video tutorial.

Alternatively, if you do not want to set up Open Library on your local computer, try Gitpod! It lets you work on Open Library entirely in your browser without installing anything on your personal computer. Warning: this integration is still experimental.

Developer's Guide

For instructions on administering your Open Library instance, refer to the Developer's Quickstart Guide.

More developer documentation for Open Library is available in the Open Library Wiki.

Code Organization

  • openlibrary/core - core openlibrary functionality, imported and used by www
  • openlibrary/plugins - other models, controllers, and view helpers
  • openlibrary/views - views for rendering web pages
  • openlibrary/templates - all the templates used in the website
  • openlibrary/macros - macros are like templates, but can be called from wikitext

Architecture

The Backend

OpenLibrary is developed on top of the Infogami wiki system, which is itself built on top of the web.py Python web framework and the Infobase database framework.

Once you've read the overview of OpenLibrary Backend technologies, we highly encourage you to read the developer primer, which explains how to use Infogami (and its database, Infobase).

If you want to dive into the source code for Infogami, see the Infogami repo.

Running tests

Open Library tests can be run using Docker. See our Testing Document for more details.

docker compose run --rm home make test

License

All source code published here is available under the terms of the GNU Affero General Public License, version 3.

openlibrary's People

Contributors

anandology, bfalling, bharatkalluri, buttock, cclauss, cdrini, dependabot-preview[bot], dependabot[bot], edwardbetts, gdamdam, hornc, jaydenteoh, jdlrobson, jimchamp, jimman2003, lephemere, mangtronix, mekarpeles, nibrahim, pre-commit-ci[bot], rajbot, raybb, rebecca-shoptaw, renovate[bot], sabreen-parveen, sbwhitt, scottbarnes, tabshaikh, tfmorris, yashs911

openlibrary's Issues

Purge expired user accounts and verification links regularly

The emails for account verification, password reset and email change have a verification code in the URL. There is a record in the datastore for each link, which is deleted once the link is used. We need a way to purge the unused links after a threshold. I think 2 weeks is a good threshold.

The same thing applies to unverified user accounts.
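Here is a minimal sketch of the periodic purge. It assumes each link record carries a creation timestamp. The `VerificationLink` class and `split_expired` function are hypothetical and are not Open Library's datastore API. The same split could be run over unverified accounts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# Hypothetical record shape: the issue only says the datastore keeps one
# record per emailed link and deletes it when the link is used.
@dataclass
class VerificationLink:
    code: str
    kind: str  # e.g. "verify", "password-reset", "email-change"
    created: datetime

PURGE_AFTER = timedelta(weeks=2)  # threshold suggested in the issue

def split_expired(
    links: List[VerificationLink],
    now: Optional[datetime] = None,
    max_age: timedelta = PURGE_AFTER,
) -> Tuple[List[VerificationLink], List[VerificationLink]]:
    """Return (kept, expired); the expired links are the ones to delete."""
    now = now or datetime.now(timezone.utc)
    kept, expired = [], []
    for link in links:
        if now - link.created > max_age:
            expired.append(link)
        else:
            kept.append(link)
    return kept, expired
```

A daily cron job could call this and delete the `expired` records.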

ISBN search should ignore hyphens

This appears not to be working again.

searching: "isbn:0-8065-1328-4"

does not find a record with ISBN: 0806513284

http://openlibrary.org/works/OL2172380W/

Bug description:
ISBNs (10 and 13), as printed on books, usually contain hyphens for better readability: 5-93943-006-6. These hyphens are safe to ignore, and in fact most of our records do ignore them: 5939430066. The problem is that when you search for a book through the Advanced Search function, you must type the hyphens exactly as they appear in the record; otherwise the search won't find it. I think hyphens should be allowed but ignored in all searches.

Actually, the lack of this feature already made me accidentally duplicate a record, since I didn't realize the book was already there (see the hyphenated and non-hyphenated ISBNs above; the non-hyphenated record didn't even have a proper title...).

As a related note, the ISBN search field should be made a little wider, since hyphenated ISBN-13s such as 978-1-86228-283-4 (17 characters) don't fit at the moment. The field is currently set to 15 characters; it should be at least 17.
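A minimal sketch of the normalization follows. The key point is to apply the same function both when indexing records and when parsing queries, so that hyphenated and bare ISBNs compare equal. The function names are hypothetical and are not Open Library's search code.

```python
import re

def normalize_isbn(raw: str) -> str:
    """Drop hyphens and spaces: '5-93943-006-6' -> '5939430066'.

    Upper-cases a trailing 'x' so ISBN-10 check digits compare equal.
    """
    return re.sub(r"[\s-]", "", raw).upper()

def normalize_isbn_terms(query: str) -> str:
    """Rewrite each isbn:<value> term in a search query to its normalized form."""
    return re.sub(
        r"isbn:([0-9Xx-]+)",
        lambda m: "isbn:" + normalize_isbn(m.group(1)),
        query,
    )
```

With this in place, the query from the report, `isbn:0-8065-1328-4`, becomes `isbn:0806513284` and would match the stored record.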

catalog/onix/onix.py doesn't handle books with multiple title tags

I've got a book in an ONIX record that has more than one <title> tag. When this happens, onix.py raises an Exception:

File "/openlibrary/catalog/onix/onix.py", line 62, in getitem
raise Exception ("more than one value for %s (%s)" % (reference_name, name))
Exception: more than one value for Title (title)

It is valid according to the ONIX docs for there to exist more than one <title> tag. This often happens when a book is bilingual.
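One way to avoid the exception is to let repeatable fields return a list instead of raising. The sketch below assumes a hypothetical `REPEATABLE_TAGS` set and a standalone `get_field` helper; it is not the actual `onix.py` code. For brevity it treats each `<title>` element's text as the title, whereas real ONIX title composites have child elements.

```python
import xml.etree.ElementTree as ET
from typing import List, Union

# Hypothetical: tags the parser should treat as repeatable instead of raising
REPEATABLE_TAGS = {"title"}

def get_field(element: ET.Element, tag: str) -> Union[None, str, List[str]]:
    values = [child.text or "" for child in element.findall(tag)]
    if tag in REPEATABLE_TAGS:
        return values  # always a list, possibly with a single entry
    if len(values) > 1:
        raise ValueError(f"more than one value for {tag}")
    return values[0] if values else None
```

Callers that only want one title can take the first entry. Bilingual records keep both titles.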

Improve template reloading time in debug mode

When running in debug mode (dev-instance), the templates are loaded for each invocation, which is very slow.

Improve this by caching the templates and reloading them only if the file has been modified since the template was loaded.
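A minimal sketch of an mtime-keyed cache is shown below. `compile_fn` is a hypothetical stand-in for whatever turns template source into a renderable template; this is not web.py's or Infogami's loader.

```python
import os

class TemplateCache:
    """Cache compiled templates; recompile only when the file's mtime changes."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # hypothetical: source text -> template
        self._cache = {}            # path -> (mtime, compiled template)

    def get(self, path):
        mtime = os.path.getmtime(path)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # file unchanged since last load
        with open(path, encoding="utf-8") as f:
            compiled = self._compile(f.read())
        self._cache[path] = (mtime, compiled)
        return compiled
```

Each request then costs one `stat` call per template instead of a full re-parse.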

Implement "holding pattern" for non-verified accounts

Account should not be created until the email is activated.

It requires storing the username, password and email in the database until the email is activated, taking special care not to allow any other registrations with not-yet-verified usernames/emails, and periodically removing accounts that are still not verified after the allowed time (maybe a month).
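The holding pattern could look roughly like the following. It is an in-memory sketch, and `PendingRegistrations` and its methods are hypothetical. It reserves usernames and emails while they await verification, hands back the pending record on activation, and purges records older than the allowed time. A real implementation would also check existing verified accounts and would persist the records in the datastore.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

@dataclass
class PendingAccount:
    username: str
    email: str
    password_hash: str  # store a hash, never the plain password
    code: str           # verification code sent in the email link
    created: datetime

class PendingRegistrations:
    """In-memory stand-in for the datastore records described in the issue."""

    def __init__(self, max_age: timedelta = timedelta(days=30)):
        self.max_age = max_age
        self._by_code: Dict[str, PendingAccount] = {}

    def _reserved(self, username: str, email: str) -> bool:
        u, e = username.lower(), email.lower()
        return any(p.username.lower() == u or p.email.lower() == e
                   for p in self._by_code.values())

    def register(self, username: str, email: str, password_hash: str, now: datetime) -> str:
        # A real implementation must also check existing (verified) accounts.
        if self._reserved(username, email):
            raise ValueError("username or email is already awaiting verification")
        code = secrets.token_hex(16)
        self._by_code[code] = PendingAccount(username, email, password_hash, code, now)
        return code

    def activate(self, code: str, now: datetime) -> Optional[PendingAccount]:
        """Remove and return the pending record; the caller creates the real account."""
        pending = self._by_code.pop(code, None)
        if pending is None or now - pending.created > self.max_age:
            return None
        return pending

    def purge(self, now: datetime) -> int:
        """Drop registrations older than max_age; return how many were removed."""
        expired = [c for c, p in self._by_code.items() if now - p.created > self.max_age]
        for c in expired:
            del self._by_code[c]
        return len(expired)
```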

Finish /libraries/stats

Need to work on displaying staggered graphs showing bookreader, pdf and epub loans in the Loans/day graph. Also need to work on displaying axis labels on the graphs.

@george08 does it also include combining /admin/loans and /libraries/stats or that is a separate issue?

Enable history-v2 to everyone

The history-v2 feature, if enabled, displays rich messages in the history snippet.

For example, when enabled it shows "Merged 2 duplicate author records into this master. See details." and when not enabled, it shows "merge authors".

IIRC, we tried to enable this feature for everyone, noticed some performance issues, and reverted. We need to revisit that and enable it for everyone without any performance issues.

ISBN10 and ISBN13 should be interchangeable

For search, all APIs, and cover retrieval, if either an ISBN10 or ISBN13 is set for an edition, then the equivalent other number (13 to 10 conversion only valid for 978 prefix) should work as well. This allows application developers to normalise on one number instead of keeping track of both.
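The conversion itself is just a matter of recomputing check digits. The sketch below uses the standard ISBN-10 (mod 11) and ISBN-13 (mod 10, weights 1 and 3) check-digit rules. It assumes inputs are already normalized (no hyphens) and returns `None` for 979-prefixed numbers, as the issue notes. The function names are hypothetical.

```python
from typing import Optional

def isbn10_check_digit(first9: str) -> str:
    # Weights 10, 9, ..., 2; a check value of 10 is written as "X"
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

def isbn13_check_digit(first12: str) -> str:
    # Alternating weights 1, 3, 1, 3, ...
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)

def isbn10_to_13(isbn10: str) -> str:
    core = "978" + isbn10[:9]
    return core + isbn13_check_digit(core)

def isbn13_to_10(isbn13: str) -> Optional[str]:
    if not isbn13.startswith("978"):
        return None  # 979-prefixed ISBN-13s have no ISBN-10 equivalent
    core = isbn13[3:12]
    return core + isbn10_check_digit(core)
```

For example, `isbn10_to_13("1607740362")` gives `"9781607740360"`, the pair used in the Import API report further down. Storing both forms at index time would let lookups by either number succeed.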

Create OL pages for IA lendable books without OL pages

About 10% of the IA lendinglibrary / inlibrary books can't be borrowed, as there's no corresponding OL page that can be looked up via the iaid.

Possible fixes -

  • Re-run imports for these books?
  • 'Try harder' to find a match - look for title/author/other match, etc, and add the iaid to any discovered OL records.

template changes are not live after deploy and restart

I committed this change from George to the master branch:

64ed2ce

I pushed out using /olsystem/bin/deploy-code openlibrary

Then I restarted:

fab restart:openlibrary
Restarting the program as openlibrary user
[localhost] Executing task 'restart'
[localhost] local: ssh ol-web2 /olsystem/bin/restart-gunicorn.sh localhost 7071-ol-gunicorn
ol-web2.us.archive.org restarting 7071-ol-gunicorn (pid: 11593)
[localhost] local: ssh ol-web3 /olsystem/bin/restart-gunicorn.sh localhost 7071-ol-gunicorn
ol-web3.us.archive.org restarting 7071-ol-gunicorn (pid: 20138)

My changes don't appear on openlibrary.org. They show up on my dev instance though.

I confirmed the changes are in /opt/openlibrary/deploys/openlibrary/64ed2ce/openlibrary/plugins/openlibrary/templates/site/alert.html on the ol-web nodes, and that /opt/openlibrary/openlibrary is a symlink to deploys/openlibrary/64ed2ce/

Make coverstore scalable

It looks like some users are having issues with our current rate-limit. Need to work on handling more traffic and increasing the rate-limits to a higher number.

Account Verification is BROKEN

Along with several help cases, Daniel reported that he gets stuck in a loop when trying to verify his email address.

Here's what happens:

  1. User signs up for new account
  2. User gets email verification
  3. User clicks on verification link, in this case: http://openlibrary.org/account/verify/52d6c01ebcd64874a46d22acc8bad56a
  4. User sent to http://openlibrary.org/account/login
    4a. (Incidentally, wouldn't it be useful to populate the username field with the appropriate username here?)
  5. User attempts to log in to new account (in this case, PublisherBot)
  6. User sees a screen that says "Account already activated. This account has already been activated. Please log in to continue."
  7. Back to 5. Repeat.

There are 2 "Login trouble" cases listed here that seem related: http://openlibrary.org/admin/support

E-mail verification not working

It looks like the system is allowing non-verified users to login.

So, we need to create a new error state for the login process - account_not_verified, or something, and show the info below...

<h1>Oops!</h1>

<error style>The email address you signed up with needs to be verified.</error> (Yellow box, with ! mark.)

<p>When you created your account, we sent an email to {email address} that contained a link for you to verify your email address. We need you to click that link, please.</p>

<p>If you can't find the email, just hit this button to send a fresh one:</p>

<button>RESEND VERIFICATION EMAIL</button>

<p>Thanks!</p>
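The proposed `account_not_verified` state could slot into the login check as sketched below. `LoginStatus`, `Account` and `login_status` are hypothetical names, and the password comparison is assumed to happen elsewhere. Checking verification only after the password keeps the resend page away from people who don't own the account. A correct password on an already-verified account goes straight to `OK`, which is the outcome the loop described in "Account Verification is BROKEN" never reaches.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class LoginStatus(Enum):
    OK = "ok"
    NOT_FOUND = "account_not_found"
    BAD_PASSWORD = "account_bad_password"
    NOT_VERIFIED = "account_not_verified"  # the new error state proposed in the issue

@dataclass
class Account:
    username: str
    verified: bool = False

def login_status(account: Optional[Account], password_matches: bool) -> LoginStatus:
    if account is None:
        return LoginStatus.NOT_FOUND
    if not password_matches:
        return LoginStatus.BAD_PASSWORD
    # Check verification only after the password, so the "resend
    # verification email" page is shown only to the account's owner.
    if not account.verified:
        return LoginStatus.NOT_VERIFIED
    return LoginStatus.OK
```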

acs4 shows book checked out, but datastore does not

This book shows as checked out in ACS4, but current loan is not recorded.

http://openlibrary.org/books/OL6178818M/Shad_run./borrow_admin

Book can be borrowed = True

Borrowed by: None

Available? (what's been borrowed = false)
BookReader = True
PDF = True
ePub = False

ePub fulfillment info:

[
{
"loanuntil": "2011-05-11T20:28:31",
"resourceid": "urn:uuid:fbe146e4-b766-49b6-81d0-4407cdc816f5",
"returned": "F",
"until": "2011-05-11T20:28:31"
}
]

~ $ date -u
Wed May 11 18:40:56 UTC 2011

Check if there is a timezone conversion error (datastore expired early) or other issue.
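A quick check of the numbers above follows. It assumes the ACS4 `loanuntil` timestamp (which carries no offset) is in UTC; `loan_remaining` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

def loan_remaining(until_iso: str, now: datetime) -> timedelta:
    """Time left on a loan, assuming the offset-less timestamp is in UTC."""
    until = datetime.fromisoformat(until_iso).replace(tzinfo=timezone.utc)
    return until - now

# Values copied from the ePub fulfillment info and `date -u` output in the issue
now = datetime(2011, 5, 11, 18, 40, 56, tzinfo=timezone.utc)
print(loan_remaining("2011-05-11T20:28:31", now))  # 1:47:35
```

If that assumption holds, the loan still had about 1 h 47 min to run while the datastore showed no current loan. A datastore expiry shifted by any timezone offset larger than that would explain the mismatch.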

Setup a dev instance running on production data with tools to edit templates live

Since we are planning to move the templates to the dev instance, it would be nice to set up a dev instance pointing to the production database. It should have some tools to allow editing templates from the website and seeing immediate feedback.

We need tools to:

  • edit templates/css/js files from the website
  • see the list of modified templates and their diff
  • revert the changes in one or more files
  • git pull
  • commit and push the template changes to github

Build better /admin/block functionality

Build better /admin/block with the following features.

  • It should allow blocking IPs or users for a certain period of time
  • When an IP is blocked, a comment and the name of the admin who blocked it should be recorded
  • It should allow banning a user account, and banned users' pages should display a special message instead of the regular user page

Anything else?
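A block record covering the three requirements might look like the sketch below. It has a time limit, a comment and the blocking admin, with a permanent ban represented as a block without a duration. `Block` and `is_blocked` are hypothetical names, not the current /admin/block code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

@dataclass
class Block:
    target: str          # an IP address or a username
    blocked_by: str      # the admin who created the block
    comment: str         # why it was blocked
    created: datetime
    duration: Optional[timedelta] = None  # None = permanent, i.e. a ban

    def is_active(self, now: datetime) -> bool:
        return self.duration is None or now < self.created + self.duration

def is_blocked(blocks: Iterable[Block], target: str, now: datetime) -> bool:
    return any(b.target == target and b.is_active(now) for b in blocks)
```

A user page could check `is_blocked` for its owner and render the special "banned" message instead of the regular page.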

Sending the same metadata twice to the Import API always creates a new work and edition

I sent this JSON data to the Import API, with both an isbn10 and an isbn13:

{"publishers": ["Ten Speed Press"], "pagination": "20 p.", "description": "A macabre mash-up of the children's classic Pat the Bunny and the present-day zombie phenomenon, with the tactile features of the original book revoltingly re-imagined for an adult audience.", "title": "Pat The Zombie", "isbn_13": ["9781607740360"], "languages": ["eng"], "isbn_10": ["1607740362"], "authors": [{"entity_type": "person", "name": "Aaron Ximm", "personal_name": "Aaron Ximm"}], "contributions": ["Kaveh Soofi (Illustrator)"]}

And it created a new work and edition:

{"edition": {"status": "created", "key": "/books/OL24794124M"}, "work": {"status": "created", "key": "/works/OL15886241W"}, "source_record": "http://www.archive.org/download/test_ol_import_000/000.json", "success": true, "authors": [{"status": "modified", "name": "Aaron Ximm", "key": "/authors/OL6898389A"}]}

and

{"edition": {"status": "created", "key": "/books/OL24794192M"}, "work": {"status": "created", "key": "/works/OL15886309W"}, "source_record": "http://www.archive.org/download/test_ol_import_000/001.json", "success": true, "authors": [{"status": "modified", "name": "Aaron Ximm", "key": "/authors/OL6898389A"}]}

This happens when I send in MARC records as well.
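One possible fix is to check the incoming ISBNs against existing editions before creating anything. The sketch below uses a hypothetical in-memory `isbn_index` (ISBN to edition key) and a caller-supplied `create_edition`. The "matched" status string is an assumption, not necessarily what the Import API returns.

```python
from typing import Callable, Dict, Iterator, Optional

def _isbns(record: dict) -> Iterator[str]:
    for field in ("isbn_13", "isbn_10"):
        for isbn in record.get(field, []):
            yield isbn.replace("-", "").upper()

def find_existing_edition(record: dict, isbn_index: Dict[str, str]) -> Optional[str]:
    for isbn in _isbns(record):
        if isbn in isbn_index:
            return isbn_index[isbn]
    return None

def import_record(record: dict, isbn_index: Dict[str, str],
                  create_edition: Callable[[dict], str]) -> dict:
    existing = find_existing_edition(record, isbn_index)
    if existing:
        return {"status": "matched", "key": existing}
    key = create_edition(record)
    for isbn in _isbns(record):  # index every ISBN so either form matches later
        isbn_index[isbn] = key
    return {"status": "created", "key": key}
```

Sending the "Pat The Zombie" record twice would then create one edition and match it the second time.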

Move all templates, macros, css and js to repository

Right now some templates are in the website and some are in the repo. It is becoming difficult to manage. As more people are working on OL now, it makes more sense to have the dev-instance complete, which requires the templates to be in the repo.

The same applies for macros, css and javascripts.

Reset Password form for Admins doesn't work

If you go to the admin view for a person and try to reset their password, you are taken to a blank page. (Instead, you should see some sort of confirmation message.)

Also, I tried logging in with the new password (in spite of the blank page), and that doesn't work.

Remove or increase rate limit on covers

The current covers rate limit is blocking Evergreen users, who use OL covers through a single IP that caches them.

The rate limit isn't effective against the botnet that is crawling covers, and the site melts down when the iptables rules are removed, even with the rate limit in place.

Can we let Evergreen users get to our covers, and have the firewall deal with the botnet?
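A per-IP token bucket with an allowlist for known caching proxies is one way to let Evergreen's single IP through while still limiting everyone else. This is a sketch, and the class names and allowlist are hypothetical. As the issue notes, a limit like this won't stop a botnet spread across many IPs, which is left to the firewall.

```python
import time
from typing import Dict, Iterable, Optional

class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.updated is not None:
            # Refill for the time elapsed since the last request
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class CoverRateLimiter:
    """Per-IP limiter with an allowlist for known caching proxies."""

    def __init__(self, rate: float, capacity: float, exempt_ips: Iterable[str] = ()):
        self.rate = rate
        self.capacity = capacity
        self.exempt_ips = set(exempt_ips)
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        if ip in self.exempt_ips:
            return True
        if now is None:
            now = time.monotonic()
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = self.buckets[ip] = TokenBucket(self.rate, self.capacity)
        return bucket.allow(now)
```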

Several fields don't display correctly in DIFF view

Need a simple way to get/create an OL edition from an iaid

This comes up from lending - I frequently get a list of books to load into lending from a library partner, but find that there's no corresponding edition/work to add lending subjects to.

This is a job for the import API, I think...

Please reassign to me when there's an import API pathway for this. (Or if there already is one!)

From the latest batch:

williamhenrymoor00defo
meadorsmeadows00mead
updatedmorgangen00morg
ancestryofrevwal1985morg
alongwaywithbenj00morr
morrowsrelatedfa00morr
genealogiaeordat00mors
forebearsdescend00mars
descendantsofhug00cham
searchforwestmos00mosl
georgemoseleyrev00cawt
jamesmosmanearly01mosm
jamesmosmanearly02mosm
mottstreet00mott
muhlenbergsofpen00wall
descendantsofjon00murr

and from an earlier set:

21_Success_Secrets_9781576759189
301_Ways_to_Have_Fun_at_Work_9781605092690
A_Game_As_Old_As_Empire_9781576757987
A_Peacock_in_the_Land_of_Penguins_3rd_9781605092522
A_Simpler_Way_9781605092546
A-Complaint-Is-A-Gift-9781576759462
Abolishing_Performance_Appraisals_9781605093956
Accidental_Genius_2nd_9781605096513_ePDF
Affluenza_9781605096476
Agenda_for_a_New_Economy_2nd_9781605093765
Aligned_Thinking_9781605091457
Alternatives_to_Economic_Globalization_9781605094090
Analysis_for_Improving_Performance_9781576755303
Appreciative_Inquiry_9781605092812
Attracting_Perfect_Customers_9781605098494
Be_a_Sales_Superstar_9781605098364
Be_the_Hero_9781576759998
Be_Your_Own_Brand_2nd_9781605098111
Branded_Customer_Service_9781576758861
Catch_9781605093802
Change_Is_Everybodys_Business_9781605093680
Change_Your_Questions_Change_Your_Life_2nd_9781605094304
Community_9781576757734
Confessions_of_an_Economic_Hitman_9781576755129
Cracking_the_Code_9781576755334
Cultural_Intelligence_2nd_9781576757994
Downshifting_9781576759905
Driving_Growth_Through_Innovation_9781576755549
Eat_That_Frog_9781576755044
Emotional_Discipline_9781576759622
Emotional_Value_9781605097244
Empowerment_Takes_More_than_a_Minute_9781605093390
Evaluating_Training_Programs_9781576757963
Finding_Our_Way_9781605091464
Flight_Plan_9781576755563
Full_Steam_Ahead_2nd_9781605098760
Fun_Works_9781576755181
Future_Search_9781605094298
Gangs_of_America_9781605097121
Get_Paid_More_And_Promoted_Faster_9781576758021
Get-There-Early-9781576755310
Getting_Things_Done_When_You_Are_Not_In_Charge_9781605092843
Go_Team_9781605093413
Goals_2nd_9781605094120_ePDF
Helping_9781576758724
Hot_Spots_9781605092973
How_to_Get_Ideas_9781605093017
How_to_Make_Collaboration_Work_9781605092850
Ideas_Are_Free_9781605090177
Ideaship_9781605093369
Intrinsic_Motivation_at_Work_2nd_9781576755921
Know_Can_Do_9781605093376
Leadership_and_Self-Deception_2nd_9781576759783
Leadership_and_the_New_Science_3rd_9781605091471
Leadership_From_the_Inside_Out_9781576759806
Love_It_Dont_Leave_It_9781576758755
Macroshift_9781576751787
Magnetic_Service_9781605096421
Managers_Not_MBAs_9781576755112
Managing_9781576758953
Networking_for_People_Who_Hate_Networking_9781605096070_ePDF
No_More_Regrets_9781605098876_WEB
One_From_Many_9781605090184
Open_Space_Technology_3rd_9781576757758
Out_of_Poverty_9781576755488
PeopleSmart_9781605098500
Performance_Consulting_2nd_Edition_9781576757772
Power_and_Love_9781605093055
Prisoners_of_Our_Thoughts_2nd_9781605099217
Repacking-Your-Bags-9781576758762
Running_Training_Like_a_Business_9781605096407
Salsa_Soul_and_Spirit_9781576755228
Screwed_9781576755297
Seeing_Systems_9781576755358
Shifting_Sands_9781576759769
Solving_Tough_Problems_9781576755372
Sprout_9781605092836
Synchronicity_2nd_9781609940188
Terms_of_Engagement_2nd_9781605094489
The_3_Keys_to_Empowerment_9781605093406
The_100_Absolutely_Unbreakable_Laws_of_Business_Success_9781576757949
The_Anatomy_of_Peace_9781576759554
The_Answer_to_How_Is_Yes_9781605093949
The_Blind_Men_and_the_Elephant_9781605096124
The_Change_Handbook_9781576755099
The_Courageous_Follower_9781605092744
The_Great_Turning_9781576755396
The_Hamster_Revolution_9781576755754
The_Introverted_Leader_9781576755877
The_Laws_of_Lifetime_Growth_9781576755051
The_Leadership_Wisdom_of_Jesus_3rd9781609940058
The_Nonverbal_Advantage_9781576757741
The_One_Minute_Negotiator_9781605096209
The_Post-Corporate_World_9781605093963
The_Power_of_Appreciative_Inquiry_2nd_9781605093291
The_Power_of_Failure_9781605093895
The_Power_of_Purpose_2nd_9781605095271
The_Referral_of_a_Lifetime_9781576758670
The_Resiliency_Advantage_9781605091501
The_Secret_9781605094700
The_World_Cafe_9781605092515
Theory_U_9781576758663
Trust_and_Betrayal_in_the_Workplace_9781576759493
Turning_to_One_Another_2nd_9781576759844
Unequal_Protection_2nd_9781605095608
We_Are_All_Self-Employed_9781605093840
Whistle_While_You_Work_9781576759523
Your_Leadership_Legacy_9781605096308
Zenobia_9781576755495
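Many identifiers in the second batch embed an ISBN-13, which could be pulled out and used to look for an existing edition before creating one. The sketch below accepts a 978/979 run of 13 digits only if its check digit is valid. Identifiers like those in the first batch (e.g. williamhenrymoor00defo) carry no ISBN and would need the IA item's metadata instead. `isbn_from_iaid` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches a 13-digit ISBN (978/979 prefix) anywhere in an identifier
ISBN13_PATTERN = re.compile(r"97[89]\d{10}")

def isbn13_is_valid(isbn13: str) -> bool:
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(isbn13[:12]))
    return (10 - total % 10) % 10 == int(isbn13[12])

def isbn_from_iaid(iaid: str) -> Optional[str]:
    """Return the first check-digit-valid ISBN-13 embedded in an IA identifier."""
    for match in ISBN13_PATTERN.finditer(iaid):
        if isbn13_is_valid(match.group()):
            return match.group()
    return None
```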

Add a setup.py

It would be nice if openlibrary used Python's package management tools to install itself into the Python environment.

Stamp git hash on website upon deployment

It would be a good idea to insert the git hash of the version of the OL code being deployed. That way, we can easily check if a deployment was botched and know if it was successful.
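A deploy step could read the short hash with `git rev-parse --short HEAD` and write it somewhere the site can display, such as a footer or an admin page. The deploy directories in the "template changes are not live" report above are already named by short hash (e.g. 64ed2ce). The helper below is a hypothetical sketch that falls back to "unknown" outside a git checkout.

```python
import subprocess

def current_git_hash(repo_dir: str = ".") -> str:
    """Return the short hash of HEAD in repo_dir, or "unknown" if unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git missing, directory missing, or not a repository
        return "unknown"
    return out.stdout.strip()
```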

WorkBot created editionless works

catalog/onix/onix.py attempts to use a global variable in init(), but doesn't declare it global

onix_codelists and onix_shortnames are both initialized to None. Later in the module loading, init() gets called, which attempts to set them to useful dictionaries. Because they aren't declared global in init(), new variables are created within init() that exist only within the scope of that method. The fix is to add these lines to the top of init():

global onix_codelists
global onix_shortnames

I've forked the project and intend to make a push that will fix this issue soon. I'll post back here with the revision that you can pull to get the fix. Thanks!
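A minimal illustration of the proposed fix follows. The dictionaries are placeholders; the real `init()` fills them from the ONIX code lists.

```python
onix_codelists = None
onix_shortnames = None

def init():
    # Without these declarations the assignments below would create
    # local variables, leaving the module-level names set to None.
    global onix_codelists
    global onix_shortnames
    onix_codelists = {}   # placeholder: the real init() loads the code lists here
    onix_shortnames = {}  # placeholder: the real init() loads the short names here

init()
```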

/lists hangs on my dev instance

This is what the openlibrary-server output shows when loading /lists. There is no response sent.

http://0.0.0.0:8000/
0.10 (1): 200 GET 0.06 (1): 200 GET /openlibrary.org/things {'query': '{"sort": "-last_modified", "type": "/type/list", "limit": 120, "offset": 22}', 'details': 'False'}/openlibrary.org/things {'query': '{"sort": "-last_modified", "type": "/type/list", "limit": 120, "offset": 0}', 'details': 'False'}

0.04 (1): 200 GET /openlibrary.org/things {'query': '{"sort": "-last_modified", "type": "/type/list", "limit": 120, "offset": 22}', 'details': 'False'}

Search should neither be case-sensitive nor macron/diacritic-sensitive

Comment: Titles of books in Oriental and South Asian languages, like Arabic, Persian, Urdu & Hindi, are often Romanized or transliterated using macrons & diacritics for the elongated sounds of vowels. Sometimes, instead of macrons & diacritics, double vowels (aa, ee or oo) are used. Similarly, there are letters whose sound is represented in more than one way, e.g. the Arabic, Persian & Urdu letter ( ث ) Seh is represented by both th and S. Similarly ض is represented by dh, d & even Z. Thus the transliteration or Romanization of, say, proper names has never been standardized, with the result that more than one spelling is used for them. When filling in the Search Box, macrons are usually not available on the PC keyboard. May I suggest that the search feature ignore this requirement: it should be neither case-sensitive nor macron/diacritic-sensitive.
For example, the following spellings or transliterations are used for the complete poetry collection of Mirza Ghalib (1797-1869), the famous Indian poet, and there may be a few more:
Dīvān-e-Ghālib, Dīvān-i-Ghālib, Dīwān-i-Ghālib, Deevaan-e-Ghaalib, Deevaan-i-Ghaalib, Deewaan-e-Ghaalib, Deewaan-i-Ghaalib. The search should be programmed so that any of the above-mentioned spellings (or others) finds the book entry.
Thank you & kind regards,

Riaz Ahmad Barni
Karachi, Pakistan
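Case and diacritic folding can be done with Unicode decomposition: split "ā" into "a" plus a combining macron, drop the combining marks, then casefold. The doubled-vowel mapping (aa → a, ee → i, oo → u) below is a hypothetical heuristic for the transliteration pattern described above, not an established rule. It would also fold ordinary English words (e.g. "green"), so it trades precision for recall. The same folding would have to be applied to both indexed titles and queries. It does not reconcile v/w or -e-/-i- variants.

```python
import re
import unicodedata

# Hypothetical heuristic: doubled vowels written in place of macrons
DOUBLE_VOWELS = {"aa": "a", "ee": "i", "oo": "u"}

def fold_for_search(text: str) -> str:
    # 1. Decompose accented letters (ā -> a + combining macron) and drop the marks
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 2. Make the comparison case-insensitive
    folded = stripped.casefold()
    # 3. Collapse doubled vowels into single vowels
    return re.sub(r"aa|ee|oo", lambda m: DOUBLE_VOWELS[m.group()], folded)
```

With this folding, "Dīvān-e-Ghālib" and "Deevaan-e-Ghaalib" both become "divan-e-ghalib".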