Comments (2)
This bug seems to apply to other punctuation too, not just hyphens.
Example with a period, searching for "gener.il":
On the results page, you see hit highlights.
http://chroniclingamerica.loc.gov/search/pages/results/?state=&date1=1836&date2=1922&proxtext=gener.il&x=0&y=0&dateFilterType=yearRange&rows=20&searchType=basic
On the individual pages, no hit highlights . . .
http://chroniclingamerica.loc.gov/lccn/sn83030214/1912-06-02/ed-1/seq-37/#date1=1836&index=1&rows=20&words=gener+gener.il+il&searchType=basic&sequence=0&state=&date2=1922&proxtext=gener.il&y=18&x=9&dateFilterType=yearRange&page=1
from chronam.
@dbrunton
Though we have problems with punctuation marks in OCR words, this issue was a different one. I'll try explaining it.
Borrowing the search text "coca cola" from @eikeon's comment, take a look at one of the search hits:
http://chroniclingamerica.loc.gov/lccn/sn83030214/1909-05-23/ed-1/seq-43/#date1=1836&index=0&rows=20&words=Coca+Coca-Cola+Cola&searchType=basic&sequence=0&state=&date2=1922&proxtext=coca+cola&y=-220&x=-1136&dateFilterType=yearRange&page=1
The request parameter we care about here is 'words': words=Coca+Coca-Cola+Cola. A piece of JavaScript in page.js tries to find coordinates for, and highlight, one word at a time in the OCR. That is, it tries to find coordinates for Coca, then Coca-Cola, and finally Cola. Due to a bug in the JavaScript, if a word was not found in the OCR, it did not go on to try the next word; it bailed out completely.
So (please pay attention to the words request parameter)
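To make the 'words' parameter concrete, here is a minimal sketch of pulling it out of the URL fragment. This is not the actual page.js code; `wordsFromFragment` is a hypothetical helper, and the fragment is shortened from the URL above.

```javascript
// Hypothetical helper (not from page.js): extract the search words
// from a page URL fragment such as '#...&words=Coca+Coca-Cola+Cola&...'.
function wordsFromFragment(fragment) {
  // Drop the leading '#', then parse the rest like a query string.
  const params = new URLSearchParams(fragment.replace(/^#/, ''));
  const raw = params.get('words') || '';
  // URLSearchParams decodes '+' as a space, so split on whitespace.
  return raw.split(/\s+/).filter(Boolean);
}

const fragment =
  '#date1=1836&index=0&rows=20&words=Coca+Coca-Cola+Cola&searchType=basic';
console.log(wordsFromFragment(fragment)); // [ 'Coca', 'Coca-Cola', 'Cola' ]
```

The order of the returned list follows the order in the URL, which is why the order of the words matters for the bug.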
http://chroniclingamerica.loc.gov/lccn/sn83030214/1909-05-23/ed-1/seq-43/#date1=1836&index=0&rows=20&words=Coca-Cola+Coca+Cola&searchType=basic&sequence=0&state=&date2=1922&proxtext=coca+cola&y=-220&x=-1136&dateFilterType=yearRange&page=1
would work and
http://chroniclingamerica.loc.gov/lccn/sn83030214/1909-05-23/ed-1/seq-43/#date1=1836&index=0&rows=20&words=Coca+Coca-Cola+Cola&searchType=basic&sequence=0&state=&date2=1922&proxtext=coca+cola&y=-220&x=-1136&dateFilterType=yearRange&page=1
would not.
The fix would make the JavaScript skip any word not found in the OCR and keep looking until we run out of 'words'.
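Roughly, the difference between the old and the fixed behavior can be sketched like this. It is an illustration only, not the real page.js: the function names, the `ocrCoords` map, and the box values are all made up, and I am assuming the OCR for this page has "Coca-Cola" and "Cola" but no bare "Coca".

```javascript
// Hypothetical OCR lookup: word -> list of [x, y, width, height] boxes.
// Assumption: the page's OCR has no standalone "Coca".
const ocrCoords = new Map([
  ['Coca-Cola', [[120, 340, 80, 18]]],
  ['Cola', [[410, 512, 40, 18]]],
]);

// Old behavior as described above: stop at the first word with no coordinates.
function boxesBailOut(words, coords) {
  const boxes = [];
  for (const word of words) {
    if (!coords.has(word)) {
      return boxes; // bail out; later words are never tried
    }
    boxes.push(...coords.get(word));
  }
  return boxes;
}

// Fixed behavior: skip the missing word and keep going through the list.
function boxesSkipMissing(words, coords) {
  const boxes = [];
  for (const word of words) {
    if (!coords.has(word)) {
      continue; // nothing to highlight for this word, try the next one
    }
    boxes.push(...coords.get(word));
  }
  return boxes;
}

const words = ['Coca', 'Coca-Cola', 'Cola'];
console.log(boxesBailOut(words, ocrCoords).length);     // 0
console.log(boxesSkipMissing(words, ocrCoords).length); // 2
```

With "Coca" first, the bail-out version returns nothing to highlight, while the skipping version still finds the boxes for the other two words.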
Did I explain that right? Does that make sense at all?