Comments (2)
- affected by OCR-D/core#360 (merely needs re-generating):
assets/data/page_dewarp/data/mets.xml assets/data/leptonica_samples/data/mets.xml assets/data/DIBCO11-machine_printed/data/mets.xml assets/data/grenzboten-test/data/mets.xml assets/data/communist_manifesto/data/mets.xml assets/data/dfki-testdata/data/mets.xml
- affected by OCR-D/core#328 (merely needs re-generating):
assets/data/page_dewarp/data/mets.xml assets/data/leptonica_samples/data/mets.xml assets/data/DIBCO11-machine_printed/data/mets.xml
- affected by OCR-D/core#499 (merely needs re-ordering between `dmdSec` and `structMap`):
assets/data/grenzboten-test/data/mets.xml assets/data/scribo-test/data/mets.xml assets/data/communist_manifesto/data/mets.xml
- affected by OCR-D/core#499 (merely needs re-ordering between `fileSec` and `structMap`):
assets/data/column-samples/data/mets.xml assets/data/dfki-testdata/data/mets.xml
- file ID re-used (repeated) as page ID (needs manual fix):
assets/data/kant_aufklaerung_1784-binarized/data/mets.xml
- invalid CDATA (... – manual fix or remove):
assets/data/SBB0000F29300010000/data/OCR-D-GT-PAGE/FILE_0001_FULLTEXT.xml assets/data/SBB0000F29300010000/data/OCR-D-GT-PAGE/FILE_0002_FULLTEXT.xml
But as of now there are even more errors:
- empty reading order group (intentional?):
assets/data/gutachten/data/TEMP1/PAGE_TEMP1
- wrongly formatted regionRef (intentional?):
assets/data/gutachten/data/TEMP2/PAGE_TEMP2_1.xml assets/data/gutachten/data/TEMP2/PAGE_TEMP2_2.xml
- `pc:PcGts/@pcGtsId` differs from `mets:file/@ID` (newly introduced, ambitious goal):
nearly everywhere...
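For the re-ordering cases above, a minimal sketch of what such a fix could look like, assuming the METS schema's sequence of top-level sections (`metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`, `structLink`, `behaviorSec`). The function name `reorder_mets_sections` is hypothetical and not part of OCR-D/core; the sketch uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Order of the top-level sections as assumed from the METS schema sequence.
SECTION_ORDER = ["metsHdr", "dmdSec", "amdSec", "fileSec",
                 "structMap", "structLink", "behaviorSec"]


def reorder_mets_sections(root):
    """Stable-sort the direct children of mets:mets into schema order.

    Hypothetical helper: unknown or foreign-namespace children are
    moved to the end, and siblings of the same kind keep their
    relative order.
    """
    prefix = "{" + METS_NS + "}"

    def rank(elem):
        tag = elem.tag
        if isinstance(tag, str) and tag.startswith(prefix):
            local = tag[len(prefix):]
            if local in SECTION_ORDER:
                return SECTION_ORDER.index(local)
        return len(SECTION_ORDER)

    # sorted() is stable, so e.g. several dmdSec elements stay in order.
    root[:] = sorted(list(root), key=rank)
    return root
```

To apply it to a file, one would parse it with `ET.parse(path)`, call the function on `tree.getroot()`, and write the tree back. Note that `ElementTree` may change namespace prefixes and drops comments on a round trip, so the resulting diff should be reviewed.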
- affected by OCR-D/core#360 (merely needs re-generating):
assets/data/page_dewarp/data/mets.xml assets/data/leptonica_samples/data/mets.xml assets/data/DIBCO11-machine_printed/data/mets.xml assets/data/grenzboten-test/data/mets.xml assets/data/communist_manifesto/data/mets.xml assets/data/dfki-testdata/data/mets.xml
fixed (w/o regenerating)
- affected by OCR-D/core#328 (merely needs re-generating):
assets/data/page_dewarp/data/mets.xml assets/data/leptonica_samples/data/mets.xml assets/data/DIBCO11-machine_printed/data/mets.xml
fixed
- affected by OCR-D/core#499 (merely needs re-ordering between `dmdSec` and `structMap`):
assets/data/grenzboten-test/data/mets.xml assets/data/scribo-test/data/mets.xml assets/data/communist_manifesto/data/mets.xml
fixed
- affected by OCR-D/core#499 (merely needs re-ordering between `fileSec` and `structMap`):
assets/data/column-samples/data/mets.xml assets/data/dfki-testdata/data/mets.xml
fixed
- file ID re-used (repeated) as page ID (needs manual fix):
assets/data/kant_aufklaerung_1784-binarized/data/mets.xml
fixed
- invalid CDATA (... – manual fix or remove):
assets/data/SBB0000F29300010000/data/OCR-D-GT-PAGE/FILE_0001_FULLTEXT.xml assets/data/SBB0000F29300010000/data/OCR-D-GT-PAGE/FILE_0002_FULLTEXT.xml
But as of now there are even more errors:
- empty reading order group (intentional?):
assets/data/gutachten/data/TEMP1/PAGE_TEMP1
Yes, this is intentional: it tests the reading order methods in the generateDS API.
- wrongly formatted regionRef (intentional?):
assets/data/gutachten/data/TEMP2/PAGE_TEMP2_1.xml assets/data/gutachten/data/TEMP2/PAGE_TEMP2_2.xml
same
- `pc:PcGts/@pcGtsId` differs from `mets:file/@ID` (newly introduced, ambitious goal):
nearly everywhere...
Fixed manually where it wasn't too much effort. This will be a perfect use case for ocrd-sanitize, implementing https://github.com/mikegerber/sbb-useful-hacks/blob/master/mets-fixers/fix-page-pcgtsid-to-be-mets-file-id
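To illustrate the `pcGtsId` check discussed above, here is a rough sketch using only the Python standard library. It is not the linked script and not an ocrd-sanitize implementation; the helper names (`file_ids_by_href`, `pcgtsid_mismatch`, `set_pcgtsid`) are hypothetical, and it assumes each `mets:file` points to its PAGE file through a `mets:FLocat` with an `xlink:href`.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def file_ids_by_href(mets_root):
    """Map each FLocat href to the ID of its enclosing mets:file."""
    mapping = {}
    for file_el in mets_root.iter("{%s}file" % METS_NS):
        for loc in file_el.findall("{%s}FLocat" % METS_NS):
            href = loc.get("{%s}href" % XLINK_NS)
            if href is not None:
                mapping[href] = file_el.get("ID")
    return mapping


def pcgtsid_mismatch(page_root, expected_id):
    """Return the current pcGtsId if it differs from expected_id, else None."""
    actual = page_root.get("pcGtsId")
    return actual if actual != expected_id else None


def set_pcgtsid(page_root, file_id):
    """Overwrite pcGtsId on the PAGE root element with the METS file ID."""
    page_root.set("pcGtsId", file_id)
```

A caller would parse `mets.xml`, look up the file ID for each PAGE file's href, and only rewrite those PAGE files where `pcgtsid_mismatch` reports a difference.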