Comments (11)
I totally agree. We have started to make this a pattern in the processors (preprocessing, OCR), but IMO core should lead by good example in Workspace.add_file (at least if no local_filename was already specified by the caller) and in the bagger (see OCR-D/core#258).
from assets.
Implementing OCR-D/core#258 will most likely fix this since the extensions are present until bagging...
from assets.
Yes, but processors that do not care about this and do not use local_filename when doing their Workspace.add_file currently also effectively suppress extensions. This case is still in core's responsibility.
from assets.
You're right, I just meant that for our provided GT the problem is the bagger. Workspace.add_file must be fixed too ofc.
from assets.
Oh, now I got it. We just keep agreeing you know!
from assets.
@kba Fixed?
from assets.
It's fixed for the bagger, but I'm still evaluating whether the point quoted here ("when doing their Workspace.add_file currently also effectively suppress extensions. This case is still in core's responsibility.") is still an issue in core.
from assets.
"Yes, but processors that do not care about this and do not use local_filename when doing their Workspace.add_file currently also effectively suppress extensions."
I'm not sure I follow. Can you give me an example of when this might happen, @bertsky?
from assets.
"Yes, but processors that do not care about this and do not use local_filename when doing their Workspace.add_file currently also effectively suppress extensions."
"I'm not sure I follow. Can you give me an example of when this might happen, @bertsky?"
I can't see it myself right now. What I do understand is that Workspace.add_file does not in itself require either the local_filename or the url kwarg, especially if no content is passed with it. So OcrdMets.add_file will instantiate a new OcrdFile and then set local_filename=None.
So it all depends on what then happens with that file reference later on in the processor. If content was passed to Workspace.add_file, an exception will be raised. (I have already complained about this as a documentation issue.) Otherwise, the processor might use OcrdMets.find_files to get a reference and then do things to it. Somewhere along that path local_filename will/must be set. That is where we have to check whether it is core's responsibility to ensure filename extensions.
Sorry, that's all I can offer ATM.
from assets.
It appears all file extensions are now available in assets/data, and the related issue OCR-D/core#332 in core was closed. Can this be closed too?
from assets.
Yes, fixed in assets and we're doing file extensions in the processors now as well.
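For illustration only, here is a minimal sketch of how a caller might make sure a local filename always carries an extension before handing it to something like Workspace.add_file. It is not taken from OCR-D/core: the helper name local_filename_with_extension, the EXTENSIONS mapping, and the "file group/file ID" naming scheme are all assumptions made for this example, and it uses only the Python standard library.

```python
import mimetypes
import os

# Hypothetical MIME type -> extension mapping, for illustration only
# (not copied from OCR-D/core).
EXTENSIONS = {
    "image/tiff": ".tif",
    "image/png": ".png",
    "application/vnd.prima.page+xml": ".xml",
}


def local_filename_with_extension(file_grp, file_id, mimetype, local_filename=None):
    """Return a local filename that carries an extension whenever one can be derived.

    If the caller already supplied a filename with an extension, it is kept.
    Otherwise the extension is looked up from the MIME type, first in the
    explicit mapping above, then via the standard library's mimetypes module.
    """
    if local_filename and os.path.splitext(local_filename)[1]:
        return local_filename
    # Fall back to an empty suffix if the MIME type is unknown.
    ext = EXTENSIONS.get(mimetype) or mimetypes.guess_extension(mimetype) or ""
    # Assumed naming scheme: "<file group>/<file ID>".
    base = local_filename or f"{file_grp}/{file_id}"
    return base + ext


if __name__ == "__main__":
    print(local_filename_with_extension("OCR-D-IMG-BIN", "FILE_0001", "image/png"))
```

The explicit mapping comes first because the standard mimetypes module does not know domain-specific types such as PAGE-XML, and its choice for some common types can differ between Python versions.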
from assets.