Hi guys, I suggest changing print wf.list_steps() to print (wf.list_st

print error - ICDAR2017_shared_task_workflows.ipynb about ochre (OPEN)

kbnlresearch commented on July 25, 2024

print error - ICDAR2017_shared_task_workflows.ipynb

Comments (3)

jvdzwaan commented on July 25, 2024

Thanks! The signature of wf.list_steps() changed, so yes, you should use print(wf.list_steps()).
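As a minimal sketch of the fix: the snippet below checks that the statement form of print does not compile under Python 3, while the function-call form does. `DemoWorkflow`, its step names, and the helper `is_valid_python3` are hypothetical stand-ins for illustration only. They are not ochre's real workflow class, and the sketch assumes list_steps() returns a string.

```python
class DemoWorkflow:
    """Hypothetical stand-in for the notebook's `wf` object (not ochre's real class)."""

    def __init__(self):
        self._steps = ["step-a", "step-b"]  # made-up step names

    def list_steps(self):
        # Assumption: the method returns the listing as a string instead of printing it.
        return "\n".join(self._steps)


def is_valid_python3(source):
    """Return True if `source` compiles under the running Python 3 interpreter."""
    try:
        compile(source, "<notebook cell>", "exec")
        return True
    except SyntaxError:
        return False


wf = DemoWorkflow()
print(wf.list_steps())  # function-call form

# The statement form is rejected by Python 3; the call form is accepted.
old_form_ok = is_valid_python3("print wf.list_steps()")
new_form_ok = is_valid_python3("print(wf.list_steps())")
```

In the notebook itself, only the one-line change to print(wf.list_steps()) is needed.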

Please note that the workflow is about preprocessing the vudnc data; it has nothing to do with the ICDAR 2017 shared task. Also, I do not recommend using the vudnc data, because it is very noisy. But if you do want to preprocess it anyway, you should run:

cwltool ochre/cwl/vudnc-preprocess-pack.cwl --archive path/to/vudnc/archive
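If the same command needs to be launched from a notebook or script, it could be wrapped roughly as follows. This is only a sketch: the function names `build_preprocess_command` and `run_preprocess` are hypothetical, the archive path is the same placeholder as above, and it assumes cwltool is installed and on the PATH.

```python
import shlex
import subprocess


def build_preprocess_command(archive_path,
                             workflow="ochre/cwl/vudnc-preprocess-pack.cwl"):
    """Build the cwltool argument list quoted in the comment above."""
    return ["cwltool", workflow, "--archive", archive_path]


def run_preprocess(archive_path):
    """Run the workflow; raises CalledProcessError if cwltool exits non-zero."""
    cmd = build_preprocess_command(archive_path)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)


# Placeholder path from the comment; replace it with the real archive location.
command = build_preprocess_command("path/to/vudnc/archive")
```

Calling run_preprocess("path/to/vudnc/archive") would then execute the workflow. Passing the arguments as a list avoids shell quoting problems if the path contains spaces.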

thiagopx commented on July 25, 2024

You are correct. I meant that I was not able to run vudnc-preprocess-pack.cwl.

For good results in English, do you recommend using the English monograph partition of ICDAR? I trained on the monograph and the periodical partitions separately, but the validation accuracy and loss were not good (and neither were the tests I ran).

I would like to help with some additional documentation to improve reproducibility, but I need a roadmap of how to get significant results (mainly for English documents).

jvdzwaan commented on July 25, 2024

Unfortunately, ochre is not (yet) fit for training good OCR post-correction models. I plan to work on it in the future, but only as a hobby project. So no promises there!

Generally speaking, the OCR post-correction datasets are small. That's why I'm making a list of them, so they can be used for generalization. I don't think that training on the English monograph data will give you a model that will work on other data, because OCR errors tend to depend on time period, font, the OCR software that was used, etc.
