
Comments (14)

ChWick commented on August 16, 2024

Usually, this means that something is wrong with your data. Could you:

  • Check whether Calamari can train on (overfit) a single line
  • Paste an example line with its corresponding ground truth
  • Paste the command you used to call Calamari

from calamari.

realjoenguyen commented on August 16, 2024


ChWick commented on August 16, 2024

As I said, this is usually a mismatch in a single (text, image) ground-truth pair, most likely a corrupt line (e.g. an image rotated by 90 degrees, or one that is completely white). Unfortunately, I did not find such a line when scrolling through your data.

An idea to find the 'bad' files:

  • Train a model on all files
  • Predict all training files
  • Files with very low accuracy (near 0%) or that raise an error are the likely culprits
  • Inspect those files and their corresponding text files
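The filtering step above could be sketched as follows: compute a character error rate (CER) per file from the predictions and flag files near 0% accuracy. This is a generic pure-Python sketch, not Calamari code; the file names and the threshold value are illustrative assumptions.

```python
def cer(pred: str, truth: str) -> float:
    """Character error rate: Levenshtein distance / length of the ground truth."""
    m, n = len(pred), len(truth)
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[n] / max(n, 1)

def flag_bad_files(results, threshold=0.9):
    """results: list of (filename, predicted_text, ground_truth) tuples.
    Returns file names whose CER exceeds the threshold (near-0% accuracy)."""
    return [f for f, pred, truth in results if cer(pred, truth) > threshold]
```

The files returned by `flag_bad_files` would be the ones to inspect by hand.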


realjoenguyen commented on August 16, 2024

Hi there,

  1. This is the output when I run calamari:
#0.000000: loss=475.23452759 ler=1.92407715 dt=80.18787289s
 PRED: '‪ÉWÉ.ọồSỮgỢọẰ&ĨọWọụọừọĨọSỢọĨụSứỰòọỢỘẠọWĨòĨẰÙọừỘỪụẰĨừẰụỢỘĨŨgEọiĨVẰừẰỎẲŨọẰọĨV58ừVỄụŨ‬'
 TRUE: '‪Địa chỉ: Trần Hưng Đạo, Phường Lê Bình, Quận Cái Răng, Cần Thơ‬'
2019-02-13 03:30:22.041879: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
WARNING: Infinite loss. Skipping batch.

So the prediction and the ground truth look like two entirely different samples here.

  2. My training data contains 60,000 samples; I also created a small subset (1,000 samples drawn at random from those 60,000), and it still produces the same error. Here is the link ("files" for the images, "labels" for the OCR text labels):
    https://drive.google.com/drive/folders/1E2L8D7ZtrGQi7zOLeTLZSRbfOMdNbVF1?usp=sharing

  3. Here is the command:


#!/usr/bin/env bash
IMAGE_DIR=./data/small/files/*
LABEL_DIR=./data/small/labels/*

python3 ./calamari_ocr/scripts/train.py \
                --files "${IMAGE_DIR}" \
                --text_files "${LABEL_DIR}" \
                --num_threads 8 \
                --batch_size 10 \
                --display 1 \
                --output_dir ./out \
                --checkpoint_frequency 100 \
                --train_data_on_the_fly


Thank you so much for your help!!!


ChWick commented on August 16, 2024

Thanks for the provided information!

I haven't found the reason for the warning yet; however, I was able to successfully train a model on your provided data (ignoring the warning). The display parameter probably does not do what you expect: when set to a value in [0, 1], the output is shown relative to an epoch. I guess you want to see every iteration (which is not possible), but you can set it to a number greater than one to follow the learning progress, e.g. --display 10.
Moreover, check whether you really need the --train_data_on_the_fly parameter, since it slows down the computation considerably (60K examples should fit completely in RAM).
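As a rough sanity check, one can estimate whether a preloaded dataset fits in RAM from the line-image dimensions. The numbers below (48 px line height, average line width, one byte per grayscale pixel) are assumptions for illustration, not measurements of this dataset.

```python
def estimate_preload_ram_mb(n_lines, avg_width_px=1200, height_px=48, bytes_per_px=1):
    """Rough RAM estimate (in MB) for preloading line images as grayscale arrays."""
    return n_lines * avg_width_px * height_px * bytes_per_px / (1024 ** 2)

# 60,000 lines at the assumed dimensions land in the low-gigabyte range,
# consistent with preloading being feasible on a machine with a few GB free.
print(round(estimate_preload_ram_mb(60_000)))
```

If the estimate clearly exceeds the available RAM, `--train_data_on_the_fly` (or training on a subset) is the way out.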


realjoenguyen commented on August 16, 2024

But this:

2019-02-13 03:30:22.041879: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
is not an error?
Because I often see "Skipping batch", I assume those batches are skipped and the training process is hitting errors?

Thank you for your help!

Edited: did you see the warning and "no valid path found" in CTC when using my dataset?


realjoenguyen commented on August 16, 2024

Also, when I use --train_data_on_the_fly, I got this error:


Resolving input files
Found 60000 files in the dataset
datset = <calamari_ocr.ocr.datasets.file_dataset.FileDataSet object at 0x7fd6046751d0>
Preloading dataset type DataSetMode.TRAIN with size 60000
Loading Dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 60000/60000 [07:59<00:00, 125.26it/s]
Traceback (most recent call last):
  File "./calamari_ocr/scripts/train.py", line 315, in <module>
    main()
  File "./calamari_ocr/scripts/train.py", line 311, in main
    run(args)
  File "./calamari_ocr/scripts/train.py", line 299, in run
    progress_bar=not args.no_progress_bars
  File "/root/TA/calamari/calamari_ocr/ocr/trainer.py", line 112, in train
    self.dataset.preload(processes=checkpoint_params.processes, progress_bar=progress_bar)
  File "/root/TA/calamari/calamari_ocr/ocr/datasets/input_dataset.py", line 60, in preload
    texts = self.text_processor.apply(txts, processes=processes, progress_bar=progress_bar)
  File "/root/TA/calamari/calamari_ocr/ocr/text_processing/text_processor.py", line 17, in apply
    return parallel_map(self._apply_single, txts, desc="Text Preprocessing", processes=processes, progress_bar=progress_bar)
  File "/root/TA/calamari/calamari_ocr/utils/multiprocessing.py", line 40, in parallel_map
    with multiprocessing.Pool(processes=processes, maxtasksperchild=max_tasks_per_child) as pool:
  File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
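The traceback ends in os.fork, so the out-of-memory error comes from spawning worker processes, not from the dataset read itself: each fork of a Python process that already holds a large dataset needs (virtual) memory for a copy-on-write duplicate. A generic mitigation is to reduce the number of worker processes (here, the --num_threads value). The sketch below only illustrates the multiprocessing.Pool pattern the traceback goes through; it is not Calamari code, and preprocess is a hypothetical placeholder.

```python
from multiprocessing import Pool

def preprocess(text):
    # Placeholder for a per-line text-preprocessing step.
    return text.strip().lower()

if __name__ == "__main__":
    lines = ["  Hello ", "WORLD  "]
    # Fewer processes means fewer fork() calls and less memory pressure;
    # maxtasksperchild recycles workers so per-worker memory cannot grow unboundedly.
    with Pool(processes=2, maxtasksperchild=100) as pool:
        print(pool.map(preprocess, lines))  # prints ['hello', 'world']
```

In Calamari's case, the equivalent knob is lowering --num_threads so fewer forks of the (already large) parent process are needed.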


ChWick commented on August 16, 2024

Edited: did you see the warning and "no valid path found" in CTC when using my dataset?

Yes, my output (using --display 10)

#00000000: loss=516.87536621 ler=1.88759577 dt=44.58602881s
 PRED: '‪É&Ợ5Ù95ọ5ẠọẰừọSọỢỰẰĩgầVỢĨMọSIẰụẰŨĨ&ỚSĨòỢừỘẰọĨừọSỢọĨừọỢọSĨỘọừòSậWĨŨỘọẠọĨẰĨỐĨ9Ũ‬'
 TRUE: '‪Địa chỉ: Trần Hưng Đạo, Phường Lê Bình, Quận Cái Răng, Cần Thơ‬'
#00000010: loss=336.84377636 ler=1.15732210 dt=4.88046891s
 PRED: '‪‬'
 TRUE: '‪Địa chỉ: ấp Long Bình, Xã Long Điền A, Huyện Chợ Mới, An Giang‬'
2019-02-13 09:49:08.596539: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
WARNING: Infinite loss. Skipping batch.
#00000020: loss=260.75162315 ler=1.04867650 dt=3.11120788s
 PRED: '‪T H H‬'
 TRUE: '‪Mã số thuế: CÔNG TY TNHH DỊCH VỤ VÀ ĐÀO TẠO TRÍ ĐỨC‬'
#00000030: loss=229.56856613 ler=1.00270691 dt=2.37745589s
 PRED: '‪‬'
 TRUE: '‪Mã số thuế: DOANH NGHIỆP TƯ NHÂN THÚY TÀI‬'
#00000040: loss=213.85240898 ler=0.98507622 dt=2.02556430s
 PRED: '‪‬'
 TRUE: '‪Địa chỉ: 22/3/2 Phú Mộng, Phường Kim Long, Thành phố Huế, Thừa Thiên Huế.‬'

The warning means that in a single iteration an error occurred during the computation of the loss/gradients, which is why that one batch is ignored, i.e. the weights are not updated for it. It appears roughly every 40-50 iterations, which means that about 1 out of 400-500 files (at batch size 10) is probably corrupt and therefore skipped.
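A common cause of "No valid path found" is a label that cannot be aligned to the network output at all: CTC needs at least one output frame per label character, plus one blank frame between each pair of equal adjacent characters, and the number of frames is roughly proportional to the image width after downsampling. A very narrow (or blank) image paired with a long ground-truth line is therefore infeasible and yields an infinite loss. A standalone feasibility check could look like this (the frame count is taken as given here, which is an assumption about the model's downsampling):

```python
def min_ctc_frames(label: str) -> int:
    """Minimum number of output frames CTC needs to emit this label:
    one frame per character, plus a blank between equal neighbours."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

def ctc_feasible(label: str, n_frames: int) -> bool:
    """True if the label can in principle be aligned to n_frames outputs."""
    return n_frames >= min_ctc_frames(label)
```

Lines for which ctc_feasible is False (e.g. a long label paired with the wrong, much narrower image) would trigger exactly this warning.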


ChWick commented on August 16, 2024

Also when I use --train_data_on_the_fly, I got this error:

For the smaller dataset, I need approx. 3 GB of RAM for training when loading everything into memory. The full dataset is probably too large to fit completely in RAM, which is why you have to use `--train_data_on_the_fly` here; with it, no more than 4 GB of RAM should be required.

Please test whether the number of files is the reason for this out-of-memory error.


realjoenguyen commented on August 16, 2024

Thank you,
Can you suggest how I can avoid "No valid path found" in the CTC loss?


thak123 commented on August 16, 2024

@gofortargets how did you generate the dataset for training?


srikanthsampathi commented on August 16, 2024

How do I stop the training? The number of iterations is 8790.


srikanthsampathi commented on August 16, 2024

loss=0.55423099 ler=0.15204727 dt=4.05446383s is reached at iteration 8790. When will the training stop, or should I stop it manually and use that model checkpoint for prediction?


ChWick commented on August 16, 2024

@srikanthsampathi Three options:

  • Specify the maximum number of iterations --max_iters=10000
  • Use early stopping --validation VALIDATION_DATASET
  • Manually stop
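The early-stopping option can be sketched generically: keep training while the validation error keeps improving, and stop once it has not improved for a fixed number of evaluations ("patience"). This is only the idea behind --validation, not Calamari's actual implementation, and the patience value is an arbitrary assumption.

```python
def should_stop(val_errors, patience=5):
    """Stop when the best validation error has not improved
    for at least `patience` consecutive evaluations.

    val_errors: validation errors recorded at each evaluation, oldest first.
    """
    if len(val_errors) <= patience:
        return False
    best = min(val_errors)
    # Index of the first time the best error was reached.
    first_best = val_errors.index(best)
    return len(val_errors) - 1 - first_best >= patience
```

With this rule, a run whose validation error plateaus is stopped automatically, and the checkpoint from the best evaluation would be the one to use for prediction.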

