
Comments (14)

ChWick commented on August 16, 2024

Usually, this means that something is wrong with your data. Could you:

  • Check whether Calamari can train on (overfit) a single line
  • Paste an example line with its corresponding ground truth
  • Paste the command you used to call Calamari

from calamari.

realjoenguyen commented on August 16, 2024


ChWick commented on August 16, 2024

As I said, this is usually a mismatch in a single (text, image) ground-truth pair, most likely a corrupt line (e.g. an image rotated by 90 degrees, or one that is completely white). Unfortunately, I did not find such a line when scrolling through your data.

An idea to find the 'bad' files:

  • Train a model on all files
  • Predict all training files
  • Files with very low accuracy (near 0%) or that raise an error are the likely culprits
  • Inspect those files and their corresponding text files
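The filtering step above could be sketched as follows: compute a character error rate (CER) per file from the predictions and flag files near 0% accuracy. This is a generic pure-Python sketch, not Calamari code; the file names and the threshold value are illustrative assumptions.

```python
def cer(pred: str, truth: str) -> float:
    """Character error rate: Levenshtein distance / length of the ground truth."""
    m, n = len(pred), len(truth)
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[n] / max(n, 1)

def flag_bad_files(results, threshold=0.9):
    """results: list of (filename, predicted_text, ground_truth) tuples.
    Returns file names whose CER exceeds the threshold (near-0% accuracy)."""
    return [f for f, pred, truth in results if cer(pred, truth) > threshold]
```

The files returned by `flag_bad_files` would be the ones to inspect by hand.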


realjoenguyen commented on August 16, 2024

Hi there,

  1. This is the output when I run calamari:
#0.000000: loss=475.23452759 ler=1.92407715 dt=80.18787289s
 PRED: '‪ÉWÉ.ọồSỮgỢọẰ&ĨọWọụọừọĨọSỢọĨụSứỰòọỢỘẠọWĨòĨẰÙọừỘỪụẰĨừẰụỢỘĨŨgEọiĨVẰừẰỎẲŨọẰọĨV58ừVỄụŨ‬'
 TRUE: '‪Địa chỉ: Trần Hưng Đạo, Phường Lê Bình, Quận Cái Răng, Cần Thơ‬'
2019-02-13 03:30:22.041879: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
WARNING: Infinite loss. Skipping batch.

So the prediction and the ground truth look like two entirely different samples here.

  2. My training data contains 60,000 samples; I also created a small subset (1,000 samples drawn at random from those 60,000), and it still produces the same error. Here is the link ("files" for the images, "labels" for the OCR text labels):
    https://drive.google.com/drive/folders/1E2L8D7ZtrGQi7zOLeTLZSRbfOMdNbVF1?usp=sharing

  3. Here is the command:


#!/usr/bin/env bash
IMAGE_DIR=./data/small/files/*
LABEL_DIR=./data/small/labels/*

python3 ./calamari_ocr/scripts/train.py \
                --files "${IMAGE_DIR}" \
                --text_files "${LABEL_DIR}" \
                --num_threads 8 \
                --batch_size 10 \
                --display 1 \
                --output_dir ./out \
                --checkpoint_frequency 100 \
                --train_data_on_the_fly


Thank you so much for your help!!!


ChWick commented on August 16, 2024

Thanks for the provided information!

I haven't found the reason for the warning yet; however, I was able to successfully train a model on your provided data (ignoring the warning). The display parameter probably does not do what you expect: when set to a value in [0, 1], the output is shown relative to an epoch. I guess you want to see every iteration (which is not possible), but you can set it to a number greater than one to follow the learning progress, e.g. --display 10.
Moreover, check whether you really need the --train_data_on_the_fly parameter, since it slows down the computation considerably (60K examples should fit completely in RAM).
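As a rough sanity check, one can estimate whether a preloaded dataset fits in RAM from the line-image dimensions. The numbers below (48 px line height, average line width, one byte per grayscale pixel) are assumptions for illustration, not measurements of this dataset.

```python
def estimate_preload_ram_mb(n_lines, avg_width_px=1200, height_px=48, bytes_per_px=1):
    """Rough RAM estimate (in MB) for preloading line images as grayscale arrays."""
    return n_lines * avg_width_px * height_px * bytes_per_px / (1024 ** 2)

# 60,000 lines at the assumed dimensions land in the low-gigabyte range,
# consistent with preloading being feasible on a machine with a few GB free.
print(round(estimate_preload_ram_mb(60_000)))
```

If the estimate clearly exceeds the available RAM, `--train_data_on_the_fly` (or training on a subset) is the way out.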


realjoenguyen commented on August 16, 2024

But this:

2019-02-13 03:30:22.041879: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
is not an error?
Because I often see "Skipping batch", I assume those batches are skipped and the training process is hitting errors?

Thank you for your help!

Edited: did you see the warning and "no valid path found" in CTC when using my dataset?


realjoenguyen commented on August 16, 2024

Also, when I use --train_data_on_the_fly, I got this error:


Resolving input files
Found 60000 files in the dataset
datset = <calamari_ocr.ocr.datasets.file_dataset.FileDataSet object at 0x7fd6046751d0>
Preloading dataset type DataSetMode.TRAIN with size 60000
Loading Dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 60000/60000 [07:59<00:00, 125.26it/s]
Traceback (most recent call last):
  File "./calamari_ocr/scripts/train.py", line 315, in <module>
    main()
  File "./calamari_ocr/scripts/train.py", line 311, in main
    run(args)
  File "./calamari_ocr/scripts/train.py", line 299, in run
    progress_bar=not args.no_progress_bars
  File "/root/TA/calamari/calamari_ocr/ocr/trainer.py", line 112, in train
    self.dataset.preload(processes=checkpoint_params.processes, progress_bar=progress_bar)
  File "/root/TA/calamari/calamari_ocr/ocr/datasets/input_dataset.py", line 60, in preload
    texts = self.text_processor.apply(txts, processes=processes, progress_bar=progress_bar)
  File "/root/TA/calamari/calamari_ocr/ocr/text_processing/text_processor.py", line 17, in apply
    return parallel_map(self._apply_single, txts, desc="Text Preprocessing", processes=processes, progress_bar=progress_bar)
  File "/root/TA/calamari/calamari_ocr/utils/multiprocessing.py", line 40, in parallel_map
    with multiprocessing.Pool(processes=processes, maxtasksperchild=max_tasks_per_child) as pool:
  File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
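The traceback ends in os.fork, so the out-of-memory error comes from spawning worker processes, not from the dataset read itself: each fork of a Python process that already holds a large dataset needs (virtual) memory for a copy-on-write duplicate. A generic mitigation is to reduce the number of worker processes (here, the --num_threads value). The sketch below only illustrates the multiprocessing.Pool pattern the traceback goes through; it is not Calamari code, and preprocess is a hypothetical placeholder.

```python
from multiprocessing import Pool

def preprocess(text):
    # Placeholder for a per-line text-preprocessing step.
    return text.strip().lower()

if __name__ == "__main__":
    lines = ["  Hello ", "WORLD  "]
    # Fewer processes means fewer fork() calls and less memory pressure;
    # maxtasksperchild recycles workers so per-worker memory cannot grow unboundedly.
    with Pool(processes=2, maxtasksperchild=100) as pool:
        print(pool.map(preprocess, lines))  # prints ['hello', 'world']
```

In Calamari's case, the equivalent knob is lowering --num_threads so fewer forks of the (already large) parent process are needed.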


ChWick commented on August 16, 2024

Edited: did you see the warning and "no valid path found" in CTC when using my dataset?

Yes, my output (using --display 10)

#00000000: loss=516.87536621 ler=1.88759577 dt=44.58602881s
 PRED: '‪É&Ợ5Ù95ọ5ẠọẰừọSọỢỰẰĩgầVỢĨMọSIẰụẰŨĨ&ỚSĨòỢừỘẰọĨừọSỢọĨừọỢọSĨỘọừòSậWĨŨỘọẠọĨẰĨỐĨ9Ũ‬'
 TRUE: '‪Địa chỉ: Trần Hưng Đạo, Phường Lê Bình, Quận Cái Răng, Cần Thơ‬'
#00000010: loss=336.84377636 ler=1.15732210 dt=4.88046891s
 PRED: '‪‬'
 TRUE: '‪Địa chỉ: ấp Long Bình, Xã Long Điền A, Huyện Chợ Mới, An Giang‬'
2019-02-13 09:49:08.596539: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
WARNING: Infinite loss. Skipping batch.
#00000020: loss=260.75162315 ler=1.04867650 dt=3.11120788s
 PRED: '‪T H H‬'
 TRUE: '‪Mã số thuế: CÔNG TY TNHH DỊCH VỤ VÀ ĐÀO TẠO TRÍ ĐỨC‬'
#00000030: loss=229.56856613 ler=1.00270691 dt=2.37745589s
 PRED: '‪‬'
 TRUE: '‪Mã số thuế: DOANH NGHIỆP TƯ NHÂN THÚY TÀI‬'
#00000040: loss=213.85240898 ler=0.98507622 dt=2.02556430s
 PRED: '‪‬'
 TRUE: '‪Địa chỉ: 22/3/2 Phú Mộng, Phường Kim Long, Thành phố Huế, Thừa Thiên Huế.‬'

The warning means that in a single iteration an error occurred during the computation of the loss/gradients, which is why that one batch is ignored, i.e. the weights are not updated for it. It appears roughly every 40-50 iterations, which means that about 1 out of 400-500 files (at batch size 10) is probably corrupt and therefore skipped.
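A common cause of "No valid path found" is a label that cannot be aligned to the network output at all: CTC needs at least one output frame per label character, plus one blank frame between each pair of equal adjacent characters, and the number of frames is roughly proportional to the image width after downsampling. A very narrow (or blank) image paired with a long ground-truth line is therefore infeasible and yields an infinite loss. A standalone feasibility check could look like this (the frame count is taken as given here, which is an assumption about the model's downsampling):

```python
def min_ctc_frames(label: str) -> int:
    """Minimum number of output frames CTC needs to emit this label:
    one frame per character, plus a blank between equal neighbours."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

def ctc_feasible(label: str, n_frames: int) -> bool:
    """True if the label can in principle be aligned to n_frames outputs."""
    return n_frames >= min_ctc_frames(label)
```

Lines for which ctc_feasible is False (e.g. a long label paired with the wrong, much narrower image) would trigger exactly this warning.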


ChWick commented on August 16, 2024

Also when I use --train_data_on_the_fly, I got this error:

For the smaller dataset, I need approx. 3 GB of RAM for training when loading everything into memory. The full dataset is probably too large to fit completely in RAM, which is why you have to use `--train_data_on_the_fly` here; with it, no more than 4 GB of RAM should be required.

Please test whether the number of files is the reason for this out-of-memory error.


realjoenguyen commented on August 16, 2024

Thank you,
Can you suggest how I can avoid "No valid path found" in the CTC loss?


thak123 commented on August 16, 2024

@gofortargets how did you generate the dataset for training?


srikanthsampathi commented on August 16, 2024

How do I stop the training? The number of iterations is 8790.


srikanthsampathi commented on August 16, 2024

loss=0.55423099 ler=0.15204727 dt=4.05446383s is reached at iteration 8790. When will the training stop, or should I stop it manually and use that model checkpoint for prediction?


ChWick commented on August 16, 2024

@srikanthsampathi Three options:

  • Specify the maximum number of iterations --max_iters=10000
  • Use early stopping --validation VALIDATION_DATASET
  • Manually stop
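The early-stopping option can be sketched generically: keep training while the validation error keeps improving, and stop once it has not improved for a fixed number of evaluations ("patience"). This is only the idea behind --validation, not Calamari's actual implementation, and the patience value is an arbitrary assumption.

```python
def should_stop(val_errors, patience=5):
    """Stop when the best validation error has not improved
    for at least `patience` consecutive evaluations.

    val_errors: validation errors recorded at each evaluation, oldest first.
    """
    if len(val_errors) <= patience:
        return False
    best = min(val_errors)
    # Index of the first time the best error was reached.
    first_best = val_errors.index(best)
    return len(val_errors) - 1 - first_best >= patience
```

With this rule, a run whose validation error plateaus is stopped automatically, and the checkpoint from the best evaluation would be the one to use for prediction.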

