Comments (14)
Usually, this means that something is wrong with your data. Could you:
- Check whether Calamari can train on a single line (i.e. overfit it)
- Paste an example line with its corresponding ground truth
- Paste the command you used to call Calamari
from calamari.
As I said, this is usually a mismatch in a single GT (text, image) pair, most probably a corrupt line (e.g. an image that is rotated by 90°, or completely white, ...). Unfortunately, I did not find such a line when scrolling through your data.
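A quick way to look for a mismatched (text, image) pair before training is to compare the two directories by file name. The following is only a minimal sketch: the names `pair_key` and `find_unpaired` are hypothetical, and it assumes that an image and its label share the part of the file name before the first dot (for example `0001.png` and `0001.txt`), which may not match your layout. It does not detect rotated or blank images.

```python
import sys
from pathlib import Path


def pair_key(path):
    # Assumption: image and label share the file name up to the first dot.
    return path.name.split(".", 1)[0]


def find_unpaired(image_dir, label_dir):
    """Return (images without label, labels without image, empty labels)."""
    images = {pair_key(p): p for p in Path(image_dir).iterdir() if p.is_file()}
    labels = {pair_key(p): p for p in Path(label_dir).iterdir() if p.is_file()}
    missing_label = sorted(images.keys() - labels.keys())
    missing_image = sorted(labels.keys() - images.keys())
    # A label that is empty or whitespace-only gives the loss nothing to align.
    empty_label = sorted(
        k for k, p in labels.items()
        if not p.read_text(encoding="utf-8").strip()
    )
    return missing_label, missing_image, empty_label


# Only runs when called as: python check_pairs.py IMAGE_DIR LABEL_DIR
if __name__ == "__main__" and len(sys.argv) == 3:
    no_label, no_image, empty = find_unpaired(sys.argv[1], sys.argv[2])
    print("images without label:", no_label)
    print("labels without image:", no_image)
    print("empty labels:", empty)
```

For the data in this thread it could be called with `./data/small/files` and `./data/small/labels` as the two arguments.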
An idea to find the 'bad' files:
- Train a model on all files
- Predict all training files
- Files with a very low accuracy (close to 0%) or that raise an error are the likely cause
- Check those files and their text files
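The steps above can be sketched as a small ranking script. This is only an illustration under assumptions: it takes the ground truth and prediction of each line as in-memory strings (how you load them from your files is left open), the names `edit_distance` and `rank_by_error` are hypothetical, and the character error rate is computed with a plain Levenshtein distance rather than with Calamari's own evaluation.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def rank_by_error(pairs):
    """pairs: iterable of (name, ground_truth, prediction).

    Returns a list of (name, character_error_rate), worst lines first.
    """
    scored = []
    for name, gt, pred in pairs:
        cer = edit_distance(gt, pred) / max(len(gt), 1)
        scored.append((name, cer))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical file names; the texts are made-up placeholders.
    demo = [
        ("line_0001", "Mã số thuế", "Mã số thuế"),
        ("line_0002", "Địa chỉ", ""),
    ]
    for name, cer in rank_by_error(demo):
        print(f"{name}: CER={cer:.2f}")
```

Lines at the top of the list (error rate near 1.0) are the ones to open and compare against their text files.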
from calamari.
Hi there,
- This is the output when I run calamari:
#0.000000: loss=475.23452759 ler=1.92407715 dt=80.18787289s
PRED: 'ÉWÉ.ọồSỮgỢọẰ&ĨọWọụọừọĨọSỢọĨụSứỰòọỢỘẠọWĨòĨẰÙọừỘỪụẰĨừẰụỢỘĨŨgEọiĨVẰừẰỎẲŨọẰọĨV58ừVỄụŨ'
TRUE: 'Địa chỉ: Trần Hưng Đạo, Phường Lê Bình, Quận Cái Răng, Cần Thơ'
2019-02-13 03:30:22.041879: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
WARNING: Infinite loss. Skipping batch.
So there are 2 different samples here.
- My training data contains 60,000 samples. I took a small subset (1,000 random samples from these 60,000) and still got the same error. Here is the link ("files" contains the images and "labels" contains the OCR text labels):
https://drive.google.com/drive/folders/1E2L8D7ZtrGQi7zOLeTLZSRbfOMdNbVF1?usp=sharing
- Here is the command:
#!/usr/bin/env bash
IMAGE_DIR=./data/small/files/*
LABEL_DIR=./data/small/labels/*
python3 ./calamari_ocr/scripts/train.py \
--files "${IMAGE_DIR}" \
--text_files "${LABEL_DIR}" \
--num_threads 8 \
--batch_size 10 \
--display 1 \
--output_dir ./out \
--checkpoint_frequency 100 \
--train_data_on_the_fly
Thank you so much for your help!!!
from calamari.
Thanks for the provided information!
I haven't found the reason for the warning yet; however, I was able to successfully train a model on your provided data (ignoring the warning). The display parameter probably does not do what you expected: when it is set to a value in [0,1], the output interval is relative to an epoch. I guess you want to display every iteration (this is not possible), but you can set it to a number greater than one to see the learning progress, e.g. --display 10.
Moreover, check whether you really need the --train_data_on_the_fly parameter, since it slows down the computation considerably (60K examples should fit in RAM completely).
from calamari.
But is this not an error?
2019-02-13 03:30:22.041879: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
I often see "Skipping batch", so I assumed that batches are being skipped because the training process runs into errors.
Thank you for your help!
Edited: did you see the warning and "no valid path found" in CTC when using my dataset?
from calamari.
Also when I use --train_data_on_the_fly, I got this error:
Resolving input files
Found 60000 files in the dataset
datset = <calamari_ocr.ocr.datasets.file_dataset.FileDataSet object at 0x7fd6046751d0>
Preloading dataset type DataSetMode.TRAIN with size 60000
Loading Dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 60000/60000 [07:59<00:00, 125.26it/s]
Traceback (most recent call last):
File "./calamari_ocr/scripts/train.py", line 315, in <module>
main()
File "./calamari_ocr/scripts/train.py", line 311, in main
run(args)
File "./calamari_ocr/scripts/train.py", line 299, in run
progress_bar=not args.no_progress_bars
File "/root/TA/calamari/calamari_ocr/ocr/trainer.py", line 112, in train
self.dataset.preload(processes=checkpoint_params.processes, progress_bar=progress_bar)
File "/root/TA/calamari/calamari_ocr/ocr/datasets/input_dataset.py", line 60, in preload
texts = self.text_processor.apply(txts, processes=processes, progress_bar=progress_bar)
File "/root/TA/calamari/calamari_ocr/ocr/text_processing/text_processor.py", line 17, in apply
return parallel_map(self._apply_single, txts, desc="Text Preprocessing", processes=processes, progress_bar=progress_bar)
File "/root/TA/calamari/calamari_ocr/utils/multiprocessing.py", line 40, in parallel_map
with multiprocessing.Pool(processes=processes, maxtasksperchild=max_tasks_per_child) as pool:
File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
context=self.get_context())
File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
self._repopulate_pool()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
w.start()
File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
from calamari.
Edited: did you see the warning and "no valid path found" in CTC when using my dataset?
Yes, my output (using --display 10):
#00000000: loss=516.87536621 ler=1.88759577 dt=44.58602881s
PRED: 'É&Ợ5Ù95ọ5ẠọẰừọSọỢỰẰĩgầVỢĨMọSIẰụẰŨĨ&ỚSĨòỢừỘẰọĨừọSỢọĨừọỢọSĨỘọừòSậWĨŨỘọẠọĨẰĨỐĨ9Ũ'
TRUE: 'Địa chỉ: Trần Hưng Đạo, Phường Lê Bình, Quận Cái Răng, Cần Thơ'
#00000010: loss=336.84377636 ler=1.15732210 dt=4.88046891s
PRED: ''
TRUE: 'Địa chỉ: ấp Long Bình, Xã Long Điền A, Huyện Chợ Mới, An Giang'
2019-02-13 09:49:08.596539: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
WARNING: Infinite loss. Skipping batch.
#00000020: loss=260.75162315 ler=1.04867650 dt=3.11120788s
PRED: 'T H H'
TRUE: 'Mã số thuế: CÔNG TY TNHH DỊCH VỤ VÀ ĐÀO TẠO TRÍ ĐỨC'
#00000030: loss=229.56856613 ler=1.00270691 dt=2.37745589s
PRED: ''
TRUE: 'Mã số thuế: DOANH NGHIỆP TƯ NHÂN THÚY TÀI'
#00000040: loss=213.85240898 ler=0.98507622 dt=2.02556430s
PRED: ''
TRUE: 'Địa chỉ: 22/3/2 Phú Mộng, Phường Kim Long, Thành phố Huế, Thừa Thiên Huế.'
The warning means that an error occurred while computing the loss/gradients in a single iteration, which is why that one batch is ignored, i.e. the weights are not updated for it. It appears approximately every 40-50 iterations, which (with batch size 10) means that roughly 1 out of 400-500 files is probably corrupt and causes its batch to be skipped.
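One common cause of TensorFlow's "No valid path found" warning, not confirmed for this dataset, is a line whose text is too long for the number of time steps the network produces: CTC needs at least one step per character plus one blank step between each pair of identical neighbouring characters. The check below is a rough sketch under assumptions. The names `min_ctc_steps` and `has_valid_ctc_path` are hypothetical, `image_width` is meant as the width of the line image as it is fed to the network (after any rescaling), and `downsample=4` is an assumed horizontal reduction factor that depends on the network you train.

```python
def min_ctc_steps(label):
    """Fewest time steps for which CTC can emit `label`.

    One step per character, plus one blank step between each pair of
    identical neighbouring characters (e.g. the "HH" in "TNHH").
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def has_valid_ctc_path(label, image_width, downsample=4):
    """Approximate check that a line is long enough for its label.

    `downsample` is an assumption about the network, not a Calamari constant;
    the exact rounding of the output width may differ by a step.
    """
    time_steps = image_width // downsample
    return time_steps >= min_ctc_steps(label)


if __name__ == "__main__":
    label = "Mã số thuế: DOANH NGHIỆP TƯ NHÂN THÚY TÀI"
    for width in (100, 400):  # made-up widths for illustration
        print(width, has_valid_ctc_path(label, width))
```

Lines that fail this check would be candidates for the "bad" files, alongside empty labels and corrupt images.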
from calamari.
Also when I use --train_data_on_the_fly, I got this error:
For the smaller dataset, I need approx. 3 GB of RAM for training when loading the data into RAM. The whole dataset is probably too large to fit completely in RAM, which is why you have to use --train_data_on_the_fly; with it, no more than 4 GB of RAM should be required.
Please test whether the number of files is the reason for this OutOfMemory error.
from calamari.
Thank you,
Can you suggest how I can avoid "No valid path" in CTC loss?
from calamari.
@gofortargets How did you generate the dataset for training?
from calamari.
How do I stop training? The number of iterations is at 8790.
from calamari.
loss=0.55423099 ler=0.15204727 dt=4.05446383s was reached at iteration 8790. When will the training stop, or should I stop it manually and use the model from that iteration for prediction?
from calamari.
@srikanthsampathi Three options:
- Specify the maximum number of iterations: --max_iters=10000
- Use early stopping: --validation VALIDATION_DATASET
- Stop manually
from calamari.