I have tried the parameter settings in your paper to train the model, but the performa

Parameter settings for training about textboxes HOT 14 CLOSED

mhliao commented on September 24, 2024

Parameter settings for training

from textboxes.

Comments (14)

MhLiao commented on September 24, 2024

@lillyPJ You should change the batch size to 32. With a batch size of 8, 50k iterations cover only 400,000 images, so training does not complete even one epoch over the 800,000 training images. This may be one of the problems.
By the way, did you use data augmentation?
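
The epoch arithmetic behind this advice can be checked with a few lines. This is only a minimal sketch: the function name is made up for illustration, and the dataset size is the 800,000 figure quoted above.

```python
def epochs_seen(batch_size: int, iterations: int, dataset_size: int) -> float:
    """Return how many passes over the dataset the training run completes."""
    return batch_size * iterations / dataset_size


DATASET_SIZE = 800_000  # image count quoted in this thread
ITERATIONS = 50_000     # iteration count quoted in this thread

for batch_size in (8, 16, 32):
    # Each iteration consumes one batch, so images seen = batch_size * iterations.
    print(batch_size, epochs_seen(batch_size, ITERATIONS, DATASET_SIZE))
```

With these numbers, a batch size of 8 gives half an epoch and a batch size of 32 gives two epochs.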

lillyPJ commented on September 24, 2024

Thanks for your reply @MhLiao. I have changed the batch size to 16 (due to my GPU limit) and doubled the number of iterations (the learning-rate step was changed as well). But it did not work. Even with the batch size set to 32, training covers only about 2 epochs. Is that enough for training?
The data augmentation I used follows your paper and SSD:

layer {
  name: "data"
  type: "AnnotatedData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: false
    mean_value: 104
    mean_value: 117
    mean_value: 123
    resize_param {
      prob: 1
      resize_mode: WARP
      height: 300
      width: 300
      interp_mode: LINEAR
      interp_mode: AREA
      interp_mode: NEAREST
      interp_mode: CUBIC
      interp_mode: LANCZOS4
    }
    emit_constraint {
      emit_type: CENTER
    }
  }
  data_param {
    source: "examples/VGG/VGG_train_lmdb"
    batch_size: 16
    backend: LMDB
  }
  annotated_data_param {
    batch_sampler {
      max_sample: 1
      max_trials: 1
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.3
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.1
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.3
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.3
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.3
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.5
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.3
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.7
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.3
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.9
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.3
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        max_jaccard_overlap: 1.0
      }
      max_sample: 1
      max_trials: 50
    }
    label_map_file: "data/VGG/labelmap.prototxt"
  }
}

MhLiao commented on September 24, 2024

Your settings seem OK. The only difference I noticed is the batch size. What were your precision and recall when you used a batch size of 16?

lillyPJ commented on September 24, 2024

Pretraining (100k iterations on the VGG dataset) performance: recall = 0.67, precision = 0.62, f-measure = 0.65.
Training (4k iterations on ICDAR2013) performance: recall = 0.77, precision = 0.63, f-measure = 0.69.
It seems that many words have multiple overlapping (but imprecise) bounding boxes.
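
For reference, the f-measure reported here is conventionally the harmonic mean of precision and recall. A minimal sketch of that formula follows; the function name is illustrative, and an evaluation protocol may aggregate per-image scores differently before this step.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Training-stage numbers quoted above: recall = 0.77, precision = 0.63.
print(round(f_measure(0.63, 0.77), 2))
```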

lillyPJ commented on September 24, 2024

Could you provide the recall, precision, and final training loss for both the pretraining stage and the training stage?

MhLiao commented on September 24, 2024

@lillyPJ The final performance is recall=0.74, precision=0.86, f-measure=0.80 when using 700*700 input images. You may try adjusting the detection threshold and the NMS threshold to achieve better performance.
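
To make the two thresholds concrete, here is a minimal pure-Python sketch of score filtering followed by greedy non-maximum suppression. It is not the TextBoxes test code: the function names are hypothetical, boxes are assumed to be (xmin, ymin, xmax, ymax) tuples, and the default threshold values are placeholders to tune, not the values used in the paper.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, score_thresh=0.6, nms_thresh=0.45):
    """Return indices of boxes kept after score filtering and greedy NMS.

    score_thresh and nms_thresh are illustrative defaults (assumptions).
    """
    # Detection threshold: drop low-confidence boxes, then sort by score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # NMS threshold: keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep
```

Raising the detection threshold usually trades recall for precision, while lowering the NMS threshold removes more of the overlapping duplicate boxes.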

lufo816 commented on September 24, 2024

@lillyPJ Hi, the text in the VGG synthetic data is oriented and its labels have 4 points, right? Can you tell me how to use this data for training? Thanks a lot!

from textboxes.

lillyPJ commented on September 24, 2024

For simplicity, you can use xmin = min(x1, x2, x3, x4), ymin = min(y1, y2, y3, y4), xmax = max(x1, x2, x3, x4), ymax = max(y1, y2, y3, y4) for training. @lufo816
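
A minimal sketch of that conversion, assuming each label is available as four (x, y) corner points; the function name and the input layout are illustrative, not taken from the dataset tools.

```python
def quad_to_bbox(quad):
    """Convert four (x, y) corner points to an axis-aligned (xmin, ymin, xmax, ymax) box."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)


# Example with a slightly rotated word box.
print(quad_to_bbox([(10, 20), (50, 10), (55, 40), (15, 50)]))
```

The resulting box encloses the oriented quadrilateral, so it is looser than the original label for strongly rotated text.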

lufo816 commented on September 24, 2024

@lillyPJ Thanks!

HelloTobe commented on September 24, 2024

@lillyPJ Hi, can you tell me how to calculate the precision, recall and f-measure?

Could you provide the source code (MATLAB or Python)?

HelloTobe commented on September 24, 2024

@lillyPJ Hi,
In your step1 and step2, what test data and what test batch size did you use, respectively?

Did you split SynthText into train and test sets?

Did you use the paper's default training script (train_icdar13.py)?

HelloTobe commented on September 24, 2024

@MhLiao Hi,
regarding the results you mentioned above to @lillyPJ: "The final performance: recall=0.74, precision=0.86, f-measure=0.80 when using 700*700 input images. You may try to adjust the detection threshold and NMS threshold to achieve better performance."

What protocol did you use to get these results? The evaluation_nms.m file in your code, or the ICDAR 2013 protocol?

I tested TextBoxes_icdar13.caffemodel (which you provided) under both protocols.

For single scale (700*700 input, score > 0.6):
evaluation_nms.m: recall=0.7641, precision=0.8528, f-measure=0.7959
ICDAR 2013: recall=0.7273, precision=0.8276, f-measure=0.7742

For multiple scales (score > 0.9):
evaluation_nms.m: recall=0.8292, precision=0.8764, f-measure=0.8562
ICDAR 2013: recall=0.8046, precision=0.8402, f-measure=0.8220

Performance
Using the given test code, you can achieve an F-measure of about 80% on ICDAR 2013 with a single scale.
Using the given multi-scale test code, you can achieve an F-measure of about 85% on ICDAR 2013 with a non-maximum suppression.

It seems that only testing with evaluation_nms.m achieves those results.

P.S. I wrote the ICDAR 2013 protocol above myself, so I may have made mistakes in it.

MhLiao commented on September 24, 2024

@HelloTobe For single-scale input, you can upload the results to the ICDAR 2013 website for evaluation; for multi-scale input, upload them to the same website after NMS. The website is: http://rrc.cvc.uab.es

HelloTobe commented on September 24, 2024

@MhLiao Thanks.
