Comments (14)

MhLiao commented on September 24, 2024

@lillyPJ You should change the batch size to 32. With a batch size of 8 and 50k iterations, training sees only 400,000 images, so it does not complete even one epoch over the 800,000 images. This may be one of the problems.
By the way, did you use data augmentation?
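
The epoch arithmetic behind this advice can be checked with a few lines of Python. This is only a sketch: the helper name `epochs_seen` is made up for illustration, and the dataset size is the approximate 800,000 images quoted above.

```python
def epochs_seen(batch_size: int, iterations: int, dataset_size: int) -> float:
    """Return how many passes over the dataset a training run makes."""
    return batch_size * iterations / dataset_size


DATASET_SIZE = 800_000  # approximate image count quoted in the comment above
ITERATIONS = 50_000

for batch_size in (8, 32):
    epochs = epochs_seen(batch_size, ITERATIONS, DATASET_SIZE)
    print(f"batch_size={batch_size}: {epochs} epochs")
```

With a batch size of 8 the run covers half an epoch; with 32 it covers two.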

from textboxes.

lillyPJ commented on September 24, 2024

Thanks for your reply, @MhLiao. I changed the batch size to 16 (due to my GPU limit) and doubled the number of iterations (the learning-rate step was changed accordingly), but it did not work. Even with the batch size set to 32, training covers only about 2 epochs. Is that enough for training?
The data augmentation I used follows your paper and SSD:

layer {
  name: "data"
  type: "AnnotatedData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: false
    mean_value: 104
    mean_value: 117
    mean_value: 123
    resize_param {
      prob: 1
      resize_mode: WARP
      height: 300
      width: 300
      interp_mode: LINEAR
      interp_mode: AREA
      interp_mode: NEAREST
      interp_mode: CUBIC
      interp_mode: LANCZOS4
    }
    emit_constraint {
      emit_type: CENTER
    }
  }
  data_param {
    source: "examples/VGG/VGG_train_lmdb"
    batch_size: 16
    backend: LMDB
  }
  annotated_data_param {
    batch_sampler {
      max_sample: 1
      max_trials: 1
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.3
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.1
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.3
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.3
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.3
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.5
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.3
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.7
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.3
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.9
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.3
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        max_jaccard_overlap: 1.0
      }
      max_sample: 1
      max_trials: 50
    }
    label_map_file: "data/VGG/labelmap.prototxt"
  }
}

MhLiao commented on September 24, 2024

lillyPJ commented on September 24, 2024

Pretraining (100k iterations on the VGG dataset) performance: recall = 0.67, precision = 0.62, f-measure = 0.65.
Training (4k iterations on ICDAR2013) performance: recall = 0.77, precision = 0.63, f-measure = 0.69.
It seems many words have multiple overlapping (but imprecise) bounding boxes.

lillyPJ commented on September 24, 2024

Could you provide your recall, precision, and final training loss for the pretraining stage and the training stage?

MhLiao commented on September 24, 2024

@lillyPJ The final performance is recall=0.74, precision=0.86, f-measure=0.80 when using 700*700 input images. You may try adjusting the detection threshold and the NMS threshold to achieve better performance.
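
For readers unsure how the two thresholds interact, here is a minimal sketch of score filtering followed by greedy non-maximum suppression in plain Python. It is not the TextBoxes test code. The function names are illustrative, and the default values are placeholders: 0.6 echoes the single-scale score threshold quoted later in this thread, and 0.45 is an arbitrary starting point to tune.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union (Jaccard overlap) of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    score_thresh: float = 0.6,   # placeholder detection threshold
    nms_thresh: float = 0.45,    # placeholder NMS threshold (an assumption)
) -> List[int]:
    """Return indices of boxes kept after thresholding and greedy NMS."""
    # Drop low-confidence boxes, then visit the rest from highest score down.
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 100, 30), (5, 2, 105, 32), (200, 0, 300, 30), (0, 100, 50, 120)]
scores = [0.9, 0.8, 0.7, 0.3]
print(filter_detections(boxes, scores))  # [0, 2]
```

Raising `score_thresh` trades recall for precision, while lowering `nms_thresh` removes more of the overlapping duplicates described earlier in the thread.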

lufo816 commented on September 24, 2024

@lillyPJ Hi, the text in the VGG synthetic data is oriented and its labels have 4 points, right? Can you tell me how to use this data for training? Thanks a lot!

lillyPJ commented on September 24, 2024

@lufo816 For simplicity, you can train on the axis-aligned bounding box of the four points: xmin = min(x1, x2, x3, x4), ymin = min(y1, y2, y3, y4), xmax = max(x1, x2, x3, x4), ymax = max(y1, y2, y3, y4).
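
The conversion above can be sketched in Python as follows. The function name `quad_to_bbox` and the sample coordinates are made up for illustration; reading the actual SynthText annotation file is not shown.

```python
from typing import Sequence, Tuple


def quad_to_bbox(
    points: Sequence[Tuple[float, float]],
) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) enclosing the given corner points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# A slightly rotated word box given as four (x, y) corners.
quad = [(10, 20), (110, 30), (105, 60), (5, 50)]
print(quad_to_bbox(quad))  # (5, 20, 110, 60)
```

Note that for strongly rotated text the enclosing rectangle includes a lot of background.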

lufo816 commented on September 24, 2024

@lillyPJ Thanks!

HelloTobe commented on September 24, 2024

@lillyPJ Hi, can you tell me how to calculate the precision, recall, and f-measure?

Could you provide the source code (MATLAB or Python)?

HelloTobe commented on September 24, 2024

@lillyPJ Hi,
In your step 1 and step 2, what test data and what test batch size did you use, respectively?

Did you split SynthText into train and test sets?

Did you use the paper's default training code (train_icdar13.py)?

HelloTobe commented on September 24, 2024

@MhLiao Hi,
regarding the results you mentioned above to @lillyPJ: "The final performance: recall=0.74, precision=0.86, f-measure=0.80 when use 700*700 input images. You may try to adjust the detection threshold and NMS threshold to achieve better performance."

Which protocol did you use to get these results: the evaluation_nms.m file in your code or the ICDAR 2013 protocol?

I tested the TextBoxes_icdar13.caffemodel (which you provided) under both protocols.

For a single scale (700*700 input, score > 0.6):
evaluation_nms.m: recall=0.7641, precision=0.8528, f-measure=0.7959
ICDAR 2013: recall=0.7273, precision=0.8276, f-measure=0.7742

For multiple scales (score > 0.9):
evaluation_nms.m: recall=0.8292, precision=0.8764, f-measure=0.8562
ICDAR 2013: recall=0.8046, precision=0.8402, f-measure=0.8220

Performance
Using the given test code, you can achieve an F-measure of about 80% on ICDAR 2013 with a single scale.
Using the given multi-scale test code, you can achieve an F-measure of about 85% on ICDAR 2013 with a non-maximum suppression.

It seems that only evaluation_nms.m reproduces these results.

P.S. I wrote the ICDAR 2013 protocol above myself, so I may have made mistakes in it.
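
As a sanity check on the figures above, the F-measure is conventionally the harmonic mean of recall and precision. The sketch below (the helper name `f_measure` is illustrative) recomputes it from the recall and precision values listed in this comment. It reproduces the two ICDAR 2013 rows to four decimals. For the two evaluation_nms.m rows it gives roughly 0.806 and 0.852, slightly different from the listed values; the thread does not say why.

```python
def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (0.0 if both are zero)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision) pairs copied from the comment above.
rows = {
    "single scale, evaluation_nms.m": (0.7641, 0.8528),
    "single scale, ICDAR 2013": (0.7273, 0.8276),
    "multi scale, evaluation_nms.m": (0.8292, 0.8764),
    "multi scale, ICDAR 2013": (0.8046, 0.8402),
}

for name, (recall, precision) in rows.items():
    print(f"{name}: f-measure = {f_measure(recall, precision):.4f}")
```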

MhLiao commented on September 24, 2024

@HelloTobe For single-scale input, you can upload the results to the ICDAR 2013 website for evaluation; for multi-scale input, apply NMS first and then upload the results to the same website. The website is: http://rrc.cvc.uab.es

HelloTobe commented on September 24, 2024

@MhLiao Thanks.
