How can I use this to generate Chinese word images?

Can I use my own fonts and words, e.g. Chinese? (SynthText issue, closed)

Jayhello commented on July 24, 2024

Can I use my own fonts and words, e.g. Chinese?

from synthtext.

Comments (21)

crazylyf commented on July 24, 2024

Yes, it can be used to generate non-ASCII characters such as Chinese, but you will need to make some adaptations.

ankush-me commented on July 24, 2024

You would need to (at least) make the following changes:

1. Add Chinese fonts, then generate the corresponding font_px2pt.cp.
2. Get a Chinese text source (to replace newsgroup.txt).
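
For step 2, one option is to build a plain UTF-8 text file of Chinese lines from whatever raw corpus you have. The sketch below is a hypothetical helper (`clean_chinese_corpus` and `write_text_source` are not part of SynthText); it assumes the generator only needs lines of text, normalizes them, and keeps lines with at least a minimum number of CJK characters. Chinese text has no spaces between words, so any word sampling in the generator that relies on whitespace splitting will probably need adapting as well.

```python
import re
import unicodedata

# Matches the basic CJK Unified Ideographs block (U+4E00..U+9FFF) only;
# extension blocks are ignored in this sketch.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def clean_chinese_corpus(raw_lines, min_cjk=4):
    """Hypothetical helper: return cleaned lines with at least `min_cjk` CJK chars.

    NFKC normalization turns full-width ASCII forms (e.g. '，' and '！')
    into their half-width equivalents; drop it if you want to keep them.
    """
    cleaned = []
    for line in raw_lines:
        line = unicodedata.normalize("NFKC", line)
        line = " ".join(line.split())  # collapse runs of whitespace, strip ends
        if len(CJK_RE.findall(line)) >= min_cjk:
            cleaned.append(line)
    return cleaned


def write_text_source(lines, path):
    """Hypothetical helper: write one UTF-8 line per entry (a newsgroup.txt stand-in)."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```

For example, `clean_chinese_corpus(["hello world", "  中文  文本  "])` keeps only `"中文 文本"`.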

crazylyf commented on July 24, 2024

Be careful with Chinese fonts: some characters in your vocabulary may not be covered. Some fonts contain more than 10k characters, while others contain only ~4k common Chinese characters.
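
A quick way to act on this warning is to compare each font's character map with your vocabulary before adding the font. This sketch assumes the fontTools package (`fontTools.ttLib.TTFont` and its `getBestCmap()` method) for reading the font; `font_codepoints`, `missing_chars` and `coverage_ratio` are hypothetical helper names. The coverage check itself is plain Python, so it works with any set of code points.

```python
def font_codepoints(font_path, font_number=-1):
    """Hypothetical helper: return the set of Unicode code points a font maps.

    Assumes fontTools (pip install fonttools), imported lazily so the
    pure-Python helpers below work without it. font_number=-1 is the
    fontTools default; pass 0, 1, ... to pick a face in a .ttc collection.
    """
    from fontTools.ttLib import TTFont

    font = TTFont(font_path, fontNumber=font_number)
    try:
        cmap = font.getBestCmap() or {}  # {code point: glyph name}; None if no Unicode cmap
    finally:
        font.close()
    return set(cmap)


def missing_chars(vocabulary, codepoints):
    """Return the sorted vocabulary characters whose code points the font lacks."""
    return sorted({ch for ch in vocabulary if not ch.isspace() and ord(ch) not in codepoints})


def coverage_ratio(vocabulary, codepoints):
    """Fraction of distinct non-space vocabulary characters covered by the font."""
    chars = {ch for ch in vocabulary if not ch.isspace()}
    if not chars:
        return 1.0
    return sum(1 for ch in chars if ord(ch) in codepoints) / len(chars)
```

With a real font you would call `missing_chars(vocab, font_codepoints(path))` and skip (or supplement) fonts whose missing list is non-empty.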

Jayhello commented on July 24, 2024

@crazylyf I want to use SynthText to generate images containing natural Chinese words, and then do Chinese word recognition on those images.
My questions are:

1. There are about 3K Chinese characters, and they are more complicated than English words. Is this possible (with high accuracy)?
2. I may need lots of samples, and I don't know how to define and train the network (coding in Caffe).
3. Have you done related work, or can you give me any ideas?
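
Whether ~3K characters is enough can be estimated from your own corpus: count character frequencies and measure what fraction of all character occurrences the k most frequent characters cover. A minimal sketch with `collections.Counter`; `char_frequencies` and `topk_coverage` are hypothetical helpers, and the number you get depends entirely on the corpus you feed in.

```python
from collections import Counter


def char_frequencies(lines):
    """Count non-whitespace characters across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(ch for ch in line if not ch.isspace())
    return counts


def topk_coverage(counts, k):
    """Fraction of all character occurrences covered by the k most frequent characters."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total
```

On a real corpus you would check, for example, `topk_coverage(counts, 3000)`; occurrences outside the top k are characters the recognizer could never output.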

crazylyf commented on July 24, 2024

@Jayhello
1. Yes, definitely. There are many end-to-end trainable networks for text-line/word recognition. Although they mostly focus on English, they can be adapted to Chinese.
2. As for reference work, you can look at "http://arxiv.org/abs/1507.05717", which has a good Torch implementation at https://github.com/bgshih/crnn. It is nearly off-the-shelf for recognition. If you try it, you may want to use warp_ctc (https://github.com/baidu-research/warp-ctc) to replace the built-in CTC implementation, which runs on the CPU and is therefore relatively slow.
3. Is 3k characters really enough for you?
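
CTC-trained recognizers such as CRNN output one label distribution per horizontal frame; warp_ctc speeds up the training loss, while turning those outputs into text is a separate decoding step. The simplest decoder is greedy (best-path) decoding: take the arg-max label per frame, merge consecutive repeats, then drop the blank. This sketch assumes index 0 is the blank and that the alphabet is a string of your Chinese characters; both are conventions you would pick yourself, not something taken from the crnn repository.

```python
def build_alphabet(lines):
    """Sorted distinct characters; label i + 1 maps to alphabet[i], 0 is the blank (assumed)."""
    return "".join(sorted({ch for line in lines for ch in line if not ch.isspace()}))


def encode(text, alphabet):
    """Map a label string to 1-based CTC target indices (0 is reserved for blank)."""
    index = {ch: i + 1 for i, ch in enumerate(alphabet)}
    return [index[ch] for ch in text]


def greedy_ctc_decode(frame_scores, alphabet, blank=0):
    """Best-path decoding: arg-max per frame, collapse repeats, remove blanks.

    frame_scores: list of per-frame score lists, each of length len(alphabet) + 1.
    """
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    chars = []
    prev = None
    for idx in best:
        # A repeated label only counts again if a blank (or another label) separated it.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)
```

For example, with alphabet `"你好"` and per-frame arg-max labels `1, 1, 0, 2`, the decoder returns `"你好"`; labels `1, 0, 1` give `"你你"` because the blank separates the repeat.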

Jayhello commented on July 24, 2024

@crazylyf
Thank you very much for your reply! I have read the paper you recommended,
and my question is:

How should I prepare training images? For example, for the Chinese character "你", how many images of this character should I prepare, and in what form? (The images I want to recognize have complicated backgrounds, like the image below.) And what size should the training images be?

I know I should first localize the character sequence and then recognize it.
For localizing characters, https://github.com/MhLiao/TextBoxes is useful.
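
The localize-then-recognize pipeline can be sketched as plain glue code. `detect` and `recognize` are hypothetical callables standing in for a detector such as TextBoxes and a recognizer such as CRNN; the sketch assumes axis-aligned boxes and only shows clipping, cropping, and a crude reading order (top-to-bottom, then left-to-right). The image is a nested list of pixel rows to stay dependency-free; with NumPy you would slice the array instead.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive (assumed format)
Image = List[List[int]]


def crop(image: Image, box: Box) -> Image:
    """Crop an axis-aligned box, clipped to the image bounds."""
    height = len(image)
    width = len(image[0]) if image else 0
    x0, y0, x1, y1 = box
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [row[x0:x1] for row in image[y0:y1]]


def read_text(image: Image,
              detect: Callable[[Image], Sequence[Box]],
              recognize: Callable[[Image], str]) -> List[Tuple[Box, str]]:
    """Detect boxes with a hypothetical detector, then recognize each crop."""
    # Crude reading order: sort by top edge, then by left edge.
    boxes = sorted(detect(image), key=lambda b: (b[1], b[0]))
    results = []
    for box in boxes:
        patch = crop(image, box)
        if patch and patch[0]:  # skip boxes that are empty after clipping
            results.append((box, recognize(patch)))
    return results
```

Sorting by the top edge is only a rough reading order; slightly tilted or multi-column text needs a smarter grouping step.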

crazylyf commented on July 24, 2024

@Jayhello
Sorry, I don't have an exact answer on how many samples to prepare for each character. Usually, one generates samples from a given corpus, in which character frequencies vary widely, and a common character like "你" occurs much more often.
The text in your example seems to have been added afterwards with a photo-editing tool, so it may differ from the text synthesized here, which assumes that text is placed on well-defined regions. Perhaps you should try removing or loosening this constraint to suit your case.

Jayhello commented on July 24, 2024

The original image is below; the marks in the image were located with the deep-learning detector https://github.com/MhLiao/TextBoxes.

And are you Chinese?

crazylyf commented on July 24, 2024

Yeah

Jayhello commented on July 24, 2024

@crazylyf
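If I generate images for recognition training, what kind of images should I use?
If I use the first one below, that is basically OCR, which seems pointless. Something like the second one below?
How many images would that take? 1K images per character? Should one image contain many characters?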

crazylyf commented on July 24, 2024

@Jayhello
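Unless you are doing document recognition, definitely use the second kind of image.
As for exactly how many samples each character needs, there is no data on that, and I haven't run any experiments on it either. My personal guess is roughly a few dozen, but it depends on the application scenario.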
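
The "few dozen samples per character" estimate is easy to check against a generated dataset: count how often each vocabulary character appears in the ground-truth labels and list the ones below a threshold. The default of 30 below is an arbitrary illustration of "a few dozen", not a measured value, and `undersampled_chars` is a hypothetical helper.

```python
from collections import Counter


def undersampled_chars(labels, vocabulary, min_count=30):
    """Hypothetical helper: (char, count) pairs for chars seen fewer than min_count times.

    labels: ground-truth strings of the generated images.
    vocabulary: the characters the recognizer must support.
    min_count: illustrative target ("a few dozen"); tune it for your application.
    """
    counts = Counter(ch for label in labels for ch in label)
    result = [(ch, counts[ch]) for ch in sorted(set(vocabulary)) if counts[ch] < min_count]
    # Rarest first, so the worst-covered characters are easiest to spot.
    result.sort(key=lambda item: item[1])
    return result
```

Characters that come back with low or zero counts could then be targeted by adding corpus text that contains them and regenerating.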

crazylyf commented on July 24, 2024

@Jayhello
I have a 163.com email account; the username is crazylyf. If you're interested, let's talk privately.

Jayhello commented on July 24, 2024

@crazylyf
Thanks a lot! I also have code that puts text onto images and records its coordinates.
Don't you have a [email protected]?

Jayhello commented on July 24, 2024

@crazylyf
SynthText should also be able to generate images like this, right?

xiaomaxiao commented on July 24, 2024

@crazylyf Did you retrain https://github.com/MhLiao/TextBoxes?

crazylyf commented on July 24, 2024

@xiaomaxiao TextBoxes? No.

xiaomaxiao commented on July 24, 2024

@crazylyf So it can be used directly for Chinese character detection?

crazylyf commented on July 24, 2024

Sorry, I haven't read the original paper, but I'd guess it should work.

xiaomaxiao commented on July 24, 2024

@crazylyf Wow, that's really nice. CTPN can also detect Chinese characters directly. How fast is TextBoxes on CPU?

crazylyf commented on July 24, 2024

I haven't tried TextBoxes, so I'm not sure.

xiaomaxiao commented on July 24, 2024

Sorry, I @-mentioned the wrong person. @Jayhello
