guanshuicheng / invoice

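OCR for Chinese value-added tax (VAT) invoices, built as a Flask microservice. Supported invoice types: VAT electronic ordinary invoice (增值税电子普通发票), VAT ordinary invoice (增值税普通发票), and VAT special invoice (增值税专用发票). Recognized fields: invoice code, invoice number, issue date, check code, after-tax amount, and others.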

License: MIT License

Python 10.41% Makefile 0.22% C 83.01% Shell 0.16% Cuda 5.58% C++ 0.24% Batchfile 0.01% Cython 0.38%

crnn-ctc deeplearning flask invoice keras-tensorflow python3 torch yolov3

invoice's Issues

Request parameter error

When sending a request, isn't the image uploaded directly? What are the request parameters?

The response is:
{
"FileName": {},
"code": 101,
"data": {},
"message": "请求参数错误",
"ocrIdentifyTime": {}
}

Issue regarding upload file filtering

Hello,
While trying the tool, I found that the file upload functionality relies on the user-provided filename extension, which could be a security issue as described in CWE-646: Reliance on File Name or Extension of Externally-Supplied File.
An attacker could disguise the file name extension and drop malicious code on the server for further attacks.
Thanks for reading.
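
One common mitigation for this class of issue is to check the file's leading bytes against known signatures instead of trusting the extension. The sketch below is only an illustration: the set of allowed formats and the function names are assumptions, not code from this repository.

```python
# Known file signatures ("magic bytes"). The allowed formats are an
# assumption for illustration, not the project's actual whitelist.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "pdf": b"%PDF-",
}


def sniff_type(data: bytes):
    """Return the detected format name, or None if no signature matches."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def is_allowed_upload(data: bytes) -> bool:
    """Accept an upload only if its content matches an allowed signature.

    The client-supplied filename is deliberately not consulted.
    """
    return sniff_type(data) is not None


print(sniff_type(b"\x89PNG\r\n\x1a\n" + b"..."))   # a PNG header
print(is_allowed_upload(b"<?php echo 1; ?>"))      # script disguised as an image
```

A signature check only narrows the attack surface. Storing uploads under a server-generated name outside any executable path is still advisable.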

Hi, does this app work with other languages, e.g. English or Spanish?

closed

Hello, could recognition of the total amount including tax (价税合计) be added? Thanks.

Can "models.rar" be opened?

I've downloaded "models.rar" but cannot open it. Has anyone else run into the same problem?

You must feed a value for placeholder tensor 'Placeholder_368' with dtype float and shape [2]

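Does this count as running successfully? Or what is going on?

Where is the training script?

I'm new to this area. I don't know how you trained the models, or where the training code is.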

AttributeError: module 'tensorflow_core.keras.backend' has no attribute 'get_session'

I got an error like this:

Traceback (most recent call last):
File "app.py", line 14, in
from model_post_type import ocr as OCR
File "/Users/xianglingyun/invoice_ocr/invoice-master/model_post_type.py", line 32, in
from text import keras_detect_type as detect
File "/Users/xianglingyun/invoice_ocr/invoice-master/text/keras_detect_type.py", line 19, in
sess = K.get_session()

Can you tell me how to fix it?
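
The module name tensorflow_core in the message suggests a TensorFlow 2.x install, where get_session is no longer exposed on tf.keras.backend but is still reachable through tf.compat.v1.keras.backend. A minimal sketch of a lookup that works for both layouts follows. The helper name resolve_get_session is hypothetical, and stand-in objects replace TensorFlow so the snippet runs without it installed.

```python
from types import SimpleNamespace


def resolve_get_session(tf):
    """Return a get_session callable for the given TensorFlow module.

    TensorFlow 1.x exposes it as tf.keras.backend.get_session; in 2.x
    it is available under tf.compat.v1.keras.backend instead.
    """
    backend = tf.keras.backend
    if hasattr(backend, "get_session"):
        return backend.get_session
    return tf.compat.v1.keras.backend.get_session


# Stand-in for a TF 2.x module layout (illustration only, not real TensorFlow).
fake_tf2 = SimpleNamespace(
    keras=SimpleNamespace(backend=SimpleNamespace()),
    compat=SimpleNamespace(
        v1=SimpleNamespace(
            keras=SimpleNamespace(
                backend=SimpleNamespace(get_session=lambda: "v1-session")
            )
        )
    ),
)

print(resolve_get_session(fake_tf2)())
```

With a real install the call would be resolve_get_session(tf)() after import tensorflow as tf. Since the project appears to target TensorFlow 1.x, other incompatibilities may still surface after this one, so installing a 1.x release may be the simpler route.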

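Is there a model that recognizes more regions?

Data annotation question

Hello,
When I run prediction with the YOLO3 model you provided, the predicted boxes look like the figure below:

In the data annotation stage before training, which parts of the invoice were used as the YOLO3 detection targets?
I looked at chinese-ocr, which was mentioned in other issues, but I still don't quite understand. Could you explain in detail how the data was annotated?

Thanks!

YOLO training

Hello, if it is convenient, could you share your training code for YOLO text detection? Email: [email protected]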

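Hello, I have two questions

1. Once it is running, memory usage is very high, about 4 GB.
2. Recognition is fairly slow, about 5 s. After commenting out the invoice-type recognition code it is a bit faster, about 3 s.
I've just started learning about recognition. How can these two problems be solved? Thanks.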

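Please provide the Postman parameters

Testing with Postman returns an error

GPU memory release problem

Calling the API repeatedly causes GPU memory usage to accumulate. How can it be released?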

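Could you share the approach used to train the models?

Was this fine-tuned from chineseocr last year?

I saw a comment saying this only recognizes 5 regions?

Did you do this by fine-tuning? How much fine-tuning data was used for the detection part and the OCR part respectively?

PS: I didn't expect to run into someone else with the surname Guan, hahaha

Could you release the invoice recognition dataset?

I've recently been doing similar research but am short on data. I've collected some data myself, but it's still not enough. Could you release your training set? Thanks!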

corrupted size vs. prev_size

corrupted size vs. prev_size
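已放弃 (核心已转储), i.e. "Aborted (core dumped)"

I used the invoices in the test-invoice folder.
As soon as one invoice is uploaded, the error above is raised and the program exits.

The full output is as follows: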
[{'text': '天津增值税电子普通发票', 'cx': 451.0, 'cy': 80.5, 'w': 309.0616503223912, 'h': 25.99999999999999, 'degree': 0.18592417856631746}]
['电子普通']
['普通发票']
[]
corrupted size vs. prev_size
已放弃 (核心已转储)

have a problem

question: cv2.error: OpenCV(3.4.3) C:\projects\opencv-python\opencv\modules\dnn\src\darknet\darknet_importer.cpp:207: error: (-212:Parsing error) Failed to parse NetParameter file: D:\新建文件夹\invoice\models\text.cfg in function 'cv::dnn::experimental_dnn_34_v7::readNetFromDarknet'

python3.6.7 +opencv3.4.3.18
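
This message typically means the .cfg file could not be opened at all, so it is worth checking that the file exists and, on Windows, that the path is plain ASCII (the folder name 新建文件夹 in the path above is not). Whether that is the cause here is an assumption. A minimal sketch of such a check, with a hypothetical helper name:

```python
def has_non_ascii(path: str) -> bool:
    """Return True if the path contains characters outside ASCII."""
    try:
        path.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False


# Path taken from the error message above.
cfg_path = "D:\\新建文件夹\\invoice\\models\\text.cfg"

if has_non_ascii(cfg_path):
    print("Try moving the project to an ASCII-only path, e.g. D:\\invoice")
```

If the check fires, moving the project to a folder with an English name and rerunning is a quick way to confirm or rule out this cause.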

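yolo3 region detection question

Hello, how does the yolo3 region detection manage to detect only the 5 target text regions? Could we connect on WeChat to discuss?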

No such file or directory

FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\ZKRH001\Desktop\invoice-master\models\ocr-lstm.pth'

Traceback (most recent call last):
File "C:\Users\ZKRH001\Desktop\invoice-master\app.py", line 14, in
from model_post_type import ocr as OCR
File "C:\Users\ZKRH001\Desktop\invoice-master\model_post_type.py", line 10, in
from crnn.crnn_torch import crnnOcr as crnnOcr ##torch版本ocr
File "C:\Users\ZKRH001\Desktop\invoice-master\crnn\crnn_torch.py", line 38, in
model,converter = crnnSource()
File "C:\Users\ZKRH001\Desktop\invoice-master\crnn\crnn_torch.py", line 26, in crnnSource
trainWeights = torch.load(ocrModel,map_location=lambda storage, loc: storage)
File "C:\Users\ZKRH001\AppData\Roaming\Python\Python39\site-packages\torch\serialization.py", line 594, in load
with _open_file_like(f, 'rb') as opened_file:
File "C:\Users\ZKRH001\AppData\Roaming\Python\Python39\site-packages\torch\serialization.py", line 230, in _open_file_like
return _open_file(name_or_buffer, mode)
File "C:\Users\ZKRH001\AppData\Roaming\Python\Python39\site-packages\torch\serialization.py", line 211, in init
super(_open_file, self).init(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\ZKRH001\Desktop\invoice-master\models\ocr-lstm.pth'
请按任意键继续. . .
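
The traceback shows the weights file being opened at import time, so a missing file in models\ stops app.py before the server starts. A small pre-flight check can make this easier to diagnose. The sketch below assumes the weights are expected directly under a models directory; the list of required files is only illustrative (ocr-lstm.pth and text.cfg are the two names that appear in these issues), and the helper name is hypothetical.

```python
from pathlib import Path


def missing_model_files(models_dir, required):
    """Return the names from `required` that are not files in models_dir."""
    base = Path(models_dir)
    return [name for name in required if not (base / name).is_file()]


# Illustrative list; the project may need more files than these two.
REQUIRED = ["ocr-lstm.pth", "text.cfg"]

missing = missing_model_files("models", REQUIRED)
if missing:
    print("Missing model files:", ", ".join(missing))
    print("Download and extract the model archive into the models directory.")
```

Running this from the project root before python app.py reports every missing file at once instead of failing on the first one.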

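Environment version problem: requesting a detailed requirements file

Could you please add version numbers to the requirements file, like the output generated by pip freeze?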
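
For anyone who already has a working environment, a pinned list can be produced from it and shared. The sketch below uses the standard library's importlib.metadata (Python 3.8+) to print name==version lines similar to pip freeze. It reports whatever is installed in the current interpreter, not a set of versions verified against this project.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for all installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```

Redirecting the output to a file gives something that pip install -r can consume on another machine with the same Python version and platform.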

What is this problem on Windows?

How to change the IP and port in http://127.0.0.1:1111/invoice-ocr

Running on http://0.0.0.0:1111/ (Press CTRL+C to quit)
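After running app.py, the service starts, and a POST request from Postman to http://127.0.0.1:1111/invoice-ocr returns JSON output as expected.
I'd like to change the IP and port in http://127.0.0.1:1111/invoice-ocr. How should I do that? I tried changing the IP and port, but then requests could not be sent.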
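
The address in the request URL has to match what the server is listening on, so changing it only in Postman will fail. With Flask, the listening address is set by the host and port arguments of app.run. Because the log shows the server bound to 0.0.0.0 (all interfaces), other machines can already reach it at the server's own IP, firewall permitting. Below is a minimal sketch of reading both values from the environment. The variable names INVOICE_HOST and INVOICE_PORT are hypothetical and not something app.py is known to read.

```python
import os


def bind_address(env=os.environ, default_host="0.0.0.0", default_port=1111):
    """Read the listening host and port from environment-style settings."""
    host = env.get("INVOICE_HOST", default_host)
    port = int(env.get("INVOICE_PORT", default_port))
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return host, port


host, port = bind_address({"INVOICE_PORT": "8080"})
print(f"server would listen on {host}:{port}")
# In app.py the pair would then be passed to Flask:
#     app.run(host=host, port=port)
```

After restarting the server with the new port, the client URL changes to match, for example http://127.0.0.1:8080/invoice-ocr on the same machine.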

The project won't run

Has anyone gotten this running? TensorFlow version 1.14.0, Python version 3.6.7.
Error message:
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_368' with dtype float and shape [2]
[[{{node Placeholder_368}}]]

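Hello, can the Keras YOLOv3 be used to train the detection regions?

When will tensorflow be upgraded to 2.x?

Please also consider upgrading the environment to Python 3.8. After all, 3.6 reaches end of support next year.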

Allocation of 411041792 exceeds 10% of system memory.

How should I deal with this problem? Is the memory too small? I'm using 8 GB of RAM on a Windows 10 system.

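API call

How do I POST a photo?

How was region recognition trained

Compared with Chinese OCR, the author's text detection seems to also include detection of the regions to be recognized. How did the author annotate and train these regions? Thanks.

How to resolve the error "Process finished with exit code -1073740791"

Service call problem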

Serving Flask app "app" (lazy loading)
Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
Debug mode: off
Running on http://0.0.0.0:11111/ (Press CTRL+C to quit)

Opening http://0.0.0.0:11111/invoice-ocr in a browser fails; the page won't load.

The error says the variables in yolo3 are not initialized. How can this be solved?

Traceback (most recent call last):
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask\app.py", line 2091, in call
return self.wsgi_app(environ, start_response)
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask\app.py", line 2076, in wsgi_app
response = self.handle_exception(e)
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask_cors\extension.py", line 165, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask\app.py", line 2073, in wsgi_app
response = self.full_dispatch_request()
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask\app.py", line 1518, in full_dispatch_request
rv = self.handle_user_exception(e)
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask_cors\extension.py", line 165, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask\app.py", line 1516, in full_dispatch_request
rv = self.dispatch_request()
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask\app.py", line 1502, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "C:\Users\HP\Desktop\invoice-master\app.py", line 110, in invoice_ocr
Recognition_invoice = Recognition_invoice(whole_path)
File "C:\Users\HP\Desktop\invoice-master\app.py", line 81, in Recognition_invoice
result_type = OCR(img1)
File "C:\Users\HP\Desktop\invoice-master\model_post_type.py", line 156, in ocr
alph=0.01, ##对检测的文本行进行向右、左延伸的倍数
File "C:\Users\HP\Desktop\invoice-master\model_post_type.py", line 124, in model
text_recs = text_detect(**config) ##文字检测
File "C:\Users\HP\Desktop\invoice-master\model_post_type.py", line 45, in text_detect
boxes, scores = detect.text_detect(np.array(img))
File "C:\Users\HP\Desktop\invoice-master\text\keras_detect_type.py", line 57, in text_detect
K.learning_phase(): 0
File "C:\Users\HP\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 958, in run
run_metadata_ptr)
File "C:\Users\HP\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1181, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\HP\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1359, in _do_run
run_metadata)
File "C:\Users\HP\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable conv2d_74/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/conv2d_74/kernel)
[[node conv2d_74/Conv2D/ReadVariableOp (defined at C:\Users\HP\Desktop\invoice-master\text\keras_yolo3.py:49) ]]

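Requesting model training code

Hello author, I'm studying this OCR project and would like to get the training code for all the models. Would it be convenient to send it?
My WeChat: wsh_2766659938
Email: [email protected]

Is there a way to solve the PyCharm function docstring hint problem?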

You need configured Python 2 SDK to render Epydoc docstrings

curl POST returns "请求参数错误" (request parameter error)

(base) xianglingyun@xianglingyundeMacBook-Pro invoice-master % curl http://127.0.0.1:11111/invoice-ocr -X POST -d 'file=@电子发票-test.png'
{"FileName":{},"code":101,"data":{},"message":"请求参数错误","ocrIdentifyTime":{}}

When I run the command, I get the above issue.
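
A likely cause: curl's -d sends the text file=@... as a URL-encoded form field, so no file is actually uploaded. File uploads need multipart/form-data, which curl produces with -F 'file=@path' instead. The sketch below builds the equivalent request with the Python standard library. The field name file is taken from the command above and from the Postman question in another issue, not confirmed against app.py, and the helper names are hypothetical.

```python
import uuid
import urllib.request


def build_multipart(field, filename, content, content_type="image/png"):
    """Encode one file as a multipart/form-data body; return (body, header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def make_upload_request(url, filename, content, field="file"):
    """Prepare a POST request that uploads `content` under form field `field`."""
    body, ctype = build_multipart(field, filename, content)
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST"
    )


# Placeholder bytes stand in for a real image read with open(path, "rb").
req = make_upload_request(
    "http://127.0.0.1:11111/invoice-ocr", "test.png", b"\x89PNG\r\n\x1a\n"
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
print(req.get_method(), req.full_url)
```

In Postman the same thing corresponds to Body > form-data with the key's type set to File rather than Text.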

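Dataset

Could you provide part of the dataset? I'd like to try training the code. Thanks.

How to annotate for detection

Hello, could you share how you annotated the data for detection training? If convenient, could I add you on WeChat?

In Postman, do I put "file" as the Key in Body and select the image as the Value? It returns the following error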

Error Message

Network Access Message: The page cannot be displayed

Explanation: The request timed out before the page could be retrieved.

Try the following:

Refresh page: Search for the page again by clicking the Refresh button. The timeout may have occurred due to Internet congestion.
Check spelling: Check that you typed the Web page address correctly. The address may have been mistyped.
Contact website: You may want to contact the website administrator to make sure the Web page still exists. You can do this by using the e-mail address or phone number listed on the website home page.

If you are still not able to view the requested page, try contacting your administrator or Helpdesk.

Technical Information (for support personnel)

Error Code 10060: Connection timeout
Background: The gateway could not receive a timely response from the website you are trying to access. This might indicate that the network is congested, or that the website is experiencing technical difficulties.
Date: 7/17/2020 7:59:50 AM [GMT]
Server: HKUTMPWV004.cn.asia.ad.pwcinternal.com
Source: Firewall

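Could requirements.txt list a version for each dependency?

Because many packages have compatibility problems, running pip install -r requirements.txt produces all kinds of compatibility errors. Could someone provide a requirements.txt with a complete, mutually compatible set of dependency versions? Thanks in advance!

How are detection and recognition trained? Jointly or separately?

Hello! I have written separate training scripts for detection and recognition. Should they be trained together? How are the detection and recognition parts connected?

Could you provide a Dockerfile for one-click deployment?

gpu_nms.so has been built, but import gpu_nms still fails

Too few regions are recognized. Is the author interested in recognizing more regions?

The dependabot/pip/tensorflow-1.15.2 branch fails at runtime

Is version 1.15.2 supported?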

Traceback (most recent call last):
File "app.py", line 14, in
from model_post_type import ocr as OCR
File "/home/test/invoice/model_post_type.py", line 32, in
from text import keras_detect_type as detect
File "/home/test/invoice/text/keras_detect_type.py", line 23, in
box_score = box_layer([*textModel.output,image_shape,input_shape],anchors, num_classes)
File "/home/test/invoice/text/keras_yolo3.py", line 366, in box_layer
boxes = concatenate(boxes, axis=0)
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/merge.py", line 705, in concatenate
return Concatenate(axis=axis, **kwargs)(inputs)
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 802, in call
base_layer_utils.create_keras_history(inputs)
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 184, in create_keras_history
_, created_layers = _create_keras_history_helper(tensors, set(), [])
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 231, in _create_keras_history_helper
layer_inputs, processed_ops, created_layers)
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 231, in _create_keras_history_helper
layer_inputs, processed_ops, created_layers)
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 231, in _create_keras_history_helper
layer_inputs, processed_ops, created_layers)
[Previous line repeated 2 more times]
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 229, in _create_keras_history_helper
constants[i] = backend.function([], op_input)([])
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py", line 3476, in call
run_metadata=self.run_metadata)
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1472, in call
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_367' with dtype float and shape [2]
[[{{node Placeholder_367}}]]

tensorflow 1.13.1 may raise errors; rolling back to 1.12 works

tensorflow/tensorflow#27906

python3 app.py

After installing the dependencies according to the documentation, an error is thrown when running it