guanshuicheng / invoice

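VAT invoice OCR built on a Flask microservice architecture. Recognized invoice types: VAT electronic general invoice, VAT general invoice, and VAT special invoice. Recognized fields: invoice code, invoice number, issue date, check code, after-tax amount, and others.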

License: MIT License

Python 10.41% Makefile 0.22% C 83.01% Shell 0.16% Cuda 5.58% C++ 0.24% Batchfile 0.01% Cython 0.38%
crnn-ctc deeplearning flask invoice keras-tensorflow python3 torch yolov3

invoice's People

Contributors

dependabot[bot], guanshuicheng

invoice's Issues

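Request parameter error

[Isn't the request supposed to be a direct image upload? What are the request parameters?]

The returned result is: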
{
"FileName": {},
"code": 101,
"data": {},
"message": "请求参数错误",
"ocrIdentifyTime": {}
}
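For readers hitting the same `code: 101` response, the sketch below shows what a direct image upload looks like on the wire as `multipart/form-data`, using only the standard library. The field name `file` is an assumption (it is the key another reporter on this page tried in Postman), and `build_multipart` is a hypothetical helper, not part of the project.

```python
import uuid


def build_multipart(field_name, filename, content,
                    content_type="application/octet-stream"):
    """Encode a single file as a multipart/form-data request body.

    Returns (body_bytes, value_for_the_Content-Type_header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


# "file" is an assumed field name; the bytes stand in for real image data.
body, header = build_multipart(
    "file", "invoice.jpg", b"\xff\xd8\xff fake jpeg", "image/jpeg"
)
```

The pair could then be sent with `urllib.request.Request(url, data=body, headers={"Content-Type": header})`, where `url` is whatever address the Flask service listens on; the route is not stated in this report.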

Issue regarding upload file filtering

Hello,
While trying the tool, I found that the file upload functionality relies on the user-provided file name extension, which could be a security issue as described in CWE-646: Reliance on File Name or Extension of Externally-Supplied File.
An attacker could disguise the file name extension and drop malicious code on the server for further attacks.
Thanks for reading.
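One common mitigation for CWE-646 is to inspect the leading bytes of the upload instead of trusting its name. The following is a minimal sketch under that assumption; `sniff_image_type` and `is_allowed_upload` are hypothetical helpers, and the allowed set of formats is a guess rather than the project's actual policy.

```python
from typing import Optional

# Leading byte signatures ("magic numbers") of a few common image formats.
_SIGNATURES = (
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
)


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the detected image type, or None if no signature matches."""
    for magic, kind in _SIGNATURES:
        if data.startswith(magic):
            return kind
    return None


def is_allowed_upload(data: bytes, allowed=("jpeg", "png")) -> bool:
    """Accept the upload only if its content looks like an allowed type."""
    return sniff_image_type(data) in allowed
```

A signature check is necessary rather than sufficient: it rejects a script renamed to `.jpg`, but a server would still want to store uploads under a generated name and outside any executable path.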

AttributeError: module 'tensorflow_core.keras.backend' has no attribute 'get_session'

I got an error like this:

Traceback (most recent call last):
File "app.py", line 14, in <module>
from model_post_type import ocr as OCR
File "/Users/xianglingyun/invoice_ocr/invoice-master/model_post_type.py", line 32, in <module>
from text import keras_detect_type as detect
File "/Users/xianglingyun/invoice_ocr/invoice-master/text/keras_detect_type.py", line 19, in <module>
sess = K.get_session()

Can you tell me how to fix it?

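Data annotation question

Hello,
When I run prediction with the YOLO3 model you provided, the predicted boxes look like the image below:
[image: example]
During the data annotation stage before training, which parts of the invoice are labeled as the YOLO3 detection targets?
I looked through chinese-ocr, which is mentioned in other issues, but I still don't fully understand. Could you explain in detail how the data was annotated?

Thanks!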

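Hello, I have two questions

1. Now that I have it running, memory usage is very high, around 4 GB.
2. Recognition is fairly slow, about 5 s. After commenting out the invoice-type recognition code it got a bit faster, about 3 s.
I have only just started learning recognition. How can these two problems be solved? Thanks.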

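Was this fine-tuned from chineseocr last year?

I saw a comment saying this only recognizes 5 regions?

Did you do this by fine-tuning? How much fine-tuning data did you use for the detection part and the OCR part, respectively?

PS: I can't believe I ran into someone else with the surname Guan, hahaha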

corrupted size vs. prev_size

corrupted size vs. prev_size
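Aborted (core dumped)

I used the invoices in the test-invoice folder.
As soon as I upload an invoice, the error above is raised and the program exits.

The full output is as follows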
[{'text': '天津增值税电子普通发票', 'cx': 451.0, 'cy': 80.5, 'w': 309.0616503223912, 'h': 25.99999999999999, 'degree': 0.18592417856631746}]
['电子普通']
['普通发票']
[]
corrupted size vs. prev_size
Aborted (core dumped)

have a problem

question: cv2.error: OpenCV(3.4.3) C:\projects\opencv-python\opencv\modules\dnn\src\darknet\darknet_importer.cpp:207: error: (-212:Parsing error) Failed to parse NetParameter file: D:\新建文件夹\invoice\models\text.cfg in function 'cv::dnn::experimental_dnn_34_v7::readNetFromDarknet'

Python 3.6.7 + opencv-python 3.4.3.18
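One possible cause, not confirmed in this report, is the non-ASCII folder name in the path: OpenCV builds on Windows are known to have trouble opening files whose paths contain non-ASCII characters. The hypothetical helper below lists the offending path components so the project can be moved to an ASCII-only location and retried.

```python
def non_ascii_parts(path: str):
    """Return the path components that contain non-ASCII characters."""
    parts = [p for p in path.replace("\\", "/").split("/") if p]
    return [p for p in parts if any(ord(ch) > 127 for ch in p)]
```

For the path in the error message this returns the single folder `新建文件夹`; an empty list would mean the path itself is not the problem and the `.cfg` contents would be the next thing to check.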

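YOLO3 region detection question

Hello, how does the YOLO3 region detection manage to detect only the 5 target text regions? Could I add you on WeChat to discuss?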

No such file or directory

FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\ZKRH001\Desktop\invoice-master\models\ocr-lstm.pth'

Traceback (most recent call last):
File "C:\Users\ZKRH001\Desktop\invoice-master\app.py", line 14, in <module>
from model_post_type import ocr as OCR
File "C:\Users\ZKRH001\Desktop\invoice-master\model_post_type.py", line 10, in <module>
from crnn.crnn_torch import crnnOcr as crnnOcr ## torch version of OCR
File "C:\Users\ZKRH001\Desktop\invoice-master\crnn\crnn_torch.py", line 38, in <module>
model,converter = crnnSource()
File "C:\Users\ZKRH001\Desktop\invoice-master\crnn\crnn_torch.py", line 26, in crnnSource
trainWeights = torch.load(ocrModel,map_location=lambda storage, loc: storage)
File "C:\Users\ZKRH001\AppData\Roaming\Python\Python39\site-packages\torch\serialization.py", line 594, in load
with _open_file_like(f, 'rb') as opened_file:
File "C:\Users\ZKRH001\AppData\Roaming\Python\Python39\site-packages\torch\serialization.py", line 230, in _open_file_like
return _open_file(name_or_buffer, mode)
File "C:\Users\ZKRH001\AppData\Roaming\Python\Python39\site-packages\torch\serialization.py", line 211, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\ZKRH001\Desktop\invoice-master\models\ocr-lstm.pth'
Press any key to continue . . .

The project won't run

Has anyone managed to get this running? TensorFlow version 1.14.0, Python version 3.6.7.
Error message:
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_368' with dtype float and shape [2]
[[{{node Placeholder_368}}]]

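How is the recognition region trained?

Compared with chineseocr, the author's text detection also seems to include detection of the recognition regions. How did the author annotate and train these regions? Thanks.

Error says the variables in the yolo3 part are not initialized. How can I fix this?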

Traceback (most recent call last):
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask\app.py", line 2091, in __call__
return self.wsgi_app(environ, start_response)
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask\app.py", line 2076, in wsgi_app
response = self.handle_exception(e)
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask_cors\extension.py", line 165, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask\app.py", line 2073, in wsgi_app
response = self.full_dispatch_request()
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask\app.py", line 1518, in full_dispatch_request
rv = self.handle_user_exception(e)
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask_cors\extension.py", line 165, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask\app.py", line 1516, in full_dispatch_request
rv = self.dispatch_request()
File "D:\Anacaonda\envs\invoice-master\lib\site-packages\flask\app.py", line 1502, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "C:\Users\HP\Desktop\invoice-master\app.py", line 110, in invoice_ocr
Recognition_invoice = Recognition_invoice(whole_path)
File "C:\Users\HP\Desktop\invoice-master\app.py", line 81, in Recognition_invoice
result_type = OCR(img1)
File "C:\Users\HP\Desktop\invoice-master\model_post_type.py", line 156, in ocr
alph=0.01, ## factor by which each detected text line is extended to the right and left
File "C:\Users\HP\Desktop\invoice-master\model_post_type.py", line 124, in model
text_recs = text_detect(**config) ## text detection
File "C:\Users\HP\Desktop\invoice-master\model_post_type.py", line 45, in text_detect
boxes, scores = detect.text_detect(np.array(img))
File "C:\Users\HP\Desktop\invoice-master\text\keras_detect_type.py", line 57, in text_detect
K.learning_phase(): 0
File "C:\Users\HP\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 958, in run
run_metadata_ptr)
File "C:\Users\HP\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1181, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\HP\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1359, in _do_run
run_metadata)
File "C:\Users\HP\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable conv2d_74/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/conv2d_74/kernel)
[[node conv2d_74/Conv2D/ReadVariableOp (defined at C:\Users\HP\Desktop\invoice-master\text\keras_yolo3.py:49) ]]

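Request for model training code

Hello, I am studying this OCR project and would like to get the training code for all of the models. Would it be convenient to share it?
My WeChat: wsh_2766659938
Email: [email protected]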

Dataset

Could you provide part of the dataset? I would like to try training with the code. Thanks.

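How to annotate for detection

Hello, could you share how you annotated the data for detection training? If convenient, could I add you on WeChat?

In Postman, do I put file as the Key in the Body and select the image as the Value? It returns the following error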

Error Message
Network Access Message: The page cannot be displayed
Explanation: The request timed out before the page could be retrieved.

Try the following:
  • Refresh page: Search for the page again by clicking the Refresh button. The timeout may have occurred due to Internet congestion.
  • Check spelling: Check that you typed the Web page address correctly. The address may have been mistyped.
  • Contact website: You may want to contact the website administrator to make sure the Web page still exists. You can do this by using the e-mail address or phone number listed on the website home page.
If you are still not able to view the requested page, try contacting your administrator or Helpdesk.

Technical Information (for support personnel)
  • Error Code 10060: Connection timeout
  • Background: The gateway could not receive a timely response from the website you are trying to access. This might indicate that the network is congested, or that the website is experiencing technical difficulties.
  • Date: 7/17/2020 7:59:50 AM [GMT]
  • Server: HKUTMPWV004.cn.asia.ad.pwcinternal.com
  • Source: Firewall

Error running the dependabot/pip/tensorflow-1.15.2 branch

Is version 1.15.2 supported?

Traceback (most recent call last):
File "app.py", line 14, in <module>
from model_post_type import ocr as OCR
File "/home/test/invoice/model_post_type.py", line 32, in <module>
from text import keras_detect_type as detect
File "/home/test/invoice/text/keras_detect_type.py", line 23, in <module>
box_score = box_layer([*textModel.output,image_shape,input_shape],anchors, num_classes)
File "/home/test/invoice/text/keras_yolo3.py", line 366, in box_layer
boxes = concatenate(boxes, axis=0)
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/merge.py", line 705, in concatenate
return Concatenate(axis=axis, **kwargs)(inputs)
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 802, in __call__
base_layer_utils.create_keras_history(inputs)
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 184, in create_keras_history
_, created_layers = _create_keras_history_helper(tensors, set(), [])
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 231, in _create_keras_history_helper
layer_inputs, processed_ops, created_layers)
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 231, in _create_keras_history_helper
layer_inputs, processed_ops, created_layers)
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 231, in _create_keras_history_helper
layer_inputs, processed_ops, created_layers)
[Previous line repeated 2 more times]
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 229, in _create_keras_history_helper
constants[i] = backend.function([], op_input)([])
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
run_metadata=self.run_metadata)
File "/test/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_367' with dtype float and shape [2]
[[{{node Placeholder_367}}]]
