ocr_tablenet's People

Contributors

agnes-u, tomassosorio

ocr_tablenet's Issues

Error when loading landscape image

Hi,

I am getting an error when trying to load an image in landscape mode. Image dimensions: 2128 × 1695

ValueError: operands could not be broadcast together with shapes (896,896,4) (3,) (896,896,4)

Any thoughts?

how to use it to train a model on my own dataset?

Thank you very much for your work; I am very interested in it. However, when I tried to use this model to train on my own dataset, I ran into some difficulties and hope you can help.
My dataset is in the form of CSV files, like the CSV datasets mentioned in https://github.com/yhenon/pytorch-retinanet#annotations-format:

CSV datasets
The CSVGenerator provides an easy way to define your own datasets. It uses two CSV files: one file containing annotations and one file containing a class name to ID mapping.

Annotations format
The CSV file with annotations should contain one annotation per line. Images with multiple bounding boxes should use one row per bounding box. Note that indexing for pixel values starts at 0. The expected format of each line is:

path/to/image.jpg,x1,y1,x2,y2,class_name

Some images may not contain any labeled objects. To add these images to the dataset as negative examples, add an annotation where x1, y1, x2, y2 and class_name are all empty:

path/to/image.jpg,,,,,

A full example:

/data/imgs/img_001.jpg,837,346,981,456,cow
/data/imgs/img_002.jpg,215,312,279,391,cat
/data/imgs/img_002.jpg,22,5,89,84,bird
/data/imgs/img_003.jpg,,,,,

This defines a dataset with 3 images. img_001.jpg contains a cow. img_002.jpg contains a cat and a bird. img_003.jpg contains no interesting objects/animals.

Class mapping format
The class name to ID mapping file should contain one mapping per line. Each line should use the following format:

class_name,id

Indexing for classes starts at 0. Do not include a background class as it is implicit.

For example:

cow,0
cat,1
bird,2
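
For reference, the two CSV files described above can be read with the standard library alone. This is a minimal sketch of the quoted format only; the function names load_annotations and load_classes are hypothetical, and it does not show how to feed the result into this repo's training code, which the thread does not cover.

```python
import csv
import io
from collections import defaultdict


def load_annotations(csv_file):
    """Map each image path to its list of (x1, y1, x2, y2, class_name) boxes."""
    boxes = defaultdict(list)
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        path, x1, y1, x2, y2, class_name = row
        boxes[path]  # register the image even when it has no boxes
        if class_name:  # all-empty fields mark a negative example
            boxes[path].append((int(x1), int(y1), int(x2), int(y2), class_name))
    return dict(boxes)


def load_classes(csv_file):
    """Map each class name to its integer ID."""
    return {row[0]: int(row[1]) for row in csv.reader(csv_file) if row}


# The example rows from the format description, held in memory.
annotation_rows = io.StringIO(
    "/data/imgs/img_001.jpg,837,346,981,456,cow\n"
    "/data/imgs/img_002.jpg,215,312,279,391,cat\n"
    "/data/imgs/img_002.jpg,22,5,89,84,bird\n"
    "/data/imgs/img_003.jpg,,,,,\n"
)
class_rows = io.StringIO("cow,0\ncat,1\nbird,2\n")

annotations = load_annotations(annotation_rows)
classes = load_classes(class_rows)
print(len(annotations), "images,", len(classes), "classes")
```

With real files, pass an open file object instead of the StringIO buffers, for example open(path, newline="").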

Whether the wireless table can be detected

Excuse me, I have two questions:
I only want to detect tables, without recognizing the table structure. What should I do?
Can wireless tables be detected?

How to process png image

First of all, thank you for your wonderful work!
I want to process a PNG file with this repo, but around line 50 of predict.py it raises ValueError: operands could not be broadcast together with shapes (896,896) (3,) (896,896). What should I do to avoid this?
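
The shapes in this error suggest that a 2-D (single-channel) array is meeting a 3-element per-channel mean during normalization. Converting the image to RGB before turning it into an array, as another issue in this list reports doing, gives every input three channels. The sketch below assumes Pillow and NumPy are installed; load_as_rgb_array is a hypothetical helper, not part of predict.py, and the two images are synthetic stand-ins for a grayscale PNG and a PNG with an alpha channel.

```python
import numpy as np
from PIL import Image


def load_as_rgb_array(image):
    """Return an (H, W, 3) array regardless of the source image mode."""
    return np.array(image.convert("RGB"))


# Synthetic stand-ins: mode "L" is grayscale, mode "RGBA" has an alpha channel.
gray = Image.new("L", (8, 6), color=128)
rgba = Image.new("RGBA", (8, 6), color=(10, 20, 30, 255))

gray_shape = load_as_rgb_array(gray).shape
rgba_shape = load_as_rgb_array(rgba).shape
print(gray_shape, rgba_shape)
```

In practice the same conversion would be applied to the result of Image.open(image_path) before the transforms run.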

Python version

Which Python version is needed to be compatible with all the required modules?

Import error

Hi there @tomassosorio

Quick introduction: I need to extract data from PDF/images containing tables. Unfortunately, I have several different formats and traditional tools (PDFPlumber, Tabula, Camelot) do not seem to work for every possible format.
So now I'm trying a DL approach, and looking for some TableNet implementation code I found this repo.

I'm trying to use your code on Google Colab, but unfortunately I have not been able to make it work. Note that I have very little experience with DL libraries, so I apologise if my question is trivial.

Anyway, here's my code:

# Mount drive
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

!pip install -r  /content/drive/MyDrive/TableNet/requirements.txt

!python /content/drive/MyDrive/TableNet/predict.py --model_weights='/content/drive/MyDrive/TableNet/best_model.ckpt' --image_path='/content/drive/MyDrive/TableNet/TablesImages/Test_table.png'

This is the error I get:

Traceback (most recent call last):
  File "/content/drive/MyDrive/TableNet/predict.py", line 19, in <module>
    from tablenet import TableNetModule
  File "/content/drive/MyDrive/TableNet/tablenet/__init__.py", line 3, in <module>
    from .marmot import MarmotDataModule
  File "/content/drive/MyDrive/TableNet/tablenet/marmot.py", line 7, in <module>
    import pytorch_lightning as pl
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/__init__.py", line 66, in <module>
    from pytorch_lightning import metrics
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.metric import Metric
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/metric.py", line 23, in <module>
    from pytorch_lightning.metrics.utils import _flatten, dim_zero_cat, dim_zero_mean, dim_zero_sum
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/utils.py", line 18, in <module>
    from pytorch_lightning.utilities import rank_zero_warn
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/__init__.py", line 24, in <module>
    from pytorch_lightning.utilities.apply_func import move_data_to_device
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/apply_func.py", line 25, in <module>
    from torchtext.data import Batch
  File "/usr/local/lib/python3.7/dist-packages/torchtext/__init__.py", line 6, in <module>
    from . import experimental
  File "/usr/local/lib/python3.7/dist-packages/torchtext/experimental/__init__.py", line 2, in <module>
    from . import transforms
  File "/usr/local/lib/python3.7/dist-packages/torchtext/experimental/transforms.py", line 4, in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
ImportError: /usr/local/lib/python3.7/dist-packages/torchtext/_torchtext.so: undefined symbol: _ZNK3c104Type14isSubtypeOfExtERKSt10shared_ptrIS0_EPSo

I have to admit I have no idea what is causing the error. Could you please help me?

Thanks a lot and great work!
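
An undefined symbol inside a compiled extension such as _torchtext.so often means the extension was built against a different torch version than the one installed. That is an assumption here, not something confirmed in this thread. A first step is to list the installed versions and compare them with requirements.txt. The sketch below uses importlib.metadata, which needs Python 3.8 or newer (the traceback above shows 3.7, where pip show gives the same information); installed_versions is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(package_names):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["torch", "torchtext", "pytorch-lightning"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```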

A Bug?

Hi, I am testing this model on regular images that I took at random from papers.
However, I always encounter this error:

ValueError: operands could not be broadcast together with shapes (896,896,4) (3,) (896,896,4)

I figured out that this may come from Normalize, so I changed the input to

image = Image.open(image_path).convert("RGB")

and it works. Maybe this could be a fix?

Suggest to loosen the dependency on albumentations

Hi, your project OCR_tablenet (commit id: 5f4d781) requires "albumentations==0.5.2" in its dependencies. After analyzing the source code, we found that albumentations 0.5.1 is also suitable, since none of the functions that you use directly (7 APIs: albumentations.pytorch.transforms.ToTensorV2.__init__, albumentations.augmentations.transforms.Resize.__init__, albumentations.core.composition.Compose.__init__, albumentations.augmentations.transforms.RandomResizedCrop.__init__, albumentations.augmentations.transforms.VerticalFlip.__init__, albumentations.augmentations.transforms.HorizontalFlip.__init__, albumentations.augmentations.transforms.Normalize.__init__) or indirectly (propagating to 12 of albumentations' internal APIs and 0 outside APIs) from the package have changed between these versions, so your usage is not affected.

Therefore, we believe that it is quite safe to loosen your dependency on albumentations from "albumentations==0.5.2" to "albumentations>=0.5.1,<=0.5.2". This will improve the applicability of OCR_tablenet and reduce the possibility of dependency conflicts with other projects.

May I open a pull request to loosen the dependency on albumentations?

By the way, could you please tell us whether such an automatic tool for dependency analysis would be helpful for maintaining dependencies during your development?

add license

Please add license information to the project.

Error: invalid load key, '<'

I am using Google Colab. Upon running python predict.py, I am getting the following error:

Traceback (most recent call last):
  File "predict.py", line 142, in <module>
    predict()
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "predict.py", line 135, in predict
    pred = Predict(model_weights, transforms)
  File "predict.py", line 38, in __init__
    self.model = TableNetModule.load_from_checkpoint(checkpoint_path)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/core/saving.py", line 137, in load_from_checkpoint
    checkpoint = pl_load(checkpoint_path, map_location=lambda storage, loc: storage)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/utilities/cloud_io.py", line 32, in load
    return torch.load(f, map_location=map_location)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 764, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

Also, I am unable to install many of the dependencies in requirements.txt using pip 21.0.1.
For example, numpy 1.20.0 is not available.
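
A load key of '<' means the first byte of the file handed to the unpickler was the character "<", which is what an HTML or XML page starts with. One possible cause, assumed here rather than confirmed in this thread, is that the download saved a web page instead of the checkpoint. The sketch below inspects the first bytes of a file; sniff_checkpoint and sniff_checkpoint_file are hypothetical helpers and only a heuristic.

```python
def sniff_checkpoint(head):
    """Guess what a file is from its first bytes (a heuristic, not a validator)."""
    if head.startswith(b"PK\x03\x04"):
        return "zip"  # archive format written by newer torch.save
    if head.startswith(b"\x80"):
        return "pickle"  # legacy torch.save files begin with a pickle stream
    if head.lstrip().startswith(b"<"):
        return "markup"  # HTML or XML, probably not a checkpoint
    return "unknown"


def sniff_checkpoint_file(path, size=64):
    """Read the first bytes of a file on disk and classify them."""
    with open(path, "rb") as handle:
        return sniff_checkpoint(handle.read(size))


print(sniff_checkpoint(b"<!DOCTYPE html><html>"))
```

If sniff_checkpoint_file reports "markup" for the .ckpt file, opening the file in a text editor usually shows what was actually downloaded.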

How to improve recognition on images outside the training set?

Thanks for your project!
Table position detection performs well. But when I test with images from outside the training set, table information extraction with pytesseract OCR is very poor. Maybe there is a problem with my image format?

unable to predict on an image using "predict.py".

Traceback (most recent call last):
  File "predict.py", line 142, in <module>
    predict()
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "predict.py", line 135, in predict
    pred = Predict(model_weights, transforms)
  File "predict.py", line 38, in __init__
    self.model = TableNetModule.load_from_checkpoint('/content/best_model.ckpt')
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/saving.py", line 137, in load_from_checkpoint
    checkpoint = pl_load(checkpoint_path, map_location=lambda storage, loc: storage)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/cloud_io.py", line 32, in load
    return torch.load(f, map_location=map_location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 587, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 242, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory
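
The message says the file was opened as a zip archive but its central directory, which sits at the end of the archive, could not be found. An incomplete upload or download is one plausible cause; this is an assumption, as the thread does not confirm it. The standard library can run a similar check without torch. In the sketch below, looks_like_complete_zip is a hypothetical helper and the archive is built in memory purely for illustration.

```python
import io
import zipfile


def looks_like_complete_zip(file_or_path):
    """True if the end-of-central-directory record of a zip archive is found."""
    return zipfile.is_zipfile(file_or_path)


# Build a small in-memory archive, then cut it short to mimic a partial download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.txt", "x" * 1000)
complete = buffer.getvalue()
truncated = complete[: len(complete) // 2]

complete_ok = looks_like_complete_zip(io.BytesIO(complete))
truncated_ok = looks_like_complete_zip(io.BytesIO(truncated))
print(complete_ok, truncated_ok)
```

Calling looks_like_complete_zip("/content/best_model.ckpt") would apply the same check to the checkpoint path from the traceback; comparing the file size with the size of the original is another quick test.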

error while predicting with default image

All the requirements are installed, but this error is raised:
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
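
pytesseract is a Python wrapper around the separate tesseract executable, so installing the Python requirements does not install the OCR engine itself. The sketch below checks whether the executable is reachable from the current environment; find_executable is a hypothetical helper built on shutil.which.

```python
import shutil


def find_executable(name="tesseract"):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


tesseract_path = find_executable()
if tesseract_path is None:
    print("tesseract not found on PATH; install the Tesseract OCR engine itself")
else:
    print(f"tesseract found at {tesseract_path}")

# If the binary is installed somewhere outside PATH, pytesseract can be pointed
# at it explicitly (the path below is a placeholder, not a real location):
# pytesseract.pytesseract.tesseract_cmd = "/path/to/tesseract"
```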

SList Object to Pandas DataFrame Conversion

Hi, I get an "SList" object after predicting on my test images. I need to convert it to CSV or a pandas DataFrame. Can anybody suggest how I can do it?

Thanks in advance.

RuntimeError: CUDA out of memory.

Tried to allocate 1.91 GiB (GPU 0; 7.93 GiB total capacity; 5.33 GiB already allocated; 1.91 GiB free; 5.36 GiB reserved in total by PyTorch)

This happens when I try to run python train.py.

Entry point not found - torchvision

Hi,

I get a pop-up error when running py predict.py:

"The procedure entry point
torchCheckFail@detail[....] could not be located in the dynamic link library C:\Users[...]\tablenetVirtualEnv\Lib\site-packages\torchvision_C.pyd.
"

It may be linked to the fact that I cannot install torchvision 0.8.2; pip doesn't find it.
I can't find a Windows build of that version either (I am using Windows 10, Python 3.9.2): https://pypi.org/project/torchvision/0.8.2/#files

So I tried 0.9.2, but then this pop-up appears. In the end I can dismiss it and the extraction still runs:

py predict.py
[ 0 1 2 3 4 5
0 Wellname ‘Toss (ft) LOT (psi) ‘Sv (psi) ‘Sv-LOT (psi) Normalized
1 117-4 4265 2785 2871 116 LoTisv
2 117-4 631 4465 4782 216 0.96
3 117-4 7615 5690 5045 246 0.94
4 117-4 9528 7439 m2 333 0.96
5 172 414 2042 089 247 0.96
6 172 821 4684 sigt 505 0.92
7 172 6361 6815 464 0.90
8 172 8550 10368 10150 218 0.93
9 Popeye 11982 aria 2828 247 1.02
10 Average 4491 4742 4901 16 0.95
11 161-4 6778 6075 7105 159 0.96
12 161-4 2639 2784 130 0.97
13 161-4 9154 27a 2857 145 0.98
14 205-2 4442 5989 6105 16 0.95
15 205-2 4518 6685 6786 16 0.96
16 205-2 8091 6670 eats 401 0.98
17 205-2 9812 2087 3103 145 0.99
18 205-2 8845 3085 161 16 0.98
19 205-42 4806 4597 arta 16 0.96
20 205.414 4862 4713 4829 16 0.96
21 205.14 6588 16 0.97
22 205.414 NaN NaN 131 0.98
23 Genesis 6703 NaN NaN 0.97
24 Average NaN NaN NaN
25 NaN NaN NaN NaN NaN]

How could I remove that problem?

encounter ModuleNotFoundError: No module named 'tensorboard' when running in Conda virtual environment

I use Anaconda to manage virtual environments and ran the package in one to avoid package conflicts.
I then encountered the module-not-found error.
I checked with pip list and found that the package is installed.
That means tensorboard is installed but cannot be found inside the conda environment.
Later I figured out that some additional commands are needed when running the script in a conda virtual environment:
$ conda install -c conda-forge tensorboard==2.4.1
$ conda install -y -c conda-forge protobuf==3.14.0

Reference:
https://stackoverflow.com/questions/61320572/modulenotfounderror-no-module-named-tensorboard
https://stackoverflow.com/questions/58686400/can-not-get-pytorch-working-with-tensorboard
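
When pip list shows a package that Python cannot import, one thing worth checking is whether pip and the running interpreter belong to the same environment. The sketch below prints the interpreter in use and whether it can locate a module; module_visible is a hypothetical helper, and an environment mismatch is only one possible explanation for the error described above.

```python
import importlib.util
import sys


def module_visible(name):
    """True if the running interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


print("interpreter:", sys.executable)
print("tensorboard visible:", module_visible("tensorboard"))
```

Running it with the same python command used for the script shows which environment that command actually resolves to.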

ValueError: operands could not be broadcast together with shapes (896,896,4) (3,) (896,896,4)

I am testing the model with different cases, and something strange happens when I feed it a .png screenshot of a table (sample attached):
sample_table2
I run the command
python predict.py --model_weights='./tablenet_pretrained.ckpt' --image_path='./sample_table2.png'

and it gives me a ValueError: operands could not be broadcast together with shapes (896,896,4) (3,) (896,896,4)

Here is the full error log:

  File "/home/bluespinach/Documents/projects/tool_exp/OCR_tablenet/predict.py", line 148, in <module>
    predict()
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/bluespinach/Documents/projects/tool_exp/OCR_tablenet/predict.py", line 144, in predict
    print(pred.predict(image))
  File "/home/bluespinach/Documents/projects/tool_exp/OCR_tablenet/predict.py", line 50, in predict
    processed_image = self.transforms(image=np.array(image))["image"]
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/albumentations/core/composition.py", line 182, in __call__
    data = t(force_apply=force_apply, **data)
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/albumentations/core/transforms_interface.py", line 89, in __call__
    return self.apply_with_params(params, **kwargs)
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/albumentations/core/transforms_interface.py", line 102, in apply_with_params
    res[key] = target_function(arg, **dict(params, **target_dependencies))
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/albumentations/augmentations/transforms.py", line 1496, in apply
    return F.normalize(image, self.mean, self.std, self.max_pixel_value)
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/albumentations/augmentations/functional.py", line 141, in normalize
    img -= mean
ValueError: operands could not be broadcast together with shapes (896,896,4) (3,) (896,896,4) 

Strangely, if I input a .png file converted from a PDF page, it works well. Here is my test sample, the PDF page:
output_page_0

Could you help explain why the smaller table screenshot doesn't work?
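
The trailing 4 in the shape (896,896,4) points to a four-channel image, which would be consistent with the screenshot having been saved with an alpha channel while the PDF-derived page was saved as plain RGB. That is an assumption about these particular files. The colour type of a PNG is stored in its IHDR chunk and can be read with the standard library; png_color_type and fake_header below are hypothetical helpers, and the headers are synthetic.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {
    0: "grayscale",
    2: "RGB",
    3: "palette",
    4: "grayscale+alpha",
    6: "RGBA",
}


def png_color_type(header):
    """Name the colour type stored in the IHDR chunk of a PNG header."""
    if len(header) < 26 or not header.startswith(PNG_SIGNATURE) or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Layout: 8-byte signature, 4-byte length, "IHDR", width, height,
    # bit depth (byte 24), colour type (byte 25).
    return COLOR_TYPES.get(header[25], "unknown")


def fake_header(color_type):
    """Build a synthetic 896x896, 8-bit PNG header for demonstration."""
    ihdr = struct.pack(">IIBBBBB", 896, 896, 8, color_type, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr


print(png_color_type(fake_header(6)), png_color_type(fake_header(2)))
```

For a real file, pass the first 26 bytes, for example open("./sample_table2.png", "rb").read(26).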

Thank you very much!

Best regards,
JKyang01
