Summary of your issue I have the same table continuing on multiple

Repeating headers about tabula-py HOT 4 CLOSED

chezou commented on July 28, 2024

Repeating headers

from tabula-py.

Comments (4)

chezou commented on July 28, 2024

According to this test case, the header could be multiline. https://github.com/tabulapdf/tabula-java/blob/0af623a6e0af756f71af417e3c9c5731e9f3f6bf/src/test/java/technology/tabula/TestBasicExtractor.java#L375
So I think defining "header" is a little bit tough.

Currently, you can work around repeating headers by extracting each page separately and dropping the header rows, but that is likely to be inefficient. If you want efficient extraction, could you please file an issue on tabula-java?
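For anyone landing here, a minimal sketch of that per-page workaround might look like the following. It is not part of tabula-py: `merge_pages`, `read_pages`, and the `header_rows` parameter are hypothetical names made up for illustration. It assumes each page comes back as its own table, that the header is the first `header_rows` rows of a page (so a multiline header is handled by passing a larger number), and that tabula-py 2.x or later is installed so `read_pdf` returns a list of DataFrames.

```python
from typing import List, Sequence

Row = List[str]


def merge_pages(pages: Sequence[Sequence[Row]], header_rows: int = 1) -> List[Row]:
    """Concatenate per-page tables, keeping the header block only once.

    The first non-empty page defines the header (its first `header_rows` rows).
    On later pages the leading rows are dropped only if they match that header
    exactly; otherwise the page is kept as-is.
    """
    merged: List[Row] = []
    header: List[Row] = []
    for table in pages:
        rows = [list(r) for r in table]
        if not rows:
            continue  # nothing extracted from this page
        if not header:
            header = rows[:header_rows]
            merged.extend(rows)
        elif rows[:header_rows] == header:
            merged.extend(rows[header_rows:])  # repeated header: drop it
        else:
            merged.extend(rows)  # page starts with data, keep everything
    return merged


def read_pages(path: str) -> List[List[Row]]:
    """Hypothetical helper: read every table as raw rows (needs tabula-py and Java)."""
    import tabula  # imported lazily so merge_pages works without it

    # header=None keeps the header line as an ordinary first row of each table.
    frames = tabula.read_pdf(path, pages="all", pandas_options={"header": None})
    return [df.fillna("").astype(str).values.tolist() for df in frames]
```

Usage would be something like `rows = merge_pages(read_pages("report.pdf"), header_rows=2)` for a two-line header, where `report.pdf` is a placeholder. The exact-match check is deliberately strict: if the extracted header text varies slightly between pages, the comparison would need to be loosened.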

pranali3215 commented on July 28, 2024

Hi,

Thanks for the reply.
I agree that the current method is inefficient.

I have raised an issue with tabula-java for the same.
Here is the link: tabulapdf/tabula-java#186

raghurammanyam commented on July 28, 2024

When I convert a PDF with multiple pages using tabula, the data is not read from all of the pages.

chezou commented on July 28, 2024

Closing this issue since the tabula-java issue was closed.

I'm neutral on this feature and currently have no plans to implement it, but it would be nice to have documentation for it, and PRs are welcome.
