
Comments (4)

chezou commented on July 28, 2024

Thanks for creating an issue.

If you want to apply the same options to all pages, I would suggest calling tabula.template.load_template directly and overriding the pages attribute of the returned option.

Here is the example:

>>> import tabula
>>> fname = "./tests/resources/data.tabula-template.json"
>>> o = tabula.template.load_template(fname)
>>> o
[TabulaOption(pages=1, guess=False, area=[124.0, 154.0, 531.745, 565.57], relative_area=False, lattice=False, stream=True, password=None, silent=None, columns=None, relative_columns=False, format=None, batch=None, output_path=None, options='', multiple_tables=True), TabulaOption(pages=2, guess=True, area=[[123.999, 154.0, 210.444, 453.88], [410.996, 154.0, 497.441, 487.54]], relative_area=False, lattice=False, stream=False, password=None, silent=None, columns=None, relative_columns=False, format=None, batch=None, output_path=None, options='', multiple_tables=True), TabulaOption(pages=3, guess=True, area=[123.999, 154.0, 322.899, 235.855], relative_area=False, lattice=False, stream=False, password=None, silent=None, columns=None, relative_columns=False, format=None, batch=None, output_path=None, options='', multiple_tables=True)]
>>> o[0]
TabulaOption(pages=1, guess=False, area=[124.0, 154.0, 531.745, 565.57], relative_area=False, lattice=False, stream=True, password=None, silent=None, columns=None, relative_columns=False, format=None, batch=None, output_path=None, options='', multiple_tables=True)
>>> o[0].pages
1
>>> o[0].pages="all"
>>> tabula.read_pdf(pdf_path, options=" ".join(o[0].build_option_list()))
'pages' argument isn't specified.Will extract only from page 1 by default.
Got stderr: Aug. 22, 2023 9:08:52 P.M. org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadDiskCache
WARNING: New fonts found, font cache will be re-built
Aug. 22, 2023 9:08:52 P.M. org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
WARNING: Building on-disk font cache, this may take a while
Aug. 22, 2023 9:08:53 P.M. org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
WARNING: Finished building on-disk font cache, found 808 fonts

[             Unnamed: 0   mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
0             Mazda RX4  21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
1         Mazda RX4 Wag  21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
2            Datsun 710  22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
3        Hornet 4 Drive  21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
4     Hornet Sportabout  18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2
5               Valiant  18.1    6  225.0  105  2.76  3.460  20.22   1   0     3     1
6            Duster 360  14.3    8  360.0  245  3.21  3.570  15.84   0   0     3     4
7             Merc 240D  24.4    4  146.7   62  3.69  3.190  20.00   1   0     4     2
8              Merc 230  22.8    4  140.8   95  3.92  3.150  22.90   1   0     4     2
9              Merc 280  19.2    6  167.6  123  3.92  3.440  18.30   1   0     4     4
10            Merc 280C  17.8    6  167.6  123  3.92  3.440  18.90   1   0     4     4
11           Merc 450SE  16.4    8  275.8  180  3.07  4.070  17.40   0   0     3     3
12           Merc 450SL  17.3    8  275.8  180  3.07  3.730  17.60   0   0     3     3
13          Merc 450SLC  15.2    8  275.8  180  3.07  3.780  18.00   0   0     3     3
14   Cadillac Fleetwood  10.4    8  472.0  205  2.93  5.250  17.98   0   0     3     4
15  Lincoln Continental  10.4    8  460.0  215  3.00  5.424  17.82   0   0     3     4
16    Chrysler Imperial  14.7    8  440.0  230  3.23  5.345  17.42   0   0     3     4
17             Fiat 128  32.4    4   78.7   66  4.08  2.200  19.47   1   1     4     1
18          Honda Civic  30.4    4   75.7   52  4.93  1.615  18.52   1   1     4     2
19       Toyota Corolla  33.9    4   71.1   65  4.22  1.835  19.90   1   1     4     1
20        Toyota Corona  21.5    4  120.1   97  3.70  2.465  20.01   1   0     3     1
21     Dodge Challenger  15.5    8  318.0  150  2.76  3.520  16.87   0   0     3     2
22          AMC Javelin  15.2    8  304.0  150  3.15  3.435  17.30   0   0     3     2
23           Camaro Z28  13.3    8  350.0  245  3.73  3.840  15.41   0   0     3     4
24     Pontiac Firebird  19.2    8  400.0  175  3.08  3.845  17.05   0   0     3     2
25            Fiat X1-9  27.3    4   79.0   66  4.08  1.935  18.90   1   1     4     1
26        Porsche 914-2  26.0    4  120.3   91  4.43  2.140  16.70   0   1     5     2
27         Lotus Europa  30.4    4   95.1  113  3.77  1.513  16.90   1   1     5     2
28       Ford Pantera L  15.8    8  351.0  264  4.22  3.170  14.50   0   1     5     4
29         Ferrari Dino  19.7    6  145.0  175  3.62  2.770  15.50   0   1     5     6
30        Maserati Bora  15.0    8  301.0  335  3.54  3.570  14.60   0   1     5     8
31           Volvo 142E  21.4    4  121.0  109  4.11  2.780  18.60   1   1     4     2,    Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
5           5.4          3.9           1.7          0.4  setosa,    Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
0         145           6.7          3.3           5.7          2.5  virginica
1         146           6.7          3.0           5.2          2.3  virginica
2         147           6.3          2.5           5.0          1.9  virginica
3         148           6.5          3.0           5.2          2.0  virginica
4         149           6.2          3.4           5.4          2.3  virginica
5         150           5.9          3.0           5.1          1.8  virginica,      len supp  dose
0    4.2   VC   0.5
1   11.5   VC   0.5
2    7.3   VC   0.5
3    5.8   VC   0.5
4    6.4   VC   0.5
5   10.0   VC   0.5
6   11.2   VC   0.5
7   11.2   VC   0.5
8    5.2   VC   0.5
9    7.0   VC   0.5
10  16.5   VC   1.0
11  16.5   VC   1.0
12  15.2   VC   1.0
13  17.3   VC   1.0
14  22.5   VC   1.0]

Of course, there is room for improvement, such as letting tabula.read_pdf accept a TabulaOption directly, but before that, I'd love to hear your feedback.
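
The workaround from the session above could be wrapped in a small helper. This is only a sketch under assumptions: the names all_pages_option_strings and read_all_pages are hypothetical, and each loaded option is assumed to expose a writable pages attribute and build_option_list(), as the session shows. It copies each option so the loaded template itself is not modified.

```python
import copy


def all_pages_option_strings(template_options):
    """Return one option string per template entry, with pages forced to "all".

    `template_options` is assumed to be the list returned by
    tabula.template.load_template (hypothetical helper, not part of tabula-py).
    """
    strings = []
    for option in template_options:
        option = copy.copy(option)  # shallow copy: keep the loaded template intact
        option.pages = "all"        # same override as in the session above
        strings.append(" ".join(option.build_option_list()))
    return strings


def read_all_pages(pdf_path, template_path):
    """Apply every template entry to all pages of the PDF and collect the tables."""
    import tabula  # imported lazily; needs tabula-py and a Java runtime

    options = tabula.template.load_template(template_path)
    tables = []
    for option_string in all_pages_option_strings(options):
        # Same call shape as in the session above.
        tables.extend(tabula.read_pdf(pdf_path, options=option_string))
    return tables
```

Because every area in the template is then tried on every page, some of the returned tables may be empty or duplicated and would need filtering afterwards.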

from tabula-py.

chezou commented on July 28, 2024

Closing since there has been no response.


ZeeD commented on July 28, 2024

Uhh... sorry I didn't reply sooner; this is a hobby project I'm working on.
While I understand your suggestion, it means that the templates are no longer defined only in the JSON file, but are also explicitly manipulated in code... I think that for the moment I'll stick with multiple templates and some simple logic to choose which one to use for the extraction.
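
One way such selection logic might look is sketched below. The thread does not say how the choice is made, so everything here is an assumption: the file-name patterns, the template paths, and the choose_template function are hypothetical placeholders.

```python
from fnmatch import fnmatch

# Hypothetical rules: file-name pattern -> template file exported from Tabula.
TEMPLATE_RULES = [
    ("invoice_*.pdf", "templates/invoice.tabula-template.json"),
    ("report_*.pdf", "templates/report.tabula-template.json"),
]
# Hypothetical fallback used when no pattern matches.
DEFAULT_TEMPLATE = "templates/default.tabula-template.json"


def choose_template(pdf_name, rules=TEMPLATE_RULES, default=DEFAULT_TEMPLATE):
    """Return the template path of the first rule whose pattern matches."""
    for pattern, template_path in rules:
        if fnmatch(pdf_name, pattern):
            return template_path
    return default
```

The chosen path could then be passed to tabula.read_pdf_with_template together with the PDF path, which keeps every template defined purely in its JSON file.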


chezou commented on July 28, 2024

Thanks for your response.

Unfortunately, tabula-py doesn't know how many pages a PDF has either, so the pages="all" option is the only way to handle an unknown number of pages.

