pdftables / python-pdftables-api

Python library to interact with https://pdftables.com API

Home Page: https://pdftables.com/api

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
pdf-to-excel pdftables pdftables-api pdf pdf-extractor pdf-converter pdf-conversion

python-pdftables-api's Introduction

pdftables-api

Python library to interact with the PDFTables.com API.

Supported versions of Python are listed in ci-build.yml.

Installation

With pip (requires git to be installed):

pip install git+https://github.com/pdftables/python-pdftables-api.git

With pip (without git):

pip install https://github.com/pdftables/python-pdftables-api/archive/master.tar.gz

Locally:

python setup.py install

Upgrading

If you installed with pip, upgrade by adding the --upgrade flag, e.g.

pip install --upgrade git+https://github.com/pdftables/python-pdftables-api.git

Usage

Sign up for an account at PDFTables.com and then visit the API page to see your API key.

Replace my-api-key below with your API key.

import pdftables_api

c = pdftables_api.Client('my-api-key')
c.xlsx('input.pdf', 'output.xlsx')

Formats

To convert to CSV, XML or HTML, change c.xlsx to c.csv, c.xml or c.html respectively.

To choose between Excel output with a single sheet and Excel output with multiple sheets, use c.xlsx_single or c.xlsx_multiple.
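When the output format varies from file to file, the method can be picked from the output file's extension. The following is a minimal sketch, not part of the library: the helper name convert_by_extension and the extension mapping are assumptions made for illustration, and only the method names (xlsx, csv, xml, html) come from this README.

```python
import os

# Output extension -> name of the Client method mentioned in this README.
METHOD_BY_EXTENSION = {
    ".xlsx": "xlsx",
    ".csv": "csv",
    ".xml": "xml",
    ".html": "html",
}


def convert_by_extension(client, pdf_path, out_path):
    """Call the conversion method that matches out_path's extension.

    `client` is expected to be a pdftables_api.Client (or any object that
    exposes the same method names). Returns the name of the method called.
    """
    extension = os.path.splitext(out_path)[1].lower()
    try:
        method_name = METHOD_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError("Unsupported output extension: %r" % extension)
    getattr(client, method_name)(pdf_path, out_path)
    return method_name
```

With a real client this would be called as convert_by_extension(pdftables_api.Client('my-api-key'), 'input.pdf', 'output.csv').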

Test

python -m unittest test.test_pdftables_api

Configuring a timeout

If you are converting a large document (hundreds or thousands of pages), you may want to increase the timeout.

Here is an example of the error you might see (in this case with a read timeout of 300 seconds):

ReadTimeout: HTTPSConnectionPool(host='pdftables.com', port=443): Read timed out. (read timeout=300)

The example below allows 60 seconds to connect to our server and 1 hour to convert the document:

import pdftables_api

c = pdftables_api.Client('my-api-key', timeout=(60, 3600))
c.xlsx('input.pdf', 'output.xlsx')
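If most documents are small and only a few need the long timeout, one option is to try a short read timeout first and retry with a longer one. This is a rough sketch under stated assumptions: convert_with_longer_timeouts, make_client and retry_on are hypothetical names introduced here, not part of pdftables_api. The only library behaviour relied on is the timeout=(connect, read) argument shown above.

```python
def convert_with_longer_timeouts(make_client, pdf_path, out_path,
                                 read_timeouts=(300, 3600),
                                 connect_timeout=60,
                                 retry_on=(Exception,)):
    """Try the conversion with each read timeout in turn.

    make_client: callable taking a (connect, read) tuple and returning a
                 client with an xlsx(pdf_path, out_path) method.
    retry_on:    exception types that should trigger the next attempt.
    Returns the read timeout that succeeded; re-raises the last error
    if every attempt fails.
    """
    if not read_timeouts:
        raise ValueError("read_timeouts must not be empty")
    last_error = None
    for read_timeout in read_timeouts:
        client = make_client((connect_timeout, read_timeout))
        try:
            client.xlsx(pdf_path, out_path)
            return read_timeout
        except retry_on as error:
            # Remember the failure and move on to the next, longer timeout.
            last_error = error
    raise last_error
```

In real use, make_client could be lambda t: pdftables_api.Client('my-api-key', timeout=t) and retry_on could be (requests.exceptions.ReadTimeout,), the error type shown above. Note that each retry repeats the whole conversion from the start.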

python-pdftables-api's People

Contributors

dependabot[bot], djui, mattiephillips, pwaller, stevenmaude

python-pdftables-api's Issues

Release on PyPI

Release a version on PyPI, so it can be installed directly with pip.

API not working for files over 100KB

Greetings,

I am submitting a large set of files, and only the smaller files (under 100KB) are being processed. All the others fail silently: they do not error out or produce any error message. I have adjusted the timeout parameter, but this does not fix the issue.

Thanks!

An established connection was aborted by the software in your host machine

Error: An established connection was aborted by the software in your host machine

Traceback (most recent call last):
File "D:/Arbeit/parsererods.py", line 4, in <module>
c.xlsx('sprugk4d.pdf', 'output.xlsx')
File "D:\Users\jamie\AppData\Local\Programs\Python\Python39\lib\site-packages\pdftables_api\pdftables_api.py", line 59, in xlsx
return self.xlsx_multiple(pdf_path, xlsx_path)
...
...

requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionAbortedError(10053, 'An established connection was aborted by the software in your host machine', None, 10053, None))

Doesn't work on google colab

Hello. I tried to use it on Google Colab and got the error message below.
Am I doing something wrong, or does it not work on Google Colab? Thanks

APIException Traceback (most recent call last)
in ()
2
3 c = pdftables_api.Client('my-api-key')
----> 4 c.xlsx('input.pdf', 'output.xlsx')

3 frames
/usr/local/lib/python3.6/dist-packages/pdftables_api/pdftables_api.py in request(self, pdf_fo, out_format, query_params, **requests_params)
157 raise APIException("Unknown file format")
158 elif response.status_code == 401:
--> 159 raise APIException("Unauthorized API key")

My-api-key

doubt

Please explain what "my-api-key" is. Do we need to supply a key here?

If so, what is it?
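As the Usage section says, 'my-api-key' is a placeholder for the key shown on the API page of your PDFTables.com account. The Colab traceback above calls Client('my-api-key') with the placeholder itself and ends in "Unauthorized API key". One way to avoid pasting the key into code is to read it from an environment variable. The sketch below is only an illustration: the variable name PDFTABLES_API_KEY and the helper load_api_key are assumptions, not something the library defines.

```python
import os

PLACEHOLDER = "my-api-key"     # the literal placeholder used in the README
ENV_VAR = "PDFTABLES_API_KEY"  # hypothetical variable name, chosen for this sketch


def load_api_key(environ=None):
    """Return the API key from the environment, rejecting obvious mistakes."""
    environ = os.environ if environ is None else environ
    key = environ.get(ENV_VAR, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "Set %s to the API key from your PDFTables.com account" % ENV_VAR
        )
    return key
```

The result would then be passed to the client, for example pdftables_api.Client(load_api_key()).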

Convert multiple files from a folder

import pdftables_api
import os

c = pdftables_api.Client('XYZ')
root_dir = 'C:\PythonTests\pdf'

for directory, subdirectories, files in os.walk(root_dir):
    for file in files:
        print (file)
        c.html(file, file) 

File "c:/PythonTests/for.py", line 10
c.html('file, file)
^
SyntaxError: EOL while scanning string literal

I am trying to convert every file in the folder. How can I fix this issue?
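The SyntaxError in the traceback comes from an unclosed quote: the failing line reads c.html('file, file), so the string literal never ends. Two further points in the posted loop would still cause trouble once that is fixed: os.walk yields bare file names, so they need to be joined with the directory, and passing the same name as input and output would write the HTML over the PDF. A possible sketch, assuming only that the client has the html(pdf_path, out_path) method described under Formats (convert_folder is a hypothetical helper name):

```python
import os


def convert_folder(client, root_dir):
    """Convert every .pdf under root_dir to an .html file next to it.

    Returns the list of output paths that were requested.
    """
    converted = []
    for directory, _subdirectories, files in os.walk(root_dir):
        for name in sorted(files):
            if not name.lower().endswith(".pdf"):
                continue  # skip anything that is not a PDF
            # os.walk gives bare names, so build the full input path.
            pdf_path = os.path.join(directory, name)
            # Same location and base name, but with an .html extension.
            html_path = os.path.splitext(pdf_path)[0] + ".html"
            client.html(pdf_path, html_path)
            converted.append(html_path)
    return converted
```

On Windows, write the folder as a raw string, for example convert_folder(c, r'C:\PythonTests\pdf'), so that the backslashes are not treated as escape sequences.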
