pdfreader

Info

See the tutorials & documentation for more information.

Author & Maintainer

Maksym Polshcha <[email protected]>

See GitHub for the latest source.

About

pdfreader is a Pythonic API for:
  • extracting texts, images and other data from PDF documents (plain or protected)
  • accessing different objects within PDF documents
pdfreader is NOT a tool (though it may become one someday):
  • to create or update PDF files
  • to split PDF files into pages or other pieces
  • to convert PDFs to any other format

Nevertheless, it can be used as part of such tools.

See Tutorials & Documentation.
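
As a rough illustration of accessing objects within a document, the sketch below reads the PDF version from the header and counts the pages. It assumes the `PDFDocument` class with its `header.version` attribute and `pages()` generator as described in the project tutorial. The helper names `summarize_document` and `summarize_file` are hypothetical and not part of pdfreader.

```python
def summarize_document(doc):
    """Return the PDF version string and page count of an open document.

    `doc` is expected to behave like pdfreader's PDFDocument (assumption):
    it exposes `header.version` and a `pages()` generator.
    """
    # pages() yields pages lazily, so count them without building a list.
    page_count = sum(1 for _ in doc.pages())
    return {"version": doc.header.version, "pages": page_count}


def summarize_file(path):
    """Open the PDF at `path` and summarize it (hypothetical helper)."""
    # Imported here so summarize_document() can be tried without the library.
    from pdfreader import PDFDocument  # third-party: pip install pdfreader

    # Keep the file open while the document is read: objects load lazily.
    with open(path, "rb") as fd:
        return summarize_document(PDFDocument(fd))
```

Calling `summarize_file("example.pdf")` on a real file would return a small dictionary. The file name here is only a placeholder.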

Features

  • Extracts text (plain text and formatted text objects)
  • Extracts PDF form data (pure strings and formatted text objects)
  • Supports all PDF encodings, CMaps and predefined CMaps
  • Extracts images and image masks as Pillow/PIL Images
  • Supports encrypted and password-protected PDF documents
  • Lets you browse any document objects and resources and extract any data you need (fonts, annotations, metadata, multimedia, etc.)
  • Follows the PDF 1.7 specification
  • Lazy object access makes processing huge PDF documents fast
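
A minimal sketch of the text and image extraction features listed above, assuming the `SimplePDFViewer` interface from the project tutorial: `navigate()`, `render()`, a `canvas` exposing `strings`, `images` and `inline_images`, and `to_Pillow()` on image objects. The helpers `page_strings`, `page_images` and `extract_text` are hypothetical names introduced for illustration only.

```python
def page_strings(viewer, page_number=1):
    """Return the decoded strings found on one page (1-based numbering)."""
    viewer.navigate(page_number)
    viewer.render()  # fills viewer.canvas for the current page
    return list(viewer.canvas.strings)


def page_images(viewer, page_number=1):
    """Return the images of one page converted to Pillow images."""
    viewer.navigate(page_number)
    viewer.render()
    canvas = viewer.canvas
    # Assumption: canvas.images maps names to image objects,
    # canvas.inline_images is a plain list of image objects.
    sources = list(canvas.images.values()) + list(canvas.inline_images)
    return [image.to_Pillow() for image in sources]


def extract_text(path, page_number=1, password=None):
    """Join all strings of one page of the PDF at `path`."""
    # Imported here so the helpers above can be tried without the library.
    from pdfreader import SimplePDFViewer  # third-party: pip install pdfreader

    # Only pass a password when one is given (for protected documents).
    kwargs = {} if password is None else {"password": password}
    with open(path, "rb") as fd:
        viewer = SimplePDFViewer(fd, **kwargs)
        return "".join(page_strings(viewer, page_number))
```

Joining the strings without a separator is a simplification. How whitespace should be restored depends on the document, so treat the result as raw material, not as finished text.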

Installation

pdfreader can be installed with pip:

$ python -m pip install pdfreader

Or easy_install from setuptools:

$ python -m easy_install pdfreader

You can also download the project source and do:

$ python setup.py install
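
To confirm that the installation worked, one option is to ask the standard library for the installed distribution's version. This is a small sketch that uses only `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pdfreader")
    print("pdfreader is not installed" if found is None else f"pdfreader {found}")
```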

Tutorial and Documentation

Tutorial, real-life examples and documentation

Support, Bugs & Feature Requests

pdfreader uses GitHub issues to keep track of bugs, feature requests, etc.

Donation

If this project is helpful, you can treat me to coffee :-)
