This project forked from the-paperless-project/paperless

Scan, index, and archive all of your paper documents

License: GNU General Public License v3.0

Paperless

Read the documentation at https://paperless.readthedocs.org/. Join the chat at https://gitter.im/danielquinn/paperless.

Scan, index, and archive all of your paper documents

I hate paper. Environmental issues aside, it's a tech person's nightmare:

  • There's no search feature
  • It takes up physical space
  • Backups mean more paper

In the past few months I've been bitten more than a few times by the problem of not having the right document around. Sometimes I recycled a document I needed (who keeps water bills for two years?) and other times I just lost it... because paper. I wrote this to make my life easier.

How it Works

  1. Buy a document scanner like this one.
  2. Set it up to "scan to FTP" or something similar. It should be able to push scanned images to a server without you having to do anything. If your scanner doesn't know how to automatically upload the file somewhere, you can always do that manually. Paperless doesn't care how the documents get into its local consumption directory.
  3. Have the target server run the Paperless consumption script to OCR the PDF and index it into a local database.
  4. Use the web frontend to sift through the database and find what you want.
  5. Download the PDF you need/want via the web interface and do whatever you like with it. You can even print it and send it as if it's the original. In most cases, no one will care or notice.
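
Steps 3 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the project's actual consumption script: the function names (`open_index`, `consume`, `search`), the table layout, and the pluggable `ocr` callable are all assumptions made for the example. It uses SQLite from the standard library rather than the Django models the real project is built on.

```python
import sqlite3
from pathlib import Path
from typing import Callable, List


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny document index. The schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, name TEXT UNIQUE, content TEXT)"
    )
    return conn


def consume(directory: Path, conn: sqlite3.Connection,
            ocr: Callable[[Path], str]) -> int:
    """Index every PDF in the consumption directory; return how many were new.

    `ocr` is any callable that turns a file path into text. In a real setup
    it would wrap Tesseract; here it is injected so the sketch stays
    self-contained.
    """
    indexed = 0
    for path in sorted(directory.glob("*.pdf")):
        text = ocr(path)
        cursor = conn.execute(
            "INSERT OR IGNORE INTO documents (name, content) VALUES (?, ?)",
            (path.name, text),
        )
        indexed += cursor.rowcount  # 0 when the name was already indexed
    conn.commit()
    return indexed


def search(conn: sqlite3.Connection, term: str) -> List[str]:
    """Return the names of documents whose OCR'd text contains `term`."""
    rows = conn.execute(
        "SELECT name FROM documents WHERE content LIKE ? ORDER BY name",
        (f"%{term}%",),
    )
    return [name for (name,) in rows]
```

Calling `consume(Path("/some/consumption/dir"), open_index(), my_ocr)` on a schedule, with `my_ocr` standing in for whatever OCR wrapper you use, would cover the "push files in, search them later" loop. Note that `LIKE` treats `%` and `_` in the search term as wildcards, which a real search feature would need to escape.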

Here's what you get:

[Image: the before and after]

Stability

Paperless is still under active development (just look at the git commit history) so don't expect it to be 100% stable. I'm using it for my own documents, but I'm crazy like that. If you use this and it breaks something, you get to keep all the shiny pieces.

Requirements

This is really just a simple, shiny, user-friendly wrapper around some very powerful tools:

  • ImageMagick converts the images between colour and greyscale.
  • Tesseract does the character recognition.
  • Unpaper despeckles and deskews the scanned image.
  • GNU Privacy Guard is used as the encryption backend.
  • Python 3 is the language of the project.
    • Pillow loads the image data as a Python object to be used with PyOCR.
    • PyOCR is a slick programmatic wrapper around Tesseract.
    • Django is the framework this project is written against.
    • Python-GNUPG decrypts the PDFs on-the-fly to allow you to download unencrypted files, leaving the encrypted ones on-disk.
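
To make the division of labour between the command-line tools concrete, here is a rough sketch of how they might be chained for a single-page scan. It only builds the argument lists and does not run anything. The helper names, the intermediate file names, and the exact options chosen are assumptions for illustration, not the project's real invocation. The project itself drives Tesseract through PyOCR; the sketch swaps in the `tesseract` command-line program to keep the example free of third-party imports.

```python
from pathlib import Path
from typing import List


def greyscale_command(source: Path, target: Path) -> List[str]:
    """ImageMagick: convert the scan to greyscale."""
    return ["convert", str(source), "-colorspace", "Gray", str(target)]


def cleanup_command(source: Path, target: Path) -> List[str]:
    """Unpaper: despeckle and deskew with its default settings."""
    return ["unpaper", str(source), str(target)]


def ocr_command(image: Path, output_base: Path, lang: str = "eng") -> List[str]:
    """Tesseract: write the recognised text to `<output_base>.txt`."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def pipeline(scan: Path, workdir: Path) -> List[List[str]]:
    """Return the three commands in the order they would be executed.

    The intermediate file names (grey.pnm, clean.pnm, text) are hypothetical.
    """
    grey = workdir / "grey.pnm"
    clean = workdir / "clean.pnm"
    return [
        greyscale_command(scan, grey),
        cleanup_command(grey, clean),
        ocr_command(clean, workdir / "text"),
    ]
```

Each list could be passed to `subprocess.run(command, check=True)` in turn, assuming the three programs are installed and on the `PATH`.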

Documentation

It's all available on ReadTheDocs.

Similar Projects

There's another project out there called Mayan EDMS that has a surprising amount of technical overlap with Paperless. Also based on Django and using a consumer model with Tesseract and Unpaper, Mayan EDMS is much more featureful and comes with a slick UI as well. It may be that Paperless is better suited for low-resource environments (like a Raspberry Pi), but to be honest, this is just a guess as I haven't tested this myself. One thing's for certain though: Paperless is a much better name.

Important Note

Document scanners are typically used to scan sensitive documents: things like your social insurance number, tax records, invoices, etc. While Paperless encrypts the original PDFs via the consumption script, the OCR'd text is not encrypted and is therefore stored in the clear (it needs to be searchable, so if someone has ideas on how to do that on encrypted data, I'm all ears). This means that Paperless should never be run on an untrusted host. Instead, I recommend that if you do want to use it, run it locally on a server in your own home.

Donations

As with all Free software, the power is less in the finances and more in the collective efforts. I really appreciate every pull request and bug report offered up by Paperless' users, so please keep that stuff coming. If, however, you're not one for coding/design/documentation and would like to contribute financially, I won't say no ;-)

The thing is, I'm doing ok for money, so I would instead ask you to donate to the United Nations High Commissioner for Refugees. They're doing important work and they need the money a lot more than I do.
