Coder Social home page Coder Social logo

standardgalactic / pdfcat Goto Github PK

View Code? Open in Web Editor NEW

This project forked from map7/pdfcat

0.0 2.0 0.0 25.85 MB

PDFCat is a document management system for PDFs which are scanned in.

Ruby 82.37% Perl 0.23% JavaScript 4.81% CSS 3.70% Shell 0.08% HTML 8.82%

pdfcat's Introduction

PDFcat

PDFcat is a document management system to be ran on your lan. The idea is that you setup your scanner to scan to the upload directory and then catalogue the result though the web interface.

Installation

Tested using Debian 7 stable.

  1. Install requirements
    sudo apt-get install pdftk texlive-extra-utils libmagickwand-dev imagemagick
        
  2. Install postgresql
    sudo apt-get install postgresql libpq-dev
        
  3. Create pdfcat role
    sudo su postgres
    createrole <your user>
    exit
        
  4. Install ruby 1.8
    sudo apt-get install ruby1.8 rubygems1.8
        
  5. Install bundler
    sudo gem install bundler
        
  6. Clone this repository
  7. Install required gems
    cd pdfcat
    bundle
        
  8. Edit the db/database.yml & change username to your username
  9. Create database
    rake db:create
    rake db:migrate
    rake db:seed
        
  10. Start server
    script/server
        

Usage

  1. Install on a production server within your network
  2. Create an storage & upload directory eg: /home/<user>/upload & /home/<user>/storage
  3. Use samba to share the upload directory to the network
  4. Point your PDF scanner to the upload directory
  5. Do a test by scanning to the upload directory and catalogue the result

OCR

I use Abbyy FineReader Engine for Linux. You have to have a license from them to use this product.

I’ve created a simple rake task to do the OCR at night as it can use a lot of resources. As I have only brought a 12,000 page a year license I’ve coded my rake task to do 20 pdfs a night. (Most of my pdfs only have 1 page so this is ok). This means I most likely won’t go over my yearly quota.

To use do the following

  1. Purchase a license
  2. Add the following to your crontab
    00	0	* * *	root	/bin/bash -c 'cd /var/www/pdfcat_app/current; export RAILS_ENV=production; bundle exec rake pdfs:ocr'
        

    I use rbenv so mine looks like so;

    00	21	* * *	map7	/bin/bash -c 'export HOME="/home/map7";export PATH="/usr/local/rbenv/bin:$PATH" ; cd /var/www/pdfcat_app/current; eval "$(rbenv init -)"; export RAILS_ENV=production; bundle exec rake pdfs:ocr'
        

Contributing

  1. Fork it!
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am ‘Add some feature’
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request :D

Tests

Running test suite

Running all

rake

Individual

spec spec/models/pdf_spec.rb

  • [X] Add in OCR support (20/01/2016) - abbyy finereader 11.3 CLI for Linux
  • [X] upgrade to ruby 1.9.x (20/01/2016)
  • [X] Add a pdf preview to the listing & show areas
  • [ ] Allow more than just PDFs to be stored
  • [ ] Write more file tests
  • [ ] upgrade to the latest ruby (2.2)
  • [ ] upgrade to the latest rails (4.2)
  • [ ] Improve the email facility to use dropbox linking or yousend it for large files

History

Created in 2009 in Rails 2.x it started as a simple pdf document management system for one company. Now it will handle multiple companies, staff and categories. It also has shortcut keys throughout.

License

MIT

pdfcat's People

Contributors

map7 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.