PDFcat is a document management system to be run on your LAN. The idea is that you set up your scanner to scan to the upload directory and then catalogue the result through the web interface.
Tested using Debian 7 stable.
- Install requirements
sudo apt-get install pdftk texlive-extra-utils libmagickwand-dev imagemagick
- Install postgresql
sudo apt-get install postgresql libpq-dev
- Create pdfcat role
sudo su postgres
createuser <your user>
exit
- Install ruby 1.8
sudo apt-get install ruby1.8 rubygems1.8
- Install bundler
sudo gem install bundler
- Clone this repository
- Install required gems
cd pdfcat
bundle
- Edit db/database.yml and change the username to your own username
- Create database
rake db:create
rake db:migrate
rake db:seed
- Start server
script/server
- Install on a production server within your network
- Create a storage directory and an upload directory, e.g. /home/<user>/storage and /home/<user>/upload
- Use samba to share the upload directory to the network
- Point your PDF scanner to the upload directory
- Do a test by scanning to the upload directory and catalogue the result
I use ABBYY FineReader Engine for Linux for OCR. You need a license from ABBYY to use this product.
I've created a simple rake task that runs the OCR at night, as it can use a lot of resources. As I have only bought a 12,000-page-a-year license, I've coded my rake task to process 20 PDFs a night. Most of my PDFs have only one page, so this is fine and means I most likely won't go over my yearly quota.
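The quota reasoning can be checked with a few lines of arithmetic. This is an illustrative sketch, not the actual pdfs:ocr task: the constant and method names are hypothetical, and the one-page-per-PDF figure is the rough average mentioned above.

```ruby
# Hypothetical sketch of the nightly OCR budget; not taken from the PDFcat source.
LICENSE_PAGES_PER_YEAR = 12_000 # size of the license mentioned above
PDFS_PER_NIGHT = 20             # nightly batch size mentioned above

# Pages consumed in a year for a given average page count per PDF.
def yearly_pages(pdfs_per_night, avg_pages_per_pdf, nights = 365)
  pdfs_per_night * avg_pages_per_pdf * nights
end

# Largest average page count per PDF that still fits within the license.
def max_avg_pages(license_pages, pdfs_per_night, nights = 365)
  license_pages.to_f / (pdfs_per_night * nights)
end

puts yearly_pages(PDFS_PER_NIGHT, 1)                                 # prints 7300
puts max_avg_pages(LICENSE_PAGES_PER_YEAR, PDFS_PER_NIGHT).round(2)  # prints 1.64
```

At one page per PDF, a full year of nightly runs uses 7,300 pages, which leaves headroom under 12,000. The limit would only be reached if the average rose above roughly 1.6 pages per PDF on every night of the year.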
To use it, do the following:
- Purchase a license
- Add the following to your system crontab (/etc/crontab, which takes a user field after the schedule)
00 0 * * * root /bin/bash -c 'cd /var/www/pdfcat_app/current; export RAILS_ENV=production; bundle exec rake pdfs:ocr'
I use rbenv, so mine looks like this:
00 21 * * * map7 /bin/bash -c 'export HOME="/home/map7";export PATH="/usr/local/rbenv/bin:$PATH" ; cd /var/www/pdfcat_app/current; eval "$(rbenv init -)"; export RAILS_ENV=production; bundle exec rake pdfs:ocr'
- Fork it!
- Create your feature branch: git checkout -b my-new-feature
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin my-new-feature
- Submit a pull request :D
Running all tests
rake
Running an individual spec
spec spec/models/pdf_spec.rb
- [X] Add in OCR support (20/01/2016) - ABBYY FineReader 11.3 CLI for Linux
- [X] Upgrade to Ruby 1.9.x (20/01/2016)
- [X] Add a pdf preview to the listing & show areas
- [ ] Allow more than just PDFs to be stored
- [ ] Write more file tests
- [ ] Upgrade to the latest Ruby (2.2)
- [ ] Upgrade to the latest Rails (4.2)
- [ ] Improve the email facility to use Dropbox linking or YouSendIt for large files
Created in 2009 with Rails 2.x, PDFcat started as a simple PDF document management system for one company. It now handles multiple companies, staff and categories, and has shortcut keys throughout.
MIT