This project forked from patrickdw123/paranoidf

ParanoiDF - PDF Analysis Suite based on PeePDF by Jose Miguel Esparza (http://peepdf.eternal-todo.com/). Tools added: Password cracking, redaction recovery, DRM removal, malicious JavaScript extraction, and more.

License: GNU General Public License v3.0

ParanoiDF

The Swiss Army knife of PDF analysis tools. Based on peepdf - http://peepdf.eternal-todo.com. This README builds on the peepdf README.

This tool was developed as part of my M.Sc dissertation/project for the School of Computing (University of Kent, Canterbury, UK). The man behind the idea for this tool was Julio Hernandez-Castro (www.azhala.com).

Home Page

Features

See https://github.com/patrickdw123/ParanoiDF/wiki.

Dependencies

  • To crack passwords:
    • pdfcrack is needed (apt-get install pdfcrack)
  • To remove DRM (restrictions on editing, copying, etc.):
    • Calibre's ebook-convert is needed (apt-get install calibre)
  • To decrypt PDFs:
    • qpdf is needed (apt-get install qpdf)
  • To use the redact command:
    • NLTK (Natural Language Toolkit) is needed (apt-get install python-nltk)
    • Java is needed, because the Stanford parser is written in Java (apt-get install default-jre)
  • To support XML output, "lxml" is needed.
  • Included modules: lzw, colorama, jsbeautifier, ccitt, pythonaes (Thanks to all the developers!!)
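
Before running the tool, it can help to check which of the external programs above are actually available. The following is a minimal sketch, not part of ParanoiDF: the function name missing_tools and the feature labels are hypothetical, and the executable names (pdfcrack, ebook-convert, qpdf, java) are assumed to be the ones installed by the packages listed above. It uses shutil.which, so it needs Python 3.3 or later, and it does not check the Python modules NLTK or lxml.

```python
import shutil

# Executable names are an assumption based on the packages listed above.
REQUIRED_TOOLS = {
    "password cracking": "pdfcrack",
    "DRM removal": "ebook-convert",
    "decryption": "qpdf",
    "redact (Stanford parser)": "java",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return {feature: executable} for every executable not found on PATH."""
    return {feature: exe for feature, exe in tools.items() if which(exe) is None}


if __name__ == "__main__":
    # Print one line per feature whose external program is unavailable.
    for feature, exe in missing_tools().items():
        print("%s: '%s' not found on PATH" % (feature, exe))
```

An empty result only means the executables were found on PATH; it says nothing about their versions.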

Installation

No installation is needed apart from the dependencies listed above; just run:

python paranoiDF.py

Execution

There are two important options when ParanoiDF is executed:

  • -f: Ignores parsing errors. Analysing malicious files will probably lead to parsing errors, so this option should be set.
  • -l: Sets loose mode, in which the parser does not search for the endobj tag because it is not obligatory. Helpful with malformed files.

  • Simple execution

Shows the statistics of the file after being decoded/decrypted and analysed:

python paranoiDF.py [options] pdf_file
  • Interactive console

Executes the interactive console, giving a wide range of tools to play with.

python paranoiDF.py -i 
  • Batch execution

A commands file can be used to specify the commands to be executed in batch mode. This type of execution is useful for automating the analysis of several files:

python paranoiDF.py [options] -s script_file 
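
Another way to cover several files is to wrap the simple execution form in a small driver script. The sketch below is an illustration under assumptions, not part of ParanoiDF: build_command and analyse_all are hypothetical helper names, the interpreter is assumed to be reachable as python, and paranoiDF.py is assumed to be in the current directory. It sets the -f and -l options described above and collects whatever each run prints.

```python
import subprocess


def build_command(pdf_path, script="paranoiDF.py", interpreter="python"):
    """Build the simple-execution command line with -f and -l set."""
    return [interpreter, script, "-f", "-l", pdf_path]


def analyse_all(pdf_paths, run=subprocess.run):
    """Run the tool once per file and return {path: captured output}."""
    results = {}
    for path in pdf_paths:
        completed = run(
            build_command(path),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error messages with the report
            universal_newlines=True,   # decode output as text
        )
        results[path] = completed.stdout
    return results
```

For example, analyse_all(["a.pdf", "b.pdf"]) would return a dictionary mapping each path to the statistics text printed for it; the file names here are placeholders.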

Some Hints

If the information shown when a PDF file is parsed is not enough to tell whether it is harmful, the following commands can help:

  • tree

Shows the tree graph of the file or specified version. Here we can see suspicious elements.

  • offsets

Shows the physical map of the file or of the specified version of the document. This is helpful for spotting unusually large objects or large gaps between objects.

  • search

Searches for the specified string or hexadecimal string in the objects (decoded and encrypted streams included).

  • object/rawobject

Shows the (raw) content of the object.

  • stream/rawstream

Shows the (raw) content of the stream.

TODO (with date that I intend to start work on)

V2.0:

  • Refine and test redaction thresholds: 30/08/2014
  • Add automation of retrieval of redaction box information such as font size, font and redaction box coordinates: 30/08/2014
  • Add other encryption algorithms (such as AES): 1/10/2014
  • Digital Signatures analysis: 1/10/2014
  • Add a GUI

Bugs

Feel free to send bugs/criticisms/praises/comments to patrickdw123(at)gmail(dot)com.
