mufeedvh / pdfrip

A multi-threaded PDF password cracking utility equipped with commonly encountered password format builders and dictionary attacks.

License: MIT License

Rust 100.00%
rust security security-tools pdf password password-cracker hashcat

pdfrip's Introduction

PDFRip

Introduction

pdfrip is a fast multithreaded PDF password cracking utility written in Rust with support for wordlist-based dictionary attacks, date, number range, and alphanumeric brute-forcing, and a custom query builder for password formats.

Features

  • Fast: Tries roughly 50k-100k+ passwords per second, using all available CPU cores.
  • Custom Query Builder: Write your own queries such as STRING{69-420}, which generates and tries a wordlist covering the full number range.
  • Date Bruteforce: Pass in a year (or a span of years) to try every day of each year in DDMMYYYY format, a commonly used password format for PDFs.
  • Number Bruteforce: Give a number range such as 5000-100000 and every number in the range is tried.
  • Default Bruteforce: Specify a maximum and, optionally, a minimum length for the password search. All passwords from the minimum length (4 by default) up to the specified maximum, consisting of letters and numbers (a-zA-Z0-9), are tried.

Installation

Install with cargo:

$ cargo install --git https://github.com/mufeedvh/pdfrip.git

Install Rust/Cargo

Build From Source

Prerequisites:

  • Git
  • Rust
  • Cargo (Automatically installed when installing Rust)
  • A C linker (Only for Linux, generally comes pre-installed)
$ git clone https://github.com/mufeedvh/pdfrip.git
$ cd pdfrip/
$ cargo build --release

The first command clones this repository onto your local machine; the last two enter the directory and build the source in release mode.

Usage

Get a list of all the arguments:

$ pdfrip --help

Start a dictionary attack with a wordlist:

$ pdfrip -f encrypted.pdf wordlist rockyou.txt

Bruteforce number ranges for the password:

$ pdfrip -f encrypted.pdf range 1000 9999

Bruteforce all dates in a span (inclusive in both ends) of years for the password in DDMMYYYY format:

$ pdfrip -f encrypted.pdf date 1900 2000
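
As a rough illustration of what the date mode has to enumerate, the sketch below generates every valid calendar date in an inclusive span of years as DDMMYYYY strings. The function names (is_leap_year, days_in_month, ddmmyyyy_dates) are made up for this example, and pdfrip's own producer may work differently (for instance, it may not filter out impossible dates such as 31 February).

```rust
/// Gregorian leap-year rule.
fn is_leap_year(year: u32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Number of days in `month` (1-12) of `year`; 0 for an invalid month.
fn days_in_month(month: u32, year: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap_year(year) => 29,
        2 => 28,
        _ => 0,
    }
}

/// Hypothetical helper: all dates from `start_year` to `end_year`
/// (inclusive on both ends) formatted as DDMMYYYY.
fn ddmmyyyy_dates(start_year: u32, end_year: u32) -> Vec<String> {
    let mut dates = Vec::new();
    for year in start_year..=end_year {
        for month in 1..=12 {
            for day in 1..=days_in_month(month, year) {
                dates.push(format!("{:02}{:02}{:04}", day, month, year));
            }
        }
    }
    dates
}
```

Under these assumptions, ddmmyyyy_dates(1900, 2000) produces 36,890 candidates, starting at 01011900 and ending at 31122000, which is a small keyspace compared with the alphanumeric modes.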

Bruteforce arbitrary strings of length 4-8:

$ pdfrip -f encrypted.pdf default-query --max-length 8

Bruteforce arbitrary strings of length 3:

$ pdfrip -f encrypted.pdf default-query --max-length 3 --min-length 3
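
To make the size of this search space concrete, here is a minimal sketch of enumerating every a-zA-Z0-9 string between a minimum and maximum length. It is not pdfrip's implementation; ALPHABET, nth_candidate, and candidates are names invented for this example, and the ordering of the alphabet is an assumption.

```rust
/// Assumed alphabet and ordering: a-z, then A-Z, then 0-9 (62 symbols).
const ALPHABET: &[u8] = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

/// Maps an index to the `index`-th string of the given length by writing
/// the index in base 62, least significant digit last.
fn nth_candidate(mut index: u64, length: usize) -> String {
    let base = ALPHABET.len() as u64;
    let mut buf = vec![ALPHABET[0]; length];
    for slot in buf.iter_mut().rev() {
        *slot = ALPHABET[(index % base) as usize];
        index /= base;
    }
    String::from_utf8(buf).expect("alphabet is ASCII")
}

/// Lazily yields every candidate with a length in `min_len..=max_len`.
fn candidates(min_len: usize, max_len: usize) -> impl Iterator<Item = String> {
    (min_len..=max_len).flat_map(|len| {
        // 62^len no longer fits in a u64 once len exceeds 10.
        let total = (ALPHABET.len() as u64)
            .checked_pow(len as u32)
            .expect("length too large for a u64 index");
        (0..total).map(move |i| nth_candidate(i, len))
    })
}
```

Each extra character multiplies the number of candidates by 62, so candidates(4, 8) is already far larger than candidates(3, 3), which has 238,328 entries.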

Build a custom query to generate a wordlist: (useful when you know the password format)

$ pdfrip -f encrypted.pdf custom-query ALICE{1000-9999}

$ pdfrip -f encrypted.pdf custom-query DOC-ID{0-99}-FILE

Enable preceding zeros for custom queries (this turns {10-5000} into {0010-5000}, padding every number to the digit count of the range's upper bound):

$ pdfrip -f encrypted.pdf custom-query ALICE{10-9999} --add-preceding-zeros
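
The effect of a custom query, with and without preceding zeros, can be sketched as follows. This assumes the query has already been split into a prefix, a numeric range, and a suffix; expand_range is a hypothetical helper, not a function from pdfrip.

```rust
/// Hypothetical helper: expands `prefix{start-end}suffix` into concrete
/// candidates. With `pad_zeros` set, every number is left-padded with
/// zeros to the width of the upper bound.
fn expand_range(prefix: &str, start: u32, end: u32, suffix: &str, pad_zeros: bool) -> Vec<String> {
    // A width of 0 means "no padding" to the formatter.
    let width = if pad_zeros { end.to_string().len() } else { 0 };
    (start..=end)
        .map(|n| format!("{}{:0width$}{}", prefix, n, suffix, width = width))
        .collect()
}
```

For example, expand_range("ALICE", 10, 9999, "", true) starts at ALICE0010, while the same call with pad_zeros set to false starts at ALICE10. A query such as DOC-ID{0-99}-FILE corresponds to expand_range("DOC-ID", 0, 99, "-FILE", false) and yields 100 candidates.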

Contribution

Ways to contribute:

  • Suggest a feature
  • Report a bug
  • Fix something and open a pull request
  • Help me document the code
  • Spread the word

License

Licensed under the MIT License, see LICENSE for more information.

pdfrip's People

Contributors

lets-go-worker, limitedatonement, mufeedvh, zasekle

pdfrip's Issues

performance across PDF versions

What's the expected performance across PDF versions / types? (Edit: maybe have a table added to the README?)

For comparison, GPU performance under hashcat on a single GTX 1080, across supported PDF versions, is as follows:

$ for hashtype in 10400 10420 10500 25400 10600 10700; do \
    hashcat -b -w 4 -O -m $hashtype --quiet; done

-------------------------------------------------
* Hash-Mode 10400 (PDF 1.1 - 1.3 (Acrobat 2 - 4))
-------------------------------------------------

Speed.#1.........:   414.1 MH/s (403.66ms) @ Accel:1024 Loops:256 Thr:32 Vec:1

--------------------------------------------------------------
* Hash-Mode 10420 (PDF 1.1 - 1.3 (Acrobat 2 - 4), collider #2)
--------------------------------------------------------------

Speed.#1.........:  6510.0 MH/s (102.36ms) @ Accel:1024 Loops:1024 Thr:32 Vec:1

------------------------------------------------------------------
* Hash-Mode 10500 (PDF 1.4 - 1.6 (Acrobat 5 - 8)) [Iterations: 70]
------------------------------------------------------------------

Speed.#1.........: 14044.6 kH/s (30.67ms) @ Accel:1024 Loops:70 Thr:32 Vec:1

----------------------------------------------------------------------------------------
* Hash-Mode 25400 (PDF 1.4 - 1.6 (Acrobat 5 - 8) - user and owner pass) [Iterations: 70]
----------------------------------------------------------------------------------------

Speed.#1.........: 14294.5 kH/s (30.56ms) @ Accel:1024 Loops:70 Thr:32 Vec:1

-----------------------------------------------
* Hash-Mode 10600 (PDF 1.7 Level 3 (Acrobat 9))
-----------------------------------------------

Speed.#1.........:  3128.5 MH/s (423.99ms) @ Accel:128 Loops:512 Thr:1024 Vec:1

----------------------------------------------------------------------
* Hash-Mode 10700 (PDF 1.7 Level 8 (Acrobat 10 - 11)) [Iterations: 64]
----------------------------------------------------------------------

Speed.#1.........:    34875 H/s (586.47ms) @ Accel:32 Loops:8 Thr:256 Vec:1

Package Versions Wrong

I cloned the repository, ran cargo build, and got the following:

     Updating crates.io index
error: failed to select a version for the requirement `clap = "^4.3.21"`
candidate versions found which didn't match: 3.2.25, 3.2.24, 3.2.23, ...
location searched: crates.io index
required by package `pdfrip v2.0.0 (/home/lawsa/source/pdfrip)`

wanted new feature

Need a mask attack where we can define the positions of digits, special characters, uppercase letters, and lowercase letters.

can't crack the aes256 ones

I can't crack the PDF I created, even though the password is 123456.
It can only crack Acrobat 6 and 7 versions, not X versions.

settings

Parse command for Year Range in ddmmyyyy format

Let's say I know the password is in DDMMYYYY format but I don't know the year.
So how do I parse the command for the year range 1900 to 2000 so that pdfrip checks all dates between 1900 & 2000 in DDMMYYYY format?

Split PDFRip into multiple crates

I think this would be useful in reference to #17 and #18 as well as to improve maintainability.

PDFRip is basically one big monolithic crate right now, which is fine for smaller software, but it will probably not age well as the number of features (and therefore probably the number of dependencies) increases, potentially causing issues with incompatible versions of crate dependencies.

Additionally, splitting it into multiple crates allows us to enforce a stricter dependency chain between our different parts.

E.g. We can ensure Clap (argument parsing) is only accessible from our main crate, guaranteeing engine.rs is unable to somehow depend on it.

I propose to split PDFRip into the following crates with the following responsibilities:

  • Main binary. Essentially a presentation layer responsible for interactions with the user, i.e. argument parsing and/or API-related things, as well as selecting a logging frontend and, if possible with the proposed dependencies, the progress bar.
  • Engine crate. Responsible for the business logic of cracking passwords. It should depend on our Producer and Cracker crates.
  • Producer crate. Defines the "Producer" trait and contains the different cracking methods, i.e. "custom-query", "default-query" and such.
  • Cracker crate. Implements the PDFCracker struct, allowing us to decouple the rest of the code from the PDF crate we're currently using for our decryption logic.

We can probably do all this inside this one repository by utilizing Cargo's workspace feature.
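
To illustrate the kind of boundary this split is after, here is a minimal single-file sketch in which the engine logic depends only on two traits. The trait shapes and the names Cracker, run, RangeProducer, and FixedSecret are assumptions made for this example; the real Producer trait and PDFCracker struct may look different.

```rust
/// Hypothetical shape of the producer side: anything that yields candidates.
pub trait Producer {
    /// Returns the next candidate, or `None` once the producer is exhausted.
    fn next(&mut self) -> Option<Vec<u8>>;
}

/// Hypothetical trait hiding the PDF library behind a single check.
pub trait Cracker {
    fn attempt(&self, password: &[u8]) -> bool;
}

/// Engine logic: knows nothing about argument parsing or PDF internals.
pub fn run<P: Producer, C: Cracker>(producer: &mut P, cracker: &C) -> Option<Vec<u8>> {
    while let Some(candidate) = producer.next() {
        if cracker.attempt(&candidate) {
            return Some(candidate);
        }
    }
    None
}

/// Toy producer: an inclusive number range.
pub struct RangeProducer {
    pub current: u32,
    pub end: u32,
}

impl Producer for RangeProducer {
    fn next(&mut self) -> Option<Vec<u8>> {
        if self.current > self.end {
            return None;
        }
        let password = self.current.to_string().into_bytes();
        self.current += 1;
        Some(password)
    }
}

/// Toy cracker: compares against a known secret instead of decrypting a PDF.
pub struct FixedSecret(pub Vec<u8>);

impl Cracker for FixedSecret {
    fn attempt(&self, password: &[u8]) -> bool {
        self.0.as_slice() == password
    }
}
```

In a workspace, each trait and its implementations would live in their own crate, so the engine crate could be tested with stand-ins like FixedSecret without pulling in the PDF dependency.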

The proposed structure can be represented in the following way:

image

The image was generated using https://structurizr.com/dsl with the code from the attached file.
structurizr.txt

Rewrite PDFRip to take an Async approach

PDFRip currently performs its tasks with traditional threads, which makes certain logic more cumbersome than I think it would be in an async environment.
This is supported by the fact that engine.rs has seen bugs such as #14, suggesting it's a tad too complicated.

I propose reimplementing PDFRip to be async instead by utilizing the Tokio ecosystem.

The benefits are:

  • We have access to Tokio's async runtime, potentially improving performance.
  • We have access to tokio-util's CancellationTokens and TaskTrackers as well as Tokio's select! macro and ctrl-c signal handling.
    • This could be useful when implementing #17.
  • We can remove our dependency on Crossbeam and utilize the channels in Tokio.

The current problems that I think need to be resolved are:

  • We will probably need to make the Producer traits async, which has been supported in Rust since 1.75. This is to allow cancelling production of passwords when implementing #17.
  • We must lock our minimum Rust version to 1.75 if we use async traits.

Cannot run custom-query without any integer ranges.

Found this while working on tests for #23.

thread 'main' panicked at crates/producer/src/custom_query.rs:54:70:
called `Result::unwrap()` on an `Err` value: ParseIntError { kind: Empty }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The prompt I used was
cargo run -- -f examples/ALICE_BANK_STATEMENT.pdf custom-query "I'M BATMAN"

Removing the apostrophe (') still caused the same error.

Running
cargo run -- -f examples/ALICE_BANK_STATEMENT.pdf custom-query "I'M BATMAN{1337}"
produces

thread 'main' panicked at crates/producer/src/custom_query.rs:56:33:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I suggest either resolving this or ensuring we report a cleaner error message.
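
As a sketch of what a cleaner failure could look like, the parser below returns a Result instead of calling unwrap(). It is not the code in custom_query.rs; Query, QueryError, and parse_query are hypothetical names, and treating a query without braces as a single literal password (and {1337} as a one-element range) is an assumption about the desired behaviour.

```rust
#[derive(Debug, PartialEq)]
pub enum Query {
    /// No braces: the query is a single literal password.
    Literal(String),
    /// `prefix{start-end}suffix`
    Range { prefix: String, start: u64, end: u64, suffix: String },
}

#[derive(Debug, PartialEq)]
pub enum QueryError {
    UnbalancedBraces,
    BadNumber(String),
    EmptyRange { start: u64, end: u64 },
}

pub fn parse_query(input: &str) -> Result<Query, QueryError> {
    let (open, close) = match (input.find('{'), input.rfind('}')) {
        (None, None) => return Ok(Query::Literal(input.to_string())),
        (Some(o), Some(c)) if o < c => (o, c),
        _ => return Err(QueryError::UnbalancedBraces),
    };
    let body = &input[open + 1..close];
    // Report the offending text instead of panicking on a parse failure.
    let number = |s: &str| {
        s.trim()
            .parse::<u64>()
            .map_err(|_| QueryError::BadNumber(s.to_string()))
    };
    // Assumption: `{1337}` means the one-element range 1337-1337.
    let (start, end) = match body.split_once('-') {
        Some((lo, hi)) => (number(lo)?, number(hi)?),
        None => {
            let n = number(body)?;
            (n, n)
        }
    };
    if start > end {
        return Err(QueryError::EmptyRange { start, end });
    }
    Ok(Query::Range {
        prefix: input[..open].to_string(),
        start,
        end,
        suffix: input[close + 1..].to_string(),
    })
}
```

With this shape, both "I'M BATMAN" and "I'M BATMAN{1337}" parse successfully, and malformed input such as "ALICE{}" produces an error value that the binary can print as a normal message.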

Colors not working on Windows 10

Some colors work, others don't. Here is a screenshot:

image

I've tested both with powershell and cmd on Windows 10 Pro 21H1.

User name hidden for privacy reasons.

Race condition in src/core/engine.rs

Developing #13 revealed there is a race condition in the engine where the engine exits before receiving a correct password from a producer, despite it eventually sending one.

Running
env LOG_LEVEL=info cargo run --release -- --filename examples/datetime-15012000.pdf date 1900 2000
does not successfully crack the PDF, while
env LOG_LEVEL=info cargo run --release -- --filename examples/datetime-15012000.pdf date 1900 2001
succeeds, even though this generator sends passwords inclusively. I suspect there is a bug somewhere else that will need to be investigated.

Adding debug logging to the DateProducer next() function shows the correct password is being sent but not recognized.

2023-12-03T21:34:13.151Z DEBUG pdfrip::core::production::dates > Sending 15012000 from DateProducer

This means there is a race condition somewhere. Likely in engine.rs.

I imagine the bruteforcer should be simplified, since it is currently complex, clunky, inefficient and annoying.
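
For discussion, here is one std-only structure that avoids this class of early exit: the main thread blocks on a result channel that closes only after every worker has finished, and workers finish only once the work queue is drained or a hit has been reported. This is a sketch, not the current engine.rs; crack and its parameters are hypothetical, and the is_correct closure stands in for the real PDF check.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

pub fn crack<F>(candidates: Vec<String>, workers: usize, is_correct: F) -> Option<String>
where
    F: Fn(&str) -> bool + Send + Sync + 'static,
{
    let (work_tx, work_rx) = mpsc::sync_channel::<String>(64);
    let work_rx = Arc::new(Mutex::new(work_rx));
    let (found_tx, found_rx) = mpsc::channel::<String>();
    let is_correct = Arc::new(is_correct);
    let stop = Arc::new(AtomicBool::new(false));

    let mut handles = Vec::new();
    for _ in 0..workers {
        let (work_rx, found_tx) = (Arc::clone(&work_rx), found_tx.clone());
        let (is_correct, stop) = (Arc::clone(&is_correct), Arc::clone(&stop));
        handles.push(thread::spawn(move || {
            while !stop.load(Ordering::SeqCst) {
                // The lock guard is dropped at the end of this statement, so
                // the lock is not held while a candidate is being checked.
                let next = work_rx.lock().unwrap().recv();
                let candidate = match next {
                    Ok(candidate) => candidate,
                    Err(_) => break, // producer is done and the queue is empty
                };
                if (*is_correct)(&candidate) {
                    stop.store(true, Ordering::SeqCst);
                    let _ = found_tx.send(candidate);
                }
            }
        }));
    }
    // Drop the main thread's handles so channel closure tracks the workers.
    drop(found_tx);
    drop(work_rx);

    let producer_stop = Arc::clone(&stop);
    handles.push(thread::spawn(move || {
        for candidate in candidates {
            if producer_stop.load(Ordering::SeqCst) || work_tx.send(candidate).is_err() {
                break;
            }
        }
    }));

    // Blocks until a worker reports a hit, or until every worker has exited,
    // which without a hit only happens after the queue has been drained.
    let result = found_rx.recv().ok();
    for handle in handles {
        let _ = handle.join();
    }
    result
}
```

The point of the design is that "no password found" is only ever concluded from the result channel closing, never from the producer finishing, so a correct password that is the very last candidate sent is still reported.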

Crash resulting from malloc error

$ pdfrip -f pw.pdf -n 128 default-query --min-length 7 --max-length 7
           .___ _____       .__        
______   __| _// ____\______|__|_____  
\____ \ / __ |\   __\\_  __ \  \____ \ 
|  |_> > /_/ | |  |   |  | \/  |  |_> >
|   __/\____ | |__|   |__|  |__|   __/ 
|__|        \/                 |__|    2.0.1

 2024-01-17T01:29:12.322Z INFO  pdfrip::core::engine > Starting password cracking job...
⠚ [2d 15:01:46] [███████████████░░░░░░░░░░░░░░░░░░░░░░░░░] 30932726000/78364164096 39% 136332/s ETA: 4d
pdfrip(21967,0x16c9a7000) malloc: *** error for object 0x600001910f50: pointer being freed was not allocated
pdfrip(21967,0x16c9a7000) malloc: *** set a breakpoint in malloc_error_break to debug
pdfrip(21967,0x1725c3000) malloc: Heap corruption detected, free list is damaged at 0x60000190c040
*** Incorrect guard value: 10107014426694143842
pdfrip(21967,0x1725c3000) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

Password of '123B' was not cracked by default query.

Hi - I have a PDF secured with the password '123B'. Using the query:
pdfrip -f file.pdf -n 12 default-query --max-length 4 --min-length 4

I'd expect the tool to find the password no problem - but it doesn't. If I reverse the password to B123, it finds it. I assumed the default query would run all permutations of a-zA-Z0-9 but it seems not?

Thanks!

Allow dd.mm.yyyy date format and others

It would be cool to be able to specify the date format.

This small patch changes it to dd.mm.yyyy, but of course it needs to be made configurable.

diff --git a/crates/producer/src/dates.rs b/crates/producer/src/dates.rs
index ed11325..2ce0ec7 100644
--- a/crates/producer/src/dates.rs
+++ b/crates/producer/src/dates.rs
@@ -25,7 +25,7 @@ fn pregenerate_dates() -> Vec<String> {
                 month.to_string()
             };
 
-            results.push(format!("{}{}", date, month))
+            results.push(format!("{}.{}", date, month))
         }
     }
 
@@ -63,7 +63,7 @@ impl Producer for DateProducer {
 
             let next = self.inner.next().unwrap();
 
-            let password = format!("{:04}{:04}", next, self.current).into_bytes();
+            let password = format!("{:04}.{:04}", next, self.current).into_bytes();
             debug!(
                 "Sending {} from DateProducer",
                 String::from_utf8_lossy(&password)
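
One way the format could be made configurable, sketched under the assumption of a simple token pattern (DD, MM, YYYY) supplied by the user. format_date and the pattern syntax are hypothetical and not part of pdfrip.

```rust
/// Hypothetical helper: renders a date using a pattern built from the
/// tokens DD, MM and YYYY; any other characters are copied through.
fn format_date(day: u32, month: u32, year: u32, pattern: &str) -> String {
    // The substituted values are digits only, so a later replacement can
    // never match text inserted by an earlier one.
    pattern
        .replace("YYYY", &format!("{:04}", year))
        .replace("MM", &format!("{:02}", month))
        .replace("DD", &format!("{:02}", day))
}
```

With this helper, the pattern DDMMYYYY reproduces the existing behaviour, DD.MM.YYYY gives the format from the patch above, and other layouts such as YYYY-MM-DD need no further code changes.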

Using Numbers in custom query

Hello, it seems the custom query only allows a string value to be fixed. For example, if I want to run a query like
-q {0-990}1234, the program searches for 0-9901234 and not every number in the range. Can you please take a look? Thanks.

Missing Documentation for Passwords with Letters

It looks like the modes of operation are:

  • Provide a dictionary of all the passwords I want checked
  • Provide a number range to use as passwords to check (either using range or custom-query STRING{start-end})
  • Provide a year from which to create eight-digit numbers as passwords to check

Other operational modes are not listed. I would like to check all possible passwords with characters from a provided character set. I tried

for i in {a..z} {A..Z} {0..9}; do
    echo "$i" >> dictionary;
done;
./pdfrip -n 15 --filename ~/my.pdf wordlist dictionary;

but that only checked the 62 passwords provided in my word list. I'm guessing custom-query may be able to do what I want, but I don't see any documentation on it.

It would be nice if the default experience was just like pdfcrack, but utilizing multiple cores. At least documentation explaining how to do such a thing would be greatly appreciated.
