sk2andy / pdfsharp.extensions

This project forked from gheeres/pdfsharp.extensions

Extension methods for PDFSharp to simplify common operations, including image extraction.

License: MIT License

PDFSharp.Extensions

The following are extension methods for PDFSharp to support and simplify some common operations.

This project started because another project required images that had been scanned into a PDF to be extracted, so that portions of the embedded images could be permanently redacted. Most of the existing solutions and partial bits of code were incomplete or didn't cover my test cases and needs. Originally, I solved the problem using the iTextSharp library, but the licensing model for that project prohibited me from continued use and development.

Licensed under the MIT license.


Image Utilities

Extension methods are provided for extracting images from an entire document, from individual pages, or from specific image objects. Currently only RGB-encoded images (/DeviceRGB) are supported, stored with either the /DCTDecode or the /FlateDecode filter. /Indexed colorspaces are also supported for /FlateDecode images, including 1bpp (black & white) images.

All images are extracted as System.Drawing.Image objects, which can then be saved or manipulated as necessary.

Example

using System;
using System.Drawing;
using System.Drawing.Imaging;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

string filename = @"My.Sample.pdf";
Console.WriteLine("Processing file: {0}", filename);
using (PdfDocument document = PdfReader.Open(filename, PdfDocumentOpenMode.Import)) {
  int pageIndex = 0;
  foreach (PdfPage page in document.Pages) {
    int imageIndex = 0;
    foreach (Image image in page.GetImages()) {
      Console.WriteLine("\r\nExtracting image {1} from page {0}", pageIndex + 1, imageIndex + 1);

      // Save the image to the current directory as <page number>-<image number>.png.
      image.Save(String.Format(@"{0:00000000}-{1:000}.png", pageIndex + 1, imageIndex + 1), ImageFormat.Png);
      imageIndex++;
    }
    pageIndex++;
  }
}

Notes

If you find a PDF file containing an encoded image in the /DeviceRGB colorspace that doesn't extract correctly, please submit an issue and attach the offending PDF file, or send it via email.

Please do not report PDF files that cannot be processed because they are iref encoded. This is a limitation of the PDFSharp library itself, and support for it is on the PDFSharp product roadmap.

Helpers

A helper class has been created for PdfDictionary objects that lets you quickly check whether a dictionary is an image. A number of other helper extension methods have been created for PdfItem and other base classes to cover common inspection tasks.

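// Wrapped in a method so that 'item' is definitely assigned before it is used.
static Image ToImageOrNull(PdfDictionary item) {
  if (item.IsImage()) {
    return item.ToImage();
  }
  return null;
}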
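
The snippet above does not show where a PdfDictionary to inspect might come from. As a rough sketch, one common source is the /XObject entry of a page's /Resources dictionary, which can be walked with the standard PDFSharp element accessors. The class name XObjectImageSketch and the method name ImagesFromXObjects are hypothetical and are not part of this project. The sketch also assumes that the IsImage() and ToImage() extension methods are in scope through the namespaces imported here. If the project declares them in a different namespace, add the matching using directive.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;
using PdfSharp.Pdf.IO;

// Hypothetical helper, for illustration only; not part of PDFSharp.Extensions.
static class XObjectImageSketch
{
  // Yields an Image for every /XObject resource on the page that IsImage() accepts.
  public static IEnumerable<Image> ImagesFromXObjects(PdfPage page)
  {
    // A page without /Resources or without /XObject has no image XObjects.
    PdfDictionary resources = page.Elements.GetDictionary("/Resources");
    if (resources == null) yield break;

    PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
    if (xObjects == null) yield break;

    foreach (PdfItem entry in xObjects.Elements.Values)
    {
      // XObjects are normally stored as indirect references; skip anything else.
      PdfReference reference = entry as PdfReference;
      if (reference == null) continue;

      // Assumption: IsImage() and ToImage() are this project's extension methods.
      PdfDictionary candidate = reference.Value as PdfDictionary;
      if (candidate != null && candidate.IsImage())
      {
        yield return candidate.ToImage();
      }
    }
  }
}
```

For whole pages, the page.GetImages() call from the earlier example is probably the simpler route. A walk like this one is mainly useful when you want to see or filter the individual dictionaries yourself.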
