
Auto image proportion (pdf2image, closed)

yakovmeister commented on June 13, 2024
Auto image proportion

from pdf2image.

Comments (8)

AnandChowdhary commented on June 13, 2024

I used pdf2json to parse the PDF and get its width and height, and then used these values in pdf2pic.

This helper method returns the width and height from a PDF Buffer:

import PDFParser from "pdf2json";

// Works with pdf2json < 2.0.0 only (its output shape changed in 2.0.0)
const getSizeFromPdf = (buffer: Buffer) =>
  new Promise<{ width: number; height: number }>((resolve, reject) => {
    const pdfParser = new PDFParser();
    // Register the listeners before starting to parse
    pdfParser.on('pdfParser_dataError', (errData: { parserError: string }) =>
      reject(errData.parserError),
    );
    pdfParser.on('pdfParser_dataReady', (pdfData: any) => {
      if (
        !pdfData.formImage ||
        typeof pdfData.formImage !== 'object' ||
        !('Pages' in pdfData.formImage) ||
        !Array.isArray(pdfData.formImage.Pages) ||
        pdfData.formImage.Pages.length === 0
      )
        return reject(new Error('Unable to parse PDF'));
      // Values are in pdf2json "page units", not pixels, so we multiply by 25
      // (e.g., 38.25 becomes 956px); the height is taken from the first page
      resolve({
        width: Math.round(pdfData.formImage.Width * 25),
        height: Math.round(pdfData.formImage.Pages[0].Height * 25),
      });
    });
    pdfParser.parseBuffer(buffer);
  });

You can use it like so, capturing exceptions:

import { fromBuffer } from "pdf2pic";

// Fallback size, used if the PDF cannot be measured
let width = 1000;
let height = 1400;
try {
  const size = await getSizeFromPdf(buffer);
  if (size.width) width = size.width;
  if (size.height) height = size.height;
} catch (error) {
  // Keep the fallback size
}
const images = await fromBuffer(buffer, {
  width,
  height,
}).bulk(1);

KaKi87 commented on June 13, 2024

@yakovmeister Any news on this as well?

PS: a PDF can have pages with different sizes.
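Since pages can differ in size, the pdf2json helper above can be extended to produce one size per page instead of only the first. A minimal sketch, assuming the pdf2json < 2.0.0 output shape that helper relies on (formImage.Width shared by the whole document, Pages[i].Height per page) and the same empirical x25 factor; PdfJsonV1 and getPageSizes are hypothetical names. Note that this shape only exposes one width, so only the heights vary per page.

```typescript
// Hypothetical subset of the pdf2json (< 2.0.0) output read by the helper above
interface PdfJsonV1 {
  formImage: {
    Width: number; // a single width for the whole document
    Pages: { Height: number }[]; // one height per page
  };
}

// Same empirical page-unit-to-pixel factor as the original helper
const SCALE = 25;

// Returns a { width, height } pair in pixels for every page
function getPageSizes(pdfData: PdfJsonV1): { width: number; height: number }[] {
  const width = Math.round(pdfData.formImage.Width * SCALE);
  return pdfData.formImage.Pages.map((page) => ({
    width,
    height: Math.round(page.Height * SCALE),
  }));
}

const sizes = getPageSizes({
  formImage: { Width: 38.25, Pages: [{ Height: 49.5 }, { Height: 24 }] },
});
console.log(sizes);
```

Each entry could then be passed as the width/height options when converting the matching page, one conversion per page.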

mskec commented on June 13, 2024

You can keep the aspect ratio by setting both width and height to undefined, e.g.:

fromPath("./mypdf.pdf", { width: undefined, height: undefined });

Use the density option to get a larger or smaller image.
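As a rough guide to how density relates to output size: PDF page dimensions are expressed in points (1/72 inch) by default, so a page rasterized at a density of D dots per inch comes out at about points x D / 72 pixels. A minimal sketch, assuming the page size is known in points (US Letter is 612 x 792 pt); pointsToPixels is a hypothetical helper, not part of pdf2pic, and how closely pdf2pic's output follows it depends on the width/height options it is given.

```typescript
// PDF user-space units are points: 72 points per inch
const POINTS_PER_INCH = 72;

// Approximate pixel length of `points` when rasterized at `density` DPI
function pointsToPixels(points: number, density: number): number {
  return Math.round((points * density) / POINTS_PER_INCH);
}

// US Letter (612 x 792 pt) at 150 DPI
const letterAt150 = {
  width: pointsToPixels(612, 150),
  height: pointsToPixels(792, 150),
};
console.log(letterAt150); // { width: 1275, height: 1650 }
```

At density 72 the pixel size equals the point size, which is why raising density gives a larger image.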

You can also control the height and keep the aspect ratio by setting only the width to undefined, e.g.:

fromPath("./mypdf.pdf", { width: undefined, height: 600 });

Hope this helps.

We will change defaults in the next major version and allow setting width while preserving the aspect ratio.
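Until then, the missing dimension can be computed by hand from the page's aspect ratio. A minimal sketch, assuming the page size has already been obtained by one of the approaches in this thread; fitToWidth is a hypothetical helper.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scales `page` so its width equals `targetWidth`, keeping the aspect ratio
function fitToWidth(page: Size, targetWidth: number): Size {
  if (page.width <= 0 || page.height <= 0) {
    throw new Error("Page dimensions must be positive");
  }
  return {
    width: targetWidth,
    height: Math.round((targetWidth * page.height) / page.width),
  };
}

console.log(fitToWidth({ width: 612, height: 792 }, 1000)); // { width: 1000, height: 1294 }
```

The result can be passed as { width, height } to fromPath or fromBuffer.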

JasonLamv-t commented on June 13, 2024

I found that, when width and height are not specified in the options, the aspect ratio seems to be recognized automatically, but the dimensions are transposed in the output image.

Here is the result when I did not specify width and height:
[screenshot]
and the output:

{
  name: 'page.1.jpg',
  size: '768x512',
  fileSize: 42.859,
  path: '/tmp/images/page.1.jpg',
  page: 1
}

This is the result obtained by specifying a width of 512 and a height of 768:
[screenshot]

yakovmeister commented on June 13, 2024

This is quite hard, as I would need to look into the PDF's metadata, but I'll think about this feature.
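For reference, the page size stored in a PDF's own metadata is each page's /MediaBox rectangle [llx lly urx ury], in points, and a /Rotate value (a multiple of 90) turns the page clockwise when displayed, so 90 and 270 swap the displayed width and height. A minimal sketch of that calculation, assuming both values have already been read from the PDF by some parser (not shown); pageDisplaySize is a hypothetical helper.

```typescript
// [llx, lly, urx, ury] in PDF points, as stored in a page's /MediaBox
type MediaBox = [number, number, number, number];

// Displayed page size in points, accounting for the page's /Rotate value
function pageDisplaySize(
  mediaBox: MediaBox,
  rotate = 0,
): { width: number; height: number } {
  const [llx, lly, urx, ury] = mediaBox;
  const width = Math.abs(urx - llx);
  const height = Math.abs(ury - lly);
  // Normalize negative values and values >= 360 to 0..3 quarter turns
  const quarterTurns = (((rotate / 90) % 4) + 4) % 4;
  // A 90 or 270 degree rotation swaps the displayed width and height
  return quarterTurns % 2 === 1
    ? { width: height, height: width }
    : { width, height };
}

console.log(pageDisplaySize([0, 0, 612, 792])); // { width: 612, height: 792 }
console.log(pageDisplaySize([0, 0, 612, 792], 90)); // { width: 792, height: 612 }
```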

yusunglee2074 commented on June 13, 2024

@JasonLamv-t I also see the numbers 512 and 768 for height and width.
This library always returns 512 and 768, whether the input is a 1000 x 600 or a 600 x 1000 image.
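The 768 and 512 values appear to match pdf2pic's default width and height options, which describe a landscape box regardless of the page's orientation. A minimal sketch of choosing the box orientation from the page instead, assuming the page size is already known; orientedBox is a hypothetical helper. It only fixes the orientation: the output still has the box's 3:2 ratio rather than the page's exact one.

```typescript
interface Size {
  width: number;
  height: number;
}

// Long and short sides of pdf2pic's apparent default box (768 x 512)
const DEFAULT_LONG_SIDE = 768;
const DEFAULT_SHORT_SIDE = 512;

// Tall box for portrait pages, wide box for landscape (or square) pages
function orientedBox(
  page: Size,
  longSide = DEFAULT_LONG_SIDE,
  shortSide = DEFAULT_SHORT_SIDE,
): Size {
  return page.height > page.width
    ? { width: shortSide, height: longSide }
    : { width: longSide, height: shortSide };
}

console.log(orientedBox({ width: 600, height: 1000 })); // { width: 512, height: 768 }
console.log(orientedBox({ width: 1000, height: 600 })); // { width: 768, height: 512 }
```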

rigwild commented on June 13, 2024

You can use https://github.com/calipersjs/calipers-pdf to get the size of the document before extracting its pages:

import path from 'path'
import { fromPath } from 'pdf2pic'

;(async () => {
  // Load calipers with its PDF plugin (calipers-pdf)
  const Calipers = require('calipers')('pdf')

  const filePath = path.resolve(__dirname, 'lol.pdf')

  // Calipers reports one { width, height } entry per page
  const { pages: pagesSizes } = (await Calipers.measure(filePath)) as {
    pages: { width: number; height: number }[]
  }
  const results = []

  let pageNumber = 1
  for (const pageSize of pagesSizes) {
    // One converter per page, sized to that page's dimensions
    const convert = fromPath(filePath, {
      saveFilename: path.basename(filePath),
      savePath: path.dirname(filePath),
      width: pageSize.width,
      height: pageSize.height,
      format: 'png',
      quality: 100
    })
    // Render only the current page
    results.push(await convert(pageNumber))
    pageNumber++
  }
  return results
})()

To install (on Debian/Ubuntu):

sudo apt install pkg-config libpoppler-cpp-dev libpoppler-private-dev
pnpm i calipers calipers-pdf

The-CodeNinja commented on June 13, 2024

Note that @AnandChowdhary's getSizeFromPdf helper above only works with pdf2json versions below 2.0.0; there have been breaking changes in the library's API.
