Comments (8)
I used pdf2json to parse the PDF and get its width and height, and then used these values in pdf2pic.
This helper method returns the width and height from a PDF Buffer:
import PDFParser from "pdf2json";

const getSizeFromPdf = (buffer: Buffer) =>
  new Promise<{ width: number; height: number }>((resolve, reject) => {
    const pdfParser = new PDFParser();
    pdfParser.on('pdfParser_dataError', (errData: { parserError: string }) =>
      reject(errData.parserError),
    );
    pdfParser.on('pdfParser_dataReady', (pdfData: any) => {
      if (
        !pdfData.formImage ||
        typeof pdfData.formImage !== 'object' ||
        !('Pages' in pdfData.formImage) ||
        !Array.isArray(pdfData.formImage.Pages) ||
        pdfData.formImage.Pages.length === 0
      )
        return reject('Unable to parse PDF');
      // Values are in "page size" units, not pixels, so we multiply by 25 (e.g., 38.25 becomes 956px)
      resolve({
        width: Math.round(pdfData.formImage.Width * 25),
        height: Math.round(pdfData.formImage.Pages[0].Height * 25),
      });
    });
    // Start parsing only after both handlers are registered
    pdfParser.parseBuffer(buffer);
  });
You can use it like so, capturing exceptions:
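import { fromBuffer } from "pdf2pic";

// Fallback dimensions, used when the PDF size cannot be read
let width = 1000;
let height = 1400;
try {
  const size = await getSizeFromPdf(buffer);
  if (size.width) width = size.width;
  if (size.height) height = size.height;
} catch (error) {
  // Parsing failed: keep the fallback dimensions
}
const image = await fromBuffer(buffer, {
  width,
  height,
}).bulk(1);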
from pdf2image.
@yakovmeister Any news on this as well?
PS: a PDF can have pages with different sizes.
from pdf2image.
You can keep the aspect ratio by setting width and height to undefined, e.g.:
fromPath("./mypdf.pdf", { width: undefined, height: undefined });
Using density, you can get a larger or smaller image.
You can also control the height and keep the aspect ratio by setting only width to undefined, e.g.:
fromPath("./mypdf.pdf", { width: undefined, height: 600 });
Hope this helps.
We will change defaults in the next major version and allow setting width while preserving the aspect ratio.
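As a rough sketch of what preserving the aspect ratio means when only the height is fixed, the missing width can be derived from the page's own proportions. The helper name fitToHeight and the sample page size are made up for illustration; they are not part of pdf2pic, and this is not necessarily how the library computes it internally.

```typescript
// Hypothetical helper, not part of pdf2pic: derive the width that keeps
// a page's aspect ratio when only the target height is fixed.
interface Size {
  width: number;
  height: number;
}

function fitToHeight(page: Size, targetHeight: number): Size {
  if (page.width <= 0 || page.height <= 0 || targetHeight <= 0) {
    throw new RangeError("dimensions must be positive");
  }
  // Scale both sides by the same factor so the proportions are unchanged
  const scale = targetHeight / page.height;
  return { width: Math.round(page.width * scale), height: targetHeight };
}

// Sample input (an assumption): a 612 x 792 page scaled to a height of 600
const scaled = fitToHeight({ width: 612, height: 792 }, 600);
console.log(scaled);
```

The same idea applies in the other direction when only the width is fixed: divide the target width by the page width and scale the height by that factor.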
from pdf2image.
In use, I found that the aspect ratio is detected automatically when the width and height are not specified in the options, but the two dimensions are transposed in the output picture.
Here is the result when I did not specify the width and height, and the output:
{
name: 'page.1.jpg',
size: '768x512',
fileSize: 42.859,
path: '/tmp/images/page.1.jpg',
page: 1
}
This is the result obtained by specifying a width of 512 and a height of 768:
from pdf2image.
This is quite hard, as I would need to look into the PDF's metadata, but I'll think about this feature.
from pdf2image.
@JasonLamv-t I see the numbers 512 and 768 for height and width.
This lib always returns 512 and 768, whether you give it a 1000 x 600 or a 600 x 1000 image.
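One way to check for the transposition described in this thread might be to parse the size string from the response (the "WIDTHxHEIGHT" format seen in the output above) and compare it with the requested dimensions. This is only a sketch: parseSize and isTransposed are hypothetical names, not pdf2pic functions.

```typescript
// Hypothetical helpers, not part of pdf2pic.
interface Size {
  width: number;
  height: number;
}

// Parse a "WIDTHxHEIGHT" string such as '768x512'.
function parseSize(size: string): Size {
  const match = /^(\d+)x(\d+)$/.exec(size.trim());
  if (!match) throw new Error(`Unexpected size format: ${size}`);
  return { width: Number(match[1]), height: Number(match[2]) };
}

// True when the output has the requested width and height swapped.
// Square requests can never be reported as transposed.
function isTransposed(requested: Size, actualSize: string): boolean {
  const actual = parseSize(actualSize);
  return (
    requested.width !== requested.height &&
    actual.width === requested.height &&
    actual.height === requested.width
  );
}

console.log(isTransposed({ width: 512, height: 768 }, "768x512"));
```

A check like this could run after each conversion and log a warning, which makes it easier to tell a transposed result apart from one where the library simply fell back to its defaults.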
from pdf2image.
You can use https://github.com/calipersjs/calipers-pdf to get the size of the document before extracting its pages:
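import path from 'path'
// Assumption: pdf2Pic is pdf2pic's fromPath export; the WriteImageResponse type
// must also be imported from pdf2pic's type definitions
import { fromPath as pdf2Pic } from 'pdf2pic'

;(async () => {
  const Calipers = require('calipers')('pdf')
  const filePath = path.resolve(__dirname, 'lol.pdf')
  // Measure every page up front; each entry holds that page's width and height
  const { pages: pagesSizes } = (await Calipers.measure(filePath)) as { pages: { width: number; height: number }[] }
  const results: WriteImageResponse[] = []
  let pageNumber = 1
  // Convert one page at a time so each image gets its own page's dimensions
  for (const pageSize of pagesSizes) {
    results.push(
      (
        await pdf2Pic(filePath, {
          saveFilename: path.basename(filePath),
          savePath: path.dirname(filePath),
          width: pageSize.width,
          height: pageSize.height,
          format: 'png',
          quality: 100
        }).bulk!(pageNumber)
      )[0]
    )
    pageNumber++
  }
  return results
})()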
To install
sudo apt install pkg-config libpoppler-cpp-dev libpoppler-private-dev
pnpm i calipers calipers-pdf
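Since a PDF can mix page sizes, converting page by page means one converter call per page. A possible refinement, sketched below, is to group page numbers by size first so that pages sharing a size can be converted together. groupPagesBySize is a hypothetical helper, and the sample sizes are invented; the input is assumed to have the same { width, height } shape as the measured pages above.

```typescript
// Hypothetical helper: group 1-based page numbers by their "WIDTHxHEIGHT" key.
interface PageSize {
  width: number;
  height: number;
}

function groupPagesBySize(sizes: PageSize[]): Map<string, number[]> {
  const groups = new Map<string, number[]>();
  sizes.forEach((size, index) => {
    const key = `${size.width}x${size.height}`;
    const pages = groups.get(key) ?? [];
    pages.push(index + 1); // page numbers start at 1
    groups.set(key, pages);
  });
  return groups;
}

// Sample data (an assumption): two portrait pages and one landscape page
const groups = groupPagesBySize([
  { width: 612, height: 792 },
  { width: 792, height: 612 },
  { width: 612, height: 792 },
]);
console.log(groups);
```

Each group's page list could then be handed to a single bulk call with that group's width and height, assuming bulk accepts an array of page numbers in the version you are using.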
from pdf2image.
The pdf2json helper quoted above will only work with pdf2json versions below 2.0.0.
There have been breaking changes in the library's API since then.
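A minimal sketch of a size reader that tolerates both output shapes follows. It assumes that versions below 2.0.0 nest the data under formImage (as in the helper earlier in this thread) and that later versions expose Pages at the top level with Width and Height on each page. The second shape is an assumption and should be checked against the pdf2json version you have installed. firstPageSize is a hypothetical name, and the factor of 25 is carried over from the earlier helper.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical helper. Assumed shapes of the parsed data:
//   below 2.0.0:       { formImage: { Width, Pages: [{ Height }] } }
//   2.0.0+ (assumed):  { Pages: [{ Width, Height }] }
function firstPageSize(pdfData: any, unitsToPixels = 25): Size {
  const root = pdfData?.formImage ?? pdfData;
  const pages = root?.Pages;
  if (!Array.isArray(pages) || pages.length === 0) {
    throw new Error("Unable to parse PDF");
  }
  const first = pages[0];
  // Prefer a per-page width, falling back to the document-level one
  const width = first.Width ?? root.Width;
  const height = first.Height;
  if (typeof width !== "number" || typeof height !== "number") {
    throw new Error("Unable to read page size");
  }
  return {
    width: Math.round(width * unitsToPixels),
    height: Math.round(height * unitsToPixels),
  };
}

// Mock inputs for illustration only, not real parser output
const legacy = firstPageSize({ formImage: { Width: 38.25, Pages: [{ Height: 40 }] } });
const modern = firstPageSize({ Pages: [{ Width: 38.25, Height: 40 }] });
console.log(legacy, modern);
```

Inside the pdfParser_dataReady handler, the object passed to the callback would be given to firstPageSize in place of the formImage checks.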
from pdf2image.