
andytango / mupdf-js


📰 Yet another WebAssembly PDF renderer for node and the browser

Home Page: https://andytango.github.io/mupdf-js-demo/index.html

License: GNU Affero General Public License v3.0

TypeScript 100.00%
mupdf pdf pdf-converter pdf-viewer wasm webassembly

mupdf-js's People

Contributors

andytango, anthrax63, dependabot[bot], ihoey, malena205


mupdf-js's Issues

initMuPdf

Following the readme I get the following error...

TypeError: initMuPdf is not a function

Instead, I needed to use createMuPdf().

Error when trying to run example from README.md

I am trying to run the example from the mupdf-js README.md in a Node.js environment (I am using the file in.pdf from disk), but I am getting an error.

My code:

import createMuPdf from "mupdf-js";
import fs from "fs";

async function handleSomePdf(file) {
  const mupdf = await createMuPdf();
  //const buf = await file.arrayBuffer();
  const arrayBuf = new Uint8Array(file); //buf);
  const doc = mupdf.load(arrayBuf);
}

handleSomePdf(fs.readFileSync('in.pdf'));

Error:

D:\src> npm start

> [email protected] start
> node index.js

file:///D:/src/index.js:5
  const mupdf = await createMuPdf();
                      ^

TypeError: createMuPdf is not a function
    at handleSomePdf (file:///D:/src/index.js:5:23)
    at file:///D:/src/index.js:11:1
    at ModuleJob.run (node:internal/modules/esm/module_job:198:25)
    at async Promise.all (index 0)
    at async ESMLoader.import (node:internal/modules/esm/loader:409:24)
    at async loadESM (node:internal/process/esm_loader:85:5)
    at async handleMainPromise (node:internal/modules/run_main:61:12)

Node.js v18.0.0

How can this be fixed?
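
Another report in this list imports the factory by name (`import { createMuPdf } from "mupdf-js"`), which suggests that the default import used above may not be the function itself. The sketch below is one defensive way to locate the factory on whatever the module namespace turns out to be. The helper name `resolveCreateMuPdf` and the list of candidate locations are assumptions for illustration, not part of mupdf-js.

```javascript
// Find a callable createMuPdf on a module namespace object.
// The candidate locations below are guesses that cover common
// ESM/CommonJS interop shapes; they are not documented by mupdf-js.
function resolveCreateMuPdf(ns) {
  const candidates = [
    ns && ns.createMuPdf,                        // named export
    ns && ns.default && ns.default.createMuPdf,  // named export nested under default
    ns && ns.default,                            // default export is the function
    ns,                                          // the module itself is the function
  ];
  const factory = candidates.find((c) => typeof c === "function");
  if (!factory) {
    const keys = ns ? Object.keys(ns).join(", ") : "(none)";
    throw new TypeError("No createMuPdf function found; exports: " + keys);
  }
  return factory;
}

// Stand-ins for the shapes a namespace might take (no real library involved).
const fakeFactory = async () => ({ load() {} });
const namedShape = { createMuPdf: fakeFactory };
const nestedShape = { default: { createMuPdf: fakeFactory } };

console.log(resolveCreateMuPdf(namedShape) === fakeFactory);  // true
console.log(resolveCreateMuPdf(nestedShape) === fakeFactory); // true
```

In the failing script this would be used as `const createMuPdf = resolveCreateMuPdf(await import("mupdf-js"))`. If the named import works directly, the helper is unnecessary.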

isContinuation is incorrect for results spanning more than 2 lines when using searchPageText

isContinuation is correctly set to true for the first rectangle when a result spans 2 lines. However, when a result spans 3 lines, one would expect the second rectangle to have isContinuation: true as well, with only the last rectangle (representing the 3rd line) set to isContinuation: false. Because the second rectangle is not flagged, the rectangles representing the search results end up grouped incorrectly (the 3rd line ends up as a discrete search result).
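
The grouping that a consumer would typically apply can be sketched as follows. This is an illustration, not the library's code: the rectangle fields other than `isContinuation` and the helper name `groupSearchRects` are assumptions, and a `true` flag is taken to mean "this result continues in the next rectangle".

```javascript
// Group search rectangles into results using the isContinuation flag:
// a rectangle with isContinuation === false closes the current result.
function groupSearchRects(rects) {
  const results = [];
  let current = [];
  for (const rect of rects) {
    current.push(rect);
    if (!rect.isContinuation) {
      results.push(current);
      current = [];
    }
  }
  // Keep trailing rectangles that were still flagged as continuing.
  if (current.length > 0) results.push(current);
  return results;
}

// Hypothetical rectangles: one 3-line result followed by a 1-line result.
const makeRects = (flags) =>
  flags.map((isContinuation, i) => ({ x: 50, y: 100 + 14 * i, w: 400, h: 12, isContinuation }));

// Flags as one would expect them for the 3-line result.
const expectedFlags = makeRects([true, true, false, false]);
console.log(groupSearchRects(expectedFlags).map((g) => g.length)); // [ 3, 1 ]

// Flags as described in this report: only the first rectangle is flagged.
const reportedFlags = makeRects([true, false, false, false]);
console.log(groupSearchRects(reportedFlags).map((g) => g.length)); // [ 2, 1, 1 ]
```

With the reported flags the third line becomes its own group, which matches the incorrect grouping described above.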

[BUG] CompileError running client side

Describe the bug
When instantiating the library in a Next.js client component, I'm getting WebAssembly errors.

To Reproduce
Steps to reproduce the behavior:

  • Call createMuPdf in a Next.js client component

Log output
Initially it fails trying to import fs. If you mock out fs with the following webpack config

webpack(config, { webpack, isServer }) {
  if (!isServer) {
    config.resolve.fallback = {
      fs: false
    }
  }
  return config;
},

Then the import works, but you get the following when instantiating mupdf.

RuntimeError: abort(CompileError: WebAssembly.Module doesn't parse at byte 0: module doesn't start with '\0asm'). Build with -s ASSERTIONS=1 for more info.
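
The message says the bytes handed to the compiler do not begin with the WebAssembly magic number `\0asm`. One possible cause, which is only a guess for this report, is that the URL requested for libmupdf.wasm returned something else, such as an HTML page. A small check like the one below can help confirm what was actually received. The helper names are hypothetical.

```javascript
// Every WebAssembly binary starts with the 4 bytes 0x00 'a' 's' 'm'.
const WASM_MAGIC = [0x00, 0x61, 0x73, 0x6d];

// Accepts a Uint8Array or an ArrayBuffer.
function looksLikeWasm(bytes) {
  const view = bytes instanceof Uint8Array ? bytes : new Uint8Array(bytes);
  return view.length >= 4 && WASM_MAGIC.every((b, i) => view[i] === b);
}

// Render the first bytes as printable ASCII to see what was served instead.
function previewBytes(bytes, count = 16) {
  const view = bytes instanceof Uint8Array ? bytes : new Uint8Array(bytes);
  return Array.from(view.slice(0, count), (b) =>
    b >= 0x20 && b < 0x7f ? String.fromCharCode(b) : "."
  ).join("");
}

// Stand-in inputs: an HTML response body and a minimal wasm header.
const htmlBody = new TextEncoder().encode("<!DOCTYPE html><html>");
const wasmHeader = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

console.log(looksLikeWasm(htmlBody));   // false
console.log(previewBytes(htmlBody));    // <!DOCTYPE html><
console.log(looksLikeWasm(wasmHeader)); // true
```

In the browser, the bytes could come from `new Uint8Array(await (await fetch(url)).arrayBuffer())`, where `url` is whatever address the network tab shows for the libmupdf.wasm request.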

[BUG] Failed to parse URL

Describe the bug
when running the pdf

Log output
abort(TypeError: Failed to parse URL from */mupdf-js/dist/libmupdf.wasm). Build with -s ASSERTIONS=1 for more info.

Desktop

  • OS: Ubuntu
  • OS Version: Ubuntu
  • node
  • Node version: 18.10.0
  • Architecture: AMD64

[FEATURE] Use console.warn to emit warnings, not console.error

Is your feature request related to a problem? Please describe.

When this library emits warnings like "warning: PDF stream Length incorrect", it emits them via console.error rather than console.warn, even though it handles the PDF successfully. We have alarms set up on our server that fire whenever there are console.error messages, so this triggers false alarms.

Describe the solution you'd like
Use console.warn for warnings, and console.error for errors (or throw exception on error)

Describe alternatives you've considered
Monkey-patch console.error in our code.
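
A minimal sketch of that monkey-patching alternative is shown below. It wraps the `error` method of a console-like object so that messages starting with "warning:" are forwarded to `warn`. The helper name `rerouteWarnings` is hypothetical, and the assumption that every library warning starts with "warning:" is based only on the single message quoted above.

```javascript
// Wrap target.error so that "warning: ..." messages go to target.warn.
// Returns a function that restores the original error method.
function rerouteWarnings(target) {
  const originalError = target.error;
  target.error = function (...args) {
    const first = args[0];
    const isWarning =
      typeof first === "string" &&
      first.trimStart().toLowerCase().startsWith("warning:");
    if (isWarning) {
      target.warn(...args);
    } else {
      originalError.apply(target, args);
    }
  };
  return () => {
    target.error = originalError;
  };
}

// Demonstration on a stand-in console that records calls.
const calls = [];
const fakeConsole = {
  error: (...args) => calls.push(["error", ...args]),
  warn: (...args) => calls.push(["warn", ...args]),
};

const restore = rerouteWarnings(fakeConsole);
fakeConsole.error("warning: PDF stream Length incorrect");
fakeConsole.error("error: something actually failed");
restore();

console.log(calls.map((c) => c[0])); // [ 'warn', 'error' ]
```

Calling `rerouteWarnings(console)` before creating the mupdf instance would apply the same routing to the real console, at the cost of affecting every other caller of `console.error` in the process.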

drawPageAsPNG generates base64 with newline character

Describe the bug
A PNG image generated with drawPageAsPNG has a newline after data:image/png;base64,

To Reproduce

import { createMuPdf } from "mupdf-js"

let preview = []

// Initialize mupdf and call mupdfGeneratePNG
async function mupdfPreview(file) {
  const mupdf = await createMuPdf()
  const fileArrayBuffer = await file.arrayBuffer()
  const fileBuffer = new Uint8Array(fileArrayBuffer)
  const pdf = mupdf.load(fileBuffer)
  const pages = mupdf.countPages(pdf)
  return await mupdfGeneratePNG(mupdf, pdf, file, pages)
}

// Generate previews for pdf pages and return array of base64 PNGs
async function mupdfGeneratePNG(mupdf, pdf, file, pages) {
  return new Promise((resolve, reject) => {
    try {
      // Page numbers are assumed to be 1-based, so include the last page
      for (let i = 1; i <= pages; i++) {
        const base64Image = mupdf.drawPageAsPNG(pdf, i, 10) // issue is here
        preview.push(base64Image)
      }
      
      resolve(preview)
    } catch (error) {
      console.log(error)
      reject(error)
    }
  })
}

// Handle file input on change event
async function onChange(event) {
  const files = event.target.files
  Array.from(files).forEach(async (file) => {
      if (file.type === 'application/pdf') {
        await mupdfPreview(file)
      }
  })
}

// Listen for input file change
const input = document.querySelector('[type="file"]')
input.addEventListener('change', onChange)

Expected behavior
drawPageAsPNG should return a single line base64 string like:

...

Actual behavior
base64 string has newline after data:image/png;base64,

data:image/png;base64,
iVBORw0KGgoAAAANSUhEUgAA...

# in some environments the newline is escaped and becomes `%0A`, resulting in:
data:image/png;base64,%0AiVBORw0KGgoAAAANSUhEUgAA...
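
As a caller-side workaround, the stray newline can be stripped before the string is used. The sketch below is not part of mupdf-js, and the helper name `normalizeDataUri` is hypothetical. It only touches base64 data URIs, where neither whitespace nor `%` is a meaningful payload character.

```javascript
// Remove literal whitespace and percent-encoded CR/LF from the payload
// of a base64 data URI. Other strings are returned unchanged.
function normalizeDataUri(uri) {
  const comma = uri.indexOf(",");
  if (comma === -1) return uri;
  const header = uri.slice(0, comma + 1);
  if (!/;base64,$/i.test(header)) return uri;
  const payload = uri
    .slice(comma + 1)
    .replace(/%0[AD]/gi, "") // percent-encoded line feed / carriage return
    .replace(/\s+/g, "");    // literal newlines, spaces, tabs
  return header + payload;
}

console.log(normalizeDataUri("data:image/png;base64,\niVBORw0KGgo"));
// data:image/png;base64,iVBORw0KGgo
console.log(normalizeDataUri("data:image/png;base64,%0AiVBORw0KGgo"));
// data:image/png;base64,iVBORw0KGgo
```

In the reproduction above this would wrap the call, for example `normalizeDataUri(mupdf.drawPageAsPNG(pdf, i, 10))`.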


Screenshots
Screen Shot 2023-03-23 at 2 29 43 PM

Desktop (please complete the following information):

  • OS: Linux and MacOS
  • OS Version: Ubuntu 22.04 LTS (GNU/Linux 5.15.0-1031-aws x86_64) and MacOS Catalina version 10.15.7
  • Browser (or Node) Chrome and Node
  • Browser (or Node) version: Chrome Version 110.0.5481.177 (Official Build) (x86_64) and Node v18.14.2
  • Architecture: x86_64

Smartphone (please complete the following information):
Not applicable

Additional context
Nothing

[FEATURE] Support for Cloudflare Workers

Is your feature request related to a problem? Please describe.
There are currently no PDF readers that support Cloudflare workers.

Describe the solution you'd like
Being able to use mupdf-js on workers, considering the hardware and dependency limitations it has.

Describe alternatives you've considered
Currently, mupdf-js does not support it out of the box, and throws errors regarding not found modules (XMLHttpRequest)

Additional context

Feature Request: Expose more functions from MuPDF

The MuPDF WASM demo on the official site exposes more functions from the WASM module. pageTextJSON can be used to generate a better text selection layer, and searchJSON is essential for searching text. Could you please add these functions to mupdf-js?

	mupdf.pageTextJSON = Module.cwrap('pageText', 'string', ['number', 'number', 'number']);
	mupdf.searchJSON = Module.cwrap('search', 'string', ['number', 'number', 'number', 'string']);

[BUG]

Describe the bug
When I open the same PDF in the official MuPDF viewer it looks fine, but the HTML looks weird in mupdf-js. Is this a feature or a bug?
image

getPageText crashes on larger PDFs

getPageText() crashes with a "RuntimeError: memory access out of bounds" exception on PDFs larger than around 20 pages. This happens in the browser / WebWorker and in Node.js. On Node.js it crashes with a bus error.

[FEATURE] Alternative way to instantiate mupdf-js instance

Is your feature request related to a problem? Please describe.

I can instantiate mupdf-js on a SvelteKit front end using await createMuPdf(), but I also want to instantiate it in SvelteKit's server-side context. If I call await createMuPdf() there, SvelteKit gives the following error:

Error: Cannot use relative URL (/home/projects/sveltejs-kit-template-default-7koyei/node_modules/mupdf-js/dist/libmupdf.wasm) with global fetch — use event.fetch instead: https://kit.svelte.dev/docs/web-standards#fetch-apis

It seems createMuPdf calls fetch('/home/projects/sveltejs-kit-template-default-7koyei/node_modules/mupdf-js/dist/libmupdf.wasm'), which SvelteKit does not allow on the server side.

Here is a StackBlitz reproduction of the issue: https://stackblitz.com/edit/sveltejs-kit-template-default-7koyei?file=src/routes/+server.js. Clicking the button sends a POST request to the SvelteKit server, where createMuPdf is called and the above error occurs.

Describe the solution you'd like

It would be nice if we could load libmupdf.wasm by passing its path on the disk with createMuPdf, something like:

import * as path from "path"
import { fileURLToPath } from "url"

const __filename = fileURLToPath(import.meta.url)
const __dirname = path.dirname(__filename)

const mupdfPath = path.join(__dirname, '../node_modules/mupdf-js/dist/libmupdf.wasm')
const mupdf = await createMuPdf(mupdfPath)

Since mupdf-js is already able to work out the path of libmupdf.wasm (for example '/home/projects/sveltejs-kit-template-default-7koyei/node_modules/mupdf-js/dist/libmupdf.wasm'), loading it from disk should be possible.

Describe alternatives you've considered

I tried loading the wasm file using Node.js readFile and then calling WebAssembly.instantiate, but I am having problems with the imports. I don't know much about wasm, so I am a bit lost.

import { readFile } from 'node:fs/promises'

const mupdfWasmBuffer = await readFile(path.join(__dirname, '../../node_modules/mupdf-js/dist/libmupdf.wasm'))
const mupdfModule = await WebAssembly.compile(mupdfWasmBuffer)
let importObject = {}

for (let imp of WebAssembly.Module.imports(mupdfModule)) {
    if (typeof importObject[imp.module] === "undefined") {
        importObject[imp.module] = {}
    }
    switch (imp.kind) {
        case "function": importObject[imp.module][imp.name] = () => {}; break;
        case "table": importObject[imp.module][imp.name] = new WebAssembly.Table({ initial: 0, element: "anyfunc" }); break;
        case "memory": importObject[imp.module][imp.name] = new WebAssembly.Memory({ initial: 12000, maximum: 32768 }); break;
        case "global": importObject[imp.module][imp.name] = 0; break;
    }
}
const mupdfInstance = await WebAssembly.instantiate(mupdfWasmBuffer, importObject)
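
Stubbing every import with a no-op is unlikely to yield a working instance, because the module presumably relies on real implementations supplied by its generated JavaScript loader (an assumption based on the Emscripten-style messages quoted elsewhere in this list). Still, listing what the module asks for can make the problem easier to see. The sketch below uses only the standard `WebAssembly.Module.imports` API; the helper name `summarizeImports` and the tiny hand-assembled module are illustrative and unrelated to libmupdf.wasm.

```javascript
// Group the imports of a compiled WebAssembly.Module by "module (kind)".
function summarizeImports(wasmModule) {
  const summary = {};
  for (const imp of WebAssembly.Module.imports(wasmModule)) {
    const key = imp.module + " (" + imp.kind + ")";
    if (!summary[key]) summary[key] = [];
    summary[key].push(imp.name);
  }
  return summary;
}

// A minimal module that imports one function, env.f, with type () -> ().
const tinyWasm = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic and version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section: one func type
  0x02, 0x09, 0x01,                               // import section: one import
  0x03, 0x65, 0x6e, 0x76,                         // module name "env"
  0x01, 0x66,                                     // field name "f"
  0x00, 0x00,                                     // kind: function, type index 0
]);

// Synchronous compilation is fine for a few bytes; for a large file,
// `await WebAssembly.compile(buffer)` as in the snippet above is preferable.
const tinyModule = new WebAssembly.Module(tinyWasm);
console.log(summarizeImports(tinyModule)); // { 'env (function)': [ 'f' ] }
```

Passing `mupdfModule` from the snippet above to `summarizeImports` would show how many functions the real binary expects its loader to provide.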

General question regarding functions

I just found this MuPDF version for JS, and I am curious: can mupdf-js also extract the properties of text blocks, annotations, and other objects in a PDF, similar to PyMuPDF? I was looking for a web-based solution, and PDF.js was too limited for this specific use case (or there are things I didn't dig into deeply enough).

The properties in question are:

  • bounding box
  • fill and stroke, and colors
  • text color
  • font
  • origin
  • direction of rotation
  • and others
