
abarker / pdfcropmargins


pdfCropMargins -- a program to crop the margins of PDF files

License: Other

Python 87.38% Shell 12.36% Batchfile 0.26%
python pdf pdf-converter pdf-document-processor crop cropper

pdfcropmargins's People

Contributors

abarker, harveyslash, namibj, yxlao

pdfcropmargins's Issues

Rotated pages do not get cropped properly

When I run pdf-crop-margins -u -s myscan.pdf, the rotated pages are restored to their original orientation before being read in. This creates a problem: the rotated pages should be treated as-is, because the scanner made some not-so-clever decisions and I rotated those pages manually so that they face the proper direction. Is there a command option that makes the read-in treat the rotations as-is?

PyPDF2 Python package was found

$ pdf-crop-margins input.pdf -p4 100 100 100 1000 -o output.pdf
Error in pdfCropMargins: No system PyPDF2 Python package
was found. Reinstall pdfCropMargins via pip or install that
dependency ('pip install pypdf2').
$ pip install pypdf2 --user
Requirement already satisfied: pypdf2 in /home/ti/.local/lib/python3.6/site-packages (2.12.1)
Requirement already satisfied: typing_extensions>=3.10.0.0 in /home/ti/.local/lib/python3.6/site-packages (from pypdf2) (4.1.1)
Requirement already satisfied: dataclasses in /home/ti/.local/lib/python3.6/site-packages (from pypdf2) (0.8)

$ pip show PyPDF2
Name: PyPDF2
Version: 2.12.1
Summary: A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files
Home-page:
Author:
Author-email: Mathieu Fenniak [email protected]
License:
Location: /home/ti/.local/lib/python3.6/site-packages
Requires: dataclasses, typing_extensions
Required-by: pdfCropMargins

$ python3 -c "import PyPDF2; print(PyPDF2.__version__)"
2.12.1

$ pdf-crop-margins
Error in pdfCropMargins: No system PyPDF2 Python package
was found. Reinstall pdfCropMargins via pip or install that
dependency ('pip install pypdf2').

OS: CentOS 7

Bug in interactions of -e -u and -pg options

Hi,

Thanks for sharing this great app. However, I found that sometimes the cropping doesn't work as expected, especially when some pages are excluded (e.g. the first one, with a large cover picture) and the uniform and evenodd options are set. The result is a seemingly uncropped top margin, at least with my test document.

I believe the bug lies in the main_pdfCropMargins.py module, starting at line 325, where a common bottom and top margin is searched for after the even and odd pages have been processed separately:

min_bottom_margin = min([box[1] for box in combine_even_odd])
max_top_margin = max([box[3] for box in combine_even_odd])

This means the search is done over all the pages, including the otherwise excluded ones. I think the proper way to do this would be something like:

min_bottom_margin = min([box[1] for i, box in enumerate(combine_even_odd) if i in page_nums_to_crop])
max_top_margin = max([box[3] for i, box in enumerate(combine_even_odd) if i in page_nums_to_crop])
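The proposed fix can be pulled out into a small standalone function so it is easy to test. This is a sketch, assuming (from the box[1]/box[3] indexing above) that each entry of combine_even_odd is a (left, bottom, right, top) tuple indexed by zero-based page number, and that page_nums_to_crop is a set of those indices; the function name is hypothetical.

```python
def common_vertical_extent(bounding_boxes, page_nums_to_crop):
    """Return (min_bottom, max_top) over only the pages that will be cropped.

    bounding_boxes: list of (left, bottom, right, top) tuples, one per page.
    page_nums_to_crop: set of zero-based page indices selected for cropping.
    Excluded pages (e.g. a full-page cover) no longer widen the common extent.
    """
    selected = [box for i, box in enumerate(bounding_boxes) if i in page_nums_to_crop]
    if not selected:
        raise ValueError("no pages selected for cropping")
    min_bottom = min(box[1] for box in selected)
    max_top = max(box[3] for box in selected)
    return min_bottom, max_top
```

With a full-page cover as page 0, excluding it changes the result: for boxes [(0, 0, 612, 792), (50, 100, 560, 700), (40, 90, 570, 710)], selecting pages {1, 2} gives (90, 710) instead of (0, 792).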

can't install gui

With Python 3.10.8 and pip 22.3.1, pip doesn't accept the [gui] extra:

$ pip3 install pdfCropMargins[gui] --user --upgrade
ERROR: You must give at least one requirement to install (see "pip help install")

Error when run with Python 3.8.2

It works with Python 3.7, but with Python 3.8.2 on Windows 10 it can't find the Ghostscript path (which is in the PATH environment variable; I checked). When the -gsp option is specified, the error shown below occurs during the cropping process.

Error:
Caught an unexpected exception in the pdfCropMargins program.
Unexpected error: <class 'PermissionError'>
Error message : [WinError 5] Access is denied
  File "c:\python\python38-32\lib\site-packages\pdfCropMargins\pdfCropMargins.py", line 102, in main
    main_crop()
  File "c:\python\python38-32\lib\site-packages\pdfCropMargins\main_pdfCropMargins.py", line 1324, in main_crop
    did_crop = create_gui(input_doc_fname, fixed_input_doc_fname, output_doc_fname,
  File "c:\python\python38-32\lib\site-packages\pdfCropMargins\gui.py", line 931, in create_gui
    bounding_box_list = process_pdf_file(input_doc_fname, fixed_input_doc_fname,
  File "c:\python\python38-32\lib\site-packages\pdfCropMargins\main_pdfCropMargins.py", line 1145, in process_pdf_file
    bounding_box_list = get_bounding_box_list(doc_with_crop_and_media_boxes_name,
  File "c:\python\python38-32\lib\site-packages\pdfCropMargins\calculate_bounding_boxes.py", line 91, in get_bounding_box_list
    bbox_list = get_bounding_box_list_render_image(input_doc_fname, input_doc)
  File "c:\python\python38-32\lib\site-packages\pdfCropMargins\calculate_bounding_boxes.py", line 136, in get_bounding_box_list_render_image
    render_pdf_file_to_image_files(pdf_file_name, temp_image_file_root, program_to_use)
  File "c:\python\python38-32\lib\site-packages\pdfCropMargins\calculate_bounding_boxes.py", line 219, in render_pdf_file_to_image_files
    ex.render_pdf_file_to_image_files__ghostscript_bmp(
  File "c:\python\python38-32\lib\site-packages\pdfCropMargins\external_program_calls.py", line 678, in render_pdf_file_to_image_files__ghostscript_bmp
    comm_output = get_external_subprocess_output(command, env=gs_environment)
  File "c:\python\python38-32\lib\site-packages\pdfCropMargins\external_program_calls.py", line 264, in get_external_subprocess_output
    p = subprocess.Popen(command_list, stdout=subprocess.PIPE,
  File "c:\python\python38-32\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "c:\python\python38-32\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
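A quick way to check whether the Python interpreter that runs pdfCropMargins actually sees Ghostscript on PATH is to ask shutil.which for the usual executable names. This is a diagnostic sketch only: the name list is an assumption (gswin64c and gswin32c are the standard Windows console builds, gs the Unix name), and it does not reproduce how pdfCropMargins itself searches. Since the traceback ends in CreateProcess, one possibility (not confirmed by the report) is that the -gsp value points to a folder rather than to the executable file itself.

```python
import shutil

# Common Ghostscript executable names: 64- and 32-bit Windows console builds,
# then the usual name on Unix-like systems.
GHOSTSCRIPT_NAMES = ["gswin64c", "gswin32c", "gs"]


def find_ghostscript(names=GHOSTSCRIPT_NAMES):
    """Return the full path of the first Ghostscript executable found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


print(find_ghostscript())
```

If this prints a path under Python 3.7 but None under 3.8.2, the two interpreters are seeing different PATH values.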

Backup seems to not work when using a filepath instead of filename

Hi @abarker. This is indeed an amazing tool.

My use case is that the .py file containing the code below lives in one directory and the PDFs to crop are in another. However, I run into a problem when I use the following command to crop a PDF file and back it up:

crop(['-ap', '12', '-p', '15', '-u', '-mo', '-su', 'old', '/path/to/some/file'])

The file gets modified correctly, but no backup is created. I guess that's because I'm passing a file path instead of just a file name. This matters for use cases where the script and the documents to modify can't be in the same directory.

Any guidance on this one?
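One possible workaround, assuming (as the reporter suspects) that the backup is written relative to the current working directory, is to switch into the PDF's directory for the duration of the call and pass only the file name. The working_directory helper below is hypothetical; Python 3.11+ ships an equivalent as contextlib.chdir.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original directory even if the body raises.
        os.chdir(previous)
```

Usage with the call from the report would look like: `with working_directory(os.path.dirname(pdf_path)): crop(['-ap', '12', '-p', '15', '-u', '-mo', '-su', 'old', os.path.basename(pdf_path)])`, where pdf_path is the full path to the file.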

Choose aspect ratio

The cropping works great, but is there a way to choose the aspect ratio of a page after it has been cropped?
I use a tablet to read some books, and cropping the margins makes the text bigger. But most of the time the page's aspect ratio is very different from the tablet screen's (4:3).
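The arithmetic behind "force a 4:3 page" is simple: widen the page if it is too narrow for the target ratio, otherwise make it taller. This sketch only computes how much extra margin would be needed; it is not a pdfCropMargins option, and the function name is hypothetical. For portrait reading on a 4:3 tablet the target width/height ratio is 3/4.

```python
def padding_for_aspect_ratio(width, height, target_ratio):
    """Return (extra_width, extra_height) so that the padded width/height == target_ratio.

    Only one of the two values is nonzero: a page that is too narrow is widened,
    and a page that is too wide is made taller. Units are whatever the inputs use
    (for PDFs, typically points).
    """
    current_ratio = width / height
    if current_ratio < target_ratio:
        return target_ratio * height - width, 0.0
    return 0.0, width / target_ratio - height
```

For example, a cropped 300 x 500 page needs 75 extra units of width to reach 3:4, which could then be split evenly between the left and right margins.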

Inverted image preview in GUI, due to negative image width

On Linux, when I run pdfcropmargins -gui FILE, the preview image of each page is rotated by 180 degrees.

[Screenshot: Screenshot_20230215_170709_pdfCropMargins: 2017-12-Gareus-Lat pdf]

Why does this happen?

  • Upon GUI startup, create_gui() calls max_image_size = get_usable_image_size(window, im_wid, im_ht, left_pixels).
  • get_usable_image_size:
    • usable_width, usable_height = get_window_size(window)
    • get_window_size:
      • width, height = get_window_size_tk() returns (200, 200)
        • I'm running on kwin_x11 with 100% display scaling but 120 X11 DPI.
        • The actual window which appears is larger than 200x200, closer to 794x789.
      • (multiply by 0.95, unimportant)
    • win_width, win_height = window.Size = 916x789 (not sure what's going on here!)
    • usable_im_width, usable_im_height = (usable_width - non_im_width-left_pixels, usable_height - non_im_height) and now usable_im_width is -251!
  • Afterwards you fetch the PDF page (as a P6 PPM, rotated 180 degrees because of a negative width) and write it into image_element (though sg.Image says "Should be a GIF or a PNG only", but PPM works anyway).
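A defensive version of the subtraction above would clamp the computed preview size to a positive minimum, so a bad window measurement yields a small preview instead of a negative width. This is a hypothetical sketch that mirrors the variable names in the walkthrough; the min_size default is an arbitrary assumption, and it only hides the symptom while the underlying window-size measurement would still need fixing.

```python
def usable_image_size(window_width, window_height, non_im_width, non_im_height,
                      left_pixels, min_size=100):
    """Compute the space left for the page preview, never returning a non-positive size."""
    usable_w = window_width - non_im_width - left_pixels
    usable_h = window_height - non_im_height
    # A negative width is what produces the flipped preview; fall back to min_size.
    return max(usable_w, min_size), max(usable_h, min_size)
```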

I'm guessing the issue is that get_window_size_tk calls root.winfo_width()/winfo_height() which returns 200x200, but this is smaller than the actual window from window.Size. This throws off the subsequent calculations (I didn't verify whether they're correct or not).

I don't know if root.winfo_width() is broken, or if you're calling it at a time it's not defined to be valid, or if you're not setting up the window correctly to measure what you want. I don't know how to fix this bug.

def get_window_size_tk():
    """Use tk to get an approximation to the usable screen area."""

Why does the function say "window size" but the docstring say "screen area"?

            # Go to fullscreen mode to get screen size.  This seems to work with
            # multiple monitors (which otherwise get counted at a combined size).
            root.attributes("-alpha", 0) # Invisible on most systems.
            #root.attributes("-fullscreen", True) # Set to actual full-screen size.

The comments say you enter fullscreen to measure screen(?) size, but you never actually do because it's commented out.

Latest commit e8d5928.

Exitcode 0 even if error occurs

Hi,
pdfCropMargins always exits with exit code 0, so it is not possible to determine whether an error occurred when calling pdfCropMargins as a subprocess.
I am using Python 3.6 on a Win10 machine.

In main() in the file pdfCropMargins.py, you catch the SystemExit exception twice and don't pass on the exit code. This happens on line 81 and in your function cleanupIgnoringKeyboardInterrupt on line 65.
In the SystemExit exception block, the exit code has to be propagated manually; otherwise the exit code from the preceding call to sys.exit(..), which raised the exception, is reset.

In the exception handling on line 81, I would recommend adding the line exitCode = sys.exc_info()[1] so that the exit code is set properly for cleanupIgnoringKeyboardInterrupt(exitCode) in the finally clause. The SystemExit block on line 65 should be deleted.

Best regards,
Martin
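The propagation described above can be sketched as follows. The cleanup function is a hypothetical stand-in for cleanupIgnoringKeyboardInterrupt. One detail worth noting: sys.exc_info()[1] is the SystemExit instance itself, and the value originally passed to sys.exit() is available as its .code attribute (None means success).

```python
import sys


def cleanup(exit_code):
    """Hypothetical cleanup hook (temp dirs, etc.); returns the exit code unchanged."""
    return exit_code


def run_main(main_func):
    """Run main_func and return the exit status it requested."""
    exit_code = 0
    try:
        main_func()
    except SystemExit as exc:
        # Keep the code passed to sys.exit() instead of silently resetting it to 0.
        exit_code = exc.code if exc.code is not None else 0
    except Exception:
        # Any unexpected exception is reported as a generic failure.
        exit_code = 1
    return cleanup(exit_code)
```

A script entry point would then end with sys.exit(run_main(main)), so a caller using subprocess sees the real exit status.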

Incompatible with PyMuPDF (fitz) 1.20.0

PyMuPDF renamed a large number of methods, and 1.20 removed all historical method aliases (breaking semver but shhhh...). According to the changelog, 1.20.0 was released a mere 4 days ago on 2022-06-15, but is picked by default by Pip when installing the package (and pdfCropMargins does not pin dependencies).

When running pdf-crop-margins's CLI with pdfCropMargins[gui] installed (bringing the optional PyMuPDF dependency), it fails calling several PyMuPDF methods:

  • Page.getDisplayList() -> get_displaylist()
  • Pixmap.setResolution() -> set_dpi()
  • Pixmap.getImageData() -> tobytes()

There may be more renamed methods you call, but changing these 3 calls was sufficient to make the CLI work. To make the GUI start up and load (and possibly save) a PDF file, I had to change more occurrences of getDisplayList and getImageData. If that isn't enough, a full list of renamed methods is at https://pymupdf.readthedocs.io/en/latest/znames.html.

Can you change to the new names, or do you have to preserve compatibility with the old PyMuPDF by probing the presence of the new names?
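Probing for the new names is straightforward with hasattr/getattr. This sketch is a hypothetical helper, not part of PyMuPDF or pdfCropMargins; it works for methods (returning the bound method) and for properties such as Document.is_encrypted vs. the old isEncrypted (returning the value).

```python
def get_renamed(obj, new_name, old_name):
    """Return obj.<new_name> if it exists, else fall back to obj.<old_name>.

    For methods this returns the bound method; for properties it returns the value.
    """
    if hasattr(obj, new_name):
        return getattr(obj, new_name)
    return getattr(obj, old_name)
```

With the renames listed above, calls would look like `get_renamed(page, "get_displaylist", "getDisplayList")()` or `get_renamed(pixmap, "tobytes", "getImageData")()`, which run on both old and new PyMuPDF versions.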


Additionally, the PDF shown in the GUI was rotated by 180 degrees (reproduced with two PDF files), although the actual cropping is performed correctly. Is this a known bug?

Encountered an exception

Unexpected error: <class 'AttributeError'>
Error message : module 'signal' has no attribute 'SIGHUP'

  File "c:\python\lib\site-packages\pdfCropMargins\pdfCropMargins.py", line 92, in main
    for s in [signal.SIGABRT, signal.SIGTERM, signal.SIGHUP]:

import signal

dir(signal)
Out[2]:
['CTRL_BREAK_EVENT',
'CTRL_C_EVENT',
'Handlers',
'NSIG',
'SIGABRT',
'SIGBREAK',
'SIGFPE',
'SIGILL',
'SIGINT',
'SIGSEGV',
'SIGTERM',
'SIG_DFL',
'SIG_IGN',
'Signals',
'_IntEnum',
'__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'_enum_to_int',
'_int_to_enum',
'_signal',
'default_int_handler',
'getsignal',
'set_wakeup_fd',
'signal']

The signal module here has no SIGHUP attribute, and I'm sorry, but I don't know which signal you actually meant to use instead.
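SIGHUP is a POSIX signal and is not defined by Python's signal module on Windows (as the dir() output above shows), while SIGABRT and SIGTERM are. A sketch of how the loop on line 92 could skip signals the platform lacks, using getattr with a default; the function name is hypothetical.

```python
import signal


def available_termination_signals(names=("SIGABRT", "SIGTERM", "SIGHUP")):
    """Return the signals from `names` that exist on this platform.

    getattr(..., None) makes missing signals (SIGHUP on Windows) drop out
    instead of raising AttributeError.
    """
    found = (getattr(signal, name, None) for name in names)
    return [sig for sig in found if sig is not None]
```

The original loop would then become `for s in available_termination_signals(): signal.signal(s, handler)`, with handler being whatever function pdfCropMargins already registers.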

Blank output PDF

I have used this program and another one (Briss 2.0 GitHub page) on the same PDF file.
With Briss I could crop it, and the output PDF is technically fine. But I prefer pdfCropMargins, because it lets me enter crop values and give every page an identical size.

But pdfCropMargins doesn't work on this PDF. Its output is a white PDF with no content, but with the same number of pages as the input PDF.
The pdfCropMargins GUI produces a crop preview that looks fine, and automatic cropping works in the preview. But when I use the command line, I get this blank output PDF no matter whether I use automatic or manual settings.

using cgroups to not throttle the cpu

Recently I was using pdf-crop-margins to crop the white margins of a PDF with 4000+ pages, each 6"x22".

The .ppm files in the temp folder are 8.5 MB each. Of course, I have set the TMP variable to a folder with enough space.

But my system gets very slow.

So I decided to use cgroups.

I am using the following configuration.

First, I created a cgroup:

cgcreate -g memory,cpu:groupname/cpulimited_simha

Then I gave it 8 GB of memory (out of 12 GB) and 5/1024 ≈ 0.5% of the CPU:

echo $(( 8 * 1024 * 1024 * 1024 )) > /sys/fs/cgroup/memory/groupname/cpulimited_simha/memory.limit_in_bytes
echo 5 > /sys/fs/cgroup/cpu/groupname/cpulimited_simha/cpu.shares

and then run the command as

cgexec -g memory,cpu:groupname/cpulimited_simha pdf-crop-margins -v -p4 100 0 100 100 file.pdf;

This way, both the .ppm creation and the bounding box search are fast.

I had to try various combinations, but this one gets my documents cropped quickly.

Also, while running heavy tasks, I use The Great Suspender in Chromium to help keep the system smooth.

I just wanted to share this.

Please add any suggestions, or correct me if needed.

Set custom bounding box

In PDFCrop there was an option to set a custom bounding box: --bbox.

  --bbox "<left> <bottom> <right> <top>"                       ($::opt_bbox)
                      override bounding box found by Ghostscript
                      with origin at the lower left corner

Is there any equivalent with pdfCropMargins?

I checked the documentation but did not find anything.

Doesn't work with PyPDF2 3.0.0

When I installed pdfCropMargins a few days ago, it installed PyPDF2 3.0.0 for me.

The "-p" argument does absolutely nothing with PyPDF2 3.0.0. I've tried to figure out why, without success. One strange thing is that the "-ap" argument still works with 3.0.0. For example, "pdfCropMargins -ap 100 -p 0 document1.pdf" crops 100 pixels from each side of the document, but leaves a lot of white space around the small object in the middle of the document I'm using for testing.

When using "-v", it looks like the cropbox is calculated correctly, it just isn't applied to the pdf.

I tried downgrading PyPDF2 to 2.12.1, and it works correctly.

I got these versions:

$ python --version
Python 3.10.3
$ pip install pdfcropmargins
Collecting pdfcropmargins
  Using cached pdfCropMargins-1.1.8-py2.py3-none-any.whl (1.8 MB)
Collecting PyPDF2>=2.11.0
  Using cached pypdf2-3.0.0-py3-none-any.whl (232 kB)
Collecting pillow>=9.3.0
  Downloading Pillow-9.3.0-cp310-cp310-win_amd64.whl (2.5 MB)
     ---------------------------------------- 2.5/2.5 MB 9.2 MB/s eta 0:00:00
Collecting wheel
  Downloading wheel-0.38.4-py3-none-any.whl (36 kB)
Collecting PySimpleGUI>=4.40.0
  Using cached PySimpleGUI-4.60.4-py3-none-any.whl (509 kB)
Collecting PyMuPDF>=1.20.0
  Downloading PyMuPDF-1.21.1-cp310-cp310-win_amd64.whl (11.7 MB)
     ---------------------------------------- 11.7/11.7 MB 11.1 MB/s eta 0:00:00
Installing collected packages: PySimpleGUI, wheel, PyPDF2, PyMuPDF, pillow, pdfcropmargins
  Attempting uninstall: pillow
    Found existing installation: Pillow 9.2.0
    Uninstalling Pillow-9.2.0:
      Successfully uninstalled Pillow-9.2.0
Successfully installed PyMuPDF-1.21.1 PyPDF2-3.0.0 PySimpleGUI-4.60.4 pdfcropmargins-1.1.8 pillow-9.3.0 wheel-0.38.4

[notice] A new release of pip available: 22.1.2 -> 22.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip

Downgrading solves the issue:

$ pip uninstall PyPDF2
Found existing installation: PyPDF2 3.0.0
Uninstalling PyPDF2-3.0.0:
  Would remove:
    c:\users\denne\appdata\local\programs\python\python310\lib\site-packages\pypdf2-3.0.0.dist-info\*
    c:\users\denne\appdata\local\programs\python\python310\lib\site-packages\pypdf2\*
Proceed (Y/n)?
  Successfully uninstalled PyPDF2-3.0.0
  
$ pip install --user install pypdf2==2.12.1
Collecting install
  Downloading install-1.3.5-py3-none-any.whl (3.2 kB)
Collecting pypdf2==2.12.1
  Downloading pypdf2-2.12.1-py3-none-any.whl (222 kB)
     ---------------------------------------- 222.8/222.8 kB 6.9 MB/s eta 0:00:00
Installing collected packages: pypdf2, install
Successfully installed install-1.3.5 pypdf2-2.12.1

[notice] A new release of pip available: 22.1.2 -> 22.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip
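Later pdfCropMargins releases pin the dependency as PyPDF2<3.0.0,>=2.11.0 (visible in the pip output of the Ubuntu 22.04 issue further down). For code that has to detect an incompatible installed version at runtime, a dependency-free version check could look like the sketch below; the function names are hypothetical, and packaging.version.Version is the more robust choice when the packaging library is available.

```python
def parse_version(version_string):
    """Turn '2.12.1' into (2, 12, 1), padding to three numeric components.

    Non-numeric suffixes inside a component (e.g. the 'rc1' in '3.0.0rc1') are
    dropped, and parsing stops at the first component with no leading digits.
    """
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def pypdf2_is_supported(version_string, first_unsupported=(3, 0, 0)):
    """Return True if version_string is older than first_unsupported."""
    return parse_version(version_string) < first_unsupported
```

In practice this would be called as pypdf2_is_supported(PyPDF2.__version__), printing a warning or exiting when it returns False.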

PdfReadError can not be imported from PyPDF2.utils since PyPDF2 1.27.6

Hello,

I have encountered the following error message:

Error in pdfCropMargins: No system PyPDF2 Python package
was found.  Reinstall pdfCropMargins via pip or install that
dependency ('pip install pypdf2').

I have found out that this is caused by line 71 from main_pdfCropMargins.py:

from PyPDF2.utils import PdfReadError

Since 1.27.6, PdfReadError has been moved to PyPDF2.errors (see the changelog).

Modifying the line to the one below helped me solve the issue. Maybe we could do some version checking before the import?

from PyPDF2.errors import PdfReadError

BR,
Peter
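Instead of checking version numbers, a common pattern is to try the new location first and fall back to the old one. The direct form is `try: from PyPDF2.errors import PdfReadError` followed by `except ImportError: from PyPDF2.utils import PdfReadError`. The generic helper below does the same thing for any list of module paths; it is a hypothetical sketch, written this way so it can run without PyPDF2 installed.

```python
import importlib


def import_from_first(module_names, attribute):
    """Return `attribute` from the first module in `module_names` that provides it."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # Module missing in this version; try the next location.
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError(f"{attribute!r} not found in any of {module_names}")


# New location first (PyPDF2 >= 1.27.6), then the old one:
# PdfReadError = import_from_first(["PyPDF2.errors", "PyPDF2.utils"], "PdfReadError")
```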

Bottom not cropped as expected on certain PDFs

I have one-page PDFs of websites where the bottom margin can't be cropped with pdfCropMargins (I tried different calculation methods without success). Further investigation and testing revealed that the bottom margin is only partially shrunk: the more whitespace at the bottom of the original PDF, the larger the remaining margin in the cropped PDF. It seems as if the original margin is shrunk by a percentage. For reproduction, I attached the original and cropped files in two versions: one with a huge white bottom margin and one with a smaller white bottom margin.
Hope this issue can be fixed.

(Notice: this PDF-cropper https://github.com/ho-tex/pdfcrop removes the bottom margins flawlessly.)

Test-PDF_less_whitespace_cropped.pdf
Test-PDF_less_whitespace_original.pdf
Test-PDF_cropped.pdf
Test-PDF_original.pdf

Error when run with Python 3.8.6

PS C:\Users\username> pip install pdfcropmargins
Processing c:\users\username\appdata\local\pip\cache\wheels\67\04\4e\171216647760de41e8d0c4d25abde3fdefe7ef25eaad6ac135\pdfcropmargins-0.2.15-py2.py3-none-any.whl
Requirement already satisfied: pillow>=7.1.0; python_version >= "3.0.0" in c:\python\lib\site-packages (from pdfcropmargins) (7.2.0)
Requirement already satisfied: PyPDF2 in c:\python\lib\site-packages (from pdfcropmargins) (1.26.0)
Requirement already satisfied: wheel in c:\python\lib\site-packages (from pdfcropmargins) (0.35.1)
Installing collected packages: pdfcropmargins
Successfully installed pdfcropmargins-0.2.15
PS C:\Users\username> pdf-crop-margins.exe .\1.pdf a.pdf

Caught an unexpected exception in the pdfCropMargins program.
Unexpected error: <class 'ModuleNotFoundError'>
Error message : No module named 'readline'

  File "c:\python\lib\site-packages\pdfCropMargins\pdfCropMargins.py", line 48, in main
    from .main_pdfCropMargins import main_crop
  File "c:\python\lib\site-packages\pdfCropMargins\main_pdfCropMargins.py", line 54, in <module>
    import readline # Makes prompts go to stdout rather than stderr.
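The readline module is only available on Unix-like platforms; it is not part of the standard library on Windows. Since the comment says it is imported only so that prompts go to stdout, a guarded import lets the program carry on without it. A minimal sketch of that pattern:

```python
try:
    # Imported for its side effect: prompts go to stdout rather than stderr.
    # Not available on Windows, where the import raises ModuleNotFoundError.
    import readline  # noqa: F401
except ImportError:
    readline = None  # Continue without it; input() prompts still work.
```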

Keep center/maintain symmetry after automated cropping

I am really glad I found this tool; it seems to be a perfect solution to my automated-cropping problems. However, there is one thing I could not figure out: in some cases the croppable amount of a PDF is not the same on the left and right sides (the margins are unequal), but I would like the cropped PDF to keep the same horizontal center point. The cropping amount would therefore need to be the minimum of the left and right margins. Is that already implemented in some way?
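The requested behavior can be expressed directly on boxes: crop vertically to the content as usual, but remove the same amount from the left and right, equal to the smaller of the two croppable margins. This is a hypothetical sketch, assuming boxes are (left, bottom, right, top) tuples in PDF points; it is not an existing pdfCropMargins option.

```python
def centered_crop_box(page_box, content_box):
    """Crop page_box toward content_box while keeping the horizontal center.

    Boxes are (left, bottom, right, top). Top and bottom are cropped to the
    content; horizontally, min(left margin, right margin) is removed from both
    sides, so the page center stays put and no content is cut off.
    """
    page_left, page_bottom, page_right, page_top = page_box
    left, bottom, right, top = content_box
    side = min(left - page_left, page_right - right)
    return (page_left + side, bottom, page_right - side, top)
```

For a 612-point-wide page whose content spans x = 72 to 500, the left margin is 72 and the right margin is 112, so 72 points are removed from each side and the center stays at x = 306.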

Plans to preserve pdf annotations?

Hello,

Thanks for the amazing pdfCropMargins!

Do you have any plans to crop PDF files while preserving PDF annotations, such as clickable links from the table of contents to specific pages of the document?
I think this information is stored in a PDF file differently from the text content, which would make such a feature quite challenging. If you have tried implementing something like this, do you think it's easy, demanding, difficult, or impossible to do? (!)

Release git tags

Hi!
Some package managers need to build packages from source, so they need to know the specific commit used for the PyPI release.
If it's not a lot more work for you, could you start tagging the commits used for releases?

Thanks!

PermissionError...

Hi,

I installed your tool:

pip install pdfCropMargins --upgrade --user 

I'm using Win10 Pro. Local bin is in the Path.
After rebooting, I tried this command in the CLI:

pdf-crop-margins -v -s -u name_of_my_file.pdf

Unfortunately, I got this error message (with every file I tried):

Unexpected error:  <class 'PermissionError'>
Error message : [WinError 32] The process cannot access the file because it is being used by another process:
'C:\\Users\\tomas\\AppData\\Local\\Temp\\pdfCropMarginsTmpDir_ai2f6zow\\pdfCropMarginsTmp_pecp2k1b.pdf'

Thanks for any help!

[Suggestion] Make GUI the default

The way I've been handling GUI plus command line is: if someone runs the program with no parameters, I launch the GUI version, rather than requiring a --gui flag as you do. My thinking is that users who want the command line likely have command-line experience and will know to type --help to learn more about the CLI format.
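The suggested rule fits in a few lines. This is a hypothetical sketch of the dispatch decision only; choose_interface is an illustrative name, and the actual GUI and CLI entry points are left out.

```python
import sys


def choose_interface(argv):
    """Return 'gui' when the program is started with no arguments, else 'cli'.

    argv is sys.argv-style: argv[0] is the program name.
    """
    return "gui" if len(argv) <= 1 else "cli"


if __name__ == "__main__":
    print(choose_interface(sys.argv))
```

With this rule, `pdf-crop-margins` alone would open the GUI, while `pdf-crop-margins --help` or any file argument keeps the current CLI behavior.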

Separating header/footer whilst maintaining side-to-side widths

I am finding it difficult to combine multiple options.
The aim is to automatically trim vertical whitespace after clipping the top and bottom printer margins, while still maintaining one common, uniform page width.

Testing
pdf-crop-margins -ap4 20 20 20 20 -u -p 0 sample.pdf
lets me first trim bad edge clutter, such as in scans or printer top/bottom headings.
The -u -p 0 options keep the result at the minimum uniform width, but I then want to force the heights to the minimum as well. Is there a way to fix or trim ONLY the width OR the height?

How to remove headers and footers (permanently)?

Hello,

I don't know much about PDF, and I am confused about the *box entries (MediaBox, CropBox, etc.) and the units used in them and in pdfCropMargins (pt vs. %).

What would be the right way to remove the headers and footers on most pages of a PDF permanently (not just for viewing: the data must no longer be in the output file), while leaving some pages untouched (e.g. the first page of each chapter)?

Thank you.
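On the units question: PDF page boxes are measured in points, where one point is 1/72 inch, while pdfCropMargins' -p option is a percentage of the existing margin rather than a length. A small conversion sketch for translating measured distances into points (function names are illustrative):

```python
POINTS_PER_INCH = 72.0  # PDF default user-space unit: 1 pt = 1/72 inch
MM_PER_INCH = 25.4


def points_to_mm(points):
    """Convert PDF points to millimetres."""
    return points / POINTS_PER_INCH * MM_PER_INCH


def mm_to_points(mm):
    """Convert millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH
```

For example, an A4 page (210 × 297 mm) comes out at roughly 595 × 842 points, and a 15 mm footer band is about 42.5 points.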


How to deal with this new error?

> pdf-crop-margins.exe -p 0 -o o.pdf 1.df -v
 ** On entry to DGEBAL parameter number  3 had an illegal value
 ** On entry to DGEHRD  parameter number  2 had an illegal value
 ** On entry to DORGHR DORGQR parameter number  2 had an illegal value
 ** On entry to DHSEQR parameter number  4 had an illegal value

Caught an unexpected exception in the pdfCropMargins program.
Unexpected error:  <class 'RuntimeError'>
Error message   :  The current Numpy installation ('c:\\python\\lib\\site-packages\\numpy\\__init__.py') fails to pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86

  File "c:\python\lib\site-packages\pdfCropMargins\pdfCropMargins.py", line 48, in main
    from .main_pdfCropMargins import main_crop
  File "c:\python\lib\site-packages\pdfCropMargins\main_pdfCropMargins.py", line 78, in <module>
    from .calculate_bounding_boxes import get_bounding_box_list
  File "c:\python\lib\site-packages\pdfCropMargins\calculate_bounding_boxes.py", line 45, in <module>
    from PIL import Image, ImageFilter, __version__ as pillow_version
  File "c:\python\lib\site-packages\PIL\ImageFilter.py", line 20, in <module>
    import numpy
  File "c:\python\lib\site-packages\numpy\__init__.py", line 305, in <module>
    _win_os_check()
  File "c:\python\lib\site-packages\numpy\__init__.py", line 302, in _win_os_check
    raise RuntimeError(msg.format(__file__)) from None

Error in pdfCropMargins: No system PyPDF2 Python package was found

The PyPDF2 package is not found even though it is installed.

  • OS: Windows 11
  • Python version: 3.10.1
  • pip version: 22.0.4

Install output:

> pip install pypdf2 pdfCropMargins --upgrade
Requirement already satisfied: pypdf2 in ....... (1.27.6)
Requirement already satisfied: pdfCropMargins in ....... (1.0.5)
Requirement already satisfied: wheel in ....... (from pdfCropMargins) (0.37.1)
Requirement already satisfied: pillow>=7.1.0 in ....... (from pdfCropMargins) (9.1.0)

Run output:

> pdf-crop-margins.exe .\test.pdf

Error in pdfCropMargins: No system PyPDF2 Python package
was found.  Reinstall pdfCropMargins via pip or install that
dependency ('pip install pypdf2').

Using Python - Readline

Hi,

When I try to run it via Python, I see that the source asks for the readline module.
Can you explain why? Should I install it?
And how can I disable the automatic exit?

Thanks!

How to change the location of /tmp directory

I am trying to crop a pdf file of 4000 pages.

My /tmp directory has no more than 6 GB of space.

Is there any way I can tell pdf-crop-margins to use a preferred temporary directory location?

I have set

export TMPDIR=/mylocation

but it didn't help.
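Assuming pdfCropMargins creates its temporary directory through Python's tempfile module (the pdfCropMarginsTmpDir_... folder name in the PermissionError report suggests this, but it is an assumption), TMPDIR is honored only if the directory exists and is writable; otherwise tempfile silently falls back to TEMP, TMP, and then platform defaults. The hypothetical helper below checks which directory Python would actually pick for a given TMPDIR value.

```python
import os
import tempfile


def temp_dir_for(candidate):
    """Return the directory tempfile.gettempdir() would use if TMPDIR=candidate.

    gettempdir() checks TMPDIR, TEMP and TMP (in that order), skips directories
    it cannot create files in, and caches its answer in tempfile.tempdir, so
    the environment and the cache are both restored afterwards.
    """
    old_env = os.environ.get("TMPDIR")
    old_cache = tempfile.tempdir
    try:
        os.environ["TMPDIR"] = candidate
        tempfile.tempdir = None
        return tempfile.gettempdir()
    finally:
        if old_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = old_env
        tempfile.tempdir = old_cache
```

If temp_dir_for("/mylocation") does not return /mylocation, the directory is missing or not writable for that user. Also, the export must happen in the same shell session that launches pdf-crop-margins.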

Index is lost after cropping PDF

Hi, I was using your tool to crop one of my textbooks, but after cropping, the PDF's original index (table of contents) is lost.

Problem on Ubuntu 22.04: `'NoneType' object has no attribute 'producer'`

I'm on pdfCropMargins version 1.1.12, with these dependency versions:

> pip install -U pdfCropMargins
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pdfCropMargins in /home/lba/.local/lib/python3.10/site-packages (1.1.12)
Requirement already satisfied: pillow>=9.3.0 in /home/lba/.local/lib/python3.10/site-packages (from pdfCropMargins) (9.4.0)
Requirement already satisfied: wheel in /usr/lib/python3/dist-packages (from pdfCropMargins) (0.37.1)
Requirement already satisfied: PyPDF2<3.0.0,>=2.11.0 in /home/lba/.local/lib/python3.10/site-packages (from pdfCropMargins) (2.12.1)
Requirement already satisfied: PySimpleGUI>=4.40.0 in /home/lba/.local/lib/python3.10/site-packages (from pdfCropMargins) (4.60.4)
Requirement already satisfied: PyMuPDF>=1.20.0 in /home/lba/.local/lib/python3.10/site-packages (from pdfCropMargins) (1.21.1)

When I try to run it, I see this error:

> pdf-crop-margins /tmp/in.pdf -o /tmp/foo.pdf

Caught an unexpected exception in the pdfCropMargins program.
Unexpected error:  <class 'AttributeError'>
Error message   :  'NoneType' object has no attribute 'producer'

  File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/pdfCropMargins.py", line 59, in main
    output_doc_pathname, exit_code, stdout_str, stderr_str = crop()
  File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/pdfCropMargins.py", line 173, in crop
    output_doc_pathname = main_crop(argv_list)
  File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/main_pdfCropMargins.py", line 1574, in main_crop
    bounding_box_list, delta_page_nums = process_pdf_file(input_doc_pathname,
  File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/main_pdfCropMargins.py", line 1336, in process_pdf_file
    metadata_info.producer)

pdf-crop-margins --version seems to be about the only thing I can run that does not raise this error.

Thanks for pdfCropMargins and please let me know if there is any more info you need.

How to remove page numbers (footer and header)

Hi,
Thank you very much for this tool. I just switched to Linux and this is quite useful for me. I print a lot of academic papers, and I like to remove the margins to increase the font size and then print 2 pages per sheet to save paper. I don't have a lot of bash knowledge, but I managed to automate this process.
However, I've some issues I'm not able to solve. I'm sorry if they're pretty easy, I spent many hours trying to get this work and writing here has been my last resort.

  • When I use pdfcropmargins, something like this (https://i.imgur.com/ZdJCHPy.png) converts to this (https://i.imgur.com/EjZVfcq.png). I'd like to find the most efficient way of getting rid of the blank space left by the page number.
    Using cropmargins -p 10 -ap4 0 50 0 0 Document.pdf gives me "Warning in pdfCropMargins: The cropbox could not be written to page 20. The error is: rect not in mediabox", but cropmargins -p 10 -ap4 0 40 0 0 Document.pdf works perfectly.
    The problem is that I'd have to set the ap4 values manually, which would prevent me from automating this, since the values differ depending on the footer size of each document. Is there any other way to do this? The same happens with headers. I'd like to keep only the relevant part (https://i.imgur.com/hxQs82C.png).

EDIT: Okay, I kept trying with many more pdfs and it seems it does a great job with most academic papers. It only has problems when there is a large blank space between the content and the headers/footers. In these cases, what would be the best option? Choosing the ap4 value by eye?

Thank you very much, and sorry if these are noob questions. By the way, this may be obvious, but I had to install python3-pip along with python3-tk.

calling pdfCropMargins from script

Is there any support for using this library from within another Python script, other than using os to call the program from the command line?

import os
os.system('pdf-crop-margins document.pdf -o cropped.pdf -p 0')

I tried something like the following:

from pdfCropMargins.main_pdfCropMargins import process_pdf_file
process_pdf_file("document.pdf","document.pdf","output.pdf")

but there's seemingly no way to pass the args to the function.
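pdfCropMargins does expose a crop() function that takes the same arguments as the command line, as a list of strings; another issue in this list calls it as crop([...]), and the Ubuntu 22.04 traceback shows that in version 1.1.12 crop() returns four values (output path, exit code, stdout, stderr). The return value may differ between versions. A sketch of the os.system call above rewritten in-process, with hypothetical helper names:

```python
def build_crop_args(input_path, output_path, percent_retain=0, extra_args=()):
    """Build the argument list for pdfCropMargins' crop() (same syntax as the CLI)."""
    return [input_path, "-o", output_path, "-p", str(percent_retain), *extra_args]


def crop_file(input_path, output_path, percent_retain=0, extra_args=()):
    """Crop a PDF in-process instead of shelling out with os.system."""
    # Imported here so this module can be loaded without pdfCropMargins installed.
    from pdfCropMargins import crop
    return crop(build_crop_args(input_path, output_path, percent_retain, extra_args))
```

crop_file("document.pdf", "cropped.pdf") passes the same arguments as `pdf-crop-margins document.pdf -o cropped.pdf -p 0`.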

links affected

Cropping margins works like a charm, thanks a lot.
Regrettably, I found that links in the cropped PDF don't work in some readers, namely Foxit (Win10) and PDF Expert (iOS).
After optimizing the cropped PDF with Acrobat, things get worse: no links work at all, even in Adobe Reader.
Bookmarks work fine after cropping.

'Document' object has no attribute 'isEncrypted'

When running pdf-crop-margins -u -s -gui doc.pdf, it gives an exception like,

Caught an unexpected exception in the pdfCropMargins program.
Unexpected error:  <class 'AttributeError'>
Error message   :  'Document' object has no attribute 'isEncrypted'

  File "/home/tharindu/.local/lib/python3.8/site-packages/pdfCropMargins/pdfCropMargins.py", line 58, in main
    crop()
  File "/home/tharindu/.local/lib/python3.8/site-packages/pdfCropMargins/pdfCropMargins.py", line 96, in crop
    main_crop(argv_list)
  File "/home/tharindu/.local/lib/python3.8/site-packages/pdfCropMargins/main_pdfCropMargins.py", line 1410, in main_crop
    did_crop = create_gui(input_doc_fname, fixed_input_doc_fname, output_doc_fname,
  File "/home/tharindu/.local/lib/python3.8/site-packages/pdfCropMargins/gui.py", line 311, in create_gui
    num_pages = document_pages.open_document(fixed_input_doc_fname)
  File "/home/tharindu/.local/lib/python3.8/site-packages/pdfCropMargins/pymupdf_routines.py", line 80, in open_document
    if self.document.isEncrypted:

I don't know whether the problem is with that PDF file. However, the PDF behaves normally in other tools.

How to deal with this error?
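The failing line reads self.document.isEncrypted, which suggests a version mismatch rather than a bad PDF: newer PyMuPDF releases use snake_case names (is_encrypted instead of isEncrypted), and some versions no longer provide the old camelCase aliases. If that is the cause, upgrading pdfCropMargins or installing an older PyMuPDF would likely resolve it. A minimal sketch of a version-tolerant lookup follows; doc_is_encrypted is a hypothetical helper, not part of either package.

```python
def doc_is_encrypted(document) -> bool:
    """Read the encryption flag under either PyMuPDF naming convention.

    Newer PyMuPDF releases use snake_case (is_encrypted); older ones used
    camelCase (isEncrypted). The snake_case name is tried first."""
    for name in ("is_encrypted", "isEncrypted"):
        if hasattr(document, name):
            return bool(getattr(document, name))
    raise AttributeError("document has neither is_encrypted nor isEncrypted")


# Example with PyMuPDF (not run here):
#   import fitz
#   doc = fitz.open("doc.pdf")
#   print(doc_is_encrypted(doc))
```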

>pdf-crop-margins --pdftoppmLocal --percentRetain 0 1.pdf -o a.pdf

Warning in pdfCropMargins: The wildcards in the path
   a.pdf
failed to expand.  Treating as literal.

Caught an unexpected exception in the pdfCropMargins program.
Unexpected error:  <class 'PyPDF2.utils.PdfReadError'>
Error message   :  Multiple definitions in dictionary at byte 0x1ed72 for key /PageMode

  File "c:\python\lib\site-packages\pdfCropMargins\pdfCropMargins.py", line 58, in main
    crop()
  File "c:\python\lib\site-packages\pdfCropMargins\pdfCropMargins.py", line 96, in crop
    main_crop(argv_list)
  File "c:\python\lib\site-packages\pdfCropMargins\main_pdfCropMargins.py", line 1397, in main_crop
    process_pdf_file(input_doc_fname, fixed_input_doc_fname, output_doc_fname)
  File "c:\python\lib\site-packages\pdfCropMargins\main_pdfCropMargins.py", line 1116, in process_pdf_file
    all_page_nums = set(range(0, input_doc.getNumPages()))
  File "c:\python\lib\site-packages\PyPDF2\pdf.py", line 1155, in getNumPages
    self._flatten()
  File "c:\python\lib\site-packages\PyPDF2\pdf.py", line 1505, in _flatten
    catalog = self.trailer["/Root"].getObject()
  File "c:\python\lib\site-packages\PyPDF2\generic.py", line 516, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "c:\python\lib\site-packages\PyPDF2\generic.py", line 178, in getObject
    return self.pdf.getObject(self).getObject()
  File "c:\python\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "c:\python\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "c:\python\lib\site-packages\PyPDF2\generic.py", line 584, in readFromStream
    raise utils.PdfReadError("Multiple definitions in dictionary at byte %s for key %s" 
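The error is raised by PyPDF2's dictionary parser: the document catalog in this file defines the /PageMode key twice, and in strict mode PyPDF2 1.x rejects that. With strict=False it keeps the first definition and only warns. The toy sketch below mirrors those two behaviors on already-parsed key/value pairs; it is an illustration, not PyPDF2's code, and whether pdfCropMargins lets you choose the non-strict mode is not shown here.

```python
import warnings
from typing import Dict, Iterable, Tuple


class PdfReadError(Exception):
    """Stand-in for PyPDF2's PdfReadError in this toy example."""


def build_dictionary(pairs: Iterable[Tuple[str, object]],
                     strict: bool = True) -> Dict[str, object]:
    """Collect parsed key/value pairs of one PDF dictionary.

    strict=True rejects a repeated key; strict=False keeps the first
    definition and emits a warning instead."""
    data: Dict[str, object] = {}
    for key, value in pairs:
        if key not in data:
            data[key] = value
        elif strict:
            raise PdfReadError(f"Multiple definitions in dictionary for key {key}")
        else:
            warnings.warn(f"Multiple definitions in dictionary for key {key}; keeping the first")
    return data


# With PyPDF2 1.x itself (not run here):
#   from PyPDF2 import PdfFileReader
#   reader = PdfFileReader(open("1.pdf", "rb"), strict=False)
```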

original file size unchanged

Thanks, this is an amazing tool! From reading the documentation, I am not sure whether I can do the following. Suppose I have a PDF with an original paper size of W x H. I want to crop away all possible white space, but the output file should still be rescaled to a paper size of W x H.

Is that possible?
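Rescaling a cropped page back to the original W x H is a separate step from cropping. The arithmetic for fitting the cropped box into the original size while preserving the aspect ratio is sketched below; it is generic geometry, not a pdfCropMargins option, and fit_into is a hypothetical name. One possible way to apply it, as an assumption about your tooling, is PyMuPDF's Page.show_pdf_page, which places a source page into a target rectangle and keeps proportions by default.

```python
from typing import Tuple


def fit_into(crop_w: float, crop_h: float,
             target_w: float, target_h: float) -> Tuple[float, float, float]:
    """Return (scale, dx, dy) that fits a crop_w x crop_h box into a
    target_w x target_h page, preserving aspect ratio and centering it.

    dx and dy are the leftover margins on each side after scaling."""
    if crop_w <= 0 or crop_h <= 0:
        raise ValueError("cropped box must have positive size")
    # The tighter of the two ratios decides the uniform scale factor.
    scale = min(target_w / crop_w, target_h / crop_h)
    dx = (target_w - crop_w * scale) / 2
    dy = (target_h - crop_h * scale) / 2
    return scale, dx, dy
```

When the cropped box has a different aspect ratio than W x H, some blank margin (dx or dy) necessarily remains unless the content is stretched non-uniformly.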

Is it possible to use the tool entirely in-memory

Hi!

I'm wondering if I can use the tool without the file system.

More specifically, I'd like my input to be bytes, my output to be bytes, and also make sure no temporary/intermediate files are created in the process.

This is because I am using wkhtmltopdf's Python wrapper to render PDFs from HTML templates. I set the page height to something astronomically high to make sure I get a single (very long) page, and then trim the white space at the bottom as much as I need.

I'd like to say something like:

result_bytes = crop(["-p4", "100", "0", "100", "100", "-a4", "0", "-28", "0", "0", input_bytes])

It doesn't seem to be supported by default, but perhaps I am missing something or there is a known approach to solve such issues?
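As far as this thread shows, pdfCropMargins works on file names, so a true bytes-in/bytes-out mode is not available. The closest approximation is a wrapper that keeps the files in a private temporary directory which is deleted afterwards. This does not meet the "no intermediate files" requirement, but nothing is left behind. The sketch assumes the pdf-crop-margins executable is on PATH; crop_bytes, run_cli, and the runner parameter (which lets the CLI call be swapped out, e.g. for testing) are hypothetical.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Callable, List, Sequence


def run_cli(args: List[str]) -> None:
    """Default runner: invoke the installed pdf-crop-margins executable."""
    exe = shutil.which("pdf-crop-margins")
    if exe is None:
        raise FileNotFoundError("pdf-crop-margins was not found on PATH")
    subprocess.run([exe, *args], check=True, capture_output=True)


def crop_bytes(pdf_bytes: bytes, options: Sequence[str],
               runner: Callable[[List[str]], None] = run_cli) -> bytes:
    """Crop an in-memory PDF and return the cropped PDF as bytes.

    The CLI reads and writes files, so the data passes through a private
    temporary directory that is deleted when this function returns or raises."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.pdf")
        dst = os.path.join(tmp, "output.pdf")
        with open(src, "wb") as f:
            f.write(pdf_bytes)
        runner([src, "-o", dst, *options])
        # Read the result before the directory is cleaned up.
        with open(dst, "rb") as f:
            return f.read()


# Example (not run here):
#   result_bytes = crop_bytes(input_bytes, ["-p4", "100", "0", "100", "100",
#                                           "-a4", "0", "-28", "0", "0"])
```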

GUI support for MacOS

➜  ~ pip install pdfCropMargins[gui] --user --upgrade
➜ no matches found: pdfCropMargins[gui]

The [gui] part of the install fails. Is there a macOS version of the GUI available?
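The "no matches found" message comes from zsh (the default shell on recent macOS), not from pip: zsh treats the square brackets in pdfCropMargins[gui] as a glob pattern and aborts when no file matches. Quoting the requirement in the shell, as in pip install 'pdfCropMargins[gui]' --user --upgrade, avoids this. The same install can also be driven from Python with an argument list, which bypasses the shell entirely; pip_install_command is a hypothetical helper. Whether the GUI then works on macOS is a separate question this does not answer.

```python
import sys
from typing import List


def pip_install_command(requirement: str, user: bool = True,
                        upgrade: bool = True) -> List[str]:
    """Build a pip command as an argument list for the current interpreter.

    Each list element reaches pip verbatim, so no shell sees the square
    brackets and zsh globbing cannot interfere."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if user:
        cmd.append("--user")
    if upgrade:
        cmd.append("--upgrade")
    cmd.append(requirement)
    return cmd


# Example (not run here):
#   import subprocess
#   subprocess.run(pip_install_command("pdfCropMargins[gui]"), check=True)
```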
