abarker / pdfcropmargins
pdfCropMargins -- a program to crop the margins of PDF files
License: Other
With the command line pdf-crop-margins -u -s myscan.pdf, the rotated pages are restored before read-in. However, this creates a problem. The rotated pages are supposed to be treated as-is: the scanner made some not-so-clever decisions, so I rotated the pages manually so that they are in the proper orientation. Is there an option that lets the read-in treat the rotations as-is?
$ pdf-crop-margins input.pdf -p4 100 100 100 1000 -o output.pdf
Error in pdfCropMargins: No system PyPDF2 Python package
was found. Reinstall pdfCropMargins via pip or install that
dependency ('pip install pypdf2').
$ pip install pypdf2 --user
Requirement already satisfied: pypdf2 in /home/ti/.local/lib/python3.6/site-packages (2.12.1)
Requirement already satisfied: typing_extensions>=3.10.0.0 in /home/ti/.local/lib/python3.6/site-packages (from pypdf2) (4.1.1)
Requirement already satisfied: dataclasses in /home/ti/.local/lib/python3.6/site-packages (from pypdf2) (0.8)
$ pip show PyPDF2
Name: PyPDF2
Version: 2.12.1
Summary: A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files
Home-page:
Author:
Author-email: Mathieu Fenniak [email protected]
License:
Location: /home/ti/.local/lib/python3.6/site-packages
Requires: dataclasses, typing_extensions
Required-by: pdfCropMargins
$ python3 -c "import PyPDF2; print(PyPDF2.__version__)"
2.12.1
$ pdf-crop-margins
Error in pdfCropMargins: No system PyPDF2 Python package
was found. Reinstall pdfCropMargins via pip or install that
dependency ('pip install pypdf2').
OS: CentOS 7
Hi,
Thanks for sharing this great app. However, I found that sometimes the cropping just doesn't work as expected,
especially when some pages are excluded (e.g. the first one, with a large cover picture) and the uniform and evenodd options are set. The result is a seemingly uncropped top margin, at least with my test document.
I believe the bug lies in the main_pdfCropMargins.py module, starting at line 325, where a common bottom and top margin is searched for after the even and odd pages have been processed separately:
min_bottom_margin = min([box[1] for box in combine_even_odd])
max_top_margin = max([box[3] for box in combine_even_odd])
This means that the search is done over all the pages, including the otherwise excluded ones. I think the proper way of doing this would be something like:
min_bottom_margin = min([box[1] for i, box in enumerate(combine_even_odd) if i in page_nums_to_crop])
max_top_margin = max([box[3] for i, box in enumerate(combine_even_odd) if i in page_nums_to_crop])
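The proposed fix can be packaged as a small standalone function so it is easy to test. This is a sketch with a hypothetical function name; it assumes, as in the snippet above, that each box is (left, bottom, right, top) and that page_nums_to_crop holds zero-based page indices.

```python
def common_vertical_extent(boxes, page_nums_to_crop):
    """Return (min_bottom, max_top) computed only over the pages being cropped.

    boxes: one (left, bottom, right, top) tuple per page, indexed by page number.
    page_nums_to_crop: a set of zero-based page indices (assumed format).
    """
    selected = [box for i, box in enumerate(boxes) if i in page_nums_to_crop]
    if not selected:
        # min()/max() on an empty sequence would raise a less helpful error.
        raise ValueError("no pages selected for cropping")
    min_bottom_margin = min(box[1] for box in selected)
    max_top_margin = max(box[3] for box in selected)
    return min_bottom_margin, max_top_margin


# Page 0 is an excluded full-bleed cover; pages 1 and 2 are text pages.
boxes = [(0, 0, 612, 792), (50, 100, 560, 700), (50, 120, 560, 690)]
print(common_vertical_extent(boxes, {1, 2}))  # (100, 700)
```

Including the cover page (index 0) would stretch the common extent to (0, 792), which matches the "uncropped top margin" symptom described above.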
The [gui] option isn't accepted by pip (Python 3.10.8, pip 22.3.1):
$ pip3 install pdfCropMargins[gui] --user --upgrade
ERROR: You must give at least one requirement to install (see "pip help install")
It works with Python 3.7, but when using Python 3.8.2 on Windows 10, it can't find the Ghostscript path (which is in the PATH environment variable; I checked). When the -gsp option is specified, the error shown below occurs during the cropping process.
Error:
Caught an unexpected exception in the pdfCropMargins program.
Unexpected error: <class 'PermissionError'>
Error message : [WinError 5] Access is denied
File "c:\python\python38-32\lib\site-packages\pdfCropMargins\pdfCropMargins.py", line 102, in main
main_crop()
File "c:\python\python38-32\lib\site-packages\pdfCropMargins\main_pdfCropMargins.py", line 1324, in main_crop
did_crop = create_gui(input_doc_fname, fixed_input_doc_fname, output_doc_fname,
File "c:\python\python38-32\lib\site-packages\pdfCropMargins\gui.py", line 931, in create_gui
bounding_box_list = process_pdf_file(input_doc_fname, fixed_input_doc_fname,
File "c:\python\python38-32\lib\site-packages\pdfCropMargins\main_pdfCropMargins.py", line 1145, in process_pdf_file
bounding_box_list = get_bounding_box_list(doc_with_crop_and_media_boxes_name,
File "c:\python\python38-32\lib\site-packages\pdfCropMargins\calculate_bounding_boxes.py", line 91, in get_bounding_box_list
bbox_list = get_bounding_box_list_render_image(input_doc_fname, input_doc)
File "c:\python\python38-32\lib\site-packages\pdfCropMargins\calculate_bounding_boxes.py", line 136, in get_bounding_box_list_render_image
render_pdf_file_to_image_files(pdf_file_name, temp_image_file_root, program_to_use)
File "c:\python\python38-32\lib\site-packages\pdfCropMargins\calculate_bounding_boxes.py", line 219, in render_pdf_file_to_image_files
ex.render_pdf_file_to_image_files__ghostscript_bmp(
File "c:\python\python38-32\lib\site-packages\pdfCropMargins\external_program_calls.py", line 678, in render_pdf_file_to_image_files__ghostscript_bmp
comm_output = get_external_subprocess_output(command, env=gs_environment)
File "c:\python\python38-32\lib\site-packages\pdfCropMargins\external_program_calls.py", line 264, in get_external_subprocess_output
p = subprocess.Popen(command_list, stdout=subprocess.PIPE,
File "c:\python\python38-32\lib\subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "c:\python\python38-32\lib\subprocess.py", line 1307, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
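One way to see what a PATH lookup actually finds is Python's shutil.which. A frequent cause of [WinError 5] from subprocess.Popen is pointing it at something that is not an executable file, such as a directory, although whether that is what happened here is not certain. The sketch below (hypothetical helper name) searches for the console executables that Ghostscript builds commonly install: gswin64c and gswin32c on Windows, gs elsewhere.

```python
import shutil

# Console executable names commonly used by Ghostscript builds:
# 64-bit Windows, 32-bit Windows, then Unix-like systems.
GS_NAMES = ("gswin64c", "gswin32c", "gs")


def find_ghostscript(path=None):
    """Return the full path of the first Ghostscript executable found, or None.

    path: a PATH-style search string; None means the current PATH.
    shutil.which only returns existing, executable files (never directories).
    """
    for name in GS_NAMES:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None


print(find_ghostscript())
```

If this prints a full path, that file (not its directory) is the kind of value an explicit executable-path option would normally expect.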
Hi @abarker. This is indeed an amazing tool.
My use case: I created a .py file with the code below in one place, and the PDFs to crop are somewhere else. However, I'm encountering a problem when I use the following command to crop and back up a PDF file:
crop(['-ap', '12', '-p', '15', '-u', '-mo', '-su', 'old', '/path/to/some/file'])
The file gets modified correctly, but no backup is created. I guess that's because I'm passing a file path rather than just a file name. This matters for use cases where the script and the documents we want to modify can't be in the same directory.
Any guidance on this one?
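Until the backup behavior with full paths is clarified, one workaround is to make the backup copy yourself, next to the original, before cropping in place. This is a sketch: the helper name and the "old_" naming scheme are hypothetical and not necessarily what -su produces.

```python
import shutil
from pathlib import Path


def make_backup(pdf_path, prefix="old_"):
    """Copy pdf_path to a sibling file named prefix + original name and return it.

    The naming scheme is illustrative only; adjust it to taste.
    """
    src = Path(pdf_path)
    backup = src.with_name(prefix + src.name)
    shutil.copy2(src, backup)  # copy2 also preserves timestamps
    return backup


# Usage sketch (not run here): back up by hand, then crop in place with -mo.
# from pdfCropMargins import crop
# make_backup("/path/to/some/file.pdf")
# crop(["-ap", "12", "-p", "15", "-u", "-mo", "/path/to/some/file.pdf"])
```

The usage sketch drops '-su', 'old' because the copy is made explicitly.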
The cropping works great. But is there a way to choose the aspect ratio of a page after the files have been cropped?
I use a tablet to read some books, and cropping the margins makes the text bigger. But most of the time the page size is very different from the tablet screen's aspect ratio (which is 4:3).
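The geometry behind this request is simple: after cropping, pad the shorter dimension of the box equally on both sides until width:height matches the screen. The sketch below uses a hypothetical helper and is not an existing pdfCropMargins option. Note that the padded box can extend beyond the original page (negative coordinates in the example), so a real implementation would also have to enlarge the mediabox.

```python
def pad_to_aspect(box, ratio_w, ratio_h):
    """Grow a (left, bottom, right, top) box symmetrically to width:height = ratio_w:ratio_h."""
    left, bottom, right, top = box
    width, height = right - left, top - bottom
    target = ratio_w / ratio_h
    if width / height < target:
        # Too narrow for the target: add equal space on the left and right.
        pad = (height * target - width) / 2
        return (left - pad, bottom, right + pad, top)
    # Too wide (or already exact): add equal space at the bottom and top.
    pad = (width / target - height) / 2
    return (left, bottom - pad, right, top + pad)


# A 300 x 500 pt text block shown on a 4:3 tablet held in portrait (3:4).
print(pad_to_aspect((0, 0, 300, 500), 3, 4))  # (-37.5, 0, 337.5, 500)
```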
On Linux, when I run pdfcropmargins -gui FILE, the preview image of each page is rotated by 180 degrees.
Why does this happen?
create_gui() calls max_image_size = get_usable_image_size(window, im_wid, im_ht, left_pixels).
get_usable_image_size calls usable_width, usable_height = get_window_size(window).
get_window_size calls width, height = get_window_size_tk(), which returns (200, 200).
win_width, win_height = window.Size = 916x789 (not sure what's going on here!).
usable_im_width, usable_im_height = (usable_width - non_im_width - left_pixels, usable_height - non_im_height), and now usable_im_width is -251! (Also, sg.Image says "Should be a GIF or a PNG only", but PPM works anyway.)
I'm guessing the issue is that get_window_size_tk calls root.winfo_width()/winfo_height(), which returns 200x200, but this is smaller than the actual window from window.Size. This throws off the subsequent calculations (I didn't verify whether they're correct or not).
I don't know if root.winfo_width() is broken, or if you're calling it at a time when it's not defined to be valid, or if you're not setting up the window correctly to measure what you want. I don't know how to fix this bug.
def get_window_size_tk():
"""Use tk to get an approximation to the usable screen area."""
Why does the function say "window size" but the docstring say "screen area"?
# Go to fullscreen mode to get screen size. This seems to work with
# multiple monitors (which otherwise get counted at a combined size).
root.attributes("-alpha", 0) # Invisible on most systems.
#root.attributes("-fullscreen", True) # Set to actual full-screen size.
The comments say you enter fullscreen to measure screen(?) size, but you never actually do because it's commented out.
Latest commit e8d5928.
Hi,
pdfCropMargins always exits with exit code 0. Therefore it is not possible to determine whether an error occurred when calling pdfCropMargins as a subprocess.
I am using Python 3.6 on a Win10 machine.
In the file pdfCropMargins.py, in main(), you are catching the SystemExit exception twice and don't pass on the exit code. This happens on line 81 and in your function cleanupIgnoringKeyboardInterrupt on line 65.
In the SystemExit exception block, the exit code has to be propagated manually; otherwise the exit code from the earlier call to sys.exit(..) that raised the exception will be reset.
In the exception handling on line 81, I would recommend adding the line exitCode = sys.exc_info()[1] to set the exit code properly for cleanupIgnoringKeyboardInterrupt(exitCode) in the finally statement. And the SystemExit block on line 65 should be deleted.
Best regards,
Martin
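The pattern described above (catch SystemExit once, keep its code, run the cleanup, exit with that code) can be sketched as follows, with hypothetical names rather than the actual pdfCropMargins functions. One detail worth noting: sys.exc_info()[1] is the SystemExit instance itself, and the status originally passed to sys.exit() lives in its .code attribute.

```python
import sys


def run_with_cleanup(main, cleanup):
    """Call main(), always run cleanup(), and return main's exit status.

    Catching SystemExit as e keeps e.code, the value originally passed to
    sys.exit(), instead of letting it be replaced by a default of 0.
    """
    exit_code = 0
    try:
        main()
    except SystemExit as e:
        exit_code = e.code  # None, an int, or a message string
    finally:
        cleanup()
    return exit_code


def failing_main():
    sys.exit(2)


# In a script's entry point:
#     sys.exit(run_with_cleanup(failing_main, lambda: None))
```

A caller running the program as a subprocess can then check subprocess.run([...]).returncode.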
I have a PDF file in which all pages have different page sizes.
I have attached the PDF file test2_cropped_cropped.pdf.
In this file, I want to change the page size of only the 2nd page and keep the remaining pages as they are.
How can we achieve this?
Ys
Simha Rupa DAs
test2_cropped_cropped.pdf
Thanks for the great tool. I just wonder whether it completely removes the margin or simply adds a mask like Preview on the Mac?
PyMuPDF renamed a large number of methods, and 1.20 removed all historical method aliases (breaking semver but shhhh...). According to the changelog, 1.20.0 was released a mere 4 days ago on 2022-06-15, but is picked by default by Pip when installing the package (and pdfCropMargins does not pin dependencies).
When running pdf-crop-margins's CLI with pdfCropMargins[gui] installed (bringing in the optional PyMuPDF dependency), it fails calling several PyMuPDF methods:
There may be more renamed methods you call, but changing these 3 calls was sufficient to make the CLI work. To make the GUI start up and load (and possibly save) a PDF file, I had to change more occurrences of getDisplayList and getImageData. If that isn't enough, a full list of renamed methods is at https://pymupdf.readthedocs.io/en/latest/znames.html.
Can you change to the new names, or do you have to preserve compatibility with the old PyMuPDF by probing the presence of the new names?
And additionally the PDF shown on the GUI was rotated by 180 degrees (reproduced on two PDF files), although the actual cropping is performed correctly. Is this a known bug?
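Probing for the new names is a small amount of code. A sketch with a hypothetical helper: try the new snake_case attribute first and fall back to the old camelCase one. The renames in the usage comment (isEncrypted to is_encrypted, getDisplayList to get_displaylist, getImageData to tobytes) are taken from PyMuPDF's list of renamed methods; the same approach would also cover the 'isEncrypted' AttributeError reported further down this page.

```python
def compat_attr(obj, new_name, old_name):
    """Return obj.new_name if it exists, otherwise fall back to obj.old_name.

    Works for methods (returns the bound method) and properties alike.
    Raises AttributeError if neither name exists.
    """
    if hasattr(obj, new_name):
        return getattr(obj, new_name)
    return getattr(obj, old_name)


# Usage sketch with PyMuPDF objects (not run here):
#   is_encrypted = compat_attr(doc, "is_encrypted", "isEncrypted")
#   display_list = compat_attr(page, "get_displaylist", "getDisplayList")()
#   image_bytes = compat_attr(pixmap, "tobytes", "getImageData")()
```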
Unexpected error: <class 'AttributeError'>
Error message : module 'signal' has no attribute 'SIGHUP'
File "c:\python\lib\site-packages\pdfCropMargins\pdfCropMargins.py", line 92, in main
for s in [signal.SIGABRT, signal.SIGTERM, signal.SIGHUP]:
import signal
dir(signal)
Out[2]:
['CTRL_BREAK_EVENT',
'CTRL_C_EVENT',
'Handlers',
'NSIG',
'SIGABRT',
'SIGBREAK',
'SIGFPE',
'SIGILL',
'SIGINT',
'SIGSEGV',
'SIGTERM',
'SIG_DFL',
'SIG_IGN',
'Signals',
'_IntEnum',
'__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'_enum_to_int',
'_int_to_enum',
'_signal',
'default_int_handler',
'getsignal',
'set_wakeup_fd',
'signal']
There is no attribute 'SIGHUP' in the 'signal' module here (this is on Windows), but I'm sorry, I don't know which attribute you actually want to use.
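The loop in the traceback fails because it refers to signal.SIGHUP unconditionally, and that constant is not defined on Windows. A platform-tolerant sketch, with hypothetical function names, looks each signal up by name and skips any the platform lacks:

```python
import signal


def available_signals(names=("SIGABRT", "SIGTERM", "SIGHUP")):
    """Return the signal constants from names that exist on this platform."""
    return [getattr(signal, name) for name in names if hasattr(signal, name)]


def install_handler(handler):
    """Register handler for every requested signal that this platform supports."""
    for sig in available_signals():
        signal.signal(sig, handler)
```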
I have used this program and another program (Briss 2.0, GitHub page) on the same PDF file.
With Briss I could crop it, and the output PDF is technically fine. But I prefer pdfCropMargins, because it allows entering crop values and giving every page an identical page size.
But pdfCropMargins doesn't work on this PDF. Its output is a blank white PDF without any content, but with the same number of pages as the input PDF.
The pdfCropMargins GUI produces a crop preview that is fine, and the automatic cropping works in the preview. But when I use the command line, I get this blank output PDF no matter whether I use automatic or manual settings.
Recently I was using pdf-crop-margins to crop the white margins in a PDF of 4000+ pages, where each page is 6"x22".
The .ppm files in the temp folder are 8.5 MB each. Of course, I have set the TMP variable to a folder with enough free space,
but my system gets very slow.
So I decided to use cgroups.
I am using the following configuration.
I created a cgroup:
cgcreate -g memory,cpu:groupname/cpulimited_simha
8 GB of memory out of 12 GB, and 5/1024 ≈ 0.5% of the CPU:
echo $(( 8 * 1024 * 1024 * 1024 )) > /sys/fs/cgroup/memory/groupname/cpulimited_simha/memory.limit_in_bytes
echo 5 > /sys/fs/cgroup/cpu/groupname/cpulimited_simha/cpu.shares
and then run the command as
cgexec -g memory,cpu:groupname/cpulimited_simha pdf-crop-margins -v -p4 100 0 100 100 file.pdf;
With this, both the .ppm creation and the bounding-box calculation are fast.
I had to try various combinations, but this one gets my documents cropped quickly.
While running heavy tasks, I also use The Great Suspender for Chromium, which keeps the system very smooth.
I just wanted to share this.
Please add suggestions, or correct me if needed.
In PDFCrop there was an option to set a custom bounding box: --bbox.
--bbox "<left> <bottom> <right> <top>" ($::opt_bbox)
override bounding box found by Ghostscript
with origin at the lower left corner
Is there any equivalent with pdfCropMargins?
I checked the documentation but did not find anything.
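A possible workaround, stated as an assumption rather than documented behavior: convert the desired bbox into how far each page edge must move inward, and pass those amounts to -ap4 (used elsewhere on this page with the order left, bottom, right, top) together with -p 100, so that no further margin is removed after the pre-crop. Check the resulting boxes with -v before relying on it. The helper below is hypothetical.

```python
def bbox_to_precrop(mediabox, bbox):
    """Convert a pdfcrop-style --bbox into left/bottom/right/top crop amounts.

    Both arguments are (left, bottom, right, top) in PDF points, with the origin
    at the lower-left corner. The result is how far to move each edge of the
    mediabox inward so that it coincides with bbox.
    """
    m_left, m_bottom, m_right, m_top = mediabox
    b_left, b_bottom, b_right, b_top = bbox
    amounts = (b_left - m_left, b_bottom - m_bottom, m_right - b_right, m_top - b_top)
    if min(amounts) < 0:
        raise ValueError("bbox extends outside the mediabox")
    return amounts


# US Letter page with a 1-inch (72 pt) box on every side.
amounts = bbox_to_precrop((0, 0, 612, 792), (72, 72, 540, 720))
print(amounts)  # (72, 72, 72, 72)
args = ["-ap4", *map(str, amounts), "-p", "100", "input.pdf"]
```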
I was desperately looking for such an application,
and I found it at https://tex.stackexchange.com/a/447756.
I have a PDF whose margins are not the same on each side.
After cropping it, I want a 10dp margin all around the cropped/clipped text.
How do I do that?
In pdfcrop we do
pdfcrop -m 10 filename.pdf output.pdf
ANSWER: (this worked)
pdf-crop-margins -v -a -6 -p 0 input.pdf
When I installed pdfCropMargins a few days ago, it installed PyPDF2 3.0.0 for me.
The "-p" argument does absolutely nothing when using PyPDF2 3.0.0. I've tried to figure out why, but I cannot. One strange thing is that the "-ap" argument still works with 3.0.0. That is, "pdfCropMargins -ap 100 -p 0 document1.pdf" crops 100 pixels from each side of the document, but leaves a lot of white space around the small object in the middle of the document I'm using for testing.
When using "-v", it looks like the cropbox is calculated correctly; it just isn't applied to the PDF.
I tried downgrading PyPDF2 to 2.12.1, and then it works correctly.
I got these versions:
$ python --version
Python 3.10.3
$ pip install pdfcropmargins
Collecting pdfcropmargins
Using cached pdfCropMargins-1.1.8-py2.py3-none-any.whl (1.8 MB)
Collecting PyPDF2>=2.11.0
Using cached pypdf2-3.0.0-py3-none-any.whl (232 kB)
Collecting pillow>=9.3.0
Downloading Pillow-9.3.0-cp310-cp310-win_amd64.whl (2.5 MB)
---------------------------------------- 2.5/2.5 MB 9.2 MB/s eta 0:00:00
Collecting wheel
Downloading wheel-0.38.4-py3-none-any.whl (36 kB)
Collecting PySimpleGUI>=4.40.0
Using cached PySimpleGUI-4.60.4-py3-none-any.whl (509 kB)
Collecting PyMuPDF>=1.20.0
Downloading PyMuPDF-1.21.1-cp310-cp310-win_amd64.whl (11.7 MB)
---------------------------------------- 11.7/11.7 MB 11.1 MB/s eta 0:00:00
Installing collected packages: PySimpleGUI, wheel, PyPDF2, PyMuPDF, pillow, pdfcropmargins
Attempting uninstall: pillow
Found existing installation: Pillow 9.2.0
Uninstalling Pillow-9.2.0:
Successfully uninstalled Pillow-9.2.0
Successfully installed PyMuPDF-1.21.1 PyPDF2-3.0.0 PySimpleGUI-4.60.4 pdfcropmargins-1.1.8 pillow-9.3.0 wheel-0.38.4
[notice] A new release of pip available: 22.1.2 -> 22.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip
Downgrading solves the issue:
$ pip uninstall PyPDF2
Found existing installation: PyPDF2 3.0.0
Uninstalling PyPDF2-3.0.0:
Would remove:
c:\users\denne\appdata\local\programs\python\python310\lib\site-packages\pypdf2-3.0.0.dist-info\*
c:\users\denne\appdata\local\programs\python\python310\lib\site-packages\pypdf2\*
Proceed (Y/n)?
Successfully uninstalled PyPDF2-3.0.0
$ pip install --user install pypdf2==2.12.1
Collecting install
Downloading install-1.3.5-py3-none-any.whl (3.2 kB)
Collecting pypdf2==2.12.1
Downloading pypdf2-2.12.1-py3-none-any.whl (222 kB)
---------------------------------------- 222.8/222.8 kB 6.9 MB/s eta 0:00:00
Installing collected packages: pypdf2, install
Successfully installed install-1.3.5 pypdf2-2.12.1
[notice] A new release of pip available: 22.1.2 -> 22.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip
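A later install log on this page shows newer pdfCropMargins releases constraining the dependency to PyPDF2<3.0.0,>=2.11.0. On an older release, the same constraint can be applied by hand with pip install "PyPDF2<3.0.0". To check from Python which major version an environment has installed, importlib.metadata (Python 3.8+) works without importing PyPDF2 itself. The helper names below are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '2.12.1'."""
    return int(version_string.split(".")[0])


def pypdf2_is_pre_3(dist_name="PyPDF2"):
    """True if the installed PyPDF2 is older than 3.0.0; None if it is not installed."""
    try:
        return major_version(version(dist_name)) < 3
    except PackageNotFoundError:
        return None


print(pypdf2_is_pre_3())
```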
Hello,
I have encountered the following error message:
Error in pdfCropMargins: No system PyPDF2 Python package
was found. Reinstall pdfCropMargins via pip or install that
dependency ('pip install pypdf2').
I have found out that this is caused by line 71 of main_pdfCropMargins.py:
from PyPDF2.utils import PdfReadError
PdfReadError has been moved to PyPDF2.errors since 1.27.6 (see the changelog).
Changing the line to the one below solved the issue for me:
from PyPDF2.errors import PdfReadError
Maybe we could do some version checking before the import?
BR,
Peter
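Instead of comparing version numbers, one option is to try the new location first and fall back to the old one. A sketch with a hypothetical helper name. The same "No system PyPDF2" message appears in other reports on this page; whether they all share this cause is not confirmed.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, name) pairs.

    Raises ImportError if none of the candidates is available.
    """
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:  # includes ModuleNotFoundError
            continue
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    raise ImportError("none of the candidate imports succeeded: %r" % (candidates,))


# Newer PyPDF2 has PdfReadError in PyPDF2.errors; older releases in PyPDF2.utils.
# PdfReadError = import_first([("PyPDF2.errors", "PdfReadError"),
#                              ("PyPDF2.utils", "PdfReadError")])
```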
I have one-page PDFs of websites where the bottom margin can't be cropped with pdfCropMargins (I tried different calculation methods without success). Further investigation and testing revealed that the bottom margin is only partially shrunk: the more whitespace at the bottom of the original PDF, the larger the remaining margin in the cropped PDF. It seems as if the original margin is shrunk by a percentage. For reproduction, I attached the original and cropped files in two versions: one with a huge white bottom margin and one with a smaller white bottom margin.
Hope this issue can be fixed.
(Notice: this PDF-cropper https://github.com/ho-tex/pdfcrop removes the bottom margins flawlessly.)
Test-PDF_less_whitespace_cropped.pdf
Test-PDF_less_whitespace_original.pdf
Test-PDF_cropped.pdf
Test-PDF_original.pdf
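The pattern described (the wider the original margin, the wider the leftover margin) is what proportional margin retention produces. If the installed version keeps the default of retaining 10% of each margin with -p (an assumption worth checking with pdf-crop-margins --help), then -p 0 should remove the margin entirely. The arithmetic, as a sketch with a hypothetical helper that also applies an absolute offset after the percentage scaling:

```python
def retained_margin(margin, percent_retain=10.0, absolute_offset=0.0):
    """Margin left after cropping: scale by percent_retain, then subtract absolute_offset.

    All values are in PDF points. A negative absolute_offset adds space back.
    """
    return margin * percent_retain / 100.0 - absolute_offset


for original in (200.0, 500.0):
    print(original, "->", retained_margin(original))  # 20.0, then 50.0
```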
See the title. In the meantime, either Ghostscript or pdftoppm needs to be installed and discoverable on Windows for the program to work there when installed via pip. When running from a copy of the repo, the program can find the fallback binary and execute it correctly.
PS C:\Users\username> pip install pdfcropmargins
Processing c:\users\username\appdata\local\pip\cache\wheels\67\04\4e\171216647760de41e8d0c4d25abde3fdefe7ef25eaad6ac135\pdfcropmargins-0.2.15-py2.py3-none-any.whl
Requirement already satisfied: pillow>=7.1.0; python_version >= "3.0.0" in c:\python\lib\site-packages (from pdfcropmargins) (7.2.0)
Requirement already satisfied: PyPDF2 in c:\python\lib\site-packages (from pdfcropmargins) (1.26.0)
Requirement already satisfied: wheel in c:\python\lib\site-packages (from pdfcropmargins) (0.35.1)
Installing collected packages: pdfcropmargins
Successfully installed pdfcropmargins-0.2.15
PS C:\Users\username> pdf-crop-margins.exe .\1.pdf a.pdf
Caught an unexpected exception in the pdfCropMargins program.
Unexpected error: <class 'ModuleNotFoundError'>
Error message : No module named 'readline'
File "c:\python\lib\site-packages\pdfCropMargins\pdfCropMargins.py", line 48, in main
from .main_pdfCropMargins import main_crop
File "c:\python\lib\site-packages\pdfCropMargins\main_pdfCropMargins.py", line 54, in
import readline # Makes prompts go to stdout rather than stderr.
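The standard-library readline module is not available on Windows, so an unconditional import fails there. Since the source comment says readline is only used to make prompts go to stdout, guarding the import is one way to keep the program running on Windows. A sketch, not the project's actual code:

```python
try:
    # Where available, importing readline changes how input() prompts behave;
    # the pdfCropMargins source comments that it makes prompts go to stdout.
    import readline  # noqa: F401
except ImportError:
    # Not available on Windows: continue without it.
    readline = None
```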
I am really glad I found this tool; it seems to be a perfect solution to my problems with automated cropping. However, there is one thing I could not figure out: in some cases, the croppable amount is not the same for the left and right sides (the margins are not equal), but I would like the cropped PDF to keep the same horizontal center point. Therefore, the cropping amount would need to be the minimum of the left and right margins. Is that already implemented in some way?
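The arithmetic for keeping the horizontal center is small. This is a sketch with a hypothetical helper, not an existing pdfCropMargins option. It crops both sides by the smaller side margin and takes top and bottom from the content box.

```python
def centered_horizontal_crop(page_box, content_box):
    """Crop both sides by the smaller of the two side margins.

    Boxes are (left, bottom, right, top). The result keeps the page's
    horizontal center and crops top and bottom to the content box.
    """
    left_margin = content_box[0] - page_box[0]
    right_margin = page_box[2] - content_box[2]
    crop = min(left_margin, right_margin)
    return (page_box[0] + crop, content_box[1], page_box[2] - crop, content_box[3])


new_box = centered_horizontal_crop((0, 0, 600, 800), (100, 50, 450, 750))
print(new_box)  # (100, 50, 500, 750): the center stays at x = 300
```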
Hello,
Thanks for the amazing pdfCropMargins!
Do you have any plans to crop PDF files while preserving PDF annotations, like clickable links from the table of contents to specific pages of the document?
I think this information is stored in a PDF file differently from the text content, making such a feature quite challenging. If you have tried implementing something like this, do you think it's easy, demanding, difficult, or impossible to do?
Hi!
Some package managers need to build the packages from source, thus needing to look at the specific commit used for the pypi release.
If that's not a lot more work for you, could you start using tags to point to the commits of releases?
Thanks!
Hi,
I installed your tool:
pip install pdfCropMargins --upgrade --user
pdf-crop-margins -v -s -u name_of_my_file.pdf
But unfortunately, I got this error message (with every file I tried the same command on):
Unexpected error: <class 'PermissionError'>
Error message : [WinError 32] The process cannot access the file because it is being used by another process:
'C:\\Users\\tomas\\AppData\\Local\\Temp\\pdfCropMarginsTmpDir_ai2f6zow\\pdfCropMarginsTmp_pecp2k1b.pdf'
Thanks for any help!
The way I've been doing GUI + command line is that if someone enters no parameters and just runs the program, I use the GUI version, rather than adding a --gui flag as you have. My thought was that if they want to use the command line, then the user likely has command-line experience and will know to type --help to learn more about the CLI format.
I am finding it difficult to combine multiple options.
The aim is to auto-trim vertical whitespace after clipping the top and bottom printer margins, but still maintain one common, uniform page width.
Testing
pdf-crop-margins -ap4 20 20 20 20 -u -p 0 sample.pdf
allows me to first trim bad edge clutter, such as in scans or printer top/bottom headings.
The -u -p 0 keeps the results at the minimum uniform width, but I then want to force the heights to the minimum. Is there a way to fix/trim ONLY the width OR the height?
Hello,
I don't know much about PDF, and am confused about *box (mediabox, cropbox, etc.) and the units used in *box and pdfCropMargins (pt vs. %).
What would be the right way to permanently remove the headers and footers on most pages of a PDF (not just for viewing: the data must no longer be in the output file), while leaving some pages untouched (e.g. the first page of each chapter)?
Thank you.
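On the units question: PDF box coordinates (mediabox, cropbox, and so on) are measured in points, 1/72 inch by default, while pdfCropMargins' -p option is a percentage of each existing margin rather than a length. A small conversion sketch for reading box values, with illustrative helper names:

```python
POINTS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch
MM_PER_INCH = 25.4


def points_to_mm(points):
    return points / POINTS_PER_INCH * MM_PER_INCH


def mm_to_points(mm):
    return mm / MM_PER_INCH * POINTS_PER_INCH


# A4 is 210 x 297 mm, i.e. about 595.3 x 841.9 points.
print(round(mm_to_points(210), 1), round(mm_to_points(297), 1))
```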
I want to know whether -gs (using Ghostscript for the bounding box) crops the same as without this option, if my document is not scanned but is Word text (no images) converted to PDF.
> pdf-crop-margins.exe -p 0 -o o.pdf 1.df -v
** On entry to DGEBAL parameter number 3 had an illegal value
** On entry to DGEHRD parameter number 2 had an illegal value
** On entry to DORGHR DORGQR parameter number 2 had an illegal value
** On entry to DHSEQR parameter number 4 had an illegal value
Caught an unexpected exception in the pdfCropMargins program.
Unexpected error: <class 'RuntimeError'>
Error message : The current Numpy installation ('c:\\python\\lib\\site-packages\\numpy\\__init__.py') fails to pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86
File "c:\python\lib\site-packages\pdfCropMargins\pdfCropMargins.py", line 48, in main
from .main_pdfCropMargins import main_crop
File "c:\python\lib\site-packages\pdfCropMargins\main_pdfCropMargins.py", line 78, in <module>
from .calculate_bounding_boxes import get_bounding_box_list
File "c:\python\lib\site-packages\pdfCropMargins\calculate_bounding_boxes.py", line 45, in <module>
from PIL import Image, ImageFilter, __version__ as pillow_version
File "c:\python\lib\site-packages\PIL\ImageFilter.py", line 20, in <module>
import numpy
File "c:\python\lib\site-packages\numpy\__init__.py", line 305, in <module>
_win_os_check()
File "c:\python\lib\site-packages\numpy\__init__.py", line 302, in _win_os_check
raise RuntimeError(msg.format(__file__)) from None
The PyPDF2 package is not found, although it is installed.
Install output:
> pip install pypdf2 pdfCropMargins --upgrade
Requirement already satisfied: pypdf2 in ....... (1.27.6)
Requirement already satisfied: pdfCropMargins in ....... (1.0.5)
Requirement already satisfied: wheel in ....... (from pdfCropMargins) (0.37.1)
Requirement already satisfied: pillow>=7.1.0 in ....... (from pdfCropMargins) (9.1.0)
Run output:
> pdf-crop-margins.exe .\test.pdf
Error in pdfCropMargins: No system PyPDF2 Python package
was found. Reinstall pdfCropMargins via pip or install that
dependency ('pip install pypdf2').
Hi,
When I try to run it via Python, I see that the source asks for the readline module.
Can you explain why? Should I install it?
And how can I disable the automatic exit?
Thanks!
Given this file as input:
Z03-N03-Z02-N04-survey_EXTRACT.pdf
I get this with -s:
Z03-N03-Z02-N04-survey-crop_EXTRACT.pdf
Note that the number on the left axis (3.5) is cut off. This does not happen if I use, e.g., pdfcrop:
Z03-N03-Z02-N04-survey_EXTRACT-crop.pdf
Any ideas what's going wrong? These are PDFs created by matplotlib...
Hi, I find your project very useful. I think it would be better to provide a function to remove headers and footers automatically. Thanks!
I am trying to crop a pdf file of 4000 pages.
My /tmp directory does not have more than 6GB of space.
Is there any way I can tell pdf-crop-margins to use a preferred temporary directory location?
I have set
export TMPDIR=/mylocation
but it didn't help.
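Python's tempfile module picks its default directory from the TMPDIR, TEMP and TMP environment variables (in that order, then platform defaults), silently skips candidates that do not exist or are not writable, and caches the result after first use. An export therefore only takes effect if the directory exists and is writable and the variable is set before the program starts. Whether pdfCropMargins creates every temporary file through tempfile's default directory is an assumption here; the pdfCropMarginsTmpDir_... names in a traceback elsewhere on this page look like tempfile.mkdtemp output. When calling it from Python, tempfile.tempdir can also be set directly. A sketch with a hypothetical helper:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def use_temp_dir(path):
    """Temporarily make tempfile create its files and directories under path."""
    if not os.path.isdir(path):
        raise NotADirectoryError(path)
    saved = tempfile.tempdir
    tempfile.tempdir = path  # takes precedence over TMPDIR, TEMP and TMP
    try:
        yield path
    finally:
        tempfile.tempdir = saved


# Usage sketch: run an in-process crop with temporary files on a bigger disk.
# with use_temp_dir("/mylocation"):
#     crop(["-p", "0", "big.pdf"])
```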
In a PDF file, one particular page is wider than the others. I want to change the width of that page and scale the page accordingly.
Is that possible?
Hi, I was using your tool to crop one of my textbooks, but after cropping, the PDF's original index (table of contents) is lost.
I'm on pdfCropMargins version 1.1.12, with these dependency versions:
> pip install -U pdfCropMargins
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pdfCropMargins in /home/lba/.local/lib/python3.10/site-packages (1.1.12)
Requirement already satisfied: pillow>=9.3.0 in /home/lba/.local/lib/python3.10/site-packages (from pdfCropMargins) (9.4.0)
Requirement already satisfied: wheel in /usr/lib/python3/dist-packages (from pdfCropMargins) (0.37.1)
Requirement already satisfied: PyPDF2<3.0.0,>=2.11.0 in /home/lba/.local/lib/python3.10/site-packages (from pdfCropMargins) (2.12.1)
Requirement already satisfied: PySimpleGUI>=4.40.0 in /home/lba/.local/lib/python3.10/site-packages (from pdfCropMargins) (4.60.4)
Requirement already satisfied: PyMuPDF>=1.20.0 in /home/lba/.local/lib/python3.10/site-packages (from pdfCropMargins) (1.21.1)
When I try to run it, I see this error:
> pdf-crop-margins /tmp/in.pdf -o /tmp/foo.pdf
Caught an unexpected exception in the pdfCropMargins program.
Unexpected error: <class 'AttributeError'>
Error message : 'NoneType' object has no attribute 'producer'
File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/pdfCropMargins.py", line 59, in main
output_doc_pathname, exit_code, stdout_str, stderr_str = crop()
File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/pdfCropMargins.py", line 173, in crop
output_doc_pathname = main_crop(argv_list)
File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/main_pdfCropMargins.py", line 1574, in main_crop
bounding_box_list, delta_page_nums = process_pdf_file(input_doc_pathname,
File "/home/lba/.local/lib/python3.10/site-packages/pdfCropMargins/main_pdfCropMargins.py", line 1336, in process_pdf_file
metadata_info.producer)
pdf-crop-margins --version seems to be about the only thing I can run that does not raise this error.
Thanks for pdfCropMargins, and please let me know if you need any more info.
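The traceback shows metadata_info.producer being read from a None object. In PyPDF2 2.x, the reader's metadata property appears to return None when a PDF has no document information dictionary, so one defensive option is to read the field through getattr with a default. A sketch with a hypothetical helper; SimpleNamespace stands in for a real metadata object:

```python
from types import SimpleNamespace


def safe_producer(metadata):
    """Return metadata.producer, or None when there is no metadata object or field."""
    return getattr(metadata, "producer", None)


print(safe_producer(None))  # None
print(safe_producer(SimpleNamespace(producer="LibreOffice")))  # LibreOffice
```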
Hi,
Thank you very much for this tool. I just switched to Linux, and this is quite useful for me. I print a lot of academic papers, and I like to remove margins to increase the font size and then print 2 pages per sheet to save paper. I don't have a lot of bash knowledge, but I managed to automate this process.
However, I have some issues I'm not able to solve. I'm sorry if they're easy; I spent many hours trying to get this to work, and writing here has been my last resort.
cropmargins -p 10 -ap4 0 50 0 0 Document.pdf gives me "Warning in pdfCropMargins: The cropbox could not be written to page 20. The error is: rect not in mediabox", but cropmargins -p 10 -ap4 0 40 0 0 Document.pdf works perfectly. The problem is that I have to choose the -ap4 value manually, which would not allow me to automate this, since that value differs depending on the footer size of the document. Is there any other way to do this? The same happens with headers. I'd like to keep only the relevant part (https://i.imgur.com/hxQs82C.png).
EDIT: Okay, I kept trying with many more PDFs, and it seems it does a great job with most academic papers. It only has problems when there is a large blank space between the content and the headers/footers. In these cases, what would be the best option? Choosing the -ap4 value by eye?
Thank you very much, and sorry if these are noob questions. By the way, this may be obvious, but I had to install python3-pip along with python3-tk.
Is there any support for using this library from within another Python script, other than using os to call the program from the command line?
import os
os.system('pdf-crop-margins document.pdf -o cropped.pdf -p 0')
I tried something like the following:
from pdfCropMargins.main_pdfCropMargins import process_pdf_file
process_pdf_file("document.pdf","document.pdf","output.pdf")
but there's seemingly no way to pass the args to the function.
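Two options, both sketched with hypothetical helper names. The package's crop() function takes the same argument list as the command line (see the crop([...]) call earlier on this page). In the version shown in a traceback further down, crop() returns a tuple of output path, exit code, stdout and stderr; importing it as from pdfCropMargins import crop assumes the package exports it at the top level. Alternatively, subprocess.run avoids os.system's shell parsing and reports the exit code:

```python
import subprocess


def build_crop_command(input_pdf, output_pdf, percent_retain=0,
                       program="pdf-crop-margins"):
    """Build the argument list for the command line used above."""
    return [program, input_pdf, "-o", output_pdf, "-p", str(percent_retain)]


def run_crop(input_pdf, output_pdf, percent_retain=0):
    """Run the CLI without a shell and return (exit code, stdout, stderr)."""
    result = subprocess.run(build_crop_command(input_pdf, output_pdf, percent_retain),
                            capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# In-process alternative (not run here):
# from pdfCropMargins import crop
# crop(["document.pdf", "-o", "cropped.pdf", "-p", "0"])
```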
It works well for cropping margins, like a charm. Thanks a lot.
Regrettably, I find that links in the cropped PDF don't work in some readers, e.g. Foxit (Win10) and PDF Expert (iOS).
After optimizing the cropped PDF with Acrobat, things get worse: no links work, even in Adobe Reader.
Bookmarks work well after cropping.
When running pdf-crop-margins -u -s -gui doc.pdf, it gives an exception like:
Caught an unexpected exception in the pdfCropMargins program.
Unexpected error: <class 'AttributeError'>
Error message : 'Document' object has no attribute 'isEncrypted'
File "/home/tharindu/.local/lib/python3.8/site-packages/pdfCropMargins/pdfCropMargins.py", line 58, in main
crop()
File "/home/tharindu/.local/lib/python3.8/site-packages/pdfCropMargins/pdfCropMargins.py", line 96, in crop
main_crop(argv_list)
File "/home/tharindu/.local/lib/python3.8/site-packages/pdfCropMargins/main_pdfCropMargins.py", line 1410, in main_crop
did_crop = create_gui(input_doc_fname, fixed_input_doc_fname, output_doc_fname,
File "/home/tharindu/.local/lib/python3.8/site-packages/pdfCropMargins/gui.py", line 311, in create_gui
num_pages = document_pages.open_document(fixed_input_doc_fname)
File "/home/tharindu/.local/lib/python3.8/site-packages/pdfCropMargins/pymupdf_routines.py", line 80, in open_document
if self.document.isEncrypted:
I have no idea whether the problem is with that PDF file. However, the PDF behaves normally in other tools.
>pdf-crop-margins --pdftoppmLocal --percentRetain 0 1.pdf -o a.pdf
Warning in pdfCropMargins: The wildcards in the path
a.pdf
failed to expand. Treating as literal.
Caught an unexpected exception in the pdfCropMargins program.
Unexpected error: <class 'PyPDF2.utils.PdfReadError'>
Error message : Multiple definitions in dictionary at byte 0x1ed72 for key /PageMode
File "c:\python\lib\site-packages\pdfCropMargins\pdfCropMargins.py", line 58, in main
crop()
File "c:\python\lib\site-packages\pdfCropMargins\pdfCropMargins.py", line 96, in crop
main_crop(argv_list)
File "c:\python\lib\site-packages\pdfCropMargins\main_pdfCropMargins.py", line 1397, in main_crop
process_pdf_file(input_doc_fname, fixed_input_doc_fname, output_doc_fname)
File "c:\python\lib\site-packages\pdfCropMargins\main_pdfCropMargins.py", line 1116, in process_pdf_file
all_page_nums = set(range(0, input_doc.getNumPages()))
File "c:\python\lib\site-packages\PyPDF2\pdf.py", line 1155, in getNumPages
self._flatten()
File "c:\python\lib\site-packages\PyPDF2\pdf.py", line 1505, in _flatten
catalog = self.trailer["/Root"].getObject()
File "c:\python\lib\site-packages\PyPDF2\generic.py", line 516, in __getitem__
return dict.__getitem__(self, key).getObject()
File "c:\python\lib\site-packages\PyPDF2\generic.py", line 178, in getObject
return self.pdf.getObject(self).getObject()
File "c:\python\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
retval = readObject(self.stream, self)
File "c:\python\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
return DictionaryObject.readFromStream(stream, pdf)
File "c:\python\lib\site-packages\PyPDF2\generic.py", line 584, in readFromStream
raise utils.PdfReadError("Multiple definitions in dictionary at byte %s for key %s"
Valid in Saladict, Acrobat, and Zotero;
not valid in Evince, Chrome, and Okular.
What could cause this?
Thanks, this is an amazing tool! Reading the documentation, I am not sure whether I can do the following: suppose I have a PDF with an original paper size of W x H. I want to crop all possible white space, but the output file should still be rescaled to a paper size of W x H.
Is that possible?
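Whether pdfCropMargins can do this in one step is not answered here, but the arithmetic for scaling cropped content back up to W x H without distortion is a single ratio: the smaller of the two dimension ratios. A sketch with a hypothetical helper:

```python
def fit_scale(content_size, paper_size):
    """Largest uniform scale at which content_size fits inside paper_size.

    Sizes are (width, height) in points. The aspect ratio is preserved, so one
    dimension fills the paper and the other may be left with some padding.
    """
    (w, h), (paper_w, paper_h) = content_size, paper_size
    return min(paper_w / w, paper_h / h)


# Cropped content of 450 x 600 pt rescaled onto US Letter (612 x 792 pt).
s = fit_scale((450, 600), (612, 792))
print(round(s, 3), round(450 * s, 1), round(600 * s, 1))
```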
Hi!
I'm wondering if I can use the tool without the file system.
More specifically, I'd like my input to be bytes, my output to be bytes, and also make sure no temporary/intermediate files are created in the process.
This is because I am using wkhtmltopdf's Python wrapper to render PDFs from HTML templates. I am setting page-height to something astronomically high to make sure I can generate a single-page (very long) PDF file, and then trim the bottom white space as much as I need.
I'd like to say something like:
result_bytes = crop(["-p4", "100", "0", "100", "100", "-a4", "0", "-28", "0", "0", input_bytes])
It doesn't seem to be supported by default, but perhaps I am missing something or there is a known approach to solve such issues?
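This does not satisfy the "no intermediate files" requirement, and pdfCropMargins may create further temporary files of its own. But if the main goal is a bytes-in, bytes-out interface, a wrapper along these lines keeps the files confined to a private temporary directory that is deleted afterwards. The helper name is hypothetical, and it assumes the crop function accepts an argv-style list, as the crop([...]) call earlier on this page does.

```python
import os
import tempfile
from typing import Callable, List


def crop_bytes(input_bytes: bytes, crop_fn: Callable[[List[str]], object],
               options: List[str]) -> bytes:
    """Run an argv-style crop function on in-memory PDF bytes.

    The bytes are written to a private temporary directory, crop_fn is called
    with options + ["-o", output, input], and the result is read back. The
    directory and its contents are deleted on return, even if crop_fn raises.
    """
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "in.pdf")
        out_path = os.path.join(tmp, "out.pdf")
        with open(in_path, "wb") as f:
            f.write(input_bytes)
        crop_fn(options + ["-o", out_path, in_path])
        with open(out_path, "rb") as f:
            return f.read()


# Usage sketch (not run here):
# from pdfCropMargins import crop
# result_bytes = crop_bytes(input_bytes, crop,
#                           ["-p4", "100", "0", "100", "100", "-a4", "0", "-28", "0", "0"])
```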
➜ ~ pip install pdfCropMargins[gui] --user --upgrade
➜ no matches found: pdfCropMargins[gui]
The [gui] part of the install fails; I'm not sure whether a macOS version of the GUI is available?
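The "no matches found" message comes from the zsh shell (the default on recent macOS), which treats the square brackets as a filename pattern before pip ever runs. Quoting the argument, as in pip install 'pdfCropMargins[gui]' --user --upgrade, avoids that. When installing from a Python script, passing an argument list to subprocess also bypasses the shell entirely. A sketch with hypothetical helper names:

```python
import subprocess
import sys


def pip_install_command(requirement, user=True, upgrade=True):
    """Argument list for installing requirement with this interpreter's pip."""
    cmd = [sys.executable, "-m", "pip", "install", requirement]
    if user:
        cmd.append("--user")
    if upgrade:
        cmd.append("--upgrade")
    return cmd


def pip_install(requirement, **kwargs):
    # No shell is involved, so "[gui]" reaches pip unchanged (no glob expansion).
    subprocess.run(pip_install_command(requirement, **kwargs), check=True)
```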