dcmvdbekerom / pypdfplot

Pypdfplot uses pyplot to produce plots in PDF format with the Python script and data files embedded. The plot can be edited by changing the file extension from .pdf to .py and running the script again.

License: GNU General Public License v3.0

pypdfplot's Issues

Spyder compatibility

There are some issues with running pypdfplot in Spyder.

To load the pypdfplot backend, set the graphics backend to Automatic under Tools > Preferences > IPython console > Graphics.

Because the libraries aren't reloaded between runs, pypdfplot's iteration counter isn't reset. pypdfplot then thinks this is a single-script-multiple-plot situation and pickles the consecutive plots.

Furthermore, if the modules aren't reloaded, the embedded files aren't extracted at the beginning of the run, which causes problems for PyPDF files with embedded external files.

Put all functions into a class

PyFile and PyPDFVersion are stored in /Root, /Info would be more suited.

As the title suggests, /Info seems better suited. We might even put the file size info there as well (though this may be difficult to locate).
The issue is that the docinfo object does not exist until after the indirect references have been swept, so we need to keep /PyFile and /PyPDFVersion in memory and only write them at the end.

fix_pypdf via command line

Open issues

Update fix_pypdf()
temporary output files should be BytesIO objects instead of files
Some objects take more than 79 characters per line
\r\n and \n mixing might occur
Fix_pypdf should be done by importing pypdfplot and running as python file
PyFile and PyPDFVersion are stored in /Root, /Info would be more suited. (Not really sure about this, closing for now...)

Make pickling work with backend

Currently the figures that are pickled are empty, clearly not the desired behavior.

fix_pypdf by keyword in extract()

This replaces the separate function

Add ASCII85 encoding in addition to ASCIIHex

This gives smaller file sizes
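
The size difference can be checked with the standard library alone. This is only a sketch of the comparison, not pypdfplot's encoder: `encoded_sizes` is a hypothetical helper, and the payload is arbitrary test data. ASCIIHex spends two characters per byte, while ASCII85 spends five characters per four bytes.

```python
import base64
import binascii


def encoded_sizes(payload: bytes) -> dict:
    """Return the length of the payload under both ASCII encodings."""
    hex_encoded = binascii.hexlify(payload)   # 2 characters per byte
    a85_encoded = base64.a85encode(payload)   # 5 characters per 4 bytes
    return {
        "raw": len(payload),
        "asciihex": len(hex_encoded),
        "ascii85": len(a85_encoded),
    }


# Arbitrary sample data: 1024 bytes with no all-zero 4-byte group.
sizes = encoded_sizes(bytes(range(256)) * 4)
print(sizes)
```

Note that a PDF ASCII85Decode stream ends with the `~>` marker; `base64.a85encode(..., adobe=True)` adds `<~` and `~>` framing, so the leading `<~` would presumably have to be stripped before writing the stream.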

Import of pypdf

Hi, thanks for this very promising package I just discovered. I tried to install and use, and got the following error:

  File "/home/ycopin/Softwares/VirtualEnvs/Python3/lib/python3.5/site-packages/pypdfplot/classes.py", line 35, in <module>
    from pypdf import PdfFileWriter,PdfFileReader

I guess this is because the installation of pypdfplot-0.5.0 prompted the installation of PyPDF4-1.27.0 from PyPI, which still imports as PyPDF4. I had to specifically install the master version of PyPDF4 (using pip install https://github.com/claird/PyPDF4/archive/master.zip) to make it work.

temporary output files should be BytesIO objects instead of files

As the title suggests. This should reduce the number of problems with access rights etc. and reduce file I/O. It's just neater.
No issues expected, just need to find time to do it.
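
A rough sketch of the idea, assuming the intermediate output can be assembled from byte chunks; `build_output` is a hypothetical name, not an existing pypdfplot function:

```python
import io


def build_output(chunks) -> bytes:
    """Assemble intermediate output in memory instead of a temporary file."""
    buffer = io.BytesIO()
    for chunk in chunks:
        buffer.write(chunk)
    # Nothing has touched the disk so far; the caller decides where the
    # final bytes go and writes them exactly once.
    return buffer.getvalue()


data = build_output([b"%PDF-1.4\n", b"%%EOF\n"])
print(len(data))
```

Since `io.BytesIO` supports `read()`, `write()`, `seek()` and `tell()`, code that currently takes an open binary file handle should in most cases accept the buffer unchanged.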

\r\n and \n mixing might occur

Some editors, and especially GitHub, like to change \n into \r\n without letting you know. Super annoying.
The result is that the PyPDF file is corrupted. We should check for this in the read() function.
Moreover, if said PDF file is opened in Acrobat, it will attempt to "fix" the PDF file, breaking the PyPDF functionality.
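
A check along these lines could count the newline styles in the raw bytes. It is a sketch under the assumption that all streams in the file are ASCII-encoded, so that any `\r` or `\n` byte really is a line ending; `newline_styles` and `has_mixed_newlines` are hypothetical helpers, not part of pypdfplot.

```python
def newline_styles(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF bytes that are not part of a CRLF
    cr = data.count(b"\r") - crlf   # CR bytes that are not part of a CRLF
    return {"crlf": crlf, "lf": lf, "cr": cr}


def has_mixed_newlines(data: bytes) -> bool:
    """True if more than one newline style occurs in the data."""
    counts = newline_styles(data)
    return sum(1 for n in counts.values() if n > 0) > 1


print(has_mixed_newlines(b"line one\nline two\r\nline three\n"))
```

If the read() function found mixed styles, it could warn the user or try to normalize the file before parsing the xref offsets, which are the part that a changed newline length invalidates.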

Initializing function should only export embedded files

Create matplotlib backend

A matplotlib backend would be much "cleaner" than the current approach. This would lead to a few changes:

publish() is replaced by pyplot's very own savefig()
pack() is replaced by the file_list keyword where the user can supply a list of filenames to be packed.

If fname doesn't change in a loop, add consecutive plots as new pages

Move reading of PyPDF file to inserted init() line

Currently, when pypdfplot is imported it immediately reads the file in an attempt to export embedded files. The reason this was done is to ensure that files are exported before the Python script ever needs to read them.

This is problematic for a number of reasons, among others because it is opaque and difficult to pass arguments.

I suggest we stop reading "automatically" and instead have the publish function insert a line pypdfplot.init() just after the import of the pypdfplot package. It first checks whether pypdfplot.init() is already in the script somewhere and, if so, skips the insertion.

The automatic init could move to a special import called pypdfplot.autoinit, which would function pretty much as before.
Then again, what would be the purpose of an autoinit if inserting the line init() will have exactly the same functionality and is more clear? I currently don't see any.
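
The insertion step could look roughly like the sketch below. It assumes the script uses a plain `import pypdfplot` line (an aliased import such as `import pypdfplot as P` would need the alias in the inserted call), and `insert_init` is a hypothetical helper rather than existing pypdfplot code.

```python
import re

INIT_LINE = "pypdfplot.init()"
# Matches a plain "import pypdfplot" line, optionally followed by a comment.
IMPORT_RE = re.compile(r"^import\s+pypdfplot\s*(#.*)?$")


def insert_init(source: str) -> str:
    """Insert pypdfplot.init() right after the import, unless already present."""
    if INIT_LINE in source:
        return source  # already initialized somewhere; skip insertion
    lines = source.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if IMPORT_RE.match(line):
            if not line.endswith("\n"):
                lines[index] = line + "\n"  # import was the last line
            lines.insert(index + 1, INIT_LINE + "\n")
            break
    return "".join(lines)


script = "import numpy\nimport pypdfplot\nx = 1\n"
print(insert_init(script))
```

Because the function returns the source unchanged when the line is already there, publishing the same script repeatedly does not stack up init() calls.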

Corrupted PDF file

I tried with the following code:

import numpy as N
import pypdfplot as P

yy, xx = N.mgrid[-1:+1:100j, -1:+1:100j]   # Coordinate images
t = N.full((100, 100), 0.9)                # Mean transmission map
v = N.r_[[1]*75,1:0.9:25j].reshape(-1, 1)  # Vignetting profile

P.imshow(t * v, vmin=0.81, vmax=0.9, cmap='bone')
P.colorbar()

P.publish()

which generated the attached PDF.
rgs270_vignetting.pdf
However, when opened with evince (GNOME Document Viewer 3.18.2), it displays something like

[screenshot not preserved]

while it should look like

[screenshot not preserved]

Did I do anything wrong?

Add example gallery ?

Hello Dirk !

I got the example gallery working on RADIS :
https://radis.readthedocs.io/en/latest/auto_examples/index.html

If you want I can set this up for pypdfplot, using all the examples in /examples

Read advanced xref tables

Acrobat sometimes produces PDFs with xref tables that can't be read by PyPDF4, the package that handles the PDF side of things.
Looks like in this case the xref table itself is an encoded stream. If this is indeed the case the fix should be relatively easy.

add localize function to copy external files in main folder

Cleanup should be publish() keyword; in_place should be part of cleanup

Detect if plots are generated in iterative loop, and pickle figure for consecutive figures

Some feature requests

I have a few usability questions:

I wonder if it would be interesting to have a finer control of the I/O with additional possibilities, e.g.:
- python -m pypdfplot embed myscript.py -o myplot.pdf would generate the PDF plot and embed the script
- python -m pypdfplot extract myplot.pdf -o myscript.py would extract embedded script from PDF plot
Is there any possibility to do such a trick with PNG files? (or other bitmap formats)

Thanks again for the package!
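
The two subcommands proposed above could be parsed along these lines. This is only an argument-parsing sketch with argparse; the `embed` and `extract` subcommands and the `-o` flag are the ones suggested in this issue, not an existing pypdfplot command line, and nothing here touches a PDF.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for the proposed 'embed' and 'extract' subcommands."""
    parser = argparse.ArgumentParser(prog="pypdfplot")
    subparsers = parser.add_subparsers(dest="command", required=True)

    embed = subparsers.add_parser("embed", help="generate the PDF plot and embed the script")
    embed.add_argument("script", help="Python script to run and embed")
    embed.add_argument("-o", "--output", required=True, help="PDF file to write")

    extract = subparsers.add_parser("extract", help="extract the embedded script from a PDF plot")
    extract.add_argument("pdf", help="PDF plot to read")
    extract.add_argument("-o", "--output", required=True, help="script file to write")
    return parser


# Parse an explicit argument list, as in the first example above.
args = build_parser().parse_args(["embed", "myscript.py", "-o", "myplot.pdf"])
print(args.command, args.script, args.output)
```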

add flags to cli

Currently only keywords in the script are read.
It would be good to have this control also in the command line.

Some objects take more than 79 characters per line

Proposed solution: encode all potentially long objects (i.e. lists, dicts, and streams) with hexascii.
The hexascii writer already adheres to the 79-character line limit.
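
For illustration, wrapping hex-encoded data at a fixed width can be sketched as follows. This is not the existing hexascii writer; `hex_lines` is a hypothetical helper. It relies on the ASCIIHexDecode filter ignoring whitespace and treating `>` as the end-of-data marker.

```python
import binascii


def hex_lines(payload: bytes, width: int = 79) -> list:
    """Hex-encode payload and split it into lines of at most `width` characters."""
    # '>' is the end-of-data marker of the ASCIIHexDecode filter.
    digits = binascii.hexlify(payload).decode("ascii") + ">"
    return [digits[i:i + width] for i in range(0, len(digits), width)]


lines = hex_lines(bytes(100))
print(max(len(line) for line in lines))
```

With an odd width such as 79 a hex pair can end up split across two lines, which the decoder tolerates; an even width such as 78 would keep pairs together if that is preferred for readability.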

Specify newline char in pdf-file so it can be retrieved when corrupted

Find a way to make iterative plots work

Update docs after all updated

Fix_pypdf should be done by importing pypdfplot and running as python file

The "canonical" pypdf way would be to add a line "import pypdfplot" at the top of the document and have the code do the rest.
The issue is that this means the PDF must be saved in the Python editor, and it may have picked up some binary characters that cause the interpreter to choke.

Test if binary characters are problematic, i.e., if the PDF file is corrupted after opening and saving in a Python editor (IDLE)
Add a check for linearized PDFs in the read() function. If a linearized PDF is detected, salvage the PDF and save it as a fresh PyPDF file.
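
A first version of the linearization check could be as simple as the sketch below. It leans on the PDF rule that the linearization parameter dictionary, which carries the /Linearized key, lies within the first 1024 bytes of the file; `is_linearized` is a hypothetical helper, and a plain byte search is a heuristic rather than a real parse.

```python
def is_linearized(data: bytes) -> bool:
    """Heuristic: look for the /Linearized key near the start of the file."""
    return b"/Linearized" in data[:1024]


sample = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
print(is_linearized(sample))
```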