
dcmvdbekerom / pypdfplot


pypdfplot uses pyplot to produce plots in PDF format with the Python script and data files embedded. The plot can be edited by changing the file extension from .pdf to .py and running the script again.

License: GNU General Public License v3.0

Python 100.00%

pypdfplot's People

Contributors

dcmvdbekerom, rodrigovb96

pypdfplot's Issues

Spyder compatibility

There are some issues with running pypdfplot in Spyder.

To load the backend, set the graphics backend to "Automatic" under Tools > Preferences > IPython console > Graphics.

Because the libraries aren't reloaded between runs, pypdfplot's iteration counter isn't reset. pypdfplot then assumes a single-script-multiple-plot situation and pickles the consecutive plots.

Furthermore, if the modules aren't reloaded, the embedded files won't be extracted at the beginning of the run, which causes problems for PyPDF files with embedded external files.
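
One possible workaround is to drop the cached modules before re-running the script, so that the next import executes the package again. The following is a minimal sketch only; the helper name purge_modules is hypothetical, and whether this is enough to reset pypdfplot's counter and trigger file extraction in Spyder is an assumption that has not been verified.

```python
import sys


def purge_modules(prefix):
    """Remove a package and its submodules from the import cache.

    After this call, the next `import <prefix>` re-executes the package's
    module-level code instead of reusing the cached module object.
    Returns the list of module names that were removed.
    """
    removed = [
        name for name in list(sys.modules)
        if name == prefix or name.startswith(prefix + ".")
    ]
    for name in removed:
        del sys.modules[name]
    return removed
```

In a script this would be called as purge_modules("pypdfplot") before the import line.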

PyFile and PyPDFVersion are stored in /Root, /Info would be more suited.

As the title suggests, /Info seems more suited. We might even put the file size info there as well (though this may be difficult to locate).
The issue is that the docinfo object does not exist until after sweeping the indirect references, so we need to keep /PyFile and /PyPDFVersion in memory and only write them at the end.

Open issues

  • Update fix_pypdf()
  • Temporary output files should be BytesIO objects instead of files
  • Some objects take more than 79 lines
  • \r\n and \n mixing might occur
  • Fix_pypdf should be done by importing pypdfplot and running as python file
  • PyFile and PyPDFVersion are stored in /Root, /Info would be more suited. (Not really sure about this, closing for now.)

Import of pypdf

Hi, thanks for this very promising package I just discovered. I tried to install and use it, and got the following error:

  File "/home/ycopin/Softwares/VirtualEnvs/Python3/lib/python3.5/site-packages/pypdfplot/classes.py", line 35, in <module>
    from pypdf import PdfFileWriter,PdfFileReader

I guess this is because installing pypdfplot-0.5.0 pulled in PyPDF4-1.27.0 from PyPI, which still imports as PyPDF4 rather than pypdf. I had to explicitly install the master version of PyPDF4 (using pip install https://github.com/claird/PyPDF4/archive/master.zip) to make it work.

\r\n and \n mixing might occur

Some editors, and especially GitHub, like to change \n into \r\n without letting you know. Super annoying.
The result is that the PyPDF file is corrupted. We should check for this in the read() function.
Moreover, if such a PDF file is opened in Acrobat, it will attempt to "fix" the file, which breaks the PyPDF functionality.
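
As a rough illustration of what such a check could look like, the sketch below counts the different line endings in a byte string and reports whether more than one kind is present. The function names are hypothetical and not part of pypdfplot. Note that binary streams inside a PDF can legitimately contain these bytes, so in practice the check would probably have to be limited to the text portion (the Python script) of the file.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF, and lone CR line endings in a byte string."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CRs that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


def has_mixed_line_endings(data: bytes) -> bool:
    """Return True if more than one kind of line ending occurs."""
    counts = count_line_endings(data)
    return sum(1 for n in counts.values() if n > 0) > 1
```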

Create matplotlib backend

A matplotlib backend would be much "cleaner" than the current approach. This would lead to a few changes:

  • publish() is replaced by pyplot's very own savefig()
  • pack() is replaced by the file_list keyword where the user can supply a list of filenames to be packed.

Move reading of PyPDF file to inserted init() line

Currently, when pypdfplot is imported it immediately reads the file in an attempt to export embedded files. This was done to ensure that the files are exported before the Python script ever needs to read them.

This is problematic for a number of reasons, among others because it is opaque and makes it difficult to pass arguments.

I suggest that we stop reading "automatically" and instead have the publish function insert a line pypdfplot.init() just after the import of the pypdfplot package. It would first check whether pypdfplot.init() is already somewhere in the script and, if so, skip the insertion.

The automatic init could move to a special import called pypdfplot.autoinit, which would function pretty much as before.
Then again, what would be the purpose of an autoinit if inserting the line init() has exactly the same functionality and is clearer? I currently don't see any.
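
The insertion step could look roughly like the sketch below. It is only an illustration of the proposal: init() does not exist yet, insert_init is a hypothetical helper name, and only plain "import pypdfplot" or "import pypdfplot as X" lines are recognized.

```python
import re

# Matches "import pypdfplot" with an optional "as <alias>" part.
IMPORT_RE = re.compile(r"^(\s*)import\s+pypdfplot(?:\s+as\s+(\w+))?\s*$")


def insert_init(source: str) -> str:
    """Insert '<alias>.init()' right after the pypdfplot import line.

    The source is returned unchanged if no matching import is found or
    if the init() call is already present somewhere in the script.
    """
    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines):
        match = IMPORT_RE.match(line)
        if not match:
            continue
        indent = match.group(1)
        alias = match.group(2) or "pypdfplot"
        init_call = f"{alias}.init()"
        if init_call in source:
            return source  # already initialized, skip insertion
        eol = "\r\n" if line.endswith("\r\n") else "\n"
        if not line.endswith("\n"):
            lines[i] = line + eol  # import was the last line without newline
        lines.insert(i + 1, f"{indent}{init_call}{eol}")
        return "".join(lines)
    return source
```

Running the function twice on the same script leaves it unchanged the second time, which matches the "skip insertion if already present" behavior described above.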

Corrupted PDF file

I tried with the following code:

import numpy as N
import pypdfplot as P

yy, xx = N.mgrid[-1:+1:100j, -1:+1:100j]   # Coordinate images
t = N.full((100, 100), 0.9)                # Mean transmission map
v = N.r_[[1]*75,1:0.9:25j].reshape(-1, 1)  # Vignetting profile

P.imshow(t * v, vmin=0.81, vmax=0.9, cmap='bone')
P.colorbar()

P.publish()

which generated the attached PDF (rgs270_vignetting.pdf).
However, when opened with evince (GNOME Document Viewer 3.18.2), it displays something like the attached screenshot (Screenshot from 2020-11-20 17-09-42), while it should look like the attached image (rgs270_vignetting).

Did I do anything wrong?

Read advanced xref tables

Acrobat sometimes produces PDFs with xref tables that can't be read by PyPDF4, the package that handles the PDF side of things.
It looks like in this case the xref table itself is an encoded stream. If this is indeed the case, the fix should be relatively easy.
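
To confirm which case applies for a given file, one could inspect what the last startxref entry points to: a classic table begins with the keyword xref, while a cross-reference stream is an ordinary indirect object ("N G obj"). The sketch below does only that detection; the function name xref_kind is hypothetical, and hybrid files that use both mechanisms are not handled.

```python
import re


def xref_kind(pdf_bytes: bytes) -> str:
    """Classify the cross-reference section referenced by the last startxref.

    Returns "table" for a classic xref table, "stream" for a
    cross-reference stream object, and "unknown" otherwise.
    """
    pos = pdf_bytes.rfind(b"startxref")
    if pos < 0:
        return "unknown"
    match = re.match(rb"startxref\s+(\d+)", pdf_bytes[pos:])
    if not match:
        return "unknown"
    offset = int(match.group(1))
    head = pdf_bytes[offset:offset + 32].lstrip()
    if head.startswith(b"xref"):
        return "table"
    if re.match(rb"\d+\s+\d+\s+obj", head):
        return "stream"
    return "unknown"
```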

Some feature requests

I have a few usability questions:

  • I wonder if it would be interesting to have a finer control of the I/O with additional possibilities, e.g.:
    • python -m pypdfplot embed myscript.py -o myplot.pdf would generate the PDF plot and embed the script
    • python -m pypdfplot extract myplot.pdf -o myscript.py would extract embedded script from PDF plot
  • Is there any possibility to do such a trick with PNG files? (or other bitmap formats)

Thanks again for the package!
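
For illustration, the requested command-line interface could be declared with argparse along the following lines. This is a sketch of the request only: the embed and extract subcommands do not exist in pypdfplot, and the parser does nothing beyond parsing the arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the hypothetical embed/extract subcommands."""
    parser = argparse.ArgumentParser(prog="pypdfplot")
    sub = parser.add_subparsers(dest="command", required=True)

    # python -m pypdfplot embed myscript.py -o myplot.pdf
    embed = sub.add_parser("embed", help="generate the PDF plot and embed the script")
    embed.add_argument("script", help="Python script to run and embed")
    embed.add_argument("-o", "--output", required=True, help="output PDF file")

    # python -m pypdfplot extract myplot.pdf -o myscript.py
    extract = sub.add_parser("extract", help="extract the embedded script")
    extract.add_argument("pdf", help="PDF plot to read")
    extract.add_argument("-o", "--output", required=True, help="output script file")

    return parser
```

Calling build_parser().parse_args(["embed", "myscript.py", "-o", "myplot.pdf"]) yields a namespace with command, script, and output attributes that the package could then dispatch on.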

add flags to cli

Currently only keywords in the script are read.
It would be good to have this control also in the command line.

Some objects take more than 79 lines

Proposed solution: encode all potentially long objects (i.e. lists, dicts, and streams) with hexascii.
The hexascii-writer already adheres to 79 line limit.
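
A minimal sketch of such an encoder is shown below, assuming the limit refers to 79 characters per line and that "hexascii" corresponds to the PDF ASCIIHexDecode filter (hex digits, whitespace ignored, ">" as end-of-data marker). The function names are hypothetical and this is not the writer used in pypdfplot.

```python
import binascii


def ascii_hex_encode(data: bytes, width: int = 79) -> bytes:
    """Hex-encode data, terminate with '>', and wrap at `width` characters."""
    hexstr = binascii.hexlify(data).decode("ascii").upper() + ">"
    lines = [hexstr[i:i + width] for i in range(0, len(hexstr), width)]
    return "\n".join(lines).encode("ascii")


def ascii_hex_decode(encoded: bytes) -> bytes:
    """Decode the output of ascii_hex_encode, ignoring all whitespace."""
    body = encoded.decode("ascii").split(">", 1)[0]
    digits = "".join(body.split())
    if len(digits) % 2:
        digits += "0"  # an odd final digit is treated as if followed by 0
    return bytes.fromhex(digits)
```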

Fix_pypdf should be done by importing pypdfplot and running as python file

The "canonical" pypdf way would be to add a line "import pypdfplot" at the top of the document and have the code do the rest.
The issue is that this means the PDF must be saved in the Python editor, and it may have picked up some binary characters that cause the interpreter to choke.

  • Test if binary characters are problematic, i.e., if the PDF file is corrupted after opening and saving in a Python editor (IDLE)
  • Add a check for linearized PDFs in the read() function. If a linearized PDF is detected, salvage the PDF and save it as a fresh PyPDF file.
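
The linearization check in the second item could start from a simple heuristic: in a linearized PDF, the linearization parameter dictionary (containing the /Linearized key) appears near the start of the file. The sketch below assumes that looking at the first 1024 bytes is sufficient; the function name is hypothetical, and a file that has been modified since linearization may still carry the key.

```python
def looks_linearized(pdf_bytes: bytes, window: int = 1024) -> bool:
    """Heuristically detect a linearized PDF by its /Linearized key."""
    return b"/Linearized" in pdf_bytes[:window]
```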
