ickc / pantable


CSV Tables in Markdown: Pandoc Filter for CSV Tables

Home Page: https://ickc.github.io/pantable/

License: BSD 3-Clause "New" or "Revised" License

Makefile 4.33% Python 95.67%

pandoc pandoc-filters pandoc-filter csv

pantable's Introduction

Pantable—A Python library for writing pandoc filters for tables with batteries included.

Date: January 25, 2022

Introduction

Pantable is a Python library that maps the pandoc Table AST to an internal structure losslessly. This enables writing pandoc filters specifically manipulating tables in pandoc.

This also comes with 3 pandoc filters, pantable, pantable2csv, pantable2csvx.

pantable is the main filter, introducing a syntax to include CSV table in markdown source. It supports all table features supported by the pandoc Table AST.

pantable2csv complements pantable as its inverse: it converts native pandoc tables into the CSV table format defined by pantable. This conversion is lossy as of pandoc 2.11+, which is supported since pantable 0.13.

pantable2csvx (experimental, may drop in the future) is similar to pantable2csv, but introduces an extra column with the fancy-table syntax defined below such that any general pandoc Table AST can be losslessly encoded in CSV format.

Some example uses are:

You already have tables in CSV format.
You feel that directly editing markdown tables is troublesome. You want a spreadsheet interface for editing, but want to convert the result to a native pandoc table for higher readability. And this process might go back and forth.
You want lower-level control over the table and column widths.
You want to use all table features supported by pandoc’s internal AST table format, which is not possible in pandoc’s markdown (as of writing).

A word on support

Note that the above is exactly how I use pantable personally, so you can count on the round-trip losslessness. pantable and pantable2csv should be robust since they have been used for years. However, the table AST was substantially revised in pandoc 2.11. Pantable 0.13 added support for this new AST by completely rewriting pantable, and at the same time addressed some of the shortcomings of the original design. Part of the new design is to enable pantable as a library (see Pantable as a library below) so that its functionality can be extended: just as a pandoc filter intercepts the AST and modifies it, you can intercept the internal structure of PanTable and modify it.

However, since this library was completely rewritten as of v0.13,

pantable and pantable2csv as pandoc filters should be stable
- there may be regressions; please open an issue to report them
- round-trip losslessness may break; please open an issue to report this
pantable2csvx as a pandoc filter is experimental. Its API might change in the future, or it may be dropped completely (e.g. replaced by something even more general).
Pantable as a library is also experimental, meaning that the API might change in the future.

Installation

Pip

To manage pantable using pip, open the command line and run

pip install pantable to install
- pip install https://github.com/ickc/pantable/archive/master.zip to install the in-development version
pip install -U pantable to upgrade
pip uninstall pantable to remove

You need a matching pandoc version for pantable to work flawlessly. See Supported pandoc versions for details. Or, use the Conda method to install below to have the pandoc version automatically managed for you.

Conda

To manage pantable with a matching pandoc version, open the command line and run

conda install -c conda-forge pantable to install
conda update pantable to upgrade
conda remove pantable to remove

You may also replace conda by mamba, which is basically a drop-in replacement of the conda package manager. See mamba-org/mamba: The Fast Cross-Platform Package Manager for details.

Note on versions

Supported Python versions

pantable v0.12 dropped Python 2 support. You need to pip install "pantable<0.12" if you need to run it on Python 2.

To enforce using Python 3, depending on your system, you may need to specify python3 and pip3 explicitly.

Check the badge above or setup.py for supported Python versions; setup.py further indicates support for PyPy in addition to CPython.

Supported pandoc versions

pandoc versioning semantics is MAJOR.MAJOR.MINOR.PATCH and panflute’s is MAJOR.MINOR.PATCH. Below we show matching versions of pandoc that panflute supports, in descending order. Only the major version is shown as long as the minor versions don’t matter.

Version Matching¹

pantable	panflute version	supported pandoc versions	supported pandoc API versions
0.14.1	2.1.3	2.11.0.4–2.16.x	1.22–1.22.1
0.14	2.1	2.11.0.4–2.14.x	1.22
0.13	2.0	2.11.0.4–2.11.x	1.22
	not supported	2.10	1.21
0.12	1.12	2.7–2.9	1.17.5–1.20

Note: pandoc 2.10 was short-lived, and 2.11 has minor API changes compared to it, mainly to fix its shortcomings. Please avoid using pandoc 2.10.

To use pantable with pandoc < 2.10, install pantable 0.12 explicitly by pip install pantable~=0.12.4.
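The version table above can be turned into a small lookup. This is only a sketch: the ranges are transcribed from the table, and the names `SUPPORT`, `parse_version` and `newest_pantable_for` are made up for illustration and are not part of pantable.

```python
from typing import Optional, Tuple

# (pantable release, lowest pandoc version, highest pandoc MAJOR.MAJOR series),
# transcribed from the version-matching table above, newest first.
SUPPORT = [
    ("0.14.1", (2, 11, 0, 4), (2, 16)),
    ("0.14", (2, 11, 0, 4), (2, 14)),
    ("0.13", (2, 11, 0, 4), (2, 11)),
    ("0.12", (2, 7), (2, 9)),
]


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.11.0.4' into (2, 11, 0, 4)."""
    return tuple(int(part) for part in text.split("."))


def newest_pantable_for(pandoc_version: str) -> Optional[str]:
    """Return the newest pantable release listed for this pandoc, if any."""
    version = parse_version(pandoc_version)
    for release, lowest, highest_series in SUPPORT:
        if version >= lowest and version[:2] <= highest_series:
            return release
    return None  # e.g. pandoc 2.10, which the table marks as not supported


print(newest_pantable_for("2.16.2"))  # 0.14.1
print(newest_pantable_for("2.9.2"))   # 0.12
print(newest_pantable_for("2.10.1"))  # None
```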

Pantable as pandoc filters

`pantable`

This allows CSV tables, optionally containing markdown syntax (disabled by default), to be put in markdown as fenced code blocks.

Example

Also see the README in GitHub Pages.

```table
---
caption: '*Awesome* **Markdown** Table'
alignment: RC
table-width: 2/3
markdown: True
---
First row,defaulted to be header row,can be disabled
1,cell can contain **markdown**,"It can be arbitrary block element:

- following standard markdown syntax
- like this"
2,"Any markdown syntax, e.g.",E = mc^2^
```

becomes

Awesome Markdown Table

First row	defaulted to be header row	can be disabled
1	cell can contain markdown	It can be arbitrary block element: following standard markdown syntax like this
2	Any markdown syntax, e.g.	E = mc²


(The equation might not work if you view this on PyPI.)

Usage

pandoc -F pantable -o README.html README.md

Syntax

A fenced code block is used, with the class table. See Example.

Optionally, a YAML metadata block can be used within the fenced code block, following the standard pandoc YAML metadata block syntax. The following metadata keys are recognized:

caption

the caption of the table. Can be block-like. If omitted, no caption will be inserted. Interpreted as markdown only if markdown: true below.

Default: disabled.

short-caption

the short-caption of the table. Must be inline-like element. Interpreted as markdown only if markdown: true below.

Default: disabled.

alignment

alignment for columns: a string of characters among L,R,C,D, case-insensitive, corresponds to Left-aligned, Right-aligned, Center-aligned, Default-aligned respectively. e.g. LCRD for a table with 4 columns.

You can specify only the beginning that’s non-default. e.g. DLCR for a table with 8 columns is equivalent to DLCRDDDD.

Default: DDD...

alignment-cells

alignment per cell. One row per line. A string of characters among L,R,C,D, case-insensitive, corresponds to Left-aligned, Right-aligned, Center-aligned, Default-aligned respectively. e.g.

LCRD
DRCL

for a table with 4 columns, 2 rows.

You can specify only the top-left block that is not default, and the rest of the cells will be set to default automatically. e.g.

DC
LR

for a table with 4 columns, 3 rows will be equivalent to

DCDD
LRDD
DDDD

Default: DDD...\n...
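The padding rules for alignment and alignment-cells described above can be sketched in a few lines. This is an illustration only, not pantable's implementation: `pad_alignment` and `pad_alignment_cells` are hypothetical names, and truncating over-long input is an assumption.

```python
from typing import List


def pad_alignment(spec: str, n_columns: int) -> str:
    """Upper-case a column alignment string and pad it with D (default)."""
    spec = spec.upper()
    if any(ch not in "LRCD" for ch in spec):
        raise ValueError(f"alignment must only contain L, R, C, D: {spec!r}")
    # Assumption: characters beyond the number of columns are dropped.
    return spec[:n_columns].ljust(n_columns, "D")


def pad_alignment_cells(spec: str, n_rows: int, n_columns: int) -> List[str]:
    """Pad a per-cell alignment block (one row per line) to the table shape."""
    rows = [pad_alignment(line, n_columns) for line in spec.splitlines()]
    rows = rows[:n_rows]
    rows += ["D" * n_columns] * (n_rows - len(rows))
    return rows


print(pad_alignment("DLCR", 8))             # DLCRDDDD
print(pad_alignment_cells("DC\nLR", 3, 4))  # ['DCDD', 'LRDD', 'DDDD']
```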

width

a list of relative widths, one per column. D means default width. e.g.

width:
  - 0.1
  - 0.2
  - 0.3
  - 0.4
  - D

Again, you can specify only the left ones that are non-default and it will be padded with defaults.

Default: [D, D, D, ...]

table-width

the relative width of the table (e.g. relative to \linewidth). If specified as a number, and if any of the column widths in width is default, then auto-width will be performed such that the sum of the widths equals this number.

Default: None
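As a rough sketch of what "auto-width such that the sum of width equals this number" could look like: the function below splits the table width across columns in proportion to a per-column character count. The proportional rule, the function name `auto_width` and the `n_chars` input are all assumptions made for illustration; the actual algorithm in pantable may differ.

```python
from fractions import Fraction
from typing import List


def auto_width(table_width: str, n_chars: List[int]) -> List[float]:
    """Split table_width across columns in proportion to n_chars.

    n_chars is a hypothetical per-column content size, e.g. the longest
    line of text in each column.
    """
    if not n_chars:
        return []
    total_width = float(Fraction(table_width))  # accepts "2/3" as well as "0.8"
    total_chars = sum(n_chars)
    if total_chars == 0:
        # Nothing to measure: fall back to equal widths.
        return [total_width / len(n_chars)] * len(n_chars)
    return [total_width * n / total_chars for n in n_chars]


widths = auto_width("2/3", [9, 25, 30])
print(widths)       # three relative widths
print(sum(widths))  # approximately 2/3
```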

header

If it has a header row or not.

Default: True

markdown

If CSV table cell contains markdown syntax or not.

Default: False

fancy_table

if true, then the first column of the table will be interpreted as a special fancy-table syntax such that it encodes which rows are

table-header,
table-foot,
multiple table-bodies and
“body-head” within table-bodies.

see example below.

include

the path to a CSV file, which can be relative or absolute. If non-empty, it overrides the CSV in the CodeBlock.

Default: None

include-encoding

if specified, the file from include will be decoded according to this encoding, else assumed to be UTF-8. Hint: if you save the CSV file via Microsoft Excel, you may need to set this to utf-8-sig.
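To see why utf-8-sig matters, here is a small standard-library demonstration (independent of pantable) of what happens to the first header cell when a file that starts with a UTF-8 byte order mark is decoded with each codec. The sample bytes are made up.

```python
import csv
import io

# Bytes as a spreadsheet program might save them: a UTF-8 byte order mark
# (EF BB BF) followed by the CSV text.
raw = b"\xef\xbb\xbfname,mass\nEarth,5.97\n"

plain = next(csv.reader(io.StringIO(raw.decode("utf-8"))))
with_sig = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))

print(plain)     # ['\ufeffname', 'mass']
print(with_sig)  # ['name', 'mass']
```

With plain utf-8 the invisible U+FEFF character stays glued to the first cell; utf-8-sig strips it.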

csv-kwargs

If specified, should be a dictionary passed to csv.reader as options. e.g.

---
csv-kwargs:
  dialect: unix
  key: value...
...
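Since the dictionary is described as being passed to csv.reader, its effect can be tried directly with the standard library. A minimal sketch, where `csv_kwargs` stands in for the parsed YAML mapping and the sample data is made up:

```python
import csv
import io

# Stand-in for the csv-kwargs mapping after YAML parsing.
csv_kwargs = {"dialect": "unix"}

data = "name,moons\nEarth,1\n"
rows = list(csv.reader(io.StringIO(data), **csv_kwargs))
print(rows)  # [['name', 'moons'], ['Earth', '1']]
```

Any formatting parameter that csv.reader accepts (delimiter, quotechar, skipinitialspace, and so on) can be tested the same way before putting it in the YAML block.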

format

The file format from the data in code-block or include if specified.

Default: csv for data from code-block, and infer from extension in include.

Currently only csv is supported.

ms

(experimental, may drop in the future): a list of int that specifies the number of rows per row-block. e.g. [2, 6, 3, 4, 5, 1] means the table should have 21 rows, first 2 rows are table-head, last 1 row is table-foot, there are 2 table-bodies (indicated by 6, 3, 4, 5 in the middle) where the 1st body 6, 3 has 6 body-head and 3 “body-body”, and the 2nd body 4, 5 has 4 body-head and 5 “body-body”.

If this is specified, header will be ignored.

Default: None, which would be inferred from header.
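The layout of ms can be made concrete with a short decoder. This is only a reading of the description above, not pantable code: `RowBlocks` and `split_ms` are hypothetical names, and requiring at least one table-body is an assumption.

```python
from typing import List, NamedTuple, Tuple


class RowBlocks(NamedTuple):
    head: int                      # rows in the table-head
    bodies: List[Tuple[int, int]]  # (body-head rows, body-body rows) per table-body
    foot: int                      # rows in the table-foot


def split_ms(ms: List[int]) -> RowBlocks:
    """Decode [head, (body-head, body-body)..., foot] into named parts."""
    if len(ms) < 4 or len(ms) % 2 != 0:
        raise ValueError("expected head, one or more body pairs, then foot")
    middle = ms[1:-1]
    bodies = list(zip(middle[0::2], middle[1::2]))
    return RowBlocks(head=ms[0], bodies=bodies, foot=ms[-1])


blocks = split_ms([2, 6, 3, 4, 5, 1])
print(blocks.head, blocks.bodies, blocks.foot)  # 2 [(6, 3), (4, 5)] 1
print(sum([2, 6, 3, 4, 5, 1]))                   # 21
```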

ns_head

(experimental, may drop in the future): a list of int that specifies the number of head columns per table-body. e.g. [1, 2] means the 1st table-body has 1 head column and the 2nd table-body has 2 head columns.

Default: None

`pantable2csv`

This one is the inverse of pantable, a panflute filter to convert any native pandoc tables into the CSV table format used by pantable.

Effectively, pantable forms a “CSV Reader”, and pantable2csv forms a “CSV Writer”. It allows you to convert back and forth between these 2 formats.

For example, in the markdown source:

+--------+---------------------+--------------------------+
| First  | defaulted to be     | can be disabled          |
| row    | header row          |                          |
+========+=====================+==========================+
| 1      | cell can contain    | It can be aribrary block |
|        | **markdown**        | element:                 |
|        |                     |                          |
|        |                     | -   following standard   |
|        |                     |     markdown syntax      |
|        |                     | -   like this            |
+--------+---------------------+--------------------------+
| 2      | Any markdown        | $$E = mc^2$$             |
|        | syntax, e.g.        |                          |
+--------+---------------------+--------------------------+

: *Awesome* **Markdown** Table

running pandoc -F pantable2csv -o output.md input.md, it becomes

``` {.table}
---
alignment: DDD
caption: '*Awesome* **Markdown** Table'
header: true
markdown: true
table-width: 0.8055555555555556
width: [0.125, 0.3055555555555556, 0.375]
---
First row,defaulted to be header row,can be disabled
1,cell can contain **markdown**,"It can be aribrary block element:

-   following standard markdown syntax
-   like this
"
2,"Any markdown syntax, e.g.",$$E = mc^2$$
```

`pantable2csvx`

(experimental, may drop in the future)

Similar to pantable2csv, but converts with the fancy_table syntax such that any general Table in the pandoc AST is in principle losslessly converted to a markdown-ish syntax in a CSV representation.

e.g.

pandoc -F pantable2csvx -o tests/files/native_reference/planets.md tests/files/native/planets.native

would turn the native Table from planets.native² into

``` {.table}
---
caption: Data about the planets of our solar system.
alignment: CCDRRRRRRRR
ns-head:
- 3
markdown: true
fancy-table: true
...
===,"(1, 2)
",,Name,Mass (10\^24kg),Diameter (km),Density (kg/m\^3),Gravity (m/s\^2),Length of day (hours),Distance from Sun (10\^6km),Mean temperature (C),Number of moons,Notes
,"(4, 2)
Terrestrial planets",,Mercury,0.330,"4,879",5427,3.7,4222.6,57.9,167,0,Closest to the Sun
,,,Venus,4.87,"12,104",5243,8.9,2802.0,108.2,464,0,
,,,Earth,5.97,"12,756",5514,9.8,24.0,149.6,15,1,Our world
,,,Mars,0.642,"6,792",3933,3.7,24.7,227.9,-65,2,The red planet
,"(4, 1)
Jovian planets","(2, 1)
Gas giants",Jupiter,1898,"142,984",1326,23.1,9.9,778.6,-110,67,The largest planet
,,,Saturn,568,"120,536",687,9.0,10.7,1433.5,-140,62,
,,"(2, 1)
Ice giants",Uranus,86.8,"51,118",1271,8.7,17.2,2872.5,-195,27,
,,,Neptune,102,"49,528",1638,11.0,16.1,4495.1,-200,14,
___,"(1, 2)
Dwarf planets",,Pluto,0.0146,"2,370",2095,0.7,153.3,5906.4,-225,5,Declassified as a planet in 2006.
```

Pantable as a library

(experimental, API may change in the future)

Documentation here is sparse, partly because the upstream (pandoc) may change the table AST again. See Crazy ideas: table structure from upstream GitHub.

See the API docs in https://ickc.github.io/pantable/.

For example, looking at the source of pantable as a pandoc filter, in codeblock_to_table.py, you will see the main function doing the work is now

pan_table_str = (
    PanCodeBlock
    .from_yaml_filter(options=options, data=data, element=element, doc=doc)
    .to_pantablestr()
)
if pan_table_str.table_width is not None:
    pan_table_str.auto_width()
return (
    pan_table_str
    .to_pantable()
    .to_panflute_ast()
)

You can see another example in table_to_codeblock.py, which is what pantable2csv and pantable2csvx call.

Below is a diagram illustrating the API:

Overview

Solid arrows are lossless conversions. Dashed arrows are lossy.

As you can see, the pantable internal structure PanTable is in one-to-one correspondence with the pandoc Table AST. Similarly for PanCodeBlock.

PanTable can then be losslessly converted to and from PanTableMarkdown, where everything in PanTableMarkdown is now a markdown string (whereas the contents of PanTable are panflute or panflute-like AST objects).

Lastly, it defines a one-to-one correspondence to PanCodeBlock with the fancy_table syntax mentioned earlier.

Below is the same diagram with the method names. You’d probably want to zoom into it to see it clearly.

Detailed w/ methods

Development

To run all the tests, run tox. GitHub Actions is also used for CI, so if you fork this you can check whether your commits pass there.

(The table here was created in the early days of pantable, which has since added more features. It is left here for historical reasons and also as a credit to those that came before.)

The following are pandoc filters written in Haskell that provide similar functionality. This filter was born after testing with theirs.

	pandoc-csv2table	pandoc-placetable	panflute example	pantable
caption	caption	caption	title	caption
aligns	aligns = LRCD	aligns = LRCD		aligns = LRCD
width		widths = "0.5 0.2 0.3"		width: [0.5, 0.2, 0.3]
table-width				table-width: 1.0
header	header = yes \| no	header = yes \| no	has_header: True \| False	header: True \| False \| yes \| NO
markdown		inlinemarkdown		markdown: True \| False \| yes \| NO
source	source	file	source	include
others	type = simple \| multiline \| grid \| pipe
		delimiter
		quotechar
		id (wrapped by div)
Notes				width are auto-calculated when width is not specified

¹ For the pandoc API version, check https://hackage.haskell.org/package/pandoc for pandoc-types, which is the same thing.
² Copied from pandoc from here, which was dual licensed as CC0 here.


pantable's Issues

Add other ways of injecting PanTableOption

See #28 by @ber532k

Thoughts:

it could be related to #10, when pantable is turned into a cli tool, what is suggested in #28 is doable.
use env. var. to setup the default. In that case people can setup their own defaults in .bash_profile or .bashrc permanently. This is easy to do. And considering that pandoc is unlikely to provide a better way to pass cli options to filter, and the current recommended ways of passing info to filter is via env. var., this might be just the way to solve #10
global defaults in YAML front matter. Another often suggested way to pass info to filter. A bit more complicated to do, since another function to walk the YAML is needed. This is more self-contained in the document though (i.e. more reproducible).

In the end, maybe all 3 methods should be provided as a hierarchy of ways to define the defaults. (priority, higher first: YAML in CSV table, YAML front matter, cli args, env. var., default)
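One way to picture that priority order is with collections.ChainMap from the standard library. This is a sketch of the idea only; the five dictionaries and their contents are hypothetical stand-ins for the sources listed above, not existing pantable structures.

```python
from collections import ChainMap

# All five mappings are hypothetical stand-ins for the sources named above.
defaults = {"markdown": False, "header": True, "table-width": None}
env_vars = {"markdown": True}
cli_args = {}
front_matter = {"table-width": "2/3"}
table_yaml = {"header": False}

# ChainMap searches its maps left to right, so list the highest priority first.
options = ChainMap(table_yaml, front_matter, cli_args, env_vars, defaults)

print(options["header"])       # False, from the YAML in the CSV table
print(options["table-width"])  # 2/3, from the YAML front matter
print(options["markdown"])     # True, from the environment variable
```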

pantable as a cli tool

Depending on #8 and #9:

the syntax would probably go something like

# pandoc like args
## -s means pantable is used as a "standalone cli tool"
pantable -s -o table.csv table.md
## without -s, it acts as a pandoc filter:
pandoc -t json table.md | pantable -t csv # -t csv means convert native pandoc table into a CSV table in CodeBlock
# or simply when filter arg is not needed
pandoc -F pantable table.md

Edit: it also has a question of how to represent the metadata in some output formats, e.g. to CSV:

pantable -s -o table.csv table.md

Suppose table.md only has a table. The output .csv still needs to store the metadata in some way. Possible solutions:

CSV comment # ....: non-standard. e.g. Excel won't understand this.
no metadata, and pantable -f csv will choose a "sensible default" and use other args (that will be given) to override it.

Space character after `,` between column contents creates a column?

This is extracted from a larger table so please ignore the row designation within the content.

Also, I couldn't get backtick quoted blocks containing backticks to work on GitHub, so they are missing from the table declarations below though they are properly present in the source document.

The first table behaves as expected, the second gives:

pantable: table rows are of irregular length. Empty cells appended.

The difference, which is almost impossible to see, is that there is a space character following the comma between columns one and two. The single space character is treated as its own column, with the following quoted text becoming column one of a new row. I would think the space between the comma and the opening quote would be silently swallowed by the CSV parser.

table
---
caption: '__Not Broken, No Space After Comma__'
alignment: RRR
table-width: 2/3
markdown: True
---
*First row*,__defaulted to be header row__, __*can be disabled*__
"Row-3-Col-1-Arbitrary block element:

- following standard markdown syntax
- like this
- Row-3-Col-1-END.","Row-3-Col-2-Another Arbitrary block element:

    1. Number 1 -- Row-3-Col-2
    2. Number 2 -- Row-3-Col-2
        - Mixed #1 Row-3-Col-2
        - Mixed #2 Row-3-Col-2","Row-3-Col-3 Nothing Fancy"

table
---
caption: '__Broken, Space After Comma__'
alignment: RRR
table-width: 2/3
markdown: True
---
*First row*,__defaulted to be header row__, __*can be disabled*__
"Row-3-Col-1-Arbitrary block element:

- following standard markdown syntax
- like this
- Row-3-Col-1-END.", "<-- *The space is to the left of that quote!* Arbitrary block element:
    1. Number 1 -- Row-3-Col-2
    2. Number 2 -- Row-3-Col-2
        - Mixed #1 Row-3-Col-2
        - Mixed #2 Row-3-Col-2","Row-3-Col-3 Nothing Fancy"
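The behavior reported here looks consistent with how Python's csv module treats quotes by default: a quote character only opens a quoted field when it is the first character of that field, so a leading space turns the quote into ordinary text. The standard-library snippet below shows this on a made-up single line, along with the skipinitialspace option. Whether passing skipinitialspace through csv-kwargs resolves the table above has not been verified here.

```python
import csv

line = 'a, "b,c",d'

print(next(csv.reader([line])))
# ['a', ' "b', 'c"', 'd']
print(next(csv.reader([line], skipinitialspace=True)))
# ['a', 'b,c', 'd']
```

Once the quote is no longer treated as quoting, embedded newlines also end the row, which would explain the "irregular length" warning.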

bug when using fancy_table

e.g.

---
alignment: DLRL
markdown: true
fancy-table: true
...
===,,**asdfsadf**,,
,**asdfasdf**,asdfasd,180,safgafg
,,asdfa,90,asgadsfg
,,asdfsadf,40,asgasfg
,,zxcvxczv,1,asgsafg
___,,zxcvxczv,1,zxcvzxv
,**xzcvxczv**,zxcvzcxv,100,sdfgasg
,,sagfsg,40,asfg
,,asdgfasfg,70,adsfgbbvv
,,asgsadg,30,adfgfdg
___,,Edsafdsag,1,asfgsafg

Show escaping of quotes in example

I think it'd be useful to show how to escape quote marks in the CSV (i.e., double quotes within quote demarcations).

3,"But, I wonder, how to include ""quote mark,"" if possible, ""as they say""",[use a double quote to escape](https://gpdb.docs.pivotal.io/6-5/admin_guide/load/topics/g-escaping-in-csv-formatted-files.html)
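The doubled-quote convention can be checked directly with Python's csv module. In this sketch the third cell of the example row is shortened to `ok` to keep the line readable.

```python
import csv

line = '3,"But, I wonder, how to include ""quote mark,"" if possible, ""as they say""",ok'
row = next(csv.reader([line]))
print(row[1])
# But, I wonder, how to include "quote mark," if possible, "as they say"
```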

footnote in table lost when using pantable2csv

If a native pandoc table has a footnote and is converted to PanTable (e.g. using pantable2csv), the footnote will be lost.

Use another CSV parser?

It is known that unicode support in CSV in Python 2 is tricky (see the documentation of the csv module, and #14). This was originally written in Python 3 (back then panflute only supported Python 3, before I ported it to be compatible with Python 2 as well), and uses the same trick applied to panflute to support Python 2. The original thought was that having partial Python 2 support (without unicode in CSV) is better than no support at all. But then, unavoidably, people do use this with Python 2 and unicode.

#14 proposed a fix that could solve the unicode problem. However, for various reasons, an alternative CSV parser is considered:

It seems that the csv module in Python 2 and 3 behaves slightly differently. The last thing I want is for Python 2 and 3 users to see different behavior, making this package less maintainable (frankly I don't want to deal with differences between Python 2 and 3...).
As in #16 and #17, people would like to extend the functionality of pantable to be able to filter subcells from the CSV input. This feature might make the efficiency of the CSV parser more critical. Without filtering capability, we can reasonably assume the table size is small (for LaTeX, constrained by pages; for others like HTML, at least not too big to be rendered by a browser efficiently). But with filtering, the source CSV can be arbitrarily large while only a small subset of table cells is selected.
When using another CSV parser not from the standard library, we need to either deal with that extra dependency, or use a conditional import, leaving the end users to make the choice (and install it).

Criteria, based on the above reasons and other concerns:

uniform Python 2/3 behavior
unicode support
high efficiency
try to avoid conditional import so that I don't need to deal with different behaviors from different CSV parser (Python 2 & 3 CSV module, & that conditionally imported CSV module)
try to make the dependency small and easy to install

Potential choices are:

unicodecsv
fastcsv
numpy csv parser
pandas csv parser

I like the pandas CSV parser since it is well known to be very fast. And I want some of pandas' capability to generate plots from tables. But it needs to be compiled, and alternative CPU architecture might or might not be supported (at least no pre-built binaries).

Add other format supports

such as xlsx, c.f. #37

fixed width tables do not get parsed in docx

Hey there, I am currently trying to use pantable to export tables into a couple of different formats.

When I export this table

```table
---
alignment: LLL
markdown: true
width: [0.2, 0.5, 0.3]
---
1,2,3
hello,my,name
```

to latex, I get the following latex code:

\begin{longtable}[]{@{}
  >{\raggedright\arraybackslash}p{(\columnwidth - 4\tabcolsep) * \real{0.2000}}
  >{\raggedright\arraybackslash}p{(\columnwidth - 4\tabcolsep) * \real{0.5000}}
  >{\raggedright\arraybackslash}p{(\columnwidth - 4\tabcolsep) * \real{0.3000}}@{}}
\toprule
\begin{minipage}[b]{\linewidth}\raggedright
1
\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright
2
\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright
3
\end{minipage} \\
\midrule
\endhead
hello & my & name \\
\bottomrule
\end{longtable}

When I try to use pandoc to convert this to a docx file, the table does not get parsed (and the header is cut off).

This definitely has to do with the width: [0.2, 0.5, 0.3]. Is this an issue with pantable or with pandoc?

image link in a table cell makes error at pdf output

OSX Sierra(other OSes are not tested yet)
pandoc 1.19.2.1
- -t latex -o pdf.pdf option
pantable 0.10.4/0.10.5

Description

When adding image(s) in a table cell, whether from an external file or a code block, pandoc raises
an error and fails to output the PDF.

$ pandoc markdown.md -t latex --filter=pantable -o pdf.pdf
! Misplaced \noalign.
\caption ->\noalign
                    \bgroup \@ifnextchar [{\egroup \LT@c@ption \@firstofone ...
l.132 \caption
pandoc: Error producing PDF

Tested if latex output differs by filtering: pantable generated source is not identical to inline table

$ pandoc markdown.md -t latex --latex-engine=xelatex --filter=pantable -o latex.tex
$ pandoc markdown.md -t latex --latex-engine=xelatex -o latex2.tex

$ diff latex.tex latex2.tex
...
> \begin{longtable}[]{@{}ll@{}}
> \caption{CAPTION}\tabularnewline
> \toprule
> \begin{minipage}[b]{0.24\columnwidth}\raggedright\strut
> image\strut
> \end{minipage} & \begin{minipage}[b]{0.71\columnwidth}\raggedright\strut
> \begin{figure}
> \centering
> \includegraphics{./png.png}
> \caption{caption}
> \end{figure}
> \strut
> \end{minipage}\tabularnewline
...

From the diff the filtered latex contains \begin{figure}...\end{figure} block while inline does not.
Can you confirm if this is a bug in pantable/panflute or pandoc itself?

Source files

`png.png`

`csv.csv`

image,![caption](./png.png)
text,this is a text
,foo
bar,

`markdown.md`

# include image by standard way

![caption](./png.png)

# include image in inline grid table

Table: CAPTION

+-------+-----------------------+
| image | ![caption](./png.png) |
+=======+=======================+
| text  | this is a text        |
+-------+-----------------------+
|       | foo                   |
+-------+-----------------------+
| bar   |                       |
+-------+-----------------------+

# include image in grid table generated by pantable

```table
---
# table-width:
header: True
markdown: True
caption: CAPTION
include: csv.csv
---
```

# source files

- `markdown.md`
- `csv.csv`
- `png.png`

# commands

1. ```pandoc markdown.md -t html                         --filter=pantable -o html.html```
1. ```pandoc markdown.md -t latex --latex-engine=xelatex --filter=pantable -o latex.tex```
1. ```pandoc markdown.md -t latex --latex-engine=xelatex                   -o latex2.tex```
1. ```pandoc markdown.md -t latex --latex-engine=xelatex --filter=pantable -o pdf.pdf```
1. ```pandoc markdown.md -t latex --latex-engine=xelatex                   -o pdf.pdf```

Correct handling of Unicode in py2

I'm not a py3 coder, so this is regarding py2.

When trying to include a .csv file which is utf-8 encoded, the filter fails miserably as panflute expects unicode data (it calls text.encode('utf-8') on line 338 of panflute/tools.py)

You are using the built in csv module, which clearly states in the docs, that "The csv module doesn’t directly support reading and writing Unicode [...]". Aka, you should convert the read data to unicode before returning it as raw_table_list in read_data() at line 197 of pantable.py.

Or perhaps preferably, just use the unicodecsv module, which claims to be a "drop-in replacement for Python 2’s csv module which supports unicode strings without a hassle."

I tested unicodecsv in a local checkout, and it seems to work flawlessly. I just added import unicodecsv as csv, not caring about anything than py2 :)

Test dependencies appear to be bogus

Most of the Python packages in test/requirements.txt do not seem to be used anywhere.

(Noticed while trying to fix up Arch Linux packaging.)

Pantable filters broken after Pandoc upgrade

I recently upgraded a number of production machines from Pandoc 2.9.2 to 2.10.1. I thought all was well, but it turns out all my jobs that include Pantable at any point are failing. Here is a MWE that illustrates my usage:

# head

```table
---
alignment: RL
width: [0.5, 0.5]
---
"foo","bar"
```

$ pandoc -F pantable -t markdown+pipe_tables-multiline_tables-grid_tables-raw_html < input.md
Error running filter pantable:
Error in $.blocks[1].c: cannot unpack array of length 5 into a tuple of length 6

Although I should note that this can be simplified further: just outputting to -t html throws the same error, and the YAML table properties don't seem to matter.

Add compatibility with Pandoc 2.18

It would be nice if you could make this work with Pandoc 2.18 (see releases).

Currently, at least conda / mamba complain:

Encountered problems while solving:
  - package pantable-0.14.2-pyhd8ed1ab_0 has constraint pandoc >=2.11.2,<2.17 conflicting with pandoc-2.18-h694c41f_0

Thanks for your effort!

Unable to install

When I attempt to run this after a fresh install, I get the following error:

~ $ pandoc -F pantable -o test simple.csv
/Library/Python/2.7/site-packages/panflute/autofilter.py:163: Warning: Click detected the use of the unicode_literals __future__ import. This is heavily discouraged because it can introduce subtle bugs in your code. You should instead use explicit u"" literals for your unicode strings. For more information see https://click.palletsprojects.com/python3/

Am I doing something wrong?

Relicensing to BSD 3-clause license

Hi, @reenberg, @alerque, @gepcel,

I plan to relicense this repository to the BSD 3-clause license, similar to the one used by panflute, the main dependency of this project. pandocfilters by jgm, which inspired panflute, uses the same license.

Note that pandoc itself is GPLv2+, so for the whole stack to work it would still be GPLv2+. The only thing relaxed here would be from GPLv3 to GPLv2+.

You have contributed to this repository. So I'm asking you explicit permissions to do so.

P.S. I am not a lawyer. Feel free to inform me better.

Filter out single and double quotes from CSV

Sometimes, it is unavoidable that CSV files contain single or double quotes for string values.

Pantable seems to include them in the resulting tables, which does not look very nice:

prefix, city_or_region, comments, Status
'030', 'Berlin', 'My comment', True
'069', 'Frankfurt', , False
'089', 'Munich', 'Another comment', True

It would be nice to be able to tell Pantable to remove single and double quotes from field values.
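For what it's worth, Python's csv module can already treat single quotes as the quoting character and skip the space after each comma. The snippet below shows this on the rows above using only the standard library. If csv-kwargs forwards these options to csv.reader as the README describes, setting quotechar and skipinitialspace there might achieve the same result, but that has not been verified here.

```python
import csv

lines = [
    "prefix, city_or_region, comments, Status",
    "'030', 'Berlin', 'My comment', True",
    "'069', 'Frankfurt', , False",
]

rows = list(csv.reader(lines, quotechar="'", skipinitialspace=True))
for row in rows:
    print(row)
# ['prefix', 'city_or_region', 'comments', 'Status']
# ['030', 'Berlin', 'My comment', 'True']
# ['069', 'Frankfurt', '', 'False']
```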

Improve autowidth on Unicode character

Currently, autowidth only calculates width by the number of characters.

I recall @jgm does something smart in pandoc to account for the width of unicode character. e.g. Chinese character is considered twice as long as an ASCII.

Considerations:

what if it is some other unicode character?
do we need to match pandoc's behavior exactly? Note that this should not affect idempotency (because once the numerical values of the widths are set, it's done).
is there a better width prediction algorithm?
what does this mean to performance?
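One possible starting point, sketched with the standard library only: count East Asian wide and fullwidth characters as two columns and everything else as one. This is an assumption about what "smart" could mean, not a description of what pandoc does, and `display_width` is a hypothetical helper name.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate terminal columns: 2 for wide/fullwidth characters, else 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


print(len("表格"), display_width("表格"))    # 2 4
print(len("table"), display_width("table"))  # 5 5
```

Combining marks and zero-width characters are still counted as one column each here, so this only addresses the CJK case raised above.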

Filtering Subcells of CSV

See #16

I've seen such feature request for other pandoc filters dealing with csv before, but I didn't give enough time to think about a good syntax for it (and since I personally don't need this feature yet, I didn't give it a priority). And from the example given, the syntax is not very intuitive.

Although pantable is a 3rd party pandoc filter rather than native pandoc syntax (because @jgm doesn't think the CSV format is markdown-ish enough, i.e. readable as plain text), I still want it to be as markdown-ish as possible.

incorrect path when --extract-media is used in conjunction with

When --extract-media is used in conjunction with this filter, the resulting path will point to only media/..., ignoring the specified parent directory.

A semi-reproducible example would be to create a docx file with an image in a table (TODO: create an example file here), and then run

# makefile syntax
%.md: %.docx
    pandoc -s -o $@ $< --extract-media=$(@D)/$* --id-prefix=$* -F pantable2csv

P.S. It might as well be an upstream bug.

support pandoc 2.15–16

Requires panflute 2.1.3+ to support pandoc 2.16. See sergiocorreia/panflute#201

update pyproject.toml to require panflute = "^2.1.3"
bump to 0.14.1
update conda-forge

TravisCI fails for Py3.3

With the introduced dependency on setuptools>=20.6.8 the travis build for python 3.3 fails, because the travis python 3.3 virtual environment doesn't contain a new enough setuptools:

[...]
    AssertionError: Setuptools version 20.6.8 or heigher is required.  Updated it using `pip install -U setuptools`.
[...]

Clarify usage of parameters when using CSV from file

It seems pantable understands parameters like caption, alignment, etc. in combination with CSV from a file only if they are in the markup or in the file, not in the element meta-data.

As putting them into the file will make the file unusable for most other tools, this is a bit unfortunate.

Approach 3, i.e. putting the meta-data including the include parameter into the code block is actually fine, but it should be documented more clearly if possible. Thanks!

This works: Approach 1

```{.table include="data/csv_example_with_header.csv"}
```

File: csv_example_with_header.csv

---
caption: '*Awesome* **Markdown** Table from `pantable`'
alignment: RC
table-width: 2/3
markdown: True
---
prefix, city_or_region, comments, Status
'030', 'Berlin', 'My comment', True
'069', 'Frankfurt', , False
'089', 'Munich', 'Another comment', True

This does not work (caption etc. missing): Approach 2

```{.table include="data/csv_example.csv" alignment="RC" caption="A cool CSV Table"}
```

This also works: Approach 3

```table
---
caption: '*Awesome* **Markdown** Table from `pantable`'
alignment: RC
table-width: 2/3
markdown: True
include: "data/csv_example.csv"
---
```

File: csv_example.csv

prefix, city_or_region, comments, Status
'030', 'Berlin', 'My comment', True
'069', 'Frankfurt', , False
'089', 'Munich', 'Another comment', True

Environment:

pandoc                    2.17.0.1             
pandoc-crossref           0.3.12.2           
pantable                  0.14.2

Support pandoc 2.14

Currently, the tests are failing. I guess the output is slightly different.

Need to think about how to target different versions of pandoc where the outputs can be different.
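
One possible approach, sketched under assumptions: keep one set of reference outputs per pandoc release where the output changed, and let the tests pick the newest set that is not newer than the installed pandoc. The function names and the idea of keying reference sets by version tuple are hypothetical; the only fact relied on is that the first line of `pandoc --version` has the form `pandoc 2.14.0.1`.

```python
import re


def parse_pandoc_version(version_output):
    """Extract the version as a tuple of ints from `pandoc --version` output."""
    match = re.search(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError("could not find a pandoc version number")
    return tuple(int(part) for part in match.group(1).split("."))


def pick_reference(version, available):
    """Return the newest reference version that is not newer than `version`."""
    eligible = [v for v in available if v <= version]
    if not eligible:
        raise LookupError("no reference output for pandoc %r" % (version,))
    return max(eligible)
```

For example, with reference sets for `(2, 11)` and `(2, 14)`, a test run against pandoc 2.13 would compare with the 2.11 outputs.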

Inserting midrules

Hi, I'm touching up my thesis and could use some midrules in my tables.
I found that simply inserting them in standard markdown tables works fine, but not with pantable.

example:

|1 | 1|1 |
|--|--|--|
|1 | 1|1 |
|1 | 1|1 |
|\midrule 1 | 1|1 |
|1 | 1|1 |


```table
---
markdown: false
---
1,1,1,1,1
1,1,1,1,1
1,1,1,1,1
\midrule 1,1,1,1,1
1,1,1,1,1
1,1,1,1,1
```

```table
---
markdown: true
---
1,1,1,1,1
1,1,1,1,1
1,1,1,1,1
\midrule 1,1,1,1,1
1,1,1,1,1
1,1,1,1,1
```

command:

pandoc test.md -F pantable -t latex

output:

\begin{longtable}[]{@{}lll@{}}
\toprule
1 & 1 & 1 \\
\midrule
\endhead
1 & 1 & 1 \\
1 & 1 & 1 \\
\midrule 1 & 1 & 1 \\
1 & 1 & 1 \\
\bottomrule
\end{longtable}

\begin{longtable}[]{@{}lllll@{}}
\toprule
1 & 1 & 1 & 1 & 1 \\
\midrule
\endhead
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 \\
\textbackslash midrule 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 \\
\bottomrule
\end{longtable}

\begin{longtable}[]{@{}
  >{\raggedright\arraybackslash}p{(\columnwidth - 8\tabcolsep) * \real{0.20}}
  >{\raggedright\arraybackslash}p{(\columnwidth - 8\tabcolsep) * \real{0.20}}
  >{\raggedright\arraybackslash}p{(\columnwidth - 8\tabcolsep) * \real{0.20}}
  >{\raggedright\arraybackslash}p{(\columnwidth - 8\tabcolsep) * \real{0.20}}
  >{\raggedright\arraybackslash}p{(\columnwidth - 8\tabcolsep) * \real{0.20}}@{}}
\toprule
1 & 1 & 1 & 1 & 1 \\
\midrule
\endhead
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 \\
\begin{minipage}[t]{\linewidth}\raggedright
\midrule 1
\end{minipage} & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 \\
\bottomrule
\end{longtable}

My use-case is markdown: true. The minipage block is likely added here.
Is it possible to add a fix or workaround for this?

setup.py: problem with conditionally installing backports.csv; upgrade setuptools

Dump of the error for further investigation.

Note that when installing in a conda Python 3 environment, backports.csv is still installed.

Also note the final error about setuptools:

Collecting panflute (from -r common/pip.txt (line 1))
  Using cached panflute-1.10.5-py3-none-any.whl
Collecting pantable (from -r common/pip.txt (line 2))
  Downloading pantable-0.11-py3-none-any.whl
Collecting yaml2cli (from -r common/pip.txt (line 3))
  Using cached yaml2cli-0.5.1-py2.py3-none-any.whl
Collecting quaternionarray (from -r common/pip.txt (line 4))
Collecting pykg-config (from -r common/pip.txt (line 5))
Collecting future (from panflute->-r common/pip.txt (line 1))
Requirement already up-to-date: pyyaml in /usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages (from panflute->-r common/pip.txt (line 1))
Collecting shutilwhich (from panflute->-r common/pip.txt (line 1))
Collecting setuptools>=20.6.8 (from pantable->-r common/pip.txt (line 2))
  Downloading setuptools-36.2.7-py2.py3-none-any.whl (477kB)
    100% |████████████████████████████████| 481kB 1.5MB/s 
Collecting backports.csv (from pantable->-r common/pip.txt (line 2))
  Downloading backports.csv-1.0.5-py2.py3-none-any.whl
Collecting yamlordereddictloader (from yaml2cli->-r common/pip.txt (line 3))
  Using cached yamlordereddictloader-0.4.0.tar.gz
Building wheels for collected packages: yamlordereddictloader
  Running setup.py bdist_wheel for yamlordereddictloader ... done
  Stored in directory: ~/Library/Caches/pip/wheels/92/30/01/9a9fc94901b1de7c87e1779db660b84b37e2c411852ab172bd
Successfully built yamlordereddictloader
Installing collected packages: future, shutilwhich, panflute, setuptools, backports.csv, pantable, yamlordereddictloader, yaml2cli, quaternionarray, pykg-config
  Found existing installation: setuptools 27.2.0
    Uninstalling setuptools-27.2.0:
      Successfully uninstalled setuptools-27.2.0
Successfully installed backports.csv-1.0.5 future-0.16.0 panflute-1.10.5 pantable-0.11 pykg-config-1.3.0 quaternionarray-0.6.2 setuptools-36.2.7 shutilwhich-1.1.0 yaml2cli-0.5.1 yamlordereddictloader-0.4.0
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/all3-defaults/bin/pip", line 6, in <module>
    sys.exit(pip.main())
  File "/usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages/pip/__init__.py", line 249, in main
    return command.main(cmd_args)
  File "/usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages/pip/basecommand.py", line 252, in main
    pip_version_check(session)
  File "/usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages/pip/utils/outdated.py", line 102, in pip_version_check
    installed_version = get_installed_version("pip")
  File "/usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages/pip/utils/__init__.py", line 838, in get_installed_version
    working_set = pkg_resources.WorkingSet()
  File "/usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 644, in __init__
    self.add_entry(entry)
  File "/usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 700, in add_entry
    for dist in find_distributions(entry, True):
  File "/usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 1949, in find_eggs_in_zip
    if metadata.has_metadata('PKG-INFO'):
  File "/usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 1463, in has_metadata
    return self.egg_info and self._has(self._fn(self.egg_info, name))
  File "/usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 1823, in _has
    return zip_path in self.zipinfo or zip_path in self._index()
  File "/usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 1703, in zipinfo
    return self._zip_manifests.load(self.loader.archive)
  File "/usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 1643, in load
    mtime = os.stat(path).st_mtime
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/anaconda3/envs/all3-defaults/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg'
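
A common way to make a dependency conditional is a PEP 508 environment marker in `install_requires`, which is evaluated on the installing machine rather than when the wheel is built. The fragment below is a sketch, not pantable's actual setup.py, and the requirement list is assumed from the log above.

```python
# Sketch of a setup.py fragment; the list is an assumption for illustration.
INSTALL_REQUIRES = [
    "panflute",
    # Environment marker: this requirement applies only on Python 2.
    'backports.csv; python_version < "3"',
]

# In setup.py this would be passed as:
#     setup(..., install_requires=INSTALL_REQUIRES)
```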

Refactoring

Currently, pantable and pantable2csv are provided. But the naming is not ideal (what if more formats are added, as in #8?).

One option is to let pantable accept args, but it is not easy: https://groups.google.com/d/msg/pandoc-discuss/LIAfgkZKUiE/HWr_1k13EgAJ

Another problem is that if I want it to be a "simple filter" in the pandocpm sense (pandoc-extras/pandocpm#7), then it has to be a single self-contained filter like pantable and pantable2csv currently are.

It also depends on #8: if other formats are supported such that an extra dependency is needed, then there's no point in keeping it all in one file, since it is not self-contained anyway.

On one hand, I want to wait and see whether filter args will be supported in pandoc first; on the other hand, the later I make such a change, the bigger the backward-compatibility problem becomes.

Add other formats?

Currently, pantable supports conversion to and from CSV.

Potentially, other table formats could be supported:

.xlsx: this one would be useful but difficult:
- Ideally, one would want to read/write .xlsx just like how pandoc reads/writes .docx. But there seems to be no good CLI to convert between .docx and .xlsx (to pass the .docx to pandoc).
- So one might allow "markdown syntax" in .xlsx, which might work just like the current .csv but seems counterintuitive (people expect rich formatting in Excel).
- Or find a tool to convert .xlsx to .html and pass the .html to pandoc (quite lossy though).
- And then there's the question of what to do if people want to intermix rich text and markdown syntax (e.g. for LaTeX equations).
HTML: if someone finds writing the source table in HTML easier, say when it is a big table, but wants to output to other formats as well
YAML (a table representing general YAML is not ideal; I'm thinking more of 2-column tables showing the key-value pairs)
JSON (similar to above)

Travis CI not working

Travis CI won't even start building. (Related to the .org to .com migration? But it shouldn't be mandatory.) One moment it was still running, and then it suddenly stopped.

Considering moving to Circle CI. Travis CI has been giving me problems lately.

Currently the package is tested locally using Python 3.6 and pandoc 2.7.2.

Allowing caller to specify the csv dialect

Thanks a lot for this good development of pantable. Currently, it supports the default format of csv (essentially commas as separator).

It would be good if it also allowed different csv formats (e.g. semicolons as separator).

I would like to define something along these lines in pantable.py:

dialect = csv.excel
dialect.delimiter = ';'

def read_data(include, data, delimiter, dialect):
    ...
    raw_table_list = list(csv.reader(file, delimiter=delimiter, dialect=dialect))

As a suggestion, there could be two complementary approaches:

command-line: allow the caller to specify the dialect and delimiter directly from bash, as environment variables, e.g. export PANTABLE_DELIMITER=';' ; export PANTABLE_DIALECT=excel (my preferred one, since "Excel with ;" is a very workable default)
as parameters: specify the delimiter and dialect in the YAML header of the markdown file.

What would you think about this?
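
The environment-variable variant could look roughly like the sketch below. `PANTABLE_DELIMITER` and `PANTABLE_DIALECT` are the names proposed above, not options pantable is known to read, and `read_csv_with_env` is a hypothetical helper. Passing the dialect by name and the delimiter as a keyword avoids mutating the shared `csv.excel` class.

```python
import csv
import io
import os


def read_csv_with_env(text):
    """Read CSV text using a dialect and delimiter taken from the environment."""
    # Any name registered with the csv module works, e.g. "excel" or "excel-tab".
    dialect = os.environ.get("PANTABLE_DIALECT", "excel")
    overrides = {}
    delimiter = os.environ.get("PANTABLE_DELIMITER")
    if delimiter:
        # Keyword formatting parameters override the named dialect's values.
        overrides["delimiter"] = delimiter
    return list(csv.reader(io.StringIO(text), dialect=dialect, **overrides))
```

The same two values could just as well be read from the YAML header, with the environment acting as a fallback.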

Doesn't like non-ascii characters

Hi,

I've got a csv file that works fine with csv2table, but for which pantable throws this error:

pantable:
table rows are of irregular length. Empty cells appended.
Traceback (most recent call last):
  File "/usr/local/bin/pantable", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/pantable/pantable.py", line 320, in main
    strict_yaml=True
  File "/usr/local/lib/python2.7/site-packages/panflute/io.py", line 265, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/panflute/io.py", line 246, in run_filters
    doc = doc.walk(action, doc)
  File "/usr/local/lib/python2.7/site-packages/panflute/base.py", line 274, in walk
    ans = list(chain.from_iterable(ans))
  File "/usr/local/lib/python2.7/site-packages/panflute/base.py", line 272, in <genexpr>
    ans = ((item,) if type(item) != list else item for item in ans)
  File "/usr/local/lib/python2.7/site-packages/panflute/base.py", line 269, in <genexpr>
    ans = (item.walk(action, doc) for item in obj)
  File "/usr/local/lib/python2.7/site-packages/panflute/base.py", line 285, in walk
    altered = action(self, doc)
  File "/usr/local/lib/python2.7/site-packages/panflute/tools.py", line 164, in yaml_filter
    element=element, doc=doc)
  File "/usr/local/lib/python2.7/site-packages/pantable/pantable.py", line 279, in convert2table
    options), number_of_columns, table_list)
  File "/usr/local/lib/python2.7/site-packages/pantable/pantable.py", line 135, in auto_width
    ) for column_index in range(number_of_columns)]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 23: ordinal not in range(128)
pandoc: Error running filter pantable
Filter returned error status 1

Any idea how to fix it?

Cheers,
Lyndon

Problems with encoding for column width calculation

The following table could not be compiled:

---
caption: 'Áreas ...'
alignment: LLL
table-width: 2/3
markdown: True
---
Código,Áreas,N.Horas
DS, Discrete Structures, 43

Solution: force UTF-8 encoding by adding the following lines at line 60 of pantable.py:
reload(sys)
sys.setdefaultencoding('utf8')
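
An alternative that does not change the interpreter-wide default is to decode the input explicitly where it is read. The sketch below is an assumption about how that could look, not the fix that was applied in pantable; `read_text_utf8` and the file name are hypothetical. `io.open` accepts `encoding` on both Python 2 and Python 3.

```python
import io
import os
import tempfile


def read_text_utf8(path):
    """Read a file as text, decoding it explicitly as UTF-8."""
    with io.open(path, encoding="utf-8") as f:
        return f.read()


# Demo: round-trip the header row from the table above.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "areas.csv")
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(u"Código,Áreas,N.Horas\n")
    header = read_text_utf8(path).strip().split(",")
```

Once the cells are decoded text rather than bytes, `len()` counts characters, which is what a width calculation would want.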

Change default alignment?

Hi there!

First off: great work with this filter. I love it!

I was wondering if there is any way to change the default alignment of the columns. I have many columns and I would like all of them to be right-aligned. By default, all of them are left-aligned. I could just write a lot of Rs into the YAML header, but I thought there must be a more elegant way. Can anyone help?
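
Independent of whether pantable offers a shorter form, the alignment string can be generated instead of typed by hand. This is a workaround sketch only; `uniform_alignment` is a hypothetical helper that returns one letter per column of the first CSV row.

```python
import csv
import io


def uniform_alignment(csv_text, letter="R"):
    """Return an alignment string with one `letter` per column of the first row."""
    first_row = next(csv.reader(io.StringIO(csv_text)))
    return letter * len(first_row)
```

The result can then be pasted into the `alignment` field of the YAML header.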

Supporting pandoc 2.11

See sergiocorreia/panflute#142

Edit:

Warning: after studying the new AST, the decision is to completely rewrite pantable. I'd advise against PRs for now, as the current code base is going to be obsolete soon.

Links on new table AST

PR
- Feature request started here: jgm/pandoc#1024, jgm/pandoc-types#65
- this is where the actual change of the AST happened: https://github.com/jgm/pandoc-types/pull/66/files?file-filters%5B%5D=.hs#diff-01f4ffe52cf097ab9ff89eebce394c86
- pandoc 2.10 release note: https://github.com/jgm/pandoc/releases/tag/2.10
- pandoc 2.11 release note (see HTML writer): https://github.com/jgm/pandoc/releases/tag/2.11
- pandoc-types 1.21 changelog: https://github.com/jgm/pandoc-types/blob/master/changelog
- panflute 2.0: sergiocorreia/panflute#156, sergiocorreia/panflute@243af31
def
- https://hackage.haskell.org/package/pandoc-types-1.22/docs/Text-Pandoc-Definition.html
- source: https://github.com/jgm/pandoc-types/blob/master/src/Text/Pandoc/Arbitrary.hs
- doc on lua filters has some info on this: https://pandoc.org/lua-filters.html
- See comments at https://github.com/jgm/pandoc-types/pull/66#issuecomment-611053332 and after on normalizing tables
example ASTs:
- https://github.com/jgm/pandoc-types/blob/master/test/test-pandoc-types.hs
- https://github.com/jgm/pandoc/tree/master/test/tables

pandoc reader/writer support status

2.10: add new table AST

2.10.1: add LaTeX reader support: rowspan, colspan

2.11:

DocBook reader: column span
HTML writer
OpenDocument writer
