aaren / pandoc-reference-filter

internal referencing pandoc filter

Python 90.55% HTML 8.25% Shell 1.20%

pandoc-reference-filter's Issues

use \ref instead of \autoref

\autoref is useful but magical. It makes the referencing system more complex.

Use \ref style by default (i.e. see figure #fig:a -> see figure 1, rather than see #fig:a -> see Figure 1).

Keep \autoref available by a metadata switch.
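
A minimal sketch of how such a switch might look, assuming a boolean metadata key named `autoref` (a hypothetical name, not one the filter is known to use). The helper names are also illustrative only.

```python
def wants_autoref(metadata, key="autoref"):
    """Return True if the (hypothetical) metadata switch is set.

    Pandoc serialises a boolean metadata value as
    {"t": "MetaBool", "c": true} in its JSON AST.
    """
    entry = metadata.get(key)
    if isinstance(entry, dict) and entry.get("t") == "MetaBool":
        return bool(entry.get("c"))
    return False


def latex_reference(label, use_autoref=False):
    """Build a LaTeX reference, using \\ref unless \\autoref is requested."""
    command = "autoref" if use_autoref else "ref"
    return "\\{}{{{}}}".format(command, label)
```

With the switch off, "see figure #fig:a" would become "see figure \ref{fig:a}", leaving the word "figure" to the author.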

multi document cross references

Use case: writing a thesis with multiple chapters.

When creating a pdf, latex takes care of all internal referencing.
Multiple source documents are concatenated into a single continuous
markdown and then injected into the thesis latex template. When latex
comes to manage the internal references it is dealing with a single
(long) file.

Html output presents a problem: either we concatenate all of the
documents into a single html page or we have internal referencing
confined to each page. This is not what we want: it is preferable to
have shorter pages on the web, covering a single distinct topic.

What is needed is a mechanism to allow cross referencing with multiple
source documents. I am going to present two possible solutions here:

1. Multiple passes with a persistent file containing the references.
2. A single pass with a directory of input files contained in the metadata.
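
The first option (multiple passes with a persistent references file) could be sketched roughly as follows. This is only an illustration under assumptions: the JSON file layout, the `document` field, and the function names are all hypothetical and not part of the filter.

```python
import json
import os


def update_reference_store(path, document, references):
    """Merge one document's references into a JSON store on disk.

    `references` maps a label to a dict such as
    {"type": "figure", "id": 1}; the output document name is
    recorded alongside so that other pages can link to it.
    """
    store = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            store = json.load(handle)
    for label, ref in references.items():
        store[label] = dict(ref, document=document)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(store, handle, indent=2, sort_keys=True)
    return store


def link_target(store, label, current_document):
    """Return an href for `label`, relative to the current page."""
    ref = store[label]
    if ref["document"] == current_document:
        return "#" + label
    return ref["document"] + "#" + label
```

A first pass over every chapter would fill the store; a second pass would then resolve each reference with `link_target`, so a label defined in another chapter links to that chapter's page.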

Pandoc 1.16 breaks pandoc-reference-filter

The new version of pandoc (1.16) adds in attributes to images and links. This breaks pandoc-reference-filter (as well as pandocfilters).

I've fixed things in my fork of pandoc-reference-filter, though my fix presupposes all the other changes I've made to pandoc-reference-filter (and proposed to you via pull requests), as well as the obvious fix to pandocfilters (updating it so that Image and Link take 3 rather than 2 arguments, with the extra argument being the attributes). It is also worth noting that these changes require pandoc-1.16.

I'm not sure where you stand on those pull requests, but if you didn't want them, you could still look at how I fix things for pandoc-1.16 at bwhelm@2a95814; those changes would be relatively easy to adapt to the current master branch.
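
For illustration, a filter that has to cope with both AST shapes could normalise the element's contents before using them. This is a hedged sketch, not the fix in the fork: `unpack_image` is a hypothetical helper, and it assumes the attribute list comes first in the 1.16 JSON, as it does for other pandoc elements that carry attributes.

```python
def unpack_image(value):
    """Return (attr, caption, target) for an Image or Link value.

    Before pandoc 1.16 the JSON contents are [inlines, target];
    from 1.16 they are [attr, inlines, target]. Older input gets
    an empty attribute triple: [identifier, classes, key-values].
    """
    if len(value) == 3:
        attr, caption, target = value
    else:
        caption, target = value
        attr = ["", [], []]
    return attr, caption, target
```

The rest of the filter can then work with one shape regardless of the pandoc version that produced the JSON.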

your link from pypi to this site is incorrect

Allow for formatting in captions

Currently pandoc-reference-filter strips all formatting from captions. This is undesirable, but fixing it requires some significant changes to figure_replacement(), treating LaTeX, html, and markdown all differently.

I have a somewhat hackish proof-of-concept fix for this, which is built on top of (but mostly replaces) my work on adding support for short captions (in a pending pull request), though it should be relatively easy to remove the short caption stuff from this proof-of-concept if that were desirable. I've pasted the patch text below. (If you want, I can create a pull request, but I'm not sure this is ready. It's in the Format-Captions branch of my fork if you want to look there.)

Currently .docx captions aren't showing up at all (though the image is), for reasons I can't figure out. Any help with that would be appreciated. Also, it's not well tested, so expect bugs.

The question is: is this wanted? If so, is this the right approach, or does anyone have better ideas?

 internalreferences.py | 165 ++++++++++++++++++++++++++++----------------------
 1 file changed, 94 insertions(+), 71 deletions(-)

diff --git a/internalreferences.py b/internalreferences.py
index 5e85073..63f3a3f 100755
--- a/internalreferences.py
+++ b/internalreferences.py
@@ -1,6 +1,7 @@
 #!/usr/bin/env python
 import re
 from collections import OrderedDict
+from subprocess import Popen, PIPE

 import pandocfilters as pf

@@ -127,6 +128,76 @@ def create_figures(key, value, format, metadata):
     else:
         return None

+def toFormat(string, format):
+    # Process string through pandoc to get formatted string. Is there a better way?
+    p1 = Popen(['echo'] + string.split(), stdout=PIPE)
+    p2 = Popen(['pandoc', '-t', format], stdin=p1.stdout, stdout=PIPE)
+    p1.stdout.close()
+    return p2.communicate()[0].strip('\n')
+
+def latex_figure(attr, filename, caption, alt):
+    beginText = (u'\n'
+               '\\begin{{figure}}[htbp]\n'
+               '\\centering\n'
+               '\\includegraphics{{{filename}}}\n'.format(
+                                           filename=filename
+                                           ).encode('utf-8'))
+    endText = (u'}}\n'
+               '\\label{{{attr.id}}}\n'
+               '\\end{{figure}}\n'.format(attr=attr))
+    star = False
+    if 'unnumbered' in attr.classes:
+        beginText += '\\caption*{'
+        star = True
+    if alt and not star:
+        shortCaption = toFormat(alt, 'latex')
+        beginText += '\\caption['
+        latexFigure = [RawInline('latex', beginText)]
+        latexFigure += [RawInline('latex', shortCaption + ']{')] 
+    
+    else: # No short caption
+        if star: beginText += '\\caption*{'
+        else: beginText += '\\caption{'
+        latexFigure = [RawInline('latex', beginText + '{')]
+
+    latexFigure += caption
+    latexFigure += [RawInline('latex', endText)]
+    return pf.Para(latexFigure)
+
+def html_figure(attr, filename, fcaption, alt):
+    beginText = (u'\n'
+                  '<div {attr.html}>\n'
+                  '<img src="{filename}" alt="{alt}" />\n'
+                  '<p class="caption">').format(attr=attr,
+                                                filename=filename,
+                                                alt=alt)
+    endText = (u'</p>\n'
+                '</div>\n')
+    htmlFigure = [RawInline('html', beginText)]
+    htmlFigure += fcaption
+    htmlFigure += [RawInline('html', endText)]
+    return pf.Plain(htmlFigure)
+
+def html5_figure(attr, filename, fcaption, alt):
+    beginText = (u'\n'
+                   '<figure {attr.html}>\n'
+                   '<img src="{filename}" alt="{alt}" />\n'
+                   '<figcaption>').format(attr=attr,
+                                          filename=filename,
+                                          alt=alt)
+    endText = u'</figcaption>\n</figure>\n'
+    htmlFigure = [RawInline('html5', beginText)]
+    htmlFigure += fcaption
+    htmlFigure += [RawInline('html5', endText)]
+    return pf.Plain(htmlFigure)
+
+def markdown_figure(attr, filename, fcaption, alt):
+    beginText = u'<div {attr.html}>'.format(attr=attr)
+    endText = u'</div>'
+    markdownFigure = [pf.Para([pf.RawInline('html', beginText)])]
+    markdownFigure += [pf.Para([pf.Image(fcaption, (filename,alt))])]
+    markdownFigure += [pf.Para([pf.RawInline('html', endText)])]
+    return markdownFigure

 class ReferenceManager(object):
     """Internal reference manager.
@@ -139,32 +210,6 @@ class ReferenceManager(object):
     text of any given internal reference (no need for e.g. 'fig:' at
     the start of labels).
     """
-    figure_styles = {
-        'latex': (u'\n'
-                   '\\begin{{figure}}[htbp]\n'
-                   '\\centering\n'
-                   '\\includegraphics{{{filename}}}\n'
-                   '\\caption{star}{{{caption}}}\n'
-                   '\\label{{{attr.id}}}\n'
-                   '\\end{{figure}}\n'),
-
-        'html': (u'\n'
-                  '<div {attr.html}>\n'
-                  '<img src="{filename}" alt="{alt}" />'
-                  '<p class="caption">{fcaption}</p>\n'
-                  '</div>\n'),
-
-        'html5': (u'\n'
-                   '<figure {attr.html}>\n'
-                   '<img src="{filename}" alt="{alt}" />\n'
-                   '<figcaption>{fcaption}</figcaption>\n'
-                   '</figure>\n'),
-
-        'markdown': (u'\n'
-                      '<div {attr.html}>\n'
-                      '![{fcaption}]({filename})\n'
-                      '\n'
-                      '</div>\n')}

     latex_multi_autolink = u'\\cref{{{labels}}}{post}'

@@ -243,7 +288,7 @@ class ReferenceManager(object):
         """If the key, value represents a figure, append reference
         data to internal state.
         """
-        _caption, (filename, target), (id, classes, kvs) = value
+        _caption, (filename, alt), (id, classes, kvs) = value
         if 'unnumbered' in classes:
             return
         else:
@@ -278,7 +323,7 @@ class ReferenceManager(object):
         self.references[label] = {'type': 'math',
                                   'id': self.equation_count,
                                   'label': label}
-
+        
     def figure_replacement(self, key, value, format, metadata):
         """Replace figures with appropriate representation.

@@ -288,58 +333,36 @@ class ReferenceManager(object):
         The other way of doing it would be to pull out a '\label{(.*)}'
         from the caption of an Image and use that to update the references.
         """
-        _caption, (filename, target), attrs = value
-#        caption = pf.stringify(_caption)
-        caption = _caption
+        caption, (filename, alt), attrs = value
+        if format == 'latex': alt = toFormat(str(alt), format)  # Preserve formatting
+#        else: alt = pf.stringify(alt)

         attr = PandocAttributes(attrs)

         if 'unnumbered' in attr.classes:
-            star = '*'
             fcaption = caption
         else:
             ref = self.references[attr.id]
             star = ''
             if caption:
-                fcaption = u'Figure {n}: {caption}'.format(n=ref['id'],
-                                                           caption=caption)
+                fcaption = [pf.Str(u'Figure {n}: '.format(n=ref['id']))] + caption
             else:
-                fcaption = u'Figure {n}'.format(n=ref['id'])
+                fcaption = [pf.Str(u'Figure {n}'.format(n=ref['id']))]

         if 'figure' not in attr.classes:
             attr.classes.insert(0, 'figure')
-
-        if format in self.formats:
-#            figure = self.figure_styles[format].format(attr=attr,
-#                                                       filename=filename,
-#                                                       alt=fcaption,
-#                                                       fcaption=fcaption,
-#                                                       caption=caption,
-#                                                       star=star).encode('utf-8')
-
-#            return RawBlock(format, figure)
-            beginText = (u'\n'
-                   '\\begin{{figure}}[htbp]\n'
-                   '\\centering\n'
-                   '\\includegraphics{{{filename}}}\n'
-                   '\\caption{star}{{'.format(filename=filename,
-                                               star=star).encode('utf-8'))
-            endText = (u'}}\n'
-                   '\\label{{{attr.id}}}\n'
-                   '\\end{{figure}}\n'.format(attr=attr))
-            begin = RawBlock('latex', beginText)
-            end = RawBlock('latex', endText)
-            all = [begin, pf.Str('hello'), end]
-            return [begin] + [pf.Plain(caption)] + [end]
-            # Convert from: {"t":"Figure", "c":[[{"t":"Str","c":"CAPTION"}],["FIGURE.JPG","TITLE"],"{#REFERENCE}"]}
-            # to: {"t": "RawBlock", "c": }
-
+        
+        if format == 'latex': return latex_figure(attr, filename, caption, alt)
+        elif format == 'html': return html_figure(attr, filename, fcaption, alt)
+        elif format == 'html5': return html5_figure(attr, filename, fcaption, alt)
+        elif format == 'markdown': return markdown_figure(attr, filename, fcaption, alt)
         else:
-            alt = [pf.Str(fcaption)]
-            target = (filename, '')
-            image = pf.Image(alt, target)
-            figure = pf.Para([image])
-            return pf.Div(attr.to_pandoc(), [figure])
+#            # FIXME: docx export fails to include the caption!
+#            fcaption = pf.stringify(fcaption)
+#            fcaption = [pf.Str(str(caption))]
+            image = pf.Image(fcaption, [filename, ''])
+            return pf.Plain([image])
+#            return pf.Para([image])

     def section_replacement(self, key, value, format, metadata):
         """Replace sections with appropriate representation.
@@ -406,8 +429,9 @@ class ReferenceManager(object):
         else:
             citation = citations[0]

-        prefix = citation['citationPrefix']
+        prefix = citation['citationPrefix'] + [pf.Space()]
         suffix = citation['citationSuffix']
+        

         label = citation['citationId']

@@ -426,10 +450,9 @@ class ReferenceManager(object):
             link = pf.RawInline('latex', '\\ref{{{label}}}'.format(label=label))
             return prefix + [link] + suffix

-        else: # FIXME! -- This must be the HTML case.
-            link_text = '{}{}{}'.format(prefix, text, suffix)
-            link = pf.Link([pf.Str(link_text)], ('#' + label, ''))
-            return link
+        else:
+            link = pf.Link([pf.Str(text)], ('#' + label, ''))
+            return prefix + [link] + suffix

     def convert_multiref(self, key, value, format, metadata):
         """Convert all internal links from '#blah' into format
-- 
2.2.1
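
On the question in the `toFormat` comment ("Is there a better way?"): one possible alternative is to pass the string to pandoc on standard input directly instead of going through `echo`, which avoids splitting the text on whitespace. The sketch below assumes Python 3.7+ and a `pandoc` executable on the PATH; `to_format` and its `pandoc_cmd` parameter are hypothetical names, not part of the patch.

```python
import subprocess


def to_format(text, fmt, pandoc_cmd=("pandoc",)):
    """Convert a Markdown fragment to `fmt` by piping it through pandoc.

    `pandoc_cmd` is only there so the executable can be swapped out
    (for testing, or for a non-standard install location).
    """
    result = subprocess.run(
        list(pandoc_cmd) + ["-f", "markdown", "-t", fmt],
        input=text,            # sent to pandoc on stdin
        capture_output=True,
        encoding="utf-8",
        check=True,            # raise if pandoc exits with an error
    )
    return result.stdout.rstrip("\n")
```

For example, `to_format("a *short* caption", "latex")` would return whatever LaTeX pandoc produces for that fragment.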

DocBook support

… would be awesome! :)

Thanks!
Bela

Not able to convert and apply filter on spec.md

Hi,

When I try to convert a document with, e.g., `pandoc spec.md --filter internal-references --to latex`, I get the following:

Traceback (most recent call last):
  File "/usr/local/bin/internal-references", line 9, in <module>
    load_entry_point('pandoc-internal-references==0.5.1', 'console_scripts', 'internal-references')()
  File "build/bdist.macosx-10.12-intel/egg/internalreferences.py", line 527, in main
KeyError: 0
pandoc: Error running filter internal-references
Filter returned error status 1

Can this issue be resolved?

ImportError: No module named identity

I have git cloned the project onto my computer and run setup.py install, which seemed to install all the required prerequisites. But I get the above error, specifically from the line:

from identity import *

Clearly I don't have a module called identity, and Googling for it doesn't help either. Where can I find this identity module?

Fails to parse unicode in captions

The function replace_references() throws an error if the figure captions contain any non-ascii characters. This seems to be the case even with quotation marks, dashes and the like, which are converted by the --smart option in pandoc.

  File "build/bdist.linux-i686/egg/internalreferences/internalreferences.py", line 236, in figure_replacement
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 7: ordinal not in range(128)
pandoc: Error running filter internal-references
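
This error is consistent with Unicode caption text being pushed through an implicit ASCII encode under Python 2, for instance when a byte-string template is formatted with a Unicode argument. That is an assumption about the cause, not a confirmed diagnosis. A common way to avoid this class of error is to keep everything as text inside the filter and encode once, at the output boundary. The sketch below uses Python 3 and hypothetical function names.

```python
def figure_caption(number, caption):
    """Build the visible caption as text (str), never as bytes."""
    return "Figure {n}: {caption}".format(n=number, caption=caption)


def to_output_bytes(text):
    """Encode once, explicitly, when the text leaves the program."""
    return text.encode("utf-8")
```

Captions containing characters such as "ü" or typographic quotes then pass through unchanged until the final encode.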

Using filter to convert from Latex to RTF

I’m trying to figure out how to add figure reference numbers to my Pandoc manuscript and came across your pandoc-reference-filter, but I’m not sure it does exactly what I want.

Below is the current workflow I’m using, and then the workflow I think is the correct way.

Current workflow:

Input file is written in latex
Output file should be doc/docx/rtf
Use Pandoc with default settings to convert latex to doc

Images should appear inline and should be numbered (as latex normally does),
e.g. the text should say “see Fig. 1” and then Figure 1 should be numbered “1” along with its caption, and so on.

Problem is that the references are not numbered when I use this workflow. From reading the pandoc forum I figured out that pandoc can’t do this directly… but I’m not exactly sure how to use your filter, since it outputs to latex or markdown but not rtf.

So would this be the suggested workflow?:

Write input file in Latex but include markdown (1) below (which latex should ignore)
-> convert to markdown using your filter, which will generate markdown (2) with the reference numbers
-> convert to doc/rtf

(1)     
Look at #fig:thing.
![some caption](image.png){#fig:thing}

(2)     
Look at [Figure 1](#fig:thing).
<div id='#fig:thing' class='figure'>
![Figure 1: some caption](image.png)
</div>

If I manually add markdown (2) to the markdown file and convert to RTF, it seems to work as expected, so I think this is the way to go. Is this what you would suggest?

One other question: I use Windows and don’t currently have python installed. Is it correct that I only need to install python AND the pandocfilters 1.2.3?

Thanks!

use `@` instead of `#`

Pandoc might end up using @, see jgm/pandoc#813.

This would also be easier to process, as it is the same as the citation syntax.

This would be a breaking change but I could provide a filter to convert from # to @.
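
A very rough sketch of what such a conversion could look like, working on raw Markdown text rather than on the pandoc AST. The regular expression and the function name are illustrative assumptions; it leaves `{#label}` attributes and `(#label)` link targets alone, but it would not cope with every case (for example an identifier that follows a class inside one attribute block).

```python
import re

# A '#' that starts a label in running text: not preceded by a word
# character, '{', '(', '[' or another '#', and followed by a letter.
_REFERENCE = re.compile(r"(?<![\w{(\[#])#(?=[A-Za-z])([\w:-]+)")


def hash_to_at(markdown):
    """Rewrite in-text '#label' references as '@label'."""
    return _REFERENCE.sub(r"@\1", markdown)
```

Applied to "Look at #fig:thing." this yields "Look at @fig:thing.", while the figure definition `![some caption](image.png){#fig:thing}` is left unchanged.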
