aaren / pandoc-reference-filter
internal referencing pandoc filter
\autoref is useful but magical. It makes the referencing system more complex.
Use \ref style by default (i.e. see figure #fig:a -> see figure 1, rather than see #fig:a -> see Figure 1).
Keep \autoref available by a metadata switch.
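The switch between the two styles could look roughly like the sketch below. It is only an illustration: the function name latex_reference and the use_autoref flag are hypothetical, and how the flag would be read from the document metadata is left open.

```python
def latex_reference(label, use_autoref=False):
    """Return the LaTeX command that references `label`.

    `use_autoref` stands in for a metadata switch (hypothetical name):
    False gives the plain \\ref style, True gives \\autoref.
    """
    command = 'autoref' if use_autoref else 'ref'
    # Doubled braces are literal braces in str.format templates.
    return '\\{command}{{{label}}}'.format(command=command, label=label)


if __name__ == '__main__':
    print('see figure ' + latex_reference('fig:a'))
    print('see ' + latex_reference('fig:a', use_autoref=True))
```

With \ref the author writes the word "figure" in the source text; with \autoref LaTeX supplies it, which is the "magical" part.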
Use case: writing a thesis with multiple chapters.
When creating a PDF, LaTeX takes care of all internal referencing.
Multiple source documents are concatenated into a single continuous
markdown file and then injected into the thesis LaTeX template. When LaTeX
comes to manage the internal references it is dealing with a single
(long) file.
HTML output presents a problem: either we concatenate all of the
documents into a single HTML page or we have internal referencing
confined to each page. Neither is what we want: it is preferable to
have shorter pages on the web, each covering a single distinct topic.
What is needed is a mechanism to allow cross referencing with multiple
source documents. I am going to present two possible solutions here:
The new version of pandoc (1.16) adds in attributes to images and links. This breaks pandoc-reference-filter (as well as pandocfilters).
I've fixed things in my fork of pandoc-reference-filter, though my fix presupposes all the other changes I've made to pandoc-reference-filter (and proposed to you via pull requests), as well as the obvious fix to pandocfilters (updating it so that Image and Link take 3 rather than 2 arguments, with the extra argument being the attributes). It is also worth noting that these changes require pandoc-1.16.
I'm not sure where you stand on those pull requests, but if you didn't want them, you could still look at how I fix things for pandoc-1.16 at bwhelm@2a95814; those changes would be relatively easy to adapt to the current master branch.
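For anyone adapting a filter by hand, the difference between the two layouts can be handled roughly as follows. This is a minimal sketch and not the code from the fork: unpack_image is a hypothetical helper, and it assumes the JSON contents of an Image are [caption, [url, title]] before pandoc 1.16 and [attr, caption, [url, title]] from 1.16 on.

```python
def unpack_image(value):
    """Split the contents of an Image element into (attr, caption, target).

    Hypothetical helper. Accepts the two-element layout used before
    pandoc 1.16 and the three-element layout introduced in 1.16.
    """
    if len(value) == 3:
        attr, caption, target = value
    elif len(value) == 2:
        caption, target = value
        attr = ["", [], []]  # empty id, no classes, no key-value pairs
    else:
        raise ValueError("unexpected Image contents: %r" % (value,))
    return attr, caption, target


if __name__ == '__main__':
    old_style = [[{"t": "Str", "c": "caption"}], ["image.png", ""]]
    new_style = [["fig:thing", ["figure"], []]] + old_style
    print(unpack_image(old_style))
    print(unpack_image(new_style))
```

The same pattern would apply to Link, which gained the attribute argument in the same release.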
Currently pandoc-reference-filter strips all formatting from captions. This is undesirable, but fixing it requires some significant changes to figure_replacement(), treating LaTeX, html, and markdown all differently.
I have a somewhat hackish proof-of-concept fix for this, which is built on top of (but mostly replaces) my work on adding support for short captions (in a pending pull request), though it should be relatively easy to remove the short caption stuff from this proof-of-concept if that were desirable. I've pasted the patch text below. (If you want, I can create a pull request, but I'm not sure this is ready. It's in the Format-Captions branch of my fork if you want to look there.)
Currently .docx captions aren't showing up at all (though the image is), for reasons I can't figure out. Any help with that would be appreciated. Also, it's not well tested, so expect bugs.
The question is: is this wanted? If so, is this the right approach, or does anyone have better ideas?
internalreferences.py | 165 ++++++++++++++++++++++++++++----------------------
1 file changed, 94 insertions(+), 71 deletions(-)
diff --git a/internalreferences.py b/internalreferences.py
index 5e85073..63f3a3f 100755
--- a/internalreferences.py
+++ b/internalreferences.py
@@ -1,6 +1,7 @@
 #!/usr/bin/env python
 import re
 from collections import OrderedDict
+from subprocess import Popen, PIPE
 import pandocfilters as pf
@@ -127,6 +128,76 @@ def create_figures(key, value, format, metadata):
     else:
         return None
+def toFormat(string, format):
+    # Process string through pandoc to get formatted string. Is there a better way?
+    p1 = Popen(['echo'] + string.split(), stdout=PIPE)
+    p2 = Popen(['pandoc', '-t', format], stdin=p1.stdout, stdout=PIPE)
+    p1.stdout.close()
+    return p2.communicate()[0].strip('\n')
+
+def latex_figure(attr, filename, caption, alt):
+    beginText = (u'\n'
+                 '\\begin{{figure}}[htbp]\n'
+                 '\\centering\n'
+                 '\\includegraphics{{{filename}}}\n'.format(
+                     filename=filename
+                 ).encode('utf-8'))
+    endText = (u'}}\n'
+               '\\label{{{attr.id}}}\n'
+               '\\end{{figure}}\n'.format(attr=attr))
+    star = False
+    if 'unnumbered' in attr.classes:
+        beginText += '\\caption*{'
+        star = True
+    if alt and not star:
+        shortCaption = toFormat(alt, 'latex')
+        beginText += '\\caption['
+        latexFigure = [RawInline('latex', beginText)]
+        latexFigure += [RawInline('latex', shortCaption + ']{')]
+
+    else: # No short caption
+        if star: beginText += '\\caption*{'
+        else: beginText += '\\caption{'
+        latexFigure = [RawInline('latex', beginText + '{')]
+
+    latexFigure += caption
+    latexFigure += [RawInline('latex', endText)]
+    return pf.Para(latexFigure)
+
+def html_figure(attr, filename, fcaption, alt):
+    beginText = (u'\n'
+                 '<div {attr.html}>\n'
+                 '<img src="{filename}" alt="{alt}" />\n'
+                 '<p class="caption">').format(attr=attr,
+                                               filename=filename,
+                                               alt=alt)
+    endText = (u'</p>\n'
+               '</div>\n')
+    htmlFigure = [RawInline('html', beginText)]
+    htmlFigure += fcaption
+    htmlFigure += [RawInline('html', endText)]
+    return pf.Plain(htmlFigure)
+
+def html5_figure(attr, filename, fcaption, alt):
+    beginText = (u'\n'
+                 '<figure {attr.html}>\n'
+                 '<img src="{filename}" alt="{alt}" />\n'
+                 '<figcaption>').format(attr=attr,
+                                        filename=filename,
+                                        alt=alt)
+    endText = u'</figcaption>\n</figure>\n'
+    htmlFigure = [RawInline('html5', beginText)]
+    htmlFigure += fcaption
+    htmlFigure += [RawInline('html5', endText)]
+    return pf.Plain(htmlFigure)
+
+def markdown_figure(attr, filename, fcaption, alt):
+    beginText = u'<div {attr.html}>'.format(attr=attr)
+    endText = u'</div>'
+    markdownFigure = [pf.Para([pf.RawInline('html', beginText)])]
+    markdownFigure += [pf.Para([pf.Image(fcaption, (filename,alt))])]
+    markdownFigure += [pf.Para([pf.RawInline('html', endText)])]
+    return markdownFigure
 class ReferenceManager(object):
     """Internal reference manager.
@@ -139,32 +210,6 @@ class ReferenceManager(object):
     text of any given internal reference (no need for e.g. 'fig:' at
     the start of labels).
     """
-    figure_styles = {
-        'latex': (u'\n'
-                  '\\begin{{figure}}[htbp]\n'
-                  '\\centering\n'
-                  '\\includegraphics{{{filename}}}\n'
-                  '\\caption{star}{{{caption}}}\n'
-                  '\\label{{{attr.id}}}\n'
-                  '\\end{{figure}}\n'),
-
-        'html': (u'\n'
-                 '<div {attr.html}>\n'
-                 '<img src="{filename}" alt="{alt}" />'
-                 '<p class="caption">{fcaption}</p>\n'
-                 '</div>\n'),
-
-        'html5': (u'\n'
-                  '<figure {attr.html}>\n'
-                  '<img src="{filename}" alt="{alt}" />\n'
-                  '<figcaption>{fcaption}</figcaption>\n'
-                  '</figure>\n'),
-
-        'markdown': (u'\n'
-                     '<div {attr.html}>\n'
-                     '![{fcaption}]({filename})\n'
-                     '\n'
-                     '</div>\n')}
     latex_multi_autolink = u'\\cref{{{labels}}}{post}'
@@ -243,7 +288,7 @@ class ReferenceManager(object):
         """If the key, value represents a figure, append reference
         data to internal state.
         """
-        _caption, (filename, target), (id, classes, kvs) = value
+        _caption, (filename, alt), (id, classes, kvs) = value
         if 'unnumbered' in classes:
             return
         else:
@@ -278,7 +323,7 @@ class ReferenceManager(object):
             self.references[label] = {'type': 'math',
                                       'id': self.equation_count,
                                       'label': label}
-
+
     def figure_replacement(self, key, value, format, metadata):
         """Replace figures with appropriate representation.
@@ -288,58 +333,36 @@ class ReferenceManager(object):
         The other way of doing it would be to pull out a '\label{(.*)}'
         from the caption of an Image and use that to update the references.
         """
-        _caption, (filename, target), attrs = value
-#        caption = pf.stringify(_caption)
-        caption = _caption
+        caption, (filename, alt), attrs = value
+        if format == 'latex': alt = toFormat(str(alt), format) # Preserve formatting
+#        else: alt = pf.stringify(alt)
         attr = PandocAttributes(attrs)
         if 'unnumbered' in attr.classes:
-            star = '*'
             fcaption = caption
         else:
             ref = self.references[attr.id]
             star = ''
             if caption:
-                fcaption = u'Figure {n}: {caption}'.format(n=ref['id'],
-                                                           caption=caption)
+                fcaption = [pf.Str(u'Figure {n}: '.format(n=ref['id']))] + caption
             else:
-                fcaption = u'Figure {n}'.format(n=ref['id'])
+                fcaption = [pf.Str(u'Figure {n}'.format(n=ref['id']))]
         if 'figure' not in attr.classes:
             attr.classes.insert(0, 'figure')
-
-        if format in self.formats:
-#            figure = self.figure_styles[format].format(attr=attr,
-#                                                       filename=filename,
-#                                                       alt=fcaption,
-#                                                       fcaption=fcaption,
-#                                                       caption=caption,
-#                                                       star=star).encode('utf-8')
-
-#            return RawBlock(format, figure)
-            beginText = (u'\n'
-                         '\\begin{{figure}}[htbp]\n'
-                         '\\centering\n'
-                         '\\includegraphics{{{filename}}}\n'
-                         '\\caption{star}{{'.format(filename=filename,
-                                                    star=star).encode('utf-8'))
-            endText = (u'}}\n'
-                       '\\label{{{attr.id}}}\n'
-                       '\\end{{figure}}\n'.format(attr=attr))
-            begin = RawBlock('latex', beginText)
-            end = RawBlock('latex', endText)
-            all = [begin, pf.Str('hello'), end]
-            return [begin] + [pf.Plain(caption)] + [end]
-            # Convert from: {"t":"Figure", "c":[[{"t":"Str","c":"CAPTION"}],["FIGURE.JPG","TITLE"],"{#REFERENCE}"]}
-            # to: {"t": "RawBlock", "c": }
-
+
+        if format == 'latex': return latex_figure(attr, filename, caption, alt)
+        elif format == 'html': return html_figure(attr, filename, fcaption, alt)
+        elif format == 'html5': return html5_figure(attr, filename, fcaption, alt)
+        elif format == 'markdown': return markdown_figure(attr, filename, fcaption, alt)
         else:
-            alt = [pf.Str(fcaption)]
-            target = (filename, '')
-            image = pf.Image(alt, target)
-            figure = pf.Para([image])
-            return pf.Div(attr.to_pandoc(), [figure])
+#            # FIXME: docx export fails to include the caption!
+#            fcaption = pf.stringify(fcaption)
+#            fcaption = [pf.Str(str(caption))]
+            image = pf.Image(fcaption, [filename, ''])
+            return pf.Plain([image])
+#            return pf.Para([image])
     def section_replacement(self, key, value, format, metadata):
         """Replace sections with appropriate representation.
@@ -406,8 +429,9 @@ class ReferenceManager(object):
         else:
             citation = citations[0]
-        prefix = citation['citationPrefix']
+        prefix = citation['citationPrefix'] + [pf.Space()]
         suffix = citation['citationSuffix']
+
         label = citation['citationId']
@@ -426,10 +450,9 @@ class ReferenceManager(object):
             link = pf.RawInline('latex', '\\ref{{{label}}}'.format(label=label))
             return prefix + [link] + suffix
-        else: # FIXME! -- This must be the HTML case.
-            link_text = '{}{}{}'.format(prefix, text, suffix)
-            link = pf.Link([pf.Str(link_text)], ('#' + label, ''))
-            return link
+        else:
+            link = pf.Link([pf.Str(text)], ('#' + label, ''))
+            return prefix + [link] + suffix
     def convert_multiref(self, key, value, format, metadata):
         """Convert all internal links from '#blah' into format
--
2.2.1
… would be awesome! :)
Thanks!
Bela
Hi,
when I try to convert documents with e.g. pandoc spec.md --filter internal-references --to latex I get the following:
Traceback (most recent call last):
  File "/usr/local/bin/internal-references", line 9, in <module>
    load_entry_point('pandoc-internal-references==0.5.1', 'console_scripts', 'internal-references')()
  File "build/bdist.macosx-10.12-intel/egg/internalreferences.py", line 527, in main
KeyError: 0
pandoc: Error running filter internal-references
Filter returned error status 1
Can this issue be resolved?
I have git clone'd the project onto my computer and run setup.py install, which seemed to install all required prerequisites. But I get the above error, specifically from the line:
from identity import *
Clearly I don't have a module called identity, and Googling for it doesn't help either. Where can I find this identity module?
The function replace_references() throws an error if the figure captions contain any non-ascii characters. This seems to be the case even with quotation marks, dashes and the like, which are converted by the --smart option in pandoc.
  File "build/bdist.linux-i686/egg/internalreferences/internalreferences.py", line 236, in figure_replacement
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 7: ordinal not in range(128)
pandoc: Error running filter internal-references
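The failure mode can be reproduced in isolation. The sketch below assumes the caption is a unicode string that ends up being encoded with the ASCII codec somewhere in the filter; the exact call site is not shown in the traceback, so the sample caption and the explicit encode calls here are only illustrative.

```python
# -*- coding: utf-8 -*-
caption = u'Figure 1: \xfcber'  # made-up caption containing u-umlaut (u'\xfc')

# Encoding with the ASCII codec fails on any character above 127.
try:
    caption.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 succeeds and keeps the character.
encoded = caption.encode('utf-8')

if __name__ == '__main__':
    print(ascii_failed)
    print(repr(encoded))
```

If this is indeed the cause, encoding the formatted figure text as UTF-8 before it is written out (or keeping everything as unicode until output) would be the usual way around it.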
I’m trying to figure out how to add figure reference numbers to my Pandoc manuscript and came across your pandoc-reference-filter, but I’m not sure it does exactly what I want.
Below is the current workflow I’m using, and then the workflow I think is the correct way.
Current workflow:
Images should appear inline and should be numbered (as latex normally does),
e.g. the text should say “see Fig. 1” and then Figure 1 should be numbered “1” along with its caption, and so on.
Problem is that the references are not numbered when I use this workflow. From reading the pandoc forum I figured out that pandoc can’t do this directly… but I’m not exactly sure how to use your filter, since it outputs to latex or markdown but not rtf.
So would this be the suggested workflow?:
(1)
Look at #fig:thing.
![some caption](image.png){#fig:thing}
(2)
Look at [Figure 1](#fig:thing).
<div id='#fig:thing' class='figure'>
![Figure 1: some caption](image.png)
</div>
If I manually add markdown (2) to the markdown file and convert to RTF, it seems to work as expected, so I think this is the way to go. Is this what you would suggest?
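The rewrite from (1) to (2) can be sketched in plain Python as below. This only illustrates the transformation on the markdown text and is not how the filter itself works (the filter operates on pandoc's JSON document tree); the function name number_figures and both regular expressions are assumptions made for this example.

```python
import re

# ![caption](file){#label}
FIGURE = re.compile(r'!\[(?P<caption>[^\]]*)\]\((?P<src>[^)]+)\)\{#(?P<label>[^}]+)\}')
# A bare #label in running text, not inside quotes or a link target.
REFERENCE = re.compile(r'''(?<![\w'"(])#([\w:-]+)''')


def number_figures(markdown):
    """Number labelled figures and link bare #label references to them."""
    numbers = {}

    def replace_figure(match):
        label = match.group('label')
        numbers[label] = len(numbers) + 1
        return ("<div id='#{label}' class='figure'>\n"
                "![Figure {n}: {caption}]({src})\n"
                "</div>").format(label=label, n=numbers[label],
                                 caption=match.group('caption'),
                                 src=match.group('src'))

    def replace_reference(match):
        label = match.group(1)
        if label not in numbers:
            return match.group(0)  # unknown label: leave untouched
        return '[Figure {n}](#{label})'.format(n=numbers[label], label=label)

    # First pass numbers the figures, second pass resolves the references.
    return REFERENCE.sub(replace_reference, FIGURE.sub(replace_figure, markdown))


if __name__ == '__main__':
    print(number_figures('Look at #fig:thing.\n\n![some caption](image.png){#fig:thing}'))
```

Because the output is ordinary markdown, it can be passed to pandoc for any output format, which is why converting (2) to RTF works.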
One other question: I use Windows and don’t currently have python installed. Is it correct that I only need to install python AND the pandocfilters 1.2.3?
Thanks!
Pandoc might end up using @, see jgm/pandoc#813.
This would also be easier to process, as it is the same as the citation syntax.
This would be a breaking change but I could provide a filter to convert from # to @.
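A text-level conversion from # to @ might look like the sketch below. It is not the proposed filter: hash_to_at is a hypothetical name, and the sketch assumes that references always have the form prefix:name (for example fig:a) and that attribute ids such as {#fig:a} and link targets such as (#fig:a) should be left alone.

```python
import re

# A '#' that starts a prefix:name label and is not part of an attribute
# block, a link target, a quoted string or a longer word.
HASH_REFERENCE = re.compile(r"""(?<![\w'"({])#(?=[A-Za-z][\w-]*:)""")


def hash_to_at(text):
    """Rewrite in-text references from #label to @label."""
    return HASH_REFERENCE.sub('@', text)


if __name__ == '__main__':
    print(hash_to_at('Look at #fig:thing.'))
    print(hash_to_at('![some caption](image.png){#fig:thing}'))
```

Markdown headings are not affected because the '#' there is followed by a space rather than a label.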