Comments (15)
a table header needs to be followed by a delimiter row, otherwise it is not a table.
Yes, and that's what this parser expects. The difficulty is that this parser tries to do line-by-line block parsing with no backtracking, so it can't look ahead to the next line. The current approach is to retroactively change the block from a table to a paragraph if that second line isn't encountered. But that won't work for the kind of example you give. The fix, with the current parsing strategy, isn't obvious, unless it's the thing I mention above. Note that the monoidal composition of syntax specs is order-sensitive. So the problem here is that we're checking for a table before checking for a list item.
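To make the order-sensitivity concrete, here is a toy model that uses only `base`. It is a sketch, not the library's implementation: `Recognizer`, `firstMatch`, and both sample recognizers are hypothetical names invented for illustration, and plain list concatenation stands in for `<>` on syntax specs. The only assumption carried over from the comment above is that block starts are tried in order and the first one that accepts the line wins.

```haskell
import Data.List (isInfixOf, isPrefixOf)
import Data.Maybe (listToMaybe)

-- A toy "block start" recognizer: a name plus a test on a single line.
-- Hypothetical type, unrelated to the library's real block specs.
data Recognizer = Recognizer
  { recName  :: String
  , recMatch :: String -> Bool
  }

-- Crude stand-ins for the two competing block starts.
tableRec, listItemRec :: Recognizer
tableRec    = Recognizer "table"     ("|" `isInfixOf`)
listItemRec = Recognizer "list item" ("- " `isPrefixOf`)

-- Try the recognizers in order; the first one that accepts the line wins.
firstMatch :: [Recognizer] -> String -> Maybe String
firstMatch recs line = listToMaybe [recName r | r <- recs, recMatch r line]

-- List concatenation plays the role of (<>) on syntax specs in this model.
tableFirst, listFirst :: [Recognizer]
tableFirst = [tableRec] <> [listItemRec]
listFirst  = [listItemRec] <> [tableRec]

main :: IO ()
main = do
  let line = "- a|b"
  print (firstMatch tableFirst line)  -- Just "table"
  print (firstMatch listFirst line)   -- Just "list item"
```

The same line is claimed by a different block type depending only on the order of composition, which is why a line that starts a list item but contains a pipe can be taken for a table header.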
from commonmark-hs.
I think what's happening is that the table extension is kicking in, because of the |.
I think the current behavior of this extension is to regress and parse the lines as paragraph content if a table heading line doesn't follow -- obviously, this isn't what is wanted in the current case. In fact, this case may interact badly with the commonmark parser's goal of parsing line by line with no backtracking.
from commonmark-hs.
The relevant part of the source is in commonmark-extensions, src/Commonmark/Extensions/PipeTable.hs, lines 176-193.
from commonmark-hs.
You may be able to work around this by moving the pipeTableSpec after defaultSyntaxSpec, i.e. allTheGfmExtensionsExceptPipeTable <> defaultSyntaxSpec <> pipeTableSpec
from commonmark-hs.
I think what's happening is that the table extension is kicking in, because of the |.
I think the current behavior of this extension is to regress and parse the lines as paragraph content if a table heading line doesn't follow
commonmark-hs/commonmark-extensions/src/Commonmark/Extensions/PipeTable.hs
Lines 183 to 195 in 714a1b6
from commonmark-hs.
Judging from the examples given with https://github.github.com/gfm/#table, and its present implementation on github.com, a table header needs to be followed by a delimiter row, otherwise it is not a table.
i|am|not|a|table
|i|am|also|not|a|table|
i | am | a | table |
---|
i | am | also | a | table |
---|
In fact, this case may interact badly with the commonmark parser's goal of parsing line by line with no backtracking.
Apparently, you need two lines to start a table. So yes, noble goal, but reality is harsh... ;-)
from commonmark-hs.
You may be able to work around this by moving the pipeTableSpec after defaultSyntaxSpec, i.e. allTheGfmExtensionsExceptPipeTable <> defaultSyntaxSpec <> pipeTableSpec
Thanks for the hint!
I am a bit hesitant to follow this path, because I cannot judge the consequences. How do I know this isn't a game of whack-a-mole? We haven't secured the desired behaviors of our markdown parser instance with a test suite, so I might break something else.
I'd rather wait for a fix of the table parser...
from commonmark-hs.
OK, I'll try the workaround then!
from commonmark-hs.
The problem with the suggested workaround (pipeTableSpec after defaultSyntaxSpec) is that tables do not parse anymore:
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Commonmark
import Commonmark.Extensions
import Data.Text.Lazy.IO as TLIO

main :: IO ()
main = do
  commonmarkWith spec "inline" input >>= \case
    Left e -> error (show e)
    Right (html :: Html ()) -> TLIO.putStr $ renderHtml html
  where
    spec = mconcat
      [ emojiSpec
      , strikethroughSpec
      , autolinkSpec
      , autoIdentifiersSpec
      , taskListSpec
      , footnoteSpec
      , defaultSyntaxSpec
      , pipeTableSpec
      ]
    input = table
    table = "|a|table|\n|---|---|"
Gives:
<p>|a|table|
|---|---|</p>
from commonmark-hs.
OK, back to the drawing board.
Here's another case to keep in mind:
- `a|b`
- | -
This does parse -- as a table. Hm, I wonder how GitHub's parser renders it? Let's try:
a|b
- | -
Let's also try your original case:
- foo
a|b
- bar
from commonmark-hs.
Literal pipes in GitHub's pipe tables formerly needed to be escaped, even inside code backticks; let's see if that's still the case:
test | table |
---|---|
` | ` |
Answer: yes.
from commonmark-hs.
Aha. You need a trailing newline in input in your program above. That's why it's not parsing as a table.
from commonmark-hs.
Original case also works with the workaround; you just need to ensure that all lines are terminated with newline characters.
from commonmark-hs.
I don't think the workaround will have any bad consequences: it just means that nothing will be interpreted as a table if it can be interpreted as any other kind of block-level element (aside from a paragraph), and I think that's desirable.
We can leave this issue open, though, because this is an awkward workaround and it makes it impossible to use gfmExtensions.
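Putting the two findings together (pipeTableSpec composed after defaultSyntaxSpec, and input terminated by a newline), the earlier test program might be adapted as in the following sketch. It reuses only the library names that already appear in the program above; `ensureTrailingNewline` is a hypothetical helper introduced here, not part of commonmark, and no output is shown because the result was not re-run for this write-up.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Commonmark
import Commonmark.Extensions
import qualified Data.Text as T
import qualified Data.Text.Lazy.IO as TLIO

-- Hypothetical helper (not part of commonmark): make sure the input ends
-- with a newline, since the table was not recognized without one.
ensureTrailingNewline :: T.Text -> T.Text
ensureTrailingNewline t
  | "\n" `T.isSuffixOf` t = t
  | otherwise             = t <> "\n"

main :: IO ()
main = do
  result <- commonmarkWith spec "inline" (ensureTrailingNewline input)
  case result of
    Left e -> error (show e)
    Right (html :: Html ()) -> TLIO.putStr (renderHtml html)
  where
    -- Workaround ordering: pipeTableSpec comes after defaultSyntaxSpec,
    -- so the core block types are tried before the table extension.
    spec = mconcat
      [ emojiSpec
      , strikethroughSpec
      , autolinkSpec
      , autoIdentifiersSpec
      , taskListSpec
      , footnoteSpec
      , defaultSyntaxSpec
      , pipeTableSpec
      ]
    input = "|a|table|\n|---|---|"
```

Normalizing the input in one place keeps the newline requirement from depending on how each caller builds its text.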
from commonmark-hs.
Thanks for the further research, @jgm. I have now applied your workaround to hackage-server.
from commonmark-hs.