commonmark-hs's People

Contributors

david-christiansen, frasertweedale, hagl, jgm, josephcsible, kukimik, notriddle, sjakobi

commonmark-hs's Issues

stack overflow

This occurs on the benchmark for pathological <?, but it can be reproduced without this:

% python -c 'print("?" * 4000)' | commonmark +RTS -K90000 -xc
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace:
  MAIN.MAIN
*** Exception (reporting due to +RTS -xc): (THUNK), stack trace:
  MAIN.MAIN
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace:
  Data.Text.Internal.IO.readTextDevice
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace:
  Commonmark.Inlines.pSymbol,
  called from Commonmark.Inlines.defaultInlineParser,
  called from Commonmark.Inlines.pInline,
  called from Commonmark.Inlines.attrParser,
  called from Commonmark.Inlines.parseChunks,
  called from Commonmark.Blocks.Commonmark.Blocks.runInlineParser,
  called from Commonmark.Blocks.restOfLine,
  called from Commonmark.Blocks.block_starts,
  called from Commonmark.Blocks.blockContinues,
  called from Commonmark.Blocks.Commonmark.Blocks.processLines
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace:
  Commonmark.Inlines.pSymbol,
  called from Commonmark.Inlines.defaultInlineParser,
  called from Commonmark.Inlines.pInline,
  called from Commonmark.Inlines.attrParser,
  called from Commonmark.Inlines.parseChunks,
  called from Commonmark.Blocks.Commonmark.Blocks.runInlineParser,
  called from Commonmark.Blocks.restOfLine,
  called from Commonmark.Blocks.block_starts,
  called from Commonmark.Blocks.blockContinues,
  called from Commonmark.Blocks.Commonmark.Blocks.processLines
Commonmark.Blocks.Commonmark.Blocks.processLines (src/Commonmark/Blocks.hs:108:1-12)
Commonmark.Blocks.blockContinues (src/Commonmark/Blocks.hs:(223,8)-(238,64))
Commonmark.Blocks.block_starts (src/Commonmark/Blocks.hs:(151,28)-(163,13))
Commonmark.Blocks.restOfLine (src/Commonmark/Blocks.hs:170:34-43)
Commonmark.Blocks.Commonmark.Blocks.runInlineParser (src/Commonmark/Blocks.hs:385:1-15)
Commonmark.Inlines.parseChunks (src/Commonmark/Inlines.hs:(70,34)-(72,45))
Commonmark.Inlines.attrParser (src/Commonmark/Inlines.hs:(331,33)-(333,71))
Commonmark.Inlines.pInline (src/Commonmark/Inlines.hs:333:30-71)
Commonmark.Inlines.defaultInlineParser (src/Commonmark/Inlines.hs:(84,35)-(94,28))
Commonmark.Inlines.pSymbol (src/Commonmark/Inlines.hs:(431,43)-(435,28))
*** Exception (reporting due to +RTS -xc): (THUNK_1_0), stack trace:
  Commonmark.Inlines.pSymbol,
  called from Commonmark.Inlines.defaultInlineParser,
  called from Commonmark.Inlines.pInline,
  called from Commonmark.Inlines.attrParser,
  called from Commonmark.Inlines.parseChunks,
  called from Commonmark.Blocks.Commonmark.Blocks.runInlineParser,
  called from Commonmark.Blocks.restOfLine,
  called from Commonmark.Blocks.block_starts,
  called from Commonmark.Blocks.blockContinues,
  called from Commonmark.Blocks.Commonmark.Blocks.processLines
commonmark: Stack space overflow: current size 33568 bytes.
commonmark: Use `+RTS -Ksize -RTS' to increase it.

Documentation improvements

  • Example of adding an inline parser (say, abbreviations)
  • Example of modifying existing HTML output
  • Example of creating a new output format, e.g. roff man. In this case there are some complexities, because the effect of an 'emph' might be \f[I], but it might be \f[BI] (if we're already in a boldface section). We can handle that by making the IsInline type for roff be a newtype embedding State EmphState Builder or something like that. Then the effect can be state dependent. (Maybe we should consider changing all the implementations to be like this? It would remove some of the complexity for e.g. footnotes.)
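
The state-dependent rendering described in the last bullet can be illustrated outside Haskell. The following is only a minimal Python sketch of the idea, assuming a hypothetical mini-AST of nested tuples and assuming that \f[B] and \f[R] are the escapes for bold and for the default font. None of the names below come from commonmark-hs.

```python
# Hypothetical mini-AST: a node is either a str or a (kind, children) tuple,
# where kind is "emph" or "strong". These names are illustrative only.

def font_escape(bold, italic):
    """Return the roff font escape for the given emphasis state."""
    if bold and italic:
        return r"\f[BI]"
    if bold:
        return r"\f[B]"
    if italic:
        return r"\f[I]"
    return r"\f[R]"

def render(node, bold=False, italic=False):
    """Render a node to roff; what 'emph' emits depends on the enclosing state."""
    if isinstance(node, str):
        return node
    kind, children = node
    new_bold = bold or kind == "strong"
    new_italic = italic or kind == "emph"
    inner = "".join(render(c, new_bold, new_italic) for c in children)
    # Switch to the combined font, then restore the font that was active outside.
    return font_escape(new_bold, new_italic) + inner + font_escape(bold, italic)

plain_emph = render(("emph", ["x"]))
nested = render(("strong", ["a ", ("emph", ["b"]), " c"]))
```

Here the emphasis state is threaded through as function arguments, which plays the role that State EmphState would play in the Haskell design: the same 'emph' node yields \f[I] at top level and \f[BI] inside 'strong'.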

Footnote in AST lacks index information

The HasFootnote instance for Pandoc AST ignores the footnote identifiers and labels:

instance (Rangeable (Cm a B.Inlines), Rangeable (Cm a B.Blocks))
  => HasFootnote (Cm a B.Inlines) (Cm a B.Blocks) where
  footnote _num _lab _x = mempty
  footnoteList _xs = mempty
  footnoteRef _num _lab contents = B.note <$> contents

And it looks like Pandoc is doing its own state management to compute the identifier, and render the references accordingly. Is storing these footnote identifiers/labels in the Pandoc AST explicitly out of scope?

New pathological parsing for fenced divs

python -c 'n=800; print("::: {#id}\n" * n + "a\n" + ":::\n" * n)'

This is a problem with the new fix for detecting fence closers.
At each close we need to iterate through all subordinate fenced divs in the stack.
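
To see why scanning the stack at each closer leads to superlinear behaviour, here is a rough Python sketch. It builds the same input shape as the one-liner above and counts stack entries visited under a toy cost model in which every closer scans all divs still open. This model is an assumption made for illustration, not the library's actual code.

```python
def pathological_fenced_divs(n):
    """Build the input from the report: n openers, one paragraph, n closers."""
    return "::: {#id}\n" * n + "a\n" + ":::\n" * n

def closer_scan_steps(n):
    """Count stack entries visited if each closer scans every div still open.

    Toy cost model of the behaviour described above (an assumption).
    """
    open_divs = n
    steps = 0
    for _ in range(n):
        steps += open_divs  # one full pass over the divs that are still open
        open_divs -= 1      # the closer then closes the innermost div
    return steps

steps_800 = closer_scan_steps(800)
steps_1600 = closer_scan_steps(1600)
```

Under this model the count is n(n+1)/2, so doubling n roughly quadruples the work.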

Unable to set attributes on table

When I add an attribute like {.overflows .collapsing .compact .sortable} immediately above the table, in the AST I get a wrapping div (with these attributes), which wraps the table element. I'd expect these attributes to be applied to the <table> tag itself.

Make source positions a parser option?

Instead of handling it with typeclasses.
This would allow simpler typeclasses: Html, Pandoc.
It might also allow us to improve performance by avoiding the work of storing and computing ranges.

Extra block included in fenced_div

commonmark -xfenced_divs
::: {#id}
- a
- b
:::

Next para.
^D
<div id="id">
<ul>
<li>a
</li>
<li>b
</li>
</ul>
<p>Next para.</p>
</div>

The paragraph at the end should be outside the div.

pathological case parsing inline CDATA tag

commonmark/cmark#299 affects commonmark-hs as well.

python -c 'print("a <![CDATA[" * 10000)' | time cmark > /dev/null
0.40user 0.00system 0:00.42elapsed 95%CPU (0avgtext+0avgdata 9720maxresident)k

python -c 'print("a <![CDATA[" * 20000)' | time cmark > /dev/null
1.60user 0.00system 0:01.62elapsed 98%CPU (0avgtext+0avgdata 17760maxresident)k

python -c 'print("a <![CDATA[" * 40000)' | time cmark > /dev/null
6.20user 0.02system 0:06.25elapsed 99%CPU (0avgtext+0avgdata 34372maxresident)k
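
A quick way to read these timings is to compare successive runs. The sketch below copies the user times from the three runs above and computes how much the time grows each time the input doubles. For quadratic behaviour the time ratio should be close to 4.

```python
# (repetitions, user seconds) copied from the three runs above
runs = [(10000, 0.40), (20000, 1.60), (40000, 6.20)]

# Pair each run with the next one and compute size and time ratios.
ratios = [
    (n2 / n1, t2 / t1)
    for (n1, t1), (n2, t2) in zip(runs, runs[1:])
]

for size_ratio, time_ratio in ratios:
    print(f"input x{size_ratio:.0f} -> time x{time_ratio:.3f}")
```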

Bad definition list parsing

% commonmark -xdefinition_lists
## Blah

`-v`, `--version`

:   Print version.

`-h`, `--help`

:   Show usage message.
^D
<h2 id="blah">Blah</h2>
<dl>
<dt><code>-v</code>, <code>--version</code></dt>
<dd>
<p>Print version.</p>
</dd>
</dl>
<dl>
<dt><code>-h</code>, <code>--help</code></dt>
<dd>
<p>Show usage message.</p>
</dd>
</dl>

This produces two dls rather than one. But if you remove the heading, you get one as intended.

New system for source map

The current system (defining a new typeclass instance for the constructors) doesn't give fine-grained enough information (e.g. it doesn't distinguish code span delimiters from the code). And it is awkwardly designed, so that for example it's easy to write instances that produce loops.

Better to put a field in state that keeps a source map, and maybe another field that controls whether to populate it (for efficiency this can be turned off). Then the individual constructors can be designed to insert whatever fine-grained mapping data would be useful.
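
As a language-neutral illustration of this proposal, here is a small Python sketch. It assumes a hypothetical parser state with a source-map field and a flag that turns recording off; the class, field, and function names are invented for the example and do not exist in commonmark-hs.

```python
from dataclasses import dataclass, field

@dataclass
class ParserState:
    """Hypothetical parser state; the field names are illustrative only."""
    track_source_map: bool = True
    # Maps (start, end) character offsets to the names recorded for that range.
    source_map: dict = field(default_factory=dict)

    def add_name(self, start, end, name):
        """Record a name for a range, or do nothing when tracking is off."""
        if self.track_source_map:
            self.source_map.setdefault((start, end), []).append(name)

def parse_code_span(state, text, start):
    """Parse a single-backtick code span at `start`, recording fine-grained ranges."""
    if text[start] != "`":
        raise ValueError("expected a backtick at the start position")
    end = text.index("`", start + 1)
    state.add_name(start, start + 1, "codeDelimiter")
    state.add_name(start + 1, end, "codeContent")
    state.add_name(end, end + 1, "codeDelimiter")
    return text[start + 1:end], end + 1

tracking = ParserState()
content, next_pos = parse_code_span(tracking, "`ab` c", 0)

silent = ParserState(track_source_map=False)
parse_code_span(silent, "`ab` c", 0)
```

With tracking on, the delimiters and the code content get separate entries, which is the distinction the current typeclass-based system cannot express. With tracking off, the parser does the same work but records nothing.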

Alternatively, instead of making these parsers work for any Monad, limit to the HasSourceMap typeclass and define dummy default instances for common monads.

Unable to link images

The parser doesn't create correct pandoc AST nodes for this syntax:

[![asciicast](https://asciinema.org/a/329911.svg)](https://asciinema.org/a/329911)

Here's how one would expect it to render:

asciicast

Fix Windows test failures

See the test suite under Actions.
Needs investigating whether there are issues in the library itself, or just in the test suite.

rawHtmlSpec gets in the way of parsing special links

In neuron we support special links of the format <a34sfef4> (as well as: <z:zettels?tag=foo>) that have to be processed by the app to be replaced with some custom HTML stuff.

I wrote a syntax spec defining the parser in syntaxInlineParsers, but when actually using it it was only getting applied for <1hello> but not <hello> (which gets rendered as raw HTML).

rawHtmlSpec in defaultBlockSpecs is the cause of this behaviour. What is the recommended way to turn this behaviour off, so that applications have the flexibility to parse syntax with angle brackets?

Consider Megaparsec

A quick question. Perhaps this repo is a good place to try using Megaparsec instead of Parsec?

Megaparsec has some niceties compared to Parsec, which include better error reporting and Unicode support, and it's also reportedly faster. The only downside I can think of (apart from switching parsers being potentially a lot of work) is that Megaparsec is not as stable in terms of API and features -- but that comes with the territory with an actively-maintained package.

Since this repo is not (yet) directly tied to Pandoc, I thought that this might be a good place for such experiments.

Just a thought, no pressure.

Release to hackage?

Hello! Thanks for commonmark!

The newest version of https://github.com/srid/neuron uses the commonmark library (quite successfully, I might add, speaking mainly as a user). To bring the new features to all users (and e.g. nixpkgs) we would like to make a new neuron release. Sadly, releasing to hackage with a dependency that is not on hackage is a bad idea.
So for this situation it would be marvelous if we could make a first commonmark release to hackage soon.

@jgm Would you be willing to do that?

If you don't want to maintain commonmark on hackage, @srid has offered to do the hackage maintenance, if you were to agree with that.

Bug in parsing some HTML tags immediately followed by eof

% echo -n "<\!A>" | commonmark
"stdin" (line 1, column 5):
unexpected end of input
% echo -n "<\!-- hi -->" | commonmark
"stdin" (line 1, column 12):
unexpected end of input
% echo -n "<?" | commonmark
"stdin" (line 1, column 3):
unexpected end of input
% echo -n "<? hi ?>" | commonmark
"stdin" (line 1, column 9):
unexpected end of input

Implicit heading references break when used with `smart`

% commonmark -ximplicit_heading_references -xauto_identifiers -xsmart
# hi

See [hi].

# Jo's heading

See [Jo's heading]
^D
<h1 id="hi">hi</h1>
<p>See <a href="#">hi</a>.</p>
<h1 id="jos-heading">Jo’s heading</h1>
<p>See [Jo’s heading]</p>

Examples to show off the library

  • Markdown-aware spell checker: emit misspelled words + source locations.
  • In-place transformations: do a specific transformation on an existing markdown document (e.g., capitalizing all heading titles, or changing indented code blocks to fenced style) without changing anything else about the document.
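
The in-place transformation idea in the second bullet boils down to rewriting only the source ranges of interest and copying everything else verbatim. Below is a minimal Python sketch of that approach. It uses a regex for ATX headings as a stand-in for parser-reported source ranges (an assumption: a real tool would take the ranges from the parser, and this regex would also match `#` lines inside code blocks), and it uppercases heading text as a simple example transformation.

```python
import re

# Stand-in for parser-reported source ranges: a regex that finds the text of
# ATX headings. A real tool would take these ranges from the parser instead.
ATX_HEADING = re.compile(r"^(#{1,6}[ \t]+)(.*?)[ \t]*$", re.MULTILINE)

def heading_text_ranges(source):
    """Return (start, end) offsets of each ATX heading's text."""
    return [m.span(2) for m in ATX_HEADING.finditer(source)]

def transform_ranges(source, ranges, fn):
    """Apply fn to each range of source, leaving every other character untouched."""
    out = []
    pos = 0
    for start, end in sorted(ranges):
        out.append(source[pos:start])       # copy the untouched text verbatim
        out.append(fn(source[start:end]))   # rewrite only the selected span
        pos = end
    out.append(source[pos:])
    return "".join(out)

doc = "# hi there\n\nSome *text*.\n\n## jo's heading\n"
result = transform_ranges(doc, heading_text_ranges(doc), str.upper)
```

Because the output is assembled from slices of the original string, whitespace, list markers, and emphasis delimiters outside the selected ranges are preserved exactly.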

source map/highlight issue with link in table cell

% commonmark  -xall --highlight
| Sample                   |showdown  |commonmark|marked    |markdown-it|
|--------------------------|---------:|---------:|---------:|----------:|
|[README.md]               |         1|       3.6|       3.1|        3.9|

[README.md]: url

yields (snipping relevant part)

| <span class="str">Sample</span>                   |<span class="str">showdown</span>  |<span class="str">commonmark</span>|<span class="str">marked</span>    |<span class="str">markdown</span><span class="str">-</span><span class="str">it</span>|
|--------------------------|---------:|---------:|---------:|----------:|
|<span class="link" title="link">[<span class="str">README</span><span class="str">.</span><span class="str">md</span>]               |         <span class="str">1</span>|       <span class="str">3</span><span class="str">.</span><span class="str">6</span>|       <span class="str">3</span><span class="str">.</span><span class="str">1</span>|        <span class="str">3</span><span class="str">.</span><span class="str">9</span>|

in which the link isn't closed in the right place.

Nonlinear parsing time for inline link openers without closers

See benchmarks

benchmarking pathological/inline link openers without closers/commonmark/800
time                 12.22 ms   (12.01 ms .. 12.43 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 12.28 ms   (12.21 ms .. 12.35 ms)
std dev              144.8 μs   (107.0 μs .. 185.6 μs)

benchmarking pathological/inline link openers without closers/commonmark/1200
time                 26.03 ms   (25.76 ms .. 26.32 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 26.21 ms   (26.05 ms .. 26.49 ms)
std dev              350.4 μs   (183.2 μs .. 539.2 μs)

benchmarking pathological/inline link openers without closers/commonmark/1600
time                 47.06 ms   (46.08 ms .. 47.64 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 47.86 ms   (47.42 ms .. 48.95 ms)
std dev              995.8 μs   (346.9 μs .. 1.569 ms)
variance introduced by outliers: 11% (moderately inflated)

benchmarking pathological/inline link openers without closers/commonmark/2000
time                 73.23 ms   (71.16 ms .. 75.25 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 75.10 ms   (74.27 ms .. 76.17 ms)
std dev              1.334 ms   (813.3 μs .. 1.975 ms)
variance introduced by outliers: 14% (moderately inflated)
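
To put a number on "nonlinear", one can estimate the exponent k in time ~ n**k from two measurements. The sketch below copies the `time` estimates from the four benchmark runs above; the helper name is made up for this example.

```python
import math

# (input size, time estimate in ms) copied from the "time" rows above
timings = [(800, 12.22), (1200, 26.03), (1600, 47.06), (2000, 73.23)]

def growth_exponent(p1, p2):
    """Estimate k in time ~ n**k from two (n, time) measurements."""
    (n1, t1), (n2, t2) = p1, p2
    return math.log(t2 / t1) / math.log(n2 / n1)

# Exponent over the whole range, and between each pair of neighbouring sizes.
overall = growth_exponent(timings[0], timings[-1])
pairwise = [growth_exponent(a, b) for a, b in zip(timings, timings[1:])]
```

The overall estimate comes out a little under 2, which is consistent with roughly quadratic growth over this range of sizes.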

Not an issue ;-)

Hi,
correct tracking of source positions is a great undertaking, especially when it comes to integration with proofreading software. Much lesser attempts for LaTeX and Markdown can be seen in
TeXtidote and Tex2txt.

Many thanks already for pandoc, and good progress with this project!
Matthias

Understanding your intent

Would you be willing to write a short para about your intent here? Is this package meant to ultimately replace the Pandoc markdown parser? Or are you using this as a place to trial extensions that might land in the CommonMark spec? Or... ?

I've been reading your code and trying to get my head around whether I should be attempting to contribute here, or to mmark, or somehow using cmark-gfm, or... Mostly I need one of the table extensions; my target is LaTeX fragments but I'm mostly interested in seeing what comes of a newer AST as compared to using pandoc-types' "native" AST.

Warm regards to Pandoc's author.

AfC

Bad sourcepos

 % commonmark --sourcepos
[hi]   ok

[hi]: url
<p data-sourcepos="stdin@1:1-1:10"><a data-sourcepos="stdin@1:1-@1:1" href="url"><span data-sourcepos="stdin@1:2-1:4">hi</span></a>   <span data-sourcepos="stdin@1:8-1:10">ok</span></p>

We could also use some more extensive test cases for source positions.

loop in highlighting with definition_list extension

instance (HasDefinitionList il bl, Semigroup bl, Semigroup il)
        => HasDefinitionList (WithSourceMap il) (WithSourceMap bl) where
  definitionList spacing items = definitionList spacing items
                                   <* addName "definitionList"

% commonmark --highlight -xdefinition_lists
hi
:    there

commonmark: <<loop>>

Parsing * * * * * * … takes quadratic time

$ python3 -c 'print(end="* "*1000)' | time commonmark > /dev/null
0.46user 0.16system 0:00.35elapsed 177%CPU (0avgtext+0avgdata 52080maxresident)k
0inputs+0outputs (0major+15662minor)pagefaults 0swaps
$ python3 -c 'print(end="* "*2000)' | time commonmark > /dev/null
1.45user 0.63system 0:01.07elapsed 193%CPU (0avgtext+0avgdata 52120maxresident)k
0inputs+0outputs (0major+16320minor)pagefaults 0swaps
$ python3 -c 'print(end="* "*4000)' | time commonmark > /dev/null
6.27user 2.81system 0:04.52elapsed 201%CPU (0avgtext+0avgdata 52292maxresident)k
0inputs+0outputs (0major+18973minor)pagefaults 0swaps
$ python3 -c 'print(end="* "*8000)' | time commonmark > /dev/null
35.88user 15.90system 0:25.93elapsed 199%CPU (0avgtext+0avgdata 51780maxresident)k
0inputs+0outputs (0major+25323minor)pagefaults 0swaps

One of the extensions seems to use FFI

I saw this in GHCJS for a particular markdown content. It doesn't happen with bare commonmark parser, but only when the extensions are enabled. I'll get to debugging and isolating the problem one of the following days, but it would be great if someone already knew what it could be off the top of their head ...

image

Cannot set attributes on the image

This for example:

![asciicast](https://asciinema.org/a/329911.svg){#ident .centered .big}

generates:

<p>
  <img class="" id="" src="https://asciinema.org/a/329911.svg" title="">
  .centered .big}
</p>
