exist-db / exist-markdown

Markdown Parser in XQuery

License: Other

XQuery 80.21% JavaScript 19.79%

exist-markdown's People

ljo emchateau joewiz bmix angelodel80 bkis fabianetling lguariento isabella232

exist-markdown's Issues

further enhancements

This is a follow-up to #15.

A few enhancements we should consider now or later:

drop .existdb.json

Put all necessary package metadata in a property named app, exist or xar in package.json.
npm packages are implicitly allowed to add custom properties to package.json, but they have to take care not to clash with names used by npm itself. I have used app for other projects in the past.

add npm script to install the library (without the test application)

I usually use npm start for that.

optimise GithubActions

Use prebuilt Docker images (gives caching of images for free and makes the test preparation and workflow definition less complex and error-prone).
As npm test already calls gulp install:all, copying the XAR into the autodeploy directory of the Docker container is superfluous.
Also test on docker image tag 5.3.0 to ensure backwards compatibility for future versions of this lib

adopt readOptionsFromEnv

To allow all npm and gulp scripts to target different existdb instances with ease.

A fairly complete setup with all of the above can be found in eeditiones/roaster#30, which is not yet merged.

Problems parsing XQuery code blocks

The markdown:parse() function mangles XQuery source code contained in fenced code blocks.

For example, the following code...

xquery version "3.1";

import module namespace markdown="http://exist-db.org/xquery/markdown";

markdown:parse('# Code sample

This is a map containing two entries, one whose value is an array and another whose value is a string.

```xquery
xquery version "3.1";

map { "k1": array { "v1", "v2" }, "k2": "v3" }
```

This code should correctly render.
')

... returns the following HTML:

<body>
    <section>
        <h1>Code sample</h1>
        <p>This is a map containing two entries, one whose value is an array and another whose value is a string.</p>
        <pre data-language="xquery">xquery version "3.1";

map <span itemprop=" &#34;k1&#34;">array { "v1"</span>, <span itemprop=" &#34;k1&#34;">"v2" </span>, "k2": "v3" }
</pre>
        <p>This code should correctly render.</p>
    </section>
</body>

Effectively, it turns:

map { "k1": array { "v1", "v2" }, "k2": "v3" }

into:

map array { "v1", "v2" , "k2": "v3" }

This can be seen in https://exist-db.org/exist/apps/wiki/blogs/eXist/XQuery31 in the section titled "Serialization".

first paragraph missing

I have tried this in Xidel, but the first paragraph is always missing

E.g. markdown:parse("xx") becomes <body></body>

* a
* b
* c

becomes <body></body>, too

But

a

b

c

becomes <body>bc</body>

And

x

* a
* b
* c

becomes

<body><ul><li>
        
            a
            
        </li><li>
        
            b
            
        </li><li>
        
            c
            
        </li></ul></body>

Is this an issue with Xidel or the module? I had to replace xquery version "3.0"; with xquery version "3.1"; and util:parse-html with x:parse-html

[BUG] Strange "h4039" element on default landing page

Describe the bug

When loading the landing page (/main.md), the generated HTML has a strange <h4039> element inside the body/section:

<body class="container">
    <body>
        <section>
            <h4039># Supported Markdown syntax

Markdown within this element is not further processed or transformed into HTML.

Expected behavior

The page should contain valid HTML.

To Reproduce

Install app, load http://localhost:8080/exist/apps/markdown.

Context (please always complete the following information):

OS: macOS 11.4
eXist-db Version: eXist 5.3.0-SNAPSHOT e371efd9987a9a2f4414839c7bf1dbc20107b6d1 20210604033555
Java Version: OpenJDK 1.8.0_292-b10 (liberica-jdk8-full)
App Version: 0.6 (both installed from public-repo, and built from current master)

Additional context

How is eXist-db installed? built from source
Any custom changes in e.g. conf.xml? none

[BUG] Markdown interleaved in HTML blocks is mangled

Expected behavior

The author of test.md expected Markdown interleaved in HTML blocks to work.

Actual behavior

Markdown interleaved in HTML blocks is mangled

Reproduction steps

See the pending test at https://github.com/eXist-db/exist-markdown/blob/master/test/xqs/test-suite.xqm#L309-L340.

This test takes this markdown:

<div class="row">
    <div class="col-md-6">
        First column in **two column layout**.
        
        Second paragraph.
    </div>
    <div class="col-md-6">
        Second column in two column layout.
    </div>
</div>

With this input, the markdown:parse() function should return:

<body>
    <div class="row">
        <div class="col-md-6">
            <p>First column in <strong>two column layout</strong>.</p>
            <p>Second paragraph.</p>
        </div>
        <div class="col-md-6">
            <p>Second column in two column layout.</p>
        </div>
    </div>
</body>

But it actually returns:

<body>
    <div class="row">
        <body/>
        <div class="col-md-6">
            <body>
                <p>First column in two column layout.</p>
            </body>
        </div>
    </div>
    <p>Second paragraph. <div class="col-md-6"> Second column in two column layout. </div> &lt;/div&gt;</p>
</body>

Note that (1) an empty <body/> element is inserted into the outer div, (2) the "Second paragraph" is ejected from the first inner div, and (3) the second inner div is inserted into the "Second paragraph" <p> element.

Since the parsed markdown doesn't equal the expected output, the test fails (and is marked as pending in the source until a fix is in place):

<testcase name="HTML block containing markdown" class="tests:html-block-containing-markdown">
    <failure message="assertTrue failed." type="failure-error-code-1"/>
    <output>false</output>
</testcase>

Note that the Commonmark dingus at https://spec.commonmark.org/dingus/ also produces mangled output:

<div class="row">
    <div class="col-md-6">
        First column in **two column layout**.
<pre><code>    Second paragraph.
&lt;/div&gt;
&lt;div class=&quot;col-md-6&quot;&gt;
    Second column in two column layout.
&lt;/div&gt;
</code></pre>
</div>

This suggests that a Commonmark-compliant processor may not be expected to handle interleaved HTML blocks and Markdown.

Please provide the following

Java Version: n/a
exist-db version: 6.1.0-SNAPSHOT
exist-markdown version: 1.0.0
OS version: n/a

Error on startup with current eXist develop

After starting up current eXist develop, loading the markdown app at http://localhost:8080/exist/apps/markdown/ redirects to http://localhost:8080/exist/apps/markdown/test.md, which yields the following error:

<exception>
    <path>/db/apps/markdown/parse.xql</path>
    <message>
        err:XQST0033 error found while loading module md: Error while loading module content/markdown.xql: Cannot bind prefix 'md' to 'http://exist-db.org/xquery/markdown' it is already bound to 'http://exist-db.org/metadata'
    </message>
</exception>

The key bit:

Cannot bind prefix 'md' to 'http://exist-db.org/xquery/markdown' it is already bound to 'http://exist-db.org/metadata'

The registration of this prefix appears to stretch back to 2012 - according to eXist-db/exist@c33a2fa - so it's very odd that we haven't seen this before!
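
One possible workaround, sketched here under the assumption that the clash comes only from the choice of the md prefix: bind the library's namespace URI to a different prefix. The markdown prefix below is the one used in other issues on this page. If the library module itself declares its namespace with the md prefix, that declaration would presumably need the same rename, so this is not a confirmed fix.

```xquery
xquery version "3.1";

(: Assumption: the library is registered under this namespace URI,
   so no "at" location hint is needed. The prefix is local to this
   query; "markdown" avoids the predeclared "md" binding. :)
import module namespace markdown="http://exist-db.org/xquery/markdown";

markdown:parse("# Hello")
```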

[BUG] Curly braces in fenced code blocks are mangled

Expected behavior

Curly braces inside fenced code blocks should be left as literal curly braces.

Actual behavior

Curly braces are replaced with a <span> element.

Reproduction steps

See the pending test at https://github.com/eXist-db/exist-markdown/blob/master/test/xqs/test-suite.xqm#L223-L244.

This test takes this markdown:

```xquery
for $i in 1 to 10
return
    <li>{$i * 2}</li>
```

With this input, the markdown:parse() function should return:

<body>
    <pre data-language="xquery">for $i in 1 to 10
return
    &lt;li&gt;{$i * 2}&lt;/li&gt;
</pre>
</body>

The Commonmark dingus at https://spec.commonmark.org/dingus/ returns something quite similar, so our expectations are in line with Commonmark:

<pre>
    <code class="language-xquery">for $i in 1 to 10
return
    &lt;li&gt;{$i * 2}&lt;/li&gt;
</code>
</pre>

But it actually returns:

<body>
    <pre data-language="xquery">for $i in 1 to 10
return
    &lt;li&gt;<span itemprop="$i * 2">$i * 2</span>&lt;/li&gt;
</pre>
</body>

Note that the curly braces are transformed into a <span> structure, which is associated with the library's handling of "label" at https://github.com/eXist-db/exist-markdown/blob/master/content/markdown.xqm#L119-L128.

Since the parsed markdown doesn't equal the expected output, the test fails (and is marked as pending in the source until a fix is in place):

<testcase name="Code Blocks" class="tests:code-blocks">
    <failure message="assertTrue failed." type="failure-error-code-1"/>
    <output>false</output>
</testcase>

Please provide the following

Java Version: n/a
exist-db version: 6.1.0-SNAPSHOT
exist-markdown version: 1.0.0
OS version: n/a

[BUG] Parsing of `mark` element in "Inline HTML" test

Expected behavior

In inline HTML, inline elements like <mark> should be preserved.

Actual behavior

The elements are dropped from output.

Reproduction steps

See the pending test at https://github.com/eXist-db/exist-markdown/blob/master/test/xqs/test-suite.xqm#L346-L361.

This test takes this markdown:

A <span style="color: red;">paragraph <span style="color: green;">containing</span></span> some <mark>inline</mark> <code>HTML</code>.

With this input, the markdown:parse() function should return:

<body>
    <p>A <span style="color: red;">paragraph <span style="color: green;">containing</span></span> some <mark>inline</mark> <code>HTML</code>.</p>
</body>

The Commonmark dingus at https://spec.commonmark.org/dingus/ returns this exactly (sans the <body> wrapper, which exist-markdown uses to ensure its results are well-formed, and which users of the library would normally omit from output):

<p>A <span style="color: red;">paragraph <span style="color: green;">containing</span></span> some <mark>inline</mark> <code>HTML</code>.</p>

But it actually returns:

<body>
    <p>A <span style="color: red;">paragraph <span style="color: green;">containing</span></span> some  <code>HTML</code>.</p>
</body>

Note that the inline element was dropped from the output and replaced with an extra space character between some and <code>HTML</code>.

Since the parsed markdown doesn't equal the expected output, the test fails (and is marked as pending in the source until a fix is in place):

<testcase name="Inline HTML" class="tests:inline-html">
    <failure message="assertTrue failed." type="failure-error-code-1"/>
    <output>false</output>
</testcase>

Please provide the following

Java Version: n/a
exist-db version: 6.1.0-SNAPSHOT
exist-markdown version: 1.0.0
OS version: n/a

Add tests, or... ?

Without a test suite, fixing bugs in this library's Markdown parser risks introducing new ones.

The CommonMark tests from https://github.com/commonmark/commonmark-spec would be a natural starting point, as CommonMark is:

a standard, unambiguous syntax specification for Markdown, along with a suite of comprehensive tests to validate Markdown implementations against this specification.

To get started, I cloned the commonmark-spec repository and extracted the tests as described in its README:

gh repo clone commonmark/commonmark-spec
cd commonmark-spec
python3 test/spec_tests.py --dump-tests > commonmark-tests.json

... and I uploaded these to /db/commonmark-tests.json.

Then I developed the following query. Initially I got all errors or failures, but when I stripped out the trailing \n newline from the test's source Markdown, I got 68 passes, 570 failures, and 14 errors.

Certainly, some of the failures are caused by whitespace differences, but without a function for parsing HTML in eXist-db (!), normalizing expected and actual outputs is not possible, and thus the test suite can't tell us whether a failure is a real problem or just a meaningless whitespace issue.

xquery version "3.1";

import module namespace markdown="http://exist-db.org/xquery/markdown";

let $tests := json-doc("/db/commonmark-tests.json")
let $results :=
    for $test in $tests?*
    let $markdown := $test?markdown
    (: disregard trailing newline from the source test's expected output :) 
    let $expected-result := $test?html => replace("\n$", "")
    let $actual-result := 
        try { 
            (
                (: the parse function wraps results in a <body> element :)
                markdown:parse($markdown)/node() 
                ! serialize(., map { "method": "html", "indent": true(), "html-version": 4.0 } )
            ) 
            => string-join() 
        }
        catch * { 
            map { 
                "error": "markdown parsing error raised at " || $err:line-number || ":" || $err:column-number 
                    || ": " || $err:description 
            }
        }
    return
        map {
            "expected-result": $expected-result,
            "actual-result": $actual-result,
            "status": 
                (
                    if ($actual-result instance of map(*)) then 
                        "error"
                    else if (deep-equal($expected-result, $actual-result)) then
                        "pass"
                    else
                        "fail"
                ),
            "source": $test
        }
for $result in $results
group by $status := $result?status
order by index-of(("pass", "fail", "error"), $status)
return
    map {
        "status-group": $status,
        "number-of-results": count($result),
        "results": array { $result }
    }
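
As a rough way to triage the whitespace-only failures mentioned above, the expected and actual strings could be normalized at the string level before comparison. The helper below is a hypothetical sketch, not part of the library or of the query above: local:normalize-html is an invented name, and because it does not parse HTML it also collapses significant whitespace (for example inside pre elements).

```xquery
xquery version "3.1";

(: Hypothetical helper: string-level normalization only.
   It does not parse HTML, so whitespace inside <pre> is altered too. :)
declare function local:normalize-html($html as xs:string) as xs:string {
    $html
    => replace(">\s+<", "><")   (: drop whitespace between adjacent tags :)
    => normalize-space()        (: collapse remaining runs and trim both ends :)
};

(: Illustrative inputs that differ only in whitespace between tags :)
let $expected := "<ul>&#10;<li>a</li>&#10;<li>b</li>&#10;</ul>"
let $actual := "<ul><li>a</li>  <li>b</li></ul>"
return
    local:normalize-html($expected) eq local:normalize-html($actual)
```

Applied to both sides of the deep-equal comparison, this might separate some whitespace-only failures from real ones, but it is no substitute for comparing parsed HTML.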

Researching and fixing the failing tests would be an extensive project. It would require developing an XQuery function for parsing HTML—or shifting development to BaseX, which has an HTML parsing module.

Alternatively, an XQuery wrapper around the https://github.com/commonmark/commonmark-java or https://github.com/vsch/flexmark-java project might be a better investment.

Transfer exist-markdown repository to eXist-db organization

Like eXide and monex, this app is included in all default installations of eXist. To facilitate reporting issues related to it, it would be best, if possible, if the repository belonged to the eXist-db organization.
