exist-markdown's People

Contributors

adamretter, bkis, duncdrum, joewiz, lguariento, line-o, wolfgangmm

exist-markdown's Issues

further enhancements

This is a follow-up to #15.

A few enhancements we should consider now or later:

drop .existdb.json

Put all necessary package metadata in a custom property of package.json, named app, exist or xar.
npm packages are implicitly allowed to add custom properties to package.json, but they must take care not to clash with names used by npm itself. I have used app for other projects in the past.

add npm script to install the library (without the test application)

I usually use npm start for that.

optimise GitHub Actions

  • Use prebuilt Docker images (this gives image caching for free and makes test preparation and the workflow definition less complex and error-prone).
  • As npm test already calls gulp install:all, copying the XAR into the Docker container's autodeploy directory is superfluous.
  • Also test against Docker image tag 5.3.0 to ensure backwards compatibility for future versions of this library.

adopt readOptionsFromEnv

To allow all npm and gulp scripts to target different existdb instances with ease.

A fairly complete setup with all of the above can be found in eeditiones/roaster#30, which is not yet merged.

Problems parsing XQuery code blocks

The markdown:parse() function mangles XQuery source code contained in fenced code blocks.

For example, the following code...

xquery version "3.1";

import module namespace markdown="http://exist-db.org/xquery/markdown";

markdown:parse('# Code sample

This is a map containing two entries, one whose value is an array and another whose value is a string.

```xquery
xquery version "3.1";

map { "k1": array { "v1", "v2" }, "k2": "v3" }
```

This code should correctly render.
')

... returns the following HTML:

<body>
    <section>
        <h1>Code sample</h1>
        <p>This is a map containing two entries, one whose value is an array and another whose value is a string.</p>
        <pre data-language="xquery">xquery version "3.1";

map <span itemprop=" &#34;k1&#34;">array { "v1"</span>, <span itemprop=" &#34;k1&#34;">"v2" </span>, "k2": "v3" }
</pre>
        <p>This code should correctly render.</p>
    </section>
</body>

Effectively, it turns:

map { "k1": array { "v1", "v2" }, "k2": "v3" }

into:

map array { "v1", "v2" , "k2": "v3" }

This can be seen in https://exist-db.org/exist/apps/wiki/blogs/eXist/XQuery31 in the section titled "Serialization".
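
The report does not say where the mangling happens. One plausible reading is that inline rules are applied to the contents of the fenced block. Under that assumption, a minimal sketch of one way to keep fenced blocks out of inline processing is to split the source into code and prose segments first. The helper name local:segments and the fence pattern are hypothetical and not part of exist-markdown. The pattern only handles backtick fences that start at the beginning of a line.

```xquery
xquery version "3.1";

(: Hypothetical helper, not part of exist-markdown: splits Markdown source
   into fenced code segments and prose segments. :)
declare function local:segments($markdown as xs:string) as map(*)* {
    (: flags: "m" lets ^ and $ match at line boundaries, "s" lets . match newlines :)
    let $fence := "^```[^\n]*\n.*?^```[ \t]*$"
    for $part in analyze-string($markdown, $fence, "ms")/*
    return
        map {
            "type": if ($part instance of element(fn:match)) then "code" else "prose",
            "text": string($part)
        }
};

let $markdown := "Intro&#10;&#10;```xquery&#10;map { ""k1"": 1 }&#10;```&#10;&#10;Outro"
return
    (: yields "prose", "code", "prose" :)
    local:segments($markdown)?type
```

Only the segments typed "prose" would then be handed to the inline rules, while "code" segments are copied through verbatim.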

first paragraph missing

I have tried this in Xidel, but the first paragraph is always missing.

E.g. markdown:parse("xx") becomes <body></body>

* a
* b
* c 

becomes <body></body>, too

But

a

b

c

becomes <body><p>b</p><p>c</p></body>

And

x

* a
* b
* c

becomes

<body><ul><li>
        
            a
            
        </li><li>
        
            b
            
        </li><li>
        
            c
            
        </li></ul></body>

Is this an issue with Xidel or the module? I had to replace xquery version "3.0"; with xquery version "3.1"; and util:parse-html with x:parse-html

[BUG] Strange "h4039" element on default landing page

Describe the bug

When loading the landing page (/main.md), the generated HTML has a strange <h4039> element inside the body/section:

<body class="container">
    <body>
        <section>
            <h4039># Supported Markdown syntax

Markdown within this element is not further processed or transformed into HTML.

Expected behavior

The page should contain valid HTML.

To Reproduce

Install app, load http://localhost:8080/exist/apps/markdown.

Context (please always complete the following information):

  • OS: macOS 11.4
  • eXist-db Version: eXist 5.3.0-SNAPSHOT e371efd9987a9a2f4414839c7bf1dbc20107b6d1 20210604033555
  • Java Version: OpenJDK 1.8.0_292-b10 (liberica-jdk8-full)
  • App Version: 0.6 (both installed from public-repo, and built from current master)

Additional context

  • How is eXist-db installed? built from source
  • Any custom changes in e.g. conf.xml? none

[BUG] Markdown interleaved in HTML blocks is mangled

Expected behavior

The author of test.md expected Markdown interleaved in HTML blocks to work.

Actual behavior

Markdown interleaved in HTML blocks is mangled.

Reproduction steps

See the pending test at https://github.com/eXist-db/exist-markdown/blob/master/test/xqs/test-suite.xqm#L309-L340.

This test takes this markdown:

<div class="row">
    <div class="col-md-6">
        First column in **two column layout**.
        
        Second paragraph.
    </div>
    <div class="col-md-6">
        Second column in two column layout.
    </div>
</div>

With this input, the markdown:parse() function should return:

<body>
    <div class="row">
        <div class="col-md-6">
            <p>First column in <strong>two column layout</strong>.</p>
            <p>Second paragraph.</p>
        </div>
        <div class="col-md-6">
            <p>Second column in two column layout.</p>
        </div>
    </div>
</body>

But it actually returns:

<body>
    <div class="row">
        <body/>
        <div class="col-md-6">
            <body>
                <p>First column in two column layout.</p>
            </body>
        </div>
    </div>
    <p>Second paragraph. <div class="col-md-6"> Second column in two column layout. </div> &lt;/div&gt;</p>
</body>

Note that (1) an empty <body/> element is inserted into the outer div, (2) the "Second paragraph" is ejected from the first inner div, and (3) the second inner div is inserted into the "Second paragraph" <p> element.

Since the parsed markdown doesn't equal the expected output, the test fails (and is marked as pending in the source until a fix is in place):

<testcase name="HTML block containing markdown" class="tests:html-block-containing-markdown">
    <failure message="assertTrue failed." type="failure-error-code-1"/>
    <output>false</output>
</testcase>

Note that the CommonMark dingus at https://spec.commonmark.org/dingus/ also produces mangled output:

<div class="row">
    <div class="col-md-6">
        First column in **two column layout**.
<pre><code>    Second paragraph.
&lt;/div&gt;
&lt;div class=&quot;col-md-6&quot;&gt;
    Second column in two column layout.
&lt;/div&gt;
</code></pre>
</div>

This suggests that a CommonMark-compliant processor may not be expected to handle interleaved HTML blocks and Markdown.

Please provide the following

  • Java Version: n/a
  • exist-db version: 6.1.0-SNAPSHOT
  • exist-markdown version: 1.0.0
  • OS version: n/a

Error on startup with current eXist develop

After starting up current eXist develop, loading the markdown app at http://localhost:8080/exist/apps/markdown/ redirects to http://localhost:8080/exist/apps/markdown/test.md, which yields the following error:

<exception>
    <path>/db/apps/markdown/parse.xql</path>
    <message>
        err:XQST0033 error found while loading module md: Error while loading module content/markdown.xql: Cannot bind prefix 'md' to 'http://exist-db.org/xquery/markdown' it is already bound to 'http://exist-db.org/metadata'
    </message>
</exception>

The key bit:

Cannot bind prefix 'md' to 'http://exist-db.org/xquery/markdown' it is already bound to 'http://exist-db.org/metadata'

The registration of this prefix appears to stretch back to 2012 - according to eXist-db/exist@c33a2fa - so it's very odd that we haven't seen this before!
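
For context, a prefix is only a module-local alias for a namespace URI, so functions are identified by the URI and local name rather than by the prefix. The sketch below illustrates this with two arbitrary prefixes (mdown is a made-up name) bound to the markdown namespace. Whether switching the library or its callers away from md would avoid the error depends on where eXist raises the clash, which this report does not establish.

```xquery
xquery version "3.1";

(: Two different prefixes bound to the same namespace URI.
   "mdown" is an arbitrary, hypothetical prefix chosen for illustration. :)
declare namespace markdown = "http://exist-db.org/xquery/markdown";
declare namespace mdown = "http://exist-db.org/xquery/markdown";

(: QName comparison ignores the prefix and compares namespace URI and local name :)
xs:QName("markdown:parse") eq xs:QName("mdown:parse")
```

This is why the other examples on this page, which import the module under the markdown prefix, address the same functions as code that uses md.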

[BUG] Curly braces in fenced code blocks are mangled

Expected behavior

Curly braces inside fenced code blocks should be left as literal curly braces.

Actual behavior

Curly braces are replaced with a <span itemprop=""> element.

Reproduction steps

See the pending test at https://github.com/eXist-db/exist-markdown/blob/master/test/xqs/test-suite.xqm#L223-L244.

This test takes this markdown:

```xquery
for $i in 1 to 10
return
    <li>{$i * 2}</li>
```

With this input, the markdown:parse() function should return:

<body>
    <pre data-language="xquery">for $i in 1 to 10
return
    &lt;li&gt;{$i * 2}&lt;/li&gt;
</pre>
</body>

The CommonMark dingus at https://spec.commonmark.org/dingus/ returns something quite similar, so our expectations are in line with CommonMark:

<pre>
    <code class="language-xquery">for $i in 1 to 10
return
    &lt;li&gt;{$i * 2}&lt;/li&gt;
</code>
</pre>

But markdown:parse() actually returns:

<body>
    <pre data-language="xquery">for $i in 1 to 10
return
    &lt;li&gt;<span itemprop="$i * 2">$i * 2</span>&lt;/li&gt;
</pre>
</body>

Note that the curly braces are transformed into a <span itemprop=""> structure - which is associated with the library's handling of "label" at https://github.com/eXist-db/exist-markdown/blob/master/content/markdown.xqm#L119-L128.
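
As a rough sketch of the expected behaviour, the raw text of a fenced block can be placed into the pre element as a single text node, without passing through any inline rule. The helper local:code-block is hypothetical and is not the library's implementation. It only shows that curly braces survive when the code is treated as plain text.

```xquery
xquery version "3.1";

(: Hypothetical helper, not the library's code: wraps the raw text of a
   fenced block in a pre element without applying any inline rules to it. :)
declare function local:code-block(
    $language as xs:string,
    $code as xs:string
) as element(pre) {
    (: $code becomes one text node, so curly braces stay literal and
       angle brackets are escaped only when the element is serialized :)
    <pre data-language="{$language}">{$code}</pre>
};

let $code := "for $i in 1 to 10&#10;return&#10;    <li>{$i * 2}</li>&#10;"
return
    serialize(local:code-block("xquery", $code))
```

The serialized result contains the literal text {$i * 2} and no span element, which is the shape the pending test expects.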

Since the parsed markdown doesn't equal the expected output, the test fails (and is marked as pending in the source until a fix is in place):

<testcase name="Code Blocks" class="tests:code-blocks">
    <failure message="assertTrue failed." type="failure-error-code-1"/>
    <output>false</output>
</testcase>

Please provide the following

  • Java Version: n/a
  • exist-db version: 6.1.0-SNAPSHOT
  • exist-markdown version: 1.0.0
  • OS version: n/a

[BUG] Parsing of `mark` element in "Inline HTML" test

Expected behavior

In inline HTML, inline elements like <mark> should be preserved.

Actual behavior

The elements are dropped from output.

Reproduction steps

See the pending test at https://github.com/eXist-db/exist-markdown/blob/master/test/xqs/test-suite.xqm#L346-L361.

This test takes this markdown:

A <span style="color: red;">paragraph <span style="color: green;">containing</span></span> some <mark>inline</mark> <code>HTML</code>.

With this input, the markdown:parse() function should return:

<body>
    <p>A <span style="color: red;">paragraph <span style="color: green;">containing</span></span> some <mark>inline</mark> <code>HTML</code>.</p>
</body>

The CommonMark dingus at https://spec.commonmark.org/dingus/ returns exactly this (sans the <body> wrapper, which exist-markdown uses to ensure its results are well-formed, and which users of the library would normally omit from output):

<p>A <span style="color: red;">paragraph <span style="color: green;">containing</span></span> some <mark>inline</mark> <code>HTML</code>.</p>

But markdown:parse() actually returns:

<body>
    <p>A <span style="color: red;">paragraph <span style="color: green;">containing</span></span> some  <code>HTML</code>.</p>
</body>

Note that the <mark>inline</mark> element was dropped from the output and replaced with an extra space character between some and <code>HTML</code>.

Since the parsed markdown doesn't equal the expected output, the test fails (and is marked as pending in the source until a fix is in place):

<testcase name="Inline HTML" class="tests:inline-html">
    <failure message="assertTrue failed." type="failure-error-code-1"/>
    <output>false</output>
</testcase>

Please provide the following

  • Java Version: n/a
  • exist-db version: 6.1.0-SNAPSHOT
  • exist-markdown version: 1.0.0
  • OS version: n/a

Add tests, or... ?

Without a test suite, fixing bugs in this library's Markdown parser risks introducing new ones.

The CommonMark tests from https://github.com/commonmark/commonmark-spec would be a natural starting point, as CommonMark is:

a standard, unambiguous syntax specification for Markdown, along with a suite of comprehensive tests to validate Markdown implementations against this specification.

To get started, I cloned the commonmark-spec repository and extracted the tests as described in its README:

gh repo clone commonmark/commonmark-spec
cd commonmark-spec
python3 test/spec_tests.py --dump-tests > commonmark-tests.json

... and I uploaded these to /db/commonmark-tests.json.

Then I developed the following query. Initially I got all errors or failures, but when I stripped out the trailing \n newline from the test's source Markdown, I got 68 passes, 570 failures, and 14 errors.

Certainly, some of the failures are caused by whitespace differences, but without a function for parsing HTML in eXist-db (!), normalizing expected and actual outputs is not possible, and thus the test suite can't tell us whether a failure is a real problem or just a meaningless whitespace issue.

xquery version "3.1";

import module namespace markdown="http://exist-db.org/xquery/markdown";

let $tests := json-doc("/db/commonmark-tests.json")
let $results :=
    for $test in $tests?*
    let $markdown := $test?markdown
    (: disregard trailing newline from the source test's expected output :) 
    let $expected-result := $test?html => replace("\n$", "")
    let $actual-result := 
        try { 
            (
                (: the parse function wraps results in a <body> element :)
                markdown:parse($markdown)/node() 
                ! serialize(., map { "method": "html", "indent": true(), "html-version": 4.0 } )
            ) 
            => string-join() 
        }
        catch * { 
            map { 
                "error": "markdown parsing error raised at " || $err:line-number || ":" || $err:column-number 
                    || ": " || $err:description 
            }
        }
    return
        map {
            "expected-result": $expected-result,
            "actual-result": $actual-result,
            "status": 
                (
                    if ($actual-result instance of map(*)) then 
                        "error"
                    else if (deep-equal($expected-result, $actual-result)) then
                        "pass"
                    else
                        "fail"
                ),
            "source": $test
        }
for $result in $results
group by $status := $result?status
order by index-of(("pass", "fail", "error"), $status)
return
    map {
        "status-group": $status,
        "number-of-results": count($result),
        "results": array { $result }
    }
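
Short of a real HTML parser, a crude string-level normalisation could at least filter out failures caused only by whitespace between tags. The sketch below assumes that such whitespace is never significant, which is not true inside pre or code elements, so it can hide real differences there. The helper name local:normalize-html is hypothetical.

```xquery
xquery version "3.1";

(: Hypothetical helper: a crude, string-level normalisation, not an HTML parser. :)
declare function local:normalize-html($html as xs:string) as xs:string {
    $html
    => replace(">\s+<", "><")   (: drop whitespace that only separates tags :)
    => normalize-space()        (: trim the ends and collapse remaining runs :)
};

let $expected := "<ul>&#10;<li>a</li>&#10;<li>b</li>&#10;</ul>&#10;"
let $actual := "<ul><li>a</li><li>b</li></ul>"
return
    (: true: the two strings differ only in whitespace between tags :)
    local:normalize-html($expected) eq local:normalize-html($actual)
```

In the query above, both $expected-result and a string-valued $actual-result could be passed through such a function before the comparison. The error map would have to be left untouched.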

Researching and fixing the failing tests would be an extensive project. It would require developing an XQuery function for parsing HTML—or shifting development to BaseX, which has an HTML parsing module.

Alternatively, an XQuery wrapper around the https://github.com/commonmark/commonmark-java or https://github.com/vsch/flexmark-java project might be a better investment.
