gitbookio / markup-it

JavaScript library to parse and serialize markup content (Markdown and HTML)

JavaScript 94.02% HTML 5.98%

markdown javascript slate wysiwyg

markup-it's Introduction

markup-it

markup-it is a JavaScript library to serialize/deserialize markdown content using an intermediate format backed by an immutable model.

Installation

$ npm i markup-it --save

$ yarn add markup-it

Usage

Parse markdown

const { State, MarkdownParser } = require('markup-it');

const state = State.create(MarkdownParser);
const document = state.deserializeToDocument('Hello **World**');

Render document to HTML

const { State, HTMLParser } = require('markup-it');

const state = State.create(HTMLParser);
const str = state.serializeDocument(document);

Render document to Markdown

const { State, MarkdownParser } = require('markup-it');

const state = State.create(MarkdownParser);
const str = state.serializeDocument(document);

ES6

markup-it is ESM compliant through the package.json module field, so you can safely use it with ES6 syntax for tree-shaking.

import { State, HTMLParser } from 'markup-it';

const state = State.create(HTMLParser);
const str = state.serializeDocument(document);

Testing

There are many scripts available in the /bin folder to output an HTML or Markdown file to multiple formats (HTML, Hyperscript, JSON, Markdown, YAML).

These scripts can be called with babel-node, for example:

babel-node bin/toJSON.js ./page.md


markup-it's Issues

Error with HTML in markdown

For markdown content:

<p>test mention <a href="http://localhost:5000/@SamyPesse">@SamyPesse</a> ddd</p>

We got errors:

Newer npm version with latest fixes

I'm sorry to bother you with this, @SamyPesse, but I noticed that there are some fixes (especially e9b1f25 and df30290) that weren't included in the latest packaged version on NPM.

Are you planning to release a new version to NPM soon?

rtrim and ltrim are fixed version

This seems to cause problems when running webpack with Node versions later than v7.0.
Is it possible to update the versions?

Ensure that document contains at least one paragraph

A document should never be deserialized with an empty list of nodes.
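
A minimal sketch of such a guard, assuming nodes are plain objects shaped like the JSON dumps shown elsewhere in these issues. `ensureParagraph` and the node shape are hypothetical illustrations, not part of the markup-it API:

```javascript
// Hypothetical helper: guarantees that a node list is never empty.
// The node shape ({ kind, type, nodes }) is assumed for illustration only.
function ensureParagraph(nodes) {
    if (nodes.length > 0) {
        return nodes;
    }

    // Fall back to a single empty paragraph.
    return [
        {
            kind: 'block',
            type: 'paragraph',
            nodes: [{ kind: 'text', text: '' }]
        }
    ];
}

const emptyResult = ensureParagraph([]);
const untouched = ensureParagraph([{ kind: 'block', type: 'heading', nodes: [] }]);
```

A non-empty list is returned unchanged, so the guard can run unconditionally after deserialization.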

nested lists are not rendered correctly

* One
  * sub-one
  * sub-two
* Two
* Three

On GitBook, this is rendered as:

<ul>
  <li>
    <p>One</p>
    <ul>
      <li>sub-one</li>
      <li>sub-two</li>
    </ul>
  </li>
  <li><p>Two</p></li>
  <li>Three</li>
</ul>

But on GitHub and CommonMark, it's rendered as:

<ul>
  <li>
    One
    <ul>
      <li>sub-one</li>
      <li>sub-two</li>
    </ul>
  </li>
  <li>Two</li>
  <li>Three</li>
</ul>

I prefer the version without paragraphs, otherwise it adds unnecessary margins between some of the list items. For now, I've added the following CSS rule to my GitBook, but it would be great if this could be fixed 😃

.markdown-section li > p {
  margin-bottom: 0 !important;
}

Checkbox parsing

See GitbookIO/community#116

Support for tables

Tables should be parsed as blocks with entities for rows and columns.

draft-markup NPM module still pointed here

It seems that you're working on migrating the React specific portions of this repo out and into https://github.com/GitbookIO/react-markup-editor -- but https://www.npmjs.com/package/draft-markup is still pointing to this Git repo. I think the NPM module needs to be deprecated or include a notice that it's no longer under development as-is on that page, NPM has good SEO and this just sent me down a weird rabbithole of "Why isn't it working?!"

Images Editing broken

When clicking on an image and dismissing the dialog, the image will be removed. The undo does not recover the operation.

Cannot install via npm

Thank you for your awesome work.

I got this message when installing 1.0.0-pre via npm:

ENOENT: no such file or directory, chmod '/***/node_modules/markup-it/bin/markup-toJSON.js'

Installing directly from GitHub works fine.

Support for HTML (inline and block)

Since unstyled text is escaped by default (see #1), users cannot write HTML directly in their text.

HTML should be parsed (using the right rules), then displayed with a yellow background.

The editor can contain a toolbar action to insert HTML.

Feature request: allow specifying image dimensions

Some markdown engines support syntax like below to specify image dimensions, but it doesn't seem to work in markup-it

![](./pic/pic1_50.png =100x20)

This is important for SVG files since there's no dimension information in the file, and also for PNG file that you want to look nice on Retina displays (logical size must be 50% of the pixel size)
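
As a rough sketch of how the `=WIDTHxHEIGHT` suffix could be recognized, the regular expression below handles only the simple form shown above (no title, no spaces in the URL). `parseImage` is a hypothetical helper, not markup-it's image rule:

```javascript
// Hypothetical parser for "![alt](src =WIDTHxHEIGHT)".
// Width and height are optional; missing values are reported as null.
const IMAGE_RE = /^!\[([^\]]*)\]\((\S+?)(?:\s+=(\d*)x(\d*))?\)$/;

function parseImage(markdown) {
    const match = IMAGE_RE.exec(markdown);
    if (!match) {
        return null;
    }

    return {
        alt: match[1],
        src: match[2],
        width: match[3] ? Number(match[3]) : null,
        height: match[4] ? Number(match[4]) : null
    };
}

const sized = parseImage('![](./pic/pic1_50.png =100x20)');
const plain = parseImage('![logo](logo.svg)');
```

Here `sized` carries `width: 100` and `height: 20`, while `plain` has both set to `null`.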

Parenthesis in links or images

markup-it fails to parse links with an href containing parentheses.

For example with the following markdown:

- [Test](hello(world).md)

markdown-it and GitHub correctly parse it as a link with href: hello(world).md. But markup-it fails.

When stringifying to Markdown, we should maybe also escape the parentheses in links.
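
Two possible ways to make parentheses safe when stringifying are sketched below: backslash escaping and percent-encoding. The helper names are hypothetical and this is not markup-it's serializer. A later issue in this list reports that backslash-escaped parentheses are not supported by GitBook 3 and suggests URI encoding, so `linkToMarkdown` uses the second option:

```javascript
// Hypothetical helpers, for illustration only.

// Option 1: backslash-escape parentheses, e.g. hello\(world\).md
function escapeHrefWithBackslashes(href) {
    return href.replace(/[()]/g, '\\$&');
}

// Option 2: percent-encode parentheses, e.g. hello%28world%29.md
function encodeHrefParentheses(href) {
    return href.replace(/\(/g, '%28').replace(/\)/g, '%29');
}

// Serialize a link using the percent-encoded form.
function linkToMarkdown(text, href) {
    return `[${text}](${encodeHrefParentheses(href)})`;
}

const escaped = escapeHrefWithBackslashes('hello(world).md');
const link = linkToMarkdown('Test', 'hello(world).md');
```

With the example from this issue, `link` is `[Test](hello%28world%29.md)`.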

Related: GitbookIO/community#386

Serialize of hr should prefix with newline if first node.

When the document starts with an HR, the serialization should prefix it with a newline to avoid conflicts with frontmatter.
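
A minimal sketch of that rule, assuming the blocks have already been serialized to Markdown strings and that the HR is written as `---`. `joinBlocks` is a hypothetical helper, not the markup-it serializer:

```javascript
// Hypothetical helper: joins serialized blocks and guards against a
// leading '---', which would otherwise be read as a frontmatter delimiter.
function joinBlocks(blocks) {
    const output = blocks.join('\n\n');
    const startsWithHr = /^---[ \t]*(\r?\n|$)/.test(output);

    return startsWithHr ? '\n' + output : output;
}

const withLeadingHr = joinBlocks(['---', 'Hello']);
const withoutLeadingHr = joinBlocks(['Hello', '---']);
```

Only a document whose first line is the HR gets the extra newline; an HR elsewhere in the document is left alone.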

Support parsing of definition lists

We should maybe support a syntax for definition lists (already supported by Kramdown and Pandoc):

kramdown
: A Markdown-superset converter

But:

❌ GitHub doesn't support it.
❌ Hard to implement in the editor
❌ It's possible to use HTML in markdown

Utilities to manipulate tables

draft-js has RichUtils, we should provide utilities to manipulate tables:

// Create a new table
DraftMarkup.Table.create(ContentState, columns, rows) -> ContentState

// Add rows to a table
DraftMarkup.Table.addRows(ContentState, blockKey, n || 1) -> ContentState

// Add columns to a table
DraftMarkup.Table.addColumns(ContentState, blockKey, n || 1) -> ContentState

Trailing space + line break in links

Concerns the Markdown to HTML test for fixture links_reference_style.md

Here's another where the [link 
breaks] across lines, but with a line-ending space.

After the word link there is a trailing space and a line break. The fixture says the trailing space should not be kept in the HTML output:

<p>Here&apos;s another where the <a href="/url/">link
breaks</a> across lines, but with a line-ending space.</p>

Instead, there is still a trailing space.

[BUG] parse footnote error

eg:

[^1]: aaa\n\n[^2]: bbb

Fails to detect inside of HTML tag

If the inner text of an HTML tag can be found in the tag attributes itself, then we fail to parse it properly:

<a href="mylink">mylink</a>

yields:

      -          {
      -            "data": {
      -              "html": "<a href=\""
      -            }
      -            "isVoid": true
      -            "kind": "inline"
      -            "nodes": [
      -              {
      -                "kind": "text"
      -                "ranges": [
      -                  {
      -                    "kind": "range"
      -                    "marks": []
      -                    "text": " "
      -                  }
      -                ]
      -              }
      -            ]
      -            "type": "html"
      -          }
      -          {
      -            "kind": "text"
      -            "ranges": [
      -              {
      -                "kind": "range"
      -                "marks": []
      -                "text": "mylink"
      -              }
      -            ]
      -          }

Text without style should be processed as style of type "unstyled"

Currently, inlineStyleRanges does not contain ranges for unstyled text, so markdown syntax in that text is not escaped correctly.

The workflow for applying inlineStyleRanges should be:

1. Linearize ranges
2. Fill empty ranges (text without ranges) with { offset: X, length: N, style: 'unstyled' }

Since the method is called recursively, text inside bold/italic/... will be correctly escaped when required.
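
The "fill empty ranges" step can be sketched as follows. It assumes the ranges are already linearized (sorted by offset and non-overlapping) and use the `{ offset, length, style }` shape from the description above. `fillUnstyledRanges` is a hypothetical name, not an existing markup-it function:

```javascript
// Hypothetical helper: covers every gap between styled ranges
// with a range of style 'unstyled'.
function fillUnstyledRanges(textLength, ranges) {
    const result = [];
    let cursor = 0;

    for (const range of ranges) {
        // Gap before this range: fill it with an unstyled range.
        if (range.offset > cursor) {
            result.push({
                offset: cursor,
                length: range.offset - cursor,
                style: 'unstyled'
            });
        }

        result.push(range);
        cursor = range.offset + range.length;
    }

    // Trailing text after the last styled range.
    if (cursor < textLength) {
        result.push({
            offset: cursor,
            length: textLength - cursor,
            style: 'unstyled'
        });
    }

    return result;
}

// 'Hello World' with 'World' in bold.
const filled = fillUnstyledRanges(11, [{ offset: 6, length: 5, style: 'BOLD' }]);
```

In this example `filled` contains an unstyled range for `Hello ` (offset 0, length 6) followed by the original BOLD range, so every character belongs to exactly one range.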

Underline formatting is treated as Bold when serialising from document to markdown

When serialising a document, underline formatting is treated as bold. In my use case I want underline formatting translated to its own markdown symbol.

<img> tag with 'alt' parameter doesn't parse correctly

this works:

<img src="images/foo.png" width="111" style="margin: 0 auto; display: block">

this doesn't:

<img src="images/foo.png" alt="foo" width="111" style="margin: 0 auto; display: block">

error is TypeError: Cannot read property 'skip' of undefined

Support for loose lists

Markdown supports two types of list: loose and normal.

Loose lists have items separated by blank lines, and output paragraphs inside list items.

Reference: http://spec.commonmark.org/0.24/#list

Questions

How should we distinguish these lists?
- Using different token types?
  - It could be a problem in draft
- By parsing inner content as paragraph (for loose) or unstyled (for normal)

Add support for single quotes in links URL

Right now, the following Markdown:

This is an [example](https://example.com/link's_example).

Is parsed to:

This is an [example](<link href="https://example.com/link's_example">https://example.com/link's_example</link>)

Instead of:

This is an <link href="https://example.com/link's_example">example</link>.

GitHub supports it, as well as other Markdown editors I played with.

Tabs in code blocks should not be replaced with spaces

Concerns the Markdown to HTML test for fixture markdown_documentation_syntax.md

Source:

Markdown provides backslash escapes for the following characters:

    \   backslash
    `   backtick
    *   asterisk
    _   underscore
    {}  curly braces
    []  square brackets
    ()  parentheses
    #   hash mark
    +   plus sign
    -   minus sign (hyphen)
    .   dot
    !   exclamation mark

Corresponding HTML file:

<pre><code>\   backslash
`   backtick
*   asterisk
_   underscore
{}  curly braces
[]  square brackets
()  parentheses
#   hash mark
+   plus sign
-   minus sign (hyphen)
.   dot
!   exclamation mark
</code></pre>

The tab (\t) between + and plus is expected to be replaced with spaces. We should look at how other parsers behave, and probably modify the test to expect the tabs to be left untouched.

Link parsing error

Failure to parse https://raw.githubusercontent.com/mozilla-neutrino/neutrino-dev/master/README.md

TypeError: Cannot read property 'get' of undefined
    at Object.resolveRef (/user_code/node_modules/markup-it/lib/markdown/utils.js:119:20)
    at /user_code/node_modules/markup-it/lib/markdown/inlines/link.js:131:22
    at /user_code/node_modules/markup-it/lib/models/deserializer.js:50:24
    at Deserializer.<anonymous> (/user_code/node_modules/markup-it/lib/models/rule-function.js:62:28)
    at Deserializer.exec (/user_code/node_modules/markup-it/lib/models/rule-function.js:154:25)
    at Function.exec (/user_code/node_modules/markup-it/lib/models/rule-function.js:171:57)
    at /user_code/node_modules/markup-it/lib/models/rule-function.js:100:45
    at Array.some (native)
    at /user_code/node_modules/markup-it/lib/models/rule-function.js:99:30
    at Deserializer.<anonymous> (/user_code/node_modules/markup-it/lib/models/rule-function.js:62:28)
    at Deserializer.exec (/user_code/node_modules/markup-it/lib/models/rule-function.js:154:25)
    at Function.exec (/user_code/node_modules/markup-it/lib/models/rule-function.js:171:57)
    at /user_code/node_modules/markup-it/lib/models/state.js:370:41
    at List.__iterate (/user_code/node_modules/immutable/dist/immutable.js:2206:13)
    at List.forEach (/user_code/node_modules/immutable/dist/immutable.js:4381:19)
    at State.applyRules (/user_code/node_modules/markup-it/lib/models/state.js:369:16)
    at State.lex (/user_code/node_modules/markup-it/lib/models/state.js:329:39)
    at State.deserialize (/user_code/node_modules/markup-it/lib/models/state.js:388:74)
    at /user_code/node_modules/markup-it/lib/markdown/blocks/paragraph.js:37:37
    at /user_code/node_modules/markup-it/lib/models/deserializer.js:50:24
    at Deserializer.<anonymous> (/user_code/node_modules/markup-it/lib/models/rule-function.js:62:28)
    at Deserializer.exec (/user_code/node_modules/markup-it/lib/models/rule-function.js:154:25)
    at Function.exec (/user_code/node_modules/markup-it/lib/models/rule-function.js:171:57)
    at /user_code/node_modules/markup-it/lib/models/state.js:370:41
    at List.__iterate (/user_code/node_modules/immutable/dist/immutable.js:2206:13)
    at List.forEach (/user_code/node_modules/immutable/dist/immutable.js:4381:19)
    at State.applyRules (/user_code/node_modules/markup-it/lib/models/state.js:369:16)

Newlines in paragraph are not converted to spaces

Hello
World

Without two trailing spaces after Hello, this input still generates a line break:

Hello
World

More footnote support

Add title on footnote ref

Setting a title has a strong advantage: modern browsers show a tooltip for the footnote (https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/title). This makes footnotes more useful when browsing a site.

However, this is hard to implement because a one-to-one correspondence between a markdown footnote reference and an HTML tag is not possible.

ref: https://github.com/GitbookIO/markup-it/blob/master/syntaxes/markdown/inline.js#L12

Remove sup tag for footnote id on reffn

Currently, footnote ids are rendered as '' + refname + '. '. However, the rendered text looks odd, and fixing its appearance requires special support from the theme.

We should use ol or '' + refname + '.'.

ref:

markup-it/syntaxes/html/blocks.js

Line 36 in b610171

MarkupIt.Rule(MarkupIt.BLOCKS.FOOTNOTE)

Support for named markdown images

See https://github.com/GitbookIO/gitbook/issues/1447

Support _emphasis_followed by text and not whitespace

The following markdown:

Support _emphasis_followed by text and not whitespace

Should be treated as:

Support <em>emphasis</em>followed by text and not whitespace

Which seems to be a standard: https://daringfireball.net/projects/markdown/syntax#em

Incorrect parsing of inline maths

$$R_+$$ as the interval $$[c, r_+]$$ where $$r_+$$ is a pt to the right of c such that Pr$$[c, r_+]$$ is $$\epsilon$$

Support image descriptions

From GitbookIO/gitbook-markdown#12

Gitbook 2.4.3 renders such image descriptions as:

http://mygitbook.example.org/test/emergencies/example.png%20%22title%20

The CommonMark spec about this is : http://spec.commonmark.org/0.24/#image-description

Parsing of html discard some whitespaces before links

When parsing the following HTML, everything works fine except that some whitespace is discarded:

<meta charset='utf-8'><h1 style="box-sizing: border-box; font-size: 2em; margin-top: 0px !important; margin-right: 0px; margin-bottom: 16px; margin-left: 0px; font-weight: 600; line-height: 1.25; padding-bottom: 0.3em; border-bottom: 1px solid rgb(238, 238, 238); color: rgb(51, 51, 51); font-family: -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;, &quot;Segoe UI Symbol&quot;; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px;">GitBook Editor</h1><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(51, 51, 51); font-family: -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;, &quot;Segoe UI Symbol&quot;; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><a href="https://travis-ci.com/GitbookIO/editor" style="box-sizing: border-box; background-color: transparent; color: rgb(64, 120, 192); text-decoration: none;"><img src="https://camo.githubusercontent.com/6d1da44ed119a2923b10f9208a38a9a7374f85a0…656e3d4c6759536e556654314b3878727a55686b736f79266272616e63683d6d6173746572" alt="Build Status" data-canonical-src="https://travis-ci.com/GitbookIO/editor.svg?token=LgYSnUfT1K8xrzUhksoy&amp;branch=master" style="box-sizing: content-box; border-style: none; max-width: 100%; background-color: rgb(255, 255, 255);"></a><span class="Apple-converted-space"> </span><a href="https://ci.appveyor.com/project/GitBook/editor" 
style="box-sizing: border-box; background-color: transparent; color: rgb(64, 120, 192); text-decoration: none;"><img src="https://camo.githubusercontent.com/59a0ca6af837955bd21b9ce940f3afd24d212fb3…656374732f7374617475732f38736b78636462716263736a6a7768333f7376673d74727565" alt="Build status" data-canonical-src="https://ci.appveyor.com/api/projects/status/8skxcdbqbcsjjwh3?svg=true" style="box-sizing: content-box; border-style: none; max-width: 100%; background-color: rgb(255, 255, 255);"></a></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(51, 51, 51); font-family: -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;, &quot;Segoe UI Symbol&quot;; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px;">This repository contain the source code for both the webeditor and the desktop version. 
The editor is built using only web technologies.</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(51, 51, 51); font-family: -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;, &quot;Segoe UI Symbol&quot;; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px;">The editor is built using<span class="Apple-converted-space"> </span><a href="http://facebook.github.io/react/" style="box-sizing: border-box; background-color: transparent; color: rgb(64, 120, 192); text-decoration: none;">React</a>. The desktop version is built and packaged using<span class="Apple-converted-space"> </span><a href="http://electron.atom.io/" style="box-sizing: border-box; background-color: transparent; color: rgb(64, 120, 192); text-decoration: none;">Electron</a>.</p>

Math is not rendered in HTML

Given the contents of foo.md:

The expression $$x = y$$ is boring.

When I run this command:

$ markup-toHTML foo.md

I get this output:

<p>The expression  is boring.</p>

The math has been removed.

I expect to get this:

<p>The expression $$x = y$$ is boring.</p>

Or perhaps this:

<p>The expression <math>x = y</math> is boring.</p>

(Preferably the former — keeping $$ — so that I can then pass the output to something like KaTeX.)

I get the same behaviour when using the API:

const { State } = require("markup-it");
const markdown = require("markup-it/lib/markdown");
const html = require("markup-it/lib/html");

const text = "The expression $$x = y$$ is boring.";

const mdState = State.create(markdown);
const document = mdState.deserializeToDocument(text);
const htmlState = State.create(html);
// console prints "<p>The expression  is boring.</p>"
console.log(htmlState.serializeDocument(document));

I've tried following through the logic of the conversion, but I haven't found anything obvious. My only clue is that the math element is represented as a node with isVoid: true. It might be that void nodes aren't converted to HTML, but I haven't found the line that would drop such nodes.

Adding lines in code blocks breaks layout

When hitting Enter to insert a new line inside a code block (fence), the new line is not inside the current element.

See image:

numbered bullets incorrectly rendering as 1, I, or i

Hi, I'm noticing rendering issues with the new render engine. Here's one that's pretty significant for our tutorials.

Markdown used

Gitbook 3.1.1

Screenshot using 3.1.1. This looks correct to us.

Gitbook 3.2.0

Screenshot using 3.2.0. Notice instead of 1 2 3, it's 1, I, i

Text.getRanges method is deprecated in slatejs

The console throws an error: Uncaught TypeError: text.getRanges is not a function

The Text.getRanges() method is now Text.getLeaves(). Please update the call.

Code in tables

GitbookIO/community#285

Parse style attribute during HTML deserialization

The HTML deserialization currently uses tags and class names to detect nodes and marks.

It could also use style="" attribute to detect marks.

For example: This is bold should be parsed as a text with a BOLD mark.

It will improve parsing when copying content from Word into the GitBook Editor.
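
A rough sketch of style-based mark detection is shown below. The mark names (`BOLD`, `ITALIC`, `UNDERLINE`) and the `marksFromStyle` helper are assumptions made for illustration and are not taken from the markup-it source:

```javascript
// Hypothetical helper: maps the value of a style="" attribute
// to a list of mark names.
function marksFromStyle(style) {
    // Parse "prop: value; prop: value" into a lookup table.
    const declarations = {};
    for (const part of (style || '').split(';')) {
        const index = part.indexOf(':');
        if (index === -1) continue;

        const property = part.slice(0, index).trim().toLowerCase();
        const value = part.slice(index + 1).trim().toLowerCase();
        if (property) declarations[property] = value;
    }

    const marks = [];
    const weight = declarations['font-weight'];

    // Treat keyword weights and numeric weights of 600+ as bold.
    if (weight === 'bold' || weight === 'bolder' || Number(weight) >= 600) {
        marks.push('BOLD');
    }
    if (declarations['font-style'] === 'italic') {
        marks.push('ITALIC');
    }
    if ((declarations['text-decoration'] || '').includes('underline')) {
        marks.push('UNDERLINE');
    }

    return marks;
}

const boldItalic = marksFromStyle('font-weight: bold; font-style: italic');
const none = marksFromStyle('color: red');
```

The 600 threshold is a design choice for this sketch; content pasted from word processors often uses numeric weights such as 700.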

Backticks followed by text with no spaces is not parsed

Related to GitbookIO/community#253

Import/Export for ProseMirror format

Markup-It can be used to generate/import JSON format:

{
  "type": "doc",
  "content": [
    {
      "type": "heading",
      "attrs": {
        "level": 2
      },
      "content": [
        {
          "type": "text",
          "text": "Hello World!"
        }
      ]
    },
    {
      "type": "paragraph",
      "content": [
        {
          "type": "text",
          "text": "This is "
        },
        {
          "type": "text",
          "marks": [
            {
              "_": "em"
            },
            {
              "_": "strong"
            }
          ],
          "text": "an"
        },
        {
          "type": "text",
          "text": " editor."
        }
      ]
    },
    {
      "type": "horizontal_rule"
    },
    {
      "type": "paragraph",
      "content": [
        {
          "type": "image",
          "attrs": {
            "src": "http://prosemirror.net/img/logo.png",
            "alt": "",
            "title": ""
          }
        },
        {
          "type": "text",
          "text": " dd"
        }
      ]
    }
  ]
}

Line separator should not be "/n" (sic)

markup-it/src/html/parse.js

Line 303 in ad90bfb

sep = sep || detectNewLine(text) || '/n';

^ found this reading the code. It's falling back on using "/n" not "\n" as the line break if detection fails, which is nonsensical. detect-newline already contains code to cover this - detectNewline.graceful(x) will return "\n" for this case.
https://github.com/sindresorhus/detect-newline/blob/master/index.js#L23
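
To illustrate the intended fallback without depending on the detect-newline package, here is a self-contained sketch. `detectLineSeparator` is a hypothetical stand-in that picks the most common line ending and falls back to "\n" when the text has none:

```javascript
// Hypothetical stand-in for newline detection with a safe fallback.
function detectLineSeparator(text) {
    const newlines = text.match(/\r?\n/g) || [];

    // No line ending found: fall back to "\n" (not "/n").
    if (newlines.length === 0) {
        return '\n';
    }

    const crlf = newlines.filter(newline => newline === '\r\n').length;
    const lf = newlines.length - crlf;

    return crlf > lf ? '\r\n' : '\n';
}

const fallback = detectLineSeparator('single line');
const windows = detectLineSeparator('a\r\nb\r\nc');
```

With a function like this, the `|| '/n'` fallback in the quoted line becomes unnecessary because the detection itself never returns an empty result.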

"align" attribute of table is an array instead of a list

Related: GitbookIO/react-rich-diff#4

It'll be a breaking change, and we'll have to update the slate-edit-table plugin and the rendering of tables in the editor

Use real block entity

In draft-js, currently entities are "inlined". But as soon as facebookarchive/draft-js#157 is merged, we can support block entities for:

code blocks
tables
footnotes

Escaped parenthesis in output are not supported by gitbook 3

All fine in the Editor.

Not fine in GitBook v3

We should move to URI encoding as the preferred way for links.

Avoid requirements for a "text" rule

The text rule should be avoided, since an incomplete regex could lead to an infinite loop.

The parser could test character by character if a rule can be applied on the next one.

We can also do both:

text rule with regexp to go fast when possible
But always move from a minimum of one character and append text style if needed
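
The combined approach can be sketched as a small tokenizer loop. The rule shape (`{ type, regex }`) and the `tokenize` function are hypothetical and much simpler than markup-it's rule system. The point is that every iteration consumes at least one character, so the loop terminates even when no rule matches:

```javascript
// Hypothetical tokenizer: rules are { type, regex } pairs whose
// regexes are anchored at the start of the remaining input.
function tokenize(text, rules) {
    const tokens = [];
    let rest = text;

    while (rest.length > 0) {
        let matched = false;

        for (const rule of rules) {
            const match = rule.regex.exec(rest);

            // Ignore empty matches: they would not consume any input.
            if (match && match.index === 0 && match[0].length > 0) {
                tokens.push({ type: rule.type, text: match[0] });
                rest = rest.slice(match[0].length);
                matched = true;
                break;
            }
        }

        if (!matched) {
            // No rule applies: consume exactly one character as text,
            // appending it to the previous text token when possible.
            const last = tokens[tokens.length - 1];
            if (last && last.type === 'text') {
                last.text += rest[0];
            } else {
                tokens.push({ type: 'text', text: rest[0] });
            }
            rest = rest.slice(1);
        }
    }

    return tokens;
}

const rules = [
    { type: 'bold', regex: /^\*\*[^*]+\*\*/ },
    // Fast path: a run of characters that cannot start another rule.
    { type: 'text', regex: /^[^*]+/ }
];

const tokens = tokenize('Hello **World** *', rules);
```

The trailing `*` matches neither rule, so it is consumed by the one-character fallback and appended to the preceding text token.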

UL and OL in series are parsed as one list

markup-it, marked and kramed all fail to parse the following snippet the way GitHub does:

1. First point.
2. Second point.
3. Third point.

- Bullet point.
- Another bullet point.

markup-it parses it as:

1. First point.
2. Second point.
3. Third point.
4. Bullet point.
5. Another bullet point.

Related: GitbookIO/community#272