markup-it's Introduction

markup-it

markup-it is a JavaScript library to serialize/deserialize markdown content using an intermediate format backed by an immutable model.

Installation

$ npm i markup-it --save

or

$ yarn add markup-it

Usage

Parse markdown

const { State, MarkdownParser } = require('markup-it');

const state = State.create(MarkdownParser);
const document = state.deserializeToDocument('Hello **World**');

Render document to HTML

const { State, HTMLParser } = require('markup-it');

const state = State.create(HTMLParser);
const str = state.serializeDocument(document);

Render document to Markdown

const { State, MarkdownParser } = require('markup-it');

const state = State.create(MarkdownParser);
const str = state.serializeDocument(document);

ES6

markup-it is ESM compliant through the package.json module field, so you can safely use it with ES6 syntax for tree-shaking.

import { State, HTMLParser } from 'markup-it';

const state = State.create(HTMLParser);
const str = state.serializeDocument(document);

Testing

Several scripts in the /bin folder convert an HTML or Markdown file to other formats (HTML, Hyperscript, JSON, Markdown, YAML).

These scripts can be called with babel-node, for example:

babel-node bin/toJSON.js ./page.md

markup-it's People

Contributors

aarono, bazzargh, gaeldestrem, halftheopposite, jpreynat, samypesse, soreine, todvora, zhouzi

markup-it's Issues

Error with HTML in markdown

For markdown content:

<p>test mention <a href="http://localhost:5000/@SamyPesse">@SamyPesse</a> ddd</p>

This produces errors:

[Screenshot of the errors, taken 2017-01-24 at 10:40:58]

Nested lists are not rendered correctly

* One
  * sub-one
  * sub-two
* Two
* Three

On GitBook, this is rendered as:

<ul>
  <li>
    <p>One</p>
    <ul>
      <li>sub-one</li>
      <li>sub-two</li>
    </ul>
  </li>
  <li><p>Two</p></li>
  <li>Three</li>
</ul>

But on GitHub and CommonMark, it's rendered as:

<ul>
  <li>
    One
    <ul>
      <li>sub-one</li>
      <li>sub-two</li>
    </ul>
  </li>
  <li>Two</li>
  <li>Three</li>
</ul>

I prefer the version without paragraphs, otherwise it adds unnecessary margins between some of the list items. For now, I've added the following CSS rule to my GitBook, but it would be great if this could be fixed 😃

.markdown-section li > p {
  margin-bottom: 0 !important;
}

Support for tables

Tables should be parsed as blocks with entities for rows and columns.

Image editing broken

Clicking on an image and then dismissing the dialog removes the image, and undo does not restore it.

Cannot install via npm

Thank you for your awesome work.

I get this message when installing 1.0.0-pre via npm:

ENOENT: no such file or directory, chmod '/***/node_modules/markup-it/bin/markup-toJSON.js'

Installing directly from GitHub works fine.

Support for HTML (inline and block)

Since unstyled text is escaped by default (see #1), users will not be able to write HTML directly in their text.

HTML should be parsed (using the right rules), then displayed with a yellow background.

The editor can contain a toolbar action to insert HTML.

Feature request: allow specifying image dimensions

Some markdown engines support a syntax like the one below to specify image dimensions, but it doesn't seem to work in markup-it:

![](./pic/pic1_50.png =100x20)

This is important for SVG files, since the file carries no dimension information, and also for PNG files that should look sharp on Retina displays (the logical size must be 50% of the pixel size).
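
As a rough illustration of the requested syntax, the sketch below recognizes an optional `=WIDTHxHEIGHT` suffix with a regular expression. `parseImage` and `IMAGE_RE` are hypothetical names introduced for this example; they are not part of markup-it, and image titles are not handled.

```javascript
// Hypothetical helper, not part of markup-it's API.
// Matches ![alt](src =WIDTHxHEIGHT); the size suffix is optional,
// and either dimension may be left empty (e.g. "=100x" or "=x20").
const IMAGE_RE = /^!\[([^\]]*)\]\(\s*(\S+?)(?:\s+=(\d*)x(\d*))?\s*\)$/;

function parseImage(markdown) {
  const match = IMAGE_RE.exec(markdown);
  if (!match) return null;
  const [, alt, src, width, height] = match;
  return {
    alt,
    src,
    // A missing or empty dimension is reported as null.
    width: width ? Number(width) : null,
    height: height ? Number(height) : null,
  };
}

console.log(parseImage('![](./pic/pic1_50.png =100x20)'));
```

For the example from this issue, the result has src './pic/pic1_50.png', width 100 and height 20.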

Parentheses in links or images

markup-it fails to parse links with an href containing parentheses.

For example with the following markdown:

- [Test](hello(world).md)

markdown-it and GitHub correctly parse it as a link with href: hello(world).md, but markup-it fails.

When stringifying to Markdown, we should maybe also escape the parentheses in links.

Related: GitbookIO/community#386
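
One possible approach, sketched below, is to track parenthesis depth while reading the destination and to backslash-escape parentheses when serializing. Both helpers are hypothetical and not part of markup-it; the sketch treats a backslash before any character as an escape, which is looser than CommonMark.

```javascript
// Hypothetical helpers, not part of markup-it's API.

// Reads a link destination starting right after the "(" that follows
// "[text]". Balanced parentheses are kept as part of the href;
// backslash-escaped characters are taken literally.
function readLinkDestination(input, start) {
  let depth = 0;
  let href = '';
  for (let i = start; i < input.length; i += 1) {
    const char = input[i];
    if (char === '\\' && i + 1 < input.length) {
      href += input[i + 1];
      i += 1;
    } else if (char === '(') {
      depth += 1;
      href += char;
    } else if (char === ')') {
      // An unmatched ")" closes the link destination.
      if (depth === 0) return { href, end: i };
      depth -= 1;
      href += char;
    } else {
      href += char;
    }
  }
  return null; // no closing parenthesis found
}

// Escapes parentheses so the serialized Markdown parses unambiguously.
function escapeLinkDestination(href) {
  return href.replace(/[()]/g, '\\$&');
}

const source = '[Test](hello(world).md)';
const start = source.indexOf('(') + 1;
console.log(readLinkDestination(source, start).href);
console.log(escapeLinkDestination('hello(world).md'));
```

The escaped form round-trips: reading it back with `readLinkDestination` yields the original href.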

Support parsing of definition lists

We should maybe support a syntax for definition lists (already supported by Kramdown and Pandoc):

kramdown
: A Markdown-superset converter

But:

  • ❌ GitHub doesn't support it.
  • ❌ Hard to implement in the editor
  • ❌ It's possible to use HTML in markdown

Utilities to manipulate tables

draft-js has RichUtils, we should provide utilities to manipulate tables:

// Create a new table
DraftMarkup.Table.create(ContentState, columns, rows) -> ContentState

// Add rows to a table
DraftMarkup.Table.addRows(ContentState, blockKey, n || 1) -> ContentState

// Add columns to a table
DraftMarkup.Table.addColumns(ContentState, blockKey, n || 1) -> ContentState

Trailing space + line break in links

Concerns the Markdown to HTML test for fixture links_reference_style.md

Here's another where the [link 
breaks] across lines, but with a line-ending space.

After the word link there is a trailing space and a line break. The fixture says the trailing space should not be kept in the HTML output:

<p>Here&apos;s another where the <a href="/url/">link
breaks</a> across lines, but with a line-ending space.</p>

Instead, there is still a trailing space.

Fails to detect the inner text of an HTML tag

If the inner text of an HTML tag can be found in the tag attributes itself, then we fail to parse it properly:

<a href="mylink">mylink</a>

yields:

      -          {
      -            "data": {
      -              "html": "<a href=\""
      -            }
      -            "isVoid": true
      -            "kind": "inline"
      -            "nodes": [
      -              {
      -                "kind": "text"
      -                "ranges": [
      -                  {
      -                    "kind": "range"
      -                    "marks": []
      -                    "text": " "
      -                  }
      -                ]
      -              }
      -            ]
      -            "type": "html"
      -          }
      -          {
      -            "kind": "text"
      -            "ranges": [
      -              {
      -                "kind": "range"
      -                "marks": []
      -                "text": "mylink"
      -              }
      -            ]
      -          }

Text without style should be processed as style of type "unstyled"

Currently, inlineStyleRanges does not contain ranges for unstyled text, which prevents the markdown syntax from being escaped correctly.

The workflow for applying inlineStyleRanges should be:

  • Linearize ranges
  • Fill empty ranges (text without ranges) with { offset: X, length: N, style: 'unstyled' }

Since the method is called recursively, text inside bold/italic/... will be correctly escaped when required.
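
The fill step of this workflow can be sketched as follows. `fillUnstyledRanges` is a hypothetical helper, not part of markup-it or draft-js, and it assumes the input ranges have already been linearized (no overlaps).

```javascript
// Hypothetical helper, not part of markup-it or draft-js.
// Assumes `ranges` do not overlap, i.e. they are already linearized.
function fillUnstyledRanges(textLength, ranges) {
  const sorted = [...ranges].sort((a, b) => a.offset - b.offset);
  const result = [];
  let cursor = 0;

  for (const range of sorted) {
    // Cover the gap before this range with an "unstyled" range.
    if (range.offset > cursor) {
      result.push({ offset: cursor, length: range.offset - cursor, style: 'unstyled' });
    }
    result.push(range);
    cursor = range.offset + range.length;
  }
  // Cover any text left after the last styled range.
  if (cursor < textLength) {
    result.push({ offset: cursor, length: textLength - cursor, style: 'unstyled' });
  }
  return result;
}

// "Hello World!" (12 characters) with "World" in bold.
console.log(fillUnstyledRanges(12, [{ offset: 6, length: 5, style: 'BOLD' }]));
```

Here the bold range ends up surrounded by two unstyled ranges, covering offsets 0 to 6 and 11 to 12.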

<img> tag with 'alt' parameter doesn't parse correctly

this works:

<img src="images/foo.png" width="111" style="margin: 0 auto; display: block">

this doesn't:

<img src="images/foo.png" alt="foo" width="111" style="margin: 0 auto; display: block">

error is TypeError: Cannot read property 'skip' of undefined

Support for loose lists

Markdown supports two types of list: loose and normal (called "tight" in CommonMark).

Loose lists have items separated by blank lines, and their items are rendered as paragraphs.

Reference: http://spec.commonmark.org/0.24/#list

Questions
  • How should we distinguish these lists?
    • Using different token types?
      • It could be a problem in draft
    • By parsing inner content as paragraph (for loose) or unstyled (for normal)
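
To illustrate the second option, here is a simplified sketch that picks the inner block type from the list source. `isLooseList` and `innerBlockType` are hypothetical names; the check only looks for a blank line inside a flat list and ignores nested lists and code blocks, so it is far less precise than the CommonMark definition.

```javascript
// Hypothetical helpers, not part of markup-it's API.
// Simplification: a flat list is treated as loose when it contains
// a blank line followed by more content.
function isLooseList(source) {
  return /\n[ \t]*\n/.test(source.trimEnd());
}

// Loose items are parsed as paragraphs, normal items as unstyled text.
function innerBlockType(source) {
  return isLooseList(source) ? 'paragraph' : 'unstyled';
}

console.log(innerBlockType('* One\n* Two\n'));
console.log(innerBlockType('* One\n\n* Two\n'));
```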

Add support for single quotes in link URLs

Right now, the following Markdown:

This is an [example](https://example.com/link's_example).

Is parsed to:

This is an [example](<link href="https://example.com/link's_example">https://example.com/link's_example</link>)

Instead of:

This is an <link href="https://example.com/link's_example">example</link>.

GitHub supports it, as well as other Markdown editors I played with.

Tabs in code blocks should not be replaced with spaces

Concerns the Markdown to HTML test for fixture markdown_documentation_syntax.md

Source:

Markdown provides backslash escapes for the following characters:

    \   backslash
    `   backtick
    *   asterisk
    _   underscore
    {}  curly braces
    []  square brackets
    ()  parentheses
    #   hash mark
    +   plus sign
    -   minus sign (hyphen)
    .   dot
    !   exclamation mark

Corresponding HTML file:

<pre><code>\   backslash
`   backtick
*   asterisk
_   underscore
{}  curly braces
[]  square brackets
()  parentheses
#   hash mark
+   plus sign
-   minus sign (hyphen)
.   dot
!   exclamation mark
</code></pre>

The tabs (\t), such as the one between + and plus, are expected to be replaced with spaces. We should look at how other parsers behave, and probably modify the test to expect the tabs to be left untouched.

Link parsing error

Failure to parse https://raw.githubusercontent.com/mozilla-neutrino/neutrino-dev/master/README.md

TypeError: Cannot read property 'get' of undefined
    at Object.resolveRef (/user_code/node_modules/markup-it/lib/markdown/utils.js:119:20)
    at /user_code/node_modules/markup-it/lib/markdown/inlines/link.js:131:22
    at /user_code/node_modules/markup-it/lib/models/deserializer.js:50:24
    at Deserializer.<anonymous> (/user_code/node_modules/markup-it/lib/models/rule-function.js:62:28)
    at Deserializer.exec (/user_code/node_modules/markup-it/lib/models/rule-function.js:154:25)
    at Function.exec (/user_code/node_modules/markup-it/lib/models/rule-function.js:171:57)
    at /user_code/node_modules/markup-it/lib/models/rule-function.js:100:45
    at Array.some (native)
    at /user_code/node_modules/markup-it/lib/models/rule-function.js:99:30
    at Deserializer.<anonymous> (/user_code/node_modules/markup-it/lib/models/rule-function.js:62:28)
    at Deserializer.exec (/user_code/node_modules/markup-it/lib/models/rule-function.js:154:25)
    at Function.exec (/user_code/node_modules/markup-it/lib/models/rule-function.js:171:57)
    at /user_code/node_modules/markup-it/lib/models/state.js:370:41
    at List.__iterate (/user_code/node_modules/immutable/dist/immutable.js:2206:13)
    at List.forEach (/user_code/node_modules/immutable/dist/immutable.js:4381:19)
    at State.applyRules (/user_code/node_modules/markup-it/lib/models/state.js:369:16)
    at State.lex (/user_code/node_modules/markup-it/lib/models/state.js:329:39)
    at State.deserialize (/user_code/node_modules/markup-it/lib/models/state.js:388:74)
    at /user_code/node_modules/markup-it/lib/markdown/blocks/paragraph.js:37:37
    at /user_code/node_modules/markup-it/lib/models/deserializer.js:50:24
    at Deserializer.<anonymous> (/user_code/node_modules/markup-it/lib/models/rule-function.js:62:28)
    at Deserializer.exec (/user_code/node_modules/markup-it/lib/models/rule-function.js:154:25)
    at Function.exec (/user_code/node_modules/markup-it/lib/models/rule-function.js:171:57)
    at /user_code/node_modules/markup-it/lib/models/state.js:370:41
    at List.__iterate (/user_code/node_modules/immutable/dist/immutable.js:2206:13)
    at List.forEach (/user_code/node_modules/immutable/dist/immutable.js:4381:19)
    at State.applyRules (/user_code/node_modules/markup-it/lib/models/state.js:369:16)

More footnote support

  • Add title on footnote ref

Setting a title has a strong advantage: modern browsers show it as a tooltip for the footnote (https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/title), which makes footnotes more useful when browsing the site.

However, this complicates the implementation, because a one-to-one correspondence between a markdown footnote reference and an HTML tag is not possible.

ref: https://github.com/GitbookIO/markup-it/blob/master/syntaxes/markdown/inline.js#L12

  • Remove sup tag for footnote id on reffn

Currently, footnote ids are rendered as '<sup>' + refname + '</sup>. '. The resulting text looks odd, and fixing its appearance would require special support in the theme.

We should use ol or '<span>' + refname + '.</span>'.

ref:

MarkupIt.Rule(MarkupIt.BLOCKS.FOOTNOTE)

Parsing of HTML discards some whitespace before links

When parsing the following HTML, everything works fine except that some whitespace is discarded:

<meta charset='utf-8'><h1 style="box-sizing: border-box; font-size: 2em; margin-top: 0px !important; margin-right: 0px; margin-bottom: 16px; margin-left: 0px; font-weight: 600; line-height: 1.25; padding-bottom: 0.3em; border-bottom: 1px solid rgb(238, 238, 238); color: rgb(51, 51, 51); font-family: -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;, &quot;Segoe UI Symbol&quot;; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px;">GitBook Editor</h1><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(51, 51, 51); font-family: -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;, &quot;Segoe UI Symbol&quot;; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><a href="https://travis-ci.com/GitbookIO/editor" style="box-sizing: border-box; background-color: transparent; color: rgb(64, 120, 192); text-decoration: none;"><img src="https://camo.githubusercontent.com/6d1da44ed119a2923b10f9208a38a9a7374f85a0…656e3d4c6759536e556654314b3878727a55686b736f79266272616e63683d6d6173746572" alt="Build Status" data-canonical-src="https://travis-ci.com/GitbookIO/editor.svg?token=LgYSnUfT1K8xrzUhksoy&amp;branch=master" style="box-sizing: content-box; border-style: none; max-width: 100%; background-color: rgb(255, 255, 255);"></a><span class="Apple-converted-space"> </span><a href="https://ci.appveyor.com/project/GitBook/editor" 
style="box-sizing: border-box; background-color: transparent; color: rgb(64, 120, 192); text-decoration: none;"><img src="https://camo.githubusercontent.com/59a0ca6af837955bd21b9ce940f3afd24d212fb3…656374732f7374617475732f38736b78636462716263736a6a7768333f7376673d74727565" alt="Build status" data-canonical-src="https://ci.appveyor.com/api/projects/status/8skxcdbqbcsjjwh3?svg=true" style="box-sizing: content-box; border-style: none; max-width: 100%; background-color: rgb(255, 255, 255);"></a></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(51, 51, 51); font-family: -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;, &quot;Segoe UI Symbol&quot;; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px;">This repository contain the source code for both the webeditor and the desktop version. 
The editor is built using only web technologies.</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(51, 51, 51); font-family: -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;, &quot;Segoe UI Symbol&quot;; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px;">The editor is built using<span class="Apple-converted-space"> </span><a href="http://facebook.github.io/react/" style="box-sizing: border-box; background-color: transparent; color: rgb(64, 120, 192); text-decoration: none;">React</a>. The desktop version is built and packaged using<span class="Apple-converted-space"> </span><a href="http://electron.atom.io/" style="box-sizing: border-box; background-color: transparent; color: rgb(64, 120, 192); text-decoration: none;">Electron</a>.</p>

Math is not rendered in HTML

Given the contents of foo.md:

The expression $$x = y$$ is boring.

When I run this command:

$ markup-toHTML foo.md

I get this output:

<p>The expression  is boring.</p>

The math has been removed.

I expect to get this:

<p>The expression $$x = y$$ is boring.</p>

Or perhaps this:

<p>The expression <math>x = y</math> is boring.</p>

(Preferably the former — keeping $$ — so that I can then pass the output to something like KaTeX.)


I get the same behaviour when using the API:

const { State } = require("markup-it");
const markdown = require("markup-it/lib/markdown");
const html = require("markup-it/lib/html");

const text = "The expression $$x = y$$ is boring.";

const mdState = State.create(markdown);
const document = mdState.deserializeToDocument(text);
const htmlState = State.create(html);
// console prints "<p>The expression  is boring.</p>"
console.log(htmlState.serializeDocument(document));

I've tried following through the logic of the conversion, but I haven't found anything obvious. My only clue is that the math element is represented as a node with isVoid: true. It might be that void nodes aren't converted to HTML, but I haven't found the line that would drop such nodes.

Parse style attribute during HTML deserialization

The HTML deserialization currently uses tags and class names to detect nodes and marks.

It could also use the style="" attribute to detect marks.

For example: <span style="font-weight: bold;">This is bold</span> should be parsed as a text with a BOLD mark.

This would improve parsing when copying content from Word into the GitBook Editor.
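
A minimal sketch of such detection is shown below. `marksFromStyle` is a hypothetical helper, and the mark names other than BOLD are assumptions made for illustration. Splitting on ";" is naive and would break on values that themselves contain semicolons.

```javascript
// Hypothetical helper; mark names other than BOLD are assumptions.
function marksFromStyle(style) {
  // Parse "property: value" declarations into a lookup table.
  const declarations = new Map();
  for (const declaration of style.split(';')) {
    const index = declaration.indexOf(':');
    if (index === -1) continue;
    const property = declaration.slice(0, index).trim().toLowerCase();
    const value = declaration.slice(index + 1).trim().toLowerCase();
    if (property) declarations.set(property, value);
  }

  const marks = [];
  const weight = declarations.get('font-weight') || '';
  // Treat keyword weights and numeric weights of 600 and above as bold.
  if (weight === 'bold' || weight === 'bolder' || Number(weight) >= 600) {
    marks.push('BOLD');
  }
  if (declarations.get('font-style') === 'italic') marks.push('ITALIC');
  if ((declarations.get('text-decoration') || '').includes('line-through')) {
    marks.push('STRIKETHROUGH');
  }
  return marks;
}

console.log(marksFromStyle('font-weight: bold;'));
```

A deserializer could call this on each span's style attribute and apply the returned marks to the text inside it.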

Import/Export for ProseMirror format

Markup-It could be used to generate/import the ProseMirror JSON format:

{
  "type": "doc",
  "content": [
    {
      "type": "heading",
      "attrs": {
        "level": 2
      },
      "content": [
        {
          "type": "text",
          "text": "Hello World!"
        }
      ]
    },
    {
      "type": "paragraph",
      "content": [
        {
          "type": "text",
          "text": "This is "
        },
        {
          "type": "text",
          "marks": [
            {
              "_": "em"
            },
            {
              "_": "strong"
            }
          ],
          "text": "an"
        },
        {
          "type": "text",
          "text": " editor."
        }
      ]
    },
    {
      "type": "horizontal_rule"
    },
    {
      "type": "paragraph",
      "content": [
        {
          "type": "image",
          "attrs": {
            "src": "http://prosemirror.net/img/logo.png",
            "alt": "",
            "title": ""
          }
        },
        {
          "type": "text",
          "text": " dd"
        }
      ]
    }
  ]
}

Avoid requirements for a "text" rule

The text rule should be avoided, since an incomplete regex could lead to an infinite loop.

The parser could test character by character if a rule can be applied on the next one.

We can also do both:

  • Use a text rule with a regexp to go fast when possible
  • But always advance by at least one character, appending it as text if needed
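
The combined approach could look like the sketch below. The `tokenize` function and the `{ type, regex }` rule shape are assumptions for this example, not markup-it's actual rule model. The key property is that the loop consumes at least one character per iteration, so an incomplete text regex cannot cause an infinite loop.

```javascript
// Hypothetical tokenizer; the rule shape ({ type, regex }) is an assumption.
// Every regex must be anchored with ^ so it only matches at the cursor.
function tokenize(input, rules) {
  const tokens = [];
  let rest = input;

  // Appends to the previous text token when possible.
  const pushText = (text) => {
    const last = tokens[tokens.length - 1];
    if (last && last.type === 'text') last.text += text;
    else tokens.push({ type: 'text', text });
  };

  while (rest.length > 0) {
    let matched = false;
    for (const rule of rules) {
      const match = rule.regex.exec(rest);
      // Empty matches are ignored because they would not advance.
      if (match && match[0].length > 0) {
        if (rule.type === 'text') pushText(match[0]);
        else tokens.push({ type: rule.type, text: match[0] });
        rest = rest.slice(match[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No rule applies: consume one character as text.
      pushText(rest[0]);
      rest = rest.slice(1);
    }
  }
  return tokens;
}

const rules = [
  { type: 'bold', regex: /^\*\*[^*]+\*\*/ },
  // Fast path for plain runs; deliberately incomplete (it never matches "*").
  { type: 'text', regex: /^[^*]+/ },
];
console.log(tokenize('Hello **World** *', rules));
```

In this run the trailing "*" matches no rule, so the fallback appends it to the preceding text token instead of looping forever.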

UL and OL in series are parsed as one list

markup-it, marked and kramed all fail to parse the following snippet the way GitHub does:

1. First point.
2. Second point.
3. Third point.

- Bullet point.
- Another bullet point.

markup-it parses it as:

1. First point.
2. Second point.
3. Third point.
4. Bullet point.
5. Another bullet point.

Related: GitbookIO/community#272
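
In CommonMark, a change of list marker type starts a new list. The sketch below applies that idea to flat lists only, distinguishing just ordered from bullet markers. `splitLists` is a hypothetical helper, not markup-it's parser, and it skips blank lines and any non-list content.

```javascript
// Hypothetical helper for flat lists only; not markup-it's actual parser.
function splitLists(source) {
  const lists = [];
  let current = null;

  for (const line of source.split('\n')) {
    const ordered = /^\d+[.)]\s+/.exec(line);
    const bullet = /^[-*+]\s+/.exec(line);
    const marker = ordered || bullet;
    if (!marker) continue; // blank lines and other content are skipped in this sketch

    const type = ordered ? 'ordered' : 'bullet';
    // A change of marker type closes the current list and opens a new one.
    if (!current || current.type !== type) {
      current = { type, items: [] };
      lists.push(current);
    }
    current.items.push(line.slice(marker[0].length));
  }
  return lists;
}

const snippet = [
  '1. First point.',
  '2. Second point.',
  '3. Third point.',
  '',
  '- Bullet point.',
  '- Another bullet point.',
].join('\n');
console.log(JSON.stringify(splitLists(snippet), null, 2));
```

For the snippet from this issue, the result is an ordered list with three items followed by a separate bullet list with two items.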
