jgm / djot.js

JavaScript implementation of djot

License: MIT License

TypeScript 90.71% Makefile 0.51% JavaScript 5.47% Shell 0.01% HTML 1.22% CSS 2.08%

djot.js's People

waldyrious matklad eastack hellux iceghost gemmaro mikekasprzak marrus-sh mayebejames bryanchance

djot.js's Issues

Odd list tightness

I would guess these should be loose, but they are tight:

- a

-

yields

<ul>
<li>
a
</li>
<li>
</li>
</ul>

instead of

<ul>
<li>
<p>a</p>
</li>
<li>
</li>
</ul>

- a

- - b

yields

<ul>
<li>
a
</li>
<li>
<ul>
<li>
b
</li>
</ul>
</li>
</ul>

instead of

<ul>
<li>
<p>a</p>
</li>
<li>
<ul>
<li>
b
</li>
</ul>
</li>
</ul>

I was tempted to open an issue on the djot language repository, because I'm really interested in clarifying the specification, but right now the playground is in a bit of a messy state. I'd rather have a clear spec and align every parser on it, but as the reference implementation I would understand if you wanted to put it in order before enshrining it as the djot spec.

Please consider the following djot source:

# Heading
continued

- [test link][Heading
continued]
- [test link][Heading continued]
- foot[^ref bar]

[ref2
foo]: https://example.com/

[^ref
foo]: some note

Currently the playground rejects the link reference (because of the newline), but the heading creates a link reference that embeds a newline, so neither text link matches the heading (and as far as I can tell, the heading can never be referenced).

However, the footnote definition accepts a multiline label, but it seems to match only the first part, so here [^ref bar] does refer to [^ref\nfoo]:, which I don't think is intended.
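One way to make headings, link labels and footnote labels agree is to normalize every label before comparing it. The sketch below is a hypothetical rule, not what djot.js or the djot spec currently does: it collapses each run of whitespace (including line breaks) into a single space and requires the whole normalized label to match, so a multiline heading becomes reachable and [^ref bar] no longer matches a note labelled ref\nfoo. `normalizeLabel`, `resolveReference` and the placeholder destinations are illustrative.

```typescript
// Hypothetical label normalization (an assumption, not djot.js behavior):
// trim the label and collapse every run of whitespace, including line
// breaks, into a single space.
function normalizeLabel(label: string): string {
  return label.trim().replace(/\s+/g, " ");
}

// Resolve a label against a table keyed by normalized labels.
// The whole label must match; there is no prefix matching.
function resolveReference(
  refs: Map<string, string>,
  label: string
): string | undefined {
  return refs.get(normalizeLabel(label));
}

const refs = new Map<string, string>();
// Keys are stored in normalized form; the destinations are placeholders.
refs.set(normalizeLabel("Heading\ncontinued"), "#heading-id");
refs.set(normalizeLabel("^ref\nfoo"), "#note-id");

console.log(resolveReference(refs, "Heading continued")); // #heading-id
console.log(resolveReference(refs, "^ref bar"));          // undefined
```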

Single lines starting with `{a="` disappear

{a=" inline text

gives no output instead of

<p>{a=&rdquo; inline text</p>

Issues with ellipses and dashes when converting to pandoc format

Hi,

When converting the djot file with the following content (using djot -t pandoc test.dj)

57--33 oxen---and no sheep...

to pandoc format, I get the output below. The output does not preserve the en-dash, em-dash, and ellipsis in the djot file. Also, the ellipsis is turned into a vertically centered ellipsis. Is this a limitation of the pandoc format or am I doing something wrong?

{
  "pandoc-api-version": [
    1,
    23
  ],
  "meta": {},
  "blocks": [
    {
      "t": "Para",
      "c": [
        {
          "t": "Str",
          "c": "57"
        },
        {
          "t": "Str",
          "c": "-"
        },
        {
          "t": "Str",
          "c": "33"
        },
        {
          "t": "Space"
        },
        {
          "t": "Str",
          "c": "oxen"
        },
        {
          "t": "Str",
          "c": "-"
        },
        {
          "t": "Str",
          "c": "and"
        },
        {
          "t": "Space"
        },
        {
          "t": "Str",
          "c": "no"
        },
        {
          "t": "Space"
        },
        {
          "t": "Str",
          "c": "sheep"
        },
        {
          "t": "Str",
          "c": "⋯"
        }
      ]
    }
  ]
}

I am using version 0.2.3 of djot.
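For reference, the characters one would expect here are U+2013 EN DASH, U+2014 EM DASH and U+2026 HORIZONTAL ELLIPSIS; the output above instead has a plain hyphen for both dashes and U+22EF MIDLINE HORIZONTAL ELLIPSIS. A minimal sketch of a mapping a pandoc writer could use, assuming djot node types named `en_dash`, `em_dash` and `ellipses`. The `PandocStr` shape matches the `Str` elements in the JSON above, and `smartPunctuationToPandoc` is a hypothetical name.

```typescript
// Subset of djot smart_punctuation types relevant here (assumed names).
type DashOrEllipsis = "en_dash" | "em_dash" | "ellipses";

// A pandoc Str inline as it appears in pandoc's JSON format.
interface PandocStr {
  t: "Str";
  c: string;
}

const CHARACTERS: Record<DashOrEllipsis, string> = {
  en_dash: "\u2013",  // EN DASH
  em_dash: "\u2014",  // EM DASH
  ellipses: "\u2026", // HORIZONTAL ELLIPSIS (not U+22EF, the midline ellipsis)
};

// Emit the punctuation as a single Str element with the proper character.
function smartPunctuationToPandoc(type: DashOrEllipsis): PandocStr {
  return { t: "Str", c: CHARACTERS[type] };
}

console.log(JSON.stringify(smartPunctuationToPandoc("en_dash")));
```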

Update pandoc API version emitted

Tried to convert a djot file to another format via Pandoc.
My command was:
djot -f djot -t pandoc mydoc.dj | pandoc -f json -t html -s -o mydoc.html

I got the following error:
JSON parse error: Error in $: Incompatible API versions: encoded with [1,22,2,1] but attempted to decode with [1,23].

I have reinstalled node.js and tried to remove the old djot GitHub project installation I had, but it still seems to be broken. I hope you can help me out with this.
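The error comes from pandoc's JSON reader refusing a document whose `pandoc-api-version` does not match its own, which is why the issue asks djot to update the version it emits. Judging from the message, the comparison looks at the leading components: [1,22,2,1] is rejected by a reader at [1,23]. A sketch of that check, under the assumption (our reading of pandoc-types, not a documented guarantee) that the first two numbers must be equal; `isCompatibleApiVersion` is a hypothetical helper.

```typescript
// Hypothetical re-creation of pandoc's compatibility test (assumption:
// the first two components of the version must be equal).
function isCompatibleApiVersion(encoded: number[], reader: number[]): boolean {
  return (
    encoded.length >= 2 &&
    reader.length >= 2 &&
    encoded[0] === reader[0] &&
    encoded[1] === reader[1]
  );
}

console.log(isCompatibleApiVersion([1, 22, 2, 1], [1, 23])); // false
console.log(isCompatibleApiVersion([1, 23], [1, 23, 1]));     // true
```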

Footnote

Looks like a bug related to backlinks snuck into the main branch:

$ yarn test
yarn run v1.22.19
$ jest
 PASS  src/find.spec.ts
 PASS  src/filter.spec.ts
 PASS  src/inline.spec.ts
 PASS  src/block.spec.ts
 PASS  src/ast.spec.ts
 PASS  src/pathological.spec.ts
 PASS  src/html.spec.ts
 PASS  src/attributes.spec.ts
 FAIL  src/functional.spec.ts
  ● test/footnotes.test › line 1

    ReferenceError: structuredClone is not defined

      108 |
      109 |   addBacklink(orignote: Footnote, ident: number): Footnote {
    > 110 |     const note = structuredClone(orignote); // we modify a deep copy
          |                  ^
      111 |     const backlink: Link = {
      112 |       tag: "link",
      113 |       destination: `#fnref${ident}`,

      at HTMLRenderer.addBacklink (src/html.ts:110:18)
      at HTMLRenderer.render (src/html.ts:481:27)
      at renderHTML (src/html.ts:494:19)
      at Object.<anonymous> (src/functional.spec.ts:128:30)

Test Suites: 1 failed, 8 passed, 9 total
Tests:       1 failed, 309 passed, 310 total
Snapshots:   0 total
Time:        3.961 s, estimated 4 s
Ran all test suites.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
$

Edit: after rebuilding from scratch, I get different failures.
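`structuredClone` is only a global in newer runtimes (Node.js added it in version 17), so running the tests where it is missing produces exactly this ReferenceError. A minimal sketch of a fallback, assuming the footnote AST contains only JSON-compatible data (strings, numbers, booleans, arrays, plain objects); `deepCopy` is a hypothetical helper, not part of djot.js.

```typescript
type StructuredCloneFn = <U>(value: U) => U;

// Deep-copy a value, preferring the built-in structuredClone when the
// runtime provides it and falling back to a JSON round-trip otherwise.
// The fallback drops undefined properties and functions, which is fine
// for plain JSON-like AST data but not for arbitrary objects.
function deepCopy<T>(value: T): T {
  const g = globalThis as unknown as { structuredClone?: StructuredCloneFn };
  if (typeof g.structuredClone === "function") {
    return g.structuredClone(value);
  }
  return JSON.parse(JSON.stringify(value)) as T;
}

const original = { tag: "footnote", children: [{ tag: "str", text: "note" }] };
const copy = deepCopy(original);
copy.children.push({ tag: "str", text: "backlink" }); // modifies only the copy
console.log(original.children.length); // 1
```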

multiline link urls no longer trimmed and concatenated

Previously djot would allow me to have long urls split across lines:

[link](
  https://some-really-long-url/
  with-several-parts/
  that-i-want-across-lines)

Before 0.2.4, it would eliminate the whitespace between those URL components:

$ djot test.dj 
<p><a href="https://some-really-long-url/with-several-parts/that-i-want-across-lines">link</a></p>

However, with 0.2.4, it now keeps spaces in the URL:

$ djot test.dj 
<p><a href="  https://some-really-long-url/  with-several-parts/  that-i-want-across-lines">link</a></p>

I didn't notice at first because the browser seems to "do the right thing", but is this intended / guaranteed to always work?

Many thanks for all the great work on djot!!
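The pre-0.2.4 behavior described here amounts to: split the destination at line breaks, strip the leading and trailing whitespace of each line, and join the pieces with nothing in between. A sketch of that rule (the function name is hypothetical, and whether djot should do this is exactly the question the issue raises):

```typescript
// Join a link destination written across several lines: each line is
// trimmed and the pieces are concatenated without separators.
// Whitespace inside a single line is left alone.
function normalizeDestination(raw: string): string {
  return raw
    .split(/\r?\n/)
    .map((line) => line.trim())
    .join("");
}

// The text between "(" and ")" in the example above.
const raw = [
  "",
  "  https://some-really-long-url/",
  "  with-several-parts/",
  "  that-i-want-across-lines",
].join("\n");

console.log(normalizeDestination(raw));
// https://some-really-long-url/with-several-parts/that-i-want-across-lines
```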

Increase consistency between tag names and type names

blockquote -> block_quote
hardbreak -> hard_break
softbreak -> soft_break
symbol -> symb

Not sure if this is worth doing, but it would make tag names predictable from the type names.
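If the type names are PascalCase (as the list implies: BlockQuote, HardBreak, SoftBreak, Symb), the proposed tag names follow mechanically by inserting an underscore before each interior capital and lower-casing the result. A sketch; `typeNameToTag` is a hypothetical helper.

```typescript
// Derive a snake_case tag name from a PascalCase type name by inserting
// "_" between a lowercase letter or digit and the following capital.
function typeNameToTag(typeName: string): string {
  return typeName.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase();
}

for (const name of ["BlockQuote", "HardBreak", "SoftBreak", "Symb"]) {
  console.log(`${name} -> ${typeNameToTag(name)}`);
}
```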

TypeError on missing footnote

[^a]
[^b]

[^b]:

yields

TypeError: Cannot read properties of undefined (reading 'children')
    at HTMLRenderer.addBacklink (lib/html.js:103:18)
    at HTMLRenderer.renderNotes (lib/html.js:149:31)
    at HTMLRenderer.renderAstNodeDefault (lib/html.js:164:36)
    at HTMLRenderer.renderAstNode (lib/html.js:137:21)
    at HTMLRenderer.render (lib/html.js:422:21)
    at renderHTML (lib/html.js:428:21)
    at Object.<anonymous> (lib/cli.js:199:60)
    at Module._compile (node:internal/modules/cjs/loader:1218:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1272:10)
    at Module.load (node:internal/modules/cjs/loader:1081:32)
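The trace suggests the renderer looks up the note for [^a], gets undefined because there is no definition, and then reads `.children` from it. A minimal defensive sketch: fall back to an empty placeholder note when a label has no definition. The `Footnote` shape and `lookupNote` are hypothetical simplifications, and rendering an empty note (rather than, say, leaving the reference unlinked) is only one possible choice.

```typescript
// Simplified footnote node (hypothetical; djot.js's real type has more fields).
interface Footnote {
  tag: "footnote";
  label: string;
  children: unknown[];
}

// Return the definition for `label`, or an empty placeholder when the
// document references a note that was never defined.
function lookupNote(notes: Record<string, Footnote>, label: string): Footnote {
  if (Object.prototype.hasOwnProperty.call(notes, label)) {
    return notes[label];
  }
  return { tag: "footnote", label, children: [] };
}

const notes: Record<string, Footnote> = {
  b: { tag: "footnote", label: "b", children: [] },
};
console.log(lookupNote(notes, "a").children.length); // 0, instead of a TypeError
```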

Add API documentation to README.md

Expose overrides as a cli flag

#4 added an ability to customize html rendering. It only added programmatic API, but we should expose this as a CLI as well.

I think a good name for the flag would be --html-template slides.js (might also rename "overrides" to "template" in the API).

Playground: mathjax doesn't seem to load for me

When I try to use math notation it looks like this:

I tried with both Chrome and Firefox on Ubuntu 22.04 and also on my phone.

I can see in the source code that the mathjax script is included but it doesn't seem to do anything. It could be because it's in an iframe with sandboxing?

Clarification around list items

First off, I'd like to express my appreciation for the effort and patience you've put into both CommonMark and djot 🙏🏽 🙏🏽

In trying to understand the syntax better, I ran across what appears to be an inconsistency in nesting block elements inside list items.

playground link

An angle bracket (i.e. >) inside a list item gets rendered literally in HTML (instead of being rendered as a block quote) 👍🏽
Three colons (i.e. :::) inside a list item get rendered literally in HTML (instead of being rendered as a <div>) 👍🏽
But three backticks (i.e. ```) inside a list item get rendered as a code block 🤔

Is this intentional? If so, can you provide some reasoning behind it (or a link if this has been brought up before)? I'm just trying to get a feel for the rules.

Thanks again!

should "a" be represented by dj.substring(0, 1) - or (0,0) ?

If you parse the Djot string "ab", the following events will be produced:

[{ startpos: 0, endpos: 0, annot: "+para" }
,{ startpos: 0, endpos: 1, annot: "str" }
,{ startpos: 2, endpos: 2, annot: "-para" }]

"+para" 'is' the first char of "ab" (same startpos and endpos as 'a'), and "-para" 'is' the char after "ab" (the char at offset 2), but this char does not exist!

Even if this 'works' in an implementation, a more concise and clearer concept should be considered:

[{ startpos: 0, endpos: 0, annot: "+para" }
,{ startpos: 0, endpos: 2, annot: "str" }
,{ startpos: 2, endpos: 2, annot: "-para" }]

startpos would be 'at' the start of the first char (the point before "ab"), endpos would be at the start of the char following the last char that should be included (the point after "ab"), and "str" would be the chars between these two points.

As far as I know, Java, JavaScript, Scala and many other programming languages use this concept.

In my opinion, this might be the better way in the long run.

Frank
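The difference between the two conventions, sketched with hypothetical helper names: with inclusive positions, `endpos` is the last character, so it has to be shifted by one before calling `substring`; with half-open positions the values can be passed straight through, the length is `end - start`, and an empty span is simply `start === end`.

```typescript
// Inclusive span (current events): endpos is the offset of the last character.
interface InclusiveSpan {
  startpos: number;
  endpos: number;
}

// Half-open span (proposed): end is the offset just past the last character,
// the convention used by String.prototype.substring and slice.
interface HalfOpenSpan {
  start: number;
  end: number;
}

function toHalfOpen(span: InclusiveSpan): HalfOpenSpan {
  return { start: span.startpos, end: span.endpos + 1 };
}

function spanText(source: string, span: HalfOpenSpan): string {
  return source.substring(span.start, span.end);
}

// The "str" event for "ab" under each convention.
console.log(spanText("ab", toHalfOpen({ startpos: 0, endpos: 1 }))); // ab
console.log(spanText("ab", { start: 0, end: 2 }));                  // ab
```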

Make library available via CDN

I would like to see this library made available from CDN for web users.

TaskListItem: "undefined" ?

Right now the checkbox value of TaskListItem can be "checked" or "unchecked".

Wouldn't it be nice to have a value for the case when the check still has to be done ("undefined")?

(Otherwise, why not switch to a boolean value: "checked": true|false?)

Revise filter API

Perhaps it should work as in pandoc:

If you don't return a value, the node doesn't change.
If you do, the node is replaced by the node you return.
If you return an array of nodes, they are spliced in at the position of the node. (This can be used to delete elements by returning an empty array.)
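The three rules above can be sketched as a pass over a list of sibling nodes. The `FilterNode` shape and `applyToSiblings` are hypothetical simplifications (not djot.js's AST or filter code), and the sketch does not recurse into children.

```typescript
// Minimal node shape for the sketch (not djot.js's real AST types).
interface FilterNode {
  tag: string;
  children?: FilterNode[];
}

// What a filter function may return, following the pandoc-style rules:
//   undefined     -> keep the node unchanged
//   FilterNode    -> replace the node
//   FilterNode[]  -> splice these nodes in (an empty array deletes the node)
type FilterResult = FilterNode | FilterNode[] | undefined;
type Filter = (node: FilterNode) => FilterResult;

// Apply a filter to one list of sibling nodes.
function applyToSiblings(nodes: FilterNode[], filter: Filter): FilterNode[] {
  const out: FilterNode[] = [];
  for (const node of nodes) {
    const result = filter(node);
    if (result === undefined) {
      out.push(node);
    } else if (Array.isArray(result)) {
      out.push(...result);
    } else {
      out.push(result);
    }
  }
  return out;
}

const para: FilterNode[] = [{ tag: "str" }, { tag: "softbreak" }, { tag: "emph" }];
const filtered = applyToSiblings(para, (node) => {
  if (node.tag === "softbreak") return [];                     // delete
  if (node.tag === "emph") return [{ tag: "str" }, { tag: "str" }]; // splice
  return undefined;                                             // keep
});
console.log(filtered.map((n) => n.tag).join(",")); // str,str,str
```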

Add docstrings

When you get a chance, it'd be nice to have tsdoc/jsdoc docstrings in the codebase :-)

Also:

❯ man djot
No manual entry for djot

Get faster

Benchmarks are currently running about 3X slower than for commonmark.js. It would be good to understand why and narrow the gap.

"I don't" gets parsed as smart_punctuation right_single_quote

I have the following content:

I don't xxxxxx

The single quote gets parsed into

tag: "smart_punctuation"
type: "right_single_quote"

Is this correct? I expected this to be parsed as a single text block.

Automatic enumeration for roman numeral lists

Both decimal and alphabetic lists can be automatically enumerated, e.g:

0. a
0. b
0. c

a. a
a. b
a. c

yields

<ol>
<li>
a
</li>
<li>
b
</li>
<li>
c
</li>
</ol>
<ol type="a">
<li>
a
</li>
<li>
b
</li>
<li>
c
</li>
</ol>

However, trying to do the same with roman numerals instead yields an alphabetic list, starting at 'i':

i. a
i. b
i. c

<ol start="9" type="a">
<li>
a
</li>
<li>
b
</li>
<li>
c
</li>
</ol>

Currently, one seems to need at least two consecutive numbers to turn it into a roman numeral list:

i. a
ii. b
i. c

<ol type="i">
<li>
a
</li>
<li>
b
</li>
<li>
c
</li>
</ol>

This decreases the value of letting it automatically enumerate.

Would it be better to prioritize the roman over the alphabetic numbering in the previous case? One could for example consider any list that contains only roman digits to be a roman numeral list until the first non-roman digit is encountered. This would also remove the need to parse any numbers during the block parsing stage.
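The proposed heuristic can be sketched as a classifier over a list's markers. `classifyMarkers` is hypothetical and handles lowercase markers only; uppercase would work the same way.

```typescript
type LetterEnumeration = "lower-roman" | "lower-alpha";

// Hypothetical classifier for the heuristic proposed above: a list whose
// markers consist only of roman digits is treated as roman; any marker
// containing another letter makes the list alphabetic.
function classifyMarkers(markers: string[]): LetterEnumeration {
  const romanDigits = /^[ivxlcdm]+$/;
  return markers.every((m) => romanDigits.test(m)) ? "lower-roman" : "lower-alpha";
}

console.log(classifyMarkers(["i", "i", "i"])); // lower-roman
console.log(classifyMarkers(["a", "a", "a"])); // lower-alpha
```

One consequence worth weighing: a list made only of markers like `c.` or `d.` would also be read as roman (100, 500) unless a later marker contains a non-roman letter.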

Undefined table header text alignment

Leaving out columns in the separator row causes the text-align value to be "undefined":

|a|b|c|
|-|
|1|2|3|

<table>
<tr>
<th>a</th>
<th style="text-align: undefined;">b</th>
<th style="text-align: undefined;">c</th>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
</tr>
</table>

I would expect it to be simply

<table>
<tr>
<th>a</th>
<th>b</th>
<th>c</th>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
</tr>
</table>

I am assuming this is a null value error.
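A null-safe version would emit the style attribute only when a column actually has an explicit alignment. In the sketch below, missing entries (columns the separator row leaves out) and "default" both mean "no style". The "default" value mirrors the `align` field in the AST dumps in these issues; "left", "right" and "center" are assumed, and `renderHeaderRow` is a hypothetical helper that expects its cell text to be HTML-escaped already.

```typescript
type Alignment = "left" | "right" | "center" | "default";

// Style attribute for one cell; empty when the alignment is missing or default.
function alignStyle(align: Alignment | undefined): string {
  return align === undefined || align === "default"
    ? ""
    : ` style="text-align: ${align};"`;
}

// Render a header row; `aligns` may be shorter than `cells`.
function renderHeaderRow(cells: string[], aligns: Alignment[]): string {
  const ths = cells.map((text, i) => `<th${alignStyle(aligns[i])}>${text}</th>`);
  return `<tr>\n${ths.join("\n")}\n</tr>`;
}

// The separator row |-| describes only the first column.
console.log(renderHeaderRow(["a", "b", "c"], ["default"]));
```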

Benchmarks not working for matklad

Ok, I think I've figured that out. npm run build produces ./dist, but not ./lib. npx tsc is the thing which produces ./lib.

Footnote references within footnotes are ignored

[^a]: [^b]
[^b]: [^a]
[^c]: [^c]

yields empty output. I would expect the footnotes to be visible as there are references to them.

A slightly less silly example:

text[^footnote].

[^footnote]: very long footnote[^another-footnote]
[^another-footnote]: bla bla[^another-footnote]

yields

<p>text<a id="fnref1" href="#fn1" role="doc-noteref"><sup>1</sup></a>.</p>
<section role="doc-endnotes">
<hr>
<ol>
<li id="fn1">
<p>very long footnote<a id="fnref2" href="#fn2" role="doc-noteref"><sup>2</sup></a><a href="#fnref1" role="doc-backlink">↩︎︎</a></p>
</li>
</ol>
</section>

instead of

<p>text<a id="fnref1" href="#fn1" role="doc-noteref"><sup>1</sup></a>.</p>
<section role="doc-endnotes">
<hr>
<ol>
<li id="fn1">
<p>very long footnote<a id="fnref2" href="#fn2" role="doc-noteref"><sup>2</sup></a><a href="#fnref1" role="doc-backlink">↩︎︎</a></p>
</li>
<li id="fn2">
<p>bla bla<a href="#fnref2" role="doc-backlink">↩︎︎</a></p>
</li>
</ol>
</section>
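Rendering the second example correctly means following references inside notes, not just in the main text, while guarding against cycles such as a note that cites itself. A sketch with a hypothetical `reachableNotes` helper, where `noteRefs` maps each defined note label to the labels it references. It takes the narrower reading that a note appears when it is reachable from the main text, so it would still show nothing for the first, mutually-referencing example. Its breadth-first order numbers all main-text notes before nested ones; a renderer that numbers notes as it renders them might prefer depth-first order.

```typescript
// Collect the footnotes reachable from the main text, in the order they
// are first reached, following references inside footnotes as well.
function reachableNotes(
  mainRefs: string[],
  noteRefs: Map<string, string[]>
): string[] {
  const order: string[] = [];
  const seen = new Set<string>();
  const queue: string[] = [...mainRefs];
  while (queue.length > 0) {
    const label = queue.shift()!;
    // Skip notes already visited (breaks cycles) and labels with no definition.
    if (seen.has(label) || !noteRefs.has(label)) continue;
    seen.add(label);
    order.push(label);
    queue.push(...(noteRefs.get(label) ?? []));
  }
  return order;
}

const defs = new Map<string, string[]>([
  ["footnote", ["another-footnote"]],
  ["another-footnote", ["another-footnote"]], // refers to itself
]);
console.log(reachableNotes(["footnote"], defs).join(", ")); // footnote, another-footnote
```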

Can not install command line tool

I have tried to install Djot.js several times.

I have to run the install script with sudo to get it to work, since the admin user doesn't have sufficient permissions for the /usr/local/lib/node_modules directory where Djot gets installed.

When I type djot into the terminal, it says: Command not found.
I installed Node.js from their website instead of using Homebrew hoping this would give better results.

playground: fill placeholder with an example djot document

I think it would be nice if there were a djot document to play around with (readme.dj, for example) rather than a blank document.

I suggest we fetch the file from GitHub on page load.

Some alternatives:

Inline it in the HTML during the make build process.
Fetch on demand, i.e. add some sort of buttons or selects (this would extend better to more example files).

I could try a PR if it is fine.

Task lists not rendering as expected when converting from Djot to HTML

When making a task list in Djot and converting it via djot.js, the task list items are not rendered. It turns out that for this to work, a standalone HTML document is needed.

Steps to reproduce:

Make a task list like:

- [ ] This is not a task well done.
- [x] This task is done now.

Try to convert it via djot:

djot -f djot -t html list.dj > list-converted.html

Open the HTML-document and you will find that the task list items did not render as expected.

This was tested on: Version 0.2.0

Attributes cannot contain consecutive backslashes

a{a="\\\\\\"}

yields

<p><span a="\">a</span></p>

instead of

<p><span a="\\\">a</span></p>

Inline text may contain consecutive backslashes as expected, though:

\\\\\\

<p>\\\</p>
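The expected result treats each `\\` pair as one literal backslash, the same way inline text does. A sketch of that unescaping for quoted attribute values, assuming (as for inline text in djot) that a backslash escapes any ASCII punctuation character and is kept literally before anything else; `unescapeAttrValue` is a hypothetical name.

```typescript
// ASCII punctuation: ! through /, : through @, [ through `, { through ~.
const ASCII_PUNCT = /[!-\/:-@\[-`{-~]/;

// Unescape a quoted attribute value: "\" + punctuation yields the
// punctuation character; a backslash before anything else stays literal.
function unescapeAttrValue(raw: string): string {
  let out = "";
  for (let i = 0; i < raw.length; i++) {
    const ch = raw[i];
    if (ch === "\\" && i + 1 < raw.length && ASCII_PUNCT.test(raw[i + 1])) {
      out += raw[i + 1];
      i++; // skip the escaped character
    } else {
      out += ch;
    }
  }
  return out;
}

// Six backslashes in the source become three in the value.
console.log(unescapeAttrValue("\\\\\\\\\\\\")); // \\\
```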

Non-recursive implementation of filter

The recursive call here will cause a stack overflow on deeply nested documents:

djot.js/src/filter.ts, lines 122 to 124 in e1138ed:

node.children.forEach((child : AstNode) => {
  handleAstNode(child, filterpart);
});

Replace with a non-recursive algorithm.
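The usual replacement is an explicit stack: push the root, pop a node, visit it, then push its children in reverse so the first child is popped next. That keeps document (pre-order) order while the depth is limited only by heap memory. A sketch with a simplified, hypothetical `AstNode` shape; a real filter would also need post-order (exit) callbacks and node replacement, which this leaves out.

```typescript
interface AstNode {
  tag: string;
  children?: AstNode[];
}

// Visit every node in document order without recursion.
function walk(root: AstNode, visit: (node: AstNode) => void): void {
  const stack: AstNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    visit(node);
    if (node.children) {
      // Push children in reverse so the first child is visited first.
      for (let i = node.children.length - 1; i >= 0; i--) {
        stack.push(node.children[i]);
      }
    }
  }
}

const doc: AstNode = {
  tag: "doc",
  children: [{ tag: "para", children: [{ tag: "str" }] }, { tag: "para" }],
};
const tags: string[] = [];
walk(doc, (node) => tags.push(node.tag));
console.log(tags.join(" ")); // doc para str para
```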

Shared HTML id in multiple references to the same footnote

Hello,

When using several references to the same footnote, multiple <a> elements are generated with the same id attribute. Isn't that invalid HTML?

The simplest example I found is foo[^a] bar[^a], which currently generates in the playground:

<p>foo<a id="fnref1" href="#fn1" role="doc-noteref"><sup>1</sup></a> bar<a id="fnref1" href="#fn1" role="doc-noteref"><sup>1</sup></a></p>
<section role="doc-endnotes">
<hr>
<ol>
<li id="fn1">
<p><a href="#fnref1" role="doc-backlink">↩︎︎</a></p>
</li>
</ol>
</section>

I'm not sure what the appropriate solution would be (I'm not even completely sure whether it really is a problem, my HTML knowledge is a bit outdated), but I would have expected one backlink per reference (as is usual in wikipedia).

While there, I noticed that the backlink text is U+21A9, U+FE0E, U+FE0E. My Unicode knowledge is even worse than my HTML knowledge, but is the double variation selector-15 intended or useful?
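Duplicate `id` values are indeed invalid HTML, since an id must be unique within the document. One common scheme, sketched here with a hypothetical class, gives the first reference `fnref1` and later ones a numeric suffix, and records all of them so the note can render one backlink per reference. The backlink text uses a single U+FE0E (VARIATION SELECTOR-15, which requests text rather than emoji presentation); one is enough for that purpose.

```typescript
// Backlink glyph: U+21A9 followed by one VARIATION SELECTOR-15.
const BACKLINK_TEXT = "\u21A9\uFE0E";

// Hypothetical id allocator: "fnref<n>" for the first reference to note n,
// then "fnref<n>-2", "fnref<n>-3", ... so every id is unique.
class FootnoteRefIds {
  private counts = new Map<number, number>();

  next(note: number): string {
    const k = (this.counts.get(note) ?? 0) + 1;
    this.counts.set(note, k);
    return this.idFor(note, k);
  }

  // Every reference id issued for a note, in order (one backlink each).
  all(note: number): string[] {
    const k = this.counts.get(note) ?? 0;
    const ids: string[] = [];
    for (let i = 1; i <= k; i++) ids.push(this.idFor(note, i));
    return ids;
  }

  private idFor(note: number, k: number): string {
    return k === 1 ? `fnref${note}` : `fnref${note}-${k}`;
  }
}

const ids = new FootnoteRefIds();
console.log(ids.next(1)); // fnref1   (foo[^a])
console.log(ids.next(1)); // fnref1-2 (bar[^a])
const backlinks = ids
  .all(1)
  .map((id) => `<a href="#${id}" role="doc-backlink">${BACKLINK_TEXT}</a>`);
console.log(backlinks.join(""));
```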

Deno Third Party Module

I've been experimenting with deno for typescript and importing djot would be simpler if it was submitted as a third party module. The process seems simple enough, with no real cost as far as I can tell.

Would there be interest in doing this?

Caption parsing is buggy

| 1 | 2 |

 ^ cap1
 
 ^ cap2

Ast:

doc
  table
    caption
      str text="cap2"
    caption
      str text="cap1"
    caption
    row head=false
      cell head=false align="default"
        str text="1"
      cell head=false align="default"
        str text="2"

Two problems here:

a dummy extra empty caption
our types require exactly one caption

Why entities in html output?

Probably a small issue, but for the smart punctuation examples, you output HTML entities. Why not just the appropriate Unicode characters?

Besides the obvious, one can't process the output as XML:

❯ djot /tmp/test.dj | xmllint -
-:9: parser error : Entity 'hellip' not defined
<p>And now &hellip;</p>
                   ^
-:14: parser error : Entity 'ldquo' not defined
<p>&ldquo;Hello,&rdquo; said the spider.</p>
          ^
-:14: parser error : Entity 'rdquo' not defined
<p>&ldquo;Hello,&rdquo; said the spider.</p>
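A renderer can avoid the problem by writing typographic punctuation as literal Unicode characters and escaping only the characters that are special in markup. A sketch with a hypothetical helper (not djot.js's renderer); `&amp;`, `&lt;`, `&gt;` and `&quot;` are among the five entities XML predefines, so the output stays parseable by xmllint.

```typescript
// Escape only markup-significant characters; "&" must be replaced first.
// Everything else, including curly quotes and ellipses, stays literal.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

console.log(`<p>${escapeXml("\u201CHello,\u201D said the spider.")}</p>`);
```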

Djot CLI: Can not convert stand-alone documents via Pandoc

If one tries to convert from docx or a stand-alone HTML-document via Pandoc to Djot it will not work at the moment. I presume that Djot can't handle the metadata blocks at the top of the HTML-files or those blocks stored somewhere in the docx container.

The interesting thing here is that Pandoc gladly converts from Djot to stand-alone HTML, but not the other way around.

Rendering from HTML to Djot gives unexpected results

When converting a document from Djot to HTML and later back to Djot again, there can be some interesting side effects, like explicit heading identifiers being broken, since the HTML renderer creates ids for both the section tag and the heading tag.

Here is how to reproduce:

Save this file as Djot:


{#top}
# Djot test document

Welcome to this test document.

{#second}
Here is the second paragraph.

## Moving to links

We will now show that we can jump to the [top](#top) and to the [second paragraph](#second) with these two links.

The end

Now convert this document to HTML by running Djot via Pandoc. This is important since we want a stand-alone HTML-version of the document. I did:
djot -f djot hello.dj.txt -t pandoc | pandoc -f json -t html -s -o hello.html

Pandoc will give a warning since there is no title specified when we converted this.

Now try to convert the document from HTML back to Djot again by doing:

pandoc hello.html -f html -t json | djot -f pandoc -t djot > hello2.dj.txt

This shows the following document

{#top}
{#djot-test-document}
# Djot test document

Welcome to this test document.

Here is the second paragraph.

{#Moving-to-links}
{#moving-to-links}
## Moving to links

We will now show that we can jump to the [top](#top) and to the [second
paragraph](#second) with these two links.

The end

The following changed during the conversion:

The explicit heading identifiers were ignored and other identifiers were added, which may break the headings. Example: the original identifier for the "Djot test document" heading was top, but djot-test-document was added as well; this would ignore the top identifier and make the # Djot test document heading show up as raw Djot instead of as an h1 heading. @jgm says that the HTML section tag may be present in the converted HTML document, and that is exactly right. How can this be fixed?
The link to the second paragraph which was defined by the {#second} attribute was removed from the HTML-document, so the link would not work. Why was this removed?

If I on the other hand convert the original document via Djot to HTML the link to second will work, but then I do not get the stand-alone HTML-version as Pandoc can produce.

Math -> DisplayMath/InlineMath?

Math shadows the built in JS Math.
Consider changing to DisplayMath/InlineMath, parallel to SingleQuoted/DoubleQuoted?
Alternatively, change the latter to Quoted with an additional parameter?
Or, keep it as Math and let people sort out the conflict (if we do that, Symb should go back to Symbol).

Request for djot supporting Latin 1 if encoding isn't UTF-8

I wonder if Djot can support Latin 1 as an alternative text encoding when converting between formats.
Can I do something about this myself via Pandoc, or should this be handled directly through Djot?
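Independently of what djot itself supports, one workaround is to convert Latin-1 text to a Unicode string before handing it to djot or pandoc (on the command line, `iconv -f ISO-8859-1 -t UTF-8` does the same job). Decoding ISO-8859-1 is simple because every byte from 0x00 to 0xFF maps to the code point with the same value. A sketch with a hypothetical `decodeLatin1` helper; note that Windows-1252, often mislabeled as Latin-1, assigns different characters to bytes 0x80 to 0x9F, and this function does not handle that.

```typescript
// Decode ISO-8859-1 (Latin-1) bytes: byte value == Unicode code point.
function decodeLatin1(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// "café" encoded as Latin-1: é is the single byte 0xE9.
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);
console.log(decodeLatin1(latin1Bytes)); // café
```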

Port fuzz tests from djot.lua

Add CHANGELOG

Especially important to track syntax and AST changes since the split from djot.lua.

Consider default CSS?

This is very much a feature-creep issue, but one which is perhaps worth considering.

For djot-as-a-cli-tool, it would be nice if the resulting html file included some minimal css to make output aesthetically nice to look at. That would provide two benefits:

convenience for the users: if you are writing some quick ad-hoc note, it would be very useful if you can djot note.dj and get an .html output which you can immediately present (bonus points if you can use browser's "print" functionality to render that to a beautiful pdf).
brand recognition: asciidoctor's default stylesheet is very recognizable, so, when you see, eg, https://shipilev.net/labs/threadripper-efficiency/, you immediately know it's powered by asciidoctor. "looks nicely by default" would be a bad reason to use djot, but I imagine that would be a very practically effective reason nonetheless.

doing this right obviously requires some non-trivial css skills and good web-design taste
as djot is new, we can assume modern browser, and that makes writing css much more straightforward
we can take a stance that djot is primarily a library/spec, and let someone else build eye candy on top. That's reasonable, though I feel that defaults really do matter in this case.

URL doesn't seem to be rendered properly

In my code, I generate a djot AST and pass it to djot.js to render the djot format.

I found that it doesn't render the URLs (for example, {"tag":"url","text":"https://pandoc.org"}) into the djot output <https://pandoc.org>. If I render it into HTML instead, I could see the URLs.

the issue can be reproduced with

let json = String.raw`{"tag":"doc","references":{"this is a title":{"tag":"reference","label":"this is a title","destination":"#this_is_a_title"}},"footnotes":{},"children":[{"tag":"section","children":[{"tag":"para","children":[{"tag":"str","text":"and reference "},{"tag":"url","text":"https://pandoc.org/lua-filters"},{"tag":"str","text":" and inline? "},{"tag":"inline_math","text":"x^n + y^n = z^n"}]}]}]}`;

console.log(djot.renderDjot(JSON.parse(json)))

Heading attributes disappear from hierarchical sections

{a=b}
# abc

results in

<section id="abc">
<h1>abc</h1>
</section>

instead of either

<section a="b" id="abc">
<h1>abc</h1>
</section>

or

<section id="abc">
<h1 a="b">abc</h1>
</section>

The only attribute that does not disappear is the id, if provided it overwrites the id on the section as expected.

Headings within block containers are working as expected:

:::
{a=b}
# abc
:::

<div>
<h1 a="b" id="abc">abc</h1>
</div>
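The second expected form (id on the section, remaining attributes on the heading) can be sketched as a small transformation applied when a heading is wrapped in a section. The `Heading` and `Section` shapes and `wrapInSection` are hypothetical simplifications, not djot.js's AST.

```typescript
type Attributes = Record<string, string>;

interface Heading {
  tag: "heading";
  level: number;
  text: string;
  attributes?: Attributes;
}

interface Section {
  tag: "section";
  attributes: Attributes;
  children: Heading[];
}

// Move the heading's id to the section (or use the generated one) and keep
// every other attribute on the heading, so nothing is dropped.
function wrapInSection(heading: Heading, generatedId: string): Section {
  const attrs: Attributes = heading.attributes ?? {};
  const { id, ...rest } = attrs;
  return {
    tag: "section",
    attributes: { id: id ?? generatedId },
    children: [{ ...heading, attributes: rest }],
  };
}

const section = wrapInSection(
  { tag: "heading", level: 1, text: "abc", attributes: { a: "b" } },
  "abc"
);
console.log(JSON.stringify(section.attributes));             // {"id":"abc"}
console.log(JSON.stringify(section.children[0].attributes)); // {"a":"b"}
```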

Is it possible to publish a new version on npm registry now?

Hello, I found that the block attributes on the title were discarded when using the library. I found a related issue on the issue list and discovered that the problem was fixed on January 31st. However, the version on npm registry was published on January 18th. I don't know if it's possible to publish a new version on npm registry now.

[text]({a=b})

yields

doc
  para
    link destination="{ab}"
      str text="text"

instead of

doc
  para
    link destination="{a=b}"
      str text="text"
