jgm / djot Goto Github PK

1.6K 30.0 43.0 533 KB

A light markup language

Home Page: https://djot.net

License: MIT License

Makefile 3.32% Vim Script 34.77% HTML 61.91%

commonmark lua markdown markup-language pandoc

djot's People

wooorm vassudanagunta tarleb gregorycrane sermetpekin uvtc robjwells soapdog cyberflamego bpj dradetsky tapeinosyne frankfischer matklad jrcribb mna waldyrious hellux krontzo fredericmoulins rhysd triptych marklodato xvhfeng ratmice yingpengsha bannmann dingmingxin peacemedia kianmeng yibit mayebejames gemmaro haitrungle pranabekka bryanchance thevinhluong102 tmke8 gmh5225 sivukhin silky dai

djot's Issues

Syntax documentation on thematic breaks contradicted by test

The syntax documentation for thematic breaks says:

A line containing four or more * or - characters, and nothing else (except spaces or tabs) is treated is a thematic break (<hr> in HTML).

I interpreted this to mean four characters, excluding spaces.

However the first test in the thematic breaks file contradicts this:

djot/test/thematic_breaks.test

Lines 1 to 11 in 47f9b3b

    
```
hello

- - -

there
.
<p>hello</p>
<hr>
<p>there</p>
```

Is the count supposed to include spaces, or should it only include * and - characters?
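The two readings can be made concrete with a small Python sketch. This is not djot's implementation: `count_markers` and `is_break` are hypothetical helpers, and the minimum is left as a parameter because the documentation and the test disagree about it.

```python
from typing import Optional


def count_markers(line: str) -> Optional[int]:
    """Count * and - characters; return None if the line holds anything
    other than markers, spaces, or tabs (or holds no markers at all)."""
    stripped = line.strip(" \t")
    if not stripped:
        return None
    if any(ch not in "*- \t" for ch in stripped):
        return None
    return sum(ch in "*-" for ch in stripped)


def is_break(line: str, minimum: int, count_spaces: bool = False) -> bool:
    """Apply one of the two readings of the rule (hypothetical helper)."""
    markers = count_markers(line)
    if markers is None:
        return False
    # Under the "spaces count" reading, measure the trimmed line length instead.
    size = len(line.strip(" \t")) if count_spaces else markers
    return size >= minimum
```

With a minimum of four, the test line `- - -` is a thematic break only if spaces are counted; with a minimum of three it passes under either reading.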

README.md says djot is compatible with lua versions 5.1 to 5.4 but "goto" statement was defined in 5.2

The statement in README.md:

The code for djot (excluding the test suite) is standard Lua, compatible with 5.1–5.4, including luajit.

needs correcting to 5.2–5.4. The Debian Linux 11 lua package installs version 5.1.5, and with that interpreter an error is reported in djot/block.lua:

lua: error loading module 'djot.block' from file './djot/block.lua':
	./djot/block.lua:683: '=' expected near 'finish'
stack traceback:
	[C]: ?
	[C]: in function 'require'
	./djot.lua:1: in main chunk
	[C]: in function 'require'
	bin/main.lua:1: in main chunk
	[C]: ?

A new installation of lua version 5.4.4 behaves as expected.

Consider adding "AST in JSON" output format

Problem: I'd love to experiment with djot, but I don't know lua, and would rather use a language I am already familiar with (Rust or TypeScript).

Proposed solution: add djot -a -j to output AST in some JSON format which then could be easily consumed by other programs.

I think I probably can get something like this via pandoc, but I'd rather avoid adding one more tool to the pipeline.

Yeah, the appropriate pandoc spell is pandoc -f djot-reader.lua -t json example.djot. There's another drawback with that approach: today's pandoc json output is a rather low-level encoding of Haskell data structures, it's not something you can just JSON.parse in javascript and get a natural API. I think there's some benefit to defining a more first-class JSON AST encoding for djot.

JS implementation available?

Hi!
I really like the way the djot language is defined, and I would like to try it out in web projects.

Is there a JS implementation?
If not, I'm happy to collaborate; suggestions welcome.

Thanks!

Provide line/col position in matches

instead of (current) byte offset?
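For illustration, here is a Python sketch of how a consumer could derive line/column pairs from byte offsets today. It assumes 0-based byte offsets and counts columns in bytes rather than characters; `line_starts` and `to_line_col` are hypothetical names, not part of djot.

```python
from bisect import bisect_right
from typing import List, Tuple


def line_starts(source: bytes) -> List[int]:
    """Byte offset at which each line begins (the first line starts at 0)."""
    starts = [0]
    for i, byte in enumerate(source):
        if byte == 0x0A:  # newline: the next line starts right after it
            starts.append(i + 1)
    return starts


def to_line_col(starts: List[int], offset: int) -> Tuple[int, int]:
    """Convert a 0-based byte offset to a 1-based (line, column) pair."""
    index = bisect_right(starts, offset) - 1
    return index + 1, offset - starts[index] + 1
```

The table of line starts is built once per document, so each lookup is a binary search rather than a rescan of the input.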

Wikilinks

It would be good to support wikilinks, in the style of obsidian:

[[Page Name|optional description]]

Support for smart numbering when making numbered lists

Would it be possible to support the following syntax when writing lists:


1. This is the first list item.
#. This is the second item.
#. This is the third list item.

This would make numbered lists much easier to maintain when moving list items around while editing a document.
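A rough Python sketch of the requested behavior, under the assumption that `#.` means "previous number plus one" and that an explicit number restarts the count. `renumber` is a hypothetical helper working on single-line items, not a proposal for how djot should parse lists.

```python
import re

# A list marker is either an explicit number or "#", followed by a period.
MARKER = re.compile(r"^(\d+|#)\.(\s+)(.*)$")


def renumber(lines):
    """Replace each "#." marker with the next number in the running count."""
    out = []
    counter = None
    for line in lines:
        match = MARKER.match(line)
        if not match:
            counter = None  # a non-list line ends the current list
            out.append(line)
            continue
        marker, gap, text = match.groups()
        if marker == "#":
            counter = 1 if counter is None else counter + 1
        else:
            counter = int(marker)  # an explicit number restarts the count
        out.append(f"{counter}.{gap}{text}")
    return out
```

Because only the first item carries a number, reordering the `#.` items never requires renumbering by hand.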

`make test` fails without running `luarocks test` first

With the proviso that this is the first time I've tried to use Lua(!), I was initially stumped that make test fails on a freshly cloned repository, with both lua and luarocks installed.

Summary

luarocks test must be run at least once to install the test dependencies (currently only luafilesystem), so that make test will run without an error caused by failing to find lfs.

Details

With this reproduction case:

#!/usr/local/bin/zsh
set -x
# Ensure a clean luarocks environment
luarocks remove luafilesystem 
luarocks remove djot 
rm -rf ./djot/ 
# Clone and attempt to run the tests
git clone https://github.com/jgm/djot
cd djot
make test

The tests fail with a missing library (lfs / luafilesystem), with this output:

+./repro.sh:4> luarocks remove luafilesystem

Error: Could not find rock 'luafilesystem' in /Users/robjwells/.asdf/installs/lua/5.4.4/luarocks
+./repro.sh:5> luarocks remove djot

Error: Could not find rock 'djot' in /Users/robjwells/.asdf/installs/lua/5.4.4/luarocks
+./repro.sh:6> rm -rf ./djot/
+./repro.sh:8> git clone https://github.com/jgm/djot
Cloning into 'djot'...
remote: Enumerating objects: 96, done.
remote: Counting objects: 100% (96/96), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 96 (delta 26), reused 77 (delta 13), pack-reused 0
Receiving objects: 100% (96/96), 93.96 KiB | 590.00 KiB/s, done.
Resolving deltas: 100% (26/26), done.
+./repro.sh:9> cd djot
+./repro.sh:10> make test
LUA_PATH='./?.lua' lua test.lua
/Users/robjwells/.asdf/installs/lua/5.4.4/bin/lua: test.lua:2: module 'lfs' not found:
	no field package.preload['lfs']
	no file './lfs.lua'
	no file '/Users/robjwells/.asdf/installs/lua/5.4.4/share/lua/5.4/lfs.lua'
	no file '/Users/robjwells/.asdf/installs/lua/5.4.4/share/lua/5.4/lfs/init.lua'
	no file '/Users/robjwells/.asdf/installs/lua/5.4.4/luarocks/share/lua/5.4/lfs.lua'
	no file '/Users/robjwells/.asdf/installs/lua/5.4.4/luarocks/share/lua/5.4/lfs/init.lua'
	no file '/usr/local/lib/lua/5.4/lfs.so'
	no file '/usr/local/lib/lua/5.4/loadall.so'
	no file './lfs.so'
	no file '/Users/robjwells/.asdf/installs/lua/5.4.4/lib/lua/5.4/lfs.so'
	no file '/Users/robjwells/.asdf/installs/lua/5.4.4/luarocks/lib/lua/5.4/lfs.so'
stack traceback:
	[C]: in function 'require'
	test.lua:2: in main chunk
	[C]: in ?
make: *** [test] Error 1

The Makefile runs the tests directly with lua test.lua, without ensuring the test dependencies are present first.

Running the tests through luarocks test will ensure the test dependencies are installed. luarocks test has a --prepare flag intended to ensure the dependencies are present without running the tests, but --prepare is currently broken.

Metadata

Should there be a built-in format for metadata, or should that be considered distinct from the markup syntax?

If so, what?

Do we need structured keys such as YAML provides? Would be nice to avoid the complexity of YAML, but otherwise YAML is nice for this. Maybe some simplified subset of YAML.

Multiple representations of thematic break contradict goal 11

The syntax spec says,

A line containing three or more * or - characters, and nothing else (except spaces or tabs) is treated is a thematic break (<hr> in HTML).

Then they went to sleep.

 * * * *

When they woke up, ...

We already enforce a canonical representation for headings and code blocks. I don't see any rationale for keeping both *** and --- for <hr>.
Allowing arbitrary length >= 3 and internal spacing also seems to complicate the syntax unnecessarily, since both require unbounded lookahead in the parser (though still linear) to tell ----------- from -----------text. Since this mark can only appear as an individual block, there's no real need to extend the length to "align with some other texts".

Personally I always use --- (exactly 3 characters) in markdown because it's more like the rendered horizontal line (hr).

Distinguish bare <> from normal ()[] links in the ast

<http://example.com> [My Link](http://example.com)

[
  {
    "children": [
      {
        "tag": "str",
        "text": "http://example.com"
      }
    ],
    "tag": "link",
    "destination": "http://example.com"
  },
  {
    "tag": "str",
    "text": " "
  },
  {
    "children": [
      {
        "tag": "str",
        "text": "My Link"
      }
    ],
    "tag": "link",
    "destination": "http://example.com"
  }
]

The problem here is that the two cases are indistinguishable at the AST layer, but the distinction might be useful. In particular, a renderer might want different word-break CSS for each.
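Without a flag in the AST, a consumer can only guess. The Python sketch below applies one such guess to the output shown above: it treats a link as a bare `<>` link when its only child is a string equal to the destination. `looks_like_autolink` is a hypothetical helper, and the heuristic is an assumption, not something djot provides.

```python
# The AST from the example above, written as Python literals.
AST = [
    {
        "children": [{"tag": "str", "text": "http://example.com"}],
        "tag": "link",
        "destination": "http://example.com",
    },
    {"tag": "str", "text": " "},
    {
        "children": [{"tag": "str", "text": "My Link"}],
        "tag": "link",
        "destination": "http://example.com",
    },
]


def looks_like_autolink(node):
    """Guess whether a link node came from <url> syntax (heuristic only)."""
    if node.get("tag") != "link":
        return False
    children = node.get("children", [])
    return (
        len(children) == 1
        and children[0].get("tag") == "str"
        and children[0].get("text") == node.get("destination")
    )
```

The guess misfires on an explicit link whose text happens to equal its destination, which is why an explicit marker in the AST would be more reliable.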

Djot-reader.lua filter produces error log

When using the custom reader for Pandoc (djot-reader.lua) it gives the following error:
Error running Lua:
djot-reader.lua:88: attempt to index a nil value (local 'elt')
stack traceback:
djot-reader.lua:110: in function djot-reader.lua:109
(...tail calls...)

I ran the following command to get the error as shown above:
pandoc -f djot-reader.lua -t Markdown -o djot.txt Markdown-result.md
Note that filenames are changed, but the syntax and structure is left as I typed it.

Heading attribute is not set inline for headings

When converting from Markdown to Djot, I see that attributes for headings are placed on their own line before the heading. As I understand the syntax, attributes can also be placed inline at the end of the line, like this:

# Heading {#top}

Djot gives the following output when I convert the sample above:

{#top}
# Heading

This was found when experimenting with the filter via Pandoc.

Blank lines in comments

{% This is a comment, spanning
multiple lines 

It also cintains a blank line
%}

doc
  para
  para
    str s="multiple lines"
  para
    str s="It also cintains a blank line"
    softbreak
    str s="%}"

It feels like blank lines in comments are a rather reasonable thing to want.

Note that we don't support blank lines in general attribute syntax (i.e., we allow newlines, but not blank lines), but I think that is reasonable.

Figures

We need a general way of producing a figure with a caption and label.

Pandoc's "implicit figures" are too limiting. Figures can include multiple images, and also non-image content like code.

Captions

Tables can have captions. There should be a way to attach a caption to a pipe table. But captions are more general than that: other things can also have them (code blocks, figures, maybe equations). So perhaps we should have a more generic syntax for attaching a caption? Captions can, in general, contain inline formatting, and perhaps they should be allowed to contain block formatting. (Multi-paragraph captions can be seen.) It would also be nice to provide a way to include a "short caption," which could be used in a list of figures.

Lua error on reference link definition

When running pandoc with djot-reader.lua on this input

[bar][baz]

[baz]: /baz

I get this Lua error

Error running Lua:
.../home/.local/share/pandoc/readers/djot-reader.lua:88: attempt to index a nil value (local 'elt')
stack traceback:
	.../home/.local/share/pandoc/readers/djot-reader.lua:110: in function <.../home/.local/share/pandoc/readers/djot-reader.lua:109>
	(...tail calls...)

which goes away if I comment out the link definition.

pandoc 2.18 on Android with termux

Please ignore if it is because reference links haven't been
implemented yet...

error when parsing empty table cells

"| a |" (without the quotes) produces a one cell table.
If you delete just the letter "a" then
https://djot.net/playground/ hangs and has to be reloaded
(caused by "ast.lua:520: unmatched -row encountered at byte xxxx").

A workaround is "| \ |" (which puts a non breaking space into the cell).
But this looks a little bit clumsy.

Simple macro syntax

Jumping off what was said here, it seems that the bare {foo} attribute syntax could be a simple way to provide in-markdown macro definitions! Just as with reference links, foo would simply be a label for whatever was defined in the label definition:

{foo}: bar

One could even imagine going further and having label functions, i.e. macros, used like {foo(arg1,arg2)}

And then defined as

{foo(arg1,arg2)}: do something with (arg1) and (arg2)

The only issue is that this would be very useful in the math context, but of course it clashes with LaTeX syntax.
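The argument-free form can be sketched in a few lines of Python. This only illustrates the idea: `expand` is a hypothetical function, and it assumes that definitions sit on their own line and may appear anywhere in the document, as reference link definitions do.

```python
import re

DEFINITION = re.compile(r"^\{(\w+)\}:\s*(.*)$")  # a line such as "{foo}: bar"
USE = re.compile(r"\{(\w+)\}")                   # a use such as "{foo}"


def expand(text):
    """Collect {label}: definitions, then substitute every {label} use."""
    macros = {}
    body = []
    for line in text.splitlines():
        match = DEFINITION.match(line)
        if match:
            macros[match.group(1)] = match.group(2)
        else:
            body.append(line)
    joined = "\n".join(body)
    # Labels without a definition are left untouched.
    return USE.sub(lambda m: macros.get(m.group(1), m.group(0)), joined)
```

Since the label pattern only accepts word characters, existing attributes such as `{.class}` or `{#id}` pass through unchanged in this sketch.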

Idea: let `:::` pick its own HTML element

This may be more HTML-centric than what you had in mind for djot, but:

Unless I've missed something, all elements that have some way to attach a class to them require prefixing the class with a period, à la .warning. The div construct, on the other hand, doesn't.

What do you think about repurposing the div to make it a generic any-element wrapper by redefining the word after the three dots as an element name, as in:

::: figure
  ![last night’s dinner: steak and potatoes][IMG_2345.JPEG]
  ::: figcaption
    Don’t worry, I had a giant salad for lunch.
  :::
:::

(The indentation and also nesting of ::: elements may very well be a separate issue. I'm only adding the indentation for clarity.)

Parentheses make attributes disappear

(some text){.attr}

{
  "tag": "doc",
  "children": [
    {
      "tag": "para",
      "children": [
        {
          "tag": "str",
          "text": "(some text)"
        }
      ]
    }
  ],
  "footnotes": [],
  "references": []
}

There's no trace of .attr in the output, and no warnings.

I feel like something other than silently discarding attributes should happen here, though I am not sure what exactly.

Line blocks

Pandoc offers a line block format which is useful for addresses and such things as poetry. In this format, newlines are hard breaks, and all spaces are significant, even leading spaces.

| The limerick packs laughs anatomical
| Into space that is quite economical.
|   But the good ones I've seen
|   So seldom are clean
| And the clean ones so seldom are comical.

Note that the same effect can be achieved with backslash-newline and backslash-space, but it arguably looks less natural.

The limerick packs laughs anatomical\
Into space that is quite economical\
\ \ But the good ones I've seen\
\ \ So seldom are clean\
And the clean ones so seldom are comical.

In addition, the pipes may create confusion with pipe tables.
So, not sure this feature is worth it.

Support entities (anyway, please!)

I know it's said that HTML-style entities are not supported because djot is not to favor any target format, but I wonder if it wouldn't be a good idea to have a mechanism for including characters which are hard to type, and entities are a well-known syntax for that, which I would say is good enough.¹ I can share a Lua table mapping HTML 5 entity names to UTF-8 characters, but supporting only numeric entities would be a reasonable limitation, since djot would only borrow the syntax. Those can be handled very effectively in Lua, e.g.

str:gsub('(%&(%#?%w%w-)%;)', function (entity,id)
    if id:match('^#') then
      -- Parenthesize gsub so that only the string, not the match count, is passed to tonumber
      local cp = tonumber((id:gsub('^#', '0')))
      if cp and cp >= 0 and cp <= 0x10ffff then
        return char(cp)
      end
    end
    error("Unsupported or invalid entity: " .. entity, 2)
  end
)

where char can be either utf8.char or this:

function char(a)
  local cp = math.floor(assert(tonumber(a), "Expected number but got " .. type(a)))
  if cp < 0 or cp > 0x10ffff then
    error("Codepoint is out of range: " .. a)
  end 
  if cp < 128 then
    return string.char(cp)
  end
  local s = ""
  local prefix_max = 32
  while true do
    local suffix = cp % 64
    s = string.char(128 + suffix) .. s
    cp = (cp - suffix) / 64
    if cp < prefix_max then
      return string.char((256 - (2 * prefix_max)) + cp) .. s
    end
    prefix_max = prefix_max / 2
  end
end
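For comparison, the same idea in Python, where `chr` already handles the UTF-8 side. This is a sketch with one deliberate difference from the Lua snippet, stated as an assumption: named entities are left untouched instead of raising an error. `decode_numeric_entities` is a hypothetical name.

```python
import re

# Decimal (&#331;) or hexadecimal (&#x14b;) numeric entities only.
ENTITY = re.compile(r"&#([xX][0-9A-Fa-f]+|[0-9]+);")


def decode_numeric_entities(text):
    """Replace numeric entities with the characters they denote."""
    def replace(match):
        body = match.group(1)
        cp = int(body[1:], 16) if body[0] in "xX" else int(body)
        if cp > 0x10FFFF:
            raise ValueError(f"Codepoint is out of range: {match.group(0)}")
        return chr(cp)

    return ENTITY.sub(replace, text)
```

For example, both `&#331;` and `&#x14b;` decode to ŋ, whose UTF-8 encoding is the two bytes C5 8B.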

¹ I would prefer a paired delimiter. My string interpolation DSL uses @(...) where the parentheses may contain one or more of (1) a decimal code point like 331, (2) a hex codepoint like 0x14b, (3) an entity name like eng, or (4) a Unicode name in angle brackets like <Latin small letter eng> (in the Perl implementation).

div blocks don't allow underscores in class names

[inline]{.a_b_c}

::: a_b_c

:::

I'd expect both a_b_c to parse as a class name. The second one doesn't:

doc
  para
    span class="a_b_c"
      str s="inline"
  para
    str s="::: a"
    emph
      str s="b"
    str s="c"
  div
references = {
}
footnotes = {
}

Suggested syntax for {|underline|}, {!strikeout!} and {.small caps.}

In the announcement thread on pandoc-discuss I suggested to add these syntaxes:

{|underline|} 
{!strikeout!}
{.small caps.}

@jgm asked me to open an issue here.

I would very much appreciate to have a syntax for small caps in particular.

In syntax def doc examples, show html output rendered as well

It would help to quickly visualize and learn what a bit of djot markup does if the examples contained rendered html in addition to the raw html output.

Glossary and acronym

It would be great to have a nice syntax for acronyms (abbr in HTML) and glossary entries. I am not aware of any markdown flavor that handles this.

Maybe this ties into a more robust cross-referencing syntax.

Cross-references and numbering

LaTeX has a flexible system for creating numbering counters, labels, and cross-references. This can be used with headings, tables, figures, equations, even list items. This is a must for serious academic writing, but it's not easy to see how to create a system that is sufficiently flexible but still natural for plain text writing and easy to use.

vim syntax highlighting bug

I don't know why, but this fails:

```
test

ok
```

Only the first part is highlighted as code.

Add playground

I don’t think there’s one yet. Would be very useful!

`tree-sitter` grammar

Hi,

Seeing as the project already has a syntax definition for vim, it would make sense to also have a tree-sitter grammar which could be used by a number of editors (e.g. neovim).
I guess a good starting point would be the markdown grammar, but personally I'm having a hard time understanding how it works.

Looking forward to using djot in the future!

Citations

We need a syntax for citations that can be plugged into citeproc-lua or sent to pandoc for processing.

Pandoc's citation syntax seems a good basis. One thing we might change would be the syntax for author-in-text citations, which is currently a bit tricky to parse, because it requires lookahead.

Perhaps instead of

@foo [p. 15]

we should have something like

[+@foo, p. 15]

em, i, cite

A lot of the time when we use italics it's for emphasized text (<em>), other times it's a book title (<cite>) or some weirdo other-language quote or Linnaean flower name (in which case we have to use <i>). The CommonMark way to do that is to use raw HTML, but that's more cumbersome in djot, and raw HTML isn't something we wanna leave on for world-readable forums and wikis anyway.

That's why I suggest that djot produce <i> and <b> instead of <em> and <strong>. Since the former are hypernyms or supersets of the latter, they're never wrong; it's just that a lot of the time the latter are more precise (at the expense of sometimes being completely wrong).

(The other thing I've always wanted to change about Markdown is supporting • for list bullets.)

attributes on bare words are English-biased

hello{.en}
привет{.ru}

doc
  para
    str s="hello" class="en"
    softbreak
    str s="привет"
references = {
}
footnotes = {
}

I think the attribute should either apply in both cases or in neither.
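One plausible explanation, offered as an assumption rather than a diagnosis of djot's code, is that the bare word before the attribute is matched with an ASCII-only character class. The Python sketch below reproduces the same asymmetry with `re.ASCII`; `attach` is a hypothetical helper.

```python
import re


def attach(text, ascii_only):
    """Return (word, class) pairs for bare words followed by {.class}."""
    flags = re.ASCII if ascii_only else 0
    # With re.ASCII, \w matches only [A-Za-z0-9_]; without it, \w also
    # matches letters from other scripts.
    pattern = re.compile(r"(\w+)\{\.(\w+)\}", flags)
    return [(m.group(1), m.group(2)) for m in pattern.finditer(text)]
```

With `ascii_only=True` only `hello` receives its class; with `ascii_only=False` both words do.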

Links between files and blocks

Why

I saw you plan to add wiki link syntax (#26), and I was wondering whether you plan to expand that syntax so that it can express precise links to other djot files, or even to blocks within them.

If we want to add a link to a.md, whose content is

# Head 1 {#anchor}

some content.

With a regular Markdown link, we have to write [link to a](a.html#anchor), which assumes we want to convert the file to HTML. A wiki link is a better solution since it doesn't need to specify a file extension. But wiki links have some cons:

- It can't link to an anchor such as a.html#anchor.
- It can be ambiguous. Djot doesn't require a document's title to match its file name, so two documents can have the same title.
- It only supports notes at the same level.

An example syntax

The following syntax is a rough proposal, just to show what it could do. It would probably need polishing before going into the spec.

[[Tiger]]() means a link to Tiger.djot.
[[Tiger]](../a.djot) means a link to ../a.djot with text Tiger.
[[Tiger]](a.djot#anchor) means a link to an anchor point named anchor in a.djot.

When being converted to html, link to a.djot should be replaced by a.html.
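The resolution and rewriting rules above could look roughly like this in Python. It is a sketch of the proposal only: `resolve` and `rewrite_destination` are hypothetical names, and the rule that absolute URLs are left alone is an added assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def resolve(text, destination):
    """[[Tiger]]() links to Tiger.djot; otherwise use the given destination."""
    return destination if destination else text + ".djot"


def rewrite_destination(destination, target_ext=".html"):
    """Swap a relative .djot path for the output extension, keeping #anchor."""
    parts = urlsplit(destination)
    # Assumption: links with a scheme or host point outside the document set.
    if parts.scheme or parts.netloc or not parts.path.endswith(".djot"):
        return destination
    new_path = parts.path[: -len(".djot")] + target_ext
    return urlunsplit(parts._replace(path=new_path))
```

Splitting the destination first means the fragment survives the rewrite, so `a.djot#anchor` becomes `a.html#anchor`.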

Pros and Cons

If djot had a spec for links between files (and even to blocks and inline elements), there would be some pros:

- We could have a unified rule for jumping between djot notes/documents in editors (vscode, vim/neovim) or other note-taking software (although currently there isn't such software based on djot).
- It would be easy to preserve the link relationships between documents when publishing djot files as a website or in another form.
- Compared to Markdown, a spec for links between files avoids divergent link syntax across implementations. The Markdown link syntax extensions in software such as Obsidian, emanote, neuron, zk, and zettlr are not consistent.

A possible con of that syntax:

- It might require conversion software to be aware of file locations when processing djot files with links, which may cause some inconvenience. For example, pandoc just knows the Text of inputs instead of (FilePath, Text).

List tables

Tables whose cells contain block-level content (multiple paragraphs, lists, code blocks) can't be represented as pipe tables. For these cases we might want to provide "list tables" as in RST. These could be represented as a list in a div with attributes.

::: table aligns="lc" widths="25 50"
- * one
  * two
- -----
- * three
  * ^
- * five

    multi-paragraph
  * ~

Any content below the list is the caption.
~ means: merge with the cell above.
^ means: merge with cell to left.
:::

`{:lang}` as substitute for `{lang="lang"}`

@jgm,

after having to tag languages in many documents, I think this would be handy:

This is [French]{:fr}.
And this is [ancient Greek]{:grc}.

It is consistent with the syntax for {#id} and {.class}.
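As an illustration of the intended equivalence, here is a Python sketch that rewrites the shorthand into the long form as a preprocessing step. `expand_lang` is a hypothetical helper; it only handles an attribute block containing nothing but the shorthand, and the set of characters allowed in a language tag is an assumption.

```python
import re

# {:fr} or {:grc}: a colon followed by a language tag, alone in the braces.
LANG_SHORTHAND = re.compile(r"\{:([A-Za-z][A-Za-z0-9-]*)\}")


def expand_lang(text):
    """Rewrite {:xx} as {lang="xx"}."""
    return LANG_SHORTHAND.sub(lambda m: '{lang="%s"}' % m.group(1), text)
```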

Would it be possible to have this handled in djot?

Many thanks for your help and your excellent work.

Lua filter API

Create a Lua filter API, like pandoc's Lua filters API.

Doing this might require changing the way we currently represent the AST in Lua. The current method is designed for speed, but uses some conventions that make it fragile for direct interaction (e.g., annotation as first component of an array). (Perhaps we could use metatables to provide a friendly public interface to the current array-based structure.)

Alternatively, we could leave the AST as it is but provide special functions for manipulating it.

API documentation

The documentation of the use of the library in README.md is a bit sparse.
It would be better to have proper API documentation for the Lua library.

Merged Cells in Tables

One feature many Markdown table syntaxes lack is merged cells.

Versioning or Roadmap?

First of all - love this. A principled, light-weight markup unsaddled by baggage, and designed by someone with the chops to appreciate the nuances of every decision.

I was wondering how stable you considered the current implementation. How far is this from a 1.0 release? What would it take for this to become fully integrated into Pandoc? Do you hope for this to eventually replace Pandoc Markdown?

Editorial change tracking

I wonder about the goals of this project. How "deep" does it want to go?

Seeing other issues like #10, I somewhat fear that djot will not be compatible with e.g. CriticMarkup, which I consider one of the fundamental Markdown extensions with a large user base.

Do you plan to keep djot compatible with CriticMarkup and/or other extensions from ExtraMark, MultiMarkdown, etc. (see https://gist.github.com/vimtaai/99f8c89e7d3d02a362117284684baa0f )?

Typo in Vim syntax file superscript -> subscript

There is a typo/copy error on line 22 of the Vim syntax file. It says superscript like the line above when it should say subscript. It currently has no visible effect, but will cause problems if someone tries to redefine the highlighting for subscript.

verbatim blocks tildes syntax supported?

I notice that, in addition to triple-backticks, I can get verbatim blocks with tildes as well:

~~~
works for
code blocks
~~~

but it's not mentioned in the syntax reference.

Will djot continue to support the triple-tilde syntax for code blocks?

Incidentally, if triple-tilde delimiters were no longer used for code blocks, maybe they could be used for figures (see #31).

English words/commands in djot markup?

I know that, with Markdown, the rule is to only use punctuation symbols for markup; the prime directive being "optimize for readability". The markup shouldn't look like markup!

Then folks wanted to add more features to markdown, which required no small amount of creativity in finding ways to use the limited set of ascii punctuation as syntax for these features. This entailed lots of discussion, sometimes resulting in epic multi-year threads on the best syntax for them.

On the other end of the spectrum is something like LaTeX or Texinfo, where you just have \command or @command for a given feature and you're done. These markup formats are easy to write, have more features, and I'm guessing are easier to implement/parse. But readability suffers.

I think readability is key to adoption. If people dislike reading a markup format, they won't write it (or, maybe other people will come along and create lightweight markup languages that compile/transpile to the less-readable but more-featureful markup lang).

I'm wondering where djot falls on this spectrum. My impression is that it's "markdown 2.0 ++" --- ambiguous syntax fixed, syntax simplified (or made more consistent) in some places. But with further ambitions...

If ambitions are to be able to write real academic papers and textbooks with djot (sounds good!), then in order to avoid ascii-soup syntax and decades-long discussions of "the right syntax for {feature}", is it acceptable to introduce some generic english command syntax into djot?

My guess is that the sweet spot is: optimize for readability for the most commonly-used markup, and also introduce a generic english command syntax (like, e.g. @this) to help implement the advanced features (while still striving for good readability).

Potential examples of where an @command might be used: #28 @caption, #35 @metadata, #32 @cite, #31 @figure.

Alternative implementations

I've spent some time looking at various hypothetical alternative implementations. I didn't do anything practical, but I've learned a bunch, so hopefully this might be useful for someone.

The backstory here is that I'd love to access Djot from Deno, as that seems like the perfect runtime for rendering an extensible lite markup. I've used JS template literals for this in the past, and that's quite neat, and Deno's security model is also very appropriate for these kinds of converters.

Here's some options:

- The main benefit of an alternative impl would be that you'd be able to massage the output programmatically in the language of your choice (if you need just .html, it's always possible to shell out). To achieve that, we need to actually define the AST model, so that alternative impls don't just export whatever internal representation they have, but share the general shape of the API. I think a good AST model is already present in the Lua impl; it needs to be documented in an abstract form in the spec (and we can add a canonical JSON encoding for it, #58).
- We could (and, long term, absolutely should) provide a native implementation in something like C, Rust or Zig. Given that djot is a small, nicely organized code base, this shouldn't be much trouble. I see only two potential snags:
  - How stable is djot? It would be no fun to chase upstream from Rust. It seems like it should be pretty stable.
  - The Lua implementation relies on Lua's regexes, and bringing in a full regex engine for a native impl seems like overkill. So, either some amount of manual code-uglification is required, or some compiler work to implement a proc-macro or some such to transform find!("^[*+-] %[[Xx ]%]%s") into an inline automaton at compile time. Or maybe just bring in regex for the first version and leave a todo :)
- If we had, say, a C impl, compiling that to Wasm and exposing it to node & deno would be trivial.
- Can we just derive a bunch of implementations from a unified grammar? From what I understand of how those things actually work in practice, no, not really.
- Lua and JavaScript seem sufficiently close (e.g., both have regexes built in) that manual "transpiling" of .lua to .js might make sense? Perhaps long-term Wasm would be strictly better than .js, but in today's world .js can be operationally easier, so why not?
- Lua is implemented in C, so we can compile Lua itself to Wasm, and then interpret djot in Wasm. That seems like the most horrible, but also the easiest way to get going without rewriting everything. And Lua to Wasm is how the playground works.

Sadly, there are a couple of problems on that path. The fundamental thing is that neither browsers nor deno support just importing a WASM module. Instead, you need to do a dance of getting a Uint8Array from somewhere and then manually instantiating it. The way this typically works is that the wasm bytes are fetched from some server, but then it's very much not a self-contained library. This fetching is what wasmoon, the library used by the playground, is doing.

An alternative approach, friendlier for consumers, is to embed .wasm as a base64 string directly into the source code example. This I think is what should be done for this approach, but, as far as I can tell, no one has actually done this for luajit so far? This approach is also not great in the sense that the loading would block the JS event loop.

So yeah, the next step for this approach would be to re-create what wasmoon did with compiling Lua with emcc (Emscripten), embed the result (together with the .lua files for djot) into a .js file, and write the required glue code to specialize the wasm runtime to a Lua interpreter and djot parser!

Consider revealing `:smile:` -> 😀 in the AST

The following djot document:

Hello -- :smile:

produces the following ast:

{
  "footnotes": [],
  "references": [],
  "type": "doc",
  "children": [
    {
      "type": "para",
      "children": [
        {
          "type": "str",
          "text": "Hello "
        },
        {
          "type": "en_dash",
          "text": "--"
        },
        {
          "type": "str",
          "text": " "
        },
        {
          "type": "emoji",
          "text": ":smile:"
        }
      ]
    }
  ]
}

The problem here is that the smile="😄" part is implicit: a consumer of such an AST would have to replicate djot's emoji table. It would help to add "rendered" emojis to the output, even if that info is in some sense redundant.

Thinking more about this, maybe we don't even need dedicated AST nodes like emoji or en_dash? We can say that they are in fact str nodes, just with a raw attribute:

{
  "footnotes": [],
  "references": [],
  "type": "doc",
  "children": [
    {
      "type": "para",
      "children": [
        {
          "type": "str",
          "text": "Hello "
        },
        {
          "type": "str",
          "text": "–",
          "raw": "--"
        },
        {
          "type": "str",
          "text": " "
        },
        {
          "type": "str",
          "text": "😄",
          "raw": ":smile:"
        }
      ]
    }
  ]
}

There might be some terminological confusion here. In the literal syntax tree, we certainly have the type: "emoji" syntax node. But what we want from -a -j is probably not so much an AST as an abstract document model. So, syntactically :smile: is an emoji, but semantically it wants to be very close to 😄 (e.g., substituting :emoji: syntax with the Unicode equivalents shouldn't change the meaning of a djot document).
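The step from the first tree to the second can be sketched as a small transformation in Python. The two lookup tables are deliberately tiny stand-ins (an assumption for this example, not djot's real tables), and `normalize` is a hypothetical name.

```python
EMOJI = {"smile": "😄"}   # stand-in for djot's emoji table
SMART = {"en_dash": "–"}  # node type -> rendered character


def normalize(node):
    """Return a copy in which emoji and smart-punctuation nodes become
    str nodes that keep their source text in a "raw" field."""
    node = dict(node)  # shallow copy so the input tree is not modified
    kind = node.get("type")
    if kind == "emoji":
        name = node["text"].strip(":")
        if name in EMOJI:  # unknown names are left as emoji nodes
            node = {"type": "str", "text": EMOJI[name], "raw": node["text"]}
    elif kind in SMART:
        node = {"type": "str", "text": SMART[kind], "raw": node["text"]}
    if "children" in node:
        node["children"] = [normalize(child) for child in node["children"]]
    return node
```

A consumer that only cares about text can then read `text` everywhere, while tools that need the original spelling still find it in `raw`.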

Suggested principle: Invisible whitespace is never significant

Djot's treatment of hard line breaks violates this principle. In djot, this:

I can write it on the door   \
I can put it on the floor   \
I can do anything that you want me for   \
If you want me to   \

nicely renders as:

I can write it on the door
I can put it on the floor
I can do anything that you want me for
If you want me to

But this:

Do it right, do it wrong    \ 
'Cause a matter of fact, it'll turn out to be strong    \
If you want me to    \

renders as

Do it right, do it wrong 'Cause a matter of fact, it'll turn out to be strong
If you want me to

because there happens to be an accidental trailing space on the first line.
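The difference between the current behavior and the suggested one comes down to how the end of the line is tested. A Python sketch, with `ends_with_hard_break` as a hypothetical helper and the two patterns as assumptions about how each rule could be expressed:

```python
import re

STRICT = re.compile(r"\\$")         # backslash must be the very last character
LENIENT = re.compile(r"\\[ \t]*$")  # trailing spaces or tabs are ignored


def ends_with_hard_break(line, lenient):
    """Check whether a line (without its newline) ends in a hard break."""
    pattern = LENIENT if lenient else STRICT
    return pattern.search(line) is not None
```

Under the strict rule the line with the accidental trailing space is not a hard break; under the lenient rule it is. A lenient rule would have to take precedence over reading backslash-space as a non-breaking space when it occurs at the end of a line.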

Suggested principle: Invisible whitespace is never significant

I think in a human oriented plain text format, invisible or non-obvious whitespace should have no significance. By "invisible", I mean not obviously present to the human eye. That there is a space between words is obvious, but not how many spaces, or that there are spaces at the end of a line, or the number of spaces at the beginning of a line (when a non-fixed width font is used).

I think djot gets it right in this regard with one exception I've found so far (above). For example, I like that djot doesn't have a magic indent threshold like Markdown's 4 spaces: the transition from 3 to 4 results in dramatically different output, and the transition from 4 to 5 spaces results in subtly but significantly different output. Likewise one doesn't have to count spaces to make sure that successive lines of a list item are treated as such. In fact the interaction between Commonmark's four spaces code block and the space to signify list item continuation results in very unintuitive behavior.

Link parsing bug

 % djot
[Beyond
Markdown](https://johnmacfarlane.net/beyond-markdown.html). (See
[Rationale](#rationale), below.)
^D
<p><a href="https://johnmacfarlane.net/beyond-markdown.html">Beyond
Markdown</a>. (See
<a href="#rationale), below.">Rationale</a></p>

Note that ", below" is parsed as part of the destination.
Oddly this doesn't happen if we trim off the first link above.

New vs continuing paragraph after block quote or other set-off content

Here are two kinds of texts we might want to distinguish:

paragraph content

> block quote

continuation of paragraph

paragraph content

> block quote

new paragraph

A deficiency of Markdown is that there is no way to distinguish these cases. The problem is reduced if one renders in a format that does not indent new paragraphs, because then there is no visual distinction between the cases. But they are semantically different and can be distinguished, e.g., in print output with indented paragraphs. There should be a way to distinguish them in the source.

The problem is not raised only by block quotes but occurs also with set-off equations, images, tables, code, and lists.

I recently found myself creating a pandoc Lua filter that implements the following syntax for the "continued paragraph case":

paragraph content

> block quote

_ continuation of paragraph

(The filter just inserts a LaTeX \noindent command where the _ is.) This is not too bad actually. It would be nice if djot had some way of making the distinction.
