jgm / djot
A light markup language
Home Page: https://djot.net
License: MIT License
The syntax documentation for thematic breaks says:
A line containing four or more * or - characters, and nothing else (except spaces or tabs) is treated as a thematic break (<hr> in HTML).
I interpreted this to mean four characters, excluding spaces.
However the first test in the thematic breaks file contradicts this:
djot/test/thematic_breaks.test
Lines 1 to 11 in 47f9b3b
Is the count supposed to include spaces, or should it only include * and - characters?
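The two readings can be made concrete with a small sketch. It is written in Python rather than djot's Lua, and `is_thematic_break` and its `count_spaces` switch are hypothetical names for illustration, not part of the implementation; the sample line `* * *` is also an assumption, since the referenced test file is not reproduced here.

```python
def is_thematic_break(line: str, minimum: int = 4, count_spaces: bool = False) -> bool:
    """Check a line against one of the two readings of the documentation."""
    stripped = line.strip(" \t\r\n")
    # Only '*', '-', spaces and tabs may appear on the line.
    if not stripped or any(ch not in "*- \t" for ch in stripped):
        return False
    if count_spaces:
        # Reading A: interior spaces and tabs count toward the minimum.
        return len(stripped) >= minimum
    # Reading B: only the '*' and '-' characters count.
    return sum(ch in "*-" for ch in stripped) >= minimum


print(is_thematic_break("* * *", count_spaces=True))
print(is_thematic_break("* * *", count_spaces=False))
```

Under reading A the line `* * *` qualifies (five characters including spaces); under reading B it does not (three marker characters).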
The statement in README.md:
The code for djot (excluding the test suite) is standard Lua, compatible with 5.1–5.4, including luajit.
needs correcting to range over 5.2-5.4: the Debian Linux 11 lua package installs version 5.1.5, and that version reports an error in djot/block.lua:
lua: error loading module 'djot.block' from file './djot/block.lua':
./djot/block.lua:683: '=' expected near 'finish'
stack traceback:
[C]: ?
[C]: in function 'require'
./djot.lua:1: in main chunk
[C]: in function 'require'
bin/main.lua:1: in main chunk
[C]: ?
A new installation of lua version 5.4.4 behaves as expected.
Problem: I'd love to experiment with djot, but I don't know lua, and would rather use a language I am already familiar with (Rust or TypeScript).
Proposed solution: add djot -a -j to output AST in some JSON format which then could be easily consumed by other programs.
I think I probably can get something like this via pandoc, but I'd rather avoid adding one more tool to the pipeline.
Yeah, the appropriate pandoc spell is pandoc -f djot-reader.lua -t json example.djot. There's another drawback with that approach: today's pandoc json output is a rather low-level encoding of Haskell data structures, it's not something you can just JSON.parse in javascript and get a natural API. I think there's some benefit to defining a more first-class JSON AST encoding for djot.
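To illustrate what "easily consumed by other programs" could look like, here is a minimal sketch in Python. It assumes a JSON shape like the AST dumps quoted elsewhere in this tracker (nodes with "tag", optional "children", "text" and "destination"); no JSON encoding is actually fixed yet, and `iter_nodes` is a hypothetical helper.

```python
import json


def iter_nodes(node):
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


# Assumed shape only; a real encoding may use different field names.
ast_json = """
{"tag": "doc", "children": [
  {"tag": "para", "children": [
    {"tag": "link", "destination": "http://example.com",
     "children": [{"tag": "str", "text": "My Link"}]}
  ]}
]}
"""

doc = json.loads(ast_json)
destinations = [n["destination"] for n in iter_nodes(doc) if n.get("tag") == "link"]
print(destinations)
```

With a natural encoding like this, a consumer needs nothing beyond a JSON parser and a recursive walk.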
Hi!
I really like the way the djot language is defined, and I would like to try it out in web projects.
Is there a JS implementation?
If not, I'm happy to collaborate => suggestions welcome.
Thanks!
instead of (current) byte offset?
It would be good to support wikilinks, in the style of obsidian:
[[Page Name|optional description]]
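As a rough sketch of what recognizing this form involves, the following Python splits a wikilink into target and description. The regular expression and the `parse_wikilinks` name are assumptions for illustration; how djot would actually tokenize this (escapes, nesting, interaction with spans) is left open.

```python
import re

# [[target]] or [[target|description]]; brackets are not allowed inside.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def parse_wikilinks(text):
    """Return (target, description) pairs; the description defaults to the target."""
    pairs = []
    for m in WIKILINK.finditer(text):
        target = m.group(1).strip()
        description = (m.group(2) or target).strip()
        pairs.append((target, description))
    return pairs


print(parse_wikilinks("See [[Page Name|optional description]] and [[Other]]."))
```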
Is it possible to make support for the following syntax when writing lists:
1. This is the first list item.
#. This is the second item.
#. This is the third list item.
This could make numbering lists very nice, when moving list items around when editing a document.
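The intended behavior can be sketched as a preprocessing pass in Python. The `renumber` function is hypothetical and deliberately simplified: it assumes one item per line and does not reset the counter between separate lists.

```python
import re

# Optional indentation, then a number or '#', a period, and whitespace.
ITEM = re.compile(r"^(\s*)(\d+|#)\.(\s+)")


def renumber(lines):
    """Replace each '#.' marker with the number following the previous item."""
    out, counter = [], None
    for line in lines:
        m = ITEM.match(line)
        if not m:
            out.append(line)
            continue
        marker = m.group(2)
        if marker == "#":
            counter = 1 if counter is None else counter + 1
        else:
            counter = int(marker)
        out.append(f"{m.group(1)}{counter}.{m.group(3)}{line[m.end():]}")
    return out


for line in renumber([
    "1. This is the first list item.",
    "#. This is the second item.",
    "#. This is the third list item.",
]):
    print(line)
```

Moving an item then only requires cutting and pasting the line; the numbers are recomputed from position.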
With the proviso that this is the first time I've tried to use Lua(!), I was initially stumped that make test fails on a freshly cloned repository, with both lua and luarocks installed.
luarocks test must be run at least once to install the test dependencies (currently only luafilesystem), so that make test will run without an error caused by failing to find lfs.
With this reproduction case:
#!/usr/local/bin/zsh
set -x
# Ensure a clean luarocks environment
luarocks remove luafilesystem
luarocks remove djot
rm -rf ./djot/
# Clone and attempt to run the tests
git clone https://github.com/jgm/djot
cd djot
make test
The tests fail with a missing library (lfs / luafilesystem), with this output:
+./repro.sh:4> luarocks remove luafilesystem
Error: Could not find rock 'luafilesystem' in /Users/robjwells/.asdf/installs/lua/5.4.4/luarocks
+./repro.sh:5> luarocks remove djot
Error: Could not find rock 'djot' in /Users/robjwells/.asdf/installs/lua/5.4.4/luarocks
+./repro.sh:6> rm -rf ./djot/
+./repro.sh:8> git clone https://github.com/jgm/djot
Cloning into 'djot'...
remote: Enumerating objects: 96, done.
remote: Counting objects: 100% (96/96), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 96 (delta 26), reused 77 (delta 13), pack-reused 0
Receiving objects: 100% (96/96), 93.96 KiB | 590.00 KiB/s, done.
Resolving deltas: 100% (26/26), done.
+./repro.sh:9> cd djot
+./repro.sh:10> make test
LUA_PATH='./?.lua' lua test.lua
/Users/robjwells/.asdf/installs/lua/5.4.4/bin/lua: test.lua:2: module 'lfs' not found:
no field package.preload['lfs']
no file './lfs.lua'
no file '/Users/robjwells/.asdf/installs/lua/5.4.4/share/lua/5.4/lfs.lua'
no file '/Users/robjwells/.asdf/installs/lua/5.4.4/share/lua/5.4/lfs/init.lua'
no file '/Users/robjwells/.asdf/installs/lua/5.4.4/luarocks/share/lua/5.4/lfs.lua'
no file '/Users/robjwells/.asdf/installs/lua/5.4.4/luarocks/share/lua/5.4/lfs/init.lua'
no file '/usr/local/lib/lua/5.4/lfs.so'
no file '/usr/local/lib/lua/5.4/loadall.so'
no file './lfs.so'
no file '/Users/robjwells/.asdf/installs/lua/5.4.4/lib/lua/5.4/lfs.so'
no file '/Users/robjwells/.asdf/installs/lua/5.4.4/luarocks/lib/lua/5.4/lfs.so'
stack traceback:
[C]: in function 'require'
test.lua:2: in main chunk
[C]: in ?
make: *** [test] Error 1
The Makefile runs the tests directly with lua test.lua, without ensuring the test dependencies are present first.
Running the tests through luarocks test will ensure the test dependencies are installed. luarocks test has a --prepare flag intended to ensure the dependencies are present without running the tests, but --prepare is currently broken.
Should there be a built-in format for metadata, or should that be considered distinct from the markup syntax?
If so, what?
Do we need structured keys such as YAML provides? Would be nice to avoid the complexity of YAML, but otherwise YAML is nice for this. Maybe some simplified subset of YAML.
The syntax spec says,
A line containing three or more * or - characters, and nothing else (except spaces or tabs) is treated as a thematic break (<hr> in HTML).
Then they went to sleep. * * * * When they woke up, ...
We already enforce a canonical representation for headings and code blocks. I don't see any rationale for keeping both *** and --- for <hr>.
Allowing arbitrary length >= 3 and spacing inside also seems to complicate the syntax unnecessarily, since both require infinite lookahead in the parser (though still linear) to tell apart cases like ----------- and -----------text. Since this mark can only appear as an individual block, there's no real need to extend the length to "align with some other texts".
Personally I always use --- (exactly 3 characters) in markdown because it's more like the rendered horizontal line (hr).
<http://example.com> [My Link](http://example.com)
[
{
"children": [
{
"tag": "str",
"text": "http://example.com"
}
],
"tag": "link",
"destination": "http://example.com"
},
{
"tag": "str",
"text": " "
},
{
"children": [
{
"tag": "str",
"text": "My Link"
}
],
"tag": "link",
"destination": "http://example.com"
}
]
The problem here is that the two cases are indistinguishable on the AST layer, but the distinction might be useful. In particular, they might want different word-break css.
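Until the AST records the difference, a consumer can only guess. A minimal Python sketch of such a guess follows, using the node shape from the dump above; `looks_like_autolink` is a hypothetical helper, and it cannot tell <http://example.com> apart from an explicit link whose text happens to equal its destination.

```python
def looks_like_autolink(node):
    """Heuristic: a link whose only child is a str equal to its destination."""
    if node.get("tag") != "link":
        return False
    children = node.get("children", [])
    return (
        len(children) == 1
        and children[0].get("tag") == "str"
        and children[0].get("text") == node.get("destination")
    )


autolink = {
    "tag": "link",
    "destination": "http://example.com",
    "children": [{"tag": "str", "text": "http://example.com"}],
}
explicit = {
    "tag": "link",
    "destination": "http://example.com",
    "children": [{"tag": "str", "text": "My Link"}],
}
print(looks_like_autolink(autolink), looks_like_autolink(explicit))
```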
When using the custom reader for Pandoc (djot-reader.lua) it gives the following error:
Error running Lua:
djot-reader.lua:88: attempt to index a nil value (local 'elt')
stack traceback:
djot-reader.lua:110: in function djot-reader.lua:109
(...tail calls...)
I ran the following command to get the error as shown above:
pandoc -f djot-reader.lua -t Markdown -o djot.txt Markdown-result.md
Note that filenames are changed, but the syntax and structure is left as I typed it.
When converting from Markdown to Djot I see that attributes for headings are placed on a line of their own before the heading. As I understand the syntax, inline attributes can go at the end of the line, like this:
# Heading {#top}
Djot gives the following output when I convert the sample above:
{#top}
# Heading
This was found when experimenting with the filter via Pandoc.
{% This is a comment, spanning
multiple lines
It also cintains a blank line
%}
doc
para
para
str s="multiple lines"
para
str s="It also cintains a blank line"
softbreak
str s="%}"
It feels like blank lines in comments are a rather reasonable thing to want.
Note that we don't support blanks in general attribute syntax (ie, we allow newlines, but not blank lines) but that I think is reasonable.
We need a general way of producing a figure with a caption and label.
Pandoc's "implicit figures" are too limiting. Figures can include multiple images, and also non-image content like code.
Tables can have captions. There should be a way to attach a caption to a pipe table. But captions are more general than that: other things can also have them (code blocks, figures, maybe equations). So perhaps we should have a more generic syntax for attaching a caption? Captions can, in general, contain inline formatting, and perhaps they should be allowed to contain block formatting. (Multi-paragraph captions can be seen.) It would also be nice to provide a way to include a "short caption," which could be used in a list of figures.
When running pandoc with djot-reader.lua on this input
[bar][baz]
[baz]: /baz
I get this Lua error
Error running Lua:
.../home/.local/share/pandoc/readers/djot-reader.lua:88: attempt to index a nil value (local 'elt')
stack traceback:
.../home/.local/share/pandoc/readers/djot-reader.lua:110: in function <.../home/.local/share/pandoc/readers/djot-reader.lua:109>
(...tail calls...)
which goes away if I comment out the link definition.
pandoc 2.18 on Android with termux
Please ignore if it is because reference links haven't been implemented yet...
"| a |" (without the quotes) produces a one cell table.
If you delete just the letter "a" then
https://djot.net/playground/ hangs and has to be reloaded
(caused by "ast.lua:520: unmatched -row encountered at byte xxxx").
A workaround is "| \ |" (which puts a non breaking space into the cell).
But this looks a little bit clumsy.
Jumping off what was said here, it seems that the bare {foo} attribute syntax could be a simple way to get in-markdown macro definitions! Just like with reference links, the foo would just be a label for whatever was defined in the label definition
{foo}: bar
One could even imagine going further and having label functions ie macros, used like {foo(arg1,arg2)}
And then defined as
{foo(arg1,arg2)}: do something with (arg1) and (arg2)
The only thing is that this would be very useful in the math context but of course clashes with latex syntax..
This may be more HTML-centric than what you had in mind for djot, but:
Unless I've missed something, all elements that have some way to attach a class to them require prefixing the class with a period, à la .warning. The div construct, on the other hand, doesn't.
What do you think about repurposing the div to make it a generic any-element wrapper by redefining the word after the three dots as an element name, as in:
::: figure
![last night’s dinner: steak and potatoes][IMG_2345.JPEG]
::: figcaption
Don’t worry, I had a giant salad for lunch.
:::
:::
(The indentation and also nesting of ::: elements may very well be a separate issue. I'm only adding the indentation for clarity.)
(some text){.attr}
{
"tag": "doc",
"children": [
{
"tag": "para",
"children": [
{
"tag": "str",
"text": "(some text)"
}
]
}
],
"footnotes": [],
"references": []
}
There's no trace of .attr in the output, and no warnings.
I feel like something other than silently discarding attributes should happen here, though I am not sure what exactly.
Pandoc offers a line block format which is useful for addresses and such things as poetry. In this format, newlines are hard breaks, and all spaces are significant, even leading spaces.
| The limerick packs laughs anatomical
| Into space that is quite economical.
| But the good ones I've seen
| So seldom are clean
| And the clean ones so seldom are comical.
Note that the same effect can be achieved with backslash-newline and backslash-space, but it arguably looks less natural.
The limerick packs laughs anatomical\
Into space that is quite economical\
\ \ But the good ones I've seen\
\ \ So seldom are clean\
And the clean ones so seldom are comical.
In addition, the pipes may create confusion with pipe tables.
So, not sure this feature is worth it.
I know it's said that HTML-style entities are not supported because djot is not to favor any target format, but I wonder if it wouldn't be a good idea to have a mechanism for including characters which are hard to type, and entities is a well-known syntax for that, which I would say is good enough.[1] I can share a Lua table mapping HTML 5 entity names to UTF-8 characters, but supporting only numeric entities would be a reasonable limitation, since djot would only borrow the syntax. Those can be handled very effectively in Lua, e.g.
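str:gsub('(%&(%#?%w%w-)%;)', function (entity,id)
  if id:match('^#') then
    -- The extra parentheses drop gsub's second return value (the match
    -- count), which tonumber would otherwise take as a base.
    local cp = tonumber((id:gsub('^#', '0')))
    if cp and cp >= 0 and cp <= 0x10ffff then
      return char(cp)
    end
  end
  error("Unsupported or invalid entity: " .. entity, 2)
end
)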
where char can be either utf8.char or this:
function char(a)
  -- Report the type of the argument; cp is not defined yet at this point.
  local cp = math.floor(assert( tonumber(a), "Expected number but got " .. tostring(type(a))))
  if cp < 0 or cp > 0x10ffff then
    error("Codepoint is out of range: " .. a)
  end
  if cp < 128 then
    return string.char(cp)
  end
  local s = ""
  local prefix_max = 32
  while true do
    -- Emit continuation bytes from the low six bits upward.
    local suffix = cp % 64
    s = string.char(128 + suffix) .. s
    cp = (cp - suffix) / 64
    if cp < prefix_max then
      return string.char((256 - (2 * prefix_max)) + cp) .. s
    end
    prefix_max = prefix_max / 2
  end
end
[1] I would prefer a paired delimiter. My string interpolation DSL uses @(...) where the parentheses may contain one or more of (1) a decimal code point like 331, (2) a hex codepoint like 0x14b, (3) an entity name like eng, or a Unicode name in angle brackets like <Latin small letter eng> (in the Perl implementation).
[inline]{.a_b_c}
::: a_b_c
:::
I'd expect both a_b_c to parse as a class name. The second one doesn't:
doc
para
span class="a_b_c"
str s="inline"
para
str s="::: a"
emph
str s="b"
str s="c"
div
references = {
}
footnotes = {
}
In the announcement thread on pandoc-discuss I suggested to add these syntaxes:
{|underline|}
{!strikeout!}
{.small caps.}
@jgm asked me to open an issue here.
I would very much appreciate to have a syntax for small caps in particular.
It would help to quickly visualize and learn what a bit of djot markup does if the examples contained rendered html in addition to the raw html output.
It would be great to have a nice syntax for acronyms (abbr in HTML) and glossary entries. I am not aware of any markdown flavor that handles this.
Maybe this ties into a more robust cross-referencing syntax..
LaTeX has a flexible system for creating numbering counters, labels, and cross-references. This can be used with headings, tables, figures, equations, even list items. This is a must for serious academic writing, but it's not easy to see how to create a system that is sufficiently flexible but still natural for plain text writing and easy to use.
I don't know why, but this fails:
```
test
ok
```
Only the first part is highlighted as code.
I don’t think there’s one yet. Would be very useful!
Hi,
Seeing as the project already has a syntax definition for vim, it would make sense to also have a tree-sitter grammar which could be used by a number of editors (e.g. neovim).
I guess a good starting point would be the markdown grammar, but personally I'm having a hard time understanding how it works.
Looking forward to using djot in the future!
We need a syntax for citations that can be plugged into citeproc-lua or sent to pandoc for processing.
Pandoc's citation syntax seems a good basis. One thing we might change would be the syntax for author-in-text citations, which is currently a bit tricky to parse, because it requires lookahead.
Perhaps instead of
@foo [p. 15]
we should have something like
[+@foo, p. 15]
A lot of the time when we use italics it's for emphasized text (<em>), other times it's a book title (<cite>) or some weirdo other-language quote or Linnaean flower name (in which case we have to use <i>). The commonmark way to do that is to use raw HTML, but that's more cumbersome in djot, and raw HTML isn't something we wanna leave on for world-readable forums and wikis anyway.
That's why I suggest that djot produce <b> and <i> instead of <strong> and <em>. Since the former are hypernyms or supersets of the latter, they're never wrong; it's just that a lot of the time the latter are more precise (at the expense of sometimes being completely wrong).
(The other thing I've always wanted to change about Markdown is supporting • for list bullets.)
hello{.en}
привет{.ru}
doc
para
str s="hello" class="en"
softbreak
str s="привет"
references = {
}
footnotes = {
}
I think both cases should either apply or not apply the attribute
I saw you plan to add wiki link syntax (#26), then I was wondering if you plan to expand that wiki syntax to make it able to express accurate link to other djot files, even other djot blocks.
If we want to add a link to a.md, whose content is
# Head 1 {#anchor}
some content.
via a regular markdown link, we have to write it as [link to a](a.html#anchor), which assumes we want to convert the file to html. A wiki link is a better solution since it doesn't need to specify the file extension. But there are some cons of wiki links:
a.html#anchor
The following syntax is a rough proposal, just to express what functionality it could provide. These syntaxes would probably need polishing before they go into the spec.
[[Tiger]]() means a link to Tiger.djot.
[[Tiger]](../a.djot) means a link to ../a.djot with text Tiger.
[[Tiger]](a.djot#anchor) means a link to an anchor point named anchor in a.djot.
When being converted to html, a link to a.djot should be replaced by a.html.
If djot had a spec about links between files (even with blocks and inline elements), there would be some pros:
A possible con of that syntax is:
Text
of inputs instead of (FilePath, Text)
Tables whose cells contain block-level content (multiple paragraphs, lists, code blocks) can't be represented as pipe tables. For these cases we might want to provide "list tables" as in RST. These could be represented as a list in a div with attributes.
::: table aligns="lc" widths="25 50"
- * one
* two
- -----
- * three
* ^
- * five
multi-paragraph
* ~
Any content below the list is the caption.
~ means: merge with the cell above.
^ means; merge with cell to left.
:::
@jgm,
after having to tag languages in many documents, I think this would be handy:
This is [French]{:fr}.
And this is [ancient Greek]{:grc}.
It is consistent with the syntax for {#id} and {.class}.
Would it be possible to have this handled in djot?
Many thanks for your help and your excellent work.
Create a Lua filter API, like pandoc's Lua filters API.
Doing this might require changing the way we currently represent the AST in Lua. The current method is designed for speed, but uses some conventions that make it fragile for direct interaction (e.g., annotation as first component of an array). (Perhaps we could use metatables to provide a friendly public interface to the current array-based structure.)
Alternatively, we could leave the AST as it is but provide special functions for manipulating it.
The documentation of the use of the library in README.md is a bit sparse.
It would be better to have proper API documentation for the Lua library.
One feature many markdown table syntaxes lack is merged cells in tables.
First of all - love this. A principled, light-weight markup unsaddled by baggage, and designed by someone with the chops to appreciate the nuances of every decision.
I was wondering how stable you considered the current implementation. How far is this from a 1.0 release? What would it take for this to become fully integrated into Pandoc? Do you hope for this to eventually replace Pandoc Markdown?
I wonder about all the goals of this project. How "deep" does it want to go?
Seeing other issues like #10 I somewhat fear that djot will not be compatible with e.g. CriticMarkup which I myself consider among the fundamental Markdown extensions with large user base.
Do you plan to keep djot compatible with CriticMarkup and/or other extensions from ExtraMark, MultiMarkdown, etc. (see https://gist.github.com/vimtaai/99f8c89e7d3d02a362117284684baa0f )?
There is a typo/copy error on line 22 of the Vim syntax file. It says superscript like the line above when it should say subscript. It currently has no visible effect, but will cause problems if someone tries to redefine the highlighting for subscript.
I notice that, in addition to triple-backticks, I can get verbatim blocks with tildes as well:
~~~
works for
code blocks
~~~
but it's not mentioned in the syntax reference.
Will djot continue to support the triple-tilde syntax for code blocks?
Incidentally, if triple-tilde delimiters were no longer used for code blocks, maybe they could be used for figures (see #31).
I know that, with Markdown, the rule is to only use punctuation symbols for markup; the prime directive being "optimize for readability". The markup shouldn't look like markup!
Then folks wanted to add more features to markdown, which required no small amount of creativity in finding ways to use the limited set of ascii punctuation as syntax for these features. This entailed lots of discussion, sometimes resulting in epic multi-year threads on the best syntax for them.
On the other end of the spectrum is something like LaTeX or Texinfo, where you just have \command or @command for a given feature and you're done. These markup formats are easy to write, have more features, and I'm guessing are easier to implement/parse. But readability suffers.
I think readability is key to adoption. If people dislike reading a markup format, they won't write it (or, maybe other people will come along and create lightweight markup languages that compile/transpile to the less-readable but more-featureful markup lang).
I'm wondering where djot falls on this spectrum. My impression is that it's "markdown 2.0 ++" --- ambiguous syntax fixed, syntax simplified (or made more consistent) in some places. But with further ambitions...
If ambitions are to be able to write real academic papers and textbooks with djot (sounds good!), then in order to avoid ascii-soup syntax and decades-long discussions of "the right syntax for {feature}", is it acceptable to introduce some generic english command syntax into djot?
My guess is that the sweet spot is: optimize for readability for the most commonly-used markup, and also introduce a generic english command syntax (like, e.g. @this) to help implement the advanced features (while still striving for good readability).
Potential examples of where an @command might be used: issue #28, @caption, #35 @metadata, #32 @cite, #31 @figure.
I've spent some time looking at various hypothetical alternative implementations. I didn't do anything practical, but I've learned a bunch, so hopefully this might be useful for someone.
The backstory here is that I'd love to access Djot from Deno, as that seems like the perfect runtime for rendering an extensible lite markup. I've used JS template literals for this in the past, and that's quite neat, and Deno's security model is also very appropriate for these kinds of converters.
Here's some options:
The main benefit of an alternative impl would be that you'd be able to massage the output programmatically in the language of your choice (if you need just .html, it's always possible to shell out). To achieve that, we need to actually define the AST model, so that alternative impls don't just export whatever internal representation they have, but share the general shape of the API. I think a good AST model is already present in the Lua impl; it needs to be documented in an abstract form in the spec (and we can add a canonical JSON encoding for it #58).
we could (and, long term, absolutely should) provide a native implementation in something like C, Rust or Zig. Given that djot is a small, nicely organized code-base, this shouldn't be much trouble. I see only two potential snags:
find!("^[*+-] %[[Xx ]%]%s") into an inline automaton at compile time. Or maybe just bring in regex for the first version and leave a todo :)
If we had a, say, C impl, compiling that to Wasm and exposing it to node & deno would be trivial.
Can we just derive a bunch of implementations from a unified grammar? From what I understand of how those things actually work in practice, no, not really.
lua and JavaScript seem sufficiently close (eg, both have regexes built-in), so that manual "transpiling" of .lua to .js might make sense? Perhaps long-term Wasm would be strictly better than .js, but in today's world .js can be operationally easier, so why not?
lua is implemented in C, so we can compile Lua itself to Wasm, and then interpret djot in Wasm. That seems like the most horrible, but also the easiest way to get going without rewriting everything. And lua to wasm is how the playground works.
Sadly, there are a couple of problems on that path. The fundamental thing is that neither browsers nor deno support just importing a WASM module. Instead, you need to do a dance of getting a Uint8Array from somewhere and then manually instantiating that. The way this typically works is that wasm bytes are fetched from some server, but that's very much not a self-contained library then. This fetching is what wasmoon, the library used by the playground, is doing.
An alternative, more consumer-friendly approach is to embed .wasm as a base64 string directly into the source code (example). This I think is what should be done for this approach, but, as far as I can tell, no one has actually done this for luajit so far? This approach is also somewhat not great, in the sense that the loading would block the JS event loop.
So yeah, the next step for this approach would be to re-create what wasmoon did with compiling lua with emcc (Emscripten), embed the result (together with the .lua files for djot) into a .js file, and write the required glue code to specialize the wasm runtime to the lua interpreter and djot parser!
The following djot document:
Hello -- :smile:
produces the following ast:
{
"footnotes": [],
"references": [],
"type": "doc",
"children": [
{
"type": "para",
"children": [
{
"type": "str",
"text": "Hello "
},
{
"type": "en_dash",
"text": "--"
},
{
"type": "str",
"text": " "
},
{
"type": "emoji",
"text": ":smile:"
}
]
}
]
}
The problem here is that the smile="😄", part is implicit -- a consumer of such an ast would have to replicate djot's emoji table. It would help to add "rendered" emojis to the output, even if that info is in some sense redundant.
Thinking more about this, maybe we don't even need dedicated AST nodes like emoji or en_dash? We can say that they are in fact str nodes, just with a raw attribute:
{
"footnotes": [],
"references": [],
"type": "doc",
"children": [
{
"type": "para",
"children": [
{
"type": "str",
"text": "Hello "
},
{
"type": "str",
"text": "–",
"raw": "--"
},
{
"type": "str",
"text": " "
},
{
"type": "str",
"text": "😄",
"raw": ":smile:"
}
]
}
]
}
There might be some terminological confusion here. In the literal syntax tree, we certainly have the type: "emoji" syntax node. But what we want from -a -j is probably not so much an AST as an abstract document model. So, syntactically :smile: is an emoji, but semantically it wants to be very close to 😄 (eg, substituting :emoji: syntax with the unicode equivalents shouldn't change the meaning of a djot document).
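The rewrite from dedicated nodes to str nodes with a raw attribute could look roughly like this in Python. The `EMOJI` and `SYMBOLS` tables and the `normalize` function are hypothetical; a real consumer would need djot's full emoji table, which is exactly the duplication the issue wants to avoid.

```python
# Hypothetical lookup tables, reduced to the entries needed for the example.
EMOJI = {"smile": "\U0001F604"}
SYMBOLS = {"en_dash": "\u2013"}


def normalize(node):
    """Return a copy of node with emoji/en_dash nodes rewritten as str + raw."""
    kind = node.get("type")
    if kind == "emoji":
        name = node["text"].strip(":")
        # Unknown aliases keep their source text so nothing is lost.
        return {"type": "str", "text": EMOJI.get(name, node["text"]), "raw": node["text"]}
    if kind in SYMBOLS:
        return {"type": "str", "text": SYMBOLS[kind], "raw": node["text"]}
    result = dict(node)
    if "children" in node:
        result["children"] = [normalize(child) for child in node["children"]]
    return result


para = {
    "type": "para",
    "children": [
        {"type": "str", "text": "Hello "},
        {"type": "en_dash", "text": "--"},
        {"type": "str", "text": " "},
        {"type": "emoji", "text": ":smile:"},
    ],
}
print(normalize(para))
```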
Djot's treatment of hard line breaks violates this principle. In djot, this:
I can write it on the door \
I can put it on the floor \
I can do anything that you want me for \
If you want me to \
nicely renders as:
I can write it on the door
I can put it on the floor
I can do anything that you want me for
If you want me to
But this:
Do it right, do it wrong \
'Cause a matter of fact, it'll turn out to be strong \
If you want me to \
renders as
Do it right, do it wrong 'Cause a matter of fact, it'll turn out to be strong
If you want me to
because there happens to be an accidental trailing space on the first line.
I think in a human oriented plain text format, invisible or non-obvious whitespace should have no significance. By "invisible", I mean not obviously present to the human eye. That there is a space between words is obvious, but not how many spaces, or that there are spaces at the end of a line, or the number of spaces at the beginning of a line (when a non-fixed width font is used).
I think djot gets it right in this regard with one exception I've found so far (above). For example, I like that djot doesn't have a magic indent threshold like Markdown's 4 spaces: the transition from 3 to 4 results in dramatically different output, and the transition from 4 to 5 spaces results in subtly but significantly different output. Likewise one doesn't have to count spaces to make sure that successive lines of a list item are treated as such. In fact the interaction between Commonmark's four spaces code block and the space to signify list item continuation results in very unintuitive behavior.
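As a workaround sketch rather than a proposal for the parser, trailing whitespace after a line-final backslash can be stripped before the text reaches djot. The Python below assumes that a backslash followed by a space at the end of a line is never intentional; that assumption is debatable, since backslash-space is djot's non-breaking space. The name `strip_space_after_backslash` is hypothetical.

```python
import re

# A backslash followed only by spaces or tabs before the end of the line.
TRAILING_AFTER_BACKSLASH = re.compile(r"\\[ \t]+$", re.MULTILINE)


def strip_space_after_backslash(text):
    """Drop invisible whitespace between a line-final backslash and the newline."""
    # A function replacement avoids backslash-escaping rules in the template.
    return TRAILING_AFTER_BACKSLASH.sub(lambda m: "\\", text)


source = "Do it right, do it wrong \\ \n'Cause a matter of fact\\\n"
print(repr(strip_space_after_backslash(source)))
```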
% djot
[Beyond
Markdown](https://johnmacfarlane.net/beyond-markdown.html). (See
[Rationale](#rationale), below.)
^D
<p><a href="https://johnmacfarlane.net/beyond-markdown.html">Beyond
Markdown</a>. (See
<a href="#rationale), below.">Rationale</a></p>
Note that ", below" is parsed as part of the destination.
Oddly this doesn't happen if we trim off the first link above.
Here are two kinds of texts we might want to distinguish:
paragraph content
> block quote
continuation of paragraph
vs
paragraph content
> block quote
new paragraph
A deficiency of Markdown is that there is no way to distinguish these cases. The problem is reduced if one renders in a format that does not indent new paragraphs, because then there is no visual distinction between the cases. But they are semantically different and can be distinguished, e.g., in print output with indented paragraphs. There should be a way to distinguish them in the source.
The problem is not raised only by block quotes but occurs also with set-off equations, images, tables, code, and lists.
I recently found myself creating a pandoc Lua filter that implements the following syntax for the "continued paragraph case":
paragraph content
> block quote
_ continuation of paragraph
(The filter just inserts a LaTeX \noindent command where the _ is.) This is not too bad actually. It would be nice if djot had some way of making the distinction.