Comments (4)
100% behind making the representation of the AST more standalone, as well as the route you're proposing.
should probably be moved to the AST package is the terminal one
That one also needs to be re-written without use of Crayons
, this is one of the main blocking issues if being able to upstream CM into the standard libs.
Just as a quick side project, I might revive CMark.jl, which links against the libcmark and libcmark-gfm C parsers, and have it also produce this standardized AST. I don't think users should use the C parsers if they're parsing Markdown (CommonMark.jl is preferred), but this could be useful for testing.
That would be quite nice to be able to cross-check against other implementations like that, would likely unearth some nice corner cases.
from commonmark.jl.
I now have put together a WIP version of a separate AST package. It needs heaps of corner-case fixing, general cleanup and more documentation, but I think the only major thing that is missing is the terminal printing.
However, it does deviate from the implementation here in some ways (I found it easier to implement these changes right away, rather than starting by verbatim lifting things from here). I would love to have your feedback on those changes, especially if you have objections or suggestions for alternatives.
The main differences from the CommonMark.jl implementation should be:
-
The
Node
type has a fewer fields, dropping a bunch of the code metadata fields, and is instead a parametric type. The type parameter can be used to add additional fields.My assumption is that in CommonMark would have its own metadata container e.g.
mutable struct CommonMarkMeta sourcepos::SourcePos last_line_blank::Bool last_line_checked::Bool is_open::Bool literal::String meta::Dict{String,Any} end const CommonMarkNode = Node{CommonMarkMeta}
and would return
Node{CommonMarkMeta}
trees. -
The "null" node in the pointers is represented by a
nothing
(as opposed toNULL_NODE
), and the fields areUnion{Node{M}, Nothing}
.This pattern feels more idiomatic to me in Julia, but I am not sure if there is e.g. a performance reason for using the
NULL_NODE
pattern instead. But I believe the small unions are performant and type-stable? -
Accessing the various fields of a
Node
are now done via accessor method (e.g.next()
,children()
,node[]
for theAbstractContainer
instance, ormeta()
for the user-defined metadata; more in docs ofNode
). However, I don't think it would be unreasonable to stick to fields/properties for some of these APIs. -
AbstractContainer
->AbstractElement
as the top level abstract type. I felt that using the word "container" here was not quite accurate, since some of them are not containers, but leafs."Element" however, feels like a good general word referring to the different types of nodes. I borrowed it from HTML, but the CommonMark standard actually also refers to "structural elements" (e.g. "We can think of a document as a sequence of blocks—structural elements like paragraphs, ...").
-
Instead of
append_child
andprepend_child
, I overloadpush!
andpushfirst!
for this. I felt that "append"/"prepend" could be confusing, as in the standard libraryappend!
andprepend!
concatenate two collections, rather than adding an element. However, at the same time, I am not really sure it makes sense to think of a node as a "collection of its children", which this choice implies. -
Some of CommonMark elements (
AbstractContainer
s) use the.literal
field for their data (e.g.Text
). I think that all information about the element should be in theAbstractElement
(AbstractContainer
) object (to avoid having to go back to theNode
). So e.g.Text()
elements now have a.text
field that contains the text content, and I think a few other elements have the same change.
Sorry again for the TL;DR 😅 There are plenty WIP items in the current implementation and I have opened a bunch of issues on that repo too, mainly keep track of stuff for myself.
from commonmark.jl.
Thanks for the updates.
The type parameter can be used to add additional fields.
Any idea yet whether that's going to require significant changes in CM.jl
to switch, or will changes be minimal? Currently the struct mirrors the what is used in the JS and Python libs on which this is based, ideally I don't want to stray too far from there so that folding changes from there into this package is easier.
But I believe the small unions are performant and type-stable?
If it's just as fast then that's fine, when originally written it was required to avoid dynamic dispatch in many places.
node[]
for theAbstractContainer
instance
Perhaps container()
rather than overloading getindex
, which adds inconsistencies in how you access particular parts of the nodes.
e.g.
next()
Probably too generic, unless we're expecting to not export
?
AbstractContainer
->AbstractElement
as the top level abstract type. I felt that using the word "container" here was not quite accurate, since some of them are not containers, but leafs.
That's fine.
Instead of
append_child
andprepend_child
, I overloadpush!
andpushfirst!
for this. I felt that "append"/"prepend" could be confusing, as in the standard libraryappend!
andprepend!
concatenate two collections, rather than adding an element. However, at the same time, I am not really sure it makes sense to think of a node as a "collection of its children", which this choice implies.
Those were intentionally not added to the push!
and pushfirst!
methods since I didn't feel they could really reasonably be classed as "array-like" enough for it not to be punning.
So e.g.
Text()
elements now have a.text
field that contains the text content, and I think a few other elements have the same change.
Same comment about matching the upstream packages that this one is based on. This might be more borderline than changing the fields of the node type, so might be more acceptable.
I'll have more comments when I can get around to looking over the package itself.
from commonmark.jl.
Ah, I didn't consider keeping the code and API similar to the other implementations as a goal. The downside of sticking to those conventions is that it would restrict us in providing the cleanest, most Julian API.
However, I think we could get the best of both worlds with some getproperty
/setproperty!
magic. We can have whatever structure / API for Node
we want, and then here just overload the method specific to CommonMark, e.g.
function Base.getproperty(node::Node{CommonMarkMeta}, name::Symbol)
if name === :literal
meta(node).literal
elseif name === :nxt
next(node)
# etc.
else
# Fallback to the properties of general Node type. This is useful if we e.g. decide
# to provide a .children property, without having to re-implement all the general
# properties of Node for the specialized Node{CommonMarkMeta}.
invoke(getproperty, Tuple{Node,Symbol}, node, name)
end
end
to provide the necessary "fields" that can be used internally in CommonMark. This way we should be able to get away with really minimal changes to CommonMark's code.
struct Bar
x :: Any
end
function Base.getproperty(value :: Bar, name :: Symbol)
if name === :typeofx
println(typeof(value.x))
elseif name === :y
println("yyy")
else
getfield(value, name)
end
end
function bbtest(b::Bar)
b.typeofx
x = b.x
b.y
return x
end
@code_llvm bbtest(Bar(123))
The LLVM IR clearly shows that the println
etc. are nicely inlined, rather than doing if
s on every getproperty
call.
So e.g.
Text()
elements now have a.text
field that contains the text content, and I think a few other elements have the same change.Same comment about matching the upstream packages that this one is based on. This might be more borderline than changing the fields of the node type, so might be more acceptable.
An option here could be to do some more complex overloading of the .literal
field, which for certain element types returns/sets the value from the element, rather than from the .meta.literal
field (or, perhaps even keeps both consistent).
But I believe the small unions are performant and type-stable?
If it's just as fast then that's fine, when originally written it was required to avoid dynamic dispatch in many places.
I am not super-informed about this, just something I've read in passing (e.g. e.g. this blog post). However, I constructed this test case:
@noinline _next(node::Node) = node.nxt
@noinline print_node(::Nothing) = println("- nothing")
@noinline print_node(::Node) = println("- node")
function print_node_dispatch(node)
nxt = _next(node)
print_node(nxt)
print_node(nxt)
if isnothing(nxt)
print_node(nxt)
end
end
@code_warntype print_node_dispatch(Node(Document()))
@code_llvm print_node_dispatch(Node(Document()))
In this case, nxt
is clearly type-unstable (::Union{Nothing,Node}
), but the LLVM indicates that Julia is smart enough to branch this such that you have either 3 sequential calls to print_node(::Nothing)
or 2 sequential calls to print_node(::Node)
. No dynamic dispatch in this case as far as I can tell. I wonder though if there are any other cases where it might still be an issue?
For the other things, I opened two bikeshedding issues: JuliaDocs/MarkdownAST.jl#10, JuliaDocs/MarkdownAST.jl#11.
from commonmark.jl.
Related Issues (20)
- Clash between TableRule and IndentedCodeBlockRule HOT 2
- Problem testing `JuliaInterpreter` on Juno/Atom HOT 1
- `@cm_str` inside list comprehension
- Interpolate a vector of items HOT 2
- HypertextLiteral.jl + CommonMark.jl = 🤯 HOT 9
- Julia 1.0: no method matching get!(::getfield(CommonMark, Symbol("##106#108")), ::IdDict{CommonMark.Node,Dict{String,Int64}}, ::CommonMark.Node) HOT 2
- Content inside `<script>` inside paragraph HOT 5
- Tables fail when there are spaces between column definitions and pipes
- Keep reference links as references HOT 6
- Indentation within HTML block with empty lines HOT 1
- Markdown printing adds unwanted newlines HOT 1
- Table parsing fail for some unicode identifiers HOT 1
- Duplicate rules when enabling/disabling rules HOT 1
- Disabling LinkRule (but not ImageRule) does not disable link parsing HOT 1
- Error handling multi unicode text HOT 5
- Some detailed code issues HOT 3
- `benchmark/`: simplify
- Migrate to PrecompileTools HOT 3
- [Bug?] CommonMark Markdown Table cells are not CommonMark leaf blocks HOT 23
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from commonmark.jl.