Coder Social home page Coder Social logo

Comments (4)

MichaelHatherly avatar MichaelHatherly commented on July 22, 2024 1

100% behind making the representation of the AST more standalone, as well as the route you're proposing.

should probably be moved to the AST package is the terminal one

That one also needs to be re-written without use of Crayons, this is one of the main blocking issues if being able to upstream CM into the standard libs.

Just as a quick side project, I might revive CMark.jl, which links against the libcmark and libcmark-gfm C parsers, and have it also produce this standardized AST. I don't think users should use the C parsers if they're parsing Markdown (CommonMark.jl is preferred), but this could be useful for testing.

That would be quite nice to be able to cross-check against other implementations like that, would likely unearth some nice corner cases.

from commonmark.jl.

mortenpi avatar mortenpi commented on July 22, 2024

I now have put together a WIP version of a separate AST package. It needs heaps of corner-case fixing, general cleanup and more documentation, but I think the only major thing that is missing is the terminal printing.

However, it does deviate from the implementation here in some ways (I found it easier to implement these changes right away, rather than starting by verbatim lifting things from here). I would love to have your feedback on those changes, especially if you have objections or suggestions for alternatives.

The main differences from the CommonMark.jl implementation should be:

  1. The Node type has a fewer fields, dropping a bunch of the code metadata fields, and is instead a parametric type. The type parameter can be used to add additional fields.

    My assumption is that in CommonMark would have its own metadata container e.g.

    mutable struct CommonMarkMeta
        sourcepos::SourcePos
        last_line_blank::Bool
        last_line_checked::Bool
        is_open::Bool
        literal::String
        meta::Dict{String,Any}
    end
    
    const CommonMarkNode = Node{CommonMarkMeta}

    and would return Node{CommonMarkMeta} trees.

  2. The "null" node in the pointers is represented by a nothing (as opposed to NULL_NODE), and the fields are Union{Node{M}, Nothing}.

    This pattern feels more idiomatic to me in Julia, but I am not sure if there is e.g. a performance reason for using the NULL_NODE pattern instead. But I believe the small unions are performant and type-stable?

  3. Accessing the various fields of a Node are now done via accessor method (e.g. next(), children(), node[] for the AbstractContainer instance, or meta() for the user-defined metadata; more in docs of Node). However, I don't think it would be unreasonable to stick to fields/properties for some of these APIs.

  4. AbstractContainer -> AbstractElement as the top level abstract type. I felt that using the word "container" here was not quite accurate, since some of them are not containers, but leafs.

    "Element" however, feels like a good general word referring to the different types of nodes. I borrowed it from HTML, but the CommonMark standard actually also refers to "structural elements" (e.g. "We can think of a document as a sequence of blocks—structural elements like paragraphs, ...").

  5. Instead of append_child and prepend_child, I overload push! and pushfirst! for this. I felt that "append"/"prepend" could be confusing, as in the standard library append! and prepend! concatenate two collections, rather than adding an element. However, at the same time, I am not really sure it makes sense to think of a node as a "collection of its children", which this choice implies.

  6. Some of CommonMark elements (AbstractContainers) use the .literal field for their data (e.g. Text). I think that all information about the element should be in the AbstractElement (AbstractContainer) object (to avoid having to go back to the Node). So e.g. Text() elements now have a .text field that contains the text content, and I think a few other elements have the same change.

Sorry again for the TL;DR 😅 There are plenty WIP items in the current implementation and I have opened a bunch of issues on that repo too, mainly keep track of stuff for myself.

from commonmark.jl.

MichaelHatherly avatar MichaelHatherly commented on July 22, 2024

Thanks for the updates.

The type parameter can be used to add additional fields.

Any idea yet whether that's going to require significant changes in CM.jl to switch, or will changes be minimal? Currently the struct mirrors the what is used in the JS and Python libs on which this is based, ideally I don't want to stray too far from there so that folding changes from there into this package is easier.

But I believe the small unions are performant and type-stable?

If it's just as fast then that's fine, when originally written it was required to avoid dynamic dispatch in many places.

node[] for the AbstractContainer instance

Perhaps container() rather than overloading getindex, which adds inconsistencies in how you access particular parts of the nodes.

e.g. next()

Probably too generic, unless we're expecting to not export?

AbstractContainer -> AbstractElement as the top level abstract type. I felt that using the word "container" here was not quite accurate, since some of them are not containers, but leafs.

That's fine.

Instead of append_child and prepend_child, I overload push! and pushfirst! for this. I felt that "append"/"prepend" could be confusing, as in the standard library append! and prepend! concatenate two collections, rather than adding an element. However, at the same time, I am not really sure it makes sense to think of a node as a "collection of its children", which this choice implies.

Those were intentionally not added to the push! and pushfirst! methods since I didn't feel they could really reasonably be classed as "array-like" enough for it not to be punning.

So e.g. Text() elements now have a .text field that contains the text content, and I think a few other elements have the same change.

Same comment about matching the upstream packages that this one is based on. This might be more borderline than changing the fields of the node type, so might be more acceptable.


I'll have more comments when I can get around to looking over the package itself.

from commonmark.jl.

mortenpi avatar mortenpi commented on July 22, 2024

Ah, I didn't consider keeping the code and API similar to the other implementations as a goal. The downside of sticking to those conventions is that it would restrict us in providing the cleanest, most Julian API.

However, I think we could get the best of both worlds with some getproperty/setproperty! magic. We can have whatever structure / API for Node we want, and then here just overload the method specific to CommonMark, e.g.

function Base.getproperty(node::Node{CommonMarkMeta}, name::Symbol)
    if name === :literal
        meta(node).literal
    elseif name === :nxt
        next(node)
        # etc.
    else
        # Fallback to the properties of general Node type. This is useful if we e.g. decide
        # to provide a .children property, without having to re-implement all the general
        # properties of Node for the specialized Node{CommonMarkMeta}.
        invoke(getproperty, Tuple{Node,Symbol}, node, name)
    end
end

to provide the necessary "fields" that can be used internally in CommonMark. This way we should be able to get away with really minimal changes to CommonMark's code.

I think Julia's constant propagation keeps the whole thing performant too. For example:
struct Bar
    x :: Any
end

function Base.getproperty(value :: Bar, name :: Symbol)
    if name === :typeofx
        println(typeof(value.x))
    elseif name === :y
        println("yyy")
    else
        getfield(value, name)
    end
end

function bbtest(b::Bar)
    b.typeofx
    x = b.x
    b.y
    return x
end

@code_llvm bbtest(Bar(123))

The LLVM IR clearly shows that the println etc. are nicely inlined, rather than doing ifs on every getproperty call.

So e.g. Text() elements now have a .text field that contains the text content, and I think a few other elements have the same change.

Same comment about matching the upstream packages that this one is based on. This might be more borderline than changing the fields of the node type, so might be more acceptable.

An option here could be to do some more complex overloading of the .literal field, which for certain element types returns/sets the value from the element, rather than from the .meta.literal field (or, perhaps even keeps both consistent).


But I believe the small unions are performant and type-stable?

If it's just as fast then that's fine, when originally written it was required to avoid dynamic dispatch in many places.

I am not super-informed about this, just something I've read in passing (e.g. e.g. this blog post). However, I constructed this test case:

@noinline _next(node::Node) = node.nxt
@noinline print_node(::Nothing) = println("- nothing")
@noinline print_node(::Node) = println("- node")
function print_node_dispatch(node)
    nxt = _next(node)
    print_node(nxt)
    print_node(nxt)
    if isnothing(nxt)
        print_node(nxt)
    end
end

@code_warntype print_node_dispatch(Node(Document()))
@code_llvm print_node_dispatch(Node(Document()))

In this case, nxt is clearly type-unstable (::Union{Nothing,Node}), but the LLVM indicates that Julia is smart enough to branch this such that you have either 3 sequential calls to print_node(::Nothing) or 2 sequential calls to print_node(::Node). No dynamic dispatch in this case as far as I can tell. I wonder though if there are any other cases where it might still be an issue?


For the other things, I opened two bikeshedding issues: JuliaDocs/MarkdownAST.jl#10, JuliaDocs/MarkdownAST.jl#11.

from commonmark.jl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.