JuliaLowering.jl is an experimental port of Julia's code lowering compiler passes, written in Julia itself. "Code lowering" is the set of compiler passes which symbolically transform and simplify Julia's syntax prior to type inference.
This work is intended to
- Bring precise code provenance to Julia's lowered form (and eventually
downstream in type inference, stack traces, etc). This has many benefits
- Talk to users precisely about their code via character-precise error and diagnostic messages from lowering
- Greatly simplify the implementation of critical tools like Revise.jl which rely on analyzing how the user's source maps to the compiler's data structures
- Allow tools like JuliaInterpreter to use type-inferred and optimized code, with the potential for huge speed improvements.
- Bring improvements for macro authors
- Prototype "automatic hygiene" (no more need for
esc()
!) - Precise author-defined error reporting from macros
- Sketch better interfaces for syntax trees (hopefully!)
- Prototype "automatic hygiene" (no more need for
Note this is a very early work in progress; most things probably don't work!
- Check out the caf/lowering-2 branch of JuliaSyntax.jl
- Run the demo
include("test/lowering.jl")
Lowering has five symbolic simplification passes:
- Macro expansion - expanding user-defined syntactic constructs by running the user's macros. This pass also includes a small amount of other symbolic simplification.
- Syntax desugaring - simplifying Julia's rich surface syntax down to a small number of syntactic forms.
- Scope analysis - analyzing identifier names used in the code to discover local variables, closure captures, and associate global variables to the appropriate module.
- Closure conversion - convert closures to types and deal with captured variables efficiently where possible.
- Flattening to linear IR - convert code in hierarchical tree form to a flat array of statements; convert control flow into gotos.
Want something something better than JuliaSyntax.SyntaxNode
! SyntaxTree
and
SyntaxGraph
provide this. (These will probably end up in JuliaSyntax
.)
We want to allow arbitrary attributes to be attached to tree nodes by analysis passes. This separates the analysis pass implementation from the data structure, allowing passes which don't know about each other to act on a shared data structure.
Design and implementation inspiration comes in several analogies:
Analogy 1: the ECS (Entity-Component-System) pattern for computer game design. This pattern is highly successful because it separates game logic (systems) from game objects (entities) by providing flexible storage
- Compiler passes are "systems"
- AST tree nodes are "entities"
- Node attributes are "components"
Analogy 2: The AoS to SoA transformation. But here we've got a kind of tree-of-structs-with-optional-attributes to struct-of-Dicts transformation. The data alignment / packing efficiency and concrete type safe storage benefits are similar.
Analogy 3: Graph algorithms which represent graphs as a compact array of node ids and edges with integer indices, rather than using a linked data structure.
To write correct hygienic macros in Julia (as of 2024), macro authors must use
esc()
on any any syntax passed to the macro so that passed identifiers escape
to the macro caller scope. However
- This is not automatic and the correct use of
esc()
is one of the things that new macro authors find most confusing. (My impression, based on various people complaining about how confusingesc()
is.) esc()
wraps expressions inExpr(:escape)
, but this doesn't work well when macros pass such escaped syntax to an inner macro call. As discussed in Julia issue #37691, macros in Julia's existing system are not composable by default. Writing composable macros in the existing system would require preserving the escape nesting depth when recursing into any macro argument nested expressions. Almost no macro author knows how to do this and is prepared to pay for the complexity of getting it right.
The requirement to use esc()
stems from Julia's pervasive use of the simple
Expr
data structure which represents a unadorned AST in which names are plain
symbols. For example, a macro call @foo x
gets passed the symbol :x
which is just a name without any information attached to indicate that it came
from the scope where @foo
was called.
In JuliaLowering we make hygiene automatic and remove esc()
by combining names
with scope information. In the language of the paper Towards the Essence of
Hygiene
by Michael Adams, this combination is called a "syntax object". In
JuliaLowering our representation is the tuple (name,scope_layer)
, also called
VarId
in the scope resolution pass.
JuliaLowering's macro expander attaches a unique scope layer to each identifier in a piece of syntax. A "scope layer" is an integer identifer combined with the module in which the syntax was created.
When expanding macros,
- Any identifiers passed to the macro are tagged with the scope layer they were defined within.
- A new unique scope layer is generated for the macro invocation, and any names in the syntax produced by the macro are tagged with this layer.
Subsequently, the (name,scope_layer)
pairs are used when resolving bindings.
This ensures that, by default, we satisfy the basic rules for hygenic macros
discussed in Adams' paper:
- A macro can't insert a binding that can capture references other than those inserted by the macro.
- A macro can't insert a reference that can be captured by bindings other than those inserted by the macro.
TODO: Write more here...
- Toward Fearless Macros - a blog post by Ashton Wiersdorf
- Towards the Essence of Hygiene - a paper by Michael Adams
- Bindings as sets of scopes - a description of Racket's scope set mechanism by Matthew Flatt
macroexpand(m::Module, x)
calls jl_macroexpand
in ast.c:
jl_value_t *jl_macroexpand(jl_value_t *expr, jl_module_t *inmodule)
{
expr = jl_copy_ast(expr);
expr = jl_expand_macros(expr, inmodule, NULL, 0, jl_world_counter, 0);
expr = jl_call_scm_on_ast("jl-expand-macroscope", expr, inmodule);
return expr;
}
First we copy the AST here. This is mostly a trivial deep copy of Expr
s and
shallow copy of their non-Expr
children, except for when they contain
embedded CodeInfo/phi/phic
nodes which are also deep copied.
Second we expand macros recursively by calling
jl_expand_macros(expr, inmodule, macroctx, onelevel, world, throw_load_error)
This relies on state indexed by inmodule
and world
, which gives it some
funny properties:
module
expressions can't be expanded: macro expansion depends on macro lookup within the module, but we can't do that withouteval
.
Expansion proceeds from the outermost to innermost macros. So macros see any
macro calls or quasiquote (quote/$
) in their children as unexpanded forms.
Things which are expanded:
quote
is expanded using flisp code injulia-bq-macro
- symbol / ssavalue ->
QuoteNode
(inert) - atom -> itself
- at depth zero,
$
expands to its content - Expressions
x
without$
expand to(copyast (inert x))
- Other expressions containing a
$
expand to a call to_expr
with all the args mapped throughjulia-bq-expand-
. Roughly! - Special handling exists for multi-splatting arguments as in
quote quote $$(x...) end end
- symbol / ssavalue ->
macrocall
proceeds with- Expand with
jl_invoke_julia_macro
- Call
eval
on the macro name (!!) to get the macro function. Look up the method. - Set up arguments for the macro calling convention
- Wraps errors in macro invocation in
LoadError
- Returns the expression, as well as the module at
which that method of that macro was defined and
LineNumberNode
where the macro was invoked in the source.
- Call
- Deep copy the AST
- Recursively expand child macros in the context of the module where the macrocall method was defined
- Wrap the result in
(hygienic-scope ,result ,newctx.m ,lineinfo)
(except for special case optimizations)
- Expand with
hygenic-scope
expandsargs[1]
withjl_expand_macros
, with the module of expansion set toargs[2]
. Ie, it's theExpr
representation of the module and expression arguments tomacroexpand
. The way this returns eitherhygenic-scope
or unwraps is a bit confusing.- "
do
macrocalls" have their own special handling because the macrocall is the child of thedo
. This seems like a mess!!
Scopes are documented in the Juila documentation on Scope of Variables
This pass disambiguates variables which have the same name in different scopes and fills in the list of local variables within each lambda.
As scope is a collection of variable names by category:
argument
- arguments to a lambdalocal
- variables declared local (at top level) or implicitly local (in lambdas) or desugared to local-defglobal
- variables declared global (in lambdas) or implicitly global (at top level)static-parameter
- lambda type arguments fromwhere
clauses
We traverse the AST starting at the root paying attention to certian nodes:
- Nodes representing identifiers (Identifier, operators, var)
- If a variable exists in the table, it's replaced with the value in the table.
- If it doesn't exist, it becomes an
outerref
- Variable scoping constructs:
local
,local-def
- collected by scope-block
- removed during traversal
- Scope metadata
softscope
,hardscope
- just removed - New scopes
lambda
creates a new scope containing itself and its arguments, otherwise copying the parent scope. It resolves the body with that new scope.scope-block
is really complicated - see below
- Scope queries
islocal
,locals
islocal
- statically expand to true/false based on whether var name is a local varlocals
- return list of locals - see@locals
require-existing-local
- somewhat likeislocal
, but allows globals too (whaa?! naming) and produces a lowering error immediately if variable is not known. Should be calledrequire-in-scope
??
break-block
,symbolicgoto
,symboliclabel
need special handling because one of their arguments is a non-quoted symbol.- Add static parameters for generated functions
with-static-parameters
method
- special handling for static params
scope-block
is the complicated bit. It's processed by
- Searching the expressions within the block for any
local
,local-def
,global
and assigned vars. Searching doesn't recurse intolambda
,scope-block
,module
andtoplevel
- Building lists of implicit locals or globals (depending on whether we're in a top level thunk)
- Figuring out which local variables need to be renamed. This is any local variable with a name which has already occurred in processing one of the previous scope blocks
- Check any conflicting local/global decls and soft/hard scope
- Build new scope with table of renames
- Resolve the body with the new scope, applying the renames
local-def
- flisp code explains this as- "a local that we know has an assignment that dominates all usages"
- "local declaration of a defined variable"
There's also this comment in JuliaLang/julia#22314:
mark the [...] variable as local-def, which would prevent it from getting Core.Boxed during the closure conversion it'll be detected as known-SSA
But maybe that's confusing. It seems like local-def
is a local which lowering
asserts is "always defined" / "definitely initialized before use". But it's not
necessarily single-assign, so not SSA.
See https://docs.julialang.org/en/v1/devdocs/ast/#Lowered-form
mutable struct CodeInfo
code::Vector{Any} # IR statements
codelocs::Vector{Int32} # `length(code)` Vector of indices into `linetable`
ssavaluetypes::Any # `length(code)` or Vector of inferred types after opt
ssaflags::Vector{UInt32} # flag for every statement in `code`
# 0 if meta statement
# inbounds_flag - 1 bit (LSB)
# inline_flag - 1 bit
# noinline_flag - 1 bit
# ... other 8 flags which are defined in compiler/optimize.jl
# effects_flags - 9 bits
method_for_inference_limit_heuristics::Any
linetable::Any
slotnames::Vector{Symbol} # names of parameters and local vars used in the code
slotflags::Vector{UInt8} # vinfo flags from flisp
slottypes::Any # nothing (used by typeinf)
rettype::Any # Any (used by typeinf)
parent::Any # nothing (used by typeinf)
edges::Any
min_world::UInt64
max_world::UInt64
inferred::Bool
propagate_inbounds::Bool
has_fcall::Bool
nospecializeinfer::Bool
inlining::UInt8
constprop::UInt8
purity::UInt16
inlining_cost::UInt16
end
In the current Julia runtime,
Base.eval()
- Uses
jl_toplevel_eval_in
which callsjl_toplevel_eval_flex
jl_toplevel_eval_flex(mod, ex)
- Lowers if necessay
- Evaluates certain blessed top level forms
:.
:module
:using
:import
:public
:export
:global
:const
:toplevel
:error
:incomplete
- Identifier and literals
- Otherwise expects
Expr(:thunk)
- Use codegen "where necessary/profitable" (eg ccall, has_loops etc)
- Otherwise interpret via
jl_interpret_toplevel_thunk
Should we lower the above blessed top level forms to julia runtime calls? Pros:
- Semantically sound. Lowering should do syntax checking in things like
Expr(:using)
rather than doing this in the runtime support functions. - Precise lowering error messages
- Replaces more Expr usage
- Replaces a whole pile of C code with significantly less Julia code
- Lowering output becomes more consistently imperative Cons:
- Lots more code to write
- May need to invent intermediate data structures to replace
Expr
- Bootstrap?
- Some forms require creating toplevel thunks
In general, we'd be replacing current declarative lowering targets like
Expr(:using)
with an imperative call to a Core
API instead. The call and
the setup of its arguments would need to go in a thunk. We've currently got an
odd mixture of imperative and declarative lowered code.
People look at Racket as an example of a very complete system of hygienic macros. We should learn from them, but keeping in mind that Racket's macro system is more inherently more complicated. Racket's current approach to hygiene is described in an accessible talk and in more depth in a paper.
Some differences which makes Racket's macro expander different from Julia:
- Racket allows local definitions of macros. Macro code can be embedded in an inner lexical scope and capture locals from that scope, but still needs to be executed at compile time. Julia supports macros at top level scope only.
- Racket goes to great lengths to execute the minimal package code necessary to expand macros; the "pass system". Julia just executes all top level statements in order when precompiling a package.
- As a lisp, Racket's surface syntax is dramatically simpler and more uniform