This changes a few things around binding logic, as part of eliminating
all of the legacy logic and weaving it into the new codebase:
* Give Scopes access to the Context object. Relatedly, add a TryRegister
method to Scope that is like RequireRegister, except that instead of
failing fast upon encountering a duplicate entry, it will issue an error.
* Move all typecheck visitation functions out of the big honkin' switch
and into their own member functions. As this stuff gets more complex,
having everything in one routine was starting to irk my sensibilities.
* Validate that packages have names.
* Store both the package symbol and the canonicalized URL used to
resolve it in the package map. This will help us verify that versions
match for multiple package references resolving to the same symbol.
* Add nice inquiry methods to the Class symbol (Sealed, Abstract, Record,
Interface) that simplify accessing the modifiers on the underlying node.
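To illustrate the TryRegister/RequireRegister distinction from the first
bullet, here is a minimal sketch. The Symbol, Context, and Scope shapes
below are assumptions made for illustration only, not the real types in
the codebase; only the two method names come from this change.

```go
package main

import "fmt"

// Symbol is a hypothetical stand-in for a bound compiler symbol.
type Symbol struct{ Name string }

// Context is a hypothetical stand-in for the shared compiler context;
// in this sketch it only collects error messages.
type Context struct{ Errors []string }

// Scope maps names to symbols and keeps the Context so it can report errors.
type Scope struct {
	ctx     *Context
	symbols map[string]*Symbol
}

// NewScope creates an empty scope that reports into ctx.
func NewScope(ctx *Context) *Scope {
	return &Scope{ctx: ctx, symbols: make(map[string]*Symbol)}
}

// TryRegister adds sym to the scope. On a duplicate name it records an
// error in the Context and returns false instead of aborting.
func (s *Scope) TryRegister(sym *Symbol) bool {
	if _, exists := s.symbols[sym.Name]; exists {
		s.ctx.Errors = append(s.ctx.Errors,
			fmt.Sprintf("symbol %q already declared", sym.Name))
		return false
	}
	s.symbols[sym.Name] = sym
	return true
}

// RequireRegister adds sym to the scope and panics on a duplicate, for
// callers that consider a duplicate to be a compiler bug.
func (s *Scope) RequireRegister(sym *Symbol) {
	if _, exists := s.symbols[sym.Name]; exists {
		panic(fmt.Sprintf("duplicate symbol %q", sym.Name))
	}
	s.symbols[sym.Name] = sym
}

func main() {
	ctx := &Context{}
	scope := NewScope(ctx)
	scope.RequireRegister(&Symbol{Name: "x"})
	ok := scope.TryRegister(&Symbol{Name: "x"}) // duplicate: reported, not fatal
	fmt.Println(ok, ctx.Errors)
}
```

The point of the split is that user-facing duplicates become diagnostics,
while internal invariants can still fail fast.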
The options structure will be shared between multiple passes of
compilation, including evaluation and graph generation. Therefore,
it must not be in the pkg/compile package, else we would create
package cycles. Now that the options structure is barebones --
and, in particular, no more "backend" settings pollute it -- this
refactoring actually works.
This is a bit of paranoia; however, an invariant of the typechecker
is that, after the final pass, all expression nodes are assigned a
type. This assertion checks that the invariant holds.
Instead of serializing simple token strings into the AST -- in place of things
like type references, module references, export references, etc. -- we now use
1st class AST nodes. This ensures that source context flows with the tokens
as we bind them, etc., and also cleans up a few inconsistencies (like using an
ast.Identifier for NewExpression -- clearly wrong since the resulting
MuIL is meant to contain fully bound semantic references).
This change includes some tests for token parsing and conversions. It
also fixes a bug where we treated Type tokens like ClassMembers, when
we ought to have been treating them like ModuleMembers.
This change performs typechecking during binding. This is less about
typechecking per se -- since higher level languages will have presumably
given us well-typed IL -- and more about preparing the AST so that we
can evaluate the fully bound nodes to produce a MuGL graph. It also
serves as a "verifier" for the incoming MuIL, however.
This is clearly incomplete, as the dozens of TODOs will make obvious.
But it's a clean checkpoint that does enough interesting typechecking
that I am landing it now.
This change begins to bind function bodies. This must be done as a
second pass over the AST, because dependencies between modules, and
even intra-module dependencies, might refer to top-level symbols like
types, variables, and functions, and so must be established first.
At the moment, the only node kind we handle is ast.Block, which
merely pushes and pops lexical scopes; however, the next step is to
implement the AST node-specific visitation logic for all statement
and expression nodes.
I've also rearranged how Scopes work to be a little easier to use.
The Scope type now remembers the **Scope slot in which it is rooted,
so that we can simply call Push and Pop on Scopes and have the right
thing happen.
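A rough sketch of the "Scope remembers its slot" idea follows. The field
names and the Lookup helper are assumptions for illustration; the only
details taken from this change are that a Scope holds the **Scope slot
it is rooted in and exposes Push and Pop.

```go
package main

import "fmt"

// Scope is one lexical scope. slot points at the variable that holds the
// current scope, so Push and Pop can update that variable in place.
type Scope struct {
	slot   **Scope
	parent *Scope
	names  map[string]string // name -> description; stand-in for symbols
}

// NewScope creates a scope rooted in slot and installs it there. Whatever
// scope the slot held before (possibly nil) becomes the parent.
func NewScope(slot **Scope) *Scope {
	s := &Scope{slot: slot, parent: *slot, names: make(map[string]string)}
	*slot = s
	return s
}

// Push creates a child of s and makes it the current scope.
func (s *Scope) Push() *Scope {
	child := &Scope{slot: s.slot, parent: s, names: make(map[string]string)}
	*s.slot = child
	return child
}

// Pop makes the parent of s the current scope again.
func (s *Scope) Pop() { *s.slot = s.parent }

// Lookup searches s and then its ancestors, innermost first.
func (s *Scope) Lookup(name string) (string, bool) {
	for cur := s; cur != nil; cur = cur.parent {
		if v, ok := cur.names[name]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	var current *Scope // the slot
	root := NewScope(&current)
	root.names["x"] = "outer x"

	inner := current.Push() // e.g. on entering an ast.Block
	inner.names["x"] = "inner x"
	v, _ := current.Lookup("x")
	fmt.Println(v)

	current.Pop() // on leaving the block
	v, _ = current.Lookup("x")
	fmt.Println(v)
}
```

Callers never reassign the slot themselves, which is what makes Push and
Pop hard to misuse.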
This introduces symbol factory methods to make creating symbols
less error prone. In particular, we hadn't been wiring up parents
properly (since they came in after the initial symbol shape).
Now with the factory methods, we'll be forced to visit creation
sites whenever adding new required elements to symbol types.
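As a small illustration of why factories help here, consider the sketch
below. The Module and Class shapes and the NewModule/NewClass names are
hypothetical; the real symbol types carry more state.

```go
package main

import "fmt"

// Module is a hypothetical module symbol that owns its member classes.
type Module struct {
	Name    string
	Members map[string]*Class
}

// Class is a hypothetical class symbol that must know its parent module.
type Class struct {
	Name   string
	Parent *Module
}

// NewModule creates a module symbol with an initialized member table.
func NewModule(name string) *Module {
	return &Module{Name: name, Members: make(map[string]*Class)}
}

// NewClass creates a class symbol and wires it to its parent in both
// directions. Because parent is a parameter, no call site can forget it,
// and adding another required parameter later breaks every call site at
// compile time rather than silently leaving a field unset.
func NewClass(name string, parent *Module) *Class {
	c := &Class{Name: name, Parent: parent}
	parent.Members[name] = c
	return c
}

func main() {
	m := NewModule("index")
	c := NewClass("Table", m)
	fmt.Println(c.Parent.Name, len(m.Members))
}
```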
This change rearranges the old way we dealt with URLs. In the old system,
virtually every reference to an element, including types, was fully qualified
with a possible URL-like reference. (The old pkg/tokens/Ref type.) In the
new model, only dependency references are URL-like. All maps and references
within the MuPack/MuIL format are token and name based, using the new
pkg/tokens/Token and pkg/tokens/Name family of related types.
As such, this change renames Ref to PackageURLString, and RefParts to
PackageURL. (The convenient name is given to the thing with "more" structure,
since we prefer to deal with structured types and not strings.) It moves
out of the pkg/tokens package and into pkg/pack, since it is exclusively
there to support package resolution. Similarly, the Version, VersionSpec,
and related types move out of pkg/tokens and into pkg/pack.
This change cleans up the various binder, package, and workspace logic.
Most of these changes are a natural fallout of this overall restructuring,
although in a few places we remained sloppy about the difference between
Token, Name, and URL. Now the type system supports these distinctions and
forces us to be more methodical about any conversions that take place.
This rearranges the library code:
* sdk/... goes away.
* What used to be sdk/javascript/ is now lib/mu/, an actual MuPackage
that provides the base abstractions for all other MuPackages to use.
* lib/aws is the @mu/aws MuPackage that exposes all AWS resources.
* lib/mux is the @mu/x MuPackage that provides cross-cloud abstractions.
A lot of what used to be in lib/mu goes here. In particular, autoscaler,
func, ..., all the "general purpose" abstractions, really.
In the old system, the core runtime/toolset understood that we are targeting
specific cloud providers at a very deep level. In fact, the whole code-generation
phase of the compiler was based on it.
In the new system, this difference is less of a "special" concern, and more of
a general one of mapping MuIL objects to resource providers, and letting *them*
gather up any configuration they need in a more general purpose way.
Therefore, most of this stuff can go. I've merged in a small amount of it to
the mu/x MuPackage, since that has to switch on cloud IaaS and CaaS providers in
order to decide what kind of resources to provision. For example, it has a
mu.x.Cluster stack type that itself provisions a lot of the barebone essential
resources, like a virtual private cloud and its associated networking components.
I suspect *some* knowledge of this will surface again as we implement more
runtime presence (discovery, etc). But for the time being, it's a distraction
from getting the core model running. I've retained some of the old AWS code in the
new pkg/resource/providers/aws package, in case I want to reuse some of it when
implementing our first AWS resource providers. (Although we won't be using
CloudFormation, some of the name generation code might be useful.) So, the
ships aren't completely burned to the ground, but they are certainly on 🔥.
As I do local development, I noticed errant newlines in the error
messages coming from TypeScript. That's because its formatting appends
newlines, whereas ours does not (and requires the code printing the
diagnostic to add one). To make these uniform, we will strip newlines,
if they exist, from the TypeScript preformatted diagnostics.
I was sloppy in my use of names versus tokens in the original AST.
Now that we're actually binding things to concrete symbols, etc., we
need to be more precise. In particular, names are just identifiers
that must be "interpreted" in a given lexical context for them to
make any sense; whereas, tokens stand alone and can be resolved without
context other than the set of imported packages, modules, and overall
module structure. As such, names are much simpler than tokens.
As explained in the comments, tokens.Names are simple identifiers:
    Name = [A-Za-z_][A-Za-z0-9_]*
and tokens.QNames are fully qualified identifiers delimited by "/":
    QName = [ <Name> "/" ]* <Name>
The legal grammar for a token depends on the subset of symbols that
token is meant to represent. However, the most general case, that
accepts all specializations of tokens, is roughly as follows:
    Token = <Name> |
            <PackageName>
                [ ":" <ModuleName>
                    [ "/" <ModuleMemberName>
                        [ "." <ClassMemberName> ]
                    ]
                ]
where:
    PackageName = <QName>
    ModuleName = <QName>
    ModuleMemberName = <Name>
    ClassMemberName = <Name>
Please refer to the comments in pkg/tokens/tokens.go for more details.
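For a concrete feel of the Name and QName rules above, here is a small
sketch that checks strings against them with regular expressions. The
IsName and IsQName helpers are illustrative names, not necessarily what
pkg/tokens exposes; the patterns are transcribed from the grammar.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Name = [A-Za-z_][A-Za-z0-9_]*
	nameRegexp = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)
	// QName = [ <Name> "/" ]* <Name>
	qnameRegexp = regexp.MustCompile(
		`^([A-Za-z_][A-Za-z0-9_]*/)*[A-Za-z_][A-Za-z0-9_]*$`)
)

// IsName reports whether s is a simple identifier.
func IsName(s string) bool { return nameRegexp.MatchString(s) }

// IsQName reports whether s is one or more Names joined by "/".
func IsQName(s string) bool { return qnameRegexp.MatchString(s) }

func main() {
	for _, s := range []string{"foo", "_bar9", "9lives", "a/b/c", "a//b"} {
		fmt.Printf("%-8s name=%-5v qname=%v\n", s, IsName(s), IsQName(s))
	}
}
```

Every Name is also a legal QName, but not the other way around.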
This change implements a significant amount of the top-level package
and module binding logic, including module and class members. It also
begins whittling away at the legacy binder logic (which I expect will
disappear entirely in the next checkin).
The scope abstraction has been rewritten in terms of the new tokens
and symbols layers. Each scope has a symbol table that associates
names with bound symbols, which can be used during lookup. This
accomplishes lexical scoping of the symbol names, by pushing and
popping at the appropriate times. I envision all name resolution to
happen during this single binding pass so that we needn't reconstruct
lexical scoping more than once.
Note that we need to do two passes at the top-level, however. We
must first bind module-level member names to their symbols *before*
we bind any method bodies, otherwise legal intra-module references
might turn up empty-handed during this binding pass.
There is also a type table that associates types with ast.Nodes.
This is how we avoid needing a complete shadow tree of nodes, and/or
avoid needing to mutate the nodes in place. Every node with a type
gets an entry in the type table. For example, variable declarations,
expressions, and so on, each get an entry. This ensures that we can
access type symbols throughout the subsequent passes without needing
to reconstruct scopes or emulate lexical scoping (as described above).
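The type table can be pictured as a side map keyed by node identity. The
sketch below assumes hypothetical Node and Type interfaces and two toy
node kinds; it is not the real ast or symbols API. The Untyped helper is
also an assumption, included to show how a later pass could confirm that
every node received a type.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for ast.Node.
type Node interface{ Kind() string }

// Type is a hypothetical stand-in for a type symbol.
type Type interface{ TypeName() string }

// NumberLiteral and StringLiteral are toy expression nodes.
type NumberLiteral struct{ Value float64 }

func (*NumberLiteral) Kind() string { return "NumberLiteral" }

type StringLiteral struct{ Value string }

func (*StringLiteral) Kind() string { return "StringLiteral" }

// primitive is a toy type symbol identified by its name.
type primitive string

func (p primitive) TypeName() string { return string(p) }

// TypeTable associates nodes with types without mutating the nodes and
// without building a shadow tree. Nodes are pointers, so the map is keyed
// by node identity.
type TypeTable map[Node]Type

// Untyped returns the nodes that have no entry in the table.
func (t TypeTable) Untyped(nodes []Node) []Node {
	var missing []Node
	for _, n := range nodes {
		if _, ok := t[n]; !ok {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	num := &NumberLiteral{Value: 42}
	str := &StringLiteral{Value: "hi"}

	types := TypeTable{}
	types[num] = primitive("number") // recorded during binding

	fmt.Println(types[num].TypeName())
	fmt.Println(len(types.Untyped([]Node{num, str})))
}
```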
This is a work in progress, so there are a number of important TODOs
in there associated with symbol table management and body binding.
This massages the symbol layer to reflect more closely what we need.
There is a Symbol interface. It is an interface because it's polymorphic
and we'll need to switch on type tests throughout the code a fair bit.
In addition to the Symbol interface, there are three other interfaces:
* ModuleMember, for any Module member symbols.
* ClassMember, for any Class member symbols.
* Type, to permit polymorphic treatment of Classes and built-in types.
There are concrete symbols for Module, ModuleProperty, ModuleMethod,
Class, ClassProperty, and ClassMethod. These map directly to the
corresponding AST abstractions and simply permit us to annotate those
AST nodes with some semantic information and, more importantly, to
inject them into the symbol table as we perform binding/typechecking.
Class implements the Type abstraction.
There is also a primitive node with four constant types, AnyType,
BoolType, NumberType, and StringType, each of which is registered in
an exported Primitives map, keyed by their name/keyword/token. These
of course implement the Type abstraction.
Finally, there are ArrayType and MapType symbols, which also implement
Type. They wrap other types as keys/elements.
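The shape of this layer might look roughly like the sketch below. The
interface and type names follow the description above, but the method
sets, the primitive spellings ("any", "bool", "number", "string"), and
the way array and map names are rendered are all assumptions.

```go
package main

import "fmt"

// Symbol is the root interface; concrete kinds are told apart by type tests.
type Symbol interface{ Name() string }

// Type marks the symbols that can be used as types.
type Type interface {
	Symbol
	isType()
}

// PrimitiveType is a built-in type identified by its keyword.
type PrimitiveType struct{ name string }

func (p *PrimitiveType) Name() string { return p.name }
func (p *PrimitiveType) isType()      {}

var (
	AnyType    = &PrimitiveType{name: "any"}
	BoolType   = &PrimitiveType{name: "bool"}
	NumberType = &PrimitiveType{name: "number"}
	StringType = &PrimitiveType{name: "string"}
)

// Primitives maps each primitive's name to its type symbol.
var Primitives = map[string]Type{
	AnyType.Name():    AnyType,
	BoolType.Name():   BoolType,
	NumberType.Name(): NumberType,
	StringType.Name(): StringType,
}

// ArrayType wraps an element type.
type ArrayType struct{ Element Type }

func (a *ArrayType) Name() string { return a.Element.Name() + "[]" }
func (a *ArrayType) isType()      {}

// MapType wraps a key type and an element type.
type MapType struct{ Key, Element Type }

func (m *MapType) Name() string {
	return "map[" + m.Key.Name() + "]" + m.Element.Name()
}
func (m *MapType) isType() {}

// Describe shows the kind of type switch the interface design anticipates.
func Describe(s Symbol) string {
	switch t := s.(type) {
	case *PrimitiveType:
		return "primitive " + t.Name()
	case *ArrayType:
		return "array of " + t.Element.Name()
	case *MapType:
		return "map from " + t.Key.Name() + " to " + t.Element.Name()
	default:
		return "symbol " + s.Name()
	}
}

func main() {
	arr := &ArrayType{Element: StringType}
	m := &MapType{Key: StringType, Element: arr}
	fmt.Println(Describe(Primitives["number"]))
	fmt.Println(Describe(arr))
	fmt.Println(m.Name())
}
```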
I'm peeling off this part from my gigantic pending change, since this is
mostly standalone, and ideally leads to more independent chunks.
This change further merges the new AST and MuPack/MuIL formats and
abstractions into the core of the compiler. A good amount of the old
code is gone now; I decided against ripping it all out in one fell
swoop so that I can methodically check that we are preserving all
relevant decisions and/or functionality we had in the old model.
The changes are too numerous to outline in this commit message;
however, here are the noteworthy ones:
* Split up the notion of symbols and tokens, resulting in:
- pkg/symbols for true compiler symbols (bound nodes)
- pkg/tokens for name-based tokens, identifiers, constants
* Several packages move underneath pkg/compiler:
- pkg/ast becomes pkg/compiler/ast
- pkg/errors becomes pkg/compiler/errors
- pkg/symbols becomes pkg/compiler/symbols
* pkg/ast/... becomes pkg/compiler/legacy/ast/...
* pkg/pack/ast becomes pkg/compiler/ast.
* pkg/options goes away, merged back into pkg/compiler.
* All binding functionality moves underneath a dedicated
package, pkg/compiler/binder. The legacy.go file contains
cruft that will eventually go away, while the other files
represent a halfway point between new and old, but are
expected to stay roughly in the current shape.
* All parsing functionality is moved underneath a new
pkg/compiler/metadata namespace, and we adopt new terminology
"metadata reading" since real parsing happens in the MetaMu
compilers. Hence, Parser has become metadata.Reader.
* In general, phases of the compiler no longer share access to
the actual compiler.Compiler object. Instead, shared state is
moved to the core.Context object underneath pkg/compiler/core.
* Dependency resolution during binding has been rewritten to
the new model, including stashing bound package symbols in the
context object, and detecting import cycles.
* Compiler construction does not take a workspace object. Instead,
creation of a workspace is entirely hidden inside of the compiler's
constructor logic.
* There are three Compile* functions on the Compiler interface, to
support different styles of invoking compilation: Compile() auto-
detects a Mu package, based on the workspace; CompilePath(string)
loads the target as a Mu package and compiles it, regardless of
the workspace settings; and, CompilePackage(*pack.Package) will
compile a pre-loaded package AST, again regardless of workspace.
* Delete the _fe, _sema, and parsetree phases. They are no longer
relevant and the functionality is largely subsumed by the above.
...and so very much more. I'm surprised I ever got this to compile again!
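For the import cycle detection mentioned in the dependency resolution
bullet, one common approach is a depth-first search that tracks which
packages are still being resolved. The sketch below is that generic
technique over a plain map of package names; it is not the binder's
actual code, and FindImportCycle is a hypothetical name.

```go
package main

import "fmt"

// FindImportCycle walks deps depth-first starting at root. It returns one
// cycle as a path whose first and last entries are equal, or nil if no
// cycle is reachable from root.
func FindImportCycle(deps map[string][]string, root string) []string {
	const (
		unvisited  = iota // zero value for packages not yet seen
		inProgress        // on the current resolution stack
		done              // fully resolved, known to be cycle-free
	)
	state := make(map[string]int)
	var stack []string

	var visit func(pkg string) []string
	visit = func(pkg string) []string {
		switch state[pkg] {
		case done:
			return nil
		case inProgress:
			// pkg is already on the stack, so we have closed a loop.
			for i, p := range stack {
				if p == pkg {
					cycle := append([]string{}, stack[i:]...)
					return append(cycle, pkg)
				}
			}
		}
		state[pkg] = inProgress
		stack = append(stack, pkg)
		for _, dep := range deps[pkg] {
			if cycle := visit(dep); cycle != nil {
				return cycle
			}
		}
		stack = stack[:len(stack)-1]
		state[pkg] = done
		return nil
	}
	return visit(root)
}

func main() {
	deps := map[string][]string{
		"app": {"lib"},
		"lib": {"util"},
		"util": {"lib"}, // util imports lib, which imports util
	}
	fmt.Println(FindImportCycle(deps, "app"))
}
```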
This change introduces a new visitation API to the new MuIL AST.
The ast.Walk API takes an ast.Visitor implementation and walks the
tree in depth-first order, invoking the visitor along the way.
The visitor gets to choose whether to continue visitation (by returning
a non-nil visitor object), or to stop it (by returning nil). The
visitation will proceed with that returned visitor, so that a visitor
can "swap out" the visitor used for child nodes if needed.
At the end, the PostVisit function is called, for any clean up logic.
Finally, the ast.Inspector type is available as a simple way of consing
up visitors from a function that returns a bool indicating
whether visitation should continue.
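The protocol described above can be sketched as follows. The Node
interface with a Children method, the toy Block and Leaf nodes, and the
choice to call PostVisit on the visitor returned by Visit are assumptions
made for this illustration; the real ast package may differ in detail.

```go
package main

import "fmt"

// Node is a hypothetical AST node that can enumerate its children.
type Node interface {
	Kind() string
	Children() []Node
}

// Visitor is invoked for each node. Visit returns the visitor to use for
// the node's children, or nil to stop descending.
type Visitor interface {
	Visit(n Node) Visitor
	PostVisit(n Node)
}

// Walk traverses the tree depth-first, parents before children.
func Walk(v Visitor, n Node) {
	next := v.Visit(n)
	if next == nil {
		return // the visitor asked to skip this subtree
	}
	for _, child := range n.Children() {
		Walk(next, child)
	}
	next.PostVisit(n) // clean-up hook once the children are done
}

// Inspector adapts a plain function into a Visitor. Returning true keeps
// walking with the same function; returning false stops.
type Inspector func(n Node) bool

func (insp Inspector) Visit(n Node) Visitor {
	if insp(n) {
		return insp
	}
	return nil
}

func (insp Inspector) PostVisit(n Node) {}

// Block and Leaf are toy nodes for the demonstration.
type Block struct{ Stmts []Node }

func (b *Block) Kind() string     { return "Block" }
func (b *Block) Children() []Node { return b.Stmts }

type Leaf struct{ Name string }

func (l *Leaf) Kind() string     { return l.Name }
func (l *Leaf) Children() []Node { return nil }

func main() {
	tree := &Block{Stmts: []Node{
		&Leaf{Name: "a"},
		&Block{Stmts: []Node{&Leaf{Name: "b"}}},
	}}
	Walk(Inspector(func(n Node) bool {
		fmt.Println(n.Kind())
		return true
	}), tree)
}
```

Returning a different visitor from Visit is how a pass could switch
behavior for a subtree, for example on entering a function body.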
This change helps move us one step closer to eliminating the old metadata-
based AST goo, and replacing it with MuPack/MuIL AST and symbol information.
In particular, all name/token "symbol" code -- things like identifiers,
package/member references, and version specs -- move out of the pkg/ast
package and into the top-level pkg/symbols package, alongside the existing
MuPack/MuIL symbol token types.
This is the first change of many to merge the MuPack/MuIL formats
into the heart of the "compiler".
In fact, the entire meaning of the compiler has changed, from
something that took metadata and produced CloudFormation, into
something that takes MuPack/MuIL as input, and produces a MuGL
graph as output. Although this process is distinctly different,
there are several aspects we can reuse, like workspace management,
dependency resolution, and some amount of name binding and symbol
resolution, just as a few examples.
An overview of the compilation process is available as a comment
inside of the compiler.Compile function, although it is currently
unimplemented.
The relationship between Workspace and Compiler has been semi-
inverted, such that all Compiler instances require a Workspace
object. This is more natural anyway and moves some of the detection
logic "outside" of the Compiler. Similarly, Options has moved to
a top-level package, so that Workspace and Compiler may share
access to it without causing package import cycles.
Finally, all that templating crap is gone. This alone is cause
for mass celebration!
The prior workaround to avoid truncated pending stdout writes, it
turns out, doesn't actually work. (Piping output leads to more buffering
and asynchrony, and turned up additional problems.) Digging through some
GitHub issues led me to these "best practices":
https://nodejs.org/api/process.html#process_process_exit_code
The reason this is problematic is because writes to process.stdout in
Node.js are sometimes non-blocking and may occur over multiple ticks of
the Node.js event loop. Calling process.exit(), however, forces the
process to exit before those additional writes to stdout can be performed.
Rather than calling process.exit() directly, the code should set the
process.exitCode and allow the process to exit naturally by avoiding
scheduling any additional work for the event loop.
This change adopts this part of the best practices, by simply setting
exitCode upon normal termination and letting the event loop quiesce.
Note that I am still not obeying the other part of the guidance:
If it is necessary to terminate the Node.js process due to an error
condition, throwing an uncaught error and allowing the process to
terminate accordingly is safer than calling process.exit().
Somewhat confusingly, writes to process.stderr do not suffer from these
same problems, because writes to stderr are synchronous. We prefer to
tear down the process gracefully, without an unhandled exception, and
we are okay losing some stdout writes as a result, given that all error-
related ones will have gone to stderr.
This adds scaffolding but no real functionality yet, as part of
marapongo/mu#41. I am landing this now because I need to take a
not-so-brief detour to gut and overhaul the core of the existing
compiler (parsing, semantic analysis, binding, code-gen, etc),
including merging the new pkg/pack contents back into the primary
top-level namespaces (like pkg/ast and pkg/encoding).
After that, I can begin driving the compiler to achieve the
desired effects of mu compile, first and foremost, and then plan
and apply later on.
This change now recognizes Mu.yaml files, in addition to Mu.json,
from the MuJS compiler. Not the most important thing in the world;
however, all of our project files are in YAML and it's less work to
implement this than to convert them all to JSON ...
This change tracks the set of imported modules in the ast.Module
structure. Although we can in principle gather up all imports simply
by looking through the fully qualified names, that's slightly hokey;
and furthermore, to properly initialize all modules, we need to know
in which order to do it (in case there are dependencies). I briefly
considered leaving it up to MetaMu compilers to inject the module
initialization calls explicitly -- for infinite flexibility and perhaps
greater compatibility with the source languages -- however, I'd much
prefer that all Mu code use a consistent module initialization story.
Therefore, MetaMus declare the module imports, in order, and we will
evaluate the initializers accordingly.
This change emits more types. In particular:
* Previously, only primitive types got emitted, yielding "any" for any
custom types. Now we emit custom types, including fully qualified
module names for type references resolving to imported modules.
* Prior to this change, we erroneously used the type node on the function
declaration itself as an approximation for return type. To get the
true return type, we need to dig through a few nodes, including the
Declaration and Signature. This change now properly emits return types.
This doesn't close out marapongo/mu#46; however, we are getting close.