This change revives some compiler tests that are still lingering around
from the old architecture, before our latest round of ship burning.
It also fixes up some bugs uncovered during this:
* Don't claim that a symbol's kind is incorrect in the binder error
message when it wasn't found. Instead, say that it was missing.
* Do not attempt to compile if an error was issued during workspace
resolution and/or loading of the Mufile. Otherwise we end up trying to
load an empty path, and badness quickly ensues (a crash).
* Issue an error if the Mufile wasn't found (this got lost apparently).
* Rename the ErrorMissingPackageName message to ErrorInvalidPackageName,
since missing names are now caught by our new fancy decoder that
understands required versus optional fields. We still need to guard
against illegal characters in the name, including the empty string "".
* During decoding, reject !src.IsValid elements. This represents the
zero value and should be treated equivalently to a missing field.
* Do not permit empty strings "" as Names or QNames. The old logic
accidentally permitted them because regexp.FindString("") == "", no
matter the regex! (See the sketch after this list.)
* Move the TestDiagSink abstraction to a new pkg/util/testutil package,
allowing us to share this common code across multiple package tests.
* Fix up a few messages that needed tidying or to use Infof vs. Info.
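For illustration, here is a minimal sketch of the tightened name check, using the standard regexp package; the helper name and exact anchoring are assumptions, not the actual code:

    // nameRegexp must match the *entire* input; anchoring plus an explicit
    // empty-string check rules out the accidental acceptance described above.
    var nameRegexp = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

    // IsName reports whether s is a legal simple name.
    func IsName(s string) bool {
        return s != "" && nameRegexp.MatchString(s)
    }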
The binder tests -- deleted in this change -- are about to come back; however,
I am splitting up the changes, since this represents a passing fixed point.
This change looks up the main module from a package, and that module's
entrypoint, when performing evaluation. In any case, the arguments are
validated and bound to the resulting function's parameters.
This change eliminates all traces of the old, legacy errors and warnings.
It also renumbers everything so that we don't have any "gaps". Finally,
it introduces a factory method for new errors and warnings, private to
this package, that ensures we don't accidentally register duplicate IDs.
This change deletes a bunch of old legacy code. I've tagged the
prior changelist as "0.1" in case we need to go back and recover
some of the code (as I expect some of the constraint type logic
and AWS-specific code to come in handy down the road).
🔥🔥🔥
This adds support for LoadLocationExpression, which will be key to
getting virtually everything else working, including class and module
member accesses, local variable gets/sets, function calls, and more.
In particular, an extra level of indirection is added to the globals,
locals, and properties maps. This new thing is called a *Reference,
and it is simply a mutable slot that contains an *Object. Instead
of storing and retrieving objects directly, this extra level of
indirection enables us to create *Object instances that themselves
simply refer to other objects (for subsequent loads and stores).
For now, we do not permit overwriting of method properties. This
is a future work item (see marapongo/mu#56).
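Roughly, the Reference indirection looks like this (a sketch only; field and method names are illustrative, not the actual API):

    // Reference is a mutable slot holding an *Object.  Loads and stores go
    // through the slot, which lets one *Object simply refer to another.
    type Reference struct {
        obj *Object
    }

    func (r *Reference) Get() *Object    { return r.obj }
    func (r *Reference) Set(obj *Object) { r.obj = obj }

    // Globals, locals, and properties now map names to slots, not objects.
    type Properties map[tokens.Name]*Reference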
This change adds support for function calls. We do not yet wire it
into the various function call expressions; however, that should be
relatively trivial after this change. (It is dependent on figuring
out the runtime representation of load location objects and subsequent
uses of them.) This change also updates the scoping logic to respect
"activation frame" style lexical scoping, where a function invoked
as the result of a call should not have access to the caller's context.
This change also includes logic to detect unhandled exceptions and to
print an error as a result. Eventually, we will want stack traces here.
This change eliminates the scope-based symbol table. Because we now
require that all module, type, function, and variable elements are
encoded as fully qualified tokens, there is no need for the scope-based
lookups. Instead, the languages themselves decide how the names bind
to locations and just encode that information directly.
The scope is still required for local variables, however, since those
don't have a well-defined "fixed" notion of name. This is also how
we will ensure the evaluator stores values correctly -- including
discarding them -- in a lexically scoped manner.
This change checks in an enormously rudimentary interpreter. There is
a lot left to do, as evidenced by the countless TODOs scattered throughout
pkg/compiler/eval/eval.go. Nevertheless, the scaffolding and some important
pieces are included in this change.
In particular, to evaluate a package, we must locate its entrypoint and then,
using the results of binding, systematically walk the full AST. As we do
so, we will assert that aspects of the AST match the expected shape,
including symbols and their types, and produce value objects for expressions.
An *unwind structure is used for returns, throws, breaks, and continues (both
labeled and unlabeled). Each statement or expression visitation may optionally
return one and its presence indicates that control flow transfer is occurring.
The visitation logic then needs to propagate this; usually just by bailing out
of the current context immediately, but sometimes -- as is the case with
TryCatchBlock statements, for instance -- the unwind is manipulated in more
"interesting" ways.
An *Object structure is used for expressions yielding values. This is a
"runtime object" in our system and is comprised of three things: 1) a Type
(essentially its v-table), 2) an optional data pointer (for tagged primitives),
and 3) an optional bag of properties (for complex object property values).
I am still on the fence about whether to unify the data representations.
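In sketch form, such an *Object might look like this (field names are assumptions, using the types described elsewhere in these notes):

    // Object is a runtime value: a type (its "v-table"), optional primitive
    // data for tagged primitives (bool, number, string), and an optional bag
    // of named properties for complex objects.
    type Object struct {
        t          symbols.Type
        data       interface{}
        properties map[tokens.Name]*Object
    }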
The hokiest aspect of this change is the scoping and value management. I am
trying to avoid needing to implement any sort of "garbage collection", which
means our bag-of-values approach will not work. Instead, we will need to
arrange for scopes to be erected and discarded in the correct order during
evaluation. I will probably tackle that next, along with fleshing out the
many missing statement and expression cases (...and tests, of course).
This change introduces a binder.Context structure, with a
*core.Context embedded, that carries additional semantic information
forward to future passes in the compiler. In particular, this is how
the evaluation/interpretation phase will gain access to types, scopes,
and symbols.
This change memoizes decorated types to avoid creating boatloads
of redundant symbols at runtime. We want it to be the case that you
can create instances of decorated types as needed throughout the
compiler without worrying about excess garbage (e.g., arrays, pointers, etc.).
During evaluation, we need to perform very delicate walking of the AST,
and are likely to use the ability to substitute visitors "in situ." As
a result, I'd like to make anonymous visitors easier to produce. This
change permits either single-function forms of Visit/After or anonymous
structures that pair up a Visit and an After to produce a Visitor.
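A hedged sketch of what those function forms might look like, assuming the adapters live alongside the Visitor interface in the ast package (names and signatures are illustrative):

    // VisitorFunc adapts a single function into a Visitor; After is a no-op.
    type VisitorFunc func(node Node) Visitor

    func (f VisitorFunc) Visit(node Node) Visitor { return f(node) }
    func (f VisitorFunc) After(node Node)         {}

    // VisitorPair pairs up a Visit and an After function as one Visitor.
    type VisitorPair struct {
        V func(node Node) Visitor
        A func(node Node)
    }

    func (p VisitorPair) Visit(node Node) Visitor { return p.V(node) }
    func (p VisitorPair) After(node Node) {
        if p.A != nil {
            p.A(node)
        }
    }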
This change completes my testing of decorator parsing for now. It tests the token
`*[]map[string]map[()*(bool,string,test/package:test/module/Crazy)number][][]test/package:test/module/Crazy`.
This turned up some bugs, most notably in the way we returned the "full" token for
the parsed types. We need to extract the subset of the token consumed by the parsing
routine, rather than the entire thing. To do this, we introduce a tokenBuffer type
that allows for convenient parsing of tokens (eating, advancing, extraction, etc).
This isn't comprehensive yet, however it caught two bugs:
1. parseNextType should operate on "rest" in most cases, not "tok".
2. We must eat the "]" map separator before moving on to the element type.
Part of the token grammar permits so-called "decorated" types. These
are tokens that are pointer, array, map, or function types. For example:
* `*any`: a pointer to anything.
* `[]string`: an array of primitive strings.
* `map[string]number`: a map from strings to numbers.
* `(string,string)bool`: a function with two string parameters and a
boolean return type.
* `[]aws:s3/Bucket`: an array of objects whose class is `Bucket` from
the package `aws` and its module `s3`.
This change introduces this notion into the parsing and handling of
type tokens. In particular, it uses recursive parsing to handle complex
nested structures, and the binder.bindTypeToken routine has been updated
to call out to these as needed, in order to produce the correct symbol.
This changes a few things around binding logic, as part of eliminating
all of the legacy logic and weaving it into the new codebase:
* Give Scopes access to the Context object. Relatedly, add a TryRegister
method to Scope that is like RequireRegister, except that instead of
failing fast upon encountering a duplicate entry, it issues an error.
* Move all typecheck visitation functions out of the big honkin' switch
and into their own member functions. As this stuff gets more complex,
having everything in one routine was starting to irk my sensibilities.
* Validate that packages have names.
* Store both the package symbol, plus the canonicalized URL used to
resolve it, in the package map. This will help us verify that versions
match for multiple package references resolving to the same symbol.
* Add nice inquiry methods to the Class symbol (Sealed, Abstract, Record,
Interface) that simplify accessing the modifiers on the underlying node.
The options structure will be shared between multiple passes of
compilation, including evaluation and graph generation. Therefore,
it must not be in the pkg/compiler package, else we would create
package cycles. Now that the options structure is barebones --
and, in particular, no more "backend" settings pollute it -- this
refactoring actually works.
This is a bit of paranoia; however, an invariant of the typechecker
is that, after the final pass, all expression nodes are assigned a
type. This assertion checks that this is true.
Instead of serializing simple token strings into the AST -- in place of things
like type references, module references, export references, etc. -- we now use
1st class AST nodes. This ensures that source context flows with the tokens
as we bind them, etc., and also cleans up a few inconsistencies (like using an
ast.Identifier for NewExpression -- clearly wrong, since the resulting
MuIL is meant to contain fully bound semantic references).
This change includes some tests for token parsing and conversions. It
also fixes a bug where we treated Type tokens like ClassMembers, when
we ought to have been treating them like ModuleMembers.
This change performs typechecking during binding. This is less about
typechecking per se -- since higher level languages will have presumably
given us well-typed IL -- and more about preparing the AST so that we
can evaluate the fully bound nodes to produce a MuGL graph. It also
serves as a "verifier" for the incoming MuIL, however.
This is clearly incomplete, as the dozens of TODOs will make obvious.
But it's a clean checkpoint that does enough interesting typechecking
that I am landing it now.
This change begins to bind function bodies. This must be done as a
second pass over the AST, because dependencies between modules, and
even intra-module dependencies, might refer to top-level symbols like
types, variables, and functions, and so must be established first.
At the moment, the only node kind we handle is ast.Block, which
merely pushes and pops lexical scopes; however, the next step is to
implement the AST node-specific visitation logic for all statement
and expression nodes.
I've also rearranged how Scopes work to be a little easier to use.
The Scope type now remembers the **Scope slot in which it is rooted,
so that we can simply call Push and Pop on Scopes and have the right
thing happen.
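A sketch of the idea (illustrative only; it leans on the tokens, symbols, and contract packages mentioned elsewhere in these notes):

    // Scope remembers the **Scope slot it is rooted in, so pushing and
    // popping just swaps the "current" scope held in that slot.
    type Scope struct {
        slot   **Scope
        parent *Scope
        symtbl map[tokens.Name]symbols.Symbol
    }

    func (s *Scope) Push() *Scope {
        inner := &Scope{
            slot:   s.slot,
            parent: s,
            symtbl: make(map[tokens.Name]symbols.Symbol),
        }
        *s.slot = inner
        return inner
    }

    func (s *Scope) Pop() {
        contract.Assert(*s.slot == s) // only the innermost scope may be popped.
        *s.slot = s.parent
    }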
This introduces symbol factory methods to make creating them
less error prone. In particular, we hadn't been wiring up parents
properly (since they came in after the initial symbol shape).
Now with the factory methods, we'll be forced to visit creation
sites whenever adding new required elements to symbol types.
This change rearranges the old way we dealt with URLs. In the old system,
virtually every reference to an element, including types, was fully qualified
with a possible URL-like reference. (The old pkg/tokens/Ref type.) In the
new model, only dependency references are URL-like. All maps and references
within the MuPack/MuIL format are token and name based, using the new
pkg/tokens/Token and pkg/tokens/Name family of related types.
As such, this change renames Ref to PackageURLString, and RefParts to
PackageURL. (The convenient name is given to the thing with "more" structure,
since we prefer to deal with structured types and not strings.) It moves
out of the pkg/tokens package and into pkg/pack, since it is exclusively
there to support package resolution. Similarly, the Version, VersionSpec,
and related types move out of pkg/tokens and into pkg/pack.
This change cleans up the various binder, package, and workspace logic.
Most of these changes are a natural fallout of this overall restructuring,
although in a few places we remained sloppy about the difference between
Token, Name, and URL. Now the type system supports these distinctions and
forces us to be more methodical about any conversions that take place.
In the old system, the core runtime/toolset understood that we are targeting
specific cloud providers at a very deep level. In fact, the whole code-generation
phase of the compiler was based on it.
In the new system, this difference is less of a "special" concern, and more of
a general one of mapping MuIL objects to resource providers, and letting *them*
gather up any configuration they need in a more general purpose way.
Therefore, most of this stuff can go. I've merged in a small amount of it to
the mu/x MuPackage, since that has to switch on cloud IaaS and CaaS providers in
order to decide what kind of resources to provision. For example, it has a
mu.x.Cluster stack type that itself provisions a lot of the barebone essential
resources, like a virtual private cloud and its associated networking components.
I suspect *some* knowledge of this will surface again as we implement more
runtime presence (discovery, etc). But for the time being, it's a distraction
getting the core model running. I've retained some of the old AWS code in the
new pkg/resource/providers/aws package, in case I want to reuse some of it when
implementing our first AWS resource providers. (Although we won't be using
CloudFormation, some of the name generation code might be useful.) So, the
ships aren't completely burned to the ground, but they are certainly on 🔥.
I was sloppy in my use of names versus tokens in the original AST.
Now that we're actually binding things to concrete symbols, etc., we
need to be more precise. In particular, names are just identifiers
that must be "interpreted" in a given lexical context for them to
make any sense; whereas, tokens stand alone and can be resolved without
context other than the set of imported packages, modules, and overall
module structure. As such, names are much simpler than tokens.
As explained in the comments, tokens.Names are simple identifiers:
Name = [A-Za-z_][A-Za-z0-9_]*
and tokens.QNames are fully qualified identifiers delimited by "/":
QName = [ <Name> "/" ]* <Name>
The legal grammar for a token depends on the subset of symbols that
token is meant to represent. However, the most general case, that
accepts all specializations of tokens, is roughly as follows:
Token = <Name> |
        <PackageName>
        [ ":" <ModuleName>
          [ "/" <ModuleMemberName>
            [ "." <ClassMemberName> ]
          ]
        ]
where:
    PackageName      = <QName>
    ModuleName       = <QName>
    ModuleMemberName = <Name>
    ClassMemberName  = <Name>
Please refer to the comments in pkg/tokens/tokens.go for more details.
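To make the delimiters concrete, here is a hedged sketch of decomposing a fully qualified module member token, using the standard strings package; the helper name is made up:

    // splitMemberToken breaks a token of the form
    //     <PackageName> ":" <ModuleName> "/" <ModuleMemberName>
    // (e.g. "test/package:test/module/Crazy") into its parts.  Because the
    // member is a simple Name, the *last* "/" after the ":" is the delimiter.
    func splitMemberToken(tok string) (pkg, module, member string, ok bool) {
        colon := strings.Index(tok, ":")
        if colon == -1 {
            return "", "", "", false
        }
        rest := tok[colon+1:]
        slash := strings.LastIndex(rest, "/")
        if slash == -1 {
            return "", "", "", false
        }
        return tok[:colon], rest[:slash], rest[slash+1:], true
    }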
This change implements a significant amount of the top-level package
and module binding logic, including module and class members. It also
begins whittling away at the legacy binder logic (which I expect will
disappear entirely in the next checkin).
The scope abstraction has been rewritten in terms of the new tokens
and symbols layers. Each scope has a symbol table that associates
names with bound symbols, which can be used during lookup. This
accomplishes lexical scoping of the symbol names, by pushing and
popping at the appropriate times. I envision all name resolution to
happen during this single binding pass so that we needn't reconstruct
lexical scoping more than once.
Note that we need to do two passes at the top-level, however. We
must first bind module-level member names to their symbols *before*
we bind any method bodies, otherwise legal intra-module references
might turn up empty-handed during this binding pass.
There is also a type table that associates types with ast.Nodes.
This is how we avoid needing a complete shadow tree of nodes, and/or
avoid needing to mutate the nodes in place. Every node with a type
gets an entry in the type table. For example, variable declarations,
expressions, and so on, each get an entry. This ensures that we can
access type symbols throughout the subsequent passes without needing
to reconstruct scopes or emulate lexical scoping (as described above).
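In sketch form, the type table is just a side map keyed by node (names are illustrative; the asserts lean on the contract package):

    // TypeTable associates each typed ast.Node with its bound type symbol,
    // avoiding both a shadow tree and in-place mutation of the nodes.
    type TypeTable map[ast.Node]symbols.Type

    func (tt TypeTable) RegisterType(node ast.Node, t symbols.Type) {
        contract.Assert(tt[node] == nil) // a node gets a type exactly once.
        tt[node] = t
    }

    func (tt TypeTable) RequireType(node ast.Node) symbols.Type {
        t := tt[node]
        contract.Assert(t != nil)
        return t
    }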
This is a work in progress, so there are a number of important TODOs
in there associated with symbol table management and body binding.
This massages the symbol layer to reflect more closely what we need.
There is a Symbol interface. It is an interface because it's polymorphic
and we'll need to switch on type tests throughout the code a fair bit.
In addition to the Symbol interface, there are three other interfaces:
* ModuleMember, for any Module member symbols.
* ClassMember, for any Class member symbols.
* Type, to permit polymorphic treatment of Classes and built-in types.
There are concrete symbols for Module, ModuleProperty, ModuleMethod,
Class, ClassProperty, and ClassMethod. These map directly to the
corresponding AST abstractions and simply permit us to annotate those
AST nodes with some semantic information and, more importantly, to
inject them into the symbol table as we perform binding/typechecking.
Class implements the Type abstraction.
There are also four constant primitive types, AnyType,
BoolType, NumberType, and StringType, each of which is registered in
an exported Primitives map, keyed by its name/keyword/token. These
of course implement the Type abstraction.
Finally, there are ArrayType and MapType symbols, which also implement
Type. They wrap other types as keys/elements.
I'm peeling off this part from my gigantic pending change, since this is
mostly standalone, and ideally leads to more independent chunks.
This change further merges the new AST and MuPack/MuIL formats and
abstractions into the core of the compiler. A good amount of the old
code is gone now; I decided against ripping it all out in one fell
swoop so that I can methodically check that we are preserving all
relevant decisions and/or functionality we had in the old model.
The changes are too numerous to outline in this commit message,
however, here are the noteworthy ones:
* Split up the notion of symbols and tokens, resulting in:
- pkg/symbols for true compiler symbols (bound nodes)
- pkg/tokens for name-based tokens, identifiers, constants
* Several packages move underneath pkg/compiler:
- pkg/ast becomes pkg/compiler/ast
- pkg/errors becomes pkg/compiler/errors
- pkg/symbols becomes pkg/compiler/symbols
* pkg/ast/... becomes pkg/compiler/legacy/ast/...
* pkg/pack/ast becomes pkg/compiler/ast.
* pkg/options goes away, merged back into pkg/compiler.
* All binding functionality moves underneath a dedicated
package, pkg/compiler/binder. The legacy.go file contains
cruft that will eventually go away, while the other files
represent a halfway point between new and old, but are
expected to stay roughly in the current shape.
* All parsing functionality is moved underneath a new
pkg/compiler/metadata namespace, and we adopt new terminology
"metadata reading" since real parsing happens in the MetaMu
compilers. Hence, Parser has become metadata.Reader.
* In general, phases of the compiler no longer share access to
the actual compiler.Compiler object. Instead, shared state is
moved to the core.Context object underneath pkg/compiler/core.
* Dependency resolution during binding has been rewritten to
the new model, including stashing bound package symbols in the
context object, and detecting import cycles.
* Compiler construction does not take a workspace object. Instead,
creation of a workspace is entirely hidden inside of the compiler's
constructor logic.
* There are three Compile* functions on the Compiler interface, to
support different styles of invoking compilation: Compile() auto-
detects a Mu package, based on the workspace; CompilePath(string)
loads the target as a Mu package and compiles it, regardless of
the workspace settings; and, CompilePackage(*pack.Package) will
compile a pre-loaded package AST, again regardless of workspace.
* Delete the _fe, _sema, and parsetree phases. They are no longer
relevant and the functionality is largely subsumed by the above.
...and so very much more. I'm surprised I ever got this to compile again!
This change introduces a new visitation API to the new MuIL AST.
The ast.Walk API takes an ast.Visitor implementation and walks the
tree in depth-first order, invoking the visitor along the way.
The visitor gets to choose whether to continue visitation (by returning
a non-nil visitor object), or to stop it (by returning nil). The
visitation will proceed with that returned visitor, so that a visitor
can "swap out" the visitor used for child nodes if needed.
At the end, the PostVisit function is called, for any clean up logic.
Finally, the ast.Inspector type is available as a simple way of consing
up visitors simply using a function that returns a bool indicating
whether visitation should continue.
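In rough outline (a sketch of the shape, not the verbatim API):

    // Visitor is invoked at each node during a depth-first walk.  Visit
    // returns the visitor to use for the node's children (nil stops the
    // descent); PostVisit runs afterwards for any cleanup logic.
    type Visitor interface {
        Visit(node Node) Visitor
        PostVisit(node Node)
    }

    func Walk(v Visitor, node Node) {
        if v = v.Visit(node); v == nil {
            return
        }
        // ...recursively Walk(v, child) for each child of node, in order...
        v.PostVisit(node)
    }

    // Inspector adapts a simple bool-returning function into a Visitor:
    // returning false stops visitation of that node's children.
    type Inspector func(node Node) bool

    func (insp Inspector) Visit(node Node) Visitor {
        if insp(node) {
            return insp
        }
        return nil
    }

    func (insp Inspector) PostVisit(node Node) {}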
This change helps move us one step closer to eliminating the old metadata-
based AST goo, and replacing it with MuPack/MuIL AST and symbol information.
In particular, all name/token "symbol" code -- things like identifiers,
package/member references, and version specs -- move out of the pkg/ast
package and into the top-level pkg/symbols package, alongside the existing
MuPack/MuIL symbol token types.
This is the first change of many to merge the MuPack/MuIL formats
into the heart of the "compiler".
In fact, the entire meaning of the compiler has changed, from
something that took metadata and produced CloudFormation, into
something that takes MuPack/MuIL as input, and produces a MuGL
graph as output. Although this process is distinctly different,
there are several aspects we can reuse, like workspace management,
dependency resolution, and some amount of name binding and symbol
resolution, just as a few examples.
An overview of the compilation process is available as a comment
inside of the compiler.Compile function, although it is currently
unimplemented.
The relationship between Workspace and Compiler has been semi-
inverted, such that all Compiler instances require a Workspace
object. This is more natural anyway and moves some of the detection
logic "outside" of the Compiler. Similarly, Options has moved to
a top-level package, so that Workspace and Compiler may share
access to it without causing package import cycles.
Finally, all that templating crap is gone. This alone is cause
for mass celebration!
This change tracks the set of imported modules in the ast.Module
structure. Although we can in principle gather up all imports simply
by looking through the fully qualified names, that's slightly hokey;
and furthermore, to properly initialize all modules, we need to know
in which order to do it (in case there are dependencies). I briefly
considered leaving it up to MetaMu compilers to inject the module
initialization calls explicitly -- for infinite flexibility and perhaps
greater compatibility with the source languages -- however, I'd much
prefer that all Mu code use a consistent module initialization story.
Therefore, MetaMus declare the module imports, in order, and we will
evaluate the initializers accordingly.
This change makes considerable progress on the `mu describe` command;
the only thing remaining to be implemented now is full IL printing. It
now prints the full package/module structure.
For example, to print the set of exports from our scenarios/point test:
$ mujs tools/mujs/tests/output/scenarios/point/ | mu describe - -e
package "scenarios/point" {
dependencies []
module "index" {
class "Point" [public] {
method "add": (other: any): any
property "x" [public, readonly]: number
property "y" [public, readonly]: number
method ".ctor": (x: number, y: number): any
}
}
}
This is just pretty-printed output, but it is coming in handy for debugging.
The NewExpression AST node type was missing a JSON annotation on
its Type field, leading to decoding errors.
Now, with this, the full suite of MuJS test cases can be unmarshaled
into fully populated MuPack and MuIL structures.
This change overhauls the approach to custom decoding. Instead of decoding
the parts of the struct that are "trivial" in one pass, and then patching up
the structure afterwards with custom decoding, the decoder itself understands
the notion of custom decoder functions.
First, the general purpose logic has moved out of pkg/pack/encoding and into
a new package, pkg/util/mapper. Most functions are now members of a new top-
level type, Mapper, which may be initialized with custom decoders. This
is a map from target type to a function that can decode objects into it.
Second, the AST-specific decoding logic is rewritten to use it. All AST nodes
are now supported, including definitions, statements, and expressions. The
overall approach here is to simply define a custom decoder for any interface
type that will occur in a node field position. The mapper, upon encountering
such a type, will consult the custom decoder map; if a decoder is found, it
will be used, otherwise an error results. This decoder then needs to switch
on the type discriminated kind field that is present in the metadata, creating
a concrete struct of the right type, and then converting it to the desired
interface type. Note that, subtly, interface types used only for "marker"
purposes don't require any custom decoding, because they do not appear in
field positions and therefore won't be encountered during the decoding process.
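A hedged sketch of the overall shape (names approximate; it relies on the standard reflect package):

    // Decoder turns a weakly typed map into a value of a specific target type.
    type Decoder func(m Mapper, obj map[string]interface{}) (interface{}, error)

    // Decoders maps a target type -- typically an interface that appears in a
    // node field position -- to the function that knows how to decode it.
    type Decoders map[reflect.Type]Decoder

    // Mapper decodes map[string]interface{} payloads into strongly typed
    // structures, consulting its custom decoders whenever it encounters a
    // field whose target type has one registered.
    type Mapper struct {
        decoders Decoders
    }

    func New(decoders Decoders) Mapper {
        return Mapper{decoders: decoders}
    }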
This change splits up the decoding logic into multiple files, to
mirror the AST package structure that the functions correspond to.
Additionally, there is now less "loose" reflection and dynamic lookup
code scattered throughout; it is now consolidated into the decoder,
with a set of "generic" functions like `fieldObject`, `asString`, etc.
This change implements custom class member decoding. As with module methods,
the function body AST nodes remain nil, as custom AST decoding isn't yet done.
This change begins to implement some of the AST custom decoding, beneath
the Package's Module map. In particular, we now unmarshal "one level"
beyond this, populating each Module's ModuleMember map. This includes
Classes, Exports, ModuleProperties, and ModuleMethods. The Class AST's
Members have been marked "custom", in addition to Block's Statements,
because they required kind-directed decoding. But Exports and
ModuleProperties can be decoded entirely using the tag-directed decoding
scheme. Up next, custom decoding of ClassMembers. At that point, all
definition-level decoding will be done, leaving MuIL's ASTs.
This change just moves the assertion/failure functions from the pkg/util
package to pkg/util/contract, so things read a bit nicer (i.e.,
`contract.Assert(x)` versus `util.Assert(x)`).
This fixes a few things so that MuPackages now unmarshal:
* Mark Module.Members as requiring "custom" decoding. This is required
because that's the first point in the tree that leverages polymorphism.
Everything "above" this unmarshals just fine (e.g., package+modules).
* As such, stop marking Package.Modules as "custom".
* Add the Kind field to the Node. Although we won't use this for type
discrimination in the same way, since Go gives us RTTI on the structs,
it is required for unmarshaling (to avoid "unrecognized fields" errors)
and it's probably handy to have around for logging, messages, etc.
* Mark Position.Line and Column with "json" annotations so that they
unmarshal correctly.
This change adjusts pointers correctly when unmarshaling into target
pointer types. This handles arrays and maps of pointer elements, in
addition to consolidating existing logic for marshaling into a
destination top-level pointer as well.
This change eliminates boilerplate decoding logic in all the different
data structures, and instead uses a new tag-directed decoding scheme.
This works a lot like the JSON deserializers, in that it recognizes the
`json:"name"` tags, except that we permit annotation of fields that
require custom deserialization, as `json:"name,custom"`. The existing
`json:"name,omitempty"` tag is recognized for optional fields.
This adds basic custom decoding for the MuPack metadata section of
the incoming JSON/YAML. Because of the type discriminated union nature
of the incoming payload, we cannot rely on the simple built-in JSON/YAML
unmarshaling behavior. Note that for the metadata section -- what is
in this checkin -- we could have, but the IL AST nodes are problematic.
(To know what kind of structure to create requires inspecting the "kind"
field of the IL.) We will use a reflection-driven walk of the target
structure plus a weakly typed deserialized map[string]interface{}, as
is fairly customary in Go for scenarios like this (though good libraries
seem to be lacking in this area...).
This change carries over all of the metadata shapes in the MuPack
and MuIL file formats to our Go toolset. This includes creating a
proper discriminated AST type tree along with correct annotations
so that the metadata will serialize and deserialize correctly.
* Pursue the default/optional checking if a property value == nil.
* Use the Interface() function to convert a reflect.Value to its underlying
interface{} value. This is required for typechecking to check out.
* Also, unrelated to the above, change type assertions to use nil rather than
allocating real objects. Although minimal, this incurs less GC pressure.
In some cases, we want to specialize template generation based on
the options passed to the compiler. This change flows them through
so that they can be accessed as
{{if .Options.SomeSetting}}
...
{{end}}
This change reverts the syntax for arrays back to T[] from []T. The main
reason is that YAML doesn't permit unquoted strings beginning with [], meaning
any array type needs to be quoted as in "[]T", which is annoying compared to all
other primitive types which don't require quotes. And, anyway, this syntax is
more familiar too.
I've also added a number of tests.
Any of the bindXValue routines can fail if there was no way to convert
the interface{} to an ast.Literal. In such a case, we need to issue an
error about the wrong type being passed. Unfortunately, in the most
recent set of changes, we began simply returning nils without issuing
the error. This change fixes that.
This change renames Schemas to Types on Stack. More interestingly, it
renames the JSON/YAML property used to specify them, from "schemas:" to
"types:"; I feel like this reads more naturally, especially as a sister
to the existing "services:" section.
This checkin continues progress on marapongo/mu#9. It's still not
complete, however we're getting there. In particular, this includes:
* Rename of ComplexLiteral to SchemaLiteral, as it is used exclusively
for schematized types. Also includes a set of changes associated
with this, like deep value conversion to `map[string]interface{}`.
* Binding of schema types included within a Stack. This allows names in
type references to be bound to those schema types during typechecking.
This also includes binding schema properties, reusing all the existing
property binding logic for stacks. In this way, properties between
stacks and custom schema types are one and the same, which is nice.
* Enforcement for custom schema constraints; this includes Pattern,
MaxLength, MinLength, Maximum, and Minimum, as per the JSON Schema
specification.
This change overhauls the core of how types are used by the entire
compiler. In particular, we now have an ast.Type, and have begun
using its use where appropriate. An ast.Type is a union representing
precisely one of the possible sources of types in the system:
* Primitive type: any, bool, number, string, or service.
* Stack type: a resolved reference to an actual concrete stack.
* Schema type: a resolved reference to an actual concrete schema.
* Unresolved reference: a textual reference that hasn't yet been
resolved to a concrete artifact.
* Uninstantiated reference: a reference that has been resolved to
an uninstantiated stack, but hasn't been bound to a concrete
result yet. Right now, this can point to a stack, however
eventually we would imagine this supporting inter-stack schema
references also.
* Decorated type: either an array or a map; in the array case, there
is a single inner element type; in the map case, there are two,
the keys and values; in all cases, the type recurses to any of the
possibilities listed here.
All of the relevant AST nodes have been overhauled accordingly.
In addition to this, we now have an ast.Schema type. It is loosely
modeled on JSON Schema in its capabilities (http://json-schema.org/).
Although we parse and perform some visitation and binding of these,
there are mostly placeholders left in the code for the interesting
aspects, such as registering symbols, resolving dependencies, and
typechecking usage of schema types.
This is part of the ongoing work behind marapongo/mu#9.
This change leverages intrinsics in place of the predefined types.
It remains to be seen if we can reach 100% on this, however I am hopeful.
It's also nice that the system will be built "out of itself" with this
approach; in other words, each of the types is simply a Mufile that can
use conditional targeting as appropriate for the given cloud providers.
If we find that this isn't enough, we can always bring back the concept.
This change stops using the short-hand "!Ref" YAML syntax. The Golang
marshaler encodes it with quotes and, apparently, has no way to suppress
this behavior; this isn't surprising, since the YAML parser we're using
admits it doesn't support this aspect of the YAML spec fully. But that's
okay, the long-hand syntax works just fine, and has the added benefit
that we don't need to special case the logic for JSON versus YAML.
I think things have gotten a little out of hand with the way mu/x/cf
auto-maps properties. In the beginning, it looked like everything could
be trivially auto-mapped, and I wanted to avoid the verbosity of mapping
each property by hand (since you can easily fat finger a name, mess up
capitalization, forget one, etc). But then we began mapping service
references using proper CloudFormation !Refs, which meant suppressing
some of the auto-mappings, etc., etc. This led to properties, extraProperties,
skipProperties, renamedProperties, and so on... Pretty confusing IMHO.
I just took a step back and decided to eliminate auto-mapping. Instead,
you get two options: properties just lists a set of property name mappings,
and extraProperties lets you do template magic to map things instead if you
wish to take matters into your own hands. The result isn't too verbose
and has a lot less magic going on so it's easier to understand.
This change eliminates the special type mu/extension in favor of extensible
intrinsic types. This subsumes the previous functionality while also fixing
a number of warts with the old model.
In particular, the old mu/extension approach deferred property binding until
very late in the compiler. In fact, too late. The backend provider for an
extension simply received an untyped bag of stuff, which it then had to
deal with. Unfortunately, some operations in the binder are inaccessible
at this point because doing so would cause a cycle. Furthermore, some
pertinent information is gone at this point, like the scopes and symtables.
The canonical example where we need this is binding services names to the
services themselves; e.g., the AWS CloudFormation "DependsOn" property should
resolve to the actual service names, not the string values. In the limit,
this requires full binding information.
There were a few solutions I considered, including ones that would have
required less code motion; however, this one feels the most elegant.
Now we permit types to be marked as "intrinsic." Binding to these names
is done exactly as ordinary name binding, unlike the special mu/extension
provider name. In fact, just about everything except code-generation for
these types is the same as ordinary types. This is perfect for the use case
at hand, which is binding properties.
After this change, for example, "DependsOn" is expanded to real service
names precisely as we need.
As part of this change, I added support for three new basic schema types:
* ast.StringList ("string[]"): a list of strings.
* ast.StringMap ("map[string]any"): a map of strings to anys.
* ast.ServiceList ("service[]"): a list of service references.
Obviously we need to revisit this and add a more complete set. This work
is already tracked by marapongo/mu#9.
At the end of the day, it's likely I will replace all hard-coded predefined
types with intrinsic types, for similar reasons to the above.
There are two cases to consider in the case of a stack's bound properties.
First, it was a stack we didn't need to construct. This is the case for
built-in primitives. In that case, we must rebind each service uniquely.
Second, it was an imported stack, which by definition we had to construct.
In this case, we don't need to rebind the properties -- we already did so
-- and can just reuse them in the service's own bound properties.
I am pondering a better way of representing this and will probably do this
soon. The concept of unconstructed vs. constructed types should be unified
between the built-in types and imported ones. (Kind of like generics.)
But, I still need a little bit more time prototyping before I make up my
mind what a better way to represent all of this might look like.
We need to access the bound property values for a given stack, especially
during code-generation. This information was present for services before,
however not for stacks constructed via other means (e.g., the top-most one).
This change adds a PropertyValues bag plus a corresponding BoundPropertyValues
to the ast.Stack type.
During subtypeOf checking, we need to walk the chain of documents from
which a stack came. This is because, due to template expansion, we'll
end up with a different document for instantiated types than uninstantiated
ones. This change keeps track of the parent and walks it appropriately.
The prior change wasn't quite right vis-a-vis service selection. By default,
we actually want to pick the service itself that was named by a capability reference.
We only want to pick one of its public exported services as the selected one when
a selector is given. For convenience, we still have "<service>:." to pick the sole
public service; however, a selector is no longer absolutely required.
This implements support for arbitrary service types on properties,
not just the weakly typed "service". For example, in the AWS stacks,
the aws/ec2/route type requires a routeTable, among other things:
name: aws/ec2/route
properties:
    routeTable:
        type: aws/ec2/routeTable
This not only binds the definition of such properties, but also the
callsites of those creating stacks and supplying values for them.
This includes checking for concrete, instantiated, and even base
types so that, for instance, if a custom stack derives from
aws/ec2/routeTable using the base property, it can be supplied as a
legal value for the routeTable property in the above example.
The previous code stored dependencies in a map. This caused non-determinism
in the order in which the resulting dependencies would be bound. Instead of
doing that, this change tracks them in an array, simply using a map to avoid
binding duplicate dependencies.
A stack property can refer to other stack types. For example:
properties:
    gateway:
        type: aws/ec2/internetGateway
    ...
In such cases, we need to validate the property during binding,
in addition to binding it to an actual type so that we can later
validate callers who are constructing instances of this stack
and providing property values that we must typecheck.
Note that this binding is subtly different than existing stack
type binding. All the name validation, resolution, and so forth
are the same. However, notice that in this case we are not actually
supplying any property setters. That is, internetGateway is not
an "expanded" type, in that we have not processed any of its templates.
An analogy might help: this is sort of akin to referring to an
uninstantiated generic type in a traditional programming language,
versus its instantiated form. In this case, certain properties aren't
available to us, however we can still use it for type identity, etc.
This change properly transforms literal AST nodes during code-gen.
This includes emitting CloudFormation !Refs where appropriate, for
intra-stack references (capability types).
This change adds a handful of property binding tests.
It also fixes:
* AsName should assert IsName.
* Enumerate properties stably, so that it is deterministic.
* Do not issue errors about unrecognized properties for the special
`mu/extension` type. Its entire purpose in life is to offer an
entirely custom set of properties, which the provider is meant to
validate.
* Default to an empty map if properties are missing.
* Add a "/" to the end of the namespace from the workspace, if present.
And rearranges some code:
* Rename the LiteralX types to XLiteral; e.g., StringLiteral instead of
LiteralString. I kept typing XLiteral erroneously.
* Eliminate the Mu prefix on all of the predefined type and service
functions and types. It's superfluous and reads nicer this way.
* Swap the order of "expected" vs. "got" in the error message about
incorrect property types. It used to say "got %v, expected %v"; I
personally find that it is more helpful if it says "expected %v,
got %v". YMMV.
This change permits a workspace to specify a namespace, which is just a name
part that is trimmed off the front of directories when probing for inter-
workspace dependencies. For example, if our namespace is aws/, normally we'd
need to organize our namespace into directories like:
<root>
| aws/
| | dynamodb/
| | ec2/
| | s3/
... and so on ...
If we instead specify a namespace
namespace: aws
Then we can instead organize our project workspace as follows:
<root>
| dynamodb/
| ec2/
| s3/
... and so on ...
This is an initial pass at property binding. For all stack instantiations,
we must verify that the set of properties supplied are correct. We also must
remember the bound property information so that code-generation has all of
the information it needs to generate correct code (including capability refs).
This entails:
* Ensuring required properties are provided.
* Expanding missing properties that have Default values.
* Type-checking that supplied properties are of the right type.
* Expanding property values into AST literal nodes.
To do this requires a third AST pass in the semantic analysis part of the
compiler. In the 1st pass, dependencies aren't even known yet; in the 2nd
pass, dependencies have not yet been bound; therefore, we need a 3rd pass,
which can depend on the full binding information for the transitive closure
of AST nodes and dependencies to have been populated with types.
There are a few loose ends in here:
* We don't yet validate top-level stack properties.
* We don't yet validate top-level stack base type properties.
* We don't yet support complex schema property types.
* We don't yet support even "simple" complex property types, like `[ string ]`.
* We don't yet support strongly typed capability property types (just `service`).
That said, I am going to turn to writing a few tests for the basic cases, and then
resume to finishing this afterwards (tracked by marapongo/mu#25).
The prior code could miss arrays of strings during conversion because
the arrays created by the various marshalers are weakly typed. In other
words, even though they contain strings, the array type is []interface{}.
This change introduces the encoding.ArrayOfStrings function to perform
this conversion, first by checking for []string and returning that directly
where possible, and second, if that fails, checking each element and copying.
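A sketch of that conversion (the signature is approximate):

    // ArrayOfStrings accepts either a []string directly or a weakly typed
    // []interface{} whose elements are all strings, copying in the latter case.
    func ArrayOfStrings(v interface{}) ([]string, bool) {
        if ss, ok := v.([]string); ok {
            return ss, true // already strongly typed; return it directly.
        }
        arr, ok := v.([]interface{})
        if !ok {
            return nil, false
        }
        ss := make([]string, len(arr))
        for i, e := range arr {
            s, ok := e.(string)
            if !ok {
                return nil, false // a non-string element; not an array of strings.
            }
            ss[i] = s
        }
        return ss, true
    }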
This changes the way binding dependencies works slightly, to ensure that
the full transitive closure of dependencies is bound appropriately before
hitting code-generation. Namely, now binder.PrepareStack returns a list
of unresolved dependency Refs; the compiler is responsible for turning this
into a map from Ref to the loaded diag.Document, before calling BindStack;
then, BindStack instantiates these as necessary (template expansion, etc),
returning an array of unbound *ast.Stacks that the compiler must then bind.
Well, it turns out glog.Fail is slightly better than panic, because it explicitly
dumps the stacks of *all* goroutines. This is especially good in logging scenarios.
It's really annoying that glog suppresses printing the stack trace (see here
https://github.com/golang/glog/blob/master/glog.go#L719), however this is better.
This change switches away from using glog.Fatalf, and instead uses panic,
should a fail-fast arise (due to a call to util.Fail or a failed assertion
by way of util.Assert). This leads to a better debugging experience no
matter what flags have been passed to glog. For example, glog.Fatal* seems
to suppress printing stack traces when --logtostderr is supplied.
This change moves parse-tree analysis into the Parse* functions, so that
any callers doing parsing don't need to do this as a multi-step activity.
(We had neglected to do the parse-tree analysis phase during dependency
resolution, for example, meaning services were left untyped.)
This change makes workspace file naming a little more consistent with respect
to Mufile naming. Instead of having a .mu/ directory, under which a workspace.yaml
and/or a stacks directory might exist, we now have a Muspace.yaml (or .json) file,
and a .Mudeps/ directory. This has nicer symmetry with respect to Mu.yaml files.
If compiling a stack that accepts properties directly, we need a way to
pass arguments to that stack at the command line. This change permits this
using the ordinary "--" style delimiter; for example:
$ mu build -- --name=Foo
This is super basic and doesn't handle all the edge cases, but is sufficient
for testing and prototyping purposes.
This change introduces the notion of "perturbing" properties. Changing
one of these impacts the live service, possibly leading to downtime. As
such, we will likely encourage blue/green deployments of them just to be
safe. Note that this is really just a placeholder so I can keep track of
metadata as we go, since AWS CF has a similar notion to this.
I'm not in love with the name. I considered `interrupts`, however,
I must admit I liked that `readonly` and `perturbs` are symmetric in
the number of characters (meaning stuff lines up nicely...)
This change detects the target cloud earlier on in the compilation process.
Prior to this change, we didn't know this information until the backend code-generation.
Clearly we need to know this at least by then, however, templates can specialize on this
information, so we actually need it sooner. This change moves it into the frontend part.
Note that to support this we now eliminate the ability to specify target clusters in
the Mufile alone. That "feels" right to me anyway, since Mufiles are supposed to be
agnostic to their deployment environment, other than template specialization. Instead,
this information can come from the CLI and/or the workspace settings file.
In some cases, a dependency will resolve to a diag.Document, rather than
a fully instantiated ast.Stack (in fact, that is the common case). The
binder needs to detect and tolerate this situation.
This change adds the notion of readonly properties to stacks. Although these
*can* be "changed", doing so implies recreation of the resources all over again.
As a result, all dependents must be recreated, in a cascading manner.
The "require" template function simply checks a condition and, if it
is false, issues an error and quits template processing immediately.
This is useful for concisely encoding validation logic.
In some cases, we actually want to suppress auto-mapping for all or
most of the properties. In those cases, it's easier to specify those
that we *do* want rather than the ones we *do not* want. Now with
properties, skipProperties, and extraProperties, we have all the
necessary flexibility to control auto-mapping for CF templates.
The new "has" function lets templates conveniently check the existence of
keys in property bag-like maps. For example:
{{if has .Properties "something"}}
...
{{end}}
Most properties in CF templates are auto-mapped by the aws/cf extension
provider. However, sometimes we want to inject extra properties that are
outside of that auto-mapping (like our convenient shortcut for supplying
name to mean adding a "Name=v" tag). And sometimes we want to skip auto-
mapping for certain properties (like our capability-based approach to
passing service references, versus string IDs).
This change adds a super simple initial whack at a basic cluster topology
comprised of VPC, subnet, internet gateway, attachments, and route tables.
This is actually written in Mu itself, and I am committing this early, since
there are quite a few features required before we can actually make progress
getting this up and running.
This introduces a "panic" template function, so that templates may abandon
evaluation if something unexpected occurs. It accepts a string, indicating
the error, and optional arguments, if the string is to be formatted.
For example:
{{if eq .Target "aws"}}
...
{{else}}
{{panic "Unrecognized cloud target: %v" .Target}}
{{end}}
This lets YAML files include others, often conditionally, based on things
like the cloud target. For example, I am currently using this to define the
overall cluster stack by doing things like:
name: mu/cluster
services:
    {{if eq .Target "aws"}}
    {{include "Mu-aws.yaml" | indent 4}}
    {{else}}
    ...
    {{end}}
This change performs template expansion both for root stack documents in
addition to the transitive closure of dependencies. There are many ongoing
design and implementation questions about how this should actually work;
please see marapongo/mu#7 for a discussion of them.
For now, we can simply auto-map the Mu properties to CF properties,
eliminating the need to manually map them in the templates. Eventually
we'll want more sophistication here to control various aspects of the CF
templates, but this eliminates a lot of tedious manual work in the meantime.
The only two AST nodes that track any semblance of location right now
are ast.Workspace and ast.Stack. This is simply because, using the standard
JSON and YAML parsers, we aren't given any information about the resulting
unmarshaled node locations. To fix that, we'll need to crack open the parsers
and get our hands dirty. In the meantime, we can crudely implement diag.Diagable
on ast.Workspace and ast.Stack, however, to simply return their diag.Documents.
This change adds a new Diagable interface from which you can obtain
a diagnostic's location information (Document and Location). A new
At function replaces WithDocument, et al., and will be used soon to
permit all arbitrary AST nodes to report back their position.
This change completes the implementation of dependency and type binding.
The top-level change here is that, during the first semantic analysis AST walk,
we gather up all unknown dependencies. Then the compiler resolves them, caching
the lookups to ensure that we don't load the same stack twice. Finally, during
the second and final semantic analysis AST walk, we populate the bound nodes
by looking up what the compiler resolved for us.
This changes the logic when parsing RefParts so that we can actually
round-trip them faithfully when required. To do this, by default we
preserve blank ""s in the RefPart fields that are legally omitted from
the Ref string. Then, there is a Defaults() method that can populate
the missing fields with the defaults if desired.
This change implements dependency versions, including semantic analysis, per the
checkin 83030685c3.
There's quite a bit in here but at a top-level this parses and validates dependency
references of the form
[[proto://]base.url]namespace/.../name[@version]
and verifies that the components are correct, as well as binding them to symbols.
These references can appear in two places at the moment:
* Service types.
* Cluster dependencies.
As part of this change, a number of supporting changes have been made:
* Parse Workspaces using a full-blown parser, parse-tree analysis, and semantic analysis.
This allows us to share logic around the validation of common AST types. This also
moves some of the logic around loading workspace.yaml files back to the parser, where
it can be unified with the way we load Mu.yaml files.
* New ast.Version and ast.VersionSpec types. The former represents a precise version
-- either a specific semantic version or a short or long Git SHA hash -- and the
latter represents a range -- either a Version, "latest", or a semantic range.
* New ast.Ref and ast.RefParts types. The former is an unparsed string that is
thought to contain a Ref, while the latter is a validated Ref that has been parsed
into its components (Proto, Base, Name, and Version).
* Added some type assertions to ensure certain structs implement certain interfaces,
to speed up finding errors. (And remove the coercions that zero-fill vtbl slots.)
* Be consistent about prefixing error types with Error or Warning.
* Organize the core compiler driver's logic into three methods, FE, sema, and BE.
* A bunch of tests for some of the above ... more to come in an upcoming change.
Right now, the AWS ECS scheduler simply passes through to the underlying
AWS cloud provider. However, now we have the necessary hooks to start
incrementally recognizing stack types and emitting specialized code for
them (e.g., starting with mu/container).
This change adds code-generation for Stack references other than the built-in types.
This permits you to bind to a dependency and have it flow all the way through to the
code-generation phases. It still most likely bottoms out on something that fails,
however for pure AWS resources like aws/s3/bucket everything now works.
This change adds support for Workspaces, a convenient way of sharing settings
among many Stacks, like default cluster targets, configuration settings, and the
like, which are not meant to be distributed as part of the Stack itself.
The following things are included in this checkin:
* At workspace initialization time, detect and parse the .mu/workspace.yaml
file. This is pretty rudimentary right now and contains just the default
cluster targets. The results are stored in a new ast.Workspace type.
* Rename "target" to "cluster". This impacts many things, including ast.Target
being changed to ast.Cluster, and all related fields, the command line --target
being changed to --cluster, various internal helper functions, and so on. This
helps to reinforce the desired mental model.
* Eliminate the ast.Metadata type. Instead, the metadata moves directly onto
the Stack. This reflects the decision to make Stacks "the thing" that is
distributed, versioned, and is the granularity of dependency.
* During cluster targeting, add the workspace settings into the probing logic.
We still search in the same order: CLI > Stack > Workspace.
This changes the probing logic for dependency resolution. The old logic was
inconsistent between the various roots. The new approach simply prefers locations
with a base URL component -- since they are more specific -- but will allow for
locations missing a base URL component. This is convenient for developers managing
a workspace where needing to specify the base URL in the path is annoying and
slightly too "opinionated" for my taste (especially for migrating existing services).
Also add some convenience helper methods to deal with names, including
IsName and AsName, which validate that names obey the intended regular expressions.
This change includes logic to resolve dependencies declared by stacks. The design
is described in https://github.com/marapongo/mu/blob/master/docs/deps.md.
In summary, each stack may declare dependencies, which are name/semver pairs. A
new structure has been introduced, ast.Ref, to distinguish between ast.Names and
dependency names. An ast.Ref includes a protocol, base part, and a name part (the
latter being an ast.Name); for example, in "https://hub.mu.com/mu/container/",
"https://" is the protocol, "hub.mu.com/" is the base, and "mu/container" is the
name. This is used to resolve URL-like names to package manager-like artifacts.
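To make that decomposition concrete, here is a rough sketch of splitting a Ref into its parts; the helper names and heuristics are illustrative only, and validation is elided:

    package ast

    import "strings"

    // Ref is an unparsed reference such as "https://hub.mu.com/mu/container/".
    type Ref string

    // refParts is a sketch of the parsed form; the real structure also carries
    // version information.
    type refParts struct {
        Proto string // e.g. "https://"
        Base  string // e.g. "hub.mu.com/"
        Name  string // e.g. "mu/container"
    }

    // parseRef splits a Ref into protocol, base, and name parts.
    func parseRef(r Ref) refParts {
        s := string(r)
        var p refParts
        if i := strings.Index(s, "://"); i != -1 {
            p.Proto, s = s[:i+3], s[i+3:]
        }
        if i := strings.Index(s, "/"); i != -1 && strings.Contains(s[:i], ".") {
            // A leading segment with a dot looks like a hostname: treat it as the base.
            p.Base, s = s[:i+1], s[i+1:]
        }
        p.Name = strings.TrimSuffix(s, "/")
        return p
    }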
The dependency resolution phase happens after parsing, but before semantic analysis.
This is because dependencies are "source-like" in that we must load and parse all
dependency metadata files. We stick the full transitive closure of dependencies
into a map attached to the compiler to avoid loading dependencies multiple times.
Note that, although dependencies prohibit cycles, this forms a DAG, meaning multiple
inbound edges to a single stack may come from multiple places.
From there, we rely on ordinary visitation to deal with dependencies further.
This includes inserting symbol entries into the symbol table, mapping names to the
loaded stacks, during the first phase of binding so that they may be found
subsequently when typechecking during the second phase and beyond.
This change introduces a Workspace interface that can be used as a first
class object. We will embellish this as we start binding to dependencies,
which requires us to search multiple paths. This change also introduces a
workspace.InstallRoot() function to fetch the Mu install path.
This change moves the workspace and Mufile detection logic out of the compiler
package and into the workspace one.
This also sketches out the overall workspace structure. A workspace is "delimited"
by the presence of a .mu/ directory anywhere in the parent ancestry. Inside of that
directory we have an optional .mu/clusters.yaml (or .json) file containing cluster
settings shared among the whole workspace. We also have an optional .mu/stacks/
directory that contains dependencies used during package management.
The notion of a "global" workspace will also be present, which is essentially just
a .mu/ directory in your home, ~/.mu/, that has an equivalent structure, but can be
shared among all workspaces on the same machine.
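Detection itself is just a walk up the directory tree; roughly like the following sketch, where the function name and signature are illustrative:

    package workspace

    import (
        "os"
        "path/filepath"
    )

    // DetectRoot walks upward from dir looking for a .mu/ directory, which
    // delimits the workspace; it returns "" if no workspace is found.
    func DetectRoot(dir string) string {
        for {
            if info, err := os.Stat(filepath.Join(dir, ".mu")); err == nil && info.IsDir() {
                return dir
            }
            parent := filepath.Dir(dir)
            if parent == dir {
                return "" // reached the filesystem root without finding .mu/
            }
            dir = parent
        }
    }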
This change mostly replaces explicit if/then/glog.Fatalf calls with
util.Assert calls. In addition, it adds a companion util.Fail family
of methods that does the same thing as a failed assertion, except that
it is unconditional.
If the first rune is unprintable, then we don't want to force capitalization on the
next character (unlike any other non-first rune, where of course we do). For the
first rune, we want to let the current default, based on the pascal parameter, take
charge.
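A minimal sketch of the intended behavior, assuming a hypothetical toID helper; the real function and its exact rules may differ:

    package util

    import (
        "strings"
        "unicode"
    )

    // toID drops illegal runes from s while converting it into an identifier.
    // pascal governs the capitalization of the first rune; dropping any later
    // rune forces capitalization of the rune that follows it.
    func toID(s string, pascal bool) string {
        var b strings.Builder
        capNext := pascal // the pascal default governs the first rune
        for i, r := range s {
            if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
                // Dropping a rune forces capitalization of the next one, unless the
                // dropped rune was the very first: then the pascal default stays in charge.
                if i > 0 {
                    capNext = true
                }
                continue
            }
            if capNext {
                r = unicode.ToUpper(r)
            }
            b.WriteRune(r)
            capNext = false
        }
        return b.String()
    }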
This uses normal AWS resource naming conventions during stack template
creation. Part of this is just a "best practice" thing, however, part of it
is also that we generate illegal names if Mu stacks have illegal characters
like /, -, and so on.
AWS uses capitalized property names for its markup, so we should
be looking for "Type" and "Properties", not "type" and "properties"
when validating that an aws/cf is formatted correctly.
This change eliminates the diag.Sink field, Diag, on the Compiland struct.
Instead, we should provide it at backend provider construction time. This
is consistent with how other phases of the compiler work and also ensures
the backends can properly implement the core.Phase interface.
This change implements the aws/cf extension provider, so that AWS resources
may be described and encapsulated inside of other stacks. Each aws/cf instantiation
requires just two fields -- type and properties -- corresponding to the equivalent
AWS resource object. The result is simply plugged in as an AWS resource, after
Mu templates have been expanded, permitting stack properties, etc. to be used.
The more I live with the current system, the more I prefer "properties" to
"parameters" for stacks and services. Although it is true that these things
are essentially construction-time arguments, they manifest more like properties
in the way they are used; in fact, if you think of the world in terms of primary
constructors, the distinction is pretty subtle anyway.
For example, when creating a new service, we say the following:
    services:
        private:
            some/service:
                a: 0
                b: true
                c: foo
This looks like a, b, and c are properties of the type some/service. If, on
the other hand, we kept calling these parameters, then you'd arguably prefer to
see the following:
    services:
        private:
            some/service:
                arguments:
                    a: 0
                    b: true
                    c: foo
This is a more imperative than declarative view of the world, which I dislike
(especially because it is more verbose).
Time will tell whether this is the right decision or not ...
This introduces three assertion functions:
* util.Assert: simply asserts a boolean condition, ripping the process using
glog should it fail, with a stock message.
* util.AssertM: asserts a boolean condition, also using glog if it fails,
but it comes with a message to help with debugging too.
* util.AssertMF: the same, except it formats the message with arguments.
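Roughly, these look like the following (the exact messages differ):

    package util

    import "github.com/golang/glog"

    // Assert fails fatally with a stock message if the condition does not hold.
    func Assert(cond bool) {
        if !cond {
            glog.Fatal("An assertion has failed")
        }
    }

    // AssertM is like Assert, but includes a caller-supplied message.
    func AssertM(cond bool, msg string) {
        if !cond {
            glog.Fatalf("An assertion has failed: %v", msg)
        }
    }

    // AssertMF is like AssertM, but formats the message with arguments.
    func AssertMF(cond bool, msg string, args ...interface{}) {
        if !cond {
            glog.Fatalf("An assertion has failed: "+msg, args...)
        }
    }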
During unmarshaling, the default behavior of the stock Golang JSON marshaler,
and consequently the YAML one we used which mimics its behavior, is to toss away
unrecognized properties. This isn't what we want for two reasons:
First, we want to issue errors/warnings on unrecognized fields to aid in diagnostics;
we will set aside some extensible section for 3rd parties to use. This is not
addressed in this change, however.
Second, and more pertinent, is that we need to retain unrecognized fields for certain
types like services, which are extensible by default.
Until golang/go#6213 is addressed -- imminent, it seems -- we will have to do a
somewhat hacky workaround to this problem. This change contains what I consider to
be the "least bad" in that we won't introduce a lot of performance overhead, and
just have to deal with the slight annoyance of the ast.Services node type containing
both Public/Private *and* PublicUntyped/PrivateUntyped fields alongside one another.
The marshaler dumps property bags into the *Untyped fields, and the parsetree analyzer
expands them out into a structured ast.Service type. Subsequent passes can then
ignore the *Untyped fields altogether.
Note that this would cause some marshaling funkiness if we ever wanted to remarshal
the mutated ASTs back into JSON/YAML. Since we don't do that right now, however, I've
not made any attempt to keep the two pairs in sync. Post-parsetree analyzer, we
literally just forget about the *Untyped guys.
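Concretely, the node ends up shaped something like this; the field and map types are a sketch, and the paired typed/untyped layout is the only point being illustrated:

    package ast

    // Name and Service stand in for the real AST types.
    type Name string
    type Service struct{ /* ... */ }

    // Services carries both the typed maps and the raw property bags. The
    // decoder fills in the *Untyped fields; the parsetree analyzer then expands
    // them into Public/Private, after which the *Untyped fields are ignored.
    type Services struct {
        Public  map[Name]*Service `json:"-" yaml:"-"`
        Private map[Name]*Service `json:"-" yaml:"-"`

        PublicUntyped  map[string]map[string]interface{} `json:"public,omitempty" yaml:"public,omitempty"`
        PrivateUntyped map[string]map[string]interface{} `json:"private,omitempty" yaml:"private,omitempty"`
    }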
This change rearranges the last checkin a little bit. Rather than storing
shadow BoundPublic/BoundPrivate maps, we will store the *ast.Stack directly on
the ast.Service node itself. This helps with context-free manipulation (e.g.,
you don't need access to the parent map just to interact with the node), and
simplifies the backend code quite a bit (again, less context to pass).
This is another change of mostly placeholders.
In general, there will be three kinds of types handled by code-generation:
* Mu primitives will be expanded into AWS goo in a very specialized way, to
accomplish the desired Mu semantics for those abstractions.
* AWS-specific extension types (mu/extension) will be recognized, so that we
can create special AWS resources like S3 buckets, DynamoDB tables, etc.
* Anything else is interpreted as a reference to another stack that will be
instantiated at deployment time (basically through template expansion).
This change does rearrange two noteworthy things in the core compiler, however:
first, it creates a place for bound nodes in the public and private service
references, so that the backend can access the raw stack types behind them; and
second, it moves the predefined types underneath their own package to avoid cycles.
This change introduces the notion of "Stack subclassing" in two ways:
1. A Stack may declare that it subclasses another one using the base property:
    name: mystack
    base: other/stack
    .. as before ..
2. A Stack may declare that it is abstract; in other words, that it is meant
solely for subclassing, and cannot be compiled and deployed independently:
    name: mystack
    abstract: true
    .. as before ..
Note that non-abstract Stacks are required to declare at least one Service,
whether that is public, private, or both.
This change rejiggers a few things so that we can more clearly introduce
a boundary between front- and back-end compiler phases, including sharing more,
like a diagnostics sink. Future extensions will include backend code-generation
options.
This change includes a few steps towards AWS backend code-generation:
* Add a BoundDependencies property to ast.Stack to remember the *ast.Stack
objects bound during Stack binding.
* Make a few CloudFormation properties optional (cfOutput Export/Condition).
* Rename clouds.ArchMap, clouds.ArchNames, schedulers.ArchMap, and
schedulers.ArchNames to clouds.Values, clouds.Names, schedulers.Values,
and schedulers.Names, respectively. This reads much nicer to my eyes.
* Create a new anonymous ast.Target for deployments if no specific target
was specified; this is to support quick-and-easy "one off" deployments,
as will be common when doing local development.
* Sketch out more of the AWS Cloud implementation. We actually map the
Mu Services into CloudFormation Resources; well, kinda sorta, since we
don't actually have Service-specific logic in here yet, however all of
the structure and scaffolding is now here.
We previously used stable enumeration of the various AST maps in the core
visitor, however we now need stable enumeration in more places (like the AWS
backend I am working on). This change refactors this logic to expose a set
of core ast.StableX routines that stably enumerate maps, and then simply uses
them in place of the existing visitor logic. (Missing generics right now...)
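Each StableX routine is a small variation on the same few lines; roughly like the sketch below, where the concrete map and key types are illustrative:

    package ast

    import "sort"

    // Service stands in for the real ast.Service type.
    type Service struct{ /* ... */ }

    // StableServices returns the keys of a service map in sorted order so that
    // enumeration is deterministic across runs.
    func StableServices(m map[string]Service) []string {
        keys := make([]string, 0, len(m))
        for k := range m {
            keys = append(keys, k)
        }
        sort.Strings(keys)
        return keys
    }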
This change simply adds the necessary AWS CloudFormation struct types to
permit marshaling and unmarshaling to/from JSON and YAML. This will be used
by the AWS cloud provider's backend code-generation. It adds a bit of strong
typing so that we can catch more errors at compile-time (both for our own sanity
but also to provide developers better diagnostics when compiling their stacks).
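For a flavor of the typing, an abbreviated sketch (this is not the full CloudFormation field set, and the package and type names are illustrative):

    package cfn

    // Resource models a single CloudFormation resource; note the capitalized
    // "Type" and "Properties" keys that CloudFormation expects in its markup.
    type Resource struct {
        Type       string                 `json:"Type" yaml:"Type"`
        Condition  string                 `json:"Condition,omitempty" yaml:"Condition,omitempty"`
        Properties map[string]interface{} `json:"Properties,omitempty" yaml:"Properties,omitempty"`
    }

    // Template is the top-level CloudFormation document.
    type Template struct {
        AWSTemplateFormatVersion string              `json:"AWSTemplateFormatVersion,omitempty" yaml:"AWSTemplateFormatVersion,omitempty"`
        Description              string              `json:"Description,omitempty" yaml:"Description,omitempty"`
        Resources                map[string]Resource `json:"Resources" yaml:"Resources"`
    }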
This change adds a Backend Phase to the compiler, implemented by each of the
cloud/scheduler implementations. It also reorganizes some of the modules to
ensure we can do everything we need without cycles, including introducing the
mu/pkg/compiler/backends package, under which the clouds/ and schedulers/
sub-packages now reside. The backends.New(Arch) factory function acts as the
entrypoint into the entire thing so callers can easily create new Backend instances.
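In sketch form, the factory is just a switch over the architecture; every type and constructor name below is an illustrative stand-in, and only the New(Arch) shape comes from the change itself:

    package backends

    // CloudArch and SchedulerArch stand in for the real clouds.Arch and
    // schedulers.Arch enums; Backend stands in for the real backend phase.
    type CloudArch int
    type SchedulerArch int

    const (
        AWSCloud CloudArch = iota
        GCPCloud
    )

    type Arch struct {
        Cloud     CloudArch
        Scheduler SchedulerArch
    }

    type Backend interface {
        CodeGen( /* compiland */ )
    }

    type awsBackend struct{ sched SchedulerArch }

    func (b *awsBackend) CodeGen() {}

    // New maps the requested architecture onto a concrete Backend implementation;
    // unrecognized clouds are rejected earlier, during target validation.
    func New(a Arch) Backend {
        switch a.Cloud {
        case AWSCloud:
            return &awsBackend{sched: a.Scheduler}
        default:
            return nil
        }
    }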
This change is more consistent with our approach to file extensions
elsewhere in the compiler, and prevents an explosion of APIs should we
ever want to support more.
This adds some tests around cloud targeting, in addition to enabling builds
to use in-memory Mufiles (mostly to make testing simpler, but this is a
generally useful capability to have when hosting the compiler API).
This change implements most of the cloud target and architecture detection
logic, along with associated verification and a bunch of new error messages.
There are two settings for picking a cloud destination:
* Architecture: this specifies the combination of cloud (e.g., AWS, GCP, etc)
plus scheduler (e.g., none, Swarm, ECS, etc).
* Target: a named, preconfigured entity that includes both an Architecture and
an assortment of extra default configuration options.
The general idea here is that you can preconfigure a set of Targets for
named environments like "prod", "stage", etc. Those can either exist in a
single Mufile, or the Mucluster file if they are shared amongst multiple
Mufiles. This can be specified at the command line as such:
$ mu build --target=stage
Furthermore, a given environment may be anointed the default, so that
$ mu build
selects that environment without needing to say so explicitly.
It is also possible to specify an architecture at the command line for
scenarios where you aren't intending to target an existing named environment.
This is good for "anonymous" testing scenarios or even just running locally:
$ mu build --arch=aws
$ mu build --arch=aws:ecs
$ mu build --arch=local:kubernetes
$ .. and so on ..
This change does little more than plumb these settings around, verify them,
etc.; however, it sets us up to actually start dispatching to the right backend.
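The plumbing for --arch is essentially a split on ":"; a rough sketch, with the helper name being illustrative:

    package compiler

    import (
        "fmt"
        "strings"
    )

    // parseArch splits a command-line architecture like "aws", "aws:ecs", or
    // "local:kubernetes" into its cloud and scheduler halves; the real code
    // then validates both halves against the known clouds/schedulers enums.
    func parseArch(arch string) (cloud string, scheduler string, err error) {
        switch parts := strings.Split(arch, ":"); len(parts) {
        case 1:
            return parts[0], "", nil
        case 2:
            return parts[0], parts[1], nil
        default:
            return "", "", fmt.Errorf("malformed architecture '%v'", arch)
        }
    }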
This change creates a new mu/pkg/compiler/core package for any fundamental
compiler types that need to be shared among the various compiler packages
(.../compiler, .../compiler/clouds/..., and .../compiler/schedulers/...).
This avoids package cycles.
This adds two packages:
mu/pkg/compiler/clouds
mu/pkg/compiler/schedulers
And introduces enums for the cloud targets we expect to support.
It also adds the ability at the command line to specify a provider;
for example:
$ mu build --target=aws # AWS native
$ mu build --target=aws:ecs # AWS ECS
$ mu build -t=gcp:kubernetes # Kube on GCP
This prepopulates the symbol table with our predefined "primitive" types
like mu/container, mu/gateway, mu/func, and the like. Also added a positive
test case to ensure the full set works; this will obviously need updating as
we embellish the predefined types with things like required parameters.
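The prepopulation amounts to a table of built-in stacks keyed by name; a sketch, with the package name and Stack fields being illustrative:

    package predef

    // Stack stands in for the real ast.Stack type.
    type Stack struct {
        Name   string
        Predef bool
    }

    // Stacks maps the predefined "primitive" type names onto their built-in
    // definitions, so the binder can prepopulate its symbol table with them.
    var Stacks = map[string]*Stack{
        "mu/container": {Name: "mu/container", Predef: true},
        "mu/gateway":   {Name: "mu/gateway", Predef: true},
        "mu/func":      {Name: "mu/func", Predef: true},
    }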
This change adds rudimentary type binding to phase 2 of the binder. Note that
we still don't have the notion of predefined types (for the primitives), so this
basically rejects any well-formed Mufile. Primitives are on deck.
Instead of:
    name: mystack
    public:
        someservice
    private:
        someotherservice
we want it to be:
    name: mystack
    services:
        public:
            someservice
        private:
            someotherservice
I had always intended it to be this way, but coded up the ASTs wrong.
Neither the YAML nor JSON decoders appreciate having pointers in the AST
structures. This is unfortunate because we end up mutating them later on.
Perhaps we will need separate parse trees and ASTs after all ...
This change lays some groundwork that registers symbols when doing semantic
analysis of the resulting AST. For now, that just entails detecting duplicate
services by way of symbol registration.
Note that we've also split binding into two phases to account for the fact
that intra-stack dependencies are wholly legal.
Go's map iteration order is undefined. As such, anywhere we enumerate one
we need to go out of our way to sort the keys so that iteration is stable.
Unfortunately, this isn't a single line of code, as it would seem, due to
Go's lack of generics or even a reflectionless Keys() function on maps. To
do this more elegantly, I've created a new InOrderVisitor that handles the
stable enumeration so that other Visitors can remain simple and focus just
on the logic they need to get their job done.
This change introduces a check during parse-tree analysis that dependencies
are valid, along with some tests. Note that this could technically happen later
during semantic analysis and I will likely move it so that we can get better
diagnostics (more errors before failing). I've also cleaned up and unified some
of the logic by introducing the general notion of a Visitor interface, which the
parse tree analyzer, binder, and analyzers to come will all implement.
This adds a few tests for parse tree validation, and further restructures
the existing test logic. The common_test.go file now contains helper methods
common to all tests in the mu/compiler package. I've also adopted a naming
convention for the testdata/ directory to keep some sanity; namely, each
directory uses "(good|bad)_testname[_seqnum]" as a naming scheme.
This is a placeholder for future use; .mu_modules will be our moral
equivalent to NPM/Yarn's node_modules directory. I've chosen a dot
since for the most part developers can ignore its existence.
This change ensures that Stack semantic version numbers specified in Mufiles
are correct. Note that we do not accept ranges for the version number itself,
although obviously when checking dependencies we will permit it.
This change begins to lay the groundwork for doing semantic analysis and
lowering to the cloud target's representation. In particular:
* Split the mu/schema package. There is now mu/ast which contains the
core types and mu/encoding which concerns itself with JSON and YAML
serialization.
* Notably I am *not* yet introducing a second AST form. Instead, we will
keep the parse tree and AST unified for the time being. I envision very
little difference between them -- at least for now -- and so this keeps
things simpler, at the expense of two downsides: 1) the trees will be
mutable (which turns out to be a good thing for performance), and 2) some
fields will need to be ignored during de/serialization. We can always
revisit this later when and if the need to split them arises.
* Add a binder phase. It is currently a no-op.
This change adds a few more compiler tests and rearranges some bits and pieces
that came up while doing so. For example, we now issue warnings for incorrect
casing and/or extensions of the Mufile (and test these conditions). As part of
doing that, it became clear the layering between the mu/compiler and mu/workspace
packages wasn't quite right, so some logic got moved around; additionally, the
separation of concerns between mu/workspace and mu/schema wasn't quite right, so
this has been fixed also (workspace just understands Mufile related things while
schema understands how to unmarshal the specific supported extensions).
This change adds a compiler test that just checks the basic "Mufile is missing"
error checking. The test itself is mostly uninteresting; what's more interesting
is the addition of some basic helper functionality that can be used for future
compiler tests, like capturing of compiler diagnostics for comparisons.
Eric rightly pointed out in a CR that Semver is...different. Since it
is an abbreviation of the lengthier SemanticVersion name, SemVer seems
more appropriate. Changed.
This change recognizes .yml in addition to the official .yaml extension,
since .yml is actually very commonly used. In addition, while in here, I've
centralized more of the extensions logic so that it's more "data-driven"
and easier to manage down the road (one place to change rather than two).
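The "data-driven" part boils down to a single table that everything else consults; roughly (the variable name is illustrative):

    package workspace

    // Exts lists the Mufile extensions we recognize, in order of preference.
    // Keeping them in one slice means adding a new extension is a one-line change.
    var Exts = []string{".yaml", ".yml", ".json"}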
Error messages could get quite lengthy as the code was written previously,
because we always used the complete absolute path for the file in question.
This change "prettifies" this to be relative to whatever contextual path
the user has chosen during compilation. This shortens messages considerably.
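The prettification is essentially filepath.Rel with a fallback; a sketch, with the helper name being illustrative:

    package diag

    import "path/filepath"

    // prettyPath shortens an absolute file path to be relative to the path the
    // user passed on the command line, falling back to the absolute form if a
    // relative one cannot be computed.
    func prettyPath(contextPath string, file string) string {
        if rel, err := filepath.Rel(contextPath, file); err == nil {
            return rel
        }
        return file
    }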
This change includes some progress on actual compilation (albeit with several
TODOs remaining before we can actually spit out a useful artifact). There are
also some general cleanups sprinkled throughout. In a nutshell:
* Add a compiler.Context object that will be available during template expansion.
* Introduce a diag.Document abstraction. This is better than passing raw filenames
around, and lets us embellish diagnostics as we go. In particular, we will be
in a better position to provide line/column error information.
* Move IO out of the Parser and into the Compiler, where it can cache and reuse
Documents. This will become important as we start to load up dependencies.
* Rename PosRange to Location. This reads nicer with the new Document terminology.
* Rename the mu/api package to mu/schema. It's likely we will need to introduce a
true AST that is decoupled from the serialization format and contains bound nodes.
As a result, treating the existing types as "schema" is more honest.
* Add in a big section of TODOs at the end of the compiler.Compiler.Build function.
* Rename Meta to Metadata.
* Rename Target's CloudOS and CloudScheduler properties to Cloud
and Scheduler, respectively. Also rename Target's JSON properties
to match (they had drifted); they are now "cloud" and "scheduler".
* Rename Diags() to Diag() on the Compiler and Parser interfaces.
* Rename defaultDiags to defaultSink, to match the interface name.
* Add a few useful logging outputs.
This adds a bunch of general scaffolding and the beginning of a `build` command.
The general engineering scaffolding includes:
* Glide for dependency management.
* A Makefile that runs govet and golint during builds.
* Google's Glog library for logging.
* Cobra for command line functionality.
The Mu-specific scaffolding includes some packages:
* mu/pkg/diag: A package for compiler-like diagnostics. It's fairly barebones
at the moment, however we can embellish this over time.
* mu/pkg/errors: A package containing Mu's predefined set of errors.
* mu/pkg/workspace: A package containing workspace-related convenience helpers.
in addition to a main entrypoint that simply wires up and invokes the CLI. From
there, the mu/cmd package takes over, with the Cobra-defined CLI commands.
Finally, the mu/pkg/compiler package actually implements the compiler behavior.
Or, it will. For now, it simply parses a JSON or YAML Mufile into the core
mu/pkg/api types, and prints out the result.
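For a flavor of the Cobra wiring, a minimal sketch of the build command; the command body and helper names are illustrative:

    package cmd

    import "github.com/spf13/cobra"

    // NewBuildCmd wires up the `mu build` command. Today the body just hands
    // off to the compiler package, which parses the Mufile and prints the result.
    func NewBuildCmd() *cobra.Command {
        return &cobra.Command{
            Use:   "build [source]",
            Short: "Compile a Mufile into a deployable artifact",
            Run: func(cmd *cobra.Command, args []string) {
                // Hand off to mu/pkg/compiler here.
            },
        }
    }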
This change adds the core Mufile types (Cluster, Stack, and friends), including
mapping them to the relevant JSON names in the serialized form. Notably absent
are all of the Identity-related types, which will come later on.