This change uses an enum rather than a set of bools for unwind's inner
state, makes its fields private, and adds accessor methods. This makes
it easier to protect invariants and rule out wonky states like being a
break and a throw simultaneously. Now that Unwind is part of the public
API (vs. being private as it used to be), this is really a requirement (and
was obviously a good idea regardless). Thanks to Eric for CR feedback.
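
As a rough illustration of the enum-plus-accessors shape described above, here is a minimal sketch. The constructor names, the field names, and the `unwindKind` type are assumptions made for the example, not the actual code in this change.

```go
package main

import "fmt"

// unwindKind enumerates the mutually exclusive reasons for a control transfer.
// (Hypothetical name; the real enum may differ.)
type unwindKind int

const (
	breakUnwind unwindKind = iota
	continueUnwind
	returnUnwind
	throwUnwind
)

// Unwind describes an in-flight control transfer. Its fields are private, so
// callers can only build one through a constructor and inspect it through
// accessors. A single kind field makes "break and throw at once" unrepresentable.
type Unwind struct {
	kind  unwindKind
	label string      // optional target label for break/continue
	value interface{} // returned or thrown value, if any
}

func NewBreakUnwind(label string) *Unwind    { return &Unwind{kind: breakUnwind, label: label} }
func NewContinueUnwind(label string) *Unwind { return &Unwind{kind: continueUnwind, label: label} }
func NewReturnUnwind(v interface{}) *Unwind  { return &Unwind{kind: returnUnwind, value: v} }
func NewThrowUnwind(v interface{}) *Unwind   { return &Unwind{kind: throwUnwind, value: v} }

func (uw *Unwind) Break() bool        { return uw.kind == breakUnwind }
func (uw *Unwind) Continue() bool     { return uw.kind == continueUnwind }
func (uw *Unwind) Return() bool       { return uw.kind == returnUnwind }
func (uw *Unwind) Throw() bool        { return uw.kind == throwUnwind }
func (uw *Unwind) Label() string      { return uw.label }
func (uw *Unwind) Value() interface{} { return uw.value }

func main() {
	uw := NewThrowUnwind("boom")
	fmt.Println(uw.Break(), uw.Throw(), uw.Value()) // prints "false true boom"
}
```

With a set of bools, nothing stops two flags from being set together; with one kind field, exactly one accessor can return true.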
This change adds some fail statements in places that aren't yet
implemented during binding and evaluation, so I can quickly catch
stuff that isn't working as expected yet.
Three minor blocking issues fixed, discovered as I am running tests
locally to stress the interpreter in a few different ways:
* Properly initialize the local scope.
* Log pkg.Name(), not pkg.Name, so we get a string and not an address.
* Don't dereference uw after doing a call, when it could be nil.
This change populates the special `this` and `super` variables in
instance method activation frames. This occurs both during binding --
so that we can properly typecheck and verify references to them -- and
during evaluation -- so we can actually operate on their values.
Now that we have introduced a full-blown token map -- new as of just
a few changes ago -- we can start using it for all of our symbol binding.
This also addresses some order-dependent issues, like intra-module
references looking up symbols that have been registered in the token map
but not necessarily stored in the relevant parent symbols just yet.
Plus, frankly, it's much simpler and uses a hashmap lookup instead of
a fairly complex recursive tree walk.
I've kept the tree walk case, however, to improve diagnostics upon
failure. This allows us to tell developers, for example, that the reason
a binding failed was due to a missing package.
This change revives some compiler tests that are still lingering around
from the old architecture, before our latest round of ship burning.
It also fixes up some bugs uncovered during this:
* Don't claim that a symbol's kind is incorrect in the binder error
message when it wasn't found. Instead, say that it was missing.
* Do not attempt to compile if an error was issued during workspace
resolution and/or loading of the Mufile. Otherwise we end up trying to
load an empty path and badness quickly ensues (crash).
* Issue an error if the Mufile wasn't found (this got lost apparently).
* Rename the ErrorMissingPackageName message to ErrorInvalidPackageName,
since missing names are now caught by our new fancy decoder that
understands required versus optional fields. We still need to guard
against illegal characters in the name, including the empty string "".
* During decoding, reject !src.IsValid elements. This represents the
zero value and should be treated equivalently to a missing field.
* Do not permit empty strings "" as Names or QNames. The old logic
accidentally permitted them because regexp.FindString("") == "", no
matter the regex!
* Move the TestDiagSink abstraction to a new pkg/util/testutil package,
allowing us to share this common code across multiple package tests.
* Fix up a few messages that needed tidying or to use Infof vs. Info.
The binder tests -- deleted in this change -- are about to come back; however,
I am splitting up the changes, since this represents a passing fixed point.
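
The empty-string pitfall mentioned in the list above is easy to reproduce. The sketch below uses a made-up name pattern (`nameRegexp` is an assumption, not the real Name/QName regex) to show why comparing `FindString`'s result against the input accepts "": when nothing matches, `FindString` returns the empty string.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRegexp is a hypothetical identifier pattern used only for illustration.
var nameRegexp = regexp.MustCompile(`[A-Za-z_][A-Za-z0-9_]*`)

// isNameLoose mirrors the old style of check: it accepts "" because a failed
// match also yields "".
func isNameLoose(s string) bool {
	return nameRegexp.FindString(s) == s
}

// isNameStrict rejects the empty string up front.
func isNameStrict(s string) bool {
	return s != "" && nameRegexp.FindString(s) == s
}

func main() {
	fmt.Println(isNameLoose(""), isNameStrict(""), isNameStrict("bucket"))
	// prints "true false true"
}
```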
This change looks up the main module from a package, and that module's
entrypoint, when performing evaluation. In any case, the arguments are
validated and bound to the resulting function's parameters.
This change eliminates all traces of the old, legacy errors and warnings.
It also renumbers everything so that we don't have any "gaps". Finally,
it introduces a factory method for new errors and warnings, private to
this package, that ensures we don't accidentally register duplicate IDs.
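
A minimal sketch of what such a factory might look like follows. The `Diag` type, the `registry` map, and the example error names, IDs, and messages are all hypothetical; only the idea (a package-private constructor that refuses duplicate IDs) comes from this change.

```go
package main

import "fmt"

// ID identifies a diagnostic. (Hypothetical type for this sketch.)
type ID int

// Diag is a registered error or warning.
type Diag struct {
	ID      ID
	Message string
}

// registry tracks every ID handed out so far.
var registry = map[ID]*Diag{}

// newError is private to the package and fails fast on a duplicate ID, so a
// copy-and-paste mistake shows up at program start rather than in the field.
func newError(id ID, message string) *Diag {
	if _, dup := registry[id]; dup {
		panic(fmt.Sprintf("duplicate diagnostic ID %d", id))
	}
	d := &Diag{ID: id, Message: message}
	registry[id] = d
	return d
}

// Example registrations with made-up IDs and messages.
var (
	ErrorExampleFirst  = newError(1, "first example failure: %v")
	ErrorExampleSecond = newError(2, "second example failure: %v")
)

func main() {
	fmt.Println(len(registry), ErrorExampleSecond.ID) // prints "2 2"
}
```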
This change deletes a bunch of old legacy code. I've tagged the
prior changelist as "0.1" in case we need to go back and recover
some of the code (as I expect some of the constraint type logic
and AWS-specific code to come in handy down the road).
🔥🔥🔥
This adds support for LoadLocationExpression, which will be key to
getting virtually everything else working, including class and module
member accesses, local variable gets/sets, function calls, and more.
In particular, an extra level of indirection is added to the globals,
locals, and properties maps. This new thing is called a *Reference,
and it is simply a mutable slot that contains an *Object. Instead
of storing and retrieving objects directly, this extra level of
indirection enables us to create *Object instances that themselves
simply refer to other objects (for subsequent loads and stores).
For now, we do not permit overwriting of method properties. This
is a future work item (see marapongo/mu#56).
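
To make the extra level of indirection concrete, here is a small sketch. The `Object` here is pared down to a single data field, and the `Get`/`Set` method names are assumptions for the example rather than the real API.

```go
package main

import "fmt"

// Object is a pared-down stand-in for the runtime object.
type Object struct {
	Data interface{}
}

// Reference is a mutable slot that contains an *Object.
type Reference struct {
	obj *Object
}

func (r *Reference) Get() *Object  { return r.obj }
func (r *Reference) Set(o *Object) { r.obj = o }

func main() {
	// The locals map stores slots rather than objects.
	locals := map[string]*Reference{"x": {obj: &Object{Data: 1}}}

	// A load-location result can hold on to the slot itself...
	loc := locals["x"]

	// ...so a later store through it is visible to every other reader.
	loc.Set(&Object{Data: 2})
	fmt.Println(locals["x"].Get().Data) // prints "2"
}
```

Had the map stored `*Object` directly, the store would have needed the map and the key again; holding the slot is what lets a location be passed around as a value.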
This change adds support for function calls. We do not yet wire it
into the various function call expressions; however, that should be
relatively trivial after this change. (It is dependent on figuring
out the runtime representation of load location objects and subsequent
uses of them.) This change also updates the scoping logic to respect
the "activation frame" style of lexical scoping, where an inner function
invoked as a result of a call should not have access to the caller's context.
This change also includes logic to detect unhandled exceptions and to
print an error as a result. Eventually, we will want stack traces here.
This change eliminates the scope-based symbol table. Because we now
require that all module, type, function, and variable elements are
encoded as fully qualified tokens, there is no need for the scope-based
lookups. Instead, the languages themselves decide how the names bind
to locations and just encode that information directly.
The scope is still required for local variables, however, since those
don't have a well-defined "fixed" notion of name. This is also how
we will ensure the evaluator stores values correctly -- including
discarding them -- in a lexically scoped manner.
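
The following sketch shows one way lexically scoped locals with activation frame boundaries could work. The `Scope` shape, the `frame` flag, and the method names are assumptions for illustration and may not match the evaluator's actual types.

```go
package main

import "fmt"

// Scope holds the local variables for one lexical block.
type Scope struct {
	parent *Scope
	frame  bool // true if this scope starts a new activation frame (a call)
	vars   map[string]interface{}
}

func NewScope(parent *Scope, frame bool) *Scope {
	return &Scope{parent: parent, frame: frame, vars: map[string]interface{}{}}
}

// Pop discards this scope, and every value stored in it, returning the parent.
func (s *Scope) Pop() *Scope { return s.parent }

func (s *Scope) Set(name string, v interface{}) { s.vars[name] = v }

// Lookup searches enclosing blocks, but never past the frame boundary, so a
// called function cannot see the locals of the function that called it.
func (s *Scope) Lookup(name string) (interface{}, bool) {
	for cur := s; cur != nil; cur = cur.parent {
		if v, ok := cur.vars[name]; ok {
			return v, true
		}
		if cur.frame {
			break
		}
	}
	return nil, false
}

func main() {
	fn := NewScope(nil, true) // body of some function
	fn.Set("x", 1)
	block := NewScope(fn, false) // nested block in the same function
	_, inBlock := block.Lookup("x")
	call := NewScope(block, true) // a call made from inside that block
	_, inCall := call.Lookup("x")
	fmt.Println(inBlock, inCall) // prints "true false"
}
```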
This change checks in an enormously rudimentary interpreter. There is
a lot left to do, as evidenced by the countless TODOs scattered throughout
pkg/compiler/eval/eval.go. Nevertheless, the scaffolding and some important
pieces are included in this change.
In particular, to evaluate a package, we must locate its entrypoint and then,
using the results of binding, systematically walk the full AST. As we do
so, we will assert that aspects of the AST match the expected shape,
including symbols and their types, and produce value objects for expressions.
An *unwind structure is used for returns, throws, breaks, and continues (both
labeled and unlabeled). Each statement or expression visitation may optionally
return one and its presence indicates that control flow transfer is occurring.
The visitation logic then needs to propagate this; usually just by bailing out
of the current context immediately, but sometimes -- as is the case with
TryCatchBlock statements, for instance -- the unwind is manipulated in more
"interesting" ways.
An *Object structure is used for expressions yielding values. This is a
"runtime object" in our system and is comprised of three things: 1) a Type
(essentially its v-table), 2) an optional data pointer (for tagged primitives),
and 3) an optional bag of properties (for complex object property values).
I am still on the fence about whether to unify the data representations.
The hokiest aspect of this change is the scoping and value management. I am
trying to avoid needing to implement any sort of "garbage collection", which
means our bag-of-values approach will not work. Instead, we will need to
arrange for scopes to be erected and discarded in the correct order during
evaluation. I will probably tackle that next, along with fleshing out the
many missing statement and expression cases (...and tests, of course).
This change introduces a binder.Context structure, with a
*core.Context embedded, that carries additional semantic information
forward to future passes in the compiler. In particular, this is how
the evaluation/interpretation phase will gain access to types, scopes,
and symbols.
This change memoizes decorated types to avoid creating boatloads
of redundant symbols at runtime. We want it to be the case that you
can create instances of decorated types as needed throughout the
compiler without worrying about excess garbage (e.g., arrays, pointers, etc).
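
Here is a minimal sketch of the memoization idea, shown for array types only. The `Type` interface, the cache variable, and the constructor name are assumptions for the example; the real symbols carry much more information.

```go
package main

import "fmt"

// Type is a simplified stand-in for a type symbol.
type Type interface{ Token() string }

type PrimitiveType struct{ name string }

func (p *PrimitiveType) Token() string { return p.name }

type ArrayType struct{ Element Type }

func (a *ArrayType) Token() string { return "[]" + a.Element.Token() }

// arrayTypeCache maps an element type to the single array type built for it.
var arrayTypeCache = map[Type]*ArrayType{}

// NewArrayType returns the cached array type for elem, creating it on first use.
func NewArrayType(elem Type) *ArrayType {
	if arr, ok := arrayTypeCache[elem]; ok {
		return arr
	}
	arr := &ArrayType{Element: elem}
	arrayTypeCache[elem] = arr
	return arr
}

func main() {
	str := &PrimitiveType{name: "string"}
	a, b := NewArrayType(str), NewArrayType(str)
	fmt.Println(a == b, NewArrayType(a).Token()) // prints "true [][]string"
}
```

A side benefit of this shape is that two decorated types are equal exactly when their pointers are equal, which keeps type comparisons cheap.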
During evaluation, we need to perform very delicate walking of the AST,
and are likely to use the ability to substitute visitors "in situ." As
a result, I'd like to make anonymous visitors easier to produce. This
change permits both single function forms of Visit/After, in addition to
anonymous structures that pair up a Visit and an After, to produce a Visitor.
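
As a sketch of how function-based visitors could be produced, the example below adapts plain functions into a `Visitor`. The `Node` shape and the helper names (`Inspector`, `AfterInspector`, `PairInspector`, `Walk`) are assumptions for illustration; returning a `Visitor` from `Visit` is what would allow substituting a different visitor for a subtree.

```go
package main

import "fmt"

// Node is a toy AST node.
type Node struct {
	Name     string
	Children []*Node
}

// Visitor is called before (Visit) and after (After) a node's children.
// Visit returns the visitor to use for the children, or nil to skip them.
type Visitor interface {
	Visit(n *Node) Visitor
	After(n *Node)
}

// funcVisitor adapts plain functions to the Visitor interface.
type funcVisitor struct {
	visit func(*Node) bool
	after func(*Node)
}

func (v *funcVisitor) Visit(n *Node) Visitor {
	if v.visit != nil && !v.visit(n) {
		return nil
	}
	return v
}

func (v *funcVisitor) After(n *Node) {
	if v.after != nil {
		v.after(n)
	}
}

// Single-function forms, plus a form that pairs a Visit with an After.
func Inspector(visit func(*Node) bool) Visitor { return &funcVisitor{visit: visit} }
func AfterInspector(after func(*Node)) Visitor { return &funcVisitor{after: after} }
func PairInspector(visit func(*Node) bool, after func(*Node)) Visitor {
	return &funcVisitor{visit: visit, after: after}
}

// Walk traverses depth-first; After runs only for nodes whose Visit accepted.
func Walk(v Visitor, n *Node) {
	w := v.Visit(n)
	if w == nil {
		return
	}
	for _, c := range n.Children {
		Walk(w, c)
	}
	v.After(n)
}

func main() {
	tree := &Node{Name: "root", Children: []*Node{{Name: "a"}, {Name: "b"}}}
	Walk(PairInspector(
		func(n *Node) bool { fmt.Print("+", n.Name, " "); return true },
		func(n *Node) { fmt.Print("-", n.Name, " ") },
	), tree)
	fmt.Println() // the line printed reads: +root +a -a +b -b -root
}
```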
This change completes my testing of decorator parsing for now. It tests the token
`*[]map[string]map[()*(bool,string,test/package:test/module/Crazy)number][][]test/package:test/module/Crazy`.
This turned up some bugs, most notably in the way we returned the "full" token for
the parsed types. We need to extract the subset of the token consumed by the parsing
routine, rather than the entire thing. To do this, we introduce a tokenBuffer type
that allows for convenient parsing of tokens (eating, advancing, extraction, etc).
This isn't comprehensive yet; however, it caught two bugs:
1. parseNextType should operate on "rest" in most cases, not "tok".
2. We must eat the "]" map separator before moving on to the element type.
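
For a sense of what a tokenBuffer-style helper does, here is a minimal sketch. Only the type name comes from this change; the method names (`MayEat`, `EatUntil`, `Since`, and so on) are guesses made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenBuffer tracks a cursor into a token so parsing routines can eat
// prefixes, advance, and extract exactly the text they consumed.
type tokenBuffer struct {
	tok string
	pos int
}

func newTokenBuffer(tok string) *tokenBuffer { return &tokenBuffer{tok: tok} }

func (b *tokenBuffer) Pos() int        { return b.pos }
func (b *tokenBuffer) Done() bool      { return b.pos == len(b.tok) }
func (b *tokenBuffer) Remains() string { return b.tok[b.pos:] }

// MayEat consumes s if the remaining text starts with it.
func (b *tokenBuffer) MayEat(s string) bool {
	if strings.HasPrefix(b.Remains(), s) {
		b.pos += len(s)
		return true
	}
	return false
}

// EatUntil consumes and returns text up to (not including) any delimiter.
func (b *tokenBuffer) EatUntil(delims string) string {
	rest := b.Remains()
	end := strings.IndexAny(rest, delims)
	if end < 0 {
		end = len(rest)
	}
	b.pos += end
	return rest[:end]
}

// Since returns only the portion consumed from start, not the whole token.
func (b *tokenBuffer) Since(start int) string { return b.tok[start:b.pos] }

func main() {
	b := newTokenBuffer("map[string]number,bool")
	start := b.Pos()
	b.MayEat("map[")
	key := b.EatUntil("]")
	b.MayEat("]") // eat the "]" separator before the element type
	elem := b.EatUntil(",)]")
	fmt.Println(key, elem, b.Since(start), b.Remains())
	// prints "string number map[string]number ,bool"
}
```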
Part of the token grammar permits so-called "decorated" types. These
are tokens that are pointer, array, map, or function types. For example:
* `*any`: a pointer to anything.
* `[]string`: an array of primitive strings.
* `map[string]number`: a map from strings to numbers.
* `(string,string)bool`: a function with two string parameters and a
boolean return type.
* `[]aws:s3/Bucket`: an array of objects whose class is `Bucket` from
the package `aws` and its module `s3`.
This change introduces this notion into the parsing and handling of
type tokens. In particular, it uses recursive parsing to handle complex
nested structures, and the binder.bindTypeToken routine has been updated
to call out to these as needed, in order to produce the correct symbol.
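
The recursive approach can be sketched as follows. This is a standalone toy that returns a readable description instead of a symbol; the function name, the description format, and the choice of delimiters are assumptions, and it is not the binder's actual parsing code.

```go
package main

import (
	"fmt"
	"strings"
)

// parseType parses one type from the front of tok, returning a description
// of it and the unconsumed remainder of the token.
func parseType(tok string) (string, string, error) {
	switch {
	case strings.HasPrefix(tok, "*"):
		elem, rest, err := parseType(tok[1:])
		return "ptr<" + elem + ">", rest, err
	case strings.HasPrefix(tok, "[]"):
		elem, rest, err := parseType(tok[2:])
		return "array<" + elem + ">", rest, err
	case strings.HasPrefix(tok, "map["):
		key, rest, err := parseType(tok[4:])
		if err != nil {
			return "", "", err
		}
		if !strings.HasPrefix(rest, "]") {
			return "", "", fmt.Errorf("expected ']' after map key in %q", tok)
		}
		elem, rest, err := parseType(rest[1:]) // continue from rest, not tok
		return "map<" + key + ", " + elem + ">", rest, err
	case strings.HasPrefix(tok, "("):
		rest := tok[1:]
		var params []string
		for !strings.HasPrefix(rest, ")") {
			if len(params) > 0 {
				if !strings.HasPrefix(rest, ",") {
					return "", "", fmt.Errorf("expected ',' or ')' in %q", tok)
				}
				rest = rest[1:]
			}
			var p string
			var err error
			if p, rest, err = parseType(rest); err != nil {
				return "", "", err
			}
			params = append(params, p)
		}
		rest = rest[1:] // eat ")"
		ret := ""
		if rest != "" && !strings.ContainsRune(",)]", rune(rest[0])) {
			var r string
			var err error
			if r, rest, err = parseType(rest); err != nil {
				return "", "", err
			}
			ret = " -> " + r
		}
		return "func<(" + strings.Join(params, ", ") + ")" + ret + ">", rest, nil
	}
	// Otherwise, a plain (possibly qualified) name up to the next delimiter.
	end := strings.IndexAny(tok, ",)]")
	if end < 0 {
		end = len(tok)
	}
	if end == 0 {
		return "", "", fmt.Errorf("expected a type name in %q", tok)
	}
	return tok[:end], tok[end:], nil
}

func main() {
	for _, tok := range []string{"*any", "map[string]number", "(string,string)bool", "[]aws:s3/Bucket"} {
		desc, _, err := parseType(tok)
		fmt.Println(desc, err)
	}
}
```

Each case parses its own decoration and then recurses on the remainder, which is what lets arbitrarily nested tokens fall out without special handling.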
This changes a few things around binding logic, as part of eliminating
all of the legacy logic and weaving it into the new codebase:
* Give Scopes access to the Context object. Related, add a TryRegister
method to Scope that is like RequireRegister, except that instead of
fail-fast upon encountering a duplicate entry, it will issue an error.
* Move all typecheck visitation functions out of the big honkin' switch
and into their own member functions. As this stuff gets more complex,
having everything in one routine was starting to irk my sensibilities.
* Validate that packages have names.
* Store both the package symbol, plus the canonicalized URL used to
resolve it, in the package map. This will help us verify that versions
match for multiple package references resolving to the same symbol.
* Add nice inquiry methods to the Class symbol (Sealed, Abstract, Record,
Interface) that simplify accessing the modifiers on the underlying node.
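
The difference between the two registration methods in the first bullet can be sketched like this. The `Context` here is reduced to a list of error strings and symbols are plain strings; both are simplifications for the example, not the real types.

```go
package main

import "fmt"

// Context is a stand-in that just collects diagnostics.
type Context struct{ Errors []string }

// Scope maps names to symbols and can report problems through its Context.
type Scope struct {
	ctx     *Context
	symbols map[string]string
}

func NewScope(ctx *Context) *Scope {
	return &Scope{ctx: ctx, symbols: map[string]string{}}
}

// register adds the symbol, returning false if the name is already taken.
func (s *Scope) register(name, sym string) bool {
	if _, exists := s.symbols[name]; exists {
		return false
	}
	s.symbols[name] = sym
	return true
}

// RequireRegister fails fast: a duplicate here is treated as a compiler bug.
func (s *Scope) RequireRegister(name, sym string) {
	if !s.register(name, sym) {
		panic(fmt.Sprintf("symbol %q already registered", name))
	}
}

// TryRegister treats a duplicate as a user error and reports a diagnostic.
func (s *Scope) TryRegister(name, sym string) bool {
	if !s.register(name, sym) {
		s.ctx.Errors = append(s.ctx.Errors, fmt.Sprintf("symbol %q already exists", name))
		return false
	}
	return true
}

func main() {
	ctx := &Context{}
	scope := NewScope(ctx)
	first := scope.TryRegister("x", "local x")
	second := scope.TryRegister("x", "another x")
	fmt.Println(first, second, len(ctx.Errors)) // prints "true false 1"
}
```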
The options structure will be shared between multiple passes of
compilation, including evaluation and graph generation. Therefore,
it must not be in the pkg/compile package, else we would create
package cycles. Now that the options structure is barebones --
and, in particular, no more "backend" settings pollute it -- this
refactoring actually works.