# Mu Languages

Mu cloud topologies are described to the toolchain using three language formats.

At the highest level, developers write Mu modules using a high-level language. There are multiple languages to choose
from, and each is a proper subset of an existing popular programming language. MuJS is a subset of JavaScript, MuPy is
a subset of Python, MuRu is a subset of Ruby, and MuGo is a subset of Go, for example. The restrictions placed on these
languages exist solely to ensure static analyzability, determinism, and compilability into an intermediate form. To
distinguish them from their ordinary counterparts, we call these Mu Metadata Languages (MetaMus).

In the middle, a packaging format, Mu Package Metadata (MuPack), is a standard metadata representation for a MuPackage.
This is the unit of distribution in the classical package management sense. MuPack is inherently multi-language and
contains both internal and exported modules, types, functions, and variables. All code is serialized in an intermediate
language, MuIL, that is suitable for interpretation by the Mu toolchain. Because a MuPackage may contain computations,
the final "shape" of the cloud topology cannot be determined until the MuPackage is evaluated as part of a deployment
planning step.

The final shape, Mu Graph Language (MuGL), represents a complete cloud topology with concrete property values. Any graph
can be compared to any other graph to compute a delta, a capability essential to incremental deployment and drift
analysis. Each graph is [directed and acyclic](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAG), in which
nodes are cloud services, edges are [directed dependencies](https://en.wikipedia.org/wiki/Dependency_graph) between
services, and all input and output properties are known. Any given MuPackage can create many possible MuGL graphs,
and identical MuGL graphs can be created by different MuPackages, because a MuPackage can contain parameterized logic
and conditional computations. A MuGL graph can also be generated from a live environment.

This document describes the various language concepts at play, the requirements for a high-level Mu language (although
details for each language are specified elsewhere), the MuPack and MuGL formats, and the overall compilation process.

## Mu Metadata Languages (MetaMus)

We envision a collection of high-level languages so IT professionals and developers can pick the one they feel most
comfortable with. For example, we currently plan to support JavaScript (MuJS), Python (MuPy), Ruby (MuRu), and Go
(MuGo). Furthermore, we imagine translators from other cloud topology formats like AWS CloudFormation and HashiCorp
Terraform. These are called metadata languages, or MetaMus, and we call code written in them *programs*.

In principle, there is no limit to the breadth of MetaMus that Mu can support -- and it is indeed extensible by 3rd
parties -- although we do require that any MetaMu compiles down into MuPack. This is admittedly a bit more difficult
for fully dynamically typed languages -- for example, it requires a real compiler to analyze and emit code statically --
although the task is certainly not impossible (as evidenced by MuJS, MuPy, and MuRu support).

The restrictions placed on MetaMus streamline the task of producing cloud topology graphs, and ensure that programs
are deterministic. Determinism is important; otherwise, two deployments from the exact same source programs might
result in two graphs that differ in surprising and unwanted ways. Evaluation of the same program must be
idempotent so that graphs and target environments can easily converge and so that failures can be dealt with reliably.

In general, this means MetaMus may not perform these actions:

* I/O of any kind (network, file, etc.).
* Syscalls (except for those explicitly blessed as being deterministic).
* Invocation of non-MetaMu code (including 3rd party packages).
* Any action that is not transitively analyzable through global analysis (like C FFIs).

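
To make the idea of static analyzability concrete, here is a minimal sketch of how a compiler for a Python-based MetaMu
such as MuPy might reject some of the actions listed above. It uses Python's standard `ast` module. The deny-lists and
the `find_violations` helper are hypothetical names chosen for illustration and are not part of Mu; a real compiler
would need transitive, whole-program analysis rather than this simple syntactic pass.

```python
import ast

# Hypothetical deny-lists, for illustration only; a real MetaMu compiler would define its own rules.
FORBIDDEN_MODULES = {"os", "socket", "subprocess", "random", "time", "ctypes"}
FORBIDDEN_CALLS = {"open", "input", "eval", "exec", "__import__"}


def find_violations(source):
    """Return one message per construct that breaks the toy determinism rules, in line order."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    found.append((node.lineno, f"import of '{alias.name}'"))
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in FORBIDDEN_MODULES:
                found.append((node.lineno, f"import from '{node.module}'"))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls by name are caught; aliased calls would need real binding.
            if node.func.id in FORBIDDEN_CALLS:
                found.append((node.lineno, f"call to '{node.func.id}'"))
    return [f"line {lineno}: {message}" for lineno, message in sorted(found)]
```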
Examples of existing efforts to define such a subset in JavaScript, simply as an illustration, include: [Gatekeeper](
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/gatekeeper_tr.pdf), [ADsafe](
http://www.adsafe.org/), [Caja](https://github.com/google/caja), [WebPPL](http://dippl.org/chapters/02-webppl.html),
[Deterministic.js](https://deterministic.js.org/), and even JavaScript's own [strict mode](
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Strict_mode). There are also multiple attempts to
catalogue sources of nondeterminism in [JavaScript](
https://github.com/burg/timelapse/wiki/Note-sources-of-nondeterminism) and [its variants](
https://github.com/WebAssembly/design/blob/master/Nondeterminism.md).

MetaMus may in fact consume 3rd party packages (e.g., from NPM, Pip, or elsewhere), but they must be blessed by the MetaMu
compiler for your language of choice. This means recompiling packages from source -- and dealing with the possibility
that they will fail to compile if they perform illegal operations -- or, preferably, using a package that has already
been pre-compiled using a MetaMu compiler, likely in MuPack format, in which case you are guaranteed that it will work
without any unexpected difficulties. The Mu Hub contains precompiled packages in this form for easy consumption.

Each MetaMu program is *compiled* into a MuPackage, encoded in MuPack.

## Mu Package Metadata (MuPack)

Each MuPackage is encoded in MuPack and serialized in a JSON/YAML form for easy toolability.

MuPack is the unit of sharing and reuse, and includes high-level metadata about the package's contents, in addition to
its modules, types, functions, and variables. Each MuPackage can be either a "library" -- purely meant for sharing and
reuse purposes -- or it can be an "executable" -- in which case it can create a MuGL cloud topology all on its own.

MuPack uses a "JSON-like" type system so that its type system is accessible to many MetaMus. This eases Internet-scale
interoperability and facilitates cross-language reuse. This type system may be extended with custom types, including
custom service types that encapsulate patterns of infrastructure coordination, and schema types that govern the shape of
data and property values being exchanged. Any of these types and functions may or may not be exported from the module.

All executable code is encoded using a simple intermediate language, MuIL, that is interpreted by the Mu toolchain.
This captures a deterministic, bounded set of type system and execution constructs that a subset of most higher level
languages can target and consume. The design has been inspired by existing "minimalistic" multi-language intermediate
formats, and is very similar to [CIL](https://www.ecma-international.org/publications/standards/Ecma-335.htm), with
elements of [asm.js](http://asmjs.org/spec/latest/) mixed in (particularly for the dynamic elements).

This IL is fully bound, so that IL processing needn't re-parse, re-analyze, or re-bind the resulting trees. This has
performance advantages and simplifies the toolchain. An optional verifier can check to ensure ASTs are well-formed.

The full MuPack specification is available [here](mupack.md).

There are two actions that can be taken against an executable MuPackage, both resulting in a MuGL graph:

* A MuPackage may generate a *plan*, which is a form of graph that doesn't reflect an actual deployment yet. Instead, it
  is essentially a "dry run" of the deployment process. To create a plan, the MuPackage's main entrypoint must be
  invoked, meaning that all of its arguments must be supplied. Because this is a dry run, a plan's graph may be
  incomplete.

* A plan may then be *applied* through a similar process. The only difference between a plan and the application of
  that plan is that, if the plan contains dependencies on output properties from services that are yet to be created,
  those values are unknown a priori. Therefore, the plan might contain "holes", which will be shown in the
  plan output. The most subtle aspect of this is that, thanks to conditional execution, the plan may have not just
  holes in the values, but also uncertainty about exactly which services will be created or updated. The
  application process performs the physical deployment steps, so all outputs are known before completion.

The result of both steps is a MuGL graph, one being more complete than the other.

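
The relationship between a plan's holes and its application can be sketched as follows. This is only an illustration
under stated assumptions: `Hole`, `Service`, `holes`, and `apply` are hypothetical names, the plan is assumed to be a
list already in dependency order, and `create` is a stand-in callback for the physical deployment step. The sketch does
not model the uncertainty about which services get created under conditional execution.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hole:
    """Placeholder for an output property that is unknown until its service is created."""
    service: str
    output: str


@dataclass
class Service:
    name: str
    properties: dict = field(default_factory=dict)


def holes(plan):
    """List (service, property, hole) for every unresolved value in a plan."""
    return [(s.name, key, value)
            for s in plan
            for key, value in s.properties.items()
            if isinstance(value, Hole)]


def apply(plan, create):
    """Create services in order, filling each hole from the outputs recorded so far."""
    outputs = {}
    applied = []
    for s in plan:
        props = {k: outputs[(v.service, v.output)] if isinstance(v, Hole) else v
                 for k, v in s.properties.items()}
        # `create` performs the (hypothetical) deployment and returns the service's outputs.
        for out_name, out_value in create(s.name, props).items():
            outputs[(s.name, out_name)] = out_value
        applied.append(Service(s.name, props))
    return applied
```

In this sketch, `holes(plan)` is what a dry run could report, while the list returned by `apply` contains no holes
because every output has been observed by the time the loop finishes.
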
## Mu Graph Language (MuGL)

MuGL is the simplest and final frontier of Mu's languages. Each MuGL artifact -- something we just call a *graph* --
can be an in-memory data structure and/or a serialized JSON or YAML document. It contains a graph in which each node
represents a service, each edge is a dependency between services, and each input and output property value
is a concrete, known value; no unresolved computations are present in the graph (holes notwithstanding).

Each graph represents the outcome of some deployment activity, either planned or actually having taken place. Subtly,
the graph is never considered the "source of truth"; only the corresponding live running environment can be the source
of truth. Instead, the graph describes the *intended* eventual state that a deployment activity is meant to achieve. A
process called *reconciliation* may be used to compare the two -- either on-demand or as part of a
continuous deployment process -- and resolve any differences as appropriate (through updates in either direction).

Each node in a graph carries the service's unique ID, human-friendly name, type, and its set of named property values.

A service's type tells the MuGL toolchain how to deal with physical resources that need to be created, read, updated, or
deleted, and governs which properties are legal and their expected types. Note that any module references within the
MuGL file still refer to the MuPackages that define abstractions, which are still resolved during evaluation. All
module references will have been "locked" to a specific version of that MuPackage, however, for repeatability.

Edges between these nodes represent dependencies, and are therefore directed, and must be explicit. Although property
values may implicitly govern dependencies in MuPack, those computations are gone by the time MuGL is created. Therefore,
the translation from MuPack to MuGL is responsible for fully specifying the set of service dependencies.

The graph is complete. That is, even though dependencies on 3rd party modules may remain, the full [transitive
closure](https://en.wikipedia.org/wiki/Transitive_closure) of services created by all MuPackages is present.
Because the graph is a DAG, any cycles in this graph are illegal and will result in an error. It is ideal if
higher-level translation catches this, since each step in the translation process reduces the diagnosability of errors.

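
As a rough illustration of the cycle check described above, the following sketch orders services by their dependencies
and reports a cycle as an error. It relies on Python's standard `graphlib` module (Python 3.9+). The
`deployment_order` helper and the dictionary representation of edges are assumptions made for this example, not the
MuGL format.

```python
from graphlib import CycleError, TopologicalSorter


def deployment_order(dependencies):
    """Return services ordered so that each appears after everything it depends on.

    `dependencies` maps a service name to the names of the services it depends on.
    Raises ValueError naming the offending cycle if the graph is not a DAG.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # The second element of `args` holds the nodes that form the detected cycle.
        cycle = " -> ".join(err.args[1])
        raise ValueError(f"dependency cycle: {cycle}") from None
```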
TODO: a complete file format specification.

TODO: specify how "holes" show up during planning.

### Resource Providers

Some services are simply abstractions. They exist solely as convenient ways to create other services, but, at the end
of the day, there are no physical manifestations of them.

Some services, however, correspond to real physical resources that must be consulted or manipulated during the planning
and/or application processes. The extensibility mechanism used to define this logic is part of the Mu SDK and is called
a *resource provider*. Each resource provider is associated with a Mu service type, is written in Go, and provides
standard create, read, update, and delete (CRUD) operations that achieve the desired state changes in an environment.

Resource providers are dynamically loaded plugins that implement a standard set of interfaces. Each resource type in Mu
must resolve to a provider; otherwise, an error occurs. Each plugin can contain multiple resource providers.

TODO: articulate the interface.

### Graph Queryability

TODO[marapongo/mu#30]: queryability (GraphQL? RDF/SPARQL? Neo4j/Cypher? Gremlin? Etc.)

## Scenarios

In this section, we'll walk through a few motivating scenarios beyond the usual compilation process from a high-level
MetaMu, to MuPack, all the way to MuGL, which is deployed to an environment. We will see how the file formats are used.

### Generating MuGL from a Live Environment

An existing environment can be used to generate MuGL. This is called *graph inference*.

This can make adoption of Mu easier if you already have an environment you wish to model. It can also facilitate
identifying "drift" between a desired and actual state; we will see more about this in a moment.

Any MuGL generated in this manner may have less information than MuGL generated from MetaMu and MuPack, due to the
possibility of lossy representations and/or missing abstractions in an actual live environment. For example, there
could be "hidden" implicit dependencies between services that are not expressed in the resulting MuGL file.
Nevertheless, this can be a great first step towards adopting Mu for your existing environments.

Generating MuGL from a live environment that was created using Mu, on the other hand, can recover all of this
information reliably, thanks to special tagging that Mu performs.

Some services map to physical artifacts in a deployment -- like a VM in your favorite cloud -- while other services are
simply abstractions. In the case of abstractions, there is a limit to how much "reverse engineering" from a live
environment can happen. The application of an abstraction merely serves to create those physical resources that are at
the "bottom" of the dependency chain. That said, mechanisms exist to augment an environment with metadata.

### Comparing Two MuGLs

A primary feature of MuGLs is that two of them can be compared to produce a diff. This has several use cases.

Mu performs a diff between two MuGL files to determine a delta for purposes of incremental deployment. This allows it
to change the live environment only where a difference between actual and desired state exists.

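
A delta of this kind might be computed along the following lines. This is a minimal sketch, assuming each graph has
been reduced to a dictionary from service ID to property values; `diff_graphs` and the `create`/`update`/`delete`
buckets are hypothetical names for illustration, and edges are ignored for brevity.

```python
def diff_graphs(old, new):
    """Compare two graphs given as {service_id: {property: value}} dictionaries."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    updated = {}
    for sid in sorted(old.keys() & new.keys()):
        keys = old[sid].keys() | new[sid].keys()
        # Record (old_value, new_value) per changed property; a missing property reads as None.
        changes = {k: (old[sid].get(k), new[sid].get(k))
                   for k in sorted(keys)
                   if old[sid].get(k) != new[sid].get(k)}
        if changes:
            updated[sid] = changes
    return {"create": created, "update": updated, "delete": deleted}
```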
As seen above, MuGL can be generated from a live environment. As such, a live environment can be compared to another
MuGL file -- perhaps generated from another live environment -- to determine and reconcile "drift" between them. This
could be used to discover differences between environments that are meant to be similar (e.g., in different zones,
staging vs. production, etc.). Alternatively, this analysis could be used to compare an environment against a MetaMu
program's resulting MuGL, to identify places where manual changes were made to an actual environment without having
made corresponding changes in the sources.

To cope with some of the potential lossiness during graph inference, Mu implements a *semantic diff* algorithm in
addition to a stricter exact diff. The semantic diff classifies differences due to lossy inference separately from
ordinary semantically meaningful differences that could be impacting a live environment's behavior.

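
The source does not specify how Mu classifies differences, so the following is only a toy sketch of the idea. It
assumes, purely for illustration, that a property present in the desired graph but absent from the inferred graph is
attributable to lossy inference, while any other difference is meaningful. The `semantic_diff` name and the dictionary
representation are likewise hypothetical.

```python
def semantic_diff(desired, inferred):
    """Split property differences into 'lossy' and 'meaningful' buckets.

    Both graphs are {service_id: {property: value}} dictionaries, where
    `inferred` came from graph inference over a live environment.
    Services present in only one graph are ignored in this sketch.
    """
    lossy, meaningful = [], []
    for sid in sorted(desired.keys() & inferred.keys()):
        for key in sorted(desired[sid].keys() | inferred[sid].keys()):
            in_desired = key in desired[sid]
            in_inferred = key in inferred[sid]
            if in_desired and not in_inferred:
                lossy.append((sid, key))       # assumed: inference could not recover it
            elif not in_desired or desired[sid][key] != inferred[sid][key]:
                meaningful.append((sid, key))  # value differs, or was added out-of-band
    return {"lossy": lossy, "meaningful": meaningful}
```

An exact diff would report both buckets as ordinary differences; the point of the split is that only the
`meaningful` entries would need attention during reconciliation.
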
### Creating or Updating MetaMu and MuPack from MuGL

It is possible to raise MuGL into MuPack and, from there, raise MuPack into your favorite MetaMu.

It is, however, important to note one thing before getting into the details. There are many possible MuPackages that
could generate a given MuGL, due to conditional execution of code. There may even be many possible MetaMu programs
that could generate a given MuPackage, since MuPack's set of language constructs is intentionally smaller than what
might exist in a higher-level programming language. In short, lowering and re-raising is not a round-trippable operation.

Nevertheless, this raising can come in handy for two reasons.

The first is that, thanks to raising, it is possible to reconcile diffs in part by making changes to the source MetaMu
programs. If we just altered the MuGL for a given MetaMu program, the process would be incomplete, because then
the developer would be responsible for taking that altered MuGL and translating it by hand into edits to the
program. Automating this process as much as possible is obviously appealing even if -- similar to manual diff
resolution when applying source patches -- this process requires a little bit of manual, but tool-assistable, work.

The second helps when bootstrapping an existing environment into Mu for the first time. Not only can we generate the
MuGL that corresponds to an existing environment, but we can also generate a MetaMu program in your favorite language
that will generate an equivalent graph. This is called *program inference*. As with graph inference, the inference
might miss key elements like dependencies, and might not include all of the desirable abstractions and metadata;
however, it can serve as a useful starting point for subsequent refactoring that would introduce such things.