Fixes#2650.
We have historically relied on merging inputs and outputs in several places in the engine. This used to be necessary, as discussed in #2650 (comment), but our core engine model has moved away from depending on this. However, we still have a couple places we do this merge, and those places have triggered several severe issues recently in subtle cases.
We believe that this merging should no longer be needed for a correct interpretation of the current engine model, and indeed that doing the merge actively violates the contract with providers. In this PR we remove the remaining places where this input + output merge was being done. In all three cases, we use just the Outputs, which for most providers will already include the same values as the inputs - but correctly as determined by the provider itself.
* Load specific provider versions if requested
As part of pulumi/pulumi#2389, we need the ability for language hosts to
tell the engine that a particular resource registration, read, or invoke
needs to use a particular version of a resource provider. This was not
previously possible before; the engine prior to this commit loaded
plugins from a default provider map, which was inferred for every
resource provider based on the contents of a user's package.json, and
was itself prone to bugs.
This PR adds the engine support needed for language hosts to request a
particular version of a provider. If this occurs, the source evaluator
specifically records the intent to load a provider with a given version
and produces a "default" provider registration that requests exactly
that version. This allows the source evaluator to produce multiple
default providers for a signle package, which was previously not
possible.
This is accomplished by having the source evaluator deal in the
"ProviderRequest" type, which is a tuple of version and package. A
request to load a provider whose version matches the package of a
previously loaded provider will re-use the existing default provider. If
the version was not previously loaded, a new default provider is
injected.
* CR Feedback: raise error if semver is invalid
* CR: call String() if you want a hash key
* Update pkg/resource/deploy/providers/provider.go
Co-Authored-By: swgillespie <sean@pulumi.com>
Fixes#2633.
Currently when a user runs `refresh` and a resource is in a state of
error, the `refresh` will fail and the resource state will not be
persisted. This can make it vastly harder to incrementally fix
infrastructure. The issue mentioned above explains more of the
historical context, as well as some specific failure modes.
This commit resolves this issue by causing refresh to *not* report an
error in this case, and instead to simply log a warning that the
`refresh` has recognized that the resource is in an unhealthy state
during state sync.
This commit switches from dep to Go 1.12 modules for tracking Pulumi
dependencies. Rather than _building_ using Go modules, we instead use the `go
mod vendor` command to populate a vendor tree in the same way as `dep ensure`
was previously doing.
In order to prevent checksum mismatches, it was necessary to also update CI to
use Go 1.12 instead of 1.11 - which also necessitated fixing some linting errors
which appeared with the upgraded golangci-lint for 1.12.
* Install missing plugins on startup
This commit addresses the problem of missing plugins by scanning the
snapshot and language host on startup for the list of required plugins
and, if there are any plugins that are required but not installed,
installs them. The mechanism by which plugins are installed is exactly
the same as 'pulumi plugin install'.
The installation of missing plugins is best-effort and, if it fails,
will not fail the update.
This commit addresses pulumi/pulumi-azure#200, where users using Pulumi
in CI often found themselves missing plugins.
* Add CHANGELOG
* Skip downloading plugins if no client provided
* Reduce excessive test output
* Update Gopkg.lock
* Update pkg/engine/destroy.go
Co-Authored-By: swgillespie <sean@pulumi.com>
* CR: make pluginSet a newtype
* CR: Assign loop induction var to local var
These changes take advantage of the newly-added support for returning
inputs from Read to update a resource's inputs as part of a refresh.
As a consequence, the Pulumi engine will now properly detect drift
between the actual state of a resource and the desired state described
by the program and generate appropriate update or replace steps.
As part of these changes, a resource's old inputs are now passed to the
provider when performing a refresh. The provider can take advantage of
this to maintain the accuracy of any additional data or metadata in the
resource's inputs that may need to be updated during the refresh.
This is required for the complete implementation of
https://github.com/pulumi/pulumi-terraform/pull/349. Without access to
the old inputs for a resource, TF-based providers would lose all
information about default population during a refresh.
If a provider returns information about the top-level properties that
differ, use those keys to filter the diffs that are rendered to the
user.
Fixes#2453.
These changes add a new flag to the various `ResourceOptions` types that
indicates that a resource should be deleted before it is replaced, even
if the provider does not require this behavior. The usual
delete-before-replace cascade semantics apply.
Fixes#1620.
In general, a "delete" in Pulumi is destroying an actual physical
resource. In the case of a read resource, however, the delete is
merely removing the resource from the stack; the physical resource
is not affected. These changes attempt to clarify this situation by
using the term "discard" rather than "delete".
Fixes#2015.
This implements the new algorithm for deciding which resources must be
deleted due to a delete-before-replace operation.
We need to compute the set of resources that may be replaced by a
change to the resource under consideration. We do this by taking the
complete set of transitive dependents on the resource under
consideration and removing any resources that would not be replaced by
changes to their dependencies. We determine whether or not a resource
may be replaced by substituting unknowns for input properties that may
change due to deletion of the resources their value depends on and
calling the resource provider's Diff method.
This is perhaps clearer when described by example. Consider the
following dependency graph:
A
__|__
B C
| _|_
D E F
In this graph, all of B, C, D, E, and F transitively depend on A. It may
be the case, however, that changes to the specific properties of any of
those resources R that would occur if a resource on the path to A were
deleted and recreated may not cause R to be replaced. For example, the
edge from B to A may be a simple dependsOn edge such that a change to
B does not actually influence any of B's input properties. In that case,
neither B nor D would need to be deleted before A could be deleted.
In order to make the above algorithm a reality, the resource monitor
interface has been updated to include a map that associates an input
property key with the list of resources that input property depends on.
Older clients of the resource monitor will leave this map empty, in
which case all input properties will be treated as depending on all
dependencies of the resource. This is probably overly conservative, but
it is less conservative than what we currently implement, and is
certainly correct.
This ensures that the gRPC server is properly shut down. This fixes an
issue in which a resource plugin that is still configuring could report
log messages to the plugin host, which would in turn attempt to send
diagnostic packets over a closed channel, causing a panic.
Fixes#2170.
After #2088, we began calling `Diff` on providers that are not configured
due to unknown configuration values. This hit an assertion intended to
detect exactly this scenario, which was previously unexpected.
These changes adjust `Diff` to indicate that a Diff is unavailable and
return an error message that describes why. The step generator then
interprets the diff as indicating a normal update and issues the error
message to the diagnostic stream.
Fixes#2223.
These changes add a new resource to the Pulumi SDK,
`pulumi.StackReference`, that represents a reference to another stack.
This resource has an output property, `outputs`, that contains the
complete set of outputs for the referenced stack. The Pulumi account
performing the deployment that creates a `StackReference` must have
access to the referenced stack or the call will fail.
This resource is implemented by a builtin provider managed by the engine.
This provider will be used for any custom resources and invokes inside
the `pulumi:pulumi` module. Currently this provider supports only the
`pulumi:pulumi:StackReference` resource.
Fixes#109.
We run the same suite of changes that we did on gometalinter. This
ended up catching a few new issues, some of which were addressed and
some of which were baselined.
In preparation for some workspace restructuring, I decided to scratch a
few itches of my own in the code:
* Change project's RuntimeInfo field to just Runtime, to match the
serialized name in JSON/YAML.
* Eliminate the no-longer-used Context and NoDefaultIgnores fields on
project, and all of the associated legacy PPC-related code.
* Eliminate the no-longer-used IgnoreFile constant.
* Remove a bunch of "// nolint: lll" annotations, and simply format
the structures with comments on dedicated lines, to avoid overly
lengthy lines and lint suppressions.
* Mark Dependencies and InitErrors as `omitempty` in the JSON
serialization directives for CheckpointV2 files. This was done for
the YAML directives, but (presumably accidentally) omitted for JSON.
Downlevel versions of the Pulumi Node SDK assumed that a parallelism
level of zero implied serial execution, which current CLIs use to signal
unbounded parallelism. This commit works around the downlevel issue by
using math.MaxInt32 to signal unbounded parallelism.
The provider registry was checking for a `nil` provider instance before
checking for a non-nil error. This caused the CLI to fail to report
important errors during the plugin load process (e.g. invalid checkpoint
errors) and instead report a failure to find a matching plugin.
Some providers (namely Kubernetes) require unbounded parallelism in
order to function correctly. This commit enables the engine to operate
in a mode with unbounded parallelism and switches to that mode by
default.
The preview will proceed as if the operations had not been issued (i.e.
we will not speculate on a new state for the stack). This is consistent
with our behavior prior to the changes that added pending operations to
the checkpoint.
* Process deletions conservatively in parallel
This commit allows the engine to conservatively delete resources in
parallel when it is sure that it is legal to do so. In the absence of a
true data-flow oriented step scheduler, this approach provides a
significant improvement over the existing serial deletion mechanism.
Instead of processing deletes serially, this commit will partition the
set of condemned resources into sets of resources that are known to be
legally deletable in parallel. The step executor will then execute those
independent lists of steps one-by-one until all steps are complete.
* CR: Make ResourceSet a normal map
* Only use the dependency graph if we can trust it
* Reverse polarity of pendingDeletesAreReplaces
* CR: un-export a few types
* CR: simplify control flow in step generator when scheduling
* CR: parents are dependencies, fix loop index
* CR: Remove ParentOf, add new test for parent dependencies
Recently, we eliminated bright black text, which IMHO makes the
"same" lines really stand out more than we want them to. This is
partly just due to the heavyweight nature of the "*" character,
which we precede every line with. This has the effect of making it
toughter to scan the update to see what's going to happen. The goal
of SpecUnimportant (bright black) was that we wanted to draw less
attention to certain elements of the CLI text -- and have them fade
into the background (apparently it was too successful at this ;-))
So, this change eliminates the "*" prefix for same operations
altogether. It reads better to my eyes and keeps the original intent.
* Retire pending deletions at start of plan
Instead of letting pending deletions pile up to be retired at the end of
a plan, this commit eagerly disposes of any pending deletions that were
pending at the end of the previous plan. This is a nice usability win
and also reclaims an invariant that at most one resource with a given
URN is live and at most one is pending deletion at any point in time.
* Rebase against master
* Fix a test issue arising from shared snapshots
* CR feedback
* plan -> replacement
* Use ephemeral statuses to communicate deletions
We signal provider cancellation by hangning a goroutine off of the plan
executor's parent context. To ensure clean shutdown, this goroutine also
listens on a channel that closes once the plan has finished executing.
Unfortunately, we were closing this channel too early, and the close was
racing with the cancellation signal. These changes ensure that the
channel closes after the plan has fully completed.
Fixes#1906.
Fixespulumi/pulumi-kubernetes#185.
* Validate type tokens before using them
When registering or reading a resource, we take the type token given to
us from the language host and assume that it's valid, which resulted in
assertion failures in various places in the engine. This commit
validates the format of type tokens given to us from the language host
and issues an appropriate error if it's not valid.
Along the way, this commit also improves the way that fatal exceptions
are rendered in the Node language host.
* Pre-allocate an exception for ReadResource
* Fix integration test
* CR Feedback
This commit is a lower-impact change that fixes the bugs associated with
invalid types on component resources and only checks that a type is
valid on custom resources.
* CR Take 2: Fix up IsProviderType instead of fixing call sites
* Please gometalinter
* Introduce Result type to engine
The Result type can be used to signal the failure of a computation due
to both internal and non-internal reasons. If a computation failed due
to an internal error, the Result type carries that error with it and
provides it when the 'Error' method on a Result is called. If a
computation failed gracefully, but wished to bail instead of continue a
doomed plan, the 'Error' method provides a value of null.
* CR feedback
This commit reverts most of #1853 and replaces it with functionally
identical logic, using the notion of status message-specific sinks.
In other words, where the original commit implemented ephemeral status
messages by adding an `isStatus` parameter to most of the logging
methdos in pulumi/pulumi, this implements ephemeral status messages as a
parallel logging sink, which emits _only_ ephemeral status messages.
The original commit message in that PR was:
> Allow log events to be marked "status" events
>
> This commit will introduce a field, IsStatus to LogRequest. A "status"
> logging event will be displayed in the Info column of the main
> display, but will not be printed out at the end, when resource
> operations complete.
>
> For example, for complex resource initialization, we'd like to display
> a series of intermediate results: [1/4] Service object created, for
> example. We'd like these to appear in the Info column, but not at the
> end, where they are not helpful to the user.
- Create all refresh steps before issuing any. This is important as the
state update loop expects all steps to exist.
- Check for cancellation later in the refresher.
This also fixes races in the SnapshotManager and the test journal that
could cause panics during cancellation.
This commit will greatly improve the experience of dealing with partial
failures by simply re-trying to initialize the relevant resources on
every subsequent `pulumi up`, instead of printing a list of reasons the
resource had previously failed to initialize.
As motivation, consider our behavior in the following common, painful
scenario:
* The user creates a `Service` and a `Deployment`.
* The `Pod`s in the `Deployment` fail to become live. This causes the
`Service` to fail, since it does not target any live `Pod`s.
* The user fixes the `Deployment`. A run of `pulumi up` sees the
`Pod`s successfully initialize.
* Users will expect that the `Service` is now in a state of success,
as the `Pod`s it targets are alive. But, because we don't update the
`Service` by default, it perpetually exists in a state of error.
* The user is now required to change some trivial feature of the
`Service` just to trigger an update, so that we can see it succeed.
There are many situations like this. Another very common one is waiting
for test `Pod`s that are meant to successfully complete when some object
becomes live.
By triggering an empty update step for all resources that have any
initialization errors, we avoid all problems like this.
This commit will implement this empty-update semantics for partial
failures, as well as fix the display UX to correctly render the diff in
these cases.
Replace the Source-based implementation of refresh with a phase that
runs as the first part of plan execution and rewrites the snapshot in-memory.
In order to fit neatly within the existing framework for resource operations,
these changes introduce a new kind of step, RefreshStep, to represent
refreshes. RefreshSteps operate similar to ReadSteps but do not imply that
the resource being read is not managed by Pulumi.
In addition to the refresh reimplementation, these changes incorporate those
from #1394 to run refresh in the integration test framework.
Fixes#1598.
Fixespulumi/pulumi-terraform#165.
Contributes to #1449.
These changes simplify a couple aspects of plan execution in the hopes of
clarifying some responsibilities and preparing the code for changes to the
implementation of refresh.
1. All aspects of plan execution are now managed by the plan executor,
which is no longer exported. Instead, it is abstracted behind
`Plan.Execute`.
2. The plan executor's error-handling and reporting have been unified
and simplified somewhat.
* Log errors coming from the language host
Similar to pulumi/pulumi#1762, fixespulumi/pulumi#1775. The language
host can fail without issuing any diagnostics and it is very unclear
what happens if the engine does not log the error.
* CR feedback
1. 'readID' was never assigned to and was always the default value,
leading the refresh source to believe a resource was deleted
2. The refresh source could hang when a resource is deleted.
The plan executor assumed that the step generator was responsible for
logging its own diagnostics, which it sort-of is but also doesn't log a
majority of the diagnositcs that come out of it. This commit logs all
errors coming out of step generation so that we don't unintentionally
drop errors.
* Add a list of in-flight operations to the deployment
This commit augments 'DeploymentV2' with a list of operations that are
currently in flight. This information is used by the engine to keep
track of whether or not a particular deployment is in a valid state.
The SnapshotManager is responsible for inserting and removing operations
from the in-flight operation list. When the engine registers an intent
to perform an operation, SnapshotManager inserts an Operation into this
list and saves it to the snapshot. When an operation completes, the
SnapshotManager removes it from the snapshot. From this, the engine can
infer that if it ever sees a deployment with pending operations, the
Pulumi CLI must have crashed or otherwise abnormally terminated before
seeing whether or not an operation completed successfully.
To remedy this state, this commit also adds code to 'pulumi stack
import' that clears all pending operations from a deployment, as well as
code to plan generation that will reject any deployments that have
pending operations present.
At the CLI level, if we see that we are in a state where pending
operations were in-flight when the engine died, we'll issue a
human-friendly error message that indicates which resources are in a bad
state and how to recover their stack.
* CR: Multi-line string literals, renaming in-flight -> pending
* CR: Add enum to apitype for operation type, also name status -> type for clarity
* Fix the yaml type
* Fix missed renames
* Add implementation for lifecycle_test.go
* Rebase against master
Some time ago, we introduced the concept of the initialization error to
Pulumi (i.e., an error where the resource was successfully created but
failed to fully initialize). This was originally implemented in `Create`
and `Update` methods of the resource provider interface; when we
detected an initialization failure, we'd pack the live version of the
object into the error, and return that to the engine.
Omitted from this initial implementation was a similar semantics for
`Read`. There are many implications of this, but one of them is that a
`pulumi refresh` will erase any initialization errors that had
previously been observed, even if the initialization errors still exist
in the resource.
This commit will introduce the initialization error semantics to `Read`,
fixing this issue.
* Emit reads for external resources when refreshing
Fixespulumi/pulumi#1744. This commit educates the refresh source about
external resources. If a refresh source encounters a resource with the
External bit set, it'll send a Read event to the engine and the engine
will process it accordingly.
* CR: save last event channel instead of last event, style fixes
* Serialize SourceEvents coming from the refresh source
The engine requires that a source event coming from a source be "ready
to execute" at the moment that it is sent to the engine. Since the
refresh source sent all goal states eagerly through its source iterator,
the engine assumed that it was legal to execute them all in parallel and
did so. This is a problem for the snapshot, since the snapshot expects
to be in an order that is a legal topological ordering of the dependency
DAG.
This PR fixes the issue by sending refresh source events one-at-a-time
through the refresh source iterator, only unblocking to send the next
step as soon as the previous step completes.
* Fix deadlock in refresh test
* Fix an issue where the engine "completed" steps too early
By signalling that a step is done before committing the step's results
to the snapshot, the engine was left with a race where dependent
resources could find themselves completely executed and committed before
a resource that they depend on has been committed.
Fixespulumi/pulumi#1726
* Fix an issue with Replace steps at the end of a plan
If the last step that was executed successfully was a Replace, we could
end up in a situation where we unintentionally left the snapshot
invalid.
* Add a test
* CR: pass context.Context as first parameter to Iterate
* CR: null->nil
This is consistent with the behavior prior to the introduction of Read
steps. In order to avoid a breaking change we must do this check in the
engine itself, which causes a bit of a layering violation: because IDs
are marshaled as raw strings rather than PropertyValues, the engine must
check against the marshaled form of an unknown directly (i.e.
`plugin.UnknownStringValue`).