pulumi

joeduffy db5318b0a5 Make the CLI's waitForUpdates more resilient to transient failure	2017-12-26 09:40:51 -08:00

We saw an issue where a user was mid-update and got a networking error stating `read: operation timed out`. We believe this was simply a local client error caused by a flaky network. We should be resilient to such things during updates, particularly since there is no way to "reattach" to an in-progress update (see pulumi/pulumi#762).

This change accomplishes this by changing the retry logic in the cloud backend's waitForUpdates function. Namely:

* We recognize three types of failure and react differently to each:
  - Expected HTTP errors, such as the 504 Gateway Timeouts that we already retried on. In these cases, we silently retry up to 10 times. After 10 times, we begin warning the user in case this is a persistent condition.
  - Unexpected HTTP errors. The CLI quits immediately and reports an error to the user in the usual way. This covers Unauthorized, among other things. Over time, we may decide to intentionally move some HTTP errors into the category above.
  - Anything else. This covers the transient networking errors we have just seen. I'll admit it's a wide net, but every such error issues a warning, and it's up to the user to ^C out of it. We also log the error so that we'll see it if the user shares their logs with us.
* We implement backoff logic so that we retry very quickly (100ms) after the first failure and more slowly thereafter (1.5x per retry, up to a maximum of 5 seconds). This helps avoid accidentally DoSing our service.
cloud	Make the CLI's waitForUpdates more resilient to transient failure	2017-12-26 09:40:51 -08:00
local	Support secrets for cloud stacks.	2017-12-22 07:59:27 -08:00
state	Support secrets for cloud stacks.	2017-12-22 07:59:27 -08:00
backend.go	Support secrets for cloud stacks.	2017-12-22 07:59:27 -08:00
stack.go	Support secrets for cloud stacks.	2017-12-22 07:59:27 -08:00