Clarify select() handling for ssh connections
This change is motivated by an ssh oddity: when ControlPersist is enabled, the first (i.e. master) connection goes into the background; we see EOF on its stdout and the process exits, but we never see EOF on its stderr. So if we ran a command like this: ANSIBLE_SSH_PIPELINING=1 ansible -T 30 -vvv somehost -u someuser -m command -a whoami We would first do select([stdout,stderr], timeout) and read the command module output, then select([stdout,stderr], timeout) again and read EOF on stdout, then select([stderr], timeout) AGAIN (though the process has exited), and select() would wait for the full timeout before returning rfd=[], and then we would exit. The use of a very short timeout in the code masked the underlying problem (that we don't see EOF on stderr). It's always preferable to call select() with a long timeout so that the process doesn't use any CPU until one of the events it's interested in happens (and then select will return independent of elapsed time). (A long timeout value means "if nothing happens, sleep for up to <x>"; omitting the timeout value means "if nothing happens, sleep forever"; specifying a zero timeout means "don't sleep at all", i.e. poll for events and return immediately.) This commit uses a long timeout, but explicitly detects the condition where we've seen EOF on stdout and the process has exited, but we have not seen EOF on stderr. If and only if that happens, it reruns select() with a short timeout (in practice it could just exit at that point, but I chose to be extra cautious). As a result, we end up calling select() far less often, and use less CPU while waiting, but don't sleep for a long time waiting for something that will never happen. Note that we don't omit the timeout to select() altogether because if we're waiting for an escalation prompt, we DO want to give up with an error after some time. We also don't set exceptfds, because we're not actually acting on any notifications of exceptional conditions.
This commit is contained in:
parent
03127dcfae
commit
6e82df451a
1 changed files with 24 additions and 19 deletions
|
@ -444,11 +444,7 @@ class Connection(ConnectionBase):
|
|||
state += 1
|
||||
|
||||
while True:
|
||||
poll_timeout = 0.1
|
||||
if state <= states.index('awaiting_escalation'):
|
||||
poll_timeout = timeout
|
||||
|
||||
rfd, wfd, efd = select.select(rpipes, [], rpipes, poll_timeout)
|
||||
rfd, wfd, efd = select.select(rpipes, [], [], timeout)
|
||||
|
||||
# We pay attention to timeouts only while negotiating a prompt.
|
||||
|
||||
|
@ -536,24 +532,33 @@ class Connection(ConnectionBase):
|
|||
self._send_initial_data(stdin, in_data)
|
||||
state += 1
|
||||
|
||||
# Now we just wait for the process to exit. Output is already being
|
||||
# accumulated above, so we don't need to do anything special here.
|
||||
# Now we're awaiting_exit: has the child process exited? If it has,
|
||||
# and we've read all available output from it, we're done.
|
||||
|
||||
status = p.poll()
|
||||
# only break out if no pipes are left to read or
|
||||
# the pipes are completely read and
|
||||
# the process is terminated
|
||||
if (not rpipes or not rfd) and status is not None:
|
||||
if p.poll() is not None:
|
||||
if not rpipes or not rfd:
|
||||
break
|
||||
# No pipes are left to read but process is not yet terminated
|
||||
# Only then it is safe to wait for the process to be finished
|
||||
# NOTE: Actually p.poll() is always None here if rpipes is empty
|
||||
elif not rpipes and status == None:
|
||||
|
||||
# When ssh has ControlMaster (+ControlPath/Persist) enabled, the
|
||||
# first connection goes into the background and we never see EOF
|
||||
# on stderr. If we see EOF on stdout and the process has exited,
|
||||
# we're done. Just to be extra sure that we aren't missing any
|
||||
# output on stderr, we call select again with a small timeout.
|
||||
|
||||
if not p.stdout in rpipes:
|
||||
timeout = 0.001
|
||||
continue
|
||||
|
||||
# If the process has not yet exited, but we've already read EOF from
|
||||
# its stdout and stderr (and thus removed both from rpipes), we can
|
||||
# just wait for it to exit.
|
||||
|
||||
elif not rpipes:
|
||||
p.wait()
|
||||
# The process is terminated. Since no pipes to read from are
|
||||
# left, there is no need to call select() again.
|
||||
break
|
||||
|
||||
# Otherwise there may still be outstanding data to read.
|
||||
|
||||
# close stdin after process is terminated and stdout/stderr are read
|
||||
# completely (see also issue #848)
|
||||
stdin.close()
|
||||
|
|
Loading…
Add table
Reference in a new issue