Clarify select() handling for ssh connections

This change is motivated by an ssh oddity: when ControlPersist is
enabled, the first (i.e. master) connection goes into the background; we
see EOF on its stdout and the process exits, but we never see EOF on its
stderr. So if we ran a command like this:

    ANSIBLE_SSH_PIPELINING=1 ansible -T 30 -vvv somehost -u someuser -m command -a whoami

We would first do select([stdout,stderr], timeout) and read the command
module output, then select([stdout,stderr], timeout) again and read EOF
on stdout, then select([stderr], timeout) AGAIN (though the process has
exited), and select() would wait for the full timeout before returning
rfd=[], and then we would exit. The use of a very short timeout in the
code masked the underlying problem (that we don't see EOF on stderr).

It's always preferable to call select() with a long timeout so that the
process doesn't use any CPU until one of the events it's interested in
happens (and then select will return independent of elapsed time).

(A long timeout value means "if nothing happens, sleep for up to <x>";
omitting the timeout value means "if nothing happens, sleep forever";
specifying a zero timeout means "don't sleep at all", i.e. poll for
events and return immediately.)

This commit uses a long timeout, but explicitly detects the condition
where we've seen EOF on stdout and the process has exited, but we have
not seen EOF on stderr. If and only if that happens, it reruns select()
with a short timeout (in practice it could just exit at that point, but
I chose to be extra cautious). As a result, we end up calling select()
far less often, and use less CPU while waiting, but don't sleep for a
long time waiting for something that will never happen.

Note that we don't omit the timeout to select() altogether because if
we're waiting for an escalation prompt, we DO want to give up with an
error after some time. We also don't set exceptfds, because we're not
actually acting on any notifications of exceptional conditions.
This commit is contained in:
Abhijit Menon-Sen 2015-09-24 08:57:35 +05:30 committed by James Cammarata
parent 03127dcfae
commit 6e82df451a

View file

@ -444,11 +444,7 @@ class Connection(ConnectionBase):
state += 1
while True:
poll_timeout = 0.1
if state <= states.index('awaiting_escalation'):
poll_timeout = timeout
rfd, wfd, efd = select.select(rpipes, [], rpipes, poll_timeout)
rfd, wfd, efd = select.select(rpipes, [], [], timeout)
# We pay attention to timeouts only while negotiating a prompt.
@ -536,24 +532,33 @@ class Connection(ConnectionBase):
self._send_initial_data(stdin, in_data)
state += 1
# Now we just wait for the process to exit. Output is already being
# accumulated above, so we don't need to do anything special here.
# Now we're awaiting_exit: has the child process exited? If it has,
# and we've read all available output from it, we're done.
status = p.poll()
# only break out if no pipes are left to read or
# the pipes are completely read and
# the process is terminated
if (not rpipes or not rfd) and status is not None:
break
# No pipes are left to read but process is not yet terminated
# Only then it is safe to wait for the process to be finished
# NOTE: Actually p.poll() is always None here if rpipes is empty
elif not rpipes and status == None:
if p.poll() is not None:
if not rpipes or not rfd:
break
# When ssh has ControlMaster (+ControlPath/Persist) enabled, the
# first connection goes into the background and we never see EOF
# on stderr. If we see EOF on stdout and the process has exited,
# we're done. Just to be extra sure that we aren't missing any
# output on stderr, we call select again with a small timeout.
if not p.stdout in rpipes:
timeout = 0.001
continue
# If the process has not yet exited, but we've already read EOF from
# its stdout and stderr (and thus removed both from rpipes), we can
# just wait for it to exit.
elif not rpipes:
p.wait()
# The process is terminated. Since no pipes to read from are
# left, there is no need to call select() again.
break
# Otherwise there may still be outstanding data to read.
# close stdin after process is terminated and stdout/stderr are read
# completely (see also issue #848)
stdin.close()