Why ssh hangs when it invokes a remote background command
The most detailed and complete explanation I have seen so far:
http://www.snailbook.com/faq/background-jobs.auto.html
SSH Frequently Asked Questions
Sometimes my SSH connection hangs when exiting — the shell (or remote command) exits, but the connection remains open, doing nothing.
Quick Fix
You're probably using the OpenSSH server, and started a background process on the server which you intended to continue after logging out of the SSH session. Fix: redirect the background process's stdin, stdout, and stderr streams (e.g. to files, or to /dev/null if you don't care about them). For example, this hangs:
    client% ssh server
    server% xterm &
    server% logout
    hangs...
but this behaves as expected:
    client% ssh server
    server% xterm < /dev/null >& /dev/null &
    server% logout
    SSH session terminates
    client%
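The `>&` form used above is csh/bash syntax; a strictly POSIX sh would reject it. As a rough sketch of the same fix in portable form, the helper below (the name `detach` is hypothetical, not part of OpenSSH or the FAQ) starts a command in the background with all three standard streams pointed at /dev/null. Adding `nohup` is an extra precaution assumed here, not something the FAQ requires.

```shell
#!/bin/sh
# detach: hypothetical helper that runs a command in the background
# without keeping any reference to the caller's stdin, stdout, or stderr.
detach() {
    nohup "$@" </dev/null >/dev/null 2>&1 &
}

# Typed directly on the remote side, the equivalent one-liner would be:
#   ssh server 'xterm </dev/null >/dev/null 2>&1 &'
```

To keep the output instead of discarding it, the two `/dev/null` output targets can be replaced with a log file; the point is only that none of the three streams still refers to the pipes owned by sshd.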
Short Explanation
This problem is usually due to a feature of the OpenSSH server. When writing an SSH server, you have to answer the question, "When should the server close the SSH connection?" The obvious answer might seem to be: close it when the server-side user program started by client request (shell or remote command) exits. However, it's actually a bit more complicated; this simple strategy allows a race condition which can cause data loss (see the explanation below). To avoid this problem, sshd instead waits until it encounters end-of-file (eof) on the pipes connecting to the stdout and stderr of the user program.
This strategy, however, can have unexpected consequences. In Unix, an open file does not return eof until all references to it have been closed. When you start a background process from the shell on the server, it inherits references to the shell's standard streams. Unless you prevent this by redirecting these, or the process closes them itself (daemons will generally do this), the existence of the new process will cause sshd to wait indefinitely, since it will never see eof on the pipe connecting it to the (now defunct) shell process — because that pipe also connects it to your background process.
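The eof rule can be observed locally without any SSH connection. In the sketch below, `cat` stands in for sshd (an illustrative substitution, not how sshd is implemented): it reads a pipe until eof. The script assumes a `date` that supports `+%s`, which is common but not guaranteed by POSIX.

```shell
#!/bin/sh
# Case 1: the background sleep inherits the pipe as its stdout, so the
# reader sees no eof until sleep itself exits, about 2 seconds later.
start=$(date +%s)
( sleep 2 & ) | cat
held=$(( $(date +%s) - start ))

# Case 2: the background sleep has all three streams redirected, so the
# last reference to the pipe disappears as soon as the subshell exits.
start=$(date +%s)
( sleep 2 </dev/null >/dev/null 2>&1 & ) | cat
freed=$(( $(date +%s) - start ))

echo "inherited stdout: ${held}s, redirected: ${freed}s"
```

The first pipeline should take roughly as long as the background process lives, while the second returns almost at once; that difference is the same one seen between the hanging and non-hanging logout.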
This design choice has changed over time. Early versions of OpenSSH behaved as described here. For some time, it was changed to exit immediately upon exit of the user program; then, it was changed back when the possibility of data loss was discovered.
Race Condition Details
As an example, let's take the simple case of:
ssh server cat foo.txt
This should result in the entire contents of the file foo.txt coming back to the client — but in fact, it may not. Consider the following sequence of events:
- The SSH connection is set up; sshd starts the target account's shell as shell -c "cat foo.txt" in a child process, reading the shell's stdout and sending the data over the SSH connection. sshd is waiting for the shell to exit.
- The shell, in turn, starts cat foo.txt in a child process, and waits for it to exit. The file data from foo.txt which cat writes to its stdout, however, does not pass through the shell process on its way to sshd. cat inherits its stdout file descriptor (fd) from its parent process, the shell; that fd is a direct reference to the pipe connecting the shell's stdout to sshd.
- cat writes the last chunk of data from foo.txt, and exits; the data is passed to the kernel via the write system call, and is waiting in the pipe buffer to be read by sshd. The shell, which was waiting on the cat process, exits, and then sshd in turn exits, closing the SSH connection. However, there is a race condition here: through the vagaries of process scheduling, it is possible that sshd will receive and act on the SIGCHLD notifying it of the shell's exit, before it reads the last chunk of data from the pipe. If so, then it misses that data.
This sequence of events can, for example, cause file truncation when using scp.
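The key fact behind the race is that data written just before a process exits stays in the kernel's pipe buffer until someone reads it. A minimal local sketch, again with `cat` standing in for the reading side and an artificial one-second delay assumed in place of unlucky scheduling:

```shell
#!/bin/sh
# The writer (printf) finishes and exits immediately; its output waits in
# the pipe buffer. The reader starts a second late but still receives the
# data, because it reads until eof instead of stopping when the writer exits.
out=$(printf 'last chunk\n' | { sleep 1; cat; })
echo "reader got: $out"
```

A reader that stopped as soon as it learned of the writer's exit would have discarded "last chunk"; reading until eof is what closes that window, at the cost of the hang described earlier.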