What Linux bind mounts are really doing
Lots of Unixes have some form of 'loopback' mounts, where you can mount a bit of an existing filesystem somewhere else; they're called loopback mounts by analogy with the loopback interface.
The general idea behind them is that they are a more efficient (and easier to use) version of doing an NFS mount from localhost.
Linux's bind mounts (so called because they are done with mount --bind, or by specifying bind as the filesystem type in /etc/fstab) look like any other sort of loopback mounting. However, they actually operate in a way quite different from the usual idea of loopback mounting, and the difference has some important consequences.
What bind mounts are really doing is more or less mounting the filesystem again with a different inode as the root inode. Thus, if you do:
mount /dev/md1 /foo
mount --bind /foo/bar /bar
what you really have is /dev/md1 mounted twice, once with the root inode of the filesystem on md1 as the root of the mount point, and once with the inode for 'bar' in the root of the filesystem on md1 as the root of the mount point.
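One way to see this from user space is to compare what stat reports for the two paths. The following is only a sketch: same_object is a hypothetical helper name of mine, it assumes GNU stat (for -c, %d and %i), and it relies on my understanding that a bind mount shares the device number of the underlying filesystem. If that holds, /foo/bar and /bar from the example above should come out as the same device and inode.

```shell
# Hypothetical helper: succeeds if two paths refer to the same inode on the
# same filesystem. Uses GNU stat's %d (device number) and %i (inode number).
same_object() {
    a=$(stat -c '%d:%i' -- "$1") || return 2
    b=$(stat -c '%d:%i' -- "$2") || return 2
    [ "$a" = "$b" ]
}

# Example, using the paths from above (needs the mounts to be set up first):
#   same_object /foo/bar /bar && echo "same root inode"
```

The same check also succeeds for two hard links to one file, so by itself it only shows that both names lead to one inode, not how they got that way.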
The mount command makes this hard to see by being misleading in its output, reporting things like '/data/home on /home type none (rw,bind)'.
Because they use /etc/mtab, which mount maintains, things like df also report bind mounts this way. More of the real state of affairs is visible in /proc/mounts, where the kernel itself reports:
/dev/md5 /data ext3 rw,data=ordered 0 0
/dev/md5 /home ext3 rw,data=ordered 0 0
Unfortunately the kernel doesn't report what root inode /home is mounted with, which generally makes mount's output more useful once you know what is really going on.
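Since a bind mount shows up in /proc/mounts as the same device mounted a second time, you can at least spot likely candidates by looking for devices that appear more than once. This is a rough sketch; multi_mounts is a name I made up, and it only assumes the usual whitespace-separated format with the device in the first field and the mount point in the second.

```shell
# Hypothetical helper: read /proc/mounts-format text on stdin and print every
# device that appears at more than one mount point, with those mount points.
multi_mounts() {
    awk '
        $1 ~ /^\// {    # only entries whose device is a path; skips proc, tmpfs, etc.
            count[$1]++
            points[$1] = points[$1] " " $2
        }
        END {
            for (dev in count)
                if (count[dev] > 1)
                    print dev ":" points[dev]
        }
    '
}

# Typical use on a live system:
#   multi_mounts < /proc/mounts
```

Given the two /dev/md5 lines above, this prints '/dev/md5: /data /home'. It can't tell you which of the two is the bind mount, or what directory it is rooted at, and a filesystem that was simply mounted twice would look the same.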
One consequence of this is that once you've set up your bind mounts, you can unmount the original mount point, something which I believe is not true of things like Solaris's loopback mounts (and which definitely wouldn't be true of NFS mounts from localhost). There might be a use for this in obscure situations.
Sidebar: Deeper under the hood
Disclaimer: I am not sure I understand this correctly.
Under the hood, there are two things: actual mounts of filesystems from devices (or the network), and namespace-based views of such filesystems. Rather than create new copies of both, bind mounts create new views ('mounts' or 'vfsmounts') of the same underlying mounted filesystem.
This explains one limitation of bind mounts, which is that you can't change mount flags when you do a bind mount (so you can't have a bind mount that is a read-only version of part of a read-write filesystem). Currently, all mount flags are associated with the filesystem, not with the view, so all views have to have the same mount flags.