By Michael Kerrisk
March 20, 2013
An exploit posted on March 13
revealed a rather easily exploitable security vulnerability (CVE 2013-1858)
in the implementation of user namespaces. That exploit enables an
unprivileged user to escalate to full root privileges. Although a fix was
quickly provided, it is nevertheless instructive to look in some detail at
the vulnerability, both to better understand the nature of this kind of
exploit and also to briefly consider how this vulnerability came to appear
inside the user namespaces implementation. General background on user
namespaces can be found in parts 5 and part
6 of our recent series of
articles on namespaces.
Overview
The vulnerability was discovered by Sebastian Krahmer, who posted
proof-of-concept code
demonstrating the exploit on the oss-security mailing list.
The exploit is based on the fact that
Linux 3.8 allows the following combination of flags when calling
clone() (and also unshare() and setns()):
clone(... CLONE_NEWUSER | CLONE_FS, ...);
CLONE_NEWUSER says that the new child should be in
a new user namespace, and with the completion of the user namespaces
implementation in Linux 3.8, that flag can now be employed by unprivileged
processes. Within the new namespace, the child has a full set of capabilities,
although it has no capabilities in the parent namespace.
The CLONE_FS flag says that the caller of clone() and
the resulting child should share certain filesystem-related attributes—root
directory, current working directory, and file mode creation mask
(umask). The attribute of particular interest here is the root directory,
which a privileged process can change using the chroot() system
call.
It is the mismatch between the scope of these two flags that creates
the window for the exploit. On the one hand, CLONE_FS causes the
parent and child process to share the root directory attribute. On the
other hand, CLONE_NEWUSER puts the two processes into separate
user namespaces, and gives the child full capabilities in the new user
namespace. Those capabilities include CAP_SYSCHROOT, which gives a
process the ability to call chroot(); the sharing provided by
CLONE_FS means that the child can change the root directory of a
process in another user namespace.
In broad strokes, the exploit achieves escalation to root privileges by
executing any set-user-ID-root program that is present on the system in a
chroot environment which
is engineered to execute attacker-controlled code. That code runs with user
ID 0 and allows the exploit to fire up a shell with root privileges. The
exploit as demonstrated is accomplished by subverting the dynamic linking
mechanism, although other lines of attack based on the same foundation are
also possible.
The vulnerability scenario
The first part of understanding the exploit requires some understanding
of the operation of the dynamic linker. Most executables (including most
set-user-ID root programs) on a Linux system employ shared libraries and
dynamic linking.
At run time, the dynamic linker loads the required shared libraries in
preparation for running the program. The pathname of the dynamic linker is
embedded in the executable file's ELF headers, and is listed among the
other dependencies of a dynamically linked executable when we use the
ldd command (here executed on an x86-64 system):
$ ldd /bin/ls | grep ld-linux
/lib64/ld-linux-x86-64.so.2 (0x00000035b1800000)
There are a few important points to note about the dynamic linker. First, it
is run before the application program. Second, it is run under whatever
credentials would be accorded to the application program; thus, for
example, if a set-user-ID-root program is being executed, the dynamic
linker will run with an effective user ID of root.
Executable files are normally protected so that they can't be modified
by users other than the file owner; this prevents, for example,
unprivileged users from modifying the dynamic linker path embedded inside a
set-user-ID-root binary. For similar reasons, an unprivileged user can't
change the contents of the dynamic linker binary.
However, suppose for a moment that an unprivileged user could construct a
chroot tree containing (via a hard link) the set-user-ID-root binary and
an executable of the user's own choosing at
/lib64/ld-linux-x86-64.so.2. Running the set-user-ID-root binary
would then cause control first to be passed to the user's own code, which
would be running as root. The aim of the exploit is to bring about the
situation shown in the following diagram, where pathnames are shown linked
to various binary files:
The key point in the above diagram is that two pathnames link to the
fusermount binary (a set-user-ID-root program used for mounting
and unmounting FUSE
filesystems). If a process outside the chroot environment executes the
/bin/fusermount binary, then the real dynamic linker will be
invoked to load the binary's shared libraries. On the other hand, if a
process inside the chroot environment executes the other link to the binary
(/suid-root), then the kernel will load the ELF interpreter
pointed to by the link /lib64/ld-linux-x86-64.so.2 inside the
chroot environment. That link points to code supplied by an attacker, and
will be run with root privileges.
How does the Linux 3.8 user namespaces implementation help with this
attack? First, an unprivileged user can create a new user namespace in which
they gain full privileges, including the ability to create a chroot
environment using chroot(). Second, the differing scope of
CLONE_NEWUSER and CLONE_FS described above means that
the privileged process inside a new user namespace can construct a chroot
environment that applies to a process outside the user namespace. If that
process can in turn then be made to execute a set-user-ID binary inside
the chroot environment, then the attacker code will be run as root.
A three-phase attack
Although Sebastian's program is quite short, there are many details
involved that make the exploit somewhat challenging to understand;
furthermore, the program is written with the goal of accomplishing the
exploit, rather than educating the reader on how the exploit is carried
out. Therefore, we'll provide an equivalent program, userns_exploit.c, that performs the
same attack—this program is structured in a more understandable way
and is instrumented with output statements that enable the user to see what
is going on. We won't walk though the code of the program, but it is well
commented and should be easy to follow using the explanations in this article.
The attack code involves the creation of three processes, which we'll
label "parent", "child", and "grandchild". The attack is conducted in
three phases; in each phase, a separate instance of the attacker code is
executed. This concept can at first be difficult to grasp when reading the
code. It's easiest to think of the userns_exploit program as
simply offering itself in three flavors, with the choice being determined
by command-line arguments and the effective user ID of the process.
The following diagram shows the exploit in overview:
In the above diagram, the vertical dashed lines indicate points where a
process is blocked waiting for another process to complete some action.
In the first phase of the exploit, the program starts by discovering its
own pathname. This is done by reading the contents of the
/proc/self/exe symbolic link.
The program needs to know its own pathname for two
reasons: so it can create a link to itself inside the chroot tree and so it
can re-execute itself later.
The program then creates two processes, labeled "parent" and "child"
in the above diagram. The parent's task is simple. It will loop, using the
stat() system call to check whether the program pathname
discovered in the previous step is owned by root and has the
set-user-ID permission bit enabled. This causes the parent to wait until
the other processes have finished their tasks.
In the meantime, the "child" populates the directory tree that will be used
as the chroot environment. The goal is to create the set-up shown in the
following diagram:
The difference from the first diagram is that we now see that it is the
userns_exploit program that will be used as the fake dynamic
loader inside the chroot environment. Furthermore, that binary is also
linked outside the chroot environment, and the exploit design takes advantage of
that fact.
Having created the chroot tree shown above, the child then employs
clone(CLONE_NEWUSER|CLONE_FS) to create a new process—the
grandchild. The grandchild has a full set of capabilities, which allows it
to call chroot() to place itself into the chroot tree. Because the
grandchild and the child share the root directory attribute, the child is
now also placed in the chroot environment.
Its small task complete, the grandchild now terminates. At that point,
the child, which has been waiting on the grandchild, now
resumes. As its next step, the child executes the program at the path
/suid-root. This is in fact a link to the fusermount
binary. Because the child is in the initial user namespace and the
fusermount binary is set-user-ID-root, the child gains root
privileges.
However, before the fusermount binary is loaded, the kernel
first loads its ELF interpreter, the file at the path
/lib64/ld-linux-x86-64.so.2. That, as it happens, is actually the
userns_exploit program. Thus, the userns_exploit program
is now executed for a second time (and the fusermount program is
never executed).
The second phase of the exploit has now begun. This instance of the
userns_exploit program recognizes that it has an effective user ID
of 0. However, the only files it can access are those inside the chroot
environment. But that is sufficient. The child can now change the ownership
of the file /lib64/ld-linux-x86-64.so.2 and turn on the file's
set-user-ID permission bit. That pathname is, of course, a link to the
userns_exploit binary. At this point, the child's work is now
complete, and it terminates.
All of this time, the parent process has been sitting in the background
waiting for the userns_exploit binary to become a set-user-ID-root
program. That, of course, is what the child has just accomplished. So, at
this point, the parent now executes the userns_exploit program
outside the chroot environment. On this execution, the program is
supplied with a command-line argument.
The third and final phase of the exploit has now started. The
userns_exploit program determines that it has an effective user ID
of 0 and notes that it has a command-line argument. That latter fact
distinguishes this case from the second execution of the
userns_exploit and is a signal that this time the program is being
executed outside the chroot environment. All that the program now
needs to do is execute a shell; that shell will provide the user with full
root privileges on the system.
Further requirements for a successful exploit
There are a few other steps that are necessary to successfully
accomplish the exploit. The userns_exploit program must be
statically linked. This is necessary so that, when executed as the dynamic linker
inside the chroot environment, the userns_exploit program does not
itself require a dynamic linker.
In addition, the value in the /proc/sys/fs/protected_hardlinks
file must zero. The protected_hardlinks file was a feature that
was added in Linux 3.6 specifically to prevent
the types of exploit discussed in this article. If this file has the
value one, then only the owner of a file can create hard links to it. On a
vanilla kernel, protected_hardlinks unfortunately has the default
value zero, although some distributions provide kernels that change this
default.
In the process of exploring this vulnerability, your editor
discovered that set-user-ID binaries built as hardened,
position-independent executables (PIE) cannot be used for this particular
attack. (Many of the set-user-ID-root binaries on his Fedora system were
hardened in this manner.) While PIE hardening thwarts this particular line of
attack, the chroot() technique described here can still be used to
exploit a set-user-ID-root binary in other ways. For example, the
binary can be placed in a suitably constructed chroot environment
that contains the genuine dynamic linker but a compromised libc.
Finally, user namespaces must of course be enabled on the system where
this exploit is to be tested, and the kernel version needs to be precisely
3.8. Earlier kernel versions did not allow unprivileged users to create
user namespaces, and later kernels will fix this bug, as described
below. The exploit is unlikely to be possible with distributor kernels:
because the Linux 3.8 kernel does not support the use of user namespaces
with various filesystems, including NFS and XFS, distributors are
unlikely to enable user namespaces in the kernels that they ship.
The fix
Once the problem was reported, Eric
Biederman considered two possible
solutions. The more complex solution is to create an association from a
process's fs_struct, the kernel data structure that records the
process's root directory, to a user namespace, and use that association to
set limitations around the use of chroot() in scenarios such as
the one described in this article. The alternative is the simple and
obviously safe solution: disallow the combination of CLONE_NEWUSER
and CLONE_FS in the clone() system call, make
CLONE_NEWUSER automatically imply CLONE_FS in the
unshare() system call, and disallow the use of setns() to
change a process's user namespace if the process is sharing
CLONE_FS-related attributes with another process.
Subsequently, Eric concluded
that the complex solution seemed to be unnecessary and would add a small
overhead to every call to fork(). He thus opted for the simple
solution: the Linux 3.9 kernel (and the 3.8.3 stable kernel) will disallow
the combination of CLONE_NEWUSER and CLONE_FS.
User namespaces and security
As we noted in an earlier
article, Eric Biederman has put a lot of work into trying to ensure
that unprivileged can create user namespaces without causing security
vulnerabilities. Nevertheless, a significant exploit was found soon after
the release of the first kernel version that allowed unprivileged processes
to create user namespaces. Another user namespace vulnerability that
potentially allowed unprivileged users to load arbitrary kernel modules was
also reported and fixed earlier this month. In addition, during
the discussion of the CLONE_NEWUSER|CLONE_FS issue,
Andy Lutomirski has hinted that there may
be another user namespaces vulnerability to be fixed.
Why is it that several security vulnerabilities have sprung from the
user namespaces implementation? The fundamental problem seems to be that
user namespaces and their interactions with other parts of the kernel are
rather complex—probably too complex for the few kernel developers
with a close interest to consider all of the possible security
implications. In addition, by making new functionality available to
unprivileged users, user namespaces expand the attack surface of the
kernel. Thus, it seems that as user namespaces come to be more widely
deployed, other security bugs such as these are likely to be
found. One hopes that they'll be found and fixed by the kernel developers
and white hat security experts, rather than found and exploited by black
hat attackers.
Updated on 22 February 2013 to clarify and correct some minor details of the
"simple and safe" solution under the heading, "The fix".
(
Log in to post comments)