Anatomy of a user namespaces vulnerability [LWN.net]

By Michael Kerrisk
March 20, 2013

An exploit posted on March 13 revealed a rather easily exploitable security vulnerability (CVE-2013-1858) in the implementation of user namespaces. That exploit enables an unprivileged user to escalate to full root privileges. Although a fix was quickly provided, it is nevertheless instructive to look in some detail at the vulnerability, both to better understand the nature of this kind of exploit and also to briefly consider how this vulnerability came to appear inside the user namespaces implementation. General background on user namespaces can be found in part 5 and part 6 of our recent series of articles on namespaces.

Overview

The vulnerability was discovered by Sebastian Krahmer, who posted proof-of-concept code demonstrating the exploit on the oss-security mailing list. The exploit is based on the fact that Linux 3.8 allows the following combination of flags when calling clone() (and also unshare() and setns()):

    clone(... CLONE_NEWUSER | CLONE_FS, ...);

CLONE_NEWUSER says that the new child should be in a new user namespace, and with the completion of the user namespaces implementation in Linux 3.8, that flag can now be employed by unprivileged processes. Within the new namespace, the child has a full set of capabilities, although it has no capabilities in the parent namespace.

The CLONE_FS flag says that the caller of clone() and the resulting child should share certain filesystem-related attributes—root directory, current working directory, and file mode creation mask (umask). The attribute of particular interest here is the root directory, which a privileged process can change using the chroot() system call.
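The sharing that CLONE_FS provides can be observed without any privileges by using the umask rather than the root directory. The following is a minimal sketch written for this purpose; it is not taken from the exploit or from userns_exploit.c, and the names set_umask_077() and umask_seen_by_parent() are invented for the illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for the clone() declaration */
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Child body: change the umask, then exit. */
static int set_umask_077(void *arg)
{
    (void)arg;
    umask(077);
    return 0;
}

/*
 * Set the caller's umask to 022, run a child created with
 * clone(extra_flags) that sets its umask to 077, and return the umask
 * that the caller sees afterwards. The caller's original umask is
 * restored before returning. Returns (mode_t)-1 on error.
 */
mode_t umask_seen_by_parent(int extra_flags)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return (mode_t)-1;

    mode_t saved = umask(022);
    mode_t result = (mode_t)-1;

    /* The stack grows downward on common architectures: pass its top. */
    pid_t pid = clone(set_umask_077, stack + stack_size,
                      extra_flags | SIGCHLD, NULL);
    if (pid != -1 && waitpid(pid, NULL, 0) == pid)
        result = umask(saved);  /* fetch current value and restore */
    else
        umask(saved);

    free(stack);
    return result;
}
```

With CLONE_FS, the child's change should be visible to the caller, so umask_seen_by_parent(CLONE_FS) is expected to return 077; with no sharing, umask_seen_by_parent(0) is expected to return 022. The root directory is shared in the same way, but changing it requires privilege.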

It is the mismatch between the scope of these two flags that creates the window for the exploit. On the one hand, CLONE_FS causes the parent and child process to share the root directory attribute. On the other hand, CLONE_NEWUSER puts the two processes into separate user namespaces, and gives the child full capabilities in the new user namespace. Those capabilities include CAP_SYS_CHROOT, which gives a process the ability to call chroot(); the sharing provided by CLONE_FS means that the child can change the root directory of a process in another user namespace.

In broad strokes, the exploit achieves escalation to root privileges by executing any set-user-ID-root program that is present on the system in a chroot environment which is engineered to execute attacker-controlled code. That code runs with user ID 0 and allows the exploit to fire up a shell with root privileges. The exploit as demonstrated is accomplished by subverting the dynamic linking mechanism, although other lines of attack based on the same foundation are also possible.

The vulnerability scenario

The first part of understanding the exploit requires some understanding of the operation of the dynamic linker. Most executables (including most set-user-ID root programs) on a Linux system employ shared libraries and dynamic linking. At run time, the dynamic linker loads the required shared libraries in preparation for running the program. The pathname of the dynamic linker is embedded in the executable file's ELF headers, and is listed among the other dependencies of a dynamically linked executable when we use the ldd command (here executed on an x86-64 system):

    $ ldd /bin/ls | grep ld-linux
            /lib64/ld-linux-x86-64.so.2 (0x00000035b1800000)
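The same pathname can be read directly from the executable, where it is stored in the PT_INTERP program header segment. The following is a minimal sketch that assumes a 64-bit ELF file in the host's native byte order; the function name elf64_interp() and its return-value convention are choices made for this example.

```c
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy the PT_INTERP string of a 64-bit ELF file into buf.
 * Returns 0 on success, 1 if the file has no PT_INTERP segment (for
 * example, a statically linked binary), and -1 on error or if the file
 * is not a 64-bit ELF file.
 */
int elf64_interp(const char *path, char *buf, size_t len)
{
    Elf64_Ehdr eh;
    int result = -1;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    if (pread(fd, &eh, sizeof(eh), 0) != (ssize_t)sizeof(eh) ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        goto out;

    result = 1;                 /* valid ELF; no PT_INTERP seen yet */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        off_t off = (off_t)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

        if (pread(fd, &ph, sizeof(ph), off) != (ssize_t)sizeof(ph)) {
            result = -1;
            break;
        }
        if (ph.p_type != PT_INTERP)
            continue;

        /* p_filesz counts the terminating null byte of the pathname. */
        if (ph.p_filesz == 0 || ph.p_filesz > len ||
            pread(fd, buf, ph.p_filesz, (off_t)ph.p_offset) !=
                (ssize_t)ph.p_filesz ||
            buf[ph.p_filesz - 1] != '\0')
            result = -1;
        else
            result = 0;
        break;
    }
out:
    close(fd);
    return result;
}
```

Called on a dynamically linked x86-64 binary such as /bin/ls, this would be expected to yield the same pathname that ldd reports for the dynamic linker.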

There are a few important points to note about the dynamic linker. First, it is run before the application program. Second, it is run under whatever credentials would be accorded to the application program; thus, for example, if a set-user-ID-root program is being executed, the dynamic linker will run with an effective user ID of root.

Executable files are normally protected so that they can't be modified by users other than the file owner; this prevents, for example, unprivileged users from modifying the dynamic linker path embedded inside a set-user-ID-root binary. For similar reasons, an unprivileged user can't change the contents of the dynamic linker binary.

However, suppose for a moment that an unprivileged user could construct a chroot tree containing (via a hard link) the set-user-ID-root binary and an executable of the user's own choosing at /lib64/ld-linux-x86-64.so.2. Running the set-user-ID-root binary would then cause control first to be passed to the user's own code, which would be running as root. The aim of the exploit is to bring about the situation shown in the following diagram, where pathnames are shown linked to various binary files:

The key point in the above diagram is that two pathnames link to the fusermount binary (a set-user-ID-root program used for mounting and unmounting FUSE filesystems). If a process outside the chroot environment executes the /bin/fusermount binary, then the real dynamic linker will be invoked to load the binary's shared libraries. On the other hand, if a process inside the chroot environment executes the other link to the binary (/suid-root), then the kernel will load the ELF interpreter pointed to by the link /lib64/ld-linux-x86-64.so.2 inside the chroot environment. That link points to code supplied by an attacker, and will be run with root privileges.

How does the Linux 3.8 user namespaces implementation help with this attack? First, an unprivileged user can create a new user namespace in which they gain full privileges, including the ability to create a chroot environment using chroot(). Second, the differing scope of CLONE_NEWUSER and CLONE_FS described above means that the privileged process inside a new user namespace can construct a chroot environment that applies to a process outside the user namespace. If that process can in turn then be made to execute a set-user-ID binary inside the chroot environment, then the attacker code will be run as root.

A three-phase attack

Although Sebastian's program is quite short, there are many details involved that make the exploit somewhat challenging to understand; furthermore, the program is written with the goal of accomplishing the exploit, rather than educating the reader on how the exploit is carried out. Therefore, we'll provide an equivalent program, userns_exploit.c, that performs the same attack. This program is structured in a more understandable way and is instrumented with output statements that enable the user to see what is going on. We won't walk through the code of the program, but it is well commented and should be easy to follow using the explanations in this article.

The attack code involves the creation of three processes, which we'll label "parent", "child", and "grandchild". The attack is conducted in three phases; in each phase, a separate instance of the attacker code is executed. This concept can at first be difficult to grasp when reading the code. It's easiest to think of the userns_exploit program as simply offering itself in three flavors, with the choice being determined by command-line arguments and the effective user ID of the process.

The following diagram shows the exploit in overview:

In the above diagram, the vertical dashed lines indicate points where a process is blocked waiting for another process to complete some action.

In the first phase of the exploit, the program starts by discovering its own pathname. This is done by reading the contents of the /proc/self/exe symbolic link. The program needs to know its own pathname for two reasons: so it can create a link to itself inside the chroot tree and so it can re-execute itself later.
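Discovering one's own pathname in this way takes only a few lines. The sketch below shows one way to do it; the helper name self_exe_path() is an assumption of this example and need not match the code in userns_exploit.c.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Store the pathname of the running executable in buf as a
 * null-terminated string. Returns the length of the pathname, or -1
 * on error. If the result equals len - 1, the pathname may have been
 * truncated.
 */
ssize_t self_exe_path(char *buf, size_t len)
{
    if (len == 0)
        return -1;

    /* readlink() does not add a terminating null byte. */
    ssize_t n = readlink("/proc/self/exe", buf, len - 1);
    if (n == -1)
        return -1;

    buf[n] = '\0';
    return n;
}
```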

The program then creates two processes, labeled "parent" and "child" in the above diagram. The parent's task is simple. It will loop, using the stat() system call to check whether the program pathname discovered in the previous step is owned by root and has the set-user-ID permission bit enabled. This causes the parent to wait until the other processes have finished their tasks.
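The test that the parent applies on each pass through its loop might look like the following sketch. The helper name is_setuid_root() is invented for this example, and the actual code in userns_exploit.c may be structured differently.

```c
#include <sys/stat.h>

/*
 * Returns 1 if the file at path is owned by user ID 0 and has the
 * set-user-ID permission bit enabled, 0 if not, and -1 if stat() fails.
 */
int is_setuid_root(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1;

    return sb.st_uid == 0 && (sb.st_mode & S_ISUID) != 0;
}
```

A waiting process would call such a function repeatedly, sleeping briefly between calls, until it returns 1.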

In the meantime, the "child" populates the directory tree that will be used as the chroot environment. The goal is to create the set-up shown in the following diagram:

The difference from the first diagram is that we now see that it is the userns_exploit program that will be used as the fake dynamic loader inside the chroot environment. Furthermore, that binary is also linked outside the chroot environment, and the exploit design takes advantage of that fact.

Having created the chroot tree shown above, the child then employs clone(CLONE_NEWUSER|CLONE_FS) to create a new process—the grandchild. The grandchild has a full set of capabilities, which allows it to call chroot() to place itself into the chroot tree. Because the grandchild and the child share the root directory attribute, the child is now also placed in the chroot environment.

Its small task complete, the grandchild now terminates. At that point, the child, which has been waiting on the grandchild, now resumes. As its next step, the child executes the program at the path /suid-root. This is in fact a link to the fusermount binary. Because the child is in the initial user namespace and the fusermount binary is set-user-ID-root, the child gains root privileges.

However, before the fusermount binary is loaded, the kernel first loads its ELF interpreter, the file at the path /lib64/ld-linux-x86-64.so.2. That, as it happens, is actually the userns_exploit program. Thus, the userns_exploit program is now executed for a second time (and the fusermount program is never executed).

The second phase of the exploit has now begun. This instance of the userns_exploit program recognizes that it has an effective user ID of 0. However, the only files it can access are those inside the chroot environment. But that is sufficient. The child can now change the ownership of the file /lib64/ld-linux-x86-64.so.2 and turn on the file's set-user-ID permission bit. That pathname is, of course, a link to the userns_exploit binary. At this point, the child's work is now complete, and it terminates.

All of this time, the parent process has been sitting in the background waiting for the userns_exploit binary to become a set-user-ID-root program. That, of course, is what the child has just accomplished. So, at this point, the parent now executes the userns_exploit program outside the chroot environment. On this execution, the program is supplied with a command-line argument.

The third and final phase of the exploit has now started. The userns_exploit program determines that it has an effective user ID of 0 and notes that it has a command-line argument. That latter fact distinguishes this case from the second execution of the userns_exploit and is a signal that this time the program is being executed outside the chroot environment. All that the program now needs to do is execute a shell; that shell will provide the user with full root privileges on the system.
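The way that a single program can offer itself in three flavors can be summarized as a small decision function. The following is only a sketch of the selection logic as described in this article; the enum, its member names, and choose_phase() are assumptions made for illustration, not code from userns_exploit.c.

```c
#include <sys/types.h>

enum phase {
    PHASE_SETUP = 1,        /* unprivileged: first execution */
    PHASE_IN_CHROOT = 2,    /* effective UID 0, no command-line argument */
    PHASE_SHELL = 3         /* effective UID 0, with an argument */
};

/*
 * Select the phase from the effective user ID and the argument count
 * (argc as passed to main(), so argc > 1 means that at least one
 * command-line argument was supplied).
 */
enum phase choose_phase(uid_t euid, int argc)
{
    if (euid != 0)
        return PHASE_SETUP;

    return argc > 1 ? PHASE_SHELL : PHASE_IN_CHROOT;
}
```

A main() function would typically call choose_phase(geteuid(), argc) and branch on the result.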

Further requirements for a successful exploit

There are a few other steps that are necessary to successfully accomplish the exploit. The userns_exploit program must be statically linked. This is necessary so that, when executed as the dynamic linker inside the chroot environment, the userns_exploit program does not itself require a dynamic linker.

In addition, the value in the /proc/sys/fs/protected_hardlinks file must be zero. The protected_hardlinks file was a feature that was added in Linux 3.6 specifically to prevent the types of exploit discussed in this article. If this file has the value one, then only the owner of a file can create hard links to it. On a vanilla kernel, protected_hardlinks unfortunately has the default value zero, although some distributions provide kernels that change this default.
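An administrator who wants to verify this setting from a program can simply read the file. The following is a minimal sketch; read_int_file() is a generic helper named for this example, not an existing library function.

```c
#include <stdio.h>

/*
 * Read a single decimal integer from the file at path into *value.
 * Returns 0 on success and -1 if the file cannot be opened or does
 * not begin with an integer.
 */
int read_int_file(const char *path, int *value)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int ok = (fscanf(fp, "%d", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

Calling read_int_file("/proc/sys/fs/protected_hardlinks", &v) and finding v equal to one indicates that the hard-link protection is enabled. A failure most likely means that the kernel predates Linux 3.6 or that /proc is not mounted.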

In the process of exploring this vulnerability, your editor discovered that set-user-ID binaries built as hardened, position-independent executables (PIE) cannot be used for this particular attack. (Many of the set-user-ID-root binaries on his Fedora system were hardened in this manner.) While PIE hardening thwarts this particular line of attack, the chroot() technique described here can still be used to exploit a set-user-ID-root binary in other ways. For example, the binary can be placed in a suitably constructed chroot environment that contains the genuine dynamic linker but a compromised libc.
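One rough way to see which binaries are built as position-independent executables is to look at the e_type field of the ELF header, which is ET_DYN for PIE binaries (and also for shared libraries) and ET_EXEC for traditional executables. The sketch below assumes a file in the host's native byte order; elf_is_et_dyn() is a name chosen for this example.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if the ELF file at path has e_type ET_DYN, 0 if it is an
 * ELF file of another type, and -1 on error or if it is not an ELF file.
 */
int elf_is_et_dyn(const char *path)
{
    /* e_type directly follows e_ident in both ELF32 and ELF64. */
    unsigned char hdr[EI_NIDENT + sizeof(Elf64_Half)];
    Elf64_Half type;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t n = fread(hdr, 1, sizeof(hdr), fp);
    fclose(fp);

    if (n != sizeof(hdr) || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return -1;

    memcpy(&type, hdr + EI_NIDENT, sizeof(type));
    return type == ET_DYN;
}
```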

Finally, user namespaces must of course be enabled on the system where this exploit is to be tested, and the kernel version needs to be precisely 3.8. Earlier kernel versions did not allow unprivileged users to create user namespaces, and later kernels will fix this bug, as described below. The exploit is unlikely to be possible with distributor kernels: because the Linux 3.8 kernel does not support the use of user namespaces with various filesystems, including NFS and XFS, distributors are unlikely to enable user namespaces in the kernels that they ship.

The fix

Once the problem was reported, Eric Biederman considered two possible solutions. The more complex solution is to create an association from a process's fs_struct, the kernel data structure that records the process's root directory, to a user namespace, and use that association to set limitations around the use of chroot() in scenarios such as the one described in this article. The alternative is the simple and obviously safe solution: disallow the combination of CLONE_NEWUSER and CLONE_FS in the clone() system call, make CLONE_NEWUSER automatically imply CLONE_FS in the unshare() system call, and disallow the use of setns() to change a process's user namespace if the process is sharing CLONE_FS-related attributes with another process.

Subsequently, Eric concluded that the complex solution seemed to be unnecessary and would add a small overhead to every call to fork(). He thus opted for the simple solution: the Linux 3.9 kernel (and the 3.8.3 stable kernel) will disallow the combination of CLONE_NEWUSER and CLONE_FS.
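For clone(), the rule imposed by the simple solution amounts to a predicate on the flags argument. The following user-space sketch only mirrors that rule; it is not the kernel patch, and clone_flags_rejected() is a name invented here. The fallback constant values are the ones defined in the kernel's <linux/sched.h>.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes the CLONE_* constants */
#endif
#include <sched.h>
#include <stdbool.h>

/* Fallbacks in case the C library headers do not define the flags. */
#ifndef CLONE_FS
#define CLONE_FS 0x00000200
#endif
#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000
#endif

/*
 * Returns true if flags contains both CLONE_NEWUSER and CLONE_FS,
 * the combination that the fix disallows for clone().
 */
bool clone_flags_rejected(unsigned long flags)
{
    const unsigned long both = CLONE_NEWUSER | CLONE_FS;

    return (flags & both) == both;
}
```

On a kernel carrying the fix, a clone() call whose flags satisfy this predicate fails with the error EINVAL.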

User namespaces and security

As we noted in an earlier article, Eric Biederman has put a lot of work into trying to ensure that unprivileged users can create user namespaces without causing security vulnerabilities. Nevertheless, a significant exploit was found soon after the release of the first kernel version that allowed unprivileged processes to create user namespaces. Another user namespace vulnerability that potentially allowed unprivileged users to load arbitrary kernel modules was also reported and fixed earlier this month. In addition, during the discussion of the CLONE_NEWUSER|CLONE_FS issue, Andy Lutomirski has hinted that there may be another user namespaces vulnerability to be fixed.

Why is it that several security vulnerabilities have sprung from the user namespaces implementation? The fundamental problem seems to be that user namespaces and their interactions with other parts of the kernel are rather complex—probably too complex for the few kernel developers with a close interest to consider all of the possible security implications. In addition, by making new functionality available to unprivileged users, user namespaces expand the attack surface of the kernel. Thus, it seems that as user namespaces come to be more widely deployed, other security bugs such as these are likely to be found. One hopes that they'll be found and fixed by the kernel developers and white hat security experts, rather than found and exploited by black hat attackers.

Updated on 22 February 2013 to clarify and correct some minor details of the "simple and safe" solution under the heading, "The fix".


Complexity

Posted Mar 21, 2013 5:33 UTC (Thu) by dgc (subscriber, #6611) [Link]

Yup, consider that there are some filesystem APIs that allow root to have r/w access to all inodes and their attributes in a filesystem because they bypass the filesystem namespace altogether...

-Dave.

Complexity

Posted Mar 21, 2013 7:42 UTC (Thu) by lkundrak (subscriber, #43452) [Link]

I'm curious, which ones are they?

Complexity

Posted Mar 21, 2013 14:36 UTC (Thu) by dpquigl (subscriber, #52852) [Link]

I'm actually confused as to what his statement is to begin with. Root by virtue of having privileged access can do whatever it wants to any file assuming you don't bring capabilities or other access controls into the picture. Saying root has access to read/write to any inode or change any attributes is a vacuous statement since root can open any file in the filesystem read/write to begin with by virtue of being root. You don't need special APIs for that you just use open. Maybe he's talking about debug file systems or tools that are available for certain file systems like XFS that let you manipulate the inodes of a filesystem directly?

Complexity

Posted Mar 21, 2013 16:52 UTC (Thu) by butlerm (subscriber, #13312) [Link]

> Root by virtue of having privileged access can do whatever it wants to any file

Isn't "root" now an ambiguous term? Don't we now have local root and global or system root? We certainly don't want local root to have privileges to do things like open arbitrary files by inode number. For filesystems the local root mounted or owns perhaps, but certainly not with regard to filesystems mounted by system root or other local root users.

Unless the idea is to adopt the convention that "root" always refers to system root, and never to local root without further qualification, any such reference is likely to lead to some considerable degree of confusion. This thread is a perfect example.

Complexity

Posted Mar 21, 2013 21:05 UTC (Thu) by dgc (subscriber, #6611) [Link]

> Maybe he's talking about debug file systems or tools that are available
> for certain file systems like XFS that let you manipulate the inodes
> of a filesystem directly?

File handles are the problem. And when combined with interfaces like bulkstat, you've got a capability to find, open and *invisibly modify* any file in the filesystem regardless of namespace restrictions...

http://oss.sgi.com/archives/xfs/2013-03/msg00382.html

-Dave

Complexity

Posted Mar 21, 2013 10:48 UTC (Thu) by Tobu (subscriber, #24111) [Link]

chroot doesn't reduce the attack surface much because there's still the whole kernel, but I wouldn't call it a loophole. It's just a user namespaces forerunner, which should be combined with seccomp or similar if one wants a strong security boundary. suid on the other hand is an attractive nuisance: a simple design, but every privileged process that uses it has to be paranoid about its entire environment.

Complexity

Posted Mar 21, 2013 22:49 UTC (Thu) by wahern (subscriber, #37304) [Link]

chroot does reduce the attack surface, considerably.

No more setuid issues.

No more /tmp race conditions.

No more /dev.

No more /proc.

No more /sys.

No more playing with named pipes or unix domain sockets owned by privileged processes.

I'll never understand the attitude of "chroot isn't enough; let's instead add a layer of incredibly complex policy, and tens of thousands of lines of new code to the kernel". Yeah.. that's much better....

It wasn't but a few years ago that one could confidently say that Linux shook the bugs out of simple stuff like file permissions, including setuid linker issues, and run-of-the-mill data races. Now we're adding a whole new set of incredibly complex subsystems and interfaces, and _willingly_ putting everybody through the grinder all over again.

Complexity

Posted Mar 21, 2013 22:58 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

the problem with chroot has been that it's not perfect.

Root can mount things inside the chroot, create device files, etc and so it's possible for someone to escape out of a chroot after they become root.

I've never bought into the 'this makes chroot worthless' mantra, it may only slow an attacker, but slowing an attacker can still be valuable.

If these namespaces could only be setup by root, we would not really be any worse off, but since people are so fascinated by the "my admin won't let me do X, so I'm going to figure out a way to do it anyway" problem that they are giving too much power to non-root users.

If your admin doesn't want to let you do something, go use a different box (including one where you are the admin), don't engineer a way around the admin's restrictions.

Complexity

Posted Mar 22, 2013 8:48 UTC (Fri) by jezuch (subscriber, #52988) [Link]

> the problem with chroot has been that it's not perfect.

The problem with chroot, as I was told, is that it is not and has never been a security mechanism.

Complexity

Posted Mar 22, 2013 18:41 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

> The problem with chroot, as I was told, is that it is not and has never been a security mechanism.

It depends on how you define 'security mechanism'

chroot has always provided security in that it prevented a process in a chroot from accessing any files outside that chroot.

This doesn't mean that this security couldn't be bypassed (if you could get root inside the chroot), but if you did not have root in the chroot, it helped.

for example, if a server had a vulnerability that allowed it to access arbitrary files on the filesystem, putting it in a chroot can be very useful.

Complexity

Posted Mar 25, 2013 9:42 UTC (Mon) by talex (subscriber, #19139) [Link]

> Root can mount things inside the chroot, create device files, etc and so it's possible for someone to escape out of a chroot after they become root.

As I understand it: with user namespaces, *anyone* can escape from a chroot. At least, that seemed to be the case when I tested it (I was experimenting with using namespaces to sandbox some aspects of 0install:
http://thread.gmane.org/gmane.comp.file-systems.zero-inst... )

> If these namespaces could only be setup by root, we would not really be any worse off, but since people are so fascinated by the "my admin won't let me do X, so I'm going to figure out a way to do it anyway" problem that they are giving too much power to non-root users.

The problem with that (only making security features available to root) is that then programmers can't use them. For example, 0install needs to unpack archives it downloads. Since tar may contain bugs, we'd like to run tar in a restricted environment (e.g. a chroot where /home doesn't exist). If that requires root, then 0install itself has to be setuid, which is not good.

Complexity

Posted Mar 25, 2013 10:17 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

why should you be able to install new software on the system without the permission of the admin of that system?

if 0install needs to run tar as root to install its applications, and you don't want to trust tar as root, then you shouldn't trust it. untar the files as the user and then change their permissions afterwards.

And if you think that users should be able to change the ownership of files to be other users without requiring some sort of privilege, you just don't understand the concepts.

Namespaces makes it possible to escape from a chroot, because they let the user become root inside a changeroot.

But namespaces are intended to replace chroot, so you would not be likely to use chroot and namespaces together.

now, once distros start enabling all these namespaces by default, they end up weakening the security of anything that's using chroot, but if a distro is doing that, the distro should be changing the programs to be locked down via namespaces limitations instead

nobody should be using Fedora in production, it's bleeding edge, and exposing this sort of security problem where namespaces interact badly with each other and with other features is exactly the sort of bleeding that such a distro produces.

Complexity

Posted Mar 25, 2013 11:00 UTC (Mon) by talex (subscriber, #19139) [Link]

> why should you be able to install new software on the system without the permission of the admin of that system?

That's just the way Linux works. Any user can cause executable files to be written to their home directory, and can then run them. But, like most people, I am the admin of my computer, so I don't need to ask anyone's permission to install software.

> if 0install needs to run tar as root to install it's applications, and you don't want to trust tar as root, then you shouldn't trust it. untar the files as the user and then change their permissions afterwords.

0install doesn't run tar as root. It runs it as my normal user. But that's still more privileges than I'd like to give it.

For example, let's say I'm installing OpenTTD. Currently, 0install downloads the archive as my user (talex), unpacks it, verifies it, and runs it. OpenTTD does not gain root privileges on my system, but it does run with my user privileges. I'd like to restrict it further so that, for example, it can't read or write to my home directory (or anywhere except its own data directory).

> But namespaces are intended to replace chroot, so you would not be likely to use chroot and namespaces together.

So, what is the replacement for chroot in the new namespaces world then? Should I unmount all existing filesystems and mount something new over the real root? I'm not sure how to do that.

Complexity

Posted Mar 25, 2013 18:46 UTC (Mon) by luto (subscriber, #39314) [Link]

Chrooting to an empty, unwritable directory, closing fds and dropping privileges denies useful filesystem access. A kernel that suddenly changes that is not okay and should be fixed. (And that's one of the bugs I found. Guess I might as well make the whole thing public.)