IDG News Service

Experimental container support for 2.6.24

IDG News Service, The Industry Standard, 11.07.2007

By Jonathan Corbet, IDG News Service

"Containers" are a form of lightweight virtualization as represented by projects like OpenVZ. While virtualization creates a new virtual machine upon which the guest system runs, container implementations work by making walls around groups of processes. The result is that, while virtualized guests each run their own kernel (and can run different operating systems than the host), containerized systems all run on the host's kernel. So containers lack some of the flexibility of full virtualization, but they tend to be quite a bit more efficient.

As of 2.6.23, virtualization is quite well supported on Linux, at least for the x86 architecture. Containers, in contrast, lag a little behind. It turns out that, in many ways, containers are harder to implement than virtualization is. A container implementation must wrap a namespace layer around every global resource found in the kernel, and there are a lot of these resources: processes, filesystems, devices, firewall rules, even the system time. Finding ways to wrap all of these resources in a way which satisfies the needs of the various container projects out there, and which also does not irritate kernel developers who may have no interest in containers, has been a bit of a challenge.

Full container support will get quite a bit closer once the 2.6.24 kernel is released. The merger of a number of important patches in this development cycle fills in some important pieces, though a certain amount of work remains to be done.

Once upon a time, there was a patch set called process containers. The containers subsystem allows an administrator (or administrative daemon) to group processes into hierarchies of containers; each hierarchy is managed by one or more "subsystems." The original "containers" name was considered to be too generic - this code is an important part of a container solution, but it's far from the whole thing. So containers have now been renamed "control groups" (or "cgroups") and merged for 2.6.24.

Control groups need not be used for containers; for example, the group scheduling feature (also merged for 2.6.24) uses control groups to set the scheduling boundaries. But it makes sense to pair control groups with the management of the various namespaces and resource management in general to create a framework for a containers implementation.

The management of control groups is straightforward. The system administrator starts by mounting a special cgroup filesystem, associating the subsystems of interest with the filesystem at mount time. There can be more than one such filesystem mounted, as long as each subsystem appears on at most one of them. So the administrator could create one cgroup filesystem to manage scheduling and a completely different one to associate processes with namespaces.
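As a rough sketch of that mounting rule, the following Python builds one mount command per cgroup filesystem and rejects a layout in which a subsystem is attached to more than one of them. The helper name plan_cgroup_mounts and the mount points are hypothetical, and the subsystem names "cpu" and "ns" are assumed to stand for the group scheduler and the namespace subsystem. Nothing is actually mounted here, since that would require root.

```python
def plan_cgroup_mounts(hierarchies):
    """Return one mount command (as an argv list) per cgroup filesystem.

    `hierarchies` maps a mount point to the list of subsystems to attach.
    Raises ValueError if a subsystem is requested for more than one mount.
    """
    owner = {}      # subsystem name -> mount point that already claimed it
    commands = []
    for mount_point, subsystems in hierarchies.items():
        for name in subsystems:
            if name in owner:
                raise ValueError(
                    f"subsystem {name!r} is already attached to {owner[name]}"
                )
            owner[name] = mount_point
        # Subsystems are selected with the -o option at mount time.
        options = ",".join(subsystems)
        commands.append(
            ["mount", "-t", "cgroup", "-o", options, "cgroup", mount_point]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical layout: one filesystem for scheduling, one for namespaces.
    layout = {"/mnt/sched": ["cpu"], "/mnt/ns": ["ns"]}
    for argv in plan_cgroup_mounts(layout):
        print(" ".join(argv))
```

Keeping the check separate from the mount itself means a bad layout is caught before any filesystem is touched.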

Once the filesystem is mounted, specific groups are created by making directories within the cgroup filesystem. Putting a process into a control group is a simple matter of writing its process ID into the tasks virtual file in the cgroup directory. Processes can be moved between control groups at will.
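The two steps described above (make a directory, then write a PID into its tasks file) might look roughly like this in Python. The function names are hypothetical, and the code only performs ordinary file operations, so it behaves the same way against a scratch directory as against a mounted cgroup filesystem. On a real system the root would be the mount point and the calls would need appropriate privileges.

```python
import os


def create_group(cgroup_root, name):
    """Create a control group by making a directory in the cgroup filesystem."""
    path = os.path.join(cgroup_root, name)
    os.makedirs(path, exist_ok=True)
    return path


def add_task(group_path, pid):
    """Put a process into the group by writing its PID to the tasks file."""
    with open(os.path.join(group_path, "tasks"), "w") as tasks:
        tasks.write(f"{pid}\n")


def move_task(pid, new_group_path):
    """Moving a process is just another write, to the new group's tasks file."""
    add_task(new_group_path, pid)
```

A caller would do something like add_task(create_group(root, "batch"), os.getpid()), where root and the group name "batch" are placeholders.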

The concept of a process ID has gotten more complicated, though, since the PID namespace code was also merged. A PID namespace is a view of the processes on the system. On a "normal" Linux system, there is only the global PID namespace, and all processes can be found there. On a system with PID namespaces, different processes can have very different views of what is running on the system. When a new PID namespace is created, the only visible process is the one which created that namespace; it becomes, in essence, the init process for that namespace. Any descendants of that process will be visible in the new namespace, but they will never be able to see anything running outside of that namespace.
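The visibility rules can be illustrated with a toy model. This is not kernel code and does not create real namespaces; the class names are invented for illustration, and the model simplifies matters by placing the first process directly into the new namespace. It assumes only what the text describes: a process is visible in its own namespace and in every enclosing one, the first process in a new namespace acts as its init, and processes inside cannot see out.

```python
import itertools


class PidNamespace:
    """Toy model of a PID namespace (illustrative only)."""

    def __init__(self, parent=None):
        self.parent = parent
        self._next_pid = itertools.count(1)   # numbering starts at 1 per namespace
        self.processes = {}                   # PID in this namespace -> Process

    def chain(self):
        """Yield this namespace, then each ancestor up to the global one."""
        ns = self
        while ns is not None:
            yield ns
            ns = ns.parent


class Process:
    def __init__(self, name, namespace):
        self.name = name
        self.namespace = namespace
        self.pids = {}                        # namespace -> PID as seen from there
        # A process receives a PID in its own namespace and in every ancestor.
        for ns in namespace.chain():
            pid = next(ns._next_pid)
            self.pids[ns] = pid
            ns.processes[pid] = self

    def visible(self):
        """Processes this one can see, keyed by PID in its own namespace."""
        return {pid: p.name for pid, p in self.namespace.processes.items()}


root = PidNamespace()
init = Process("init", root)                  # PID 1 in the global namespace
shell = Process("shell", root)                # PID 2 in the global namespace
inner = PidNamespace(parent=root)
c_init = Process("container-init", inner)     # PID 1 inside, PID 3 outside
worker = Process("worker", inner)             # PID 2 inside, PID 4 outside

print(worker.visible())   # only the two processes in the inner namespace
print(shell.visible())    # all four processes, under their global PIDs
```

Note that the same process carries a different PID in each namespace from which it is visible.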

Virtualizing process IDs in this way complicates a number of things. A process which creates a namespace remains visible to its parent in the old namespace - and it may not have the same process

