This page gives an overview of how compositors work in the Wayland architecture. It is intended as a reference, and not as an argument for or against using Wayland or Weston. It focuses on the graphics and composition side of Wayland, rather than the input handling side.
Wayland is a protocol specifying communication between a display server (or compositor) and its clients, which are individual applications. Unlike X11, it does not specify a set of rendering primitives, or even a canonical protocol for transferring pixel data between clients and compositor. The most important primitives which Wayland defines are surfaces and buffers. A surface is an object representing a rectangular area on the screen, defined by a location, size and pixel content; a buffer provides pixel data when attached to a surface. The interactions between surfaces and buffers allow for double-buffering and glitch-free updates of windows, but are not directly relevant to how composition works, so this document will simply use ‘surface’ to refer to the combination of the two.
In the Wayland architecture, the window manager and compositor are combined (and together known as simply the ‘compositor’), performing these two main tasks:
- Receive pixel data from clients and composite it to form frames which are output to the screen.
- Handle user input and direct it to the appropriate client to be handled.
As well as performing composition of client windows, the compositor performs window management from within the same process. Weston is the reference, and most developed, implementation of a Wayland compositor. Mutter is another implementation.
Although this is not necessarily the case for all compositors, Weston is built from several components which are ‘plugged together’ to support each platform.
- Shell: Provides the shell UI, such as a task bar and clock. Different shells can be plugged in to provide different UI experiences.
- Renderer: Implements a specific method of rendering surfaces. For example, Weston has a software renderer (pixman), a GLES renderer (DRI), and hardware-specific renderers (e.g. RPI for the DispmanX hardware on Raspberry Pi).
- Backend: Implements the platform-specific parts of the compositor, instantiating one or more renderers, and handling all composition of surfaces. For example, if a platform can use hardware-specific APIs, it instantiates the appropriate hardware-specific renderer; if it also needs GLES support, it instantiates the DRI renderer. It can then pass surfaces to the most appropriate renderer for the current frame.
Other compositors are typically arranged similarly; for example, Mutter has a plugin system which can be used to implement different shells; GNOME Shell is one example.
Some more general terminology:
- Display controller: Hardware which processes pixels and overlays (see Compositing) and drives the screen.
- GPU: Hardware which implements 3D acceleration and the GLES pipeline. Its output is fed into the display controller.
In the Wayland architecture, all rendering of client UIs is performed by client code, typically by the graphics toolkit the client uses. This is no different from modern usage of X11.
The graphics toolkit may use whatever method it wishes to render UI elements: software rendering on the CPU, or hardware rendering using GLES. All Wayland requires is for the resulting pixels to be sent to the compositor for each frame and window the client renders. Pixel data may be transported in several ways, depending on how it was rendered, and what is mutually supported by the clients and compositor:
- Shared memory buffers containing actual pixel data. This is a fallback mechanism, used when no other method is mutually supported.
- GPU buffer sharing (DRM/DRI). Clients render windows directly on the GPU; the resulting pixel data remains in GPU memory, and a handle to it is passed to the compositor. This prevents unnecessary and expensive copying of pixel data.
Once the compositor has all the pixel data (or handles to GPU buffers containing it), it can composite a frame. As with client-side rendering, this can be done in several ways:
- Software rendering. CPU-intensive and used as a fallback. This also entails pulling pixel data out of GPU memory, which is expensive.
- Full GPU rendering using GLES. This takes the pixel data and composites it on the GPU, potentially applying shaders and 3D transformations if required for animations.
- Hardware-specific APIs on the display controller. These are generally 2D composition APIs which are less resource intensive than full 3D computation, but still keep processing on the display controller rather than the CPU, and do not require extra copies of the pixel data.
Different compositors use different approaches. As Mutter is tied to Clutter, it is constrained to using GLES for all rendering. Conversely, Weston implements several different renderers, so can choose the most efficient method of rendering depending on the requirements of the current frame (for example, if an animation is underway, or if any effects are being applied). For typical UIs, this will mean using hardware-specific APIs to composite the pixel data in 2D, as 3D effects are rarely needed, even when performing simple animations such as slides and fades. If full GLES 3D support is needed, Weston can choose to use the full GPU capabilities instead.
This is supported in Weston by the use of planes (known on some hardware as ‘overlays’). Planes are collections of surfaces, and each plane maps to a different overlay in hardware. Before rendering each frame, the compositor backend can choose which surfaces to put on each plane, resulting in them being rendered differently by the hardware. There was a good introductory talk on planes in Weston given at FOSDEM 2013.
For example, on a traditional graphics card, there are perhaps four hard-coded overlays:
- Primary: main overlay
- Scanout: a single, full-screen surface
- Sprite: typically a video overlay in a different colour space
- Cursor: a small overlay for the pointer cursor image
Within each plane, the GPU is used to composite all the surfaces to form an output frame for that plane, using the normal GLES pipeline. The hardware then has special support in the display controller for compositing the four planes to form the final output frame sent to the monitor. In this example, the planes are composited as a stack, with the cursor on top of the sprite, on top of the scanout, on top of the primary.
On more powerful embedded systems, the display controller (which is separate from the GLES pipeline) often has many more overlays, which are more general purpose. This means that Weston can assign fewer surfaces to each overlay, or divide them so that only one overlay needs to run through the GLES pipeline. In the best case, there is at most one surface per overlay, and no GLES processing needs to be done. There is an article on how the Weston Raspberry Pi backend uses planes to best advantage.
Journey of a pixel
As an illustration, consider the journey of a single UI element from being programmatically created in a client application, to appearing on the user’s screen. This assumes that textures are shared between clients and compositor by passing handles to GPU resources (rather than the other methods listed in client-side rendering). For this example, we use Weston as the compositor; if Mutter were used, steps 5–7 would be replaced by a single step: compositing all surfaces on the GPU using GLES, to form the final output frame.
1. The program creates a new widget using its UI toolkit.
2. The toolkit sets up the widget in its GLES context and uploads any necessary textures to the GPU.
3. When the application next renders a frame (e.g. due to part of the UI changing), it pushes its GLES context through the GPU’s GLES pipeline, creating an output texture in the GPU which contains pixel data for the entire application window.
4. The application uses the Wayland protocol to notify the compositor (Weston) of the updated window, passing a handle to the GPU texture.
5. When Weston next renders a frame, it determines if any surfaces need GLES transformations applied to them, and assigns the surfaces to planes and hardware overlays as required.
6. For each plane, Weston composites all the surfaces in that plane, creating output pixel data for that plane.
   - If any transformations are needed for a plane, the Weston renderer pushes the surfaces in that plane through the GPU GLES pipeline.
   - If no transformations are needed, or if the needed transformations can be implemented using more efficient hardware-specific APIs, this step is skipped.
7. Weston uses hardware-specific APIs to composite all the planes to form the final output frame.
8. The output frame is sent to the user’s screen.
(Figures omitted; their captions were: the windows as the developer would like them to appear; uploading textures to the graphics memory; client-side (e.g. Clutter) rendering of individual windows in GLES; notifying Weston of the updated window; compositing surfaces within each overlay in GLES; compositing overlays to form the final output in the display controller.)