Apertis can be used with only a touchscreen. In this case, the user needs an on-screen keyboard to enter information such as passwords, URLs and messages.
This document outlines the current state of the Wayland protocols dealing with input methods, their implementation status as well as a possible approach for integrating this support into Apertis.
Terminology and concepts
In Wayland, multiple protocols are involved to allow users to enter text.
The text-input protocol allows compositors to send text to applications in a way which supports various input methods other than direct physical keyboard input. Examples of this include complex text composition methods such as those used for CJK writing systems, in which each character is typically composed from multiple keypresses, or state-aware input methods such as on-screen virtual keyboards which may offer text suggestions, correction, autocompletion, emoji, and other complex input types which are not supported by the traditional keyboard input mechanism.
A text input object is used to manage the state of what are typically text entry fields in the application. Client applications send enable/disable events to the compositor following text input focus changes (this is typically done by the GUI framework in use), and the compositor can then decide when and where to display the on-screen keyboard.
Apart from enable/disable events, a number of state requests may also be sent by the client, allowing the compositor to keep track of the state of the input field. For example, set_content_type can be used by the client to specify what kind of text is expected, while set_cursor_rectangle can be used to specify an area around the cursor and thus allow the compositor to put a window with word suggestions near the cursor, without obstructing the text being input.
This protocol is currently on version v3 upstream, with v4 being discussed.
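As an illustration of how a compositor might use the rectangle reported through set_cursor_rectangle, the following minimal sketch places a suggestions window next to the cursor without covering it. The Rect type, the place_suggestions function and the "below first, then above" policy are assumptions made for this example; they are not part of the text-input protocol or of any compositor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def place_suggestions(cursor: Rect, popup_w: int, popup_h: int,
                      screen_w: int, screen_h: int) -> Rect:
    """Return a popup rectangle adjacent to `cursor` (hypothetical policy)."""
    # Prefer the space directly below the cursor rectangle.
    below_y = cursor.y + cursor.height
    if below_y + popup_h <= screen_h:
        y = below_y
    else:
        # Otherwise place the popup above the cursor. If the popup is taller
        # than the space above, it is clamped to the top edge and may overlap.
        y = max(0, cursor.y - popup_h)
    # Align with the cursor's left edge, clamped so the popup stays on screen.
    x = min(max(0, cursor.x), max(0, screen_w - popup_w))
    return Rect(x, y, popup_w, popup_h)
```

Because the compositor knows the full screen layout, it can apply this kind of policy centrally instead of every application guessing where the popup fits.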
The input-method protocol allows the compositor to delegate the work of letting the user input text to another program.
This protocol is very similar to text-input: it lets a program (e.g. an on-screen keyboard application) send text to the compositor, and allows the compositor to tell this program what kind of text is needed.
The program will then interact with the user (e.g. through the on-screen keyboard) and give the resulting text to the compositor. Once received, the compositor will typically send the text onward to the currently focused application using the text-input protocol, creating a chain: special program → compositor → focused application.
Additionally, because there is typically only one application using this protocol, it can do things which would not work with multiple applications. One of them is grabbing the keyboard, so that the input method receives all hardware keyboard input (exclusive grab). This allows the input method to preprocess the input before forwarding it, which is common for CJK languages: for example, the input method can send the text “你好” when “nihao” is typed.
Upstream currently supports version v1 of this protocol, with version v2 available and v3 under development.
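The chain described above (input method → compositor → focused application), together with the exclusive keyboard grab, can be modelled with a small sketch. Everything here is hypothetical: ToyInputMethod, ToyCompositor and the two-entry PINYIN_TABLE are stand-ins for illustration, and the space key is assumed to confirm a composition. No real Wayland API is used.

```python
from typing import Callable, List

# Hypothetical toy dictionary; a real input method uses a much larger model.
PINYIN_TABLE = {"nihao": "你好", "xiexie": "谢谢"}


class ToyInputMethod:
    """Collects grabbed key presses and commits composed text."""

    def __init__(self, commit: Callable[[str], None]) -> None:
        self._commit = commit
        self.preedit = ""  # text typed so far but not yet committed

    def on_key(self, key: str) -> None:
        if key == " ":
            # Assumed convention: space confirms the current composition.
            text = PINYIN_TABLE.get(self.preedit, self.preedit)
            if text:
                self._commit(text)
            self.preedit = ""
        else:
            self.preedit += key


class ToyCompositor:
    """Routes grabbed keys to the input method and text to the focused app."""

    def __init__(self) -> None:
        self.focused_app_text: List[str] = []
        self.input_method = ToyInputMethod(self.forward_text)

    def on_hardware_key(self, key: str) -> None:
        # Exclusive grab: every hardware key goes to the input method first.
        self.input_method.on_key(key)

    def forward_text(self, text: str) -> None:
        # Stands in for delivering text to the application via text-input.
        self.focused_app_text.append(text)
```

Typing the keys of "nihao" followed by a space into ToyCompositor.on_hardware_key results in the single string "你好" being forwarded to the focused application, rather than five separate key events.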
The virtual-keyboard protocol is designed for programs which want to tell the compositor to issue “fake” keyboard events, as if they came from a physical keyboard.
By emulating key presses, this should allow inputting text in legacy applications which don’t support the text-input protocol, as well as triggering actions which would normally need a keyboard.
Note that if the compositor allows a keyboard to perform arbitrary actions, it should prevent untrusted clients from using this interface.
This protocol is not yet available upstream, with a proposal adding v1 support currently under discussion.
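To make the idea of emulated key presses more concrete, the sketch below turns a string into a sequence of press and release events and refuses to do so for untrusted clients. The KEYMAP table, the text_to_key_events function and the boolean trusted flag are assumptions for illustration only; the numeric codes follow the Linux evdev numbering but should be treated as example values, and how a compositor decides that a client is trusted is outside the scope of this sketch.

```python
from typing import List, Tuple

# Hypothetical keymap: character -> (keycode, needs_shift).
KEYMAP = {"a": (30, False), "A": (30, True), "b": (48, False), "1": (2, False)}
SHIFT = 42  # example keycode for the left Shift key

KeyEvent = Tuple[int, str]  # (keycode, "press" or "release")


def text_to_key_events(text: str, trusted: bool) -> List[KeyEvent]:
    """Translate `text` into fake key events for a trusted client."""
    if not trusted:
        # Injected keys can trigger arbitrary actions, so refuse early.
        raise PermissionError("untrusted client may not inject key events")
    events: List[KeyEvent] = []
    for ch in text:
        keycode, needs_shift = KEYMAP[ch]  # KeyError for unmapped characters
        if needs_shift:
            events.append((SHIFT, "press"))
        events.append((keycode, "press"))
        events.append((keycode, "release"))
        if needs_shift:
            events.append((SHIFT, "release"))
    return events
```

The sketch also shows why this path is only a fallback: characters that are not in the keymap cannot be expressed as key presses at all, whereas text-input carries arbitrary text.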
- A user wants to enter text without a physical keyboard (i.e. using an on-screen keyboard)
- A user wants to be able to enter text in a number of languages and writing systems (e.g. English/Latin, CJK)
- A user wants to be able to make use of text input features such as correction and completion suggestions
- A user wants to be able to select and input emoji characters
The chosen on-screen keyboard implementation must:
- allow the keyboard layout to be configured
- be automatically enabled when the user selects a text input field, and allow users to show it manually (for legacy applications - see below)
- not require any changes to the applications themselves
In previous versions of Apertis, a custom widget in the client application was used for spell checking. This widget was built to exclusively target the legacy Mildenhall platform, and thus it brings several problems:
- It is exclusively tied to Mildenhall applications; no other GUI frameworks are supported.
- Each app has its own instance of the widget, so the application side is also responsible for tasks such as positioning the keyboard on the screen, while not actually knowing the full screen layout as a compositor does.
Attempting to migrate it away from being dependent on Mildenhall would essentially amount to a full rewrite, resulting in little advantage versus an alternative solution.
Using a New Widget
Even if a new in-application widget were created, both of these points would still apply:
- There would need to be a widget for every graphical application framework, and every application using the framework would need to explicitly include the widget.
- The problems and inefficiency with having the application position the keyboard on the screen without full knowledge of the entire screen persist outside of Mildenhall, as it is purely an architectural problem.
Implementations in Other Systems
Qt uses a framework-specific, client-side plugin, qtvirtualkeyboard. Thus, like the obsolete Mildenhall speller widget mentioned previously, qtvirtualkeyboard is exclusively tied to Qt applications. In addition, code reuse is not possible, as it is under the GPLv3 license.
LG webOS OSE uses custom plugins on top of the Maliit IME framework. Although Maliit itself is LGPL-2.1, the reference keyboard implementation is under LGPL-3.
A fully-fledged input method program will be a Wayland client using the input-method protocol for submitting text, but also supporting virtual-keyboard for submitting actions, and as a fallback for legacy applications.
A compositor would ferry text around between the input method program and whichever application is focused. It would also carry synthetic keyboard events from the input method program to the focused application.
An application consuming text would support text-input, generally through a GUI framework like GTK or Qt, and it would send enable and disable events whenever a text input field comes into focus or becomes unfocused.
Legacy applications won’t send enable and disable events, even when a text field is focused and the user is ready to type. When that happens, the compositor and the input method can’t know when to display the on-screen keyboard or when text should be submitted. Because of that, it’s best to always make sure the user can bring up the on-screen keyboard manually to input text, which would then be delivered as keyboard events (which are always supported by applications) via the virtual-keyboard protocol.
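The fallback logic described above can be summarised as a small decision function. This is a sketch under stated assumptions: FocusedApp, choose_delivery and the returned string labels are hypothetical names used only to illustrate the rule, not part of any protocol or compositor.

```python
from dataclasses import dataclass


@dataclass
class FocusedApp:
    name: str
    text_input_enabled: bool  # True once the app has sent an enable event


def choose_delivery(app: FocusedApp, user_requested_keyboard: bool) -> str:
    """Pick how text typed on the on-screen keyboard reaches `app`."""
    if app.text_input_enabled:
        # The application supports text-input and has a focused text field.
        return "text-input"
    if user_requested_keyboard:
        # Legacy application: the user opened the keyboard manually, so
        # text is delivered as emulated key events.
        return "virtual-keyboard"
    # No text field is known to be active and nobody asked for the keyboard.
    return "none"
```

For example, a legacy application with the keyboard shown manually yields "virtual-keyboard", while an application that sent an enable event always yields "text-input".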
Currently, the majority of on-screen keyboard applications were developed for the X11 display server. For Wayland, only a few are available:
- weston-keyboard
- Maliit Keyboard 2
- Squeekboard
weston-keyboard
A simple implementation of an on-screen keyboard. The application only supports Roman, numeric and Arabic keyboards, which are hardcoded, and it is built on top of outdated versions of the text-input and input-method protocols. (This can be improved, however.)
- License: X11, MIT and CC-BY-SA
- Languages: C
Maliit Keyboard 2
Maliit Keyboard 2 is an evolution of the Ubuntu Keyboard plugin for Maliit, which can be run standalone and supports many different languages and emoji.
- License: LGPL-3, BSD and CC-BY (The license of the combined work is LGPL-3.0-only)
- Languages: QML, C++
Squeekboard
Squeekboard has been developed as the on-screen keyboard for the Librem 5 phone OS, which uses the Phoc compositor, itself based on wlroots.
- License: GPL-3
- Languages: Rust, C
The following table lists GUI frameworks, clients and compositors and their corresponding implementation status:
The only on-screen keyboard implementation that is currently GPL-3 free, and therefore matches Apertis licensing expectations, is weston-keyboard, which is a simple demo version. Apertis may therefore either improve it, use one of the other existing implementations (as a GPL-3 exception), or implement a new one from scratch.
The recommended approach is to patch Weston to support the latest protocol versions and ship weston-keyboard as the reference on-screen keyboard implementation. A merge request exists to integrate text-input v3, input-method v2 and virtual-keyboard v1 into Weston (including weston-keyboard) and could be used as a starting point.
As needed, weston-keyboard could potentially be forked as a separate project from Weston to allow using a more modern GUI toolkit for its implementation.
Optionally, other changes could be made to weston-keyboard to improve or implement new features such as supporting more languages or adding emoji support.
This should allow existing applications to interact with the on-screen keyboard without modifications, even for legacy applications not supporting the text-input protocol.
The different Wayland protocols involved in an on-screen keyboard are currently under development and subject to change; see Stalled Upstream Protocol Work.
This documentation and some of the illustrations are based on or come from:
- Wayland and input methods blog post
- It’s not about keyboards blog post
- Input Method Hub Wayland issue