XML parsing

XML is used for a few formats within Apertis, although not as many as JSON. It is more commonly used internally by various GLib systems and tools, such as GSettings and D-Bus. Where XML is parsed by Apertis code, it typically comes from untrusted sources (untrusted web APIs or user input), so it must be validated extremely carefully to prevent exploits.


Parsing XML

XML should be parsed using a standard library, such as libxml2. The library checks the XML for well-formedness and safely parses the values it contains. The output from libxml2 is a hierarchy of parsed XML elements, from which the Apertis code must extract the data it requires. Navigating this hierarchy is still security critical, because a well-formed document may not conform to the expected format (the schema for that document). Strings should be checked to see if they are empty or invalid UTF-8; integer parsing should check for failure or unparsable characters; the parser should return an error if required elements are not encountered or expected attributes are missing; and so on.
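As an illustration of the integer check described above, the following is a minimal sketch in plain C rather than a definitive implementation. The helper name parse_int_strict is hypothetical and not part of libxml2 or Apertis. It assumes the input is the text content of an element or attribute that has already been extracted from the parsed hierarchy.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not a libxml2 API): parse @text as a base-10 int.
 * Returns true and stores the result in @out only if the whole string is
 * a number within the range of int. On failure, @out is left untouched. */
static bool
parse_int_strict (const char *text, int *out)
{
  char *end = NULL;
  long value;

  assert (out != NULL);

  /* Reject missing or empty content. */
  if (text == NULL || *text == '\0')
    return false;

  /* strtol() silently skips leading whitespace and accepts '+';
   * require the string to start with a digit or a minus sign. */
  if (!isdigit ((unsigned char) text[0]) && text[0] != '-')
    return false;

  errno = 0;
  value = strtol (text, &end, 10);

  /* No digits consumed, or unparsable characters after the number. */
  if (end == text || *end != '\0')
    return false;

  /* Overflow of long, or a value which does not fit in an int. */
  if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
    return false;

  *out = (int) value;
  return true;
}
```

With libxml2, the string passed to such a helper would typically be obtained from xmlNodeGetContent() or xmlGetProp(), both of which return NULL when the content or attribute is missing, and whose result must be released with xmlFree(). A false return should be treated as a parse error for the whole document instead of being replaced with a default value.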

Schema validation

Ideally, all XML formats will have an accompanying XML schema which describes the expected structure of the XML files. If a schema exists for an XML document which is stored in git (such as a GtkBuilder UI definition), that document can be validated at compile time, which can help catch problems without the need for runtime testing.

Schemas can be written in XSD or RelaxNG. The choice is largely a matter of personal preference, as both can describe the structure of typical document formats.

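One tool for this is xmllint, which validates XML documents against schemas. Its --schema option takes an XSD schema, and its --relaxng option takes a RelaxNG one. Given a schema called schema.xsd and an XML document called example.xml (listed in the xml_files variable), the following Makefile.am snippet will validate the document whenever make check is run: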

# XML documents to validate against schema.xsd
xml_files = example.xml

check-local: check-xml

check-xml: schema.xsd $(xml_files)
	xmllint --noout --schema schema.xsd $(xml_files)

.PHONY: check-xml

Various existing autotools macros for systems which use XML, such as those for GSettings, already validate the relevant XML files automatically.