JSON parsing

JSON is used for various formats within Apertis, and potentially also for various web APIs. It is a well-defined format, and several mature libraries exist for parsing it. However, the JSON being parsed typically comes from untrusted sources (user input or untrusted web APIs), so it must be validated extremely carefully to prevent exploits.

Summary

  • Use a standard library to parse JSON, such as json-glib. (Parsing JSON)
  • Be careful to pair up JSON reader functions on all code paths. (Parsing JSON)
  • Write a JSON schema for each JSON format in use. (Schema validation)
  • Use Walbottle to validate JSON schemas and documents. (Schema validation)
  • Use Walbottle to generate test vectors for unit testing JSON reader code. (Unit testing)

Parsing JSON

JSON should be parsed using a standard library, such as json-glib. That will take care of checking the JSON for well-formedness and safely parsing the values it contains. The output from json-glib is a hierarchy of parsed JSON nodes which may be values, arrays or objects. The Apertis code must then extract the data it requires from this hierarchy. This navigation of the hierarchy is still security critical, as the parsed JSON document may not conform to the expected format (the schema for that document). See Schema validation for more information on this.

When using json-glib, the JsonReader object is typically used to navigate a parsed JSON document and extract the required data. A common pitfall is to not pair calls to json_reader_read_member() and json_reader_end_member().

For example:

gint64
read_some_member (JsonReader *reader)
{
  gint64 retval;

  /* This code is incorrect. */
  if (!json_reader_read_member (reader, "member-name"))
    {
      /* Returns without calling json_reader_end_member(). */
      return -1;
    }

  retval = json_reader_get_int_value (reader);
  json_reader_end_member (reader);

  return retval;
}

This code is incorrect because json_reader_end_member() is not called on the code path where the member-name member doesn’t exist. That leaves the JsonReader in an error state, and any remaining read operations will silently fail.

Instead, the following should be done:

gint64
read_some_member (JsonReader *reader)
{
  gint64 retval = -1;

  if (json_reader_read_member (reader, "member-name"))
    {
      retval = json_reader_get_int_value (reader);
    }

  /* Called on every code path, including when the member doesn't exist. */
  json_reader_end_member (reader);

  return retval;
}

The same is true of other APIs, such as json_reader_read_element(). Read the API documentation for json-glib functions carefully to check whether the function will put the JsonReader into an error state on failure and, if so, how to get it out of that error state.

Schema validation

Ideally, all JSON formats will have an accompanying JSON schema which describes the expected structure of the JSON files. A JSON schema is analogous to an XML schema for XML documents. If a schema exists for a JSON document which is stored in git (such as a UI definition), that document can be validated at compile time, which can help catch problems without the need for runtime testing.

One tool for this is Walbottle, which allows validation of JSON documents against schemas. Given a schema called schema.json and two JSON documents called example1.json and example2.json, the following Makefile.am snippet will validate them when make check is run:

json_schema_files = schema.json
json_files = example1.json example2.json

check-local: check-json-schemas check-json

check-json-schemas: $(json_schema_files)
	json-schema-validate --ignore-errors $^

check-json: $(json_schema_files) $(json_files)
	json-validate --ignore-errors $(addprefix --schema=,$(json_schema_files)) $(json_files)

.PHONY: check-json-schemas check-json

Unit testing

JSON handling code is prone to breaking on invalid input, because it tends to assume that the input follows the correct schema, which untrusted input may not. It is therefore important to unit test such code. See the Unit testing guidelines for suggestions on writing code for testing. Ideally, the JSON parsing code is separated from whatever code calls it, so that it can be linked into unit tests by itself and passed JSON snippets to check what it retrieves from them.

Thinking of JSON snippets which thoroughly test parsing and validation code is hard, and cannot be done reliably without also using code coverage metrics (see the Tooling guidelines). However, given a JSON schema for the document, it is possible to automatically and exhaustively generate unit test vectors which can be copied into the unit tests to give good coverage.

This can be done using Walbottle:

json-schema-generate --valid-only schema.json
json-schema-generate --invalid-only schema.json

These commands generate a set of valid test vectors and a set of invalid test vectors respectively. Each vector is a JSON instance: the valid ones conform to the given schema, and the invalid ones do not.