Linux systems such as those assembled by Apertis contain components licensed under many different licenses. These licenses impose different conditions, and it is important to understand, to a good degree of fidelity, the terms under which each component is provided. We propose to implement an automated process to generate software Bills Of Materials (BOMs) which detail both the components used in Apertis and the licensing that applies to them. Licensing isn't static, nor is it always as simple as all the components from a given source package sharing the same license. Packages have been known to change licenses and/or provide various existing or new components under different terms. Either now or at some point in the future, some of the components in Apertis may start to be provided under terms that Apertis may wish to avoid. For example, by default Apertis is careful not to include components licensed under the GPL version 3 in the target system, as those licensing terms wouldn't be acceptable in Apertis’ target markets.

In order to take advantage of new functionality and support being developed in the software community, Apertis needs to incorporate newer versions of existing software packages and replace some with alternatives when better or more suitable components are created. To ensure that the licensing conditions remain favorable for the use cases targeted by Apertis, it is important to continually validate the licensing terms under which these components are provided. These licensing terms should be documented in a way that is accessible to Apertis’ users.

Debian packages by default track licensing at the source package level. The suitability of a package is decided at that level before it is included in Debian, which meets the project's licensing goals. Apertis will continue to evaluate licensing before the inclusion of source packages in the distribution, but also wishes to take a more nuanced approach, tracking licensing for each file in each of its binary packages. By tracking licensing to this degree we can exclude components with unsatisfactory licensing from the packages intended for distributed target systems, whilst still packaging them separately so they may be utilized during development. A good example of this situation is the gcc source package and the libgcc1 binary package produced by it. Unlike the other artifacts produced by the gcc source package, the libgcc1 binary package is not licensed under the stock GPLv3 license: a runtime exception is provided and it is thus fine to ship it on target devices. The level of tracking we are proposing will detect such situations and will offer a straightforward way to resolve them, maintaining compliance with the licensing requirements.

To achieve this, two main steps need to be taken:

  • Record the licensing of the project source code, per file
  • Determine the mapping between source code files and the binary/data files in each binary package

We recommend integrating these steps into our CI pipelines to provide early detection of any change to the licensing status of each package. Extending our CI pipelines will also enable developers to learn about new issues and to solve them during the merge request development flow.

License scanners

There are various proprietary and open source tools which can help with tracking the licensing terms that apply to the pieces of software from which Apertis is built. The following tools are examples of those that can help to achieve the first of the steps outlined above:

  • Dependency-Track: Open source, higher level tool for presenting data from BOMs generated by other software.
  • FOSSA: Proprietary suite for license compliance tracking and management.
  • FOSSID: Proprietary license scanning tool.
  • FOSSology: Open source, server based tool, utilizing a number of techniques to extract licensing information from source and binary artifacts.
  • Licensecheck: A simple open source license checker.
  • Licensee: Open source tool, limited to scanning for license files.
  • Ninka: Very limited, lightweight, open source tool, developed as a research project aimed at identifying licenses in source code.
  • Protex: Part of the Black Duck suite of proprietary tools for managing open source compliance.
  • ScanCode: Suite of open source tools, which provide a foundation on which the company developing them provides its proprietary enterprise solution.
  • WhiteSource: Proprietary suite for open source component management.

Due to the open source nature of the Apertis project, we intend to utilize an open source tool for license compliance rather than a proprietary solution. Given its traction, community, and Linux Foundation involvement, the open source tool we suggest for license scanning is FOSSology.

FOSSology

FOSSology is a server based tool which provides a web front-end and is able to scan through source code (and to a degree binaries) provided to it, finding license statements and texts. To achieve this, FOSSology employs a number of different scanning techniques to identify potential licenses, including matching against known license texts and keywords. The scanning process errs on the side of caution, preferring to generate false positives over missing potential licensing information. As a result, it will be necessary to “clear” the licenses that are found, deciding whether the matches are valid or not. This is likely to be a very time consuming process, though bulk recognition of identical patterns may provide some efficiencies. Once clearing is completed, FOSSology will record the licensing decisions and can apply this information to updated scans of the source. It is anticipated that, after an initial round of verification, FOSSology will only require additional clearing of license information should a scan detect new sources of potential licensing information in an updated project's source, or when new packages are added to Apertis. It is possible to export and import reports which contain the licensing decisions that have previously been made. If a trusted source of reports can be found, these could also be imported, potentially reducing the work required.

FOSSology is backed by the Linux Foundation and appears to have an active user and developer base and a significant history. As such, we expect this tool to be maintained for the foreseeable future, making it a good choice for integration into the Apertis workflow.

CI Pipeline integration

In order to avoid manual tasks, license detection should be integrated into the CI process. FOSSology provides a REST API to enable such integration.

FOSSology is able to consume branches of git repositories, thus allowing scanning of the given source code straight from GitLab. It is suggested that this should be triggered after normal build checks have been successfully performed. A report which describes (among other things) the licensing status of each file will be generated and retrieved using the REST API. The report can be generated in a number of formats, including various SPDX flavors that are easily machine parsable, which will be the preferred option. It is suggested that each component should require a licensing determination to have been made for every file in the project. Due to the large volume of licensing matches that will result from the initial licensing scan, we recommend that the absence of license information initially generates a warning. In some cases, to achieve the fine grained licensing information desired, the licensing of some files may need to be clarified with the component's author(s). Once an initial pass of all Apertis components has been made, we would expect missing license information to result in an error. Such errors would be the result of new matches being found, which would need to be resolved in FOSSology before CI would complete without an error. The generated report should be saved in the Debian metadata archive so that it is available for the subsequent processing.
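
The warning-then-error policy described above could be sketched as a small CI check over the retrieved report. This is a minimal illustration only, assuming the report is in SPDX tag-value form where each file entry starts with a `FileName:` tag and carries a `LicenseConcluded:` tag. The function names and the `strict` switch are hypothetical, and multi-line `<text>` values are not handled.

```python
import sys

# Values treated as "no licensing decision has been made" (assumption).
UNRESOLVED = {"", "NOASSERTION", "NONE"}


def files_without_license(spdx_text):
    """Return the names of files whose LicenseConcluded is missing or unresolved."""
    concluded = {}
    current = None
    for line in spdx_text.splitlines():
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Tag: value" line
        tag, value = tag.strip(), value.strip()
        if tag == "FileName":
            current = value
            concluded[current] = ""  # no decision seen yet for this file
        elif tag == "LicenseConcluded" and current is not None:
            concluded[current] = value
    return sorted(name for name, lic in concluded.items() if lic in UNRESOLVED)


def check_report(spdx_text, strict=False):
    """Return a CI exit code: 0 to pass, 1 to fail.

    With strict=False missing decisions only produce warnings (initial phase);
    with strict=True they fail the job (after the initial clearing pass).
    """
    missing = files_without_license(spdx_text)
    level = "error" if strict else "warning"
    for name in missing:
        print(f"{level}: no license decision for {name}", file=sys.stderr)
    return 1 if (missing and strict) else 0
```

A pipeline job could call `check_report()` with `strict=False` during the initial clearing phase and switch to `strict=True` once all components have been through a first pass.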

Binary to source file mapping

Now that we have a way to determine the licensing of the source files, we need a way to determine which of these source files were used in each binary. Compilers store information in the binaries they output that can be used by a debugger to pause execution of a process at a point corresponding to a selected line of source code. This information provides a mapping between the lines of source code and the compiled machine code operations. Executable binaries in Linux are generally stored in the Executable and Linkable Format (ELF). The associated DWARF debugging data format is generally used to store this debugging information inside the ELF file in specific “debug” sections.

By parsing this information, the source files that were used to generate each binary can be determined. Combining this with the licensing information provided in the licensing report, a mapping can be made between each binary and its associated licenses.
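
The combination step can be illustrated as follows. This is a sketch under stated assumptions: the list of source paths for a binary is taken as already extracted from its DWARF data by some reader (not shown here), the licensing report has been reduced to a dictionary keyed by path relative to the source root, and `build_root` is the directory the package was built in. All names are hypothetical.

```python
import posixpath


def normalize(path, build_root):
    """Turn a path as recorded in debug info into one relative to the source root."""
    path = posixpath.normpath(path)
    root = posixpath.normpath(build_root)
    if path.startswith(root + "/"):
        path = path[len(root) + 1:]
    return path


def map_binary(source_files, license_by_file, build_root):
    """Return {source path: license} for the sources that went into one binary.

    Sources that the report does not cover (for example headers that belong to
    other packages) are recorded as NOASSERTION so that they stay visible.
    """
    mapping = {}
    for raw in source_files:
        path = normalize(raw, build_root)
        mapping[path] = license_by_file.get(path, "NOASSERTION")
    return mapping


def binary_licenses(mapping):
    """Return the sorted, de-duplicated licenses that apply to the binary."""
    return sorted(set(mapping.values()))
```

Keeping the per-file mapping, rather than only the summary, preserves the detail needed to explain why a given license applies to a binary.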

CI Pipeline integration

Apertis uses the Open Build Service (OBS) platform to build the binary packages in a controlled manner across several architectures and releases. OBS utilizes dpkg-buildpackage behind the scenes to build each package. This utility will have access to the source licensing report as it is contained in the Debian metadata archive. As well as the source licensing, the Debian metadata archive contains configuration to help dpkg-buildpackage determine how to build the source. This is typically done with the help of debhelper, which provides helpers that simplify this process. We plan to extend debhelper to include a command to perform the mapping between the binary files produced by the build and the license of the associated source files, using the process laid out above, and recording this for each of the binary packages to be made. In addition, this helper should record the licensing attached to any other files that will be packaged as well. Typically the binaries are stripped (using a debhelper command called dh_strip) prior to packaging, removing the debug symbols from the binary and reducing its size. Whilst the debug symbols are kept and packaged separately in the dbgsym package, we suggest that it would be easier to perform the license mapping before this step. A report should be saved in each binary package covering the files shipped in that package. The report should be saved in /usr/share/doc/<package>/ in a machine parsable SPDX format.

The new debhelper command will need to be added to the build rules for each package. Whilst most packages make use of debhelper, many do so via higher level helpers that factor out common functionality, such as dh and CDBS, and this will add complexity to this task. There may be packages in Apertis that do not make use of debhelper; these packages will need special handling to ensure that the required steps are completed.

As these reports are provided by each binary package, the reports from installed packages can be accessed at image build time and amalgamated into an image wide report at that point should it be required. As a binary can be built from multiple sources, each with differing licenses, it will be necessary for the report to detail each file that is used to create each binary and the licensing under which it is provided. In some circumstances dual licensed source code may allow a binary to be effectively licensed under the terms of a single license, that is, the user has the option to pick a license that results in the whole binary being able to be provided under the terms of a single license. Where dual licensed source code isn't used, the terms of all applicable licenses should be declared. The terms of the various licenses may be considered compatible, allowing the binary to effectively be managed under the terms of the more restrictive license. For example, where a binary is derived from source code licensed under the GPLv2 and other source code licensed under the MIT license, the terms of both apply to the binary. However, as the terms of the MIT license will be met if the binary is used in accordance with the terms of the GPLv2, handling the binary as though it were licensed under the GPLv2 will ensure the terms of both are met. Not all possible combinations of licenses work out this way, which is why it is important to ensure that licensing is properly tracked.
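
The GPLv2 and MIT example could be expressed as a lookup like the one below. This is purely illustrative: the `SATISFIES` table is hypothetical, contains only the single case described in the text, and is not a statement about any other combination of licenses. Dual licensed ("A OR B") expressions are left out of the sketch.

```python
# Hypothetical table: complying with the key license is taken to also meet the
# terms of the licenses in its value set. It holds only the example from the
# text; a real table would need review for every entry.
SATISFIES = {
    "GPL-2.0-only": {"MIT"},
}


def effective_license(licenses):
    """Return one license under which the binary can be handled, or None.

    None means no single license in the set is known to cover the others, so
    the terms of all applicable licenses should be declared.
    """
    licenses = set(licenses)
    for candidate in sorted(licenses):
        others = licenses - {candidate}
        if others <= SATISFIES.get(candidate, set()):
            return candidate
    return None
```

Returning `None` for unknown combinations, instead of guessing, keeps the default behaviour on the cautious side: everything is declared unless a combination has been explicitly recorded.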

Binary Licensing Reporting

The approach each project using Apertis takes with regards to the reporting of licensing information should be driven by how this information is to be utilized, e.g. some projects may wish to parse the license information and present it in a single BOM file in HTML, XML or human readable text.

For the images provided by the Apertis project, we plan to combine the reports saved in /usr/share/doc/<package>/ into a single parsable file. Should it be required to provide some tool with which to interrogate the licensing which applies to the binary packages, the SPDX files can be imported into FOSSology.

CI Pipeline integration

Apertis utilizes Debos in its image generation pipeline. There is an existing tool available for the merging of SPDX documents. The generation of a combined BOM can be realized by utilizing this tool in a script to be run at the appropriate time during the image build process by integrating the script into the Debos recipes. Integrating scripts into the Debos recipes is an approach we have taken when generating the list of installed packages and list of files. It reduces the overhead and potential complexity of decompressing and mounting the images that would be necessary should the BOM be generated in a separate step.
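
As a rough sketch of the first half of such a script, the per-package reports could be located inside the image root as shown below before being handed to the merge tool. The `*.spdx` file name pattern is an assumption (the report file name is not fixed above), the function name is hypothetical, and the merge step itself is deliberately left out.

```python
from pathlib import Path


def collect_reports(rootfs, pattern="*.spdx"):
    """Return {package name: [report paths]} found under usr/share/doc.

    `rootfs` is the root directory of the image being built; `pattern` is a
    hypothetical naming convention for the per-package reports.
    """
    doc_root = Path(rootfs) / "usr" / "share" / "doc"
    reports = {}
    if not doc_root.is_dir():
        return reports
    for pkg_dir in sorted(doc_root.iterdir()):
        if pkg_dir.is_dir():
            found = sorted(pkg_dir.glob(pattern))
            if found:
                reports[pkg_dir.name] = found
    return reports
```

Because the lookup works on the unpacked root file system, it can run during the image build without the finished image needing to be decompressed or mounted.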