The license is an important element of an open source project because it defines acceptable use cases, user rights, and contribution guidelines. There are several ways to identify the license from a project's source code, such as SPDX headers, the LICENSE file, and the COPYING file. However, an open source project may contain files from other projects and may use different licenses for different files.
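To illustrate why the project-level license alone is not enough, the following Python sketch (not part of the Apertis tooling) walks a source tree, records which top-level LICENSE or COPYING files exist, and collects any per-file SPDX-License-Identifier headers. The function name find_license_hints, the set of license file names, and the 20-line header window are illustrative assumptions.

```python
import os
import re

# Top-level files that conventionally carry the project license (assumed set)
LICENSE_FILES = {"LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING"}

# Matches e.g. "# SPDX-License-Identifier: MIT" or "/* SPDX-License-Identifier: MIT */"
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$")


def find_license_hints(root):
    """Return (sorted top-level license files, {relative path: SPDX expression})."""
    license_files = sorted(n for n in os.listdir(root) if n in LICENSE_FILES)
    spdx = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip VCS metadata
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    # SPDX headers usually sit near the top, so only scan the first lines
                    for _, line in zip(range(20), fh):
                        match = SPDX_RE.search(line.rstrip())
                        if match:
                            spdx[os.path.relpath(path, root)] = match.group(1)
                            break
            except OSError:
                continue  # unreadable file: leave it for a full scanner
    return license_files, spdx
```

Files that end up without an SPDX entry fall back to the project-level license only by assumption; since such files may have been copied from another project, a full per-file scan is still needed.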

Apertis has certain licensing expectations. To improve the accuracy of its licensing information, Apertis performs license scanning as part of its continuous integration process.

Ensuring continuous maintenance of open source license documentation

Maintaining open source license documentation is an incremental process:

When Apertis is rebased on a new version of Debian or new packages are added, the licensing of all packages involved is checked. At the project level, the Apertis team tries to run a full scan of all projects in each release cycle.

During development, updates are monitored. Integrating a new project into Apertis and updating the source code of an existing project are the operations that can change licensing. New projects can be integrated into Apertis at any time. When new sources are received for a project already in Apertis, the licensing of the project may change, or the licensing of some distributables within the project may differ from the prevalent license.
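Both cases described above (a changed license, or distributables that differ from the prevalent license) can be spotted by comparing per-file license maps between two versions of a package. The Python sketch below assumes such maps have already been extracted, for example from a copyright report; the names diff_licenses, prevalent_license and outliers are hypothetical, and treating the "prevalent" license as the most common one by file count is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LicenseChange:
    path: str
    old: Optional[str]  # None: the file is new in this version
    new: Optional[str]  # None: the file was removed


def diff_licenses(old: Dict[str, str], new: Dict[str, str]) -> List[LicenseChange]:
    """List every file whose license differs between two versions."""
    changes = []
    for path in sorted(set(old) | set(new)):
        before, after = old.get(path), new.get(path)
        if before != after:
            changes.append(LicenseChange(path, before, after))
    return changes


def prevalent_license(licenses: Dict[str, str]) -> Optional[str]:
    """Most common license across files (assumed definition of 'prevalent')."""
    counts = Counter(licenses.values())
    return counts.most_common(1)[0][0] if counts else None


def outliers(licenses: Dict[str, str]) -> List[str]:
    """Files whose license differs from the prevalent one."""
    main = prevalent_license(licenses)
    return sorted(path for path, lic in licenses.items() if lic != main)
```

Any non-empty result from either check is a candidate for manual review rather than an automatic verdict.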

Open source software shipped with devices that users buy adds significant licensing constraints to the software stack of preview and product releases. These constraints do not apply to development releases, so some of the licensing work can be saved on those releases.

Regular checks of the whole archive have been integrated into the Apertis CI pipelines to provide early detection of any change to the licensing status of each package. A copyright report is generated and kept up to date using scan-copyrights, helping Apertis maintainers detect problematic licenses or missing information that may require a manual check.

Source code scanning with scan-copyrights

To validate the licensing of a package, the entire package source tree needs to be scanned to find the copyright holders and known licenses for each file. To achieve that, the scan-copyrights tool has been integrated into the Apertis CI pipeline, which rescans the package and updates its copyright report on every commit.

The scan-copyrights tool, written in Perl and shipped in the libconfig-model-dpkg-perl package, uses licensecheck to parse the source files and detect known licenses and copyright statements. It outputs the result either as plain text or in the Debian copyright file format.
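Because the report can use the machine-readable Debian copyright format (DEP-5), downstream checks can work on parsed paragraphs instead of raw text. The following Python sketch is a simplified parser written for illustration; it is not the parser used by scan-copyrights or the Apertis CI, and it only extracts the Files, Copyright and License fields of Files paragraphs. The names FilesEntry and parse_dep5 are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class FilesEntry:
    patterns: List[str]  # values of the Files field
    copyright: str       # Copyright field, continuation lines joined by newlines
    license: str         # short license name (first line of the License field)


def parse_dep5(text: str) -> List[FilesEntry]:
    """Extract Files paragraphs from a machine-readable debian/copyright file."""
    entries = []
    # Paragraphs are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        fields, key = {}, None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and key is not None:
                # Continuation line: append to the previous field
                fields[key] += "\n" + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if "Files" in fields:  # skip the header and standalone License paragraphs
            entries.append(FilesEntry(
                patterns=fields["Files"].split(),
                copyright=fields.get("Copyright", ""),
                license=fields.get("License", "").split("\n", 1)[0],
            ))
    return entries
```

Each FilesEntry then pairs a set of path patterns with the license scan-copyrights attributed to them, which is the information the checks described below operate on.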

Apertis packages keep an exhaustive copyright report in debian/apertis/copyright, containing information for every file in the source tree. When the report is generated, missing information and unacceptable licenses are reported; these may require manual review by developers to complete the package copyright report. Two files are used for this purpose:

  • debian/apertis/copyright.yml: Contains a YAML mapping in which each key is a Perl pattern matched against file paths, used to manually provide the correct copyright information. See "Filling the blanks" in the scan-copyrights documentation.

  • debian/apertis/copyright.whitelist: Lists, in gitignore format, the files to be ignored if they are reported with a missing or unacceptable license. Note that the CI pipeline updates the copyright report with information for every file and fails, reporting the problematic entries that were not whitelisted.
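The interplay of the two files can be sketched in Python as follows. This is a minimal model under stated assumptions, not the Apertis CI code: overrides stands for an already-loaded copyright.yml as a dict of regex keys, Python's re stands in for Perl patterns, the first matching key is assumed to win, unidentified licenses are assumed to appear as "UNKNOWN" or empty, and whitelisted is only a rough approximation of gitignore matching (no "!" negation, no "**" semantics). All function names are hypothetical.

```python
import fnmatch
import re

# Assumption: licenses the scanner could not identify appear as "UNKNOWN" or empty
UNIDENTIFIED = {"", "UNKNOWN"}


def whitelisted(path, patterns):
    """Rough gitignore-style match of a relative path against whitelist lines."""
    for raw in patterns:
        pat = raw.strip()
        if not pat or pat.startswith("#"):
            continue  # blank lines and comments, as in gitignore
        if pat.endswith("/"):
            # Directory pattern: everything below it (anchored, for simplicity)
            if path.startswith(pat.lstrip("/")):
                return True
        elif "/" in pat:
            # Patterns containing a slash are matched against the full path
            if fnmatch.fnmatchcase(path, pat.lstrip("/")):
                return True
        elif fnmatch.fnmatchcase(path.rsplit("/", 1)[-1], pat):
            # Patterns without a slash match the file name at any depth
            return True
    return False


def apply_override(path, lic, overrides):
    """Return the license from the first override whose key matches the path."""
    for pattern, info in overrides.items():
        if re.search(pattern, path) and "license" in info:
            return info["license"]
    return lic


def check_report(files, overrides, whitelist, unacceptable):
    """files: {path: license}. Return problematic paths that are not whitelisted."""
    failures = []
    for path, lic in sorted(files.items()):
        lic = apply_override(path, lic, overrides)
        problematic = lic in UNIDENTIFIED or lic in unacceptable
        if problematic and not whitelisted(path, whitelist):
            failures.append(path)
    return failures
```

A CI job built on this model would exit with a non-zero status whenever check_report returns a non-empty list, mirroring the failure behavior described above.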

By default, scan-copyrights tries to keep reports short by using wildcards to create generic entries for folders. Unfortunately, in some scenarios this also forces the use of wildcards when whitelisting or overriding licenses that do not comply with Apertis policies. Since a wildcard in these scenarios can cover more files than intended and hide errors, this approach should only be used when strictly needed. To avoid it, another file can be used to tell scan-copyrights not to use wildcards and to report the license of every file, even when all the files in a given folder share the same license.

This is activated when the file debian/apertis/copyright-long is present. The content of the file is ignored, but it is good practice to write a comment in it explaining why it is there.
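The trade-off between short and long reports can be illustrated with a deliberately simplified Python model. It collapses a folder into a single "folder/*" entry only when all files directly inside it share one license; the real scan-copyrights algorithm is more elaborate, and the function names summarize and long_mode_enabled are hypothetical. Only the debian/apertis/copyright-long path comes from the text above.

```python
import os
import posixpath
from collections import defaultdict


def summarize(files, long_mode):
    """files: {relative path: license}. Return {path or wildcard: license}."""
    if long_mode:
        # Long mode: one entry per file, no wildcards
        return dict(sorted(files.items()))
    by_dir = defaultdict(dict)
    for path, lic in files.items():
        by_dir[posixpath.dirname(path)][path] = lic
    report = {}
    for folder, entries in sorted(by_dir.items()):
        licenses = set(entries.values())
        if len(entries) > 1 and len(licenses) == 1:
            # Every file in the folder shares a license: collapse to a wildcard
            report[folder + "/*" if folder else "*"] = licenses.pop()
        else:
            report.update(sorted(entries.items()))
    return report


def long_mode_enabled(source_root):
    """Long mode is toggled by the mere presence of the marker file."""
    marker = os.path.join(source_root, "debian", "apertis", "copyright-long")
    return os.path.exists(marker)
```

In this model, any whitelist or override written against a collapsed entry such as "src/*" also silently covers files added to that folder later, which is the kind of error that per-file reporting avoids.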

Future improvements

FOSSology is a license reporting tool. It is being integrated into Apertis as a replacement for scan-copyrights, as part of an effort to enable end-to-end tracking of licensing information. Although scan-copyrights has helped a lot in automating the process, the FOSSology-based approach will provide finer-grained and more reliable license identification, extending to the identification of the licensing that applies to each binary package.