LLGPL appears to be treated as a plain license, not as a license exception #91

Open
dvzrv opened this issue Feb 6, 2024 · 8 comments

@dvzrv

dvzrv commented Feb 6, 2024

Hi! 👋

As background: I have added this project as a backend for namcap, a validation tool for packages and build scripts that is used on Arch Linux. From what I can tell after integrating it, it works pretty well for our use case and helps us a great deal in being more compliant with SPDX license identifiers (see https://rfc.archlinux.page/0016-spdx-license-identifiers/). Thanks for that! 🎉

However, there appear to be edge cases and maybe you are able to help me in figuring this particular one out.

When trying to package an upstream that uses the LLGPL preamble, my assumption would be, after reading

> Exceptions are added to a license using the License Expression operator, "WITH".

on https://spdx.org/licenses/exceptions-index.html, that an expression such as LGPL-2.1-only WITH LLGPL would be valid. This does not seem to be the case though:

Python 3.11.6 (main, Nov 14 2023, 09:36:21) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from license_expression import get_spdx_licensing
>>> licensing = get_spdx_licensing()
>>> licensing.parse("LGPL-2.1-or-later WITH LLGPL", strict=True)
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/license_expression/__init__.py", line 540, in parse
    tokens = list(self.tokenize(
             ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/license_expression/__init__.py", line 604, in tokenize
    for token in tokens:
  File "/usr/lib/python3.11/site-packages/license_expression/__init__.py", line 1090, in replace_with_subexpression_by_license_symbol
    raise ParseError(
boolean.boolean.ParseError: A plain license symbol cannot be used as an exception in a "WITH symbol" statement. for token: "LLGPL" at position: 23

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.11/site-packages/license_expression/__init__.py", line 548, in parse
    raise ExpressionParseError(
license_expression.ExpressionParseError: A plain license symbol cannot be used as an exception in a "WITH symbol" statement. for token: "LLGPL" at position: 23
>>> licensing.parse("LGPL-2.1-or-later WITH LLGPL")
LicenseWithExceptionSymbol(license_symbol=LicenseSymbol('LGPL-2.1-or-later', aliases=('LGPL-2.1+',), is_exception=False), exception_symbol=LicenseSymbol('LLGPL', aliases=('LicenseRef-scancode-llgpl',), is_exception=False))
>>>

Looking at the output above, it becomes clear that the expression fails to parse under strict rules because LLGPL is treated as a "plain license". This is evidenced by LLGPL being represented as a LicenseSymbol with is_exception=False when strict rules are not applied during parsing.
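To make the strict rule concrete, here is a minimal, standard-library-only sketch of the check that the traceback describes: in a WITH expression, the right-hand symbol has to be flagged as an exception. The names `Symbol`, `StrictWithError` and `check_with` are hypothetical and made up for this illustration. This is not how license-expression is implemented, and it only handles a single `LICENSE WITH EXCEPTION` pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for a license symbol (not the library's class)."""
    key: str
    is_exception: bool = False


class StrictWithError(ValueError):
    """Raised when a WITH expression pairs the wrong kinds of symbols."""


def check_with(expression, symbols, strict=True):
    """Validate one 'LICENSE WITH EXCEPTION' expression.

    Returns the (license, exception) symbol pair. Unknown keys raise KeyError.
    """
    left, sep, right = expression.partition(" WITH ")
    if not sep:
        raise ValueError(f"not a WITH expression: {expression!r}")
    lic = symbols[left.strip()]
    exc = symbols[right.strip()]
    if strict:
        # Strict mode: the left side must be a license, the right an exception.
        if lic.is_exception:
            raise StrictWithError(f"{lic.key} is an exception, not a license")
        if not exc.is_exception:
            raise StrictWithError(f"{exc.key} is a plain license, not an exception")
    return lic, exc


# Mirrors the output above: LLGPL is known, but not flagged as an exception.
current = {
    "LGPL-2.1-or-later": Symbol("LGPL-2.1-or-later"),
    "LLGPL": Symbol("LLGPL", is_exception=False),
}

# Same table, assuming LLGPL were flagged as an exception instead.
fixed = dict(current, LLGPL=Symbol("LLGPL", is_exception=True))
```

With the `current` table, `check_with("LGPL-2.1-or-later WITH LLGPL", current, strict=False)` returns the pair while strict mode raises `StrictWithError`. With the `fixed` table, strict mode accepts the same expression.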

Relatedly, we apply strict parsing by default because we want SPDX-compliant expressions, and for packaging reasons we want them to correctly distinguish between plain licenses and exceptions. In the case of LLGPL this does not seem to work correctly, and namcap fails to parse the license expression.

Furthermore (and to make things somewhat more complicated, but also more specific and useful for us as a distribution), when it comes to packaging, on Arch Linux we rely on a system-wide package (see https://gitlab.archlinux.org/archlinux/packaging/packages/licenses) to provide "common" license and exception files in well-known locations. Those "common" files cover licenses and exceptions that are frequently used verbatim and do not contain or require individually identifying information (e.g. a specific list of authors) or an ever-changing date identifier. We also provide full lists of all known license and exception identifiers separately. The package allows us to centrally share common license files and not repackage them in every package.
Namcap in turn relies on this information to identify and correlate known identifiers and the ones that are common (and thus do not need to be packaged).

Unfortunately, this does not work with LLGPL, since license-expression treats it as a plain license, not an exception identifier. As such, namcap fails when LLGPL is used in a WITH expression (e.g. LGPL-2.1-or-later WITH LLGPL). If it were provided plainly (e.g. LLGPL), namcap would require the user to prefix it with LicenseRef-, because LLGPL is not found in the list of "known" licenses (it is in the list of "known" license exceptions instead).
From my understanding, the expression LGPL-2.1-or-later WITH LLGPL should be valid though and LLGPL should be treated as a license exception, not a plain license.
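The correlation described above can be sketched roughly as follows. This is not namcap's actual code: the function name `classify`, the returned labels, and the three sets are hypothetical sample data chosen for illustration, and the real lists come from the licenses package.

```python
# Illustrative sample data only; the real lists are provided by the
# system-wide licenses package and are much longer.
KNOWN_LICENSES = {"LGPL-2.1-or-later", "GPL-3.0-or-later", "MIT"}
KNOWN_EXCEPTIONS = {"LLGPL", "Classpath-exception-2.0"}
COMMON = {"LGPL-2.1-or-later", "GPL-3.0-or-later", "LLGPL"}


def classify(identifier, used_as_exception):
    """Classify one identifier taken from a license expression.

    `used_as_exception` is True when the identifier appeared on the
    right-hand side of WITH, False when it appeared as a plain license.
    """
    if identifier.startswith("LicenseRef-"):
        # Custom identifier: the license text has to ship with the package.
        return "custom"
    known = KNOWN_EXCEPTIONS if used_as_exception else KNOWN_LICENSES
    if identifier not in known:
        # Not in the list matching its role: treated as unknown.
        return "needs-licenseref-prefix"
    if identifier in COMMON:
        # File is provided system-wide, no need to package it again.
        return "common"
    return "package-license-file"
```

In this model, `classify("LLGPL", used_as_exception=True)` yields `"common"`, while `classify("LLGPL", used_as_exception=False)` yields `"needs-licenseref-prefix"`, which is the mismatch described above when the parser reports LLGPL as a plain license.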

This leads me to the question: Is there a specific reason why LLGPL is treated as a plain license and not as a license exception?

@pombredanne
Member

@dvzrv Thanks for using this small lib! This is awesome.

> This leads me to the question: Is there a specific reason why LLGPL is treated as a plain license and not as a license exception?

This is an oversight, a data bug on our side, and easy to fix: add is_exception: yes to https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/llgpl.LICENSE

Like in https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/classpath-exception-2.0.LICENSE#L8

https://gitlab.archlinux.org/archlinux/packaging/packages/licenses/-/blob/main/.SRCINFO?ref_type=heads#L7

I would advise prefixing this with arch-linux or something to the same effect, as in LicenseRef-archlinux-none ... even better would be to have a proper license.

> We also provide full lists of all known license and exception identifiers separately. The package allows us to centrally share common license files and not repackage them in every package.

There is an interesting issue wrt. license compliance when the tarballs do not contain the licenses. We can discuss this in a separate issue/discussion.

NB: I might move this issue to ScanCode toolkit or create a new one there as the fix will happen there

@pombredanne
Member

And related: if you want to actually detect the licenses in the code and match that to the metadata-level expression, you may want to look into ScanCode toolkit or ScanCode.io

@dvzrv
Author

dvzrv commented Feb 6, 2024

> There is an interesting issue wrt. license compliance when the tarballs do not contain the licenses. We can discuss this in a separate issue/discussion.

We separately make source tarballs available for the upstreams licensed under terms that require it. All Arch Linux systems have the common license files installed which the binary package files require though :)

> NB: I might move this issue to ScanCode toolkit or create a new one there as the fix will happen there

Thanks for getting back on this matter so quickly! 🥳

> And related: if you want to actually detect the licenses in the code and match that to the metadata-level expression, you may want to look into ScanCode toolkit or ScanCode.io

Ah thanks, that's a great piece of info! I guess our tooling still lacks there. Personally I rely on SPDX-License-Identifiers in upstream code mostly, or other specific license information provided by upstreams.

@pombredanne
Member

@dvzrv you wrote:

> We separately make source tarballs available for the upstreams licensed under terms that require it. All Arch Linux systems have the common license files installed which the binary package files require though :)

FWIW, we actually wrote a parser for Arch's PKGBUILD shell scripts in https://github.com/nexB/scancode-toolkit/blob/f70bbb7d9d9bab40a9d504e664bc945b6a1630e8/src/packagedcode/bashlex.py#L43 and https://github.com/nexB/scancode-toolkit/blob/f70bbb7d9d9bab40a9d504e664bc945b6a1630e8/src/packagedcode/bashparse.py (a format that is also used by Alpine Linux APKBUILD files and Msys2/mingw), and another parser for PKGINFO/BUILDINFO (for msys2) that I need to validate for use with AUR: https://github.com/nexB/scancode-plugins/blob/4df0cf04e1b7b6774ba6e983c7c57002f19327c9/etc/scripts/msys2.py#L883

The shared package containing all the license texts satisfies the engineer in me who wants to avoid file duplication in downloads. The problem is that each and every binary download, such as https://archlinux.org/packages/extra/any/acpi_call-dkms/download/, is then also not GPL compliant and is missing the all-important GPL license text. :]

To add insult to injury, https://github.com/nix-community/acpi_call is also missing the GPL text, and Arch is further incorrectly reporting a plain "GPL", which literally means "GPL-1.0-or-later", instead of the upstream "GPL-3.0-or-later"... so the work done on Arch is a bit lossy wrt. upstream, and upstream is not even complying with its own license.

> Ah thanks, that's a great piece of info! I guess our tooling still lacks there. Personally I rely on SPDX-License-Identifiers in upstream code mostly, or other specific license information provided by upstreams.

That's not a bad start, but for a comprehensive approach, a full scan with ScanCode will not hurt. A few years ago I helped add proper SPDX license ids in the kernel, for instance, but the point is that, as much as I wish otherwise, not everyone is using these.

@Foxboron

Foxboron commented Feb 6, 2024

> To add insult to injury, https://github.com/nix-community/acpi_call is also missing the GPL text, and Arch is further incorrectly reporting a plain "GPL", which literally means "GPL-1.0-or-later", instead of the upstream "GPL-3.0-or-later"... so the work done on Arch is a bit lossy wrt. upstream, and upstream is not even complying with its own license.

fwiw, it's still ongoing work to move to SPDX identifiers. And I assume a lot of issues like that will be fixed over the next year.

@pombredanne
Member

@Foxboron 👋 much honored! Tell me how we can help within our modest capabilities. I work a bit with Debian developers too and I want every package to have a clean and clear license expression.

Ideally, I would love to have the top-level PKGBUILD license expression derived automatically from the actual license notices in the code (or overridden when things are missing/incorrect, and eventually pushed upstream as fixes for future releases).

Is this something we could work on as a cross-distro collaboration, and might we find some good souls to help with the effort (which is a massive undertaking at distro scale ... and makes the work I did on the kernel look like small potatoes)?
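As a very rough illustration of deriving a top-level expression from notices in the code, the sketch below collects `SPDX-License-Identifier:` tags from file contents and joins the distinct ones with AND. It is a naive stand-in, not what ScanCode does (ScanCode detects license texts and notices well beyond these tags). The function name `derive_expression` is hypothetical, and joining with AND is a conservative assumption rather than a packaging policy.

```python
import re

# Matches the rest of the line after an SPDX tag.
TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def derive_expression(file_texts):
    """Combine per-file SPDX tags into one AND expression.

    Returns None when no tag is found. Distinct tag values are
    deduplicated and sorted so the result is deterministic.
    """
    found = set()
    for text in file_texts:
        for match in TAG.finditer(text):
            expr = match.group(1).strip()
            # Drop a trailing C-style comment closer, e.g. "MIT */".
            expr = re.sub(r"\s*\*/\s*$", "", expr)
            if expr:
                found.add(expr)
    if not found:
        return None
    parts = sorted(found)
    if len(parts) == 1:
        return parts[0]
    # Parenthesize OR sub-expressions so that AND binds as intended.
    return " AND ".join(f"({p})" if " OR " in p else p for p in parts)


sources = [
    "// SPDX-License-Identifier: GPL-3.0-or-later\nint main(void);",
    "/* SPDX-License-Identifier: MIT OR Apache-2.0 */",
    "# SPDX-License-Identifier: GPL-3.0-or-later\n",
]
print(derive_expression(sources))
```

A derived value like this would still need a manual override path for files with missing or incorrect tags, as mentioned above.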

@Foxboron

Foxboron commented Feb 7, 2024

> Tell me how we can help within our modest capabilities. I work a bit with Debian developers too and I want every package to have a clean and clear license expression.

Thanks for the offer. Currently it is very much a manual process and people are updating license information as they go along with the help of the community.

I'm sure David would appreciate more eyes on the code he has already written :)

https://gitlab.archlinux.org/pacman/namcap/-/blob/master/Namcap/rules/licensepkg.py?ref_type=heads

> Ideally, I would love to have the top-level PKGBUILD license expression derived automatically from the actual license notices in the code (or overridden when things are missing/incorrect, and eventually pushed upstream as fixes for future releases).

I suspect this could be part of our developer tooling, our package linter and/or a future CI/CD check. I don't think it could be part of the package manager itself. Arch is probably not going to be as detailed as distros like Debian currently are, which helps a bit in making this simpler to implement.

> Is this something we could work on as a cross-distro collaboration, and might we find some good souls to help with the effort (which is a massive undertaking at distro scale ... and makes the work I did on the kernel look like small potatoes)?

I'm not sure. It's interesting and I can point at the places where such a thing could be implemented. But I'm not sure if anyone in Arch is up for the time investment.

AyanSinhaMahapatra added a commit that referenced this issue Feb 27, 2024
Reference: #91
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>