Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new spec for go package URLs #338

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

maceonthompson
Copy link

The current PURL specification for Go was created before Go 1.11 modules and thus has namespace inconsistencies and lacks semantic versioning.

Although in many cases a module path corresponds directly to the URL of the hosting repository, that is not always true. The URL formed from the module path may be an endpoint that serves a redirect to the true host. This indirection protects projects that for whatever reason must change their hosting provider: their module names will continue to work. Consequently, it is undesirable to encode any aspect of the underlying hosting system as part of the PURL.

In essence, all Go modules form a single namespace. Since it is used by the majority of Go programmers, we propose to represent this namespace by the empty string. Though not included in this commit, other namespaces could be possible and would represent package managers and/or build tools that are alternatives to the go command.

The go type proposed here fixes the current issues by removing the namespace, using valid Go module versions (including pseudoversions), and adds some extra functionality to encode optional information about specific builds (GOOS, GOARCH, etc).

If accepted, all tools maintained by the Go project (such as govulncheck and pkg.go.dev) that surface PURLs will use this new type to provide canonical PURLs for Go modules and packages

Copy link
Contributor

@matt-phylum matt-phylum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also #196 #294 #308

This is a breaking change that affects all software utilizing PURL for Go. Personally, I don't think there's anything fundamentally wrong with pkg:golang except that the description is outdated, and I'm sure it can be fixed without making this level of breaking change. Maintaining the separation of namespace and name and putting the entire Go package ID into the PURL name makes PURLs difficult for human users to work with.

------
``go`` for Go modules:

- The ``namespace`` field is empty and implies the go mod proxy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the field empty or does it imply the go mod proxy? It can't be both.

- The ``name`` will be the full module path.
- The ``subpath`` will represent the package path within a module.
- The ``version`` will be a valid go version or pseudoversion, or empty.
- Additional Build information for binaries can be included as ``qualifiers`` (i.e VCS info, go version info, GoArch/GoOS info etc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional information should be explicitly defined here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactlty. be specific in the spec, so we all are on the same page.

``go`` for Go modules:

- The ``namespace`` field is empty and implies the go mod proxy.
- The ``name`` will be the full module path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably specify that it is case sensitive. pkg:golang incorrectly states that it is not case sensitive and must be lowercased.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. this is what the whole #308 is about.
Please don't repeat the mistakes from the past.

@maceonthompson
Copy link
Author

See also #196 #294 #308

Thanks for pointing at these! This is essentially a combination of #196 and #308 (with the addition of qualifiers for build info). They go into more detail than this proposal, but especially in the case of namespaces #63 (comment) is a good example as to why dropping name in favor of an entirely coded namespace would be more useful. I understand that having a bunch of %2F in the PURL is ugly for humans, but is (we feel) necessary to ensure that go PURLs are consistent (which is to say that go module -> PURL is injective, a go module cannot be represented by different PURLs).

Say you have a module with the path host.com/maybeuser/module.
With the current type definition, both pkg:golang/host.com/maybeuser/module and pkg:golang/host.com/maybeuser%2Fmodule, could represent that module. In order for PURLs to canonically and uniquely define go modules in the way that they are defined on pkg.go.dev or the go module proxy, they must be unique as well.

@matt-phylum
Copy link
Contributor

Say you have a module with the path host.com/maybeuser/module.
With the current type definition, both pkg:golang/host.com/maybeuser/module and pkg:golang/host.com/maybeuser%2Fmodule, could represent that module. In order for PURLs to canonically and uniquely define go modules in the way that they are defined on pkg.go.dev or the go module proxy, they must be unique as well.

I think the better solution to this problem is that pkg:golang/host.com/maybeuser%2Fmodule stays illegal. It'd be better if the documentation explicitly stated it were illegal, but based on the examples and test cases the correct form is pkg:golang/host.com/maybeuser/module, and based on the reference parsing and formatting algorithms it's clear that these PURLs are distinct.

However, "a go module cannot be represented by different PURLs" is not generally the case:

  • The PURL spec describes a canonical format for PURLs, but users and even commonly used PURL implementations often get this wrong and produce non-canonical PURLs which must still be considered equal. For example, pkg:golang/host%2Ecom/maybeuser/module is a non-canonical, valid, PURL which refers to the same package.
  • A PURL may have qualifiers which may or may not be critical to the PURL. A PURL with a ?goarch is a different PURL which refers to the same module, but a PURL with a ?repository_url (or however the module proxy is specified) is a different PURL which may refer to a different module (probably more likely in other ecosystems).

@jkowalleck jkowalleck added Proposed new type type: go Proposed new type as well as component discussions labels Nov 8, 2024
@jkowalleck
Copy link
Member

jkowalleck commented Nov 8, 2024

This is a breaking change that affects all software utilizing PURL for Go.

I'd disagree. In fact, it is non-breaking, as it adds a completely new purl type. Therefore, no breaking changes are introduced.

@matt-phylum
Copy link
Contributor

It is breaking because no existing PURL software expects pkg:go, and new PURL software will not expect pkg:golang. This creates a compatibility problem where either the PURL is rejected as an unrecognized type or software on different sides of the breakage don't understand each other. If this is merged, all software that works with Go PURLs will need to be updated to accept both types of Go PURL and convert before they interoperate again.

@jkowalleck
Copy link
Member

It is breaking because no existing PURL software expects pkg:go [...]

this is true to every newly proposed PURL Type :-)
And none of them is a breaking change - neither in spec nor in behaviour.

this PR is trying to add a new type go. the existing golang is not touched at all.

@matt-phylum
Copy link
Contributor

The problem is that this is not a new type. The go type is intended to replace golang.

@jkowalleck
Copy link
Member

The problem is that this is not a new type.

it is not? Could you point me to the existing go type?

The go type is intended to replace golang.

I wonder how you come to this conclusion.
this very PR adds a new type, it does neither obsolete nor deprecate the existing golang type.

@matt-phylum
Copy link
Contributor

I wonder how you come to this conclusion.
this very PR adds a new type, it does neither obsolete nor deprecate the existing golang type.

From the PR description:

If accepted, all tools maintained by the Go project (such as govulncheck and pkg.go.dev) that surface PURLs will use this new type to provide canonical PURLs for Go modules and packages

golang is the type currently used for Go modules and packages. For example: https://github.com/anchore/syft/blob/3c070e0ad9d69c0f2191be52e2f2fb4904bcd558/syft/pkg/cataloger/golang/package_test.go#L24 . This PR is introducing a second, more preferred type for the same purpose.

@jkowalleck
Copy link
Member

I wonder how you come to this conclusion.
this very PR adds a new type, it does neither obsolete nor deprecate the existing golang type.

From the PR description:

If accepted, all tools maintained by the Go project (such as govulncheck and pkg.go.dev) that surface PURLs will use this new type to provide canonical PURLs for Go modules and packages

which is a behavioural change in a downstream application. This is out of scope of this spec, and not in our hands at all - we have no authority there.

golang is the type currently used for Go modules and packages. For example: https://github.com/anchore/syft/blob/3c070e0ad9d69c0f2191be52e2f2fb4904bcd558/syft/pkg/cataloger/golang/package_test.go#L24 . This PR is introducing a second, more preferred type for the same purpose.

exactly this paragraph makes it clear: this is a non-breaking change.

Causing no breaking change is the whole point of introducing a new purl type, instead of modifying an exising one.

@jkowalleck
Copy link
Member

jkowalleck commented Nov 8, 2024

I'm sure it can be fixed without making this level of breaking change.

i don't think so. #308 makes this clear: the existing spec has flaws that require breaking changes to fix them

The only way to fix golang is

  • a) introduce breaking changes in the existing purl-type << undesired !!!
  • b) introduce a new purl-type << feasible
  • c)
    1. have the PURL spec modified to allow versioning of purl-types << burocratic efforts that might lead to nothing
    2. if c)1. was successful: craft a purl-type golang version 2
    3. else fall back to a) or b)

@matt-phylum
Copy link
Contributor

Introducing a new type for an existing type is a breaking change to the PURL ecosystem. Implementations that use golang can continue to use golang and their golang PURLs will still be golang PURLs, but PURL has no negotiation mechanism where all the software that's going to read the PURLs agrees with the software that writes the PURLs on whether to use go or golang to describe Go dependencies.

If you start writing SBOMs that have go, they will be processed incorrectly by software that doesn't support go. If you continue writing SBOMs that have golang, they will be processed incorrectly by software that doesn't support golang. If you combine SBOMs using software that doesn't understand that go and golang are really the same type, the dependencies will be duplicated in the output. If you query go or golang packages against a vulnerability database, you have a 50/50 chance of finding the vulnerabilities unless the database understands both and converts golang to go.

Keeping golang is incompatible with the "a go module cannot be represented by different PURLs" goal of this PR.

You cannot just fix a PURL type by introducing a new type. Even if PURL libraries are updated to support transparently upgrading the old type into the new type on read, any software that is comparing pre-canonicalized PURL strings will need updates.

the existing spec has flaws that require breaking changes to fix them

What are the flaws that require breaking changes? #308 is about the path being incorrectly converted to lowercase, which is much more easily fixed by just not doing that.

@jkowalleck
Copy link
Member

jkowalleck commented Nov 8, 2024

Introducing a new type for an existing type is a breaking change to the PURL ecosystem.

how?

If a tool that produced purls would change it's behaviour by using the new purl-type, where they've used the other one before - this would be a breaking change in that very tool.
This is out of the scope of the purl spec -- we do not have authority there.

Implementations that use golang can continue to use golang and their golang PURLs will still be golang PURLs, but PURL has no negotiation mechanism where all the software that's going to read the PURLs agrees with the software that writes the PURLs on whether to use go or golang to describe Go dependencies.

So?
This is true to every purl type that is added over time.
An implementation written 2 years ago might not know the purl type that was defined yesterday.
This is by design and was never an issue. This is out of the scope of the purl spec -- we do not have authority there.

Keeping golang is incompatible with the "a go module cannot be represented by different PURLs" goal of this PR.

A PR tells a story, and the effective patch gets updated along with the discussions on a PR.
the initial PR description is usually not updated in accordance with the effective patch.

(PS: I review the content of the PR. and at the time of review, I saw no breaking change.
I was starting the "breaking" discussion in expectation that you'd agree that is no longer a breaking change, based on the current state of the PR.
I am happy we are discussing the topic anyway, i might be wrong, and I still need to learn.)

You cannot just fix a PURL type by introducing a new type. Even if PURL libraries are updated to support transparently upgrading the old type into the new type on read, any software that is comparing pre-canonicalized PURL strings will need updates.

how comes?

the existing spec has flaws that require breaking changes to fix them

What are the flaws that require breaking changes? #308 is about the path being incorrectly converted to lowercase, which is much more easily fixed by just not doing that.

the curerent golang spec says: the path MUST be lowercased.
This is wrong in terms of actual go dependency management: the path MUST NOT be lowercased.
Changing MUST to MUST NOT in golang purl-type is a breaking change of the specification.

``go`` for Go modules:

- The ``namespace`` field is empty and implies the go mod proxy.
- The ``name`` will be the full module path.
Copy link
Member

@jkowalleck jkowalleck Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The ``name`` will be the full module path.
- The ``name`` is the full module path. It MUST be unmodified, and follow the `Go Module Reference <https://go.dev/ref/mod#go-mod-file-ident>`_.

this change would close #308

Copy link
Contributor

@matt-phylum matt-phylum Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- - The ``name`` will be the full module path.
+ - The ``name`` is the full module path. In case of an URL: protocol MUST be lowercased; host-part MUST be lowercased; path-part MUSTbe unmodified, as it is case-sensitive.

this change would close #308

I don't think this is correct.

  1. I don't think it's legal to include a protocol in the module path. Go makes some HTTPS requests to resolve a VCS URL to download the package from (usually this is delegated to the proxy).
  2. The host part is also part of the case sensitive module path. It should not be lowercased. Uppercase characters are currently forbidden by Go for modules. I don't think it's worthwhile or really correct for the PURL spec to be specifying how to convert an invalid module path into a valid module path, I don't think it's worthwhile for the PURL spec to be specifying how to validate Go module paths, this doesn't cover all the restrictions, and this may cause problems if Go ever changes the restrictions for some reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re 1: I see. i was wrong there. Adjusted my suggestion for the protocol.
re 2: the host-part is, per URL-spec case-insensitive, and is normalized to lowercase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as Go is concerned, it's usually a host-part but it has additional restrictions and it is case sensitive: https://go.dev/ref/mod#go-mod-file-ident

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I will modify my change-suggestion accordingly. does it fit better, now?


- The ``namespace`` field is empty and implies the go mod proxy.
- The ``name`` will be the full module path.
- The ``subpath`` will represent the package path within a module.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The ``subpath`` will represent the package path within a module.
- The ``subpath`` is the unmodified package path within a module.

- The ``namespace`` field is empty and implies the go mod proxy.
- The ``name`` will be the full module path.
- The ``subpath`` will represent the package path within a module.
- The ``version`` will be a valid go version or pseudoversion, or empty.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The ``version`` will be a valid go version or pseudoversion, or empty.
- The ``version`` may be a valid go version or pseudoversion, omitted when empty.

@matt-phylum
Copy link
Contributor

Adding a new type for a new type is much different than adding a new type for an existing type. An old tool not recognizing a truly new type is expected, but an old tool not recognizing Go PURLs anymore because a tool producing the data says that golang is now spelled go is a breaking change. You can argue that this isn't a breaking change in the PURL spec itself because it doesn't change golang, but it necessitates a breaking change in every current implementation of Go PURLs and complicates implementations of Go PURL consuming software as long as there are both go and golang PURLs going around.

Changing "MUST be lowercased" to "MUST NOT be lowercased" is a much less impactful change than this. From what I've seen, names with uppercase characters are uncommon, and an outdated implementation that is incorrectly lowercasing is still working correctly for all names that do not contain uppercase characters to lowercase. I would even say that on a larger scale it is not a breaking change because:

  • An outdated PURL producer that incorrectly lowercases an ID containing capitals produces the wrong PURL, but today those producers are producing exactly the same PURL and calling it correct despite referring to the wrong package.
  • An outdated PURL consumer that incorrectly lowercases an ID containing capitals reads the wrong ID, but today those consumers are already reading exactly the same ID and calling it correct despite referring to the wrong package.

In both cases, the PURL is still parsed successfully and the meaning of the PURL is unchanged with respect to the current "MUST be lowercased" spec. The only differences would be that the canonical form changes¹ and a new consumer receiving a PURL from an old producer might be more likely to expect that the ID refers to the correct package, but since there is no good way for an outdated consumer to recover the correct ID after an outdated producer lowercases it, any consumer that relies on getting the correct ID (eg to resolve the package files) is likely already broken and not lowercasing the name can only improve the behavior in that situation.

This causes the same alignment problems as introducing a go type, except that if the correct ID is lowercase, no problem occurs because lowercasing is already producing the correct PURL.

¹ Due to underspecification in the text and tests, I wouldn't trust incoming PURLs to be in the canonical form as my implementation understands it. There are numerous minor differences in which characters are escaped when (and sometimes how), so if you're accepting PURLs from an external source, even if you don't expect user-entered, non-canonical PURLs in that source, you should be canonicalizing those PURLs yourself if your application depends on them all being canonical for the same definition of canonical.

@matt-phylum
Copy link
Contributor

Go isn't the only ecosystem that has this problem of incorrect name normalization rules in this repo. I'm also aware of:

@zpavlinovic
Copy link

zpavlinovic commented Nov 14, 2024

Introducing a new type for an existing type is a breaking change to the PURL ecosystem.

If this is indeed true, then there is something really wrong with PURL: it does not allow for evolution. On the one hand, we cannot add modifications to the existing specification that could introduce breaking changes. On the other hand, we cannot introduce a new type because somehow that is a breaking change as well. So one is pretty much stuck with slight variations of the initial spec. Specs should be allowed to evolve just the way the software does.

There should really be a way to add versioning on top of PURL itself. What is being proposed here might in essence be just that for the go spec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Proposed new type type: go Proposed new type as well as component discussions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants