-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new spec for go
package URLs
#338
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change that affects all software utilizing PURL for Go. Personally, I don't think there's anything fundamentally wrong with pkg:golang
except that the description is outdated, and I'm sure it can be fixed without making this level of breaking change. Maintaining the separation of namespace and name and putting the entire Go package ID into the PURL name makes PURLs difficult for human users to work with.
------ | ||
``go`` for Go modules: | ||
|
||
- The ``namespace`` field is empty and implies the go mod proxy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the field empty or does it imply the go mod proxy? It can't be both.
- The ``name`` will be the full module path. | ||
- The ``subpath`` will represent the package path within a module. | ||
- The ``version`` will be a valid go version or pseudoversion, or empty. | ||
- Additional Build information for binaries can be included as ``qualifiers`` (i.e VCS info, go version info, GoArch/GoOS info etc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional information should be explicitly defined here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactlty. be specific in the spec, so we all are on the same page.
``go`` for Go modules: | ||
|
||
- The ``namespace`` field is empty and implies the go mod proxy. | ||
- The ``name`` will be the full module path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably specify that it is case sensitive. pkg:golang
incorrectly states that it is not case sensitive and must be lowercased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. this is what the whole #308 is about.
Please don't repeat the mistakes from the past.
Thanks for pointing at these! This is essentially a combination of #196 and #308 (with the addition of Say you have a module with the path |
I think the better solution to this problem is that However, "a go module cannot be represented by different PURLs" is not generally the case:
|
I'd disagree. In fact, it is non-breaking, as it adds a completely new purl type. Therefore, no breaking changes are introduced. |
It is breaking because no existing PURL software expects |
this is true to every newly proposed PURL Type :-) this PR is trying to add a new type |
The problem is that this is not a new type. The |
it is not? Could you point me to the existing
I wonder how you come to this conclusion. |
From the PR description:
|
which is a behavioural change in a downstream application. This is out of scope of this spec, and not in our hands at all - we have no authority there.
exactly this paragraph makes it clear: this is a non-breaking change. Causing no breaking change is the whole point of introducing a new purl type, instead of modifying an exising one. |
i don't think so. #308 makes this clear: the existing spec has flaws that require breaking changes to fix them The only way to fix
|
Introducing a new type for an existing type is a breaking change to the PURL ecosystem. Implementations that use If you start writing SBOMs that have Keeping You cannot just fix a PURL type by introducing a new type. Even if PURL libraries are updated to support transparently upgrading the old type into the new type on read, any software that is comparing pre-canonicalized PURL strings will need updates.
What are the flaws that require breaking changes? #308 is about the path being incorrectly converted to lowercase, which is much more easily fixed by just not doing that. |
how? If a tool that produced purls would change it's behaviour by using the new purl-type, where they've used the other one before - this would be a breaking change in that very tool.
So?
A PR tells a story, and the effective patch gets updated along with the discussions on a PR. (PS: I review the content of the PR. and at the time of review, I saw no breaking change.
how comes?
the curerent |
``go`` for Go modules: | ||
|
||
- The ``namespace`` field is empty and implies the go mod proxy. | ||
- The ``name`` will be the full module path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The ``name`` will be the full module path. | |
- The ``name`` is the full module path. It MUST be unmodified, and follow the `Go Module Reference <https://go.dev/ref/mod#go-mod-file-ident>`_. |
this change would close #308
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- - The ``name`` will be the full module path. + - The ``name`` is the full module path. In case of an URL: protocol MUST be lowercased; host-part MUST be lowercased; path-part MUSTbe unmodified, as it is case-sensitive.this change would close #308
I don't think this is correct.
- I don't think it's legal to include a protocol in the module path. Go makes some HTTPS requests to resolve a VCS URL to download the package from (usually this is delegated to the proxy).
- The host part is also part of the case sensitive module path. It should not be lowercased. Uppercase characters are currently forbidden by Go for modules. I don't think it's worthwhile or really correct for the PURL spec to be specifying how to convert an invalid module path into a valid module path, I don't think it's worthwhile for the PURL spec to be specifying how to validate Go module paths, this doesn't cover all the restrictions, and this may cause problems if Go ever changes the restrictions for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re 1: I see. i was wrong there. Adjusted my suggestion for the protocol.
re 2: the host-part is, per URL-spec case-insensitive, and is normalized to lowercase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as Go is concerned, it's usually a host-part but it has additional restrictions and it is case sensitive: https://go.dev/ref/mod#go-mod-file-ident
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I will modify my change-suggestion accordingly. does it fit better, now?
|
||
- The ``namespace`` field is empty and implies the go mod proxy. | ||
- The ``name`` will be the full module path. | ||
- The ``subpath`` will represent the package path within a module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The ``subpath`` will represent the package path within a module. | |
- The ``subpath`` is the unmodified package path within a module. |
- The ``namespace`` field is empty and implies the go mod proxy. | ||
- The ``name`` will be the full module path. | ||
- The ``subpath`` will represent the package path within a module. | ||
- The ``version`` will be a valid go version or pseudoversion, or empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The ``version`` will be a valid go version or pseudoversion, or empty. | |
- The ``version`` may be a valid go version or pseudoversion, omitted when empty. |
Adding a new type for a new type is much different than adding a new type for an existing type. An old tool not recognizing a truly new type is expected, but an old tool not recognizing Go PURLs anymore because a tool producing the data says that Changing "MUST be lowercased" to "MUST NOT be lowercased" is a much less impactful change than this. From what I've seen, names with uppercase characters are uncommon, and an outdated implementation that is incorrectly lowercasing is still working correctly for all names that do not contain uppercase characters to lowercase. I would even say that on a larger scale it is not a breaking change because:
In both cases, the PURL is still parsed successfully and the meaning of the PURL is unchanged with respect to the current "MUST be lowercased" spec. The only differences would be that the canonical form changes¹ and a new consumer receiving a PURL from an old producer might be more likely to expect that the ID refers to the correct package, but since there is no good way for an outdated consumer to recover the correct ID after an outdated producer lowercases it, any consumer that relies on getting the correct ID (eg to resolve the package files) is likely already broken and not lowercasing the name can only improve the behavior in that situation. This causes the same alignment problems as introducing a ¹ Due to underspecification in the text and tests, I wouldn't trust incoming PURLs to be in the canonical form as my implementation understands it. There are numerous minor differences in which characters are escaped when (and sometimes how), so if you're accepting PURLs from an external source, even if you don't expect user-entered, non-canonical PURLs in that source, you should be canonicalizing those PURLs yourself if your application depends on them all being canonical for the same definition of canonical. |
Go isn't the only ecosystem that has this problem of incorrect name normalization rules in this repo. I'm also aware of:
|
If this is indeed true, then there is something really wrong with PURL: it does not allow for evolution. On the one hand, we cannot add modifications to the existing specification that could introduce breaking changes. On the other hand, we cannot introduce a new type because somehow that is a breaking change as well. So one is pretty much stuck with slight variations of the initial spec. Specs should be allowed to evolve just the way the software does. There should really be a way to add versioning on top of PURL itself. What is being proposed here might in essence be just that for the go spec. |
The current PURL specification for Go was created before Go 1.11 modules and thus has namespace inconsistencies and lacks semantic versioning.
Although in many cases a module path corresponds directly to the URL of the hosting repository, that is not always true. The URL formed from the module path may be an endpoint that serves a redirect to the true host. This indirection protects projects that for whatever reason must change their hosting provider: their module names will continue to work. Consequently, it is undesirable to encode any aspect of the underlying hosting system as part of the PURL.
In essence, all Go modules form a single namespace. Since it is used by the majority of Go programmers, we propose to represent this namespace by the empty string. Though not included in this commit, other namespaces could be possible and would represent package managers and/or build tools that are alternatives to the go command.
The
go
type proposed here fixes the current issues by removing the namespace, using valid Go module versions (including pseudoversions), and adds some extra functionality to encode optional information about specific builds (GOOS, GOARCH, etc).If accepted, all tools maintained by the Go project (such as govulncheck and pkg.go.dev) that surface PURLs will use this new type to provide canonical PURLs for Go modules and packages