-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix VersionType pattern #41
Comments
I just tested this on pythex and these are considered valid:
Playing around I came up with this:
Seems to work better. You can play around with it here: |
The pattern is fine but beware that this pattern goes into the XSD Also |
OK, “^$” can be added to ensure it’s an isolated string. The core regex still holds from what I’ve tested. IMO 9.0 is OK, 9.000000 is not because it’s unbounded. We can arbitrarily bind it. How many chars? |
Not sure what you are trying to say. If you test the regex with '^$' , i.e. |
I was referring to this: It would be very easy to modify the regex to allow for some (arbitrary) number of consecutive 0s. While I realize that there might be some sort of system out there that would track thousands of "minor" versions, I was suggesting we cap the number so as to have "completeness" in the formula. 1.0000000000000000000000000000000000000 is not realistic, even though 1.000 might be. Yes, it will be arbitrary, but in this space I suspect that we could cover all real world situations with something like this without leaving it unbounded: On a side note, this regex assumes there is a decimal. Is that something we wish to enforce? Seems reasonable to me. |
OK, I need to clarify. The dot character '.' here is not a decimal in a versioning scheme, but a delimiter between the different versioning parts: Major version, Minor version, Patch version, etc. So it can be repeated more than once. In fact, my goal is to follow SemVer (semantic versioning) to some extent, as suggested in my first comment. SemVer is a well-known best practice for versioning, especially in software. Quote from SemVer: A normal version number MUST take the form X.Y.Z where X, Y, and Z are non-negative integers, and MUST NOT contain leading zeroes. X is the major version, Y is the minor version, and Z is the patch version. Each element MUST increase numerically. I simply reused the SemVer regex (the second one), but removed the Build metadata part which may be a bit overly complex for policy versioning I think, and allowed extra versioning parts after the Patch version ( Now going back to your example, |
This also applies to VersionMatchType. For matching numbers the standard just says "a direct numeric match". I looked at what my implementation is doing, which is to ignore leading zeros for each number. So 0, 00 and 000 are the same number and 8, 08 and 008 are the same number, but 8, 80 and 800 are different numbers. The matching needs to be better specified, either by saying that leading zeroes are ignored (or significant) or by changing the patterns to outlaw leading zeroes. Outlawing leading zeroes means that the equality of two versions can be determined by a string match. |
I can easily update it to allow leading zeros if that’s the direction we want to go. I’m not a fan of it but trivial change on the regex side. Agree 100% on improving the specification. |
Yes, like in the VersionType pattern I propose, leading zeroes are outlawed, each part (separated by a dot) must be a valid integer representation in decimal. But I propose we deal with VersionMatchType in a separate issue and reuse the resolution of this one. |
Played with it and since So... any benefit compared to |
No. |
allows for:
That seems incorrect. |
Seems okay to me. I don't see a problem with anyone choosing just a single number that is monotonically increasing as the version. We don't need to impose both a major version number and minor version number on policy versions. 80000 is exceedingly unlikely for a version number assigned by a human, but I don't discount the possibility of some highly-automated, policy-revising PAP that increments a minor version number to very large values. If we impose a limit then what do we choose? We need a determination on leading zeroes because it affects the implementation of version matching and interoperability. Apart from that we don't have a technical reason to be more restrictive. |
I agree with you in theory it's differing a bit from SemVer as it allows to omit the Minor and Patch versions. But like @steven-legg , I considered this would be a bit tedious for Policy writers to always specify a Minor and Patch versions for policies. Still, to make everybody happy, we could specify a simplified versioning scheme where version 800000 is still a valid integer for a Minor version, however big it is, so not an issue for me. Fixing any maximum value for a version would be arbitrary. |
Assuming that 1 = 1.0 = 1.0.0 What is the suggested matching mechanism? I assume thar SemVer took the « minor version » requirement approach to help facilitate equality testing. |
Well, currently the matching mechanism in XACML 3.0 says (VersionMatchType):
So I propose two alternatives:
(We don't need to worry about the patch version - third part - actually here.) |
I think we are getting closer with this reference but should we add something that makes references to "0" versions in particular so that 1 = 1.0 = 1.0.0 = 1.0.0.0, etc. Perhaps something like "Trailing 0" version notations are to be ignored for for purposes of comparison
Maybe with an example of:
Oddly pedantic, I know, but 0s are a unique case and it seems like a good idea to call out the fact that raw string matching isn't likely going to work with them. |
Considering that #16 started with the proposition to dispense with versions and version matching because they weren't used in practice we are probably over engineering this. Anyone depending on version matching is already dealing with 1 != 1.0 and anyone who isn't doesn't care (except the need for implementations to be conformant). Let's fix ambiguities and move on. |
How do you suggest we |
What to do about numbers with leading zeroes isn't spelled out. The matching procedure is otherwise deterministic. We can either:
The first option leads to the easiest matching algorithm to implement. The current matching procedure clearly decides that 1 != 1.0 and there is no single match string that can match all and only 1, 1.0 and 1.0.0. That doesn't bother me. |
Sounds like we're moving to simple string match. If 1 != 1.0 and 1.0. != 1.0.0 then 01 != 1 seems equally logical. In that case is regex needed at all...? We don't appear to have any semantic requirements (2 > 1.0?) other than simple character equality. Also, if we can envision the case of some automated process generating a version like 0.8000000, perhaps there could be a case where a UUID is used to force uniqueness with each iteration? Seems unlikely, but I've always assumed that the ultimate purpose of this was to determine antecedence :) |
It isn't. That would be the second option: say leading zeroes are significant.
The IdReferenceType requires that the versions can be ordered from earliest to latest but an ordering isn't explicitly spelled out anywhere, so it's effectively implementation defined. There isn't even a requirement that two policies with the same ID must have different versions, though IdReferenceType needs this to be so in order to choose "the most recent one". Neither of two policies with the same ID and version can be considered more recent. We need to specify that policies with the same ID cannot have equal versions, whatever way version equality is defined. |
This all makes sense. The question I now have is if there is value defining antecedent versioning? The use case that comes to mind is situations where “version > #” (vs “version = string”) could be determined from the value. To do that would require us to be more specific in our definition, to not do it would require the implementer to maintain and ordered list (which would be cumbersome). Of course this depends on the validity of the use case. Is it something we think has merit? |
On reflection you were probably asking if we should impose no pattern on a version string and just use lexicographic string matching. That would be okay for equality matching but would not work for conventional version ordering, e.g., 1.10 < 1.9. |
Yes. And to do that I believe that we need a structure that clearly defines sequence which, in turn, requires logical equality specification. For example, if “2” is allowed as a version number there should be a mechanism for determining that is greater than 1.0, 1.9 and 1.0.0.8000000000. |
We decided with #16 that we didn't have a use case but historically there were people that did so we kept versions. We can try to improve the version matching but we are flying blind without a solid use case. Maybe it's not even fit for purpose. The current IdReferenceType allows some oddities, like an EarliestVersion or LatestVersion of "*.3". The LatestVersion is inclusive but I suspect that making it exclusive would be more useful. EarliestVersion="1", LatestVersion="2" includes every version starting with 1 but also includes 2, but not 2.0.1. So 1.9 and 2 are okay but 2.0.1 isn't? Versions 1 and 1.9 are presumably more different than 2 and 2.0.1. If LatestVersion were exclusive the range would only contain versions with major version 1 and would support what Cyril was looking for with "1+". |
Based on this conversation my read is that we are looking for a flexible solution that allows for a wide array of (reasonable) operational values (1, 0.1, 1.2.3.4, 1.80000), while being able to deterministically compare values for equality and antedence. Here is my proposal: We establish a basic version pattern of Major.Minor.Patch.Build, where each period separated number is an integer and only Major is required in a Rule/Policy. (Regex should not be needed given the general acceptance of what an integer is 🙂 ) When a comparison is performed, they candidate values are "expanded" using 0s for each of the missing elements, where applicable. More levels of versioning are not supported. Using the versions above we have: 1 -> 1.0.0.0 Version order is determined by first comparing the integer values in this order: Major, Minor, Patch, Build. Where [pedantically] the smaller number is antecedent. Using the examples above yields: 1.8000 > 1.2.3.4 > 1 > 0.1 I tried a number of different ideas and kept coming back to realization I was overthinking this. "Use" and "Comparison" being treated independently was the clearest and simplest idea I could come up with. |
I wouldn't count on that. It doesn't hurt to specify a pattern to remove any doubt, especially since this is a subset of what XACML 3.0 allows. |
…where an integer is defined as /(0|[1-9]{1,}/ Done. 🙂 |
Not quite, unless you meant to exclude version 10. The full pattern would be something like: "(0|[1-9]\d*)(.(0|[1-9]\d*)){0,3}". |
Nice catch 🙂 My primary goal was to exclude “.” because it makes the complicated regex mess that we’ve been wading through. By calling out version levels being separated by a period in the text I’m hoping we can reduce the regex to the simplest syntax possible (which is enlightening me as to why the SemVer people use “^$”). This is the cleanest I could come up with: This matches:
This doesn’t match:
|
A regex for a non-negative integer in the text is easier for an implementor to disregard than a regex for a version string in the XML Schema or JSON Schema. |
ergo my suggestion for using the term “integer” 🙂 |
Interestingly, the XML Schema integer type allows leading zeroes. |
…and XPATH doesn’t match “01” and “1” unless you cast using number() which, I believe, will convert 01.1 to 1.1 and 01.1.1 to NaN, correct? Fortunately, if we go with the “parse, cast, test” model it’s straightforward to cast “01” as 1. Of course this also means 1.01 = 1.1 or is invalid. My suggestion is that we provide lots of examples. 🤣 |
In that case, for matching Major.Minor.Patch.Build (where each Major / Minor / Patch / Build is a non-negative integer), the regex would be: (The dot must be escaped because it is a special character in regex that otherwise represents any character, and |
Works for me. I'm OK with "0[0-9]" being invalid. |
We don't need the anchors for an XML Schema pattern.
|
Yes, already mentioned that in a previous comment, but it was necessary for @humantypo to test with online regex checkers like pythex. Indeed, the
All right. Fine with me. |
The VersionType pattern is currently
(\d+\.)*\d+
which allows meaningless versions with multiple leading / trailing zeros, such as :0000000.1
0.000000
Fix: change the pattern to
((0|[1-9]\d*)\.)*(0|[1-9]\d*)
.See also SemVer pattern for more elaborate semantic versioning patterns.
The text was updated successfully, but these errors were encountered: