Strategies for breaking changes #1841

aucampia · 2022-04-17T14:35:12Z

aucampia
Apr 17, 2022
Maintainer

Given there is very limited resources to maintain RDFLib I think we need to adopt a stategies for introducing breaking changes which reduces the need to maintain a separate branch and also reduces the need for huge pull requests which take multiple days to review and that have to be rebased for weeks.

Two approaches we can follow here:

Add breaking changes to our current code base, but just under private names e.g. if changing a function foo, add the new function with _foo_next, if changing a class _Foo, add the new class with _FooNext. If possible the old interface should then be re-implemented on the new interfaces, so only the new code runs.

Releasing a new version with the breaking changes then essentially comes down to deleting and renaming code and this can be done with a fairly minimal PR that is easy to review.
Use feature flags. We could use global flags in a hidden namespace to control behaviour. So for example, if we want to change the behaviour of Graph.parse(..., publicID=...) and relative URI resolution, we could check if rdflib._internal.future_public_id is set, and if it is use the future behaviour, and otherwise fall back to the old behaviour.

Releasing breaking changes then basically comes down to changing a boolean and deleting code.

There may be cases where these strategies won't work well, but for something like #1834 they may be ideal. Would be good to get some feedback about this, I could add something to the developers guide if this seems agreeable.

ghost · 2022-05-10T11:00:02Z

ghost
May 10, 2022

Okay, case in point: #1901 - should there be some “degree of impact” criterion? It's a relatively trivial change and it seems like handling it as a breaking change is probably disproportionally more work than just leaving it in abeyance until needed.

1 reply

aucampia May 12, 2022
Maintainer Author

I this case I would say we make a global somewhere, RDFLib major version, and then if major version >= 7 we use the new URI, otherwise we use the old URI. That way we can easily find the code to clean up in future, we can fairly easily test the right behavior for different RDFLib major versions.

However I will at some point write a guideline on versioning and compatibility.

mwatts15 · 2022-05-15T21:05:31Z

mwatts15
May 15, 2022

Given there is very limited resources to maintain RDFLib I think we need to adopt a stategies for introducing breaking changes which reduces the need to maintain a separate branch and also reduces the need for huge pull requests which take multiple days to review and that have to be rebased for weeks.

Two approaches we can follow here:
* Add breaking changes to our current code base, but just under private names e.g. if changing a function `foo`, add the new function with `_foo_next`, if changing a class `_Foo`, add the new class with `_FooNext`. If possible the old interface should then be re-implemented on the new interfaces, so only the new code runs.
  Releasing a new version with the breaking changes then essentially comes down to deleting and renaming code and this can be done with a fairly minimal PR that is easy to review.

I don't think you need to add under private names -- if it's code that's ready for prime time, give it a public name. You can put the updated class/function/value under a namespace package or in the same module with a name that indicates the version. Deprecate the specific functionality that's going away by clearly adding a message with DeprecationWarning or something else if appropriate: the dead date for the old functionality can be firmed up as late as needed. You're obliged to keep the versioned names around for a while, of course, but you can deprecate those too eventually.

* Use feature flags. We could use global flags in a hidden namespace to control behaviour. So for example, if we want to change the behaviour of `Graph.parse(..., publicID=...)` and relative URI resolution, we could check if `rdflib._internal.future_public_id` is set, and if it is use the future behaviour, and otherwise fall back to the old behaviour.
  Releasing breaking changes then basically comes down to changing a boolean and deleting code.

Please don't do this. Would create headaches for dealing with multiple unrelated packages both depending on RDFLib, but only working for one or the other value for a flag.
With the create-a-versioned-class approach, above, both the library depending on the old and the new can work the way they expect. There are going to be some areas where things aren't modular enough to feasibly implement this, particularly if there's "public" global state in the module (datatype bindings come to mind), but it should work for a lot of things.

1 reply

aucampia May 16, 2022
Maintainer Author

While I like some of your suggestions in theory, I'm not sure how practical it
is in the near term, maybe after a lot of cleanup and rearchitecting we can
consider it, but it really does amount to a complete rewrite almost.

For now we have the reality that we have one version number for RDFLib, and we
are somewhat trying to follow semver, though I think we should make a firm
commitment to it. I will try and clarify exactly what this commitment is in
documentation when I have time, but at the very least major version 6 releases
should be backwards compatible on the interface provided by modules directly
under rdflib and outside of rdflib.plugins.

Consider the following real world examples of what we have to deal with:

Reworking the Query.serialize interface:
Make serialize() on a CONSTRUCT result act like normal g.serialize() #1834
Fixing the skolemization URI for RDFLib:
Fix for #1824 s,http://rdlib.net,http://rdflib.net,g #1901
Reworking Dataset interface and removing ConjunctiveGraph:
dataset re-work for 7.0 release #1814

Our options here are, as far as I can see:

For #1:

Change Query.serialize and commit to releasing V7.

Possibly the simplest option, but the downside here is that every release will
likely be a major version release.
Keep a PR around for many months, rebasing it all the time until we are
actually prepared to release RDFLib 7.

This is what is happening now and it is quite resource intensive to do.
Create rdflib.query.v2, rdflib.query.QueryV2,
rdflib.query.Query.serialize_v2,

I think this complicates things too much and I have not really seen this
pattern being used that successfully using python. I would not mind seeing
some examples of it being used in practice, in theory it sounds good, but this
actually adds many dimensions to the problems we have. Say we go for
versioning the function, now every function that calls it needs to be
versioned if it has to somehow expose some functionality from it, the end
result may be a quite complicated and unmanagable proliferation.
Merge the fix under rdflib.query.Query._serialize_v7, with
rdflib.query.Query.serialize implemented on top of it - and then once we
release V7 just replace rdflib.query.Query.serialize with
rdflib.query.Query._serialize_v7.

This to me reduces the actual change that is needed to release V7 compared to
keeping the whole PR around, but also allows us flexibility in when we want to
release V7.

For #2:

Change the skolemization URI and commit to releasing V7.

See previous concern.
Keep a PR around for many months, rebasing it all the time until we are
actually prepared to release RDFLib 7.

See previous concern.
Versioned modules/classes/functions.

See previous concern.
Use _SKOLEM_AUTHORITY instead of "http://rdlib.net/" - and then set
_SKOLEM_AUTHORITY based on the value of some private global variable, it
could be an explicit feature flag, it could be something as simple as
rdflib._internal.version[0]. We can already test it, and when we are ready
to commit to releasing it then it is a very minimal change.

For #3 it more or less comes down to the same, if we can somehow keep the change
under the surface and maintain public contracts we can merge the bulk of the
changes early, and then once we are ready make a very minimal change to release
a new version.

Specifically addressing your feedback:

I don't think you need to add under private names -- if it's code that's ready
for prime time, give it a public name.

The problem is, if we we make it under a public name it becomes a public
contract, and to me the less we are committing to the easier thigns are to
maintain. Also in the cases in question it is not about adding a new function,
it is about updating existing functions that are quite core to RDFLib.

You can put the updated class/function/value under a namespace package or in
the same module with a name that indicates the version.

We currently have one version for RDFLib, prolifierating this into multiple
dimensions will not make things that much easier in my view, neither for users
of RDFLib or for the maintainers. It will complicate inheritence heirarchies, it
will be strange for users who now have to recall which version of
Graph.serialize is actual at the moment, it will make upgrading quite noisy. I
would want to see some prior art on this before considering it.

Deprecate the specific functionality that's going away by clearly adding a
message with DeprecationWarning or something else if appropriate: the dead
date for the old functionality can be firmed up as late as needed. You're
obliged to keep the versioned names around for a while, of course, but you can
deprecate those too eventually.

If we are following semver, which we are on paper, we are obliged to keep it
around as long as we have the same major version if removing it will break
backwards compatibility.

Use feature flags. We could use global flags in a hidden namespace to
control behaviour. ...

Please don't do this. Would create headaches for dealing with multiple
unrelated packages both depending on RDFLib, but only working for one or the
other value for a flag.

These flags would not be for public consumption, they would be entirely for
internal use, if someone is fiddling with private variables then they should not
expect continued support or reliability as these are not part of our contract
with RDFLib users. Also it could be as simple as one global value which is
essentially the major version under which we are operating, and we can then
quite easily run our test suite for V7 and V8 by just mutating it in conftest.py.

mwatts15 · 2022-05-16T23:59:30Z

mwatts15
May 16, 2022

Will respond more in depth later.. currently on mobile. I was never suggesting to give *separate* version numbers to packages, classes, modules, or functions, but to use version numbers indicative of the rdflib release.

…

On Mon, May 16, 2022, 13:05 Iwan Aucamp ***@***.***> wrote: While I like some of your suggestions in theory, I'm not sure how practical it is in the near term, maybe after a lot of cleanup and rearchitecting we can consider it, but it really does amount to a complete rewrite almost. For now we have the reality that we have one version number for RDFLib, and we are somewhat trying to follow semver, though I think we should make a firm commitment to it. I will try and clarify exactly what this commitment is in documentation when I have time, but at the very least major version 6 releases should be backwards compatible on the interface provided by modules directly under rdflib and outside of rdflib.plugins. Consider the following real world examples of what we have to deal with: 1. Reworking the Query.serialize interface: #1834 <#1834> 2. Fixing the skolemization URI for RDFLib: #1901 <#1901> 3. Reworking Dataset interface and removing ConjunctiveGraph: #1814 <#1814> Our options here are, as far as I can see: For #1 <#1>: - Change Query.serialize and commit to releasing V7. Possibly the simplest option, but the downside here is that every release will likely be a major version release. - Keep a PR around for many months, rebasing it all the time until we are actually prepared to release RDFLib 7. This is what is happening now and it is quite resource intensive to do. - Create rdflib.query.v2, rdflib.query.QueryV2, rdflib.query.Query.serialize_v2, I think this complicates things too much and I have not really seen this pattern being used that successfully using python. I would not mind seeing some examples of it being used in practice, in theory it sounds good, but this actually adds many dimensions to the problems we have. Say we go for versioning the function, now every function that calls it needs to be versioned if it has to somehow expose some functionality from it, the end result may be a quite complicated and unmanagable proliferation. - Merge the fix under rdflib.query.Query._serialize_v7, with rdflib.query.Query.serialize implemented on top of it - and then once we release V7 just replace rdflib.query.Query.serialize with rdflib.query.Query._serialize_v7. This to me reduces the actual change that is needed to release V7 compared to keeping the whole PR around, but also allows us flexibility in when we want to release V7. For #2 <#2>: - Change the skolemization URI and commit to releasing V7. See previous concern. - Keep a PR around for many months, rebasing it all the time until we are actually prepared to release RDFLib 7. See previous concern. - Versioned modules/classes/functions. See previous concern. - Use _SKOLEM_AUTHORITY instead of "http://rdlib.net/" - and then set _SKOLEM_AUTHORITY based on the value of some private global variable, it could be an explicit feature flag, it could be something as simple as rdflib._internal.version[0]. We can already test it, and when we are ready to commit to releasing it then it is a very minimal change. For #3 <#3> it more or less comes down to the same, if we can somehow keep the change under the surface and maintain public contracts we can merge the bulk of the changes early, and then once we are ready make a very minimal change to release a new version. Specifically addressing your feedback: I don't think you need to add under private names -- if it's code that's ready for prime time, give it a public name. The problem is, if we we make it under a public name it becomes a public contract, and to me the less we are committing to the easier thigns are to maintain. Also in the cases in question it is not about adding a new function, it is about updating existing functions that are quite core to RDFLib. You can put the updated class/function/value under a namespace package or in the same module with a name that indicates the version. We currently have one version for RDFLib, prolifierating this into multiple dimensions will not make things that much easier in my view, neither for users of RDFLib or for the maintainers. It will complicate inheritence heirarchies, it will be strange for users who now have to recall which version of Graph.serialize is actual at the moment, it will make upgrading quite noisy. I would want to see some prior art on this before considering it. Deprecate the specific functionality that's going away by clearly adding a message with DeprecationWarning or something else if appropriate: the dead date for the old functionality can be firmed up as late as needed. You're obliged to keep the versioned names around for a while, of course, but you can deprecate those too eventually. If we are following semver, which we are on paper, we are obliged to keep it around as long as we have the same major version if removing it will break backwards compatibility. - Use feature flags. We could use global flags in a hidden namespace to control behaviour. ... Please don't do this. Would create headaches for dealing with multiple unrelated packages both depending on RDFLib, but only working for one or the other value for a flag. These flags would not be for public consumption, they would be entirely for internal use, if someone is fiddling with private variables then they should not expect continued support or reliability as these are not part of our contract with RDFLib users. Also it could be as simple as one global value which is essentially the major version under which we are operating, and we can then quite easily run our test suite for V7 and V8 by just mutating it in conftest.py. — Reply to this email directly, view it on GitHub <#1841 (reply in thread)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AALLFSAZFKBIY2TX4T6BL43VKKE6XANCNFSM5TT6UL7A> . You are receiving this because you commented.Message ID: ***@***.***>

0 replies

aucampia · 2022-09-03T15:08:15Z

aucampia
Sep 3, 2022
Maintainer Author

I added some guidelines in #2108 - please have a look, happy for any feedback.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strategies for breaking changes #1841

{{title}}

Replies: 4 comments 2 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Strategies for breaking changes #1841

aucampia Apr 17, 2022 Maintainer

Replies: 4 comments · 2 replies

ghost May 10, 2022

aucampia May 12, 2022 Maintainer Author

mwatts15 May 15, 2022

aucampia May 16, 2022 Maintainer Author

mwatts15 May 16, 2022

aucampia Sep 3, 2022 Maintainer Author

aucampia
Apr 17, 2022
Maintainer

Replies: 4 comments 2 replies

ghost
May 10, 2022

aucampia May 12, 2022
Maintainer Author

mwatts15
May 15, 2022

aucampia May 16, 2022
Maintainer Author

mwatts15
May 16, 2022

aucampia
Sep 3, 2022
Maintainer Author