Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a new view can break previous requests when embedding with column or FK as target #2277

Closed
steve-chavez opened this issue May 5, 2022 · 27 comments · Fixed by #2272
Closed
Labels
breaking change A bug fix or enhancement that would cause a breaking change embedding resource embedding

Comments

@steve-chavez
Copy link
Member

As exemplified in the docs: https://postgrest.org/en/stable/api.html#hint-disambiguation

Whenever a view is added it can break previous embeds - which is surprising behavior for most users.

This is specially bad considering we generate a lot of view relationships: #2262 (comment). Note that we generate view relationships recursively.

Proposal

  1. When using a column or FK for embedding, only tables should be detected and views not.
GET /orders?select=name,billing_address(name) HTTP/1.1

^ Will continue working even when adding more addresses views.

(From https://postgrest.org/en/stable/api.html#embedding-disambiguation)

Instead, views should be explicit in the request:

GET /orders?select=name,central_addresses!billing_address(name) HTTP/1.1
  1. Using a column/FK embed on views will still work for getting tables.
GET /central_addresses?select=name,billing_address(name) HTTP/1.1

(the above will get orders)

We had some use cases before doing this #1230, so it's an organic use case.


This will solve #2238 and #1643

@steve-chavez steve-chavez added bug breaking change A bug fix or enhancement that would cause a breaking change labels May 5, 2022
@steve-chavez
Copy link
Member Author

Adding a view of a junction will also make existing many-to-many embeds fail.

For example, adding this to our fixtures:

create view users_tasks_view as
select * from users_tasks;

Results in 12 test failures. For example this one:

GET /users?select=id,tasks(id)&order=id.asc&tasks.order=id.asc

So we can take the same approach for junction views - they will only be considered if they're explicit:

GET /users?select=id,tasks!users_tasks_view(id)&order=id.asc&tasks.order=id.asc

The downside is that we'd break this previous use case #1736

For this we can add a legacy config.

@wolfgangwalther
Copy link
Member

When using a column or FK for embedding, only tables should be detected and views not.

This would still allow breaking existing queries when adding a view, namely those queries that use the column name for embedding - and then a view with the name of the column is added later on.

This change would also create more differences between tables and views - and lead to a situation where self-embedding would suddenly not be supported the same on views compared to tables anymore.

Also migrating to schema isolation step-by-step as mentioned in PostgREST/postgrest-docs#531 would be made harder, because some queries might need adjustment at the same time.

For this we can add a legacy config.

Please don't. Cluttering the code with legacy config options is hardly an improvement. Let's instead make sure that we change it in a way that everyone can migrate to easily.


Let's break down on a higher level why adding new views can break previous requests. There's two reasons for that:

  • Adding a new view can cause a namespace collision, i.e. by adding a view that has the same name as a column or FK. This would mostly be solved by Unable to disambiguate in presence of a view when column name is equal to table name #2238 (comment), except for some collisions between column names and FK names. Once we have a collision, however, we can't resolve it anymore. No disambiguation as currently implemented or designed would fix that. If we wanted to do that, we'd need a way to tell postgrest in the query that a name belongs to a column, a FK or a relation.
  • Adding a new view can add a new "path" from origin to target, which creates ambiguity on "how to get there". Those can be resolved by adding hints, but possibly require more than one hint or relationship-type hints (m2m, o2m, m2o). Those conflicts can be made less likely by reducing the number of relationships considered for embedding. The one big thing we can do here is restricting embedding for m2m to strict junctions as mentioned elsewhere. We can restrict it even more as mentioned in Unable to disambiguate in presence of a view when column name is equal to table name #2238 (comment).

Note that both situations where an existing query is broken by adding a view can also happen when adding a column or a FK to a table - so those are not really specific to views. Given this and the down-sides regarding schema isolation, I suggest to find a way that does not depend on view vs. table - but rather something like #2238 (comment).

@steve-chavez
Copy link
Member Author

steve-chavez commented May 5, 2022

Hey Wolfgang,

This would still allow breaking existing queries when adding a view, namely those queries that use the column name for embedding - and then a view with the name of the column is added later on.

That is #2238 right? My answer here applies then - telling the user to change the view name is fine. It's also a rare case.

This change would also create more differences between tables and views - and lead to a situation where self-embedding would suddenly not be supported the same on views compared to tables anymore.

Right, self join on views would not be supported. Maybe I can make an exception in that case.

Also migrating to schema isolation step-by-step as mentioned in PostgREST/postgrest-docs#531 would be made harder, because some queries might need adjustment at the same time.

True. I guess it's a chicken and egg problem.

  1. If views keep working as it is, any new view can break many requests with embedding. Forcing the client to be updated to use a target!hint embed.

  2. If we make the change, views will need target!hint. So migrating a table to a view will also force the client to be updated.

I'd say 1 is much more impactful and surprising for users(some will just give up supabase/postgrest-js#229). Specially considering that we recommend views for more complex queries. So it's a fair trade.

A middle ground would be having a way to mark views for full embedding inference - views that work with col/fk as target.

ALTER ROLE authenticator SET pgrst.fully_inferred_views = 'a_view, another_view'

That would allow users to migrate, we can warn about this setting too. It's one more config, but I don't see another way around it.

@steve-chavez
Copy link
Member Author

steve-chavez commented May 5, 2022

This would mostly be solved by #2238 (comment)

Not sure what you mean by this, you mean:

Only allow table/view names as the embedding target. No more columns or FKs for that. Allow only column names, FKs or strict M2M junctions as hints. No more regular table/view names.?

Wouldn't that be a much worse breaking change? Also as I mentioned there, users naturally reach for col as target - so we can't break that.

the one big thing we can do here is restricting embedding for m2m to strict junctions as mentioned elsewhere

That's already done btw: #2262. But it's not enough to get to a stable embedding.

by adding a view can also happen when adding a column or a FK to a table - so those are not really specific to views.

Can an embed break when adding columns?

Regarding adding more FKs - that is the only way reasonable and acceptable for us to break embeddings. More FKs mean more relationships. I was actually thinking to add a warning for this in the docs. Our embeddings are foreign key joins after all.

But we should not break requests when adding views, foreign key joins on these is more of an artificial concept.

@steve-chavez
Copy link
Member Author

Also migrating to schema isolation step-by-step as mentioned in PostgREST/postgrest-docs#531 would be made harder, because some queries might need adjustment at the same time.
If we make the change, views will need target!hint. So migrating a table to a view will also force the client to be updated.

Actually, the above is not 100% true if the client used the table name as target:

GET /projects?select=*,clients(*)

If the clients table changes to a view that will still keep working.

If the client used:

GET /projects?select=*,client_id(*)

In that case, yes, changing that to a view won't work. The config I proposed would solve that though.

@steve-chavez
Copy link
Member Author

Only allow table/view names as the embedding target. No more columns or FKs for that. Allow only column names, FKs or strict M2M junctions as hints. No more regular table/view names.

Also note that on #2272 the hint mode is being changed to the above form, which is the stricter one. We can actually recommend that one now for even less chance of breaking requests.

https://github.com/PostgREST/postgrest/pull/2272/files#diff-08d8ccc40837ed2c7fea0f1f0eae30a34f8a8d3dea341d0c253db73565b8ec0cR187-R200

Before that wasn't possible, because as mentioned in the docs https://postgrest.org/en/latest/api.html#hint-disambiguation

In the target!hint form the target could also be a column or FK.

@wolfgangwalther
Copy link
Member

Wouldn't that be a much worse breaking change? Also as I mentioned there, users naturally reach for col as target - so we can't break that.

And I think we should break it.

Using columns for embedding might feel natural to some (it doesn't for me - at all) - but really it's just the root cause of all the ambiguities when embedding.

No o2m/m2o embedding that has the relation as target can be broken by adding another view - because the relation is already explicitly specified. Those can only be broken by adding new FKs and that's certainly something that would be expected as you rightly point out.

The same for m2m relationships - as long as those are given as <relation>!<junction>, no new view can break them.

Relationships are defined by junction tables and foreign keys - as you can tell from our haskell typing which models relationships this way. And I think we should limit ourselves to just those and throw columns out. This would make embedding so much more consistent overall, much easier to understand, much less error prone.

I'd say:

  • Embeddings given as <relation> only (without hint) can only embed m2o and o2m relationships.
  • Those m2o and o2m can only throw ambiguity errors, when multiple FKs exist between those relations. In this case a single hint can be given with the FK name: <relation>!<fk>.
  • Strict m2m junctions can be embedded as <relation>!<junction>.
  • Any other m2m junctions (non-strict) can be embedded as <relation>!<fk1>,<fk2>.
  • I still have no idea about a nice way to handle self-embedding

Yes, this might feel like a "big" breaking change. But really anything we do here will be a breaking change. And once you start thinking in terms of "relations" and "foreign keys" instead of "columns" when embedding, it will not be hard at all. After all an embedding is just a join - and users certainly understand they need a target relation for a join.

One more argument against columns: Using columns does not allow using multi-column FKs as relationships. So it's not a complete feature anyway. PostgREST has made a decision a long time ago, that there is no "simple" route->PK mapping like /client/:id etc. - specifically, because those wouldn't support multi-column PKs. Overall, it would be consistent to not use columns here either.

@steve-chavez
Copy link
Member Author

One more argument against columns: Using columns does not allow using multi-column FKs as relationships. So it's not a complete feature anyway. PostgREST has made a decision a long time ago, that there is no "simple" route->PK mapping like /client/:id etc. - specifically, because those wouldn't support multi-column PKs. Overall, it would be consistent to not use columns here either.

I think we're going in circles here. The above was already solved a long time with FK-as-target, which we would also drop with your proposal.

Yes, this might feel like a "big" breaking change. But really anything we do here will be a breaking change. And once you start thinking in terms of "relations" and "foreign keys" instead of "columns" when embedding, it will not be hard at all. After all an embedding is just a join - and users certainly understand they need a target relation for a join.
And I think we should break it.
Using columns for embedding might feel natural to some (it doesn't for me - at all) - but really it's just the root cause of all the ambiguities when embedding.

I've seen many of these cases(e.g. supabase/supabase#6274), not offering the col-as-target would break those and not offer a path forward - this is what I consider a "big" breaking change.

This would make embedding so much more consistent overall, much easier to understand, much less error prone.

What I can do to make it more understandable is to deprioritize the col-fk-as-target on the docs and prefer the target!hint form to disambiguate. The col-fk-as-target can be left as a last section in the disambiguation part as a "shortcut", possibly even discouraging(maybe deprecating) this form.

I do see a future where col-fk-as-target support is removed, but this change should not be so abrupt.

@steve-chavez
Copy link
Member Author

steve-chavez commented May 5, 2022

So we can take the same approach for junction views - they will only be considered if they're explicit:
The downside is that we'd break this previous use case #1736
Relationships are defined by junction tables and foreign keys - as you can tell from our haskell typing which models relationships this way.

Good point. So maybe the above breaking change I proposed is not needed, it's reasonable to fail in that case I believe. Also, the adding of junction views is more of an artificial problem, I've never saw them reported. The new strict m2m junctions also reduce likelyhood.

Strict m2m junctions can be embedded as !.

I can also prioritize the "relation!junction" form in the docs and leave the old form as a deprioritized shortcut.

Any other m2m junctions (non-strict) can be embedded as <relation>!<fk1>,<fk2>..

No need to support that now or in the future, flattening should cover those cases #1233 (comment). In fact, flattening(as ideated on the issue) would be able to express any m2m relationship - somewhat making the relation!junction form unnecessary. Users would still expect a direct way to get m2m relationships though, so I'm not advocating for eliminating that form.

@steve-chavez steve-chavez changed the title Adding a new view can break previous requests with embedding Adding a new view can break previous requests when embedding with column or FK as target May 5, 2022
@steve-chavez
Copy link
Member Author

A less disruptive solution to the issue might be just considering the "inferred relationships on views" being 2nd class when in presence of "table relationships". So if we detect 3 relationships and two of them are on views then the table relationship will prevail and be picked. If all 3 relationships are on views then we'd raise the usual error.

From #2238 (comment)

In light of the opposition to the proposal - the above solution would solve the issue without a breaking change. Though I'd have to see how feasible is to implement it(would probably be less efficient).

@wolfgangwalther
Copy link
Member

A less disruptive solution to the issue might be just considering the "inferred relationships on views" being 2nd class when in presence of "table relationships". So if we detect 3 relationships and two of them are on views then the table relationship will prevail and be picked. If all 3 relationships are on views then we'd raise the usual error.

From #2238 (comment)

In light of the opposition to the proposal - the above solution would solve the issue without a breaking change. Though I'd have to see how feasible is to implement it(would probably be less efficient).

Ah, I didn't comment on that, because it seemed like the idea was thrown away already: But I would be even more opposed to this, than to the other suggestion. And that's because with this proposal a working request could not only fail by adding a new view - but it could suddenly return a different result (without failing with an error code) when adding a new table relationship that would otherwise conflict with an inferred view relationship, which is already used.

So assume you use an existing inferred view relationship and you're adding a new table relationship that would cause a conflict: Suddenly, because table wins vs view you're now embedding a different relation compared to before, without getting an error.

That's something we should absolutely avoid - and that's why I think we should avoid all "ranking" based approaches like this one. All the different options need to have the same priority.

@bolerodan
Copy link

bolerodan commented May 10, 2022

Having experienced incorrect embedding very briefly a year or so ago (primarily from duck-typing), I'm curious what truly would be a solution to this with the current auto-detection that is in place? I've realized that being as explicit as possible in our queries has been the safest way as possible to avoid this scenario and always using hints.

Is it even possible to create a solution that guarantees not breaking current query embeds when adding in new tables/views down the line?

@wolfgangwalther
Copy link
Member

Is it even possible to create a solution that guarantees not breaking current query embeds when adding in new tables/views down the line?

It is certainly possible, but the problem at hand is 2-dimensional and there are many solutions. One dimension is the number of embeddings we detect / the types of embeddings we allow. The other dimension is our query syntax. The more embeddings we allow, the more verbose (or less flexible) our syntax needs to be for such a guarantee. The fewer embeddings, the simpler (or more flexible) the syntax can be.

Right now we have a simple syntax paired with many embeddings. That's trouble.

I think there's also a third dimension somehow and that's something I'd call "level of symmetry" within the whole feature. That's basically whether embedding is supported the same across all types of relations (tables, views), types of relationships (o2m, m2o, m2m, self-embedding) and all different syntaxes (using column names / fk names / relation names in target / hint).

We can approach this from two angles:

There are also ideas that would reduce the level of symmetry to try to keep a balance between a backwards compatible syntax and a reduction of ambiguities. This is what Steve is aiming to do in #2272. There will be breaking changes introduced by that as well - but those will be harder to explain. Some will be resolvable by changing syntax. Some will be resolvable by changing your schema. Some won't be resolvable. That's the nature of the asymmetric solution.

I think we should strive for a solution that creates as much symmetry as possible, because this is easier to understand well for users and has other benefits (full support for schema isolation would be one). At the same time we should try to only introduce breaking changes that can be solved by adapting the request you make. We should avoid breaking changes that force you to change your schema - or completely change the way how to query the api in certain scenarios (e.g. forcing you to use "something else" where an embedding worked before).

@steve-chavez
Copy link
Member Author

I've realized that being as explicit as possible in our queries has been the safest way as possible to avoid this scenario and always using hints.

@bolerodan IIRC you use columns as hints - e.g. /tbl?select=other!col(*)- in theory this could break(be ambiguous) if you added a new composite FK from tbl to other that also included col. However the possibility of that happening is low, so it's fine/stable. I believe that's all we can do with the foreign key join(embedding) feature, to minimize the possibility of an ambiguity happening. No perfect solution.

Right now by using the /tbl?select=col-or-fk(*) syntax, the possibility of ambiguity is too high in presence of views. This is what I'm aiming to solve with #2272. In theory there will be breaking changes for pure-view schemas, however in practice I'm not sure if that would cause a lot of breaking changes, since col-or-fk is too unstable with views as mentioned before. It's likely that pure-view schemas where already using the more strict other!hint syntax.

@steve-chavez
Copy link
Member Author

At the same time we should try to only introduce breaking changes that can be solved by adapting the request you make. We should avoid breaking changes that force you to change your schema - or completely change the way how to query the api in certain scenarios (e.g. forcing you to use "something else" where an embedding worked before).

@wolfgangwalther We haven't been following that criteria for breaking changes. Adapting the request you make involves client updates, which is a much harder change(making sure many clients download the new JS lib version, invalidating cache, etc). In contrast, changing your schema is a much easier change. For example, renaming a constraint(simple change in the db) allowed a migration path from the old duck-typing bolerodan mentioned. Renaming views is also a simple change(which could be done to avoid a breaking change for col-fk-as-target). Renaming a table/column name would be out of the question of course. Ideally this should be scoped to the api schema tailored for postgREST.

@wolfgangwalther
Copy link
Member

Adapting the request you make involves client updates, which is a much harder change(making sure many clients download the new JS lib version, invalidating cache, etc). In contrast, changing your schema is a much easier change.

Interesting take. For me, it's quite the opposite.

I can't avoid all breaking changes to my api. To be able to handle any breaking changes to my api cleanly, I need to use proper api versioning mechanisms. And that's not just the breaking changes we're talking about here (introduced by PostgREST), but also all breaking changes introduced by my own api development. If I don't have api versioning mechanisms in place, I can never do a proper upgrade.

We suggest to use schema-based api versioning somewhere.

The procedure is as follows:

  • Add a new schema with the new api version, including the breaking changes.
  • If you want to update PostgREST, run the new PostgREST version and point it at the new schema.
  • With a reverse proxy: Redirect your requests to the new version to the new instance, while keeping the requests to the old version at the old instance/schema.
  • Update all clients step-by-step to use the new api
  • When all clients are updated: shut down the old instance

This is a "rolling upgrade" of a server/client architecture without downtime.

You can automate this in Continuous Deployment setups, too.

Once you have this in place, a change to a client's request is really simple.

For example, renaming a constraint(simple change in the db) allowed a migration path from the old duck-typing bolerodan mentioned.

Renaming a constraint would be a breaking change to your API, which would force those clients that already use the previous constraint name to update their requests.

Renaming views is also a simple change(which could be done to avoid a breaking change for col-fk-as-target).

Renaming a view would be a breaking change to your API, which would force those clients that already use the previous view name to update their requests.

Renaming a table/column name would be out of the question of course.

Renaming a column name (in a relation in the api schema) would be a breaking change to your API, which would force those clients that already use the previous column name to update their requests.

Renaming things does not avoid breaking changes to a client's request at all.

@steve-chavez
Copy link
Member Author

Update all clients step-by-step to use the new api

We have always avoided telling users to execute that step.

Renaming a constraint would be a breaking change to your API, which would force those clients that already use the previous constraint name to update their requests.

Once upon a time, there was duck-typing. Assuming clients and projects tables, you could do:

GET /projects?select=*,client(*)

That is, requesting "clients" without the "s" at the end. That thing caused many more ambiguities that were hard to understand so it was removed. The migration path for that was to rename the autogenerated constraint name(say like clients_projects_fkey) to client and allow for FK-as-target.

Renaming things does not avoid breaking changes to a client's request at all.

So yeah, the above is just plain false.

@steve-chavez
Copy link
Member Author

steve-chavez commented May 10, 2022

Renaming views is also a simple change(which could be done to avoid a breaking change for col-fk-as-target).
Renaming a view would be a breaking change to your API, which would force those clients that already use the previous view name to update their requests.

Hm, maybe it's not renaming, so if you used a /view?select=*,view_col(*) to embed, you could add a new view named view_col to prevent a breaking change and updating the client(after #2272). So there is a migration path.

Actually I've noted that the above could lead to the same situation as #2238. So not sure it's infallible as a migration path.

@steve-chavez
Copy link
Member Author

The procedure is as follows:
This is a "rolling upgrade" of a server/client architecture without downtime.

This is still a good guide that we should document. Managing an additional postgrest instances plus proxy config is definitely not as simple as changing the api schema though.

@wolfgangwalther
Copy link
Member

That is, requesting "clients" without the "s" at the end. That thing caused many more ambiguities that were hard to understand so it was removed. The migration path for that was to rename the autogenerated constraint name(say like clients_projects_fkey) to client and allow for FK-as-target.

But that was only possible, because FKs were not used at all in the exposed API before. That's not the case here. We're talking about renaming things that are exposed already.

Renaming things does not avoid breaking changes to a client's request at all.

So yeah, the above is just plain false.

I wasn't specific enough, I guess. Let me rephrase:

Renaming things that are already part of the API does not avoid breaking changes ... etc.

@steve-chavez steve-chavez added the embedding resource embedding label May 11, 2022
@steve-chavez
Copy link
Member Author

steve-chavez commented May 11, 2022

I think we should strive for a solution that creates as much symmetry as possible, because this is easier to understand well for users and has other benefits (full support for schema isolation would be one).

Full schema isolation support is still there as always, the select=view(*) and select=view!hint(*) forms haven't changed. In fact the latter one would be a bit better with #2272 as mentioned in #2277 (comment).

I'd also argue that disambiguation will be easier to understand with this change, since we no longer have the case mentioned in
https://postgrest.org/en/latest/api.html#hint-disambiguation (new views will no longer cause the disambiguation error). I think it's mostly a matter of rearranging the docs and mention the col-fk-as-target is a convenience.

Another win is that we can now document the self-relationship embedding, without solving #1643 it was not stable enough to document it, but now it is.

That being said, #2272 is not a "perfect solution" by any means, but it's does solve 3 bugs in our issue tracker - a win I'd say.

@steve-chavez
Copy link
Member Author

steve-chavez commented May 16, 2022

Let's see how this goes in the next release. There's still a way to offer backward compatibility with this change, but it would require manually specifying relationships:

ALTER DATABASE your_db -- could also be ALTER ROLE authenticator 
SET pgrst.your_schema.my_view = $$
{
 "relationships": [
   { "target": "arbitrary_name"
   , "foreign_table": "other"
   , "join_columns": ["my_view.col1", "other.col2"]
   , "cardinality": ''o2m"}
  ]
}
$$

Then, it would be possible to:

GET /my_view?select=*,arbitrary_name(*)

We should try to keep the "low-code" approach though.

@steve-chavez
Copy link
Member Author

steve-chavez commented May 16, 2022

Actually implementing the above is not a bad idea for this release. It not only provides a way for users to migrate but also workarounds for other issues like #1907

Since it provides a way to do embeds without hints, it can also solve the case where there is a legitimate FK ambiguity, without patching clients. It would be capable of overriding existing inferred relationships.

@steve-chavez
Copy link
Member Author

In hindsight, I wonder if the server/db side config for resolving embeds would have been a better choice than the client side hints. Not only it would have kept our syntax simpler, but also allows us to provide better metadata(typings, openapi, etc) since the embeds are static and not resolved at runtime.

@bolerodan Curious on your thoughts on the above proposal

@bolerodan
Copy link

@steve-chavez could you elaborate a bit more about this proposal?

@steve-chavez
Copy link
Member Author

@bolerodan Sorry for the late reply. I gave it more thought and will implement #2144 to offer a way forward from this breaking change. With #2144 any embedding can be resolved unambiguosly.

@bolerodan
Copy link

@steve-chavez cool sounds good. I've been keeping my eye on that thread to see how it materializes (hah) on that front.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change A bug fix or enhancement that would cause a breaking change embedding resource embedding
Development

Successfully merging a pull request may close this issue.

3 participants