Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Being able to manually specify foreign keys in views #1179

Closed
photz opened this issue Sep 3, 2018 · 20 comments
Closed

Feature request: Being able to manually specify foreign keys in views #1179

photz opened this issue Sep 3, 2018 · 20 comments
Assignees
Labels
embedding resource embedding

Comments

@photz
Copy link

photz commented Sep 3, 2018

Description

I am still fairly new to working with PostgREST, but my understanding thus far is that when a database view is exposed through PostgREST, then PostgREST will try to detect any foreign key relationships to enable the kind of resource embedding described in the docs here:
http://postgrest.org/en/v5.0/api.html#resource-embedding

In my limited experience this works very well as long as the view is fairly straightforward, but when the view definition goes beyond a select * from myTable, then PostgREST will often fail to detect foreign key relationships.

Since both resource embedding as well as the ability to work freely with views seem crucial to me, I've been wondering if it would be possible / make sense to let users manually specify which view columns are foreign keys.

@steve-chavez
Copy link
Member

Try v5.1(released a few days ago), we're now able to detect fks in more complicated views(see this comment on a previous issue), also the query got a lot faster(noticeable in big schemas).

If the new query detection is good enough(try/share a view that doesn't work) then it may be unnecessary to have this capability.

@photz
Copy link
Author

photz commented Sep 3, 2018

Thanks for your response, I'm currently using the subzerocloud/postgrest image which apparently already incorporates the changes you mention.

At the moment resource embedding on the state column in the following view doesn't work:

create or replace view updates as
select walk.lead as lead,
       0 as pos,
       walk.start_state as state,  -- foreign key to data.states
       walk.created_at as entered_at
from data.walk
union all
(select step.lead as lead,
        step.idx + 1 as pos,
        edges.to_node as state, -- foreign key to data.states
        step.created_at as entered_at
from data.step
inner join data.edges on edges.id = step.edge)
order by 1, 2;

@steve-chavez
Copy link
Member

steve-chavez commented Sep 3, 2018

UNION counts as modifying the origin column as I mentioned here and even PostgreSQL(by querying pg_rewrite) cannot tell from which table/view the column(state in your case) was coming from.

I had some ideas for manually specifying the fk here, but I don't see any of those getting implemented anytime soon.

This as a limitation for now(which will be documented) so I suggest trying to rewrite your view.

Edit: A simpler approach that occurs to me is to have a table like postgrest.views_fk from which postgrest would query these manually specified(would have to insert into this table) foreign keys for views.

@purpleKarrot
Copy link
Contributor

I try /stats?select=*,sites(*) without success:

CREATE TABLE projects (
  project_id  serial  PRIMARY KEY,
  ...
);

CREATE TABLE sites (
  site_id     serial  PRIMARY KEY,
  ...
);

CREATE TABLE jobs (
  job_id      uuid    PRIMARY KEY,
  site_id     int     NOT NULL REFERENCES sites,
  project_id  int     NOT NULL REFERENCES projects,
  ...
);

CREATE VIEW stats AS
  SELECT
    site_id,
    project_id,
    count(job_id) AS number_of_jobs
  FROM sites
  JOIN jobs USING (site_id)
  JOIN projects USING (project_id)
  GROUP BY (site_id, project_id);

@steve-chavez
Copy link
Member

steve-chavez commented Oct 18, 2018

@purpleKarrot Try with /stats?select=*,sites:site_id(*) and see if it works.

Edit: works for me, probably should work without site_id as well as I don't see the ambiguity, but this would be better reported in another issue.

@matt-snider
Copy link

matt-snider commented Jan 16, 2019

@steve-chavez I have a similar use case and issue, so I tried out the above schema. The results weren't what I expected. I'm using PostgREST 5.2.0, 1e732ac.

The sites array seems to be a list of jobs, where I would expect it to be the matching site object:

GET /stats?select=*,sites:site_id(*)
[
    {
        "site_id": 1,
        "project_id": 1,
        "number_of_jobs": 1,
        "sites": [
            {
                "job_id": "bc5d5362-b881-438f-b9f5-7417e08704ed",
                "site_id": 1,
                "project_id": 1
            },
            {
                "job_id": "3bd52697-033b-4edd-8a28-46a9c04b7c1e",
                "site_id": 1,
                "project_id": 2
            }
        ]
    },
    ...
]

Oddly this works for projects. Also notice that the projects query doesn't require the alias. The sites query does not work without it (i.e. Could not find foreign keys between these entities, No relation found between stats and sites).

GET /stats?select=*,project(*)
[
    {
        "site_id": 1,
        "project_id": 1,
        "number_of_jobs": 1,
        "project": {
            "project_id": 1,
            "name": "project x"
        }
    },
    ...
]

EDIT - here's the schema and data I tested with:

-- Schema
CREATE TABLE projects (
  project_id  serial  PRIMARY KEY,
  name varchar(150)
);

CREATE TABLE sites (
  site_id     serial  PRIMARY KEY,
  name varchar(150)
);

CREATE TABLE jobs (
  job_id      uuid    PRIMARY KEY,
  site_id     int     NOT NULL REFERENCES sites,
  project_id  int     NOT NULL REFERENCES projects
);

CREATE VIEW stats AS
  SELECT
    site_id,
    project_id,
    count(job_id) AS number_of_jobs
  FROM sites
  JOIN jobs USING (site_id)
  JOIN projects USING (project_id)
  GROUP BY (site_id, project_id);

-- Data 
INSERT INTO projects (project_id, name) 
VALUES (1, 'project x'), 
       (2, 'project y');

INSERT INTO sites (site_id, name) 
VALUES (1, 'site x'), 
       (2, 'site y');

INSERT INTO jobs (job_id, site_id, project_id) 
VALUES ('bc5d5362-b881-438f-b9f5-7417e08704ed', 1, 1),
       ('3bd52697-033b-4edd-8a28-46a9c04b7c1e', 1, 2);

@steve-chavez
Copy link
Member

@matt-snider Thanks for the detailed report, I can reproduce the problem. Could you please open another issue including your report? This is a bug related to how we do resource embedding(haskell code), our introspection query(sql) seems to be working fine.

@matt-snider
Copy link

@steve-chavez Thanks for checking it out. An issue has been created 😄

@photz
Copy link
Author

photz commented Feb 2, 2019

For what it's worth, here's how the authors of Postgraphile deal with this problem: https://www.graphile.org/postgraphile/smart-comments/#constraints

@photz
Copy link
Author

photz commented Feb 10, 2019

This issue continues to make my life quite miserable... What often happens is that everything works fine for a while, then suddenly my client stops working, and on closer inspection it turns out that a request failed because a particular resource couldn't be embedded. So some change to the view query in question must have resulted in PostgREST no longer being able to infer foreign keys. Then I must go back and look at the commit history to find out what caused it. I try to guard myself against this situation by writing request tests but occasionally something slips through my fingers. Since PostgREST is otherwise a blast to work with, I hope we will be able to address this.

@steve-chavez
Copy link
Member

@photz If you share the view in question we could improve the fk detection, only UNIONS should be the limitation.

@steve-chavez
Copy link
Member

We could have a config option that could contain a VIEW/function with added fk relationships. If specified our DbStructure queries could add this VIEW/function results to be included in our schema cache.

@LorenzHenk
Copy link

We could have a config option that could contain a VIEW/function with added fk relationships.

This would be interesting for us as well.

Alternatively, being able to specify the source and target join column when querying would be fine as well.
The main table / view column could for example be notated with an "@" and the remote column with an "!", e.g. /book_count_per_author_top_100?select=*,author_id@top_100_authors!id(name), which means "join book_count_per_author_top_100.author_id with top_100_authors.id".

@wolfgangwalther
Copy link
Member

The current idea on this was mentioned in some of those recently linked issues / PRs, but I'll repeat it here in the right issue for visibility.

We have recently made big progress with parsing view definitions in general (#1625, #1632 and others). We now have the base set up to be able to parse views with set operations (UNION etc.), too. Note: This is not implemented, yet, but the new JSON parsing does allow the implementation.

This should allow the detection of base columns / foreign keys for more complicated views already. Still, it might be valuable to provide manual "hints" for additional foreign keys that are not detected automatically. Once we can parse UNIONs, this could be done with a UNION of an always-empty-resultset like this:

CREATE VIEW v AS
SELECT fk_column, [...] FROM base_table WHERE FALSE
UNION ALL
SELECT [complicated query where FK is not detected] FROM base_table

The first part of the UNION will never return any rows, because of the WHERE FALSE - but when we parse the view definition, we should still be able to get the base columns from this part of the query. This means we would now enable embedding according to this fk_column of the view.

Alternatively, being able to specify the source and target join column when querying would be fine as well.
The main table / view column could for example be notated with an "@" and the remote column with an "!", e.g. /book_count_per_author_top_100?select=*,author_id@top_100_authors!id(name), which means "join book_count_per_author_top_100.author_id with top_100_authors.id".

I don't think we would want to go that way. Currently, the basic idea is that only relationships supported by foreign keys are made usable. My understanding is, that part of the reason for this is to prevent easy DDoS-ing by creating huge joins that are not supported by any index. A syntax that would allow arbitrary tables/views/columns to be specified would not be implemented with that in mind.

Would the above suggestion with an empty UNION work for your use-cases as well?

@LorenzHenk
Copy link

Once we can parse UNIONs, this could be done with a UNION of an always-empty-resultset

I dislike this approach because of several reasons:

  • separation of concerns: the way views are written should not be influenced by an external dependency (PostgREST); this is metadata and should be handled as such, IMO the cleanest way is to use a view / function as mentioned in Feature request: Being able to manually specify foreign keys in views #1179 (comment)
    • in our use-case views are written by customers - we do not want to teach them to write a UNION whenever they want to do a JOIN through the API - meaning our only option would be to automatically inject the UNION when our customer save the query and show it to them without the injected UNION part, which is very hard
    • it will lead to confusion of DBAs asking why this is needed (as they don't know / care about PostgREST, they just see a weirdly written query)
    • it can lead to unexpected query planner decisions and might make "tricking" the query planner into doing specific things impossible (of course tricking the query planner is discouraged, but happens in the real world)
    • uninstalling PostgREST will still leave its trails in the DDL of views, which is harder to track down than e.g. having to delete 1 table & 1 view containing all the metadata
  • you have to duplicate the column list of the view in the hint UNION - all UNIONed queries must have the same column names and types - which makes writing these UNIONs annoying and unnecessarily verbose
    • when adding new columns to the view (which is possible opposed to changing or deleting existing columns) through CREATE OR REPLACE VIEW, you have to remember to update the hint UNION
  • views are not auto-updatable when using UNION (this is a big issue for us, as we expose 1 schema with views - mostly SELECT * FROM some_schema.some_table - that access the tables from other schemas - for security/permission reasons and PostgREST limitations; we'd have to generate triggers for these simple - in normal circumstances auto-updatable - views)

My understanding is, that part of the reason for this is to prevent easy DDoS-ing by creating huge joins that are not supported by any index

DoS-ing could also be accomplished by:

  • having a self-referencing table / view without an INDEX on the foreign key column (maybe because the table is small enough so nobody cared), deeply self-joining this table
  • filtering / ordering by unindexed columns in a large table
  • using count=exact on a large table

Though I do understand that misusing this JOIN operator can lead to problematic results much faster / easier. Also, you'll most likely always join on the same columns, meaning a one-time configuration would make more sense.

Can you elaborate (or point me to the correct issue / comment) on why this approach of using UNION was chosen over having the metadata separated and provided through a view / function (as mentioned in #1179 (comment))?

@wolfgangwalther
Copy link
Member

Thanks for your detailed input, helps a lot in evaluating the idea.

Can you elaborate (or point me to the correct issue / comment) on why this approach of using UNION was chosen over having the metadata separated and provided through a view / function (as mentioned in #1179 (comment))?

First of all: This is just an idea - open for discussion. And at the current stage, Steve's idea that you are referencing and my idea are exactly the same: Just ideas. Nothing has been decided, yet.

All ideas have pros and cons, but one of the things I like more about the UNION approach compared to the "metadata table" approach is, that it keeps everything together in the view definition - in one place. This is much better to maintain, imho.


DoS-ing could also be accomplished by:

  • having a self-referencing table / view without an INDEX on the foreign key column (maybe because the table is small enough so nobody cared), deeply self-joining this table

I don't think that's possible - afaik a foreign key requires a unique target. Adding a constraint to ensure that, will create an index automatically.

See https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-FK:

A foreign key must reference columns that either are a primary key or form a unique constraint. This means that the referenced columns always have an index (the one underlying the primary key or unique constraint); so checks on whether a referencing row has a match will be efficient.


  • separation of concerns: the way views are written should not be influenced by an external dependency (PostgREST); this is metadata and should be handled as such, IMO the cleanest way is to use a view / function as mentioned in #1179 (comment)

Of course the goal for PostgREST is to be able to expose arbitrary schemas in the nicest way possible. But that doesn't mean that there are not any more advanced features that might need specifically crafted SQL code. Just think about all functions that make use of any of the request.* GUCs etc.. - writing your SQL completely independent from PostgREST is possible, but will not allow you to use many of the advanced features of PostgREST. That's nothing special to this feature.

As I said above: I see value in keeping it in the view definition instead of separating this out over multiple places. Once you keep this in another table, you suddenly need to take care that this is always in sync. On the PostgREST side we'd need to validate the custom data and then properly behave if the data is out of sync, etc. - lot's of challenges.

  • in our use-case views are written by customers - we do not want to teach them to write a UNION whenever they want to do a JOIN through the API - meaning our only option would be to automatically inject the UNION when our customer save the query and show it to them without the injected UNION part, which is very hard

There might be a mis-understanding: This UNION hint is very much a special case. By no means should you ever inject something into their definitions automatically. If you were able to do that, you'd need to parse their view definition anyway to construct something useful - which tables / columns would you otherwise inject? If you were able to improve on the parsing that PostgREST can already do - you should just improve the PostgREST code itself. Then those views would be detected automatically, too.

By no means does this need to be done for every JOIN or view. Not at all.

  • it will lead to confusion of DBAs asking why this is needed (as they don't know / care about PostgREST, they just see a weirdly written query)

Same thing for any other special PostgREST thing. "What is this GUC about? Never seen that in my life!". If your DBA is involved in checking your view definitions but doesn't care about what those views are used for - they are doing something wrong, sorry. You can easily add a comment in the view definition that explains the reasoning and even add a link to the PostgREST docs. Case solved.

  • uninstalling PostgREST will still leave its trails in the DDL of views, which is harder to track down than e.g. having to delete 1 table & 1 view containing all the metadata

Again, this is the same case with every advanced PostgREST feature. But I doubt you'll need to use this very often at all...

If you follow the PostgREST schema versioning approach, you would probably keep your "real" schema somewhere else and just add another schema with views that are only used by PostgREST to access that other schema. Many of those views could be simple SELECT * FROM table definitions. The point is, once you're done with PostgREST, you can just delete the whole schema.

  • you have to duplicate the column list of the view in the hint UNION - all UNIONed queries must have the same column names and types - which makes writing these UNIONs annoying and unnecessarily verbose

Again. It's a very special case. Depending on implementation, we could add the UNION hint as the last part of the union, too. In this case the column types would already have been defined by the first UNION, and you'd just need to write down the base columns you want to hint at and add NULL, NULL, NULL for all remaining columns. Fairly simple.

  • when adding new columns to the view (which is possible opposed to changing or deleting existing columns) through CREATE OR REPLACE VIEW, you have to remember to update the hint UNION

Yes. Otherwise, you'll get an error. Nice feature actually. Much better than having to remember to maybe update the metadata table to not be out of sync.

  • views are not auto-updatable when using UNION (this is a big issue for us, as we expose 1 schema with views - mostly SELECT * FROM some_schema.some_table - that access the tables from other schemas - for security/permission reasons and PostgREST limitations; we'd have to generate triggers for these simple - in normal circumstances auto-updatable - views)

I'd like to see a real world view definition that is auto-updateable by PostgreSQL but PostgREST doesn't detect foreign keys. I think it's possible, probably fairly easy to construct - but again this seems to be a special case. Certainly your SELECT * FROM table queries are not affected at all and will never need a UNION hint...


  • it can lead to unexpected query planner decisions and might make "tricking" the query planner into doing specific things impossible (of course tricking the query planner is discouraged, but happens in the real world)

As you can tell, I wasn't convinced by any of the above concerns so far. This one is the only one, where I can kind of follow. I'm quite certain that the planner would / should just throw away such a "always-false" clause - but I did not test it, yet. Of course, this needs to be tested to be sure.

@LorenzHenk
Copy link

There might be a mis-understanding: This UNION hint is very much a special case.

I see I focused too much on the use-cases where PostgREST is able to determine the foreign key automatically anyway, UNION should only be necessary in very rare cases.

If PostgREST will be able to determine most use cases automatically, I'm happy with whatever manual approach this project will take.

Concerning PostgREST determining the keys / references automatically, is there an endpoint to query which embeddings are available for a specific view? This would be interesting to create a GUI for building embedded requests in our product.

The comments below are just for completeness sake.

I don't think that's possible - afaik a foreign key requires a unique target. Adding a constraint to ensure that, will create an index automatically.

You are right, even though I was talking about the foreign key source column, the target column is indexed and thus the index is being used. An unindexed foreign key only has negative effects on DELETE of the target row.

... it keeps everything together in the view definition - in one place. This is much better to maintain ...
Much better than having to remember to maybe update the metadata table to not be out of sync.

It would be better to maintain if changes to the foreign key column happened, yes. But as established before, view columns cannot change. Though I see the point when someone drops and recreates a view, they can change the view definitions to their likings and the separated metadata would indeed be outdated.

@wolfgangwalther
Copy link
Member

Concerning PostgREST determining the keys / references automatically, is there an endpoint to query which embeddings are available for a specific view? This would be interesting to create a GUI for building embedded requests in our product.

Currently this is not possible. I think this idea was mentioned somewhere already, but I don't know where. I agree that this would be very valuable.

You are right, even though I was talking about the foreign key source column, the target column is indexed and thus the index is being used. An unindexed foreign key only has negative effects on DELETE of the target row.

Ah, thanks for following up on that. This gives me something to think about... :o)

@steve-chavez
Copy link
Member

steve-chavez commented May 30, 2021

Wolfgang had a great idea for using computed columns to do manual embeddings: #915 (comment)

I think that should be the way forward, it's simple and any view or even foreign data wrappers could be made embeddable.

New idea on #2277 (comment)

Related:

@steve-chavez steve-chavez self-assigned this May 16, 2022
@steve-chavez
Copy link
Member

Solved on #2144

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
embedding resource embedding
Development

No branches or pull requests

6 participants