Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): add FieldsNotFoundError #10412

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

NickCrews
Copy link
Contributor

@NickCrews NickCrews commented Nov 1, 2024

I have been getting sick of typing some_table_or_struct.field_that_doesnt_exist_or_has_a_small_typo and then getting a useless error message. It also is annoying to do some_table.select(doesnt_exist, also_doesnt_exist), and you only get an error for the very first one. It would be better if you got back info on ALL the values that failed to get resolved.

This PR makes that UX much better.

Still need to add tests, but I wanted to get this up here for some initial thoughts before I invested more time. Is this something we want to pursue?

@NickCrews NickCrews force-pushed the field-not-found-error branch from 2fae0d4 to 8be679f Compare November 1, 2024 19:21
@@ -205,7 +205,7 @@ def __getitem__(self, name: str) -> ir.Value:
KeyError: 'foo_bar'
"""
if name not in self.names:
raise KeyError(name)
raise FieldNotFoundError(self, name, self.names)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK to be breaking here and not raise a KeyError anymore?

Copy link
Contributor Author

@NickCrews NickCrews Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeyError seems semantically a little wrong. I think KeyError should be for collections with a dynamic set of keys, such as a vanilla python dict. But structs have a static set of keys, so FieldNotFoundError, as a subclass of AttributeError, seems better to me.

@NickCrews NickCrews force-pushed the field-not-found-error branch from 8be679f to c5f58d5 Compare November 1, 2024 19:51
@NickCrews
Copy link
Contributor Author

@cpcloud does this look like an idea worth considering? Any behavior requirements that you think I would need to make an implementation satisfy?

@NickCrews NickCrews marked this pull request as ready for review December 5, 2024 00:52
@NickCrews
Copy link
Contributor Author

@cpcloud should I invest more time in this or do you think this whole idea is not the right direction?

@cpcloud
Copy link
Member

cpcloud commented Jan 24, 2025

I'm not opposed to this but I think the behavior should be different for attribute versus getitem syntax.

It all comes down to what we believe the user's intent is, to the best of our knowledge.

For attribute misspellings, I think we should leave them as plain old AttributeError and do nothing fancy. We have no idea whether the user meant a method or a column, and we should really refuse the temptation to guess.

For [] syntax, the only thing that could possibly be is a field access (in both the column and the struct field case), so upgrading the UX with FieldNotFound seems like a nice improvement.

@NickCrews
Copy link
Contributor Author

That sounds like good behavior to me to return an AttributeError for .dot accesses. I think that makes sense to "refuse the temptation to guess" and raise an AttributeError and not a FieldNotFoundError, because that is better to be caught programmatically. But, what about being a little bit fancy/helpful, and returning a TableAttributeError, which is a subclass of AttributeError, which has the helpful "did you mean this column?" error message? Then this is the best of both worlds?

There is one bit of nuance on attribute access though. If I do t.bogus, I agree that should just be the above behavior, it is impossible to guess if the user meant a method or a column, so just return an AttributeError (or subclass of it as I suggest above). However, if I do t.select(_.bogus), then this is unambiguous that they wanted a Column, and so returning a FieldNotFoundError seems better.

In pseudocode:

# Raised on t.bogus
# Includes helpful suggestion "did you mean <best match among all columns AND attributes/methods>"
class TableAttributeError(AttributeError, IbisError): ...

# Raised on my_struct.bogus
# Includes helpful suggestion "did you mean <best match among all fields AND attributes/methods>"
# Maybe should get consolidated with TableAttributeError
class StructAttributeError(AttributeError, IbisError): ...

# Raised on table_or_struct["bogus", "bogus2"] and table_or_struct.select(_.bogus, _.bogus2) and my_struct["bogus"]
# Includes helpful suggestion "did you mean <best match among ONLY columns/fields>"
# Note that this contains ALL not found fields.
# I think that would be simpler to just have one error, and not one for single FieldNotFound.
# Should this inherit from AttributeError and/or KeyError?
# I think AttributeError since 1. these are statically known and 2. then a user can `catch AttributeError`
# and have all of these errors caught.
class FieldsNotFoundError(AttributeError, IbisError): ...

What do you think of this spec?

@NickCrews
Copy link
Contributor Author

and do nothing fancy
we should really refuse the temptation to guess

So I understand your goals, is your reasoning here so that 1. our maintenance is easier or 2. so we don't accidentally do the wrong thing for the user?

@cpcloud
Copy link
Member

cpcloud commented Feb 12, 2025

I think that spec is too complex for now. Let's start with plain AttributeError and only beef up the getitem errors.

@NickCrews
Copy link
Contributor Author

Sounds good, thanks, I'll try to update this implementation and let's see what we think about it!

@NickCrews NickCrews force-pushed the field-not-found-error branch from c5f58d5 to 66273f4 Compare February 20, 2025 03:59
@github-actions github-actions bot added the tests Issues or PRs related to tests label Feb 20, 2025
@NickCrews NickCrews force-pushed the field-not-found-error branch 5 times, most recently from 956a6a1 to 6b0d0db Compare February 20, 2025 05:29
@NickCrews NickCrews changed the title feat(api): add FieldNotFoundError feat(api): add FieldsNotFoundError Feb 20, 2025
@NickCrews
Copy link
Contributor Author

NickCrews commented Feb 20, 2025

Not sure what is causing the postgres failure but I think it's unrelated.

Details
________________ test_insert_with_database_specified[postgres] _________________
[gw3] linux -- Python 3.10.16 /home/runner/work/ibis/ibis/.venv/bin/python

con_create_database = <ibis.backends.postgres.Backend object at 0x7f1e81750f70>

    @pytest.mark.notyet(
        ["flink"],
        reason="unclear whether Flink supports cross catalog/database inserts",
        raises=Py4JJavaError,
    )
    def test_insert_with_database_specified(con_create_database):
        con = con_create_database
    
        t = ibis.memtable({"a": [1, 2, 3]})
    
        with create_and_destroy_db(con) as dbname:
            con.create_table(
                table_name := gen_name("table"),
                obj=t,
                database=dbname,
                temp=con.name == "flink",
            )
            try:
>               con.insert(table_name, obj=t, database=dbname)

ibis/backends/tests/test_client.py:1412: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
ibis/backends/sql/__init__.py:464: in insert
    query = self._build_insert_from_table(
ibis/backends/sql/__init__.py:479: in _build_insert_from_table
    target_cols = self.get_schema(target, catalog=catalog, database=db).keys()
ibis/backends/postgres/__init__.py:521: in get_schema
    with self._safe_raw_sql(type_info) as cur:
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/contextlib.py:135: in __enter__
    return next(self.gen)
ibis/backends/postgres/__init__.py:710: in _safe_raw_sql
    with contextlib.closing(self.raw_sql(*args, **kwargs)) as result:
ibis/backends/postgres/__init__.py:719: in raw_sql
    query = query.sql(dialect=self.dialect)
.venv/lib/python3.10/site-packages/sqlglot/expressions.py:612: in sql
    from sqlglot.dialects import Dialect
<frozen importlib._bootstrap>:1075: in _handle_fromlist
    ???
.venv/lib/python3.10/site-packages/sqlglot/dialects/__init__.py:111: in __getattr__
    module = importlib.import_module(f"sqlglot.dialects.{module_name}")
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1050: in _gcd_import
    ???
<frozen importlib._bootstrap>:1024: in _find_and_load
    ???
<frozen importlib._bootstrap>:171: in __enter__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = _ModuleLock('sqlglot.dialects.dialect') at 13[97](https://github.com/ibis-project/ibis/actions/runs/13428575878/job/37520654230?pr=10412#step:12:98)68970723824

>   ???
E   KeyError: 139769782639488

<frozen importlib._bootstrap>:123: KeyError

def __init__(
self,
container: object,
names: str | Iterable[str],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not totally sold on the names of these arguments/attributes.

"""
if name not in self.names:
raise KeyError(name)
raise FieldsNotFoundError(self, name, self.names)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't have a test for Struct.__getitem before, but if we had, then we would see that this change is breaking. Before Struct["foo_bar"] raised a KeyError, now it raises a FieldsNotFoundError. I think something that derives from AttributeError is more semantically correct: we know what fields are present at compile time, whereas I think of KeyError I think of for when you have a runtime-missing key. IDK if this is standard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: now FieldsNotFoundError derives ONLY from IbisError, and neither KeyError nor AttributeError.

"bogus",
("bogus", False),
ibis.desc("bogus"),
_.bogus,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FieldsNotFoundError is a subclass of AttributeError, so this is only breaking for the test cases that looked for IbisTypeError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: now that I made FieldsNotFound NOT a subclass of Attribute error, this is now breaking for all of these test cases.

@NickCrews
Copy link
Contributor Author

As you can see in the updated tests, this change is potentially breaking for users who are trying to catch IbisTypeError. But, it does unify these two errors so they only need to catch a FieldsNotFoundError, so that is a plus. I think this breakage is OK. We could make FieldsNotFoundError subclass from BOTH AttributeError and IbisTypeError? IDK, I could see how table.bogus is a sort of TypeError.

@NickCrews NickCrews added the breaking change Changes that introduce an API break at any level label Feb 20, 2025
@@ -45,6 +48,37 @@ class RelationError(ExpressionError):
"""RelationError."""


class FieldsNotFoundError(AttributeError, IbisError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you decide to subclass AttributeError here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was because it originally was a FieldNotFoundError (singular), so it made more sense in the sense of "you tried to lookup some statically known thing on this object that doesn't exist". But now that it stores multiple missing fields, the heirarchy doesn't make sense, so I removed AttributeError as a base class, now it is just inherits from IbisError.

return self.limit(limit, offset=offset)
except com.FieldsNotFoundError:
# the raised FieldsNotFoundError contains the Op, but we want
# to only expose the Expr.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid storing unused state on exceptions. This can have unintended consequences for object lifetimes due to exceptions keeping stack frames around for tracebacks.

Please remove any unused fields from your new exception types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, I'd never thought about how that might effect lifetimes, that's really interesting. Sounds good, I am removing them. I originally had them because in an earlier version I did actually need to keep track of them, but now I don't need that, so I'll stop storing them.

I have been getting sick of typing some_table_or_struct.field_that_doesnt_exist_or_has_a_small_typo and then getting a useless error message. This PR makes that UX much better.
@NickCrews NickCrews force-pushed the field-not-found-error branch from 6b0d0db to 434fe52 Compare February 22, 2025 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Changes that introduce an API break at any level tests Issues or PRs related to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants