feat: expose highest_precedence(*dtypes) #9335

Open
1 task done
NickCrews opened this issue Jun 8, 2024 · 2 comments · May be fixed by #10868
Labels
feature Features or general enhancements

Comments

@NickCrews
Contributor

NickCrews commented Jun 8, 2024

Is your feature request related to a problem?

I want to be able to unify the schemas of multiple tables.

Currently I have something like

from collections.abc import Iterable, Mapping
from typing import Any, Literal

import ibis
from ibis.expr.datatypes import highest_precedence

def unify_schemas(
    schemas: Iterable[ibis.Schema | Mapping[str, Any]],
    *,
    how: Literal["error", "union", "intersection"] = "error",
    on_conflict: Literal["upcast", "error"] = "upcast",
) -> ibis.Schema:
    """Unify multiple schemas into one.

    Parameters
    ----------
    schemas
        The schemas to unify.
    how
        How to handle columns that are present in some schemas but not others.

        - "error": raise a ValueError
        - "union": keep all columns
        - "intersection": only keep columns that are in all schemas
    on_conflict
        What to do when schemas have a column with the same name, but different types.
        Options are:

        - "upcast": upcast the column to the most general type
        - "error": raise a ValueError
    """
    schemas = [ibis.schema(schema) for schema in schemas]
    column_sets = [set(schema) for schema in schemas]
    union = set().union(*column_sets)
    if how == "error":
        for schema in schemas:
            missing = union - set(schema)
            if missing:
                raise ValueError(
                    f"missing columns {missing} from schema {schema}", missing
                )
        out_columns = union
    elif how == "union":
        out_columns = union
    elif how == "intersection":
        out_columns = union.intersection(*column_sets)
    else:
        raise ValueError(f"unknown how: {how}")

    out_schema = {}
    errors = []
    for col in out_columns:
        types = {schema[col] for schema in schemas if col in schema}
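        if on_conflict == "error":
            if len(types) > 1:
                # Record the conflict and skip this column so that no
                # undefined or stale `typ` is written to out_schema;
                # all conflicts are reported together after the loop.
                errors.append((col, types))
                continue
            typ = next(iter(types))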
        elif on_conflict == "upcast":
            typ = highest_precedence(types)
        else:
            raise ValueError(f"unknown on_conflict: {on_conflict}")
        out_schema[col] = typ
    if errors:
        raise ValueError(f"conflicting types: {errors}")
    return ibis.schema(out_schema)

Note that I have to import highest_precedence() from ibis.expr.datatypes, which is not a documented API.
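
For illustration, the three `how` modes described in the docstring boil down to set operations on column names. This is a minimal sketch with plain Python sets, so it runs without ibis; the column names are made up and are not taken from any real schema.

```python
# Hypothetical column sets standing in for the columns of three schemas.
column_sets = [
    {"id", "name", "price"},
    {"id", "name"},
    {"id", "price", "created_at"},
]

# how="union": keep every column that appears in at least one schema.
union = set().union(*column_sets)

# how="intersection": keep only the columns present in every schema.
intersection = set.intersection(*column_sets)

# how="error": only acceptable when no schema is missing any column.
missing_per_schema = [union - cols for cols in column_sets]
has_missing = any(missing_per_schema)

print(sorted(union))
print(sorted(intersection))
print(has_missing)
```

With these inputs, `how="error"` would raise, because the second schema lacks `price` and `created_at`.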

What is the motivation behind your request?

No response

Describe the solution you'd like

Maybe DataType.highest_precedence(*others: DataType)? A top-level API like ibis.highest_dtype() would also be reasonable, but this seems like a rare enough need that I don't really want to pollute the top-level namespace with it.
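
To make the proposed shape concrete, here is a rough sketch of a variadic wrapper around a function that takes an iterable of types. Everything in it is a toy stand-in: `TOY_LADDER`, `toy_highest_precedence`, and `toy_highest_dtype` are hypothetical names, and the ordering is an assumption made for illustration, not ibis's actual casting rules.

```python
# Toy stand-in, NOT ibis's real rules: type names ordered narrowest to widest.
TOY_LADDER = ["int8", "int16", "int32", "int64", "float64"]


def toy_highest_precedence(dtypes):
    """Return the widest type among `dtypes` according to TOY_LADDER."""
    dtypes = list(dtypes)
    if not dtypes:
        raise ValueError("need at least one dtype")
    # list.index raises ValueError for a name that is not in the ladder.
    return max(dtypes, key=TOY_LADDER.index)


def toy_highest_dtype(first, *others):
    """Variadic convenience wrapper, mirroring the shape proposed above."""
    return toy_highest_precedence([first, *others])


print(toy_highest_dtype("int8", "int64"))
print(toy_highest_dtype("int32", "float64", "int16"))
```

Requiring one positional argument before `*others` means a call with no types fails at the call site rather than inside the helper.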

What version of ibis are you running?

main

What backend(s) are you using, if any?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
NickCrews added the feature label Jun 8, 2024
@cpcloud
Member

cpcloud commented Jun 12, 2024

Thanks for the issue!

Can you clarify what you're asking for here? Is it just to "officialize" the API?

@NickCrews
Contributor Author

yup, just include it in the docs so that we know it is a stable(ish) API. No functional changes needed.

Projects
Status: backlog

2 participants