Is your feature request related to a problem?
I want to be able to unify the schemas of multiple tables.
Currently I have something like
```python
from __future__ import annotations

from collections.abc import Iterable, Mapping
from typing import Any, Literal

import ibis
from ibis.expr.datatypes import highest_precedence


def unify_schemas(
    schemas: Iterable[ibis.Schema | Mapping[str, Any]],
    *,
    how: Literal["error", "union", "intersection"] = "error",
    on_conflict: Literal["upcast", "error"] = "upcast",
) -> ibis.Schema:
    """Unify multiple schemas into one.

    Parameters
    ----------
    schemas
        The schemas to unify.
    how
        How to handle columns that are present in some schemas but not others:
        - "error": raise a ValueError
        - "union": keep all columns
        - "intersection": only keep columns that are in all schemas
    on_conflict
        What to do when schemas have a column with the same name but
        different types:
        - "upcast": upcast the column to the most general type
        - "error": raise a ValueError
    """
    schemas = [ibis.schema(schema) for schema in schemas]
    column_sets = [set(schema) for schema in schemas]
    union = set().union(*column_sets)

    # Decide which columns end up in the output schema.
    if how == "error":
        for schema in schemas:
            missing = union - set(schema)
            if missing:
                raise ValueError(
                    f"missing columns {missing} from schema {schema}", missing
                )
        out_columns = union
    elif how == "union":
        out_columns = union
    elif how == "intersection":
        out_columns = union.intersection(*column_sets)
    else:
        raise ValueError(f"unknown how: {how}")

    # Resolve a single dtype for each output column.
    out_schema = {}
    errors = []
    for col in out_columns:
        types = {schema[col] for schema in schemas if col in schema}
        if on_conflict == "error":
            if len(types) > 1:
                errors.append((col, types))
                continue  # skip, otherwise `typ` would be stale or unbound
            typ = next(iter(types))
        elif on_conflict == "upcast":
            typ = highest_precedence(types)
        else:
            raise ValueError(f"unknown on_conflict: {on_conflict}")
        out_schema[col] = typ
    if errors:
        raise ValueError(f"conflicting types: {errors}")
    return ibis.schema(out_schema)
```
Note that I have to import `highest_precedence()` directly from `ibis.expr.datatypes`.
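The "upcast" option depends on picking the single most general type out of a set of types. As a rough illustration of that idea (not ibis's implementation), here is a toy model in plain Python, assuming a made-up linear widening order over a few integer type names. The names `_WIDENING`, `higher_precedence`, and `highest_precedence` here are hypothetical stand-ins; ibis's real casting rules cover far more types.

```python
from functools import reduce
from typing import Iterable

# Hypothetical toy widening order: later entries are "more general".
# This is NOT ibis's casting logic, just an illustration of the idea.
_WIDENING = ["int8", "int16", "int32", "int64"]


def higher_precedence(left: str, right: str) -> str:
    """Return whichever of two type names is wider in the toy order.

    Raises ValueError (from list.index) for names not in the toy order.
    """
    return max(left, right, key=_WIDENING.index)


def highest_precedence(dtypes: Iterable[str]) -> str:
    """Fold the pairwise rule over all given type names."""
    collected = list(dtypes)
    if not collected:
        raise ValueError("need at least one type")
    return reduce(higher_precedence, collected)


print(highest_precedence({"int8", "int32", "int16"}))  # int32
```

Folding a two-argument rule with `functools.reduce` means only the pairwise comparison has to be defined; because `max` over a total order is associative and commutative, the iteration order of a set like `types` does not affect the result.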
What is the motivation behind your request?
No response
Describe the solution you'd like
Maybe `DataType.highest_precedence(*others: DataType)`? A top-level API like `ibis.highest_dtype()` would also be reasonable, but this seems like a rare enough need that I don't really want to pollute the top-level namespace with it.
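To make the proposed method shape concrete, here is a minimal sketch using a hypothetical toy `DataType` dataclass with made-up integer ranks rather than ibis's real class. It only illustrates the calling convention `first.highest_precedence(*rest)`; nothing here is an existing ibis API.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import reduce

# Hypothetical ranks for a toy type hierarchy (higher = more general).
_RANK = {"int8": 0, "int16": 1, "int32": 2, "int64": 3}


@dataclass(frozen=True)
class DataType:
    """Toy stand-in for a dtype; not ibis's DataType."""

    name: str

    def highest_precedence(self, *others: DataType) -> DataType:
        """Hypothetical method: return the most general of self and others."""
        # Start from self so that calling with no arguments returns self.
        return reduce(
            lambda a, b: a if _RANK[a.name] >= _RANK[b.name] else b,
            others,
            self,
        )


int8, int32, int64 = DataType("int8"), DataType("int32"), DataType("int64")
print(int8.highest_precedence(int64, int32))  # DataType(name='int64')
```

One consequence of the method form is that a caller holding an arbitrary collection of types, like the `types` set in `unify_schemas`, has to split off a receiver first, whereas a module-level function can accept the iterable directly.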
What version of ibis are you running?
main
What backend(s) are you using, if any?
No response
Code of Conduct
I agree to follow this project's Code of Conduct