Releases · pola-rs/polars

16 Jun 12:30

github-actions

py-0.18.3

b98dd79

Python Polars 0.18.3

🚀 Performance improvements

use row format in streaming join ~15% (#9379)
row encode buffer reuse (#9371)
bytes row format for streaming groupby/unique keys >3.5x (#9346)
push slices down map functions (#9350)

✨ Enhancements

support all numeric dtypes in serde (#9393)
allow easy load/save of polars Config options to/from file (#9391)
ensure part of the plan is streaming if aggregati… (#9387)
add relaxed concatenation (#9382)
add sql DROP TABLE (#9355)
support ternary expressions in streaming (#9343)
add SQL support for null-aware equality checks (#9332)
add SQL support for regular expression operators (~, !~, ~*, and !~*) (#9327)
support // integer floordiv operator in the SQL engine (#9324)
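
The four regular-expression operators from #9327 use PostgreSQL-style spelling. The snippet below is a rough pure-Python model of how such operators are usually defined. It is not the Polars SQL engine. The helper name `sql_regex_op` is hypothetical, unanchored matching is an assumption carried over from PostgreSQL, and Python's `re` syntax may differ in details from the regex dialect Polars uses.

```python
import re


def sql_regex_op(op: str, value: str, pattern: str) -> bool:
    """Model the SQL operators ~, !~, ~* and !~* with Python's re module."""
    if op not in ("~", "!~", "~*", "!~*"):
        raise ValueError(f"unsupported operator: {op!r}")
    # A trailing '*' makes the match case-insensitive.
    flags = re.IGNORECASE if op.endswith("*") else 0
    # Unanchored search: the pattern may match anywhere in the value.
    matched = re.search(pattern, value, flags) is not None
    # A leading '!' negates the result.
    return not matched if op.startswith("!") else matched


print(sql_regex_op("~", "Polars", "^pol"))    # False: case-sensitive
print(sql_regex_op("~*", "Polars", "^pol"))   # True: case-insensitive
print(sql_regex_op("!~", "Polars", "^pol"))   # True
print(sql_regex_op("!~*", "Polars", "^pol"))  # False
```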

🐞 Bug fixes

fix bug when comparing series (#9359)
list zip with (#9367)
parquet + categorical (#9363)
respect startby in groupby_dynamic when every is greater than 1d (#9362)
raise groupby apply on empty frame (#9360)
raise more informative error on string arguments (#9352)
Allow for tolerance when comparing nested dtype columns (#9272)
avoid is_in TypeError with sets of values containing 'None' (#9323)

🛠️ Other improvements

add top-k test for #9385 (#9388)
document apply 'return_dtype' requirement (#9361)
clarify when day of week takes effect in groupby_dynamic (#9342)
add "if you're coming from pandas" tip to groupby_dynamic (#9336)
fix string language formatting (#9341)
add doc entries for eq_missing and ne_missing expressions (#9331)
fixup options for validate arg in join (#9319)

Thank you to all our contributors for making this release possible!
@0xbe7a, @AnatolyBuga, @MarcoGorelli, @alexander-beedie, @dkrako, @durandtibo, @ritchie46 and @universalmind303

09 Jun 08:24

ritchie46

py-0.18.2

f7f6753

Python Polars 0.18.2

🚀 Performance improvements

increase streaming groupby spill size from 256 to 10_000 (#9312)
improve rolling min and max for the no-nulls case (#9277)

✨ Enhancements

allow use of StringCache object as a function decorator (#9309)
allow use of Config object as a function decorator (#9307)
serde for 'to_physical' expr (#9294)

🐞 Bug fixes

fix rolling weighted mean (#9292)
fix overly-broad string matching in selectors (#9303)
fix when loading model data from upcoming pydantic 2.x release (#9296)

🛠️ Other improvements

fix extraneous indent in examples block (#9297)
Fix typo in Selectors documentation (#9295)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @magarick, @ritchie46, @stinodego and @thomascamminady

07 Jun 18:08

github-actions

py-0.18.1

c2505e8

Python Polars 0.18.1

🏆 Highlights

add dedicated selectors module, consolidating/expanding existing selector capabilities (#9204)

🚀 Performance improvements

slightly improve n_unique performance (#9286)
use ciborium in Expression pickling (#9235)

✨ Enhancements

add join cardinality validation (#9278)
implement set operations for selector API (#9276)
keep sorted flag after Expr::truncate (#9275)
add "sql_expr" function (#9248)
rewrite correlation functions to expression architecture (#9258)
keep sorted flag on offset_by (#9253)
add expression json serde (#9236)
add intersection primitive for selector API (#9240)
building blocks for expression expansion sets (#9231)
Add ddof option to rolling_var and rolling_std (#8957)
immediately flatten nested unions (#9220)
Allow empty select/with_columns/groupby (#9205)
add a datetime selector (#9212)
support float expression on integers (#9210)
add dedicated selectors module, consolidating/expanding existing selector capabilities (#9204)
add binary to list<u8> cast (#9161)
groupby_dynamic by quarter. (#6842)
add arr.unique expression (#9159)
implement explode for DataType::Array (#9157)
Decimal type: sum, min, max aggregations in select and agg context. (#9135)
Decimal arithmetic (#9123)
support decimals as cast types in csv parser (#9121)
Improve error handling for repeat (#9117)
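
Join cardinality validation (#9278) checks that the join keys are unique on the side that is declared as "one". The snippet below sketches that idea in plain Python. It is not Polars code. The function name `validate_join_keys` is hypothetical, and the option strings `"m:m"`, `"1:m"`, `"m:1"` and `"1:1"` are assumed to mirror the `validate` argument of `join`.

```python
from collections import Counter


def validate_join_keys(left_keys, right_keys, validate="m:m"):
    """Raise ValueError if the join keys violate the requested cardinality."""
    if validate not in ("m:m", "1:m", "m:1", "1:1"):
        raise ValueError(f"unknown validate option: {validate!r}")
    left_side, right_side = validate.split(":")

    def has_duplicates(keys):
        return any(count > 1 for count in Counter(keys).values())

    # A side marked "1" must not contain repeated keys.
    if left_side == "1" and has_duplicates(left_keys):
        raise ValueError(f"left keys are not unique, join is not {validate}")
    if right_side == "1" and has_duplicates(right_keys):
        raise ValueError(f"right keys are not unique, join is not {validate}")


validate_join_keys([1, 2, 3], [1, 1, 2], validate="1:m")  # passes silently
try:
    validate_join_keys([1, 2, 3], [1, 1, 2], validate="1:1")
except ValueError as exc:
    print(exc)  # right keys are not unique, join is not 1:1
```

Checking before the join is the point of the option: a many-to-many match that was expected to be one-to-one otherwise shows up only as an unexpectedly large result.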

🐞 Bug fixes

fix pyarrow dataset literal filter (#9274)
raise on invalid sort_by (#9262)
match missing Array and Struct classes in FromPyObject (#9271)
correct ne_missing/eq_missing schema (#9257)
fix cached reproject offsets (#9254)
delay opening files in streaming engine (#9251)
ensure agg(F(lit)) == lit (#9222)
don't stack overflow on concat(expressions) (#9214)
df.apply first rechunk (#9211)
clip window_size to length in rolling_apply (#9209)
raise error on invalid df.apply return (#9207)
Handle edge cases of named select input (#9198)
rolling_apply window_size == len (#9181)
respect time zone in strptime/to_datetime when exact=False (#9171)
make null chunking behavior equal to other dtypes (#9176)
return single numpy array in Array dtype -> numpy (#9164)
fix regression in boolean nulls comparison (#9142)
fix struct null_count if fields are null arrays (#9151)
Fix DataFrame.to_arrow() for 0x0 dataframes (#9144)
categorical construction from null values (#9145)
let apply caller determine if length needs to be checked. (#9140)
struct is_in should upcast numeric types (#9110)
Restore functionality of name arg for date_range (#9107)
bubble up dtype when converting from arrow (#9120)

🛠️ Other improvements

Fix grammar and add periods in Expr.over docs (#9244)
Update linting for py-polars crate (#9242)
Deprecate exprs=... input for select/with_columns/agg/struct (#9219)
Enable parallelization in Python Windows tests (#9232)
Use pytest tmp_path (#9206)
Build docs in parallel (#9229)
Unify Python docs workflows (#9228)
add docstring to __array__ methods (#8055)
Update expr parsing util to return PyExpr (#9166)
update pyo3 requirement from 0.18 to 0.19 (#9155)
clarify how the windows are formed in the rolling_* functions (#9192)
stabilise polars importtime check (#9196)
fix "to_decimal" docstring (#9197)
note that exact=False is a performance footgun (#9186)
change decimal inference and argument order (#9133)
Cache Rust build on main branch (#9130)
Improve df.clear() docs (#8809)
Bump maturin to 1.0.1 (#9115)
Bump lint dependency versions (#9116)

Thank you to all our contributors for making this release possible!
@DeflateAwning, @MarcoGorelli, @alexander-beedie, @ankane, @avimallu, @bfeif, @dependabot, @dependabot[bot], @jonashaag, @josh, @lorentzenchr, @magarick, @ritchie46, @stinodego, @universalmind303 and @zundertj

29 May 20:01

github-actions

py-0.18.0

b665064

Python Polars 0.18.0

🏆 Highlights

Rename list namespace accessor from .arr to .list (#8999)

⚠️ Breaking changes

propagate null in equality comparisons (#9053)
formalize implode -> explode relation (#9038)
Drop subclassing support for DataFrame/LazyFrame (#9008)
consistently return list of date/datetime from lazy date_range (#8513)
Default date_range/ones/zeros to eager=False (#9007)
Rename list namespace accessor from .arr to .list (#8999)
disallow time zones other than those in zoneinfo.available_timezones() (#8993)
remove window expression magic (#8992)
raise error when sorted flag not set (#8994)
Drop subclassing support for GroupBy (#7746)
in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
Remove deprecated tz_aware argument (#8696)
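
The null propagation change (#9053) means that an equality comparison with a null operand yields null rather than a boolean, while the null-aware variant treats null as an ordinary value. The snippet below models both behaviours on plain Python lists, with `None` standing in for null. It is a sketch of the semantics only: `eq` is a hypothetical helper name, and `eq_missing` is borrowed from the expression of the same name.

```python
from typing import Optional


def eq(a, b) -> Optional[bool]:
    """Null-propagating equality: any None operand yields None."""
    if a is None or b is None:
        return None
    return a == b


def eq_missing(a, b) -> bool:
    """Null-aware equality: None is treated as an ordinary value."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


left = [1, 2, None, None]
right = [1, 3, 2, None]
print([eq(a, b) for a, b in zip(left, right)])          # [True, False, None, None]
print([eq_missing(a, b) for a, b in zip(left, right)])  # [True, False, False, True]
```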

🚀 Performance improvements

speed up write_csv for time-zone-aware columns (#9093)
parallelize rolling_window group materialization (#9095)
elide hot loop in hash joins (#9075)

✨ Enhancements

conversion from Utf8 to Decimal. (#9090)
default to checking sortedness in groupby_rolling… (#9063)
propagate null in equality comparisons (#9053)
warn if constructing Series with time-zone-aware datetimes (#9058)
implement apply for rolling/dynamic_groupby (#9049)
Support more data types in lazy repeat (#9046)
implement strategy=nearest for join_asof (#9024)
arr.sum expression (#9041)
formalize implode -> explode relation (#9038)
add array namespace and min/max expression (#9032)
improve error message on row-wise overflow (#9021)
properly apply slice at UNION level (#9018)
consistently return list of date/datetime from lazy date_range (#8513)
Default date_range/ones/zeros to eager=False (#9007)
disallow time zones other than those in zoneinfo.available_timezones() (#8993)
raise error when sorted flag not set (#8994)
in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
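
With `strategy=nearest` (#9024), `join_asof` matches each left key to the closest right key in either direction, whereas the backward and forward strategies look in one direction only. The snippet below is a minimal pure-Python sketch of a nearest lookup on sorted keys. It is not the Polars implementation: `asof_nearest` is a hypothetical name, and the tie-breaking rule is an arbitrary choice made for this sketch.

```python
from bisect import bisect_left


def asof_nearest(left_keys, right_keys):
    """For each left key, return the index of the closest right key.

    Both inputs must be sorted ascending. Ties go to the earlier right key,
    which is an arbitrary choice made for this sketch.
    """
    if not right_keys:
        return [None] * len(left_keys)
    result = []
    for key in left_keys:
        pos = bisect_left(right_keys, key)
        # Candidates: the neighbour just before `key` and the one at/after it.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(right_keys)]
        result.append(min(candidates, key=lambda i: abs(right_keys[i] - key)))
    return result


print(asof_nearest([1, 5, 10], [2, 6, 7, 20]))  # [0, 1, 2]
```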

🐞 Bug fixes

groupby_rolling was returning incorrect results when offset was positive (#9082)
don't underflow on list.tail (#9089)
fix null/empty in List::take_unchecked (#9074)
repeat by (#9023)
raise in to_datetime/strptime if format contains hour but not minute directive (#9044)
Order of pl.Array arguments in docstring (#9059)
propagate nulls in broadcasting of order comparisons (#9050)
Improve read_parquet missing column error message (#8961)
fix apply with passed date/datetime return_dtype (#9035)
respect inner type in Array construction (#9020)
raise error on invalid aggregation (#9013)
fix fused arithmetic in window functions (#9012)
don't allow silent init of Series declared as int/temporal with floating point values (#9004)
deprecate time_unit property from Series (#8990)

🛠️ Other improvements

Improve expression parsing utils (#9094)
Refactor expression input parsing util (#9085)
Organize "as_datatype" functions (#9080)
Change eager path for repeat (#9048)
Clean up arange/date_range/time_range (#9027)
Drop subclassing support for DataFrame/LazyFrame (#9008)
minor SQLContext docstring cleanups (#9005)
Rename list namespace accessor from .arr to .list (#8999)
remove window expression magic (#8992)
Drop subclassing support for GroupBy (#7746)
Remove old deprecated functionality (#8995)
Remove deprecated tz_aware argument (#8696)

Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @charliegallop, @jonashaag, @mcrumiller, @raymead, @ritchie46, @sorhawell, @stinodego, @tim-habitat and @universalmind303

29 May 20:02

github-actions

rs-0.30.0

ee2366b

Rust Polars 0.30.0

🏆 Highlights

Rename list namespace accessor from .arr to .list (#8999)
Array (backed by arrow::FixedSizeList datatype) (#8943)

⚠️ Breaking changes

propagate null in equality comparisons (#9053)
formalize implode -> explode relation (#9038)
consistently return list of date/datetime from lazy date_range (#8513)
Rename list namespace accessor from .arr to .list (#8999)
disallow time zones other than those in zoneinfo.available_timezones() (#8993)
remove window expression magic (#8992)
raise error when sorted flag not set (#8994)
in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
Remove deprecated tz_aware argument (#8696)

🚀 Performance improvements

speed up write_csv for time-zone-aware columns (#9093)
parallelize rolling_window group materialization (#9095)
elide hot loop in hash joins (#9075)
improve list explode perf (#8974)
Improve explodes: offsets_to_indexes performance (#8964)
avoid quadratic exclude behaviour when selecting against dtypes and/or wildcards (#8953)
use simd-json for all json parsing (#8922)
improve json_extract (#8858)
add optimizer passes and change initial order (#8811)
fused multiply sub / sub multiply (#8799)
improve parallel work distribution of sort expression ~4x (#8775)
change default row-group size (#8758)

✨ Enhancements

conversion from Utf8 to Decimal. (#9090)
default to checking sortedness in groupby_rolling… (#9063)
propagate null in equality comparisons (#9053)
implement apply for rolling/dynamic_groupby (#9049)
implement strategy=nearest for join_asof (#9024)
arr.sum expression (#9041)
formalize implode -> explode relation (#9038)
add array namespace and min/max expression (#9032)
improve error message on row-wise overflow (#9021)
properly apply slice at UNION level (#9018)
consistently return list of date/datetime from lazy date_range (#8513)
disallow time zones other than those in zoneinfo.available_timezones() (#8993)
raise error when sorted flag not set (#8994)
in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
error on invalid sortby expr (#8986)
Pushdown is_in to pyarrow dataset (#8930)
Array (backed by arrow::FixedSizeList datatype) (#8943)
multiple enhancements for SQLContext (#8944)
add sql UNION, UNION ALL & UNION DISTINCT (#8936)
add sql compound identifiers (#8934)
add sql EXCLUDE (#8913)
add sql CASE (#8911)
add sql EXPLAIN (#8897)
improve json_extract (#8858)
add support for sql DISTINCT ON (#8824)
add LazyFrame null_count (#8837)
check categorical cache on transpose (#8836)
add support for OFFSET keyword in SQL queries (#8833)
add a new time_range utility function (#8776)
Add hint to use _saturating on overflow (#8805)
support boolean addition (#8778)
improved detail in several error messages (#8747)

🐞 Bug fixes

groupby_rolling was returning incorrect results when offset was positive (#9082)
fix null/empty in List::take_unchecked (#9074)
repeat by (#9023)
raise in to_datetime/strptime if format contains hour but not minute directive (#9044)
propagate nulls in broadcasting of order comparisons (#9050)
fix apply with passed date/datetime return_dtype (#9035)
raise error on invalid aggregation (#9013)
fix fused arithmetic in window functions (#9012)
JoinBuilder::force_parallel is modifying allow_parallel (#8617)
Fix erroneous warning in hist (#8982)
respect rechunk in parquet (#8935)
Simplify offsets_to_indexes, fix empty offsets edge cases (#8920)
sql qualified wildcards (#8916)
don't check sortedness in asof by (#8906)
check for object type in csv writer (#8894)
window function with filtered groups (#8880)
parse offset-aware strings as UTC in read_csv when try_parse_dates=True (#8864)
free buffer, but not its contents (#8848)
improve agg expr field types (#8834)
sql BETWEEN bounds should be inclusive (#8818)
sort cached window groups (#8813)
check null data before take (#8812)
fix broadcasting on integer bitwise (#8798)
correct aggregation of overlapping groups (#8794)
modify join error (#8768)
don't parallelize sort within rayon job (#8774)
fix deadlock in cache and improve parallelism/work… (#8765)
check offset before doing owned mutation (#8760)
validate data on successful deserialization (#8757)
improve supertype coercion of functions (#8755)

🛠️ Other improvements

use concrete type for time zones (#9076)
factor add_month out of add_impl_month_week_or_day (#9066)
remove unnecessary timezone trait usage, use concrete type (#9065)
Fix broken links (#9072)
bump sqlparser version (#9043)
move list namespace functions to separate module (#9040)
Clean up arange/date_range/time_range (#9027)
Rename list namespace accessor from .arr to .list (#8999)
replace pattern match with unwrap (#9000)
remove window expression magic (#8992)
Remove deprecated tz_aware argument (#8696)
simplify take_every (#8971)
add readmes to all sub crates (#8770)
improve arithmetic reuse and don't allocate on binary… (#8781)
accumulate windows flag during translation (#8773)

Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @avimallu, @cbowdon, @charliegallop, @chitralverma, @jonashaag, @kpberry, @mcrumiller, @petar-savov, @raymead, @ritchie46, @sorhawell, @stinodego, @tim-habitat, @uchiiii and @universalmind303

23 May 08:16

github-actions

py-0.17.15

b30a1f3

Python Polars 0.17.15

🏆 Highlights

Array (backed by arrow::FixedSizeList datatype) (#8943)
Write dataframes as delta tables (#7616)

🚀 Performance improvements

improve list explode perf (#8974)
Improve explodes: offsets_to_indexes performance (#8964)
avoid quadratic exclude behaviour when selecting against dtypes and/or wildcards (#8953)
use simd-json for all json parsing (#8922)
improve performance of align_frames, and add new alignment option (#8899)

✨ Enhancements

error on invalid sortby expr (#8986)
Pushdown is_in to pyarrow dataset (#8930)
allow set column list input to 'drop' and 'drop_nulls' (#8962)
Array (backed by arrow::FixedSizeList datatype) (#8943)
Add dtype argument for repeat (#8946)
Use schema keys to define the columns if only the schema is provided to pl.struct (#8952)
multiple enhancements for SQLContext (#8944)
add sql UNION, UNION ALL & UNION DISTINCT (#8936)
add sql compound identifiers (#8934)
add sql EXCLUDE (#8913)
add sql CASE (#8911)
add sql EXPLAIN (#8897)
Write dataframes as delta tables (#7616)
improve performance of align_frames, and add new alignment option (#8899)
improved inference from type annotations (#8895)

🐞 Bug fixes

Fix erroneous warning in hist (#8982)
don't modify Series with empty names in-place on DataFrame init (#8956)
respect rechunk in parquet (#8935)
Add hint on PyArrow to ADBC import error (#8898)
Simplify offsets_to_indexes, fix empty offsets edge cases (#8920)
sql qualified wildcards (#8916)
address edge cases with in-place modification of Series objects (#8915)
don't check sortedness in asof by (#8906)
check for object type in csv writer (#8894)
improve performance of align_frames, and add new alignment option (#8899)
window function with filtered groups (#8880)

🛠️ Other improvements

deprecate rename "in_place" parameter (#8960)
Clean up tests for repeat (#8979)
Deprecate name argument for repeat (#8977)
simplify take_every (#8971)
Clean up repeat/ones/zeros (#8963)
further enhance SQLContext docstrings (#8948)
Fix typo in lazygroupby.rs error message (#8937)
fix docstring for time() (#8939)
refactor tzinfo-related tests (#8883)

Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @avimallu, @cbowdon, @chitralverma, @jonashaag, @kpberry, @mcrumiller, @petar-savov, @ritchie46, @stinodego and @universalmind303

16 May 18:02

github-actions

py-0.17.14

feb368b

Python Polars 0.17.14

🚀 Performance improvements

optimise align_frames and properly handle the case where the alignment key has duplicate values (#8825)

✨ Enhancements

add an align option to pl.concat (#8835)
add support for sql DISTINCT ON (#8824)
add LazyFrame null_count (#8837)
check categorical cache on transpose (#8836)
add support for OFFSET keyword in SQL queries (#8833)
optimise align_frames and properly handle the case where the alignment key has duplicate values (#8825)

🐞 Bug fixes

parse offset-aware strings as UTC in read_csv when try_parse_dates=True (#8864)
handle InitVar typing declarations on dataclass objects (#8856)
free buffer, but not its contents (#8848)
improve agg expr field types (#8834)
optimise align_frames and properly handle the case where the alignment key has duplicate values (#8825)
sql BETWEEN bounds should be inclusive (#8818)

🛠️ Other improvements

add examples for Config "set_tbl_formatting" and "set_fmt_str_lengths" methods (#8859)
Convert between Vec of Series/Pyseries using trait (#8846)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @ritchie46, @stinodego and @universalmind303

12 May 12:39

github-actions

py-0.17.13

0ec1736

Python Polars 0.17.13

🚀 Performance improvements

add optimizer passes and change initial order (#8811)
fused multiply sub / sub multiply (#8799)
improve parallel work distribution of sort expression ~4x (#8775)
change default row-group size (#8758)
elide function calls in AnyValue::eq (#8725)

✨ Enhancements

add a new time_range utility function (#8776)
Add hint to use _saturating on overflow (#8805)
add a "restore_defaults" kwarg to Config init (#8797)
add lazy time expression (#8785)
support boolean addition (#8778)
support SQLContext registration of DataFrames (#8762)
support automatic SQLContext frame/table registration from local variables (#8749)
improved detail in several error messages (#8747)
support frame registration at SQLContext init time, and add an "unregister" method (#8744)
support repeat for all types (#8741)
add support for DISTINCT keyword in SQL select clauses (#8740)
support any day of the week in 'start_by' in groupby_dynamic (#8720)
add support for USING clause in SQL join operations (#8731)
add unit tests for extend_constant Expr (#8734)
add clean multi-frame registration to SQLContext (#8724)
add support for HAVING clause to SQL GROUP BY operations (#8704)
improved numpy string interop (#8703)

🐞 Bug fixes

sort cached window groups (#8813)
check null data before take (#8812)
fix broadcasting on integer bitwise (#8798)
Fix incorrect type hint for arange (#8796)
correct aggregation of overlapping groups (#8794)
don't parallelize sort within rayon job (#8774)
fix deadlock in cache and improve parallelism/work… (#8765)
check offset before doing owned mutation (#8760)
don't persist temporary column in disjoint calls to update (#8763)
validate data on successful deserialization (#8757)
improve supertype coercion of functions (#8755)
groupby_dynamic was unnecessarily failing on ambiguous local datetime (#8737)
ensure count aggregation has proper length when spilling (#8735)
fix return value of std for single-element sequence with ddof=1 (#8730)
don't take logical plan during streaming fmt (#8711)
Don't upcast in round() for f32 when decimal is 0 (#8706)

🛠️ Other improvements

add entry for lazy time func (#8786)
add unit tests for extend_constant Expr (#8734)
add rounding coverage for 32/64 bit floats (#8715)
Add warning to count methods on null (#8698)

Thank you to all our contributors for making this release possible!
@DeflateAwning, @MarcoGorelli, @alexander-beedie, @mcrumiller, @ritchie46, @stinodego, @uchiiii, @universalmind303 and @zundertj

08 May 15:04

github-actions

rs-0.29.0

0ad9645

Rust Polars 0.29.0

🏆 Highlights

Out-of-core unique (#8573)

⚠️ Breaking changes

Rename concat_lst to concat_list (#8597)
Schema improvements (#8286)
don't create duplicate pivot names (#8002)
rename toggle_string_cache to enable_string_cache (#7970)
change top_k(descending) -> bottom_k (#7969)
in sort, top_k, sort_by, and arg_sort_by, raise if descending is a sequence and its length doesn't match the number of columns to sort by (#7957)

🚀 Performance improvements

elide function calls in AnyValue::eq (#8725)
add fused multiply add optimization for expressions (#8690)
use expression for dot product (#8686)
improve nested grouptuples related code (#8618)
buffer spill partitions in ooc sort. ~10/20% (#8616)
improve OOC sort performance during partition phase (#8590)
remove some unnecessary calls and matches (#8490)
less naive count (#8473)
parallelize almost all flattens (#8468)
optimize horizontal min/max (#8463)
reinstate old behavior in numeric group-tuples (#8445)
remove false sharing in perfect hash table >2x (#8432)
further optimised conversions to python date/datetime (#8417)
optimize join inner materialization of single keys (#8405)
parallelize sorted group tuple materialization (#8387)
improve materialization of huge cardinality group tuples (#8382)
improve group_tuples materialization (#8375)
use online variance kernel for aggregation (#8306)
add specialized boolean aggregation for min/max (#8294)
fail fast on non-inferable strings in strptime if no fmt is provided (#8111)
make chunks search more resilient (#8229)
SIMD accelerated arg_min/arg_max (via argminmax) (#8074)
speed up csv parsing for slower datetimes formats (#8213)
arr.eval run on groupby expression engine when possible (#8199)
FromParallelIterator<Option<str>> for Utf8Chunked ~1.9x (#8058)
speed up from_par_iter Option<bool> ~2.5x (#8057)
parallelize numeric ChunkedArray materialization ~2x. (#8053)
parallelize into_groups materialization ~-25% (#8036)
use a trusted anyvalue builder (#8001)
numeric grouptuples with nulls hash in single pass ~25% (#7980)
use perfect hash table for categoricals (#7951)
improve group_tuples of high cardinality data ~10% (#7938)
use streaming instead of partitioned groupby (#7907)
don't auto-stream groupby (#7906)
rechunk before aggs (#7903)
don't re-allocate groups in sorted to_dummies (#7897)
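
An online variance kernel (#8306) computes the variance in a single pass, without first materializing the mean. The release note does not say which algorithm is used. The snippet below shows Welford's method, a common choice, purely as an illustration. The function name `online_variance` and the `ddof` parameter name are assumptions made for this sketch.

```python
def online_variance(values, ddof=1):
    """Single-pass (Welford) variance; returns None when n <= ddof."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        # Uses the deviation from both the old and the updated mean.
        m2 += delta * (x - mean)
    if count <= ddof:
        return None
    return m2 / (count - ddof)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(online_variance(data, ddof=0))  # approximately 4.0
print(online_variance([1.0], ddof=1))  # None: not enough values
```

Because the state is just three numbers (`count`, `mean`, `m2`), this kind of kernel suits aggregations that process data in chunks.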

✨ Enhancements

add support for DISTINCT keyword in SQL select clauses (#8740)
support any day of the week in 'start_by' in groupby_dynamic (#8720)
add support for USING clause in SQL join operations (#8731)
add support for HAVING clause to SQL GROUP BY operations (#8704)
streaming unions (#8676)
expression cache (#8674)
rolling covariance and correlation (#8671)
Add dt.to_string alias for dt.strftime (#8290)
use temp dir for ooc spills (#8614)
make ooc-sort resilient against chunk_size (#8588)
Set strptime default strict/exact=true (#8587)
Out-of-core unique (#8573)
Add to_date, to_datetime, to_time to String namespace (#8579)
more detailed error message on failure to cast List dtype (#8583)
don't trigger unreachable code if no dtype is set (#8532)
accept expressions in groupby_dynamic/rolling (#8528)
expose quantile/mean for duration (#8491)
require explicitly sorted flag for upsample (#8488)
allow for _saturating suffix in duration strings (#8479)
let duration string accept "1mo_saturating" (#8469)
add dt.month_start and dt.month_end (#8435)
add SQL support for cumulative functions (#8457)
add str_slice method to StringNameSpace (#8427)
allow negative 'arange' expression (#8413)
warn if argument is not explicitly sorted (#8409)
Schema improvements (#8286)
add support for SQL "IN" expr (#8396)
cli output mode & sql read_json (#8336)
rename 'csv-file' to 'csv' (#8101)
preserve time zone in combine (#8263)
add use_earliest argument to replace_time_zone for dealing with ambiguous datetimes (#8087)
SQL CTEs (#8208)
add duration cumsum and remainder (#8219)
better algorithm for streaming unique (#8003)
Add approx distinct count via approx_unique() (#7937)
adopt FunctionExpr for cat namespace (#8173)
DatetimeArgs ergonomics (#8133)
Remove Seek constraint from IpcStreamReader and SerReader (#8166)
implement FunctionExpr for bound and round methods (#8172)
display skipped row if same number of rows (#8170)
move all boolean expressions into BooleanFunction enum (#8132)
rewrite log expressions to make them serializable (#8126)
make unique expr serde and cmp (#8153)
adopt FunctionExpr for abs to allow for serialization (#8129)
adopt FunctionExpr for cum* functions (#8130)
support negative index in pct_change (#8137)
add log1p to list of mathematical functions (#8102)
expand list of tz-aware formats which can be auto-inferred (#8085)
clearer error message if strptime without a fmt specified fails (#8086)
infer tz-aware formats with try_parse_dates in read_csv (#8084)
make 'mo' interval raise if the target date does not exist (#8078)
auto-infer fmt for tz-aware date strings (#7405)
multiple sql contexts & optional sql highlighting in cli (#8072)
implement arg_sort for struct dtype (#8051)
support struct in df.unique (#7976)
change top_k(descending) -> bottom_k (#7969)
optimize away nested unions in lp (#7861)
Add seed argument to rank for random (#7913)
auto-infer detecting time-zone-awareness of fmt argument in strptime; deprecate tz_aware argument (#7886)
deal with null values in cut/qcut (#7878)
support datetime/date subclasses (e.g. FreezeGun) (#7819)
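
Two entries above concern month offsets: the 'mo' interval raises when the target date does not exist (#8078), and a `_saturating` suffix such as "1mo_saturating" is accepted (#8469, #8479). The snippet below models the two behaviours with the standard library. It is a sketch, not Polars code: `add_months` and its `saturating` flag are hypothetical names, and clamping to the last day of the month is the assumed meaning of "saturating".

```python
import calendar
from datetime import date


def add_months(d: date, months: int, saturating: bool = False) -> date:
    """Add calendar months to a date.

    If the target day does not exist (e.g. 31 January + 1 month), either raise
    or, when `saturating` is True, clamp to the last day of the target month.
    """
    month_index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(month_index, 12)
    month += 1  # back to the 1..12 range
    last_day = calendar.monthrange(year, month)[1]
    if d.day > last_day:
        if not saturating:
            raise ValueError(f"{year}-{month:02d}-{d.day:02d} does not exist")
        return date(year, month, last_day)
    return date(year, month, d.day)


print(add_months(date(2023, 1, 31), 1, saturating=True))  # 2023-02-28
try:
    add_months(date(2023, 1, 31), 1)
except ValueError as exc:
    print(exc)  # 2023-02-31 does not exist
```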

🐞 Bug fixes

groupby_dynamic was unnecessarily failing on ambiguous local datetime (#8737)
ensure count aggregation has proper length when spilling (#8735)
fix return value of std for single-element sequence with ddof=1 (#8730)
don't take logical plan during streaming fmt (#8711)
Don't upcast in round() for f32 when decimal is 0 (#8706)
block predicate containing shifts and windows after sort (#8670)
ensure perfect hash table processes the nulls (#8668)
Reading more tiny CSVs than workers in parallel will deadlock (#8441)
respect maintain_order in partitioned groupby (#8653)
fix explode null series (#8654)
fix categorical agg type (#8645)
allow list<null> -> list<cat> (#8636)
maintain sorted info on top-k and empty sort (#8615)
maintain sortedness in date -> datetime cast (#8606)
fix determining of supertype for tz-aware and tz-naive datetimes (#8585)
fix csv reader with new line in header (#8580)
correct for nested offsets in json serialization (#8584)
fix wrong dtype init in streaming groupby (#8574)
fix categorical/string_cache fill_null panic (#8562)
fix window function contention in binary expression (#8544)
fix StructChunked not_equal comparator/operator (#8547)
fix struct pyarrow ffi (#8543)
don't trigger unreachable code if no dtype is set (#8532)
keep sorted info on agg_first and simple singleton… (#8526)
unset fast_unique coming from arrow (#8521)
correct sign-reversed scale on DecimalChunked to Python Decimal conversion (fixes #8423) (#8508)
don't error on cast if column is not projected (#8495)
ensure window function succeeds on empty frame (#8492)
don't set verbose on union (#8487)
check literal/group length before claiming agg sta… (#8486)
fix error message of offset_by if offsetting by negative number of months (#8464)
fix sorted warning (#8462)
fix features serde and dtype-struct not compiling together (#8439)
respect dtype in anonymous list builder in case of… (#8428)
infer supertype in json serde (#8411)
duration on empty df (#8403)
don't inadvertently set Series initialised with nested tuple data as Object dtype (#8401)
use physical in streaming unique global table (#8390)
recursively bubble up all dtypes in list cast (#8386)
is_in struct logical types (#8378)
fix nested null parquet read (#8372)
fix logical type in ListChunked::new_from_index (#8367)
bubble up logical type in recursive list cast (#8356)
implement clone_inner for all series (#8357)
fix fill_null for categorical (#8353)
time.cast(str) as strftime (#8351)
fix logical dtypes in parallel list collection (#8349)
improve logical types of explode operation (#8348)
logical type in anonymous list builders (#8346)
escape csv header names if they contain special chars (#8331)
nested struct/list/categorical logical/physical (#8334)
fix deserialize empty list (#8326)
fix coalesce schema (#8324)
don't do null propagation (#8322)
ensure invalid list eval raises (#8317)
pass name to struct construction in aggregation (#8299)
Use three slashes for doc comments (#8284)
improve nested list construction (#8278)
Fix DataFrame.sum returning empty column names (#8283)
always sort in top_k fast path (#8275)
don't use fast paths for sorted join if there are … (#8272)
fix boolean par materialization (#8257)
improve null/empty list construction (#8255)
fix offsets in parallel utf8 materialization (#8254)
nested struct logical type consistency (#8249)
keep literal state if elementwise function is applied (#8195)
decimal ensure backed arrow arrays have correct dtype (#8193)
ensure cached nodes are initialized once (#8103)
validate map lengths (#8147)
fix row-wise init of UInt64 values that exceed Int64 upper bound (#8146)
implement list<null> constructor (#8143)
add all primitives to av_buffer builder (#8140)
struct is_in (#8139)
fix wrong display name of binary expressions (#8131)
lazy: fix boolean sum...

Contributors

josh, jonashaag, and 32 other contributors

05 May 18:48

github-actions

py-0.17.12

1d3ef5e

Python Polars 0.17.12

🚀 Performance improvements

add fused multiply add optimization for expressions (#8690)
use expression for dot product (#8686)

✨ Enhancements

streaming unions (#8676)
allow arr.to_struct to take a list of field names, fix it for Series, improve related docstrings (#8673)
expression cache (#8674)
rolling covariance and correlation (#8671)
.to_physical() for List(Categorical) (#8499)
allow from_repr to handle parsing of table reprs with no dtype row (#8640)
Add dt.to_string alias for dt.strftime (#8290)
support DataFrame export to numpy structured/record arrays (#8628)
support transparent DataFrame init from numpy structured/record arrays. (#8620)
Prettify show_versions (#8627)

🐞 Bug fixes

allow arr.to_struct to take a list of field names, fix it for Series, improve related docstrings (#8673)
block predicate containing shifts and windows after sort (#8670)
ensure perfect hash table processes the nulls (#8668)
Reading more tiny CSVs than workers in parallel will deadlock (#8441)
respect maintain_order in partitioned groupby (#8653)
fix explode null series (#8654)
fix categorical agg type (#8645)
allow list<null> -> list<cat> (#8636)

🛠️ Other improvements

add notes/examples on use of inline regex flags to replace docstrings (#8685)
Add "See Also" sections for alias, map_alias, prefix, s… (#8682)
add notes/examples on use of inline regex flags to extract_all docstrings (#8675)
allow arr.to_struct to take a list of field names, fix it for Series, improve related docstrings (#8673)
add notes on the use of inline regex flags to extract docstrings (#8669)
Add missing implode to internal functions (#8667)
Clean up type checking imports (#8666)
Organize PySeries impl blocks (#8665)
clean-up some examples, extend pipe docstring (#8658)
add notes on the use of inline regex flags to contains docstrings (#8657)
fix/improve from_repr example/doctest (#8642)
Improve some bindings imports (#8630)
Move functions in Rust bindings to functions module (#8629)
only require typing_extensions before Python 3.8 (#8623)
Set up separate modules for lazy classes (#8624)
Remove duplicate util in Rust bindings (#8622)
Move Python version to env in release workflow (#8621)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @dependabot, @dependabot[bot], @ghuls, @jonashaag, @josh, @mcrumiller, @ritchie46 and @stinodego
