Releases: pola-rs/polars
Python Polars 0.18.3
🚀 Performance improvements
- use row format in streaming join `~15%` (#9379)
- row encode buffer reuse (#9371)
- bytes row format for streaming groupby/unique keys `>3.5x` (#9346)
- push slices down map functions (#9350)
✨ Enhancements
- support all numeric dtypes in serde (#9393)
- allow easy load/save of polars `Config` options to/from file (#9391)
- ensure part of the plan is streaming if aggregati… (#9387)
- add relaxed concatenation (#9382)
- add sql DROP TABLE (#9355)
- support ternary expressions in streaming (#9343)
- add SQL support for null-aware equality checks (#9332)
- add SQL support for regular expression operators (`~`, `!~`, `~*`, and `!~*`) (#9327)
- support `//` integer floordiv operator in the SQL engine (#9324)
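
The null-aware equality checks (#9332) and the `eq_missing`/`ne_missing` expressions mentioned in these notes differ from ordinary equality in how they treat missing values. The following is a minimal pure-Python sketch of the two semantics, using `None` to stand in for null. The helper names `eq` and `eq_null_aware` are hypothetical and this is not the Polars implementation.

```python
from typing import Optional


def eq(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """Null-propagating equality: any null operand makes the result null."""
    if a is None or b is None:
        return None
    return a == b


def eq_null_aware(a: Optional[int], b: Optional[int]) -> bool:
    """Null-aware equality: two nulls compare equal, null vs. value is False."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


pairs = [(1, 1), (1, 2), (None, 1), (None, None)]
plain = [eq(a, b) for a, b in pairs]
null_aware = [eq_null_aware(a, b) for a, b in pairs]
print(plain)       # [True, False, None, None]
print(null_aware)  # [True, False, False, True]
```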
🐞 Bug fixes
- fix bug when comparing series (#9359)
- list zip with (#9367)
- parquet + categorical (#9363)
- respect startby in groupby_dynamic when every is greater than 1d (#9362)
- raise groupby apply on empty frame (#9360)
- raise more informative error on string arguments (#9352)
- Allow for tolerance when comparing nested dtype columns (#9272)
- avoid `is_in` TypeError with sets of values containing 'None' (#9323)
🛠️ Other improvements
- add top-k test for #9385 (#9388)
- document apply 'return_dtype' requirement (#9361)
- clarify when day of week takes effect in groupby_dynamic (#9342)
- add "if you're coming from pandas" tip to groupby_dynamic (#9336)
- fix string language formatting (#9341)
- add doc entries for `eq_missing` and `ne_missing` expressions (#9331)
- fixup options for `validate` arg in `join` (#9319)
Thank you to all our contributors for making this release possible!
@0xbe7a, @AnatolyBuga, @MarcoGorelli, @alexander-beedie, @dkrako, @durandtibo, @ritchie46 and @universalmind303
Python Polars 0.18.2
🚀 Performance improvements
- increase streaming groupby spill size from 256 to 10_000 (#9312)
- perf(rust, python) Improve rolling min and max for nonulls (#9277)
✨ Enhancements
- allow use of `StringCache` object as a function decorator (#9309)
- allow use of `Config` object as a function decorator (#9307)
- serde for 'to_physical' expr (#9294)
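
The two decorator entries above describe objects that can wrap a function as well as act in a `with` block. One common way to get both behaviours in Python is `contextlib.ContextDecorator`. The sketch below illustrates that general pattern only: `ToyConfig` and `OPTIONS` are hypothetical stand-ins, and whether Polars implements its decorators this way is an assumption.

```python
from contextlib import ContextDecorator

# Hypothetical global option store; stands in for library-wide settings.
OPTIONS = {"tbl_rows": 8}


class ToyConfig(ContextDecorator):
    """Temporarily override options, restoring the previous values on exit."""

    def __init__(self, **overrides):
        self._overrides = overrides
        self._saved = {}

    def __enter__(self):
        # Remember the current values, then apply the overrides.
        self._saved = {key: OPTIONS[key] for key in self._overrides}
        OPTIONS.update(self._overrides)
        return self

    def __exit__(self, *exc_info):
        OPTIONS.update(self._saved)
        return False  # do not swallow exceptions


@ToyConfig(tbl_rows=20)
def rows_inside() -> int:
    # Runs with the override active.
    return OPTIONS["tbl_rows"]


inside = rows_inside()
after = OPTIONS["tbl_rows"]
print(inside, after)  # 20 8
```

The override is scoped to the call, so the caller's settings are untouched once the function returns.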
🐞 Bug fixes
- fix rolling weighted mean (#9292)
- fix overly-broad string matching in selectors (#9303)
- fix when loading model data from upcoming `pydantic` 2.x release (#9296)
🛠️ Other improvements
Thank you to all our contributors for making this release possible!
@alexander-beedie, @magarick, @ritchie46, @stinodego and @thomascamminady
Python Polars 0.18.1
🏆 Highlights
- add dedicated `selectors` module, consolidating/expanding existing selector capabilities (#9204)
🚀 Performance improvements
✨ Enhancements
- add join cardinality validation (#9278)
- implement set operations for selector API (#9276)
- keep sorted flag after Expr::truncate (#9275)
- add "sql_expr" function (#9248)
- rewrite correlation functions to expression architecture (#9258)
- keep sorted flag on `offset_by` (#9253)
- add expression json serde (#9236)
- add intersection primitive for selector API (#9240)
- building blocks for expression expansion sets (#9231)
- Add ddof option to rolling_var and rolling_std (#8957)
- immediately flatten nested unions (#9220)
- Allow empty `select`/`with_columns`/`groupby` (#9205)
- add a `datetime` selector (#9212)
- support float expression on integers (#9210)
- add dedicated `selectors` module, consolidating/expanding existing selector capabilities (#9204)
- add binary to list<u8> cast (#9161)
- groupby_dynamic by quarter. (#6842)
- add arr.unique expression (#9159)
- implement explode for DataType::Array (#9157)
- `Decimal` type: `sum`, `min`, `max` aggregations in `select` and `agg` context. (#9135)
- Decimal arithmetic (#9123)
- support decimals as cast types in csv parser (#9121)
- Improve error handling for `repeat` (#9117)
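
Several entries above add set operations (union, intersection, difference) to the selector API. As a rough illustration of the idea, and not of the Polars `selectors` module itself, the sketch below models a selector as a predicate over column name and dtype. `ToySelector`, `by_dtype`, and `starts_with` are hypothetical names, and dtypes are plain strings here.

```python
from typing import Callable, Dict, List

Schema = Dict[str, str]  # column name -> dtype name (toy representation)


class ToySelector:
    """A predicate over (name, dtype) pairs that supports set operators."""

    def __init__(self, pred: Callable[[str, str], bool]):
        self.pred = pred

    def __or__(self, other: "ToySelector") -> "ToySelector":  # union
        return ToySelector(lambda n, d: self.pred(n, d) or other.pred(n, d))

    def __and__(self, other: "ToySelector") -> "ToySelector":  # intersection
        return ToySelector(lambda n, d: self.pred(n, d) and other.pred(n, d))

    def __sub__(self, other: "ToySelector") -> "ToySelector":  # difference
        return ToySelector(lambda n, d: self.pred(n, d) and not other.pred(n, d))

    def resolve(self, schema: Schema) -> List[str]:
        # Keep schema order (dicts preserve insertion order).
        return [n for n, d in schema.items() if self.pred(n, d)]


def by_dtype(*dtypes: str) -> ToySelector:
    return ToySelector(lambda n, d: d in dtypes)


def starts_with(prefix: str) -> ToySelector:
    return ToySelector(lambda n, d: n.startswith(prefix))


schema = {"id": "Int64", "price": "Float64", "p_code": "Utf8", "ts": "Datetime"}
numeric = by_dtype("Int64", "Float64")

union = (numeric | starts_with("p")).resolve(schema)
inter = (numeric & starts_with("p")).resolve(schema)
diff = (numeric - starts_with("p")).resolve(schema)
print(union, inter, diff)
```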
🐞 Bug fixes
- fix pyarrow dataset literal filter (#9274)
- raise on invalid sort_by (#9262)
- match missing Array and Struct classes in FromPyObject (#9271)
- correct ne/e_missing schema (#9257)
- fix cached reproject offsets (#9254)
- delay opening files in streaming engine (#9251)
- ensure agg(F(lit)) == lit (#9222)
- don't SO on concat(expressions) (#9214)
- df.apply first rechunk (#9211)
- clip window_size to length in rolling_apply (#9209)
- raise error on invalid df.apply return (#9207)
- Handle edge cases of named `select` input (#9198)
- rolling_apply window_size == len (#9181)
- respect time zone in strptime/to_datetime when exact=False (#9171)
- make null chunking behavior equal to other dtypes (#9176)
- return single numpy array in Array dtype -> numpy (#9164)
- fix regression in boolean nulls comparison (#9142)
- fix struct null_count if fields are null arrays (#9151)
- Fix DataFrame.to_arrow() for 0x0 dataframes (#9144)
- categorical construction from null values (#9145)
- let `apply` caller determine if length needs to be checked. (#9140)
- struct `is_in` should upcast numeric types (#9110)
- Restore functionality of `name` arg for `date_range` (#9107)
- bubble up dtype when converting from arrow (#9120)
🛠️ Other improvements
- Fix grammar and add periods in `Expr.over` docs (#9244)
- Update linting for `py-polars` crate (#9242)
- Deprecate `exprs=...` input for `select`/`with_columns`/`agg`/`struct` (#9219)
- Enable parallelization in Python Windows tests (#9232)
- Use pytest `tmp_path` (#9206)
- Build docs in parallel (#9229)
- Unify Python docs workflows (#9228)
- add docstring to __array__ methods (#8055)
- Update expr parsing util to return `PyExpr` (#9166)
- update pyo3 requirement from 0.18 to 0.19 (#9155)
- clarify how the windows are formed in the rolling_* functions (#9192)
- stabilise polars importtime check (#9196)
- fix "to_decimal" docstring (#9197)
- note that `exact=False` is a performance footgun (#9186)
- change decimal inference and argument order (#9133)
- Cache Rust build on main branch (#9130)
- Improve df.clear() docs (#8809)
- Bump `maturin` to `1.0.1` (#9115)
- Bump lint dependency versions (#9116)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @MarcoGorelli, @alexander-beedie, @ankane, @avimallu, @bfeif, @dependabot, @dependabot[bot], @jonashaag, @josh, @lorentzenchr, @magarick, @ritchie46, @stinodego, @universalmind303 and @zundertj
Python Polars 0.18.0
🏆 Highlights
- Rename list namespace accessor from `.arr` to `.list` (#8999)
⚠️ Breaking changes
- propagate null in equality comparisons (#9053)
- formalize implode -> explode relation (#9038)
- Drop subclassing support for `DataFrame`/`LazyFrame` (#9008)
- consistently return list of date/datetime from lazy date_range (#8513)
- Default `date_range`/`ones`/`zeros` to `eager=False` (#9007)
- Rename list namespace accessor from `.arr` to `.list` (#8999)
- disallow time zones other than those in zoneinfo.available_timezones() (#8993)
- remove window expression magic (#8992)
- raise error when sorted flag not set (#8994)
- Drop subclassing support for GroupBy (#7746)
- in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
- parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
- Remove deprecated tz_aware argument (#8696)
🚀 Performance improvements
- speed up write_csv for time-zone-aware columns (#9093)
- parallelize rolling_window group materialization (#9095)
- elide hot loop in hash joins (#9075)
✨ Enhancements
- conversion from `Utf8` to `Decimal`. (#9090)
- default to checking sortedness in groupby_rolling… (#9063)
- propagate null in equality comparisons (#9053)
- warn if constructing Series with time-zone-aware datetimes (#9058)
- implement apply for rolling/dynamic_groupby (#9049)
- Support more data types in lazy `repeat` (#9046)
- implement strategy=nearest for join_asof (#9024)
- arr.sum expression (#9041)
- formalize implode -> explode relation (#9038)
- add array namespace and min/max expression (#9032)
- improve error message on row-wise overflow (#9021)
- properly apply slice at UNION level (#9018)
- consistently return list of date/datetime from lazy date_range (#8513)
- Default `date_range`/`ones`/`zeros` to `eager=False` (#9007)
- disallow time zones other than those in zoneinfo.available_timezones() (#8993)
- raise error when sorted flag not set (#8994)
- in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
- parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
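
The `strategy=nearest` entry for join_asof (#9024) above refers to matching each left key with the closest right key rather than only looking backward or forward. A minimal pure-Python sketch of that matching rule follows. It assumes the right keys are sorted ascending, `asof_nearest` is a hypothetical helper, and the tie-break (prefer the earlier key) is a choice made for this sketch, not a statement about Polars.

```python
from bisect import bisect_left
from typing import List, Optional


def asof_nearest(left_keys: List[float], right_keys: List[float]) -> List[Optional[int]]:
    """For each left key, return the index of the closest right key.

    Assumes right_keys is sorted ascending. Ties go to the earlier key.
    """
    out: List[Optional[int]] = []
    for key in left_keys:
        if not right_keys:
            out.append(None)  # nothing to match against
            continue
        pos = bisect_left(right_keys, key)
        # Only the neighbours around the insertion point can be nearest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(right_keys)]
        out.append(min(candidates, key=lambda i: (abs(right_keys[i] - key), i)))
    return out


matches = asof_nearest([1, 5, 10], [2, 3, 6, 7])
print(matches)  # [0, 2, 3]
```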
🐞 Bug fixes
- rolling_groupy was returning incorrect results when offset was positive (#9082)
- don't underflow on list.tail (#9089)
- fix null/empty in List::take_unchecked (#9074)
- repeat by (#9023)
- raise in to_datetime/strptime if format contains hour but not minute directive (#9044)
- Order of pl.Array arguments in docstring (#9059)
- propagate nulls in broadcasting of order comparisons (#9050)
- Improve read_parquet missing column error message (#8961)
- fix apply with passed date/datetime return_dtype (#9035)
- respect inner type in Array construction (#9020)
- raise error on invalid aggregation (#9013)
- fix fused arithmetic in window functions (#9012)
- don't allow silent init of `Series` declared as int/temporal with floating point values (#9004)
- deprecate `time_unit` property from `Series` (#8990)
🛠️ Other improvements
- Improve expression parsing utils (#9094)
- Refactor expression input parsing util (#9085)
- Organize "as_datatype" functions (#9080)
- Change eager path for `repeat` (#9048)
- Clean up `arange`/`date_range`/`time_range` (#9027)
- Drop subclassing support for `DataFrame`/`LazyFrame` (#9008)
- minor `SQLContext` docstring cleanups (#9005)
- Rename list namespace accessor from `.arr` to `.list` (#8999)
- remove window expression magic (#8992)
- Drop subclassing support for GroupBy (#7746)
- refactor!(python): Remove old deprecated functionality (#8995)
- Remove deprecated tz_aware argument (#8696)
Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @charliegallop, @jonashaag, @mcrumiller, @raymead, @ritchie46, @sorhawell, @stinodego, @tim-habitat and @universalmind303
Rust Polars 0.30.0
🏆 Highlights
- Rename list namespace accessor from `.arr` to `.list` (#8999)
- `Array` (backed by `arrow::FixedSizeList`) datatype (#8943)
⚠️ Breaking changes
- propagate null in equality comparisons (#9053)
- formalize implode -> explode relation (#9038)
- consistently return list of date/datetime from lazy date_range (#8513)
- Rename list namespace accessor from `.arr` to `.list` (#8999)
- disallow time zones other than those in zoneinfo.available_timezones() (#8993)
- remove window expression magic (#8992)
- raise error when sorted flag not set (#8994)
- in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
- parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
- Remove deprecated tz_aware argument (#8696)
🚀 Performance improvements
- speed up write_csv for time-zone-aware columns (#9093)
- parallelize rolling_window group materialization (#9095)
- elide hot loop in hash joins (#9075)
- improve list explode perf (#8974)
- Improve explodes: `offsets_to_indexes` performance (#8964)
- avoid quadratic `exclude` behaviour when selecting against dtypes and/or wildcards (#8953)
- use simd-json for all json parsing (#8922)
- improve `json_extract` (#8858)
- add optimizer passes and change initial order (#8811)
- fused multiply sub / sub multiply (#8799)
- improve parallel work distribution of sort expression `~4x` (#8775)
- change default row-group size (#8758)
✨ Enhancements
- conversion from `Utf8` to `Decimal`. (#9090)
- default to checking sortedness in groupby_rolling… (#9063)
- propagate null in equality comparisons (#9053)
- implement apply for rolling/dynamic_groupby (#9049)
- implement strategy=nearest for join_asof (#9024)
- arr.sum expression (#9041)
- formalize implode -> explode relation (#9038)
- add array namespace and min/max expression (#9032)
- improve error message on row-wise overflow (#9021)
- properly apply slice at UNION level (#9018)
- consistently return list of date/datetime from lazy date_range (#8513)
- disallow time zones other than those in zoneinfo.available_timezones() (#8993)
- raise error when sorted flag not set (#8994)
- in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
- parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
- error on invalid sortby expr (#8986)
- Pushdown `is_in` to pyarrow dataset (#8930)
- `Array` (backed by `arrow::FixedSizeList`) datatype (#8943)
- multiple enhancements for `SQLContext` (#8944)
- add sql UNION, UNION ALL & UNION DISTINCT (#8936)
- add sql compound identifiers (#8934)
- add sql EXCLUDE (#8913)
- add sql CASE (#8911)
- add sql EXPLAIN (#8897)
- improve `json_extract` (#8858)
- add support for sql DISTINCT ON (#8824)
- add LazyFrame `null_count` (#8837)
- check categorical cache on transpose (#8836)
- add support for `OFFSET` keyword in SQL queries (#8833)
- add a new `time_range` utility function (#8776)
- Add hint to use _saturating on overflow (#8805)
- support boolean addition (#8778)
- improved detail in several error messages (#8747)
🐞 Bug fixes
- rolling_groupy was returning incorrect results when offset was positive (#9082)
- fix null/empty in List::take_unchecked (#9074)
- repeat by (#9023)
- raise in to_datetime/strptime if format contains hour but not minute directive (#9044)
- propagate nulls in broadcasting of order comparisons (#9050)
- fix apply with passed date/datetime return_dtype (#9035)
- raise error on invalid aggregation (#9013)
- fix fused arithmetic in window functions (#9012)
- JoinBuilder::force_parallel is modifying allow_parallel (#8617)
- Fix erroneous warning in `hist` (#8982)
- respect rechunk in parquet (#8935)
- Simplify offsets_to_indexes, fix empty offsets edge cases (#8920)
- sql qualified wildcards (#8916)
- don't check sortedness in asof by (#8906)
- check for object type in csv writer (#8894)
- window function with filtered groups (#8880)
- parse offset-aware strings as UTC in read_csv when try_parse_dates=True (#8864)
- free buffer, but not its contents (#8848)
- improve agg expr field types (#8834)
- sql `BETWEEN` bounds should be inclusive (#8818)
- sort cached window groups (#8813)
- check null data before take (#8812)
- fix broadcasting on integer bitwise (#8798)
- correct aggregation of overlapping groups (#8794)
- modify join error (#8768)
- don't parallelize sort within rayon job (#8774)
- fix deadlock in cache and improve parallelism/work… (#8765)
- check offset before doing owned mutation (#8760)
- validate data on successful deserialization (#8757)
- improve supertype coercion of functions (#8755)
🛠️ Other improvements
- use concrete type for time zones (#9076)
- factor add_month out of add_impl_month_week_or_day (#9066)
- remove unnecessary timezone trait usage, use concrete type (#9065)
- Fix broken links (#9072)
- bump sqlparser version (#9043)
- move list namespace functions to separate module (#9040)
- Clean up `arange`/`date_range`/`time_range` (#9027)
- Rename list namespace accessor from `.arr` to `.list` (#8999)
- replace pattern match with unwrap (#9000)
- remove window expression magic (#8992)
- Remove deprecated tz_aware argument (#8696)
- simplify `take_every` (#8971)
- add readmes to all sub crates (#8770)
- refactor(rust); improve arithmetic reuse and don't allocate on binary… (#8781)
- accumulate windows flag during translation (#8773)
Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @avimallu, @cbowdon, @charliegallop, @chitralverma, @jonashaag, @kpberry, @mcrumiller, @petar-savov, @raymead, @ritchie46, @sorhawell, @stinodego, @tim-habitat, @uchiiii and @universalmind303
Python Polars 0.17.15
🏆 Highlights
🚀 Performance improvements
- improve list explode perf (#8974)
- Improve explodes: `offsets_to_indexes` performance (#8964)
- avoid quadratic `exclude` behaviour when selecting against dtypes and/or wildcards (#8953)
- use simd-json for all json parsing (#8922)
- improve performance of `align_frames`, and add new alignment option (#8899)
✨ Enhancements
- error on invalid sortby expr (#8986)
- Pushdown `is_in` to pyarrow dataset (#8930)
- allow set column list input to 'drop' and 'drop_nulls' (#8962)
- `Array` (backed by `arrow::FixedSizeList`) datatype (#8943)
- Add `dtype` argument for `repeat` (#8946)
- Use schema keys to define the columns if only the schema is provided to `pl.struct` (#8952)
- multiple enhancements for `SQLContext` (#8944)
- add sql UNION, UNION ALL & UNION DISTINCT (#8936)
- add sql compound identifiers (#8934)
- add sql EXCLUDE (#8913)
- add sql CASE (#8911)
- add sql EXPLAIN (#8897)
- Write dataframes as delta tables (#7616)
- improve performance of `align_frames`, and add new alignment option (#8899)
- improved inference from type annotations (#8895)
🐞 Bug fixes
- Fix erroneous warning in `hist` (#8982)
- don't modify `Series` with empty names in-place on `DataFrame` init (#8956)
- respect rechunk in parquet (#8935)
- Add hint on PyArrow to ADBC import error (#8898)
- Simplify offsets_to_indexes, fix empty offsets edge cases (#8920)
- sql qualified wildcards (#8916)
- address edge cases with in-place modification of `Series` objects (#8915)
- don't check sortedness in asof by (#8906)
- check for object type in csv writer (#8894)
- improve performance of `align_frames`, and add new alignment option (#8899)
- window function with filtered groups (#8880)
🛠️ Other improvements
- deprecate `rename` "in_place" parameter (#8960)
- Clean up tests for `repeat` (#8979)
- Deprecate `name` argument for `repeat` (#8977)
- simplify `take_every` (#8971)
- Clean up `repeat`/`ones`/`zeros` (#8963)
- further enhance `SQLContext` docstrings (#8948)
- docs(python) Fix typo in `lazygroupby.rs` error message (#8937)
- fix docstring for `time()` (#8939)
- refactor tzinfo-related tests (#8883)
Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @avimallu, @cbowdon, @chitralverma, @jonashaag, @kpberry, @mcrumiller, @petar-savov, @ritchie46, @stinodego and @universalmind303
Python Polars 0.17.14
🚀 Performance improvements
- optimise `align_frames` and properly handle the case where the alignment key has duplicate values (#8825)
✨ Enhancements
- add an `align` option to `pl.concat` (#8835)
- add support for sql DISTINCT ON (#8824)
- add LazyFrame `null_count` (#8837)
- check categorical cache on transpose (#8836)
- add support for `OFFSET` keyword in SQL queries (#8833)
- optimise `align_frames` and properly handle the case where the alignment key has duplicate values (#8825)
🐞 Bug fixes
- parse offset-aware strings as UTC in read_csv when try_parse_dates=True (#8864)
- handle `InitVar` typing declarations on `dataclass` objects (#8856)
- free buffer, but not its contents (#8848)
- improve agg expr field types (#8834)
- optimise `align_frames` and properly handle the case where the alignment key has duplicate values (#8825)
- sql `BETWEEN` bounds should be inclusive (#8818)
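
The `BETWEEN` fix above (#8818) concerns standard SQL semantics, where `x BETWEEN a AND b` includes both endpoints. The snippet below shows that behaviour using Python's built-in `sqlite3` module. It runs against SQLite, not the Polars SQL engine, and the table `t` and column `x` are made up for the illustration.

```python
import sqlite3

# In-memory database with the integers 1..5 in a single column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 6)])

# BETWEEN keeps both bounds: 2 and 4 are part of the result.
rows = conn.execute(
    "SELECT x FROM t WHERE x BETWEEN 2 AND 4 ORDER BY x"
).fetchall()
selected = [x for (x,) in rows]
print(selected)  # [2, 3, 4]
conn.close()
```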
🛠️ Other improvements
- add examples for `Config` "set_tbl_formatting" and "set_fmt_str_lengths" methods (#8859)
- Convert between Vec of Series/Pyseries using trait (#8846)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @ritchie46, @stinodego and @universalmind303
Python Polars 0.17.13
🚀 Performance improvements
- add optimizer passes and change initial order (#8811)
- fused multiply sub / sub multiply (#8799)
- improve parallel work distribution of sort expression `~4x` (#8775)
- change default row-group size (#8758)
- elide function calls in AnyValue::eq (#8725)
✨ Enhancements
- add a new `time_range` utility function (#8776)
- Add hint to use _saturating on overflow (#8805)
- add a "restore_defaults" kwarg to `Config` init (#8797)
- add lazy `time` expression (#8785)
- support boolean addition (#8778)
- support `SQLContext` registration of `DataFrames` (#8762)
- support automatic `SQLContext` frame/table registration from local variables (#8749)
- improved detail in several error messages (#8747)
- support frame registration at `SQLContext` init time, and add an "unregister" method (#8744)
- support repeat for all types (#8741)
- add support for `DISTINCT` keyword in SQL select clauses (#8740)
- support any day of the week in 'start_by' in groupby_dynamic (#8720)
- add support for `USING` clause in SQL join operations (#8731)
- add unit tests for `extend_constant` Expr (#8734)
- add clean multi-frame registration to `SQLContext` (#8724)
- add support for `HAVING` clause to SQL `GROUP BY` operations (#8704)
- improved `numpy` string interop (#8703)
🐞 Bug fixes
- sort cached window groups (#8813)
- check null data before take (#8812)
- fix broadcasting on integer bitwise (#8798)
- Fix incorrect type hint for `arange` (#8796)
- correct aggregation of overlapping groups (#8794)
- don't parallelize sort within rayon job (#8774)
- fix deadlock in cache and improve parallelism/work… (#8765)
- check offset before doing owned mutation (#8760)
- don't persist temporary column in disjoint calls to `update` (#8763)
- validate data on successful deserialization (#8757)
- improve supertype coercion of functions (#8755)
- groupby_dynamic was unnecessarily failing on ambiguous local datetime (#8737)
- ensure count aggregation has proper length when spilling (#8735)
- fix return value of std for single-element sequence with ddof=1 (#8730)
- don't take logical plan during streaming fmt (#8711)
- Don't upcast in round() for f32 when decimal is 0 (#8706)
🛠️ Other improvements
- add entry for lazy `time` func (#8786)
- add unit tests for `extend_constant` Expr (#8734)
- add rounding coverage for 32/64 bit floats (#8715)
- Add warning to count methods on null (#8698)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @MarcoGorelli, @alexander-beedie, @mcrumiller, @ritchie46, @stinodego, @uchiiii, @universalmind303 and @zundertj
Rust Polars 0.29.0
🏆 Highlights
- Out-of-core unique (#8573)
⚠️ Breaking changes
- Rename `concat_lst` to `concat_list` (#8597)
- Schema improvements (#8286)
- don't create duplicate pivot names (#8002)
- rename `toggle_string_cache` to `enable_string_cache` (#7970)
- change top_k(descending) -> bottom_k (#7969)
- in `sort`, `top_k`, `sort_by`, and `arg_sort_by`, raise if `descending` is a sequence and its length doesn't match the number of columns to sort by (#7957)
🚀 Performance improvements
- elide function calls in AnyValue::eq (#8725)
- add fused multiply add optimization for expressions (#8690)
- use expression for dot product (#8686)
- improve nested grouptuples related code (#8618)
- buffer spill partitions in ooc sort. `~10/20%` (#8616)
- improve OOC sort performance during partition phase (#8590)
- remove some unnecessary calls and matches (#8490)
- less naive count (#8473)
- parallelize almost all flattens (#8468)
- optimize horizontal min/max (#8463)
- reinstate old behavior in numeric group-tuples (#8445)
- remove false sharing in perfect hash table `>2x` (#8432)
- further optimised conversions to python date/datetime (#8417)
- optimize join inner materialization of single keys (#8405)
- parallelize sorted group tuple materialization (#8387)
- improve materialization of huge cardinality group tuples (#8382)
- improve group_tuples materialization (#8375)
- use online variance kernel for aggregation (#8306)
- add specialized boolean aggregation for min/max (#8294)
- fail fast on non-inferable strings in strptime if no `fmt` is provided (#8111)
- make chunks search more resilient (#8229)
- SIMD accelerated `arg_min`/`arg_max` (via `argminmax`) (#8074)
- speed up csv parsing for slower datetimes formats (#8213)
- `arr.eval` run on groupby expression engine when possible (#8199)
- `FromParalleIter<Option<str>> for Utf8Chunked` `~1.9x` (#8058)
- speed up from_par_iter Option<bool> `~2.5x` (#8057)
- parallelize numeric ChunkedArray materialization `~2x`. (#8053)
- parallelize `into_groups` materialization ~-25% (#8036)
- use a trusted anyvalue builder (#8001)
- numeric grouptuples with nulls hash in single pass `~25%` (#7980)
- use perfect hash table for categoricals (#7951)
- improve group_tuples of high cardinality data `~10%` (#7938)
- use streaming instead of partitioned groupby (#7907)
- don't auto-stream groupby (#7906)
- rechunk before aggs (#7903)
- don't re-allocate groups in sorted to_dummies (#7897)
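
The "online variance kernel" entry above (#8306) refers to computing variance in a single pass, updating running statistics as each value arrives. One well-known online method is Welford's algorithm, sketched below in pure Python. Whether the Polars kernel uses exactly this formulation is an assumption, `online_variance` is a hypothetical name, and returning `None` when the variance is undefined is a choice made for this sketch.

```python
from typing import Iterable, Optional


def online_variance(values: Iterable[float], ddof: int = 1) -> Optional[float]:
    """Single-pass variance using Welford's update rule."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count - ddof <= 0:
        return None  # variance is undefined for this few observations
    return m2 / (count - ddof)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
var = online_variance(data)
print(var)  # close to 32 / 7
```

Because only `count`, `mean`, and `m2` are kept, the values never need to be stored or revisited, which is what makes the approach suitable for aggregations.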
✨ Enhancements
- add support for `DISTINCT` keyword in SQL select clauses (#8740)
- support any day of the week in 'start_by' in groupby_dynamic (#8720)
- add support for `USING` clause in SQL join operations (#8731)
- add support for `HAVING` clause to SQL `GROUP BY` operations (#8704)
- streaming unions (#8676)
- expression cache (#8674)
- rolling covariance and correlation (#8671)
- Add `dt.to_string` alias for `dt.strftime` (#8290)
- use temp dir for ooc spills (#8614)
- make ooc-sort resilient against chunk_size (#8588)
- Set `strptime` default `strict/exact=true` (#8587)
- Out-of-core unique (#8573)
- Add `to_date`, `to_datetime`, `to_time` to String namespace (#8579)
- more detailed error message on failure to cast `List` dtype (#8583)
- don't trigger unreachable code if no dtype is set (#8532)
- accept expressions in `groupby_dynamic/rolling` (#8528)
- expose quantile/mean for duration (#8491)
- require explicitly sorted flag for upsample (#8488)
- allow for _saturating suffix in duration strings (#8479)
- let duration string accept "1mo_saturating" (#8469)
- add dt.month_start and dt.month_end (#8435)
- add SQL support for cumulative functions (#8457)
- add `str_slice` method to `StringNameSpace` (#8427)
- allow negative 'arange' expression (#8413)
- warn if argument is not explicitly sorted (#8409)
- Schema improvements (#8286)
- add support for SQL "IN" expr (#8396)
- cli output mode & sql read_json (#8336)
- rename 'csv-file' to 'csv' (#8101)
- preserve time zone in combine (#8263)
- add `use_earliest` argument to `replace_time_zone` for dealing with ambiguous datetimes (#8087)
- SQL CTE's (#8208)
- add duration cumsum and remainder (#8219)
- better algorithm for streaming unique (#8003)
- Add approx distinct count via `approx_unique()` (#7937)
- adopt `FunctionExpr` for `cat` namespace (#8173)
- `DatetimeArgs` ergonomics (#8133)
- Remove Seek constraint from IpcStreamReader and SerReader (#8166)
- implement `FunctionExpr` for bound and round methods (#8172)
- display skipped row if same number of rows (#8170)
- move all boolean expressions into `BooleanFunction` enum (#8132)
- rewrite log expressions to make them serializable (#8126)
- make unique expr serde and cmp (#8153)
- adopt `FunctionExpr` for `abs` to allow for serialization (#8129)
- adopt `FunctionExpr` for `cum*` functions (#8130)
- support negative index in `pct_change` (#8137)
- add `log1p` to list of mathematical functions (#8102)
- expand list of tz-aware formats which can be auto-inferred (#8085)
- clearer error message if strptime without a fmt specified fails (#8086)
- infer tz-aware formats with try_parse_dates in read_csv (#8084)
- feat(python, rust)! make 'mo' interval raise if the target date does not exist (#8078)
- auto-infer fmt for tz-aware date strings (#7405)
- multiple sql contexts & optional sql highlighting in cli (#8072)
- implement arg_sort for struct dtype (#8051)
- support struct in df.unique (#7976)
- change top_k(descending) -> bottom_k (#7969)
- optimize away nested unions in lp (#7861)
- Add seed argument to rank for random (#7913)
- auto-infer detecting time-zone-awareness of fmt argument in strptime; deprecate tz_aware argument (#7886)
- deal with null values in cut/qcut (#7878)
- support datetime/date subclasses (e.g. FreezeGun) (#7819)
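
Two groups of entries above interact: the 'mo' interval now raises if the target date does not exist (#8078), while the `_saturating` suffix (#8469, #8479) offers an alternative. On my reading of these entries, saturating means clamping to the last valid day of the target month, though that interpretation is an assumption. The pure-Python sketch below illustrates both behaviours. `add_months` is a hypothetical helper, not a Polars function.

```python
import calendar
from datetime import date


def add_months(d: date, months: int, saturating: bool = False) -> date:
    """Shift a date by whole months, optionally clamping to month end."""
    # Work in "months since year 0" so negative offsets also behave.
    total = d.year * 12 + (d.month - 1) + months
    year, month = divmod(total, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    if d.day > last_day:
        if not saturating:
            raise ValueError(f"{year}-{month:02d}-{d.day:02d} does not exist")
        return date(year, month, last_day)  # clamp to the last valid day
    return date(year, month, d.day)


clamped = add_months(date(2023, 1, 31), 1, saturating=True)
print(clamped)  # 2023-02-28

try:
    add_months(date(2023, 1, 31), 1)
    strict_raised = False
except ValueError:
    strict_raised = True
print(strict_raised)  # True
```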
🐞 Bug fixes
- groupby_dynamic was unnecessarily failing on ambiguous local datetime (#8737)
- ensure count aggregation has proper length when spilling (#8735)
- fix return value of std for single-element sequence with ddof=1 (#8730)
- don't take logical plan during streaming fmt (#8711)
- Don't upcast in round() for f32 when decimal is 0 (#8706)
- block predicate containing shifts and windows after sort (#8670)
- ensure perfect hash table processes the nulls (#8668)
- Reading more tiny CSVs than workers in parallel will deadlock (#8441)
- respect maintain_order in partitioned groupby (#8653)
- fix explode null series (#8654)
- fix categorical agg type (#8645)
- allow list<null> -> list<cat> (#8636)
- maintain sorted info on top-k and empty sort (#8615)
- maintain sortedness in date -> datetime cast (#8606)
- fix determining of supertype for tz-aware and tz-naive datetimes (#8585)
- fix csv reader with new line in header (#8580)
- correct for nested offsets in json serialization (#8584)
- fix wrong dtype init in streaming groupby (#8574)
- fix categorical/string_cache fill_null panic (#8562)
- fix window function contention in binary expression (#8544)
- fix StructChunked `not_equal` comparator/operator (#8547)
- fix struct pyarrow ffi (#8543)
- don't trigger unreachable code if no dtype is set (#8532)
- keep sorted info on agg_first and simple singleton… (#8526)
- unset fast_unique coming from arrow (#8521)
- correct sign-reversed scale on DecimalChunked to Python Decimal conversion (fixes #8423) (#8508)
- don't error on cast if column is not projected (#8495)
- ensure window function succeeds on empty frame (#8492)
- don't set verbose on union (#8487)
- check literal/group length before claiming agg sta… (#8486)
- fix error message of offset_by if offsetting by negative number of months (#8464)
- fix sorted warning (#8462)
- fix features serde and dtype-struct not compiling together (#8439)
- respect dtype in anonymous list builder in case of… (#8428)
- infer supertype in json serde (#8411)
- duration on empty df (#8403)
- don't inadvertently set `Series` initialised with nested tuple data as `Object` dtype (#8401)
- use physical in streaming unique global table (#8390)
- recursively bubble up all dtypes in list cast (#8386)
- is_in struct logical types (#8378)
- fix nested null parquet read (#8372)
- fix logical type in ListChunked::new_from_index (#8367)
- bubble up logical type in recursive list cast (#8356)
- implement clone_inner for all series (#8357)
- fix fill_null for categorical (#8353)
- time.cast(str) as strftime (#8351)
- fix logical dtypes in parallel list collection (#8349)
- improve logical types of explode operation (#8348)
- logical type in anonymous list builders (#8346)
- escape csv header names if they contain special chars (#8331)
- nested struct/list/categorical logical/physical (#8334)
- fix deserialize empty list (#8326)
- fix coalesce schema (#8324)
- don't do null propagation (#8322)
- ensure invalid list eval raises (#8317)
- pass name to struct construction in aggregation (#8299)
- Use three slashes for doc comments (#8284)
- improve nested list construction (#8278)
- Fix DataFrame.sum returning empty column names (#8283)
- always sort in `top_k` fast path (#8275)
- don't use fast paths for sorted join if there are … (#8272)
- fix boolean par materialization (#8257)
- improve null/empty list construction (#8255)
- fix offsets in parallel utf8 materialization (#8254)
- nested struct logical type consistency (#8249)
- keep literal state if elementwise function is applied (#8195)
- decimal ensure backed arrow arrays have correct dtype (#8193)
- ensure cached nodes are initialized once (#8103)
- validate `map` lengths (#8147)
- fix row-wise init of `UInt64` values that exceed `Int64` upper bound (#8146)
- implement list<null> constructor (#8143)
- add all primitives to av_buffer builder (#8140)
- struct `is_in` (#8139)
- fix wrong display name of binary expressions (#8131)
- lazy: fix boolean sum...
Python Polars 0.17.12
🚀 Performance improvements
✨ Enhancements
- streaming unions (#8676)
- allow `arr.to_struct` to take a list of field names, fix it for `Series`, improve related docstrings (#8673)
- expression cache (#8674)
- rolling covariance and correlation (#8671)
- .to_physical() for List(Categorical) (#8499)
- allow `from_repr` to handle parsing of table reprs with no dtype row (#8640)
- Add `dt.to_string` alias for `dt.strftime` (#8290)
- support `DataFrame` export to `numpy` structured/record arrays (#8628)
- support transparent `DataFrame` init from `numpy` structured/record arrays. (#8620)
- Prettify show_versions (#8627)
🐞 Bug fixes
- allow `arr.to_struct` to take a list of field names, fix it for `Series`, improve related docstrings (#8673)
- block predicate containing shifts and windows after sort (#8670)
- ensure perfect hash table processes the nulls (#8668)
- Reading more tiny CSVs than workers in parallel will deadlock (#8441)
- respect maintain_order in partitioned groupby (#8653)
- fix explode null series (#8654)
- fix categorical agg type (#8645)
- allow list<null> -> list<cat> (#8636)
🛠️ Other improvements
- add notes/examples on use of inline regex flags to `replace` docstrings (#8685)
- Add "See Also" sections for alias, map_alias, prefix, s… (#8682)
- add notes/examples on use of inline regex flags to `extract_all` docstrings (#8675)
- allow `arr.to_struct` to take a list of field names, fix it for `Series`, improve related docstrings (#8673)
- add notes on the use of inline regex flags to `extract` docstrings (#8669)
- Add missing `implode` to internal functions (#8667)
- Clean up type checking imports (#8666)
- Organize PySeries `impl` blocks (#8665)
- clean-up some examples, extend `pipe` docstring (#8658)
- add notes on the use of inline regex flags to `contains` docstrings (#8657)
- fix/improve `from_repr` example/doctest (#8642)
- Improve some bindings imports (#8630)
- Move functions in Rust bindings to `functions` module (#8629)
- only require `typing_extensions` before Python 3.8 (#8623)
- Set up separate modules for lazy classes (#8624)
- Remove duplicate util in Rust bindings (#8622)
- Move Python version to env in release workflow (#8621)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @dependabot, @dependabot[bot], @ghuls, @jonashaag, @josh, @mcrumiller, @ritchie46 and @stinodego