Releases: pola-rs/polars
Python Polars 0.17.11
🚀 Performance improvements
- improve nested grouptuples related code (#8618)
- buffer spill partitions in ooc sort. ~10/20% (#8616)
- avoid potentially redundant casts on Series init (#8613)
✨ Enhancements
- add Expr.meta namespace eq and ne methods (#8599)
- avoid potentially redundant casts on Series init (#8613)
- use temp dir for ooc spills (#8614)
- add strict dtype equality comparison methods (is_ and is_not) (#8600)
- automatically convert series <op> expr to pl.lit(series) <op> expr (#8549)
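The out-of-core (ooc) sort items above mention spill partitions and a temp dir. As a rough illustration of the general idea only, here is a minimal external merge sort in plain Python. It is a generic textbook sketch, not the polars implementation; the function name `external_sort` and the `chunk_size` parameter are hypothetical and chosen for this example.

```python
import heapq
import os
import tempfile


def external_sort(values, chunk_size):
    """Sort an iterable of ints via sorted runs spilled to a temp directory.

    Hypothetical helper for illustration; not part of polars.
    """
    with tempfile.TemporaryDirectory() as spill_dir:
        run_paths = []

        def spill(chunk):
            # Sort the in-memory chunk and write it to disk as one sorted run.
            path = os.path.join(spill_dir, f"run_{len(run_paths)}.txt")
            with open(path, "w") as f:
                f.writelines(f"{v}\n" for v in sorted(chunk))
            run_paths.append(path)

        chunk = []
        for v in values:
            chunk.append(v)
            if len(chunk) >= chunk_size:
                spill(chunk)
                chunk = []
        if chunk:
            spill(chunk)

        # Merge phase: stream every run back and k-way merge them.
        files = [open(p) for p in run_paths]
        try:
            runs = [(int(line) for line in f) for f in files]
            # Collected into a list here only to keep the example small;
            # a real out-of-core sort would stream the merged output.
            return list(heapq.merge(*runs))
        finally:
            for f in files:
                f.close()


result = external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3)
```

Only `chunk_size` values are held in memory during the partition phase; the merge phase reads one line per run at a time.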
🐞 Bug fixes
- maintain sorted info on top-k and empty sort (#8615)
- fix ooc sort regression; don't take IO-thread before init (#8607)
- maintain sortedness in date -> datetime cast (#8606)
🛠️ Other improvements
- document sortedness of return value of upsample (#8612)
- Set up functions module in Rust bindings (#8598)
- Split PyExpr impl block into modules (#8596)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @dependabot, @dependabot[bot], @mcrumiller, @ritchie46 and @stinodego
Python Polars 0.17.10
🏆 Highlights
- Out-of-core unique (#8573)
🚀 Performance improvements
- improve OOC sort performance during partition phase (#8590)
- significant speedup for python iteration over Series data (#8501)
✨ Enhancements
- make ooc-sort resilient against chunk_size (#8588)
- Out-of-core unique (#8573)
- Add to_date, to_datetime, to_time to String namespace (#8579)
- enhance parametric strategy retrieval, enable List strategy by default (#8571)
- Add default value for round (#8566)
- don't trigger unreachable code if no dtype is set (#8532)
- Ergonomic inputs for all, any, sum, and cumsum (#8541)
- accept expressions in groupby_dynamic/rolling (#8528)
- add is_nested property to dtypes (#8514)
🐞 Bug fixes
- fix determining of supertype for tz-aware and tz-naive datetimes (#8585)
- correct for nested offsets in json serialization (#8584)
- fix wrong dtype init in streaming groupby (#8574)
- fix edge-case with NamedTuple input that contains unhashable field data (#8578)
- temporarily disable List dtype in parametric tests (#8581)
- fix categorical/string_cache fill_null panic (#8562)
- fix testing asserts for NaN values in Struct data (#8557)
- fix window function contention in binary expression (#8544)
- fix struct pyarrow ffi (#8543)
- don't trigger unreachable code if no dtype is set (#8532)
- fix testing asserts for NaN values in List data (#8537)
- keep sorted info on agg_first and simple singleton… (#8526)
- don't downcast Decimal to Float64 in truediv (#8523)
- unset fast_unique coming from arrow (#8521)
- correct sign-reversed scale on DecimalChunked to Python Decimal conversion (fixes #8423) (#8508)
- Clarify and fix behaviour in pl.min/max (#8509)
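To show what a sign-reversed scale does to a decimal value (the class of bug named in #8508 above), here is a small stdlib-only illustration. It is not the polars conversion code; `to_python_decimal` is a hypothetical helper, and it assumes a decimal is stored as an unscaled integer plus a non-negative scale.

```python
from decimal import Decimal


def to_python_decimal(unscaled: int, scale: int) -> Decimal:
    """Build a Decimal from an unscaled integer and a scale (hypothetical helper)."""
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Python's Decimal exponent is the NEGATIVE of the scale:
    # unscaled=12345, scale=2 means 12345 * 10**-2.
    return Decimal((sign, digits, -scale))


correct = to_python_decimal(12345, 2)
# Passing the scale with the wrong sign shifts the value the other way.
sign_reversed = Decimal((0, (1, 2, 3, 4, 5), 2))
```

With the correct sign the value is 123.45; with the sign reversed the same digits are interpreted as 1234500.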
🛠️ Other improvements
- warn about changing date_range default from lazy=False to eager=False (#8593)
- Rename internals module to _reexport (#8554)
- change partition strategy (#8561)
- fix testing asserts for NaN values in Struct data (#8557)
- note sortedness of results from groupby ops (#8540)
- better type signature for set_sorted (#8529)
- add test for categorical input that is not fast_unique (#8527)
- Improvements to the Python release workflow (#8121)
- Update docs requirements (#8200)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @cgevans, @ritchie46, @stinodego and @uchiiii
Python Polars 0.17.9
Migration guide.
Operations that require sorted columns now emit a warning if those columns are not explicitly sorted or tagged as sorted. There are three ways to address this:
# 1. inform polars that a column is sorted on the DataFrame / LazyFrame.
(
df.set_sorted("foo")
.groupby_dynamic(..)
)
# 2. inform polars inline via the `set_sorted` expression
df.join_asof(df2, on=pl.col("foo").set_sorted())
# 3. explicitly sort first
# this is expensive if the data is already sorted
df.sort("foo")
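Why sortedness matters for operations such as an as-of join can be sketched with a binary search over the join keys. The example below is a simplified stdlib illustration, not how polars implements `join_asof`; `asof_lookup` is a hypothetical helper.

```python
from bisect import bisect_right


def asof_lookup(keys, values, target):
    """Return the value whose key is the latest one <= target, or None.

    Hypothetical helper; it silently assumes `keys` is sorted ascending.
    """
    i = bisect_right(keys, target)
    return values[i - 1] if i > 0 else None


# Sorted keys: the binary search finds key 3, the latest key <= 4.
good = asof_lookup([1, 3, 5, 7], ["a", "b", "c", "d"], 4)

# Same pairs, unsorted keys: the search still runs but returns a wrong match.
bad = asof_lookup([5, 1, 7, 3], ["c", "a", "d", "b"], 4)
```

The unsorted call raises no error; it just returns a value that is not the as-of match, which is the kind of silent mistake the new warning is meant to surface.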
✨ Enhancements
- expose quantile/mean for duration (#8491)
- require explicitly sorted flag for upsample (#8488)
- allow for _saturating suffix in duration strings (#8479)
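The following sketch illustrates one reading of the `_saturating` suffix for month offsets: when the target month is shorter than the source day-of-month, the result is clamped to the last valid day instead of raising. That semantics is an assumption stated for this example, and `add_months_saturating` is a hypothetical stdlib-only helper, not a polars function.

```python
import calendar
from datetime import date


def add_months_saturating(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


# 31 January plus one month has no exact counterpart in February,
# so the result saturates at the last day of February.
clamped = add_months_saturating(date(2023, 1, 31), 1)
```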
🐞 Bug fixes
- don't error on cast if column is not projected (#8495)
- ensure window function succeeds on empty frame (#8492)
- don't set verbose on union (#8487)
- check literal/group length before claiming agg sta… (#8486)
🛠️ Other improvements
- Remove unneeded operation in strptime (#8496)
- additional parametric testing docs/examples (#8485)
- improve sorted warning/ fix tests (#8484)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @ritchie46 and @stinodego
Python Polars 0.17.8
🚀 Performance improvements
- less naive count (#8473)
- parallelise dataframe describe method (#8465)
- parallelize almost all flattens (#8468)
- optimize horizontal min/max (#8463)
- reinstate old behavior in numeric group-tuples (#8445)
✨ Enhancements
- apply thousand-separators to "shape" html output, consi… (#8472)
- let duration string accept "1mo_saturating" (#8469)
- add dt.month_start and dt.month_end (#8435)
- add SQL support for cumulative functions (#8457)
- improve utility of dtype groups (#8453)
- improved parametric Decimal strategy (#8444)
- improved hypothesis/parametric testing profile registration (#8433)
🐞 Bug fixes
- fix error message of offset_by if offsetting by negative number of months (#8464)
- fix sorted warning (#8462)
- improve utility of dtype groups (#8453)
🛠️ Other improvements
- bubble up time_iter errors (#8467)
- additional test coverage for dtype groups (#8458)
- integrate live refresh/reload facility while writing docs (#8452)
- add a series of parametric/hypothesis example tests to the main testing docs page (#8454)
- parametric testing docs improvements (#8447)
- improved parametric Decimal strategy (#8444)
- improved hypothesis/parametric testing profile registration (#8433)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @ritchie46, @universalmind303 and @utkarshgupta137
Python Polars 0.17.7
🚀 Performance improvements
- remove false sharing in perfect hash table >2x (#8432)
- further optimised conversions to python date/datetime (#8417)
✨ Enhancements
- initial parametric/hypothesis Decimal dtype testing strategy (note: disabled by default) (#8430)
- add Series support to pl.from_repr (#8429)
- Allow %f in strptime format strings (#8404)
🐞 Bug fixes
- raise upon invalid use of zero_copy_only (#8418)
- respect dtype in anonymous list builder in case of… (#8428)
- str.strptime error message: utf -> utc (#8422)
🛠️ Other improvements
- initial parametric/hypothesis Decimal dtype testing strategy (note: disabled by default) (#8430)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @ayemjay, @jonashaag, @mzjp2, @pgimalac, @ritchie46 and @stinodego
Python Polars 0.17.6
🚀 Performance improvements
- optimize join inner materialization of single keys (#8405)
- parallelize sorted group tuple materialization (#8387)
- improve materialization of huge cardinality group tuples (#8382)
- improve group_tuples materialization (#8375)
- conversion speedups from polars int64 timestamps to python temporal types:
✨ Enhancements
- allow existing item method to optionally take row/col indices (#8412)
- allow negative 'arange' expression (#8413)
- warn if argument is not explicitly sorted (#8409)
- .to_numpy(use_pyarrow=False) for Object and Boolean (#8397)
- new hypothesis strategy that can generate data for List dtypes (#8400)
- offer cleaner usage pattern for Config object in context-manager context (#8394)
- add support for SQL "IN" expr (#8396)
- add a "signed" param to Series.is_integer (#8383)
- add is_integer (#8373)
- raise error on invalid dict aggregation (#8371)
- cli output mode & sql read_json (#8336)
- more informative keyerror on invalid getitem (#8320)
🐞 Bug fixes
- infer supertype in json serde (#8411)
- duration on empty df (#8403)
- don't inadvertently set Series initialised with nested tuple data as Object dtype (#8401)
- use physical in streaming unique global table (#8390)
- recursively bubble up all dtypes in list cast (#8386)
- is_in struct logical types (#8378)
- fix nested null parquet read (#8372)
- fix logical type in ListChunked::new_from_index (#8367)
- fix unintentional loading of hypothesis profile (#8362)
- bubble up logical type in recursive list cast (#8356)
- ensure that iter_rows doesn't return nested Timestamp values (#8359)
- implement clone_inner for all series (#8357)
- add missing __hash__ support to Field, include "time_zone" in Datetime hash, fix Struct hash (#8354)
- fix fill_null for categorical (#8353)
- time.cast(str) as strftime (#8351)
- fix logical dtypes in parallel list collection (#8349)
- improve logical types of explode operation (#8348)
- logical type in anonymous list builders (#8346)
- address potential error caused by float division on time_unit scaling (#8337)
- escape csv header names if they contain special chars (#8331)
- nested struct/list/categorical logical/physical (#8334)
- fix struct schema argument (#8327)
- fix precision issue when converting pl.Datetime("ms") to Python datetime (#8332)
- fix deserialize empty list (#8326)
- List<Null> consistency (#8325)
- fix coalesce schema (#8324)
- don't do null propagation (#8322)
- validate window_size user input in rolling_expr (#8318)
- ensure invalid list eval raises (#8317)
- fix typing overloads of read_excel (#8300)
🛠️ Other improvements
- new hypothesis strategy that can generate data for List dtypes (#8400)
- update duration docstring/example (#8392)
- Upgrade ruff (#8380)
- enhanced parametric testing for temporal dtypes (#8347)
- Minor update to strptime (#8345)
- adjust pytest config so as not to inadvertently prevent test debugging in IPython consoles (#8308)
- add newline in pl.DataFrame.pivot docs (#8307)
Thank you to all our contributors for making this release possible!
@JoonHong-Kim, @MarcoGorelli, @StefanBRas, @alexander-beedie, @avimallu, @grantmcdermott, @jonashaag, @rben01, @ritchie46, @stinodego and @universalmind303
Python Polars 0.17.5
🚀 Performance improvements
- use online variance kernel for aggregation (#8306)
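For readers unfamiliar with the term, an "online" variance kernel computes the variance in a single pass without storing the data. The sketch below uses Welford's well-known algorithm as an illustration; whether #8306 uses exactly this formulation is not stated here, and `online_variance` is a hypothetical helper.

```python
def online_variance(values, ddof=1):
    """Single-pass variance using Welford's update (hypothetical helper)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count - ddof <= 0:
        return None  # not enough observations for the requested ddof
    return m2 / (count - ddof)


population_var = online_variance([2, 4, 4, 4, 5, 5, 7, 9], ddof=0)
```

Compared with the naive sum-of-squares formula, this update avoids subtracting two large, nearly equal numbers, which makes it numerically more stable.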
Thank you to all our contributors for making this release possible!
@ritchie46
Python Polars 0.17.4
🚀 Performance improvements
- add specialized boolean aggregation for min/max (#8294)
✨ Enhancements
- preserve time zone in combine (#8263)
🐞 Bug fixes
- pass name to struct construction in aggregation (#8299)
- improve nested list construction (#8278)
- Truncate long column name in glimpse (#8281)
- Fix DataFrame.sum returning empty column names (#8283)
- always sort in top_k fast path (#8275)
- don't use fast paths for sorted join if there are … (#8272)
🛠️ Other improvements
- use concat_owned_array_unchecked when possible (#8274)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @ritchie46, @stinodego, @zaynetro and @zundertj
Python Polars 0.17.3
🏆 Highlights
- support DataFrame init from pydantic model data (#8178)
🚀 Performance improvements
- fail fast on non-inferable strings in strptime if no fmt is provided (#8111)
- make chunks search more resilient (#8229)
- SIMD accelerated arg_min/arg_max (via argminmax) (#8074)
- speed up csv parsing for slower datetimes formats (#8213)
- improve datetime interpret perf (#8209)
- arr.eval run on groupby expression engine when possible (#8199)
- ~2-3x speedup for DataFrame init from pydantic models (#8181)
✨ Enhancements
- add use_earliest argument to replace_time_zone for dealing with ambiguous datetimes (#8087)
- fail loudly on .%f directive, as it differs from the Python standard library (#8237)
- SQL CTE's (#8208)
- automatically convert series OP expr -> pl.lit(series) OP expr where OP is arithmetic (#8225)
- add pickle support for LazyFrame (#8220)
- add duration cumsum and remainder (#8219)
- support DataFrame init from nested dataclass, pydantic, and NamedTuple objects (#8185)
- better algorithm for streaming unique (#8003)
- Add approx distinct count via approx_unique() (#7937)
- add percentiles to describe methods (#8169)
- support DataFrame init from pydantic model data (#8178)
- display skipped row if same number of rows (#8170)
🐞 Bug fixes
- add special numpy float branch in anyvalue conversion (#8259)
- fix boolean par materialization (#8257)
- improve null/empty list construction (#8255)
- fix offsets in parallel utf8 materialization (#8254)
- nested struct logical type consistency (#8249)
- keep literal state if elementwise function is applied (#8195)
- decimal ensure backed arrow arrays have correct dtype (#8193)
🛠️ Other improvements
- parametric/hypothesis testing code cleanups (#8253)
- Rename strptime/strftime args (#8221)
- change sampling ratio for groupby strategy (#8223)
- Rename Expr.list to implode (#8165)
- don't panic on err in offset_by (#8210)
- re-enable test parallelization for Windows tests (#8214)
- Fix small typo: "im memory" -> "in memory" (#8187)
- remove unused dtype_to_arrow_type (#8177)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @avimallu, @borchero, @chitralverma, @clickingbuttons, @ghuls, @josh, @jvdd, @rben01, @ritchie46, @stinodego and @universalmind303
Python Polars 0.17.2
✨ Enhancements
- make unique expr serde and cmp (#8153)
- Enhanced parametric testing DataFrame generation (#8149)
- support negative index in pct_change (#8137)
- add log1p to list of mathematical functions (#8102)
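As background on why `log1p` is worth having as its own function, the stdlib comparison below shows the precision loss of computing log(1 + x) directly for very small x. It uses Python's `math` module only and says nothing about how polars exposes or implements the function.

```python
import math

x = 1e-18

# 1 + x rounds to exactly 1.0 in double precision, so the naive form loses x.
naive = math.log(1 + x)

# log1p evaluates log(1 + x) without forming 1 + x first.
accurate = math.log1p(x)
```

Here `naive` is exactly 0.0, while `accurate` stays close to `x`, as expected from the series log(1 + x) = x - x²/2 + ...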
🐞 Bug fixes
- object conversion in anyvalue (#8155)
- Address a ~15% regression in import polars speed (#8151)
- validate map lengths (#8147)
- fix row-wise init of UInt64 values that exceed Int64 upper bound (#8146)
- implement list<null> constructor (#8143)
- add all primitives to av_buffer builder (#8140)
- struct is_in (#8139)
- fix wrong display name of binary expressions (#8131)
🛠️ Other improvements
- Enhanced parametric testing DataFrame generation (#8149)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @borchero, @dependabot, @dependabot[bot], @jonashaag, @ritchie46 and @stinodego