Releases: twitter/algebird
Scala 2.12 goodness!
This is the first Algebird release to publish Scala 2.12 artifacts!
Apart from that, here are some of the changes since the last release:
Various BloomFilter improvements:
Remove seed variable in BloomFilter and rename k to hashIndex: #557
Polymorphic Bloom filters: #607
Optimize BloomFilter significantly: #610
Bloom filter distance function: #612
Optimize Hamming distance for Bloom Filters: #617
Incorporate more Algebra types:
Use standard algebra types: #523
Use more algebra types: #620
SpaceSaver updates:
Widen the visibility of SpaceSaver.SSMany, SpaceSaver.SSOne: #577
SpaceSaver fromBytes & toBytes: #603
Catch OOM in SpaceSaverTest: #614
Remove typeclass from interval constructor: #605
Better toString in ExpHistogram: #604
Remove legacy CountMinSketchMonoid: #602
Convert all laws to take Equiv instances, deprecate Equiv versions: #595
replace FromIntLike with Ring and toK function: #594
bail out of SemigroupMacro.sumOption for `to.isEmpty`: #599
Handle empty in Generated{Product, Abstract}Algebra: #597
Add explicit return types to Group instances for Moments, AveragedValue: #596
Remove view bounds on Moments, DecayedValue, AveragedValue: #592
Add MonoidAggregator.collectBefore: #611
Thanks to @johandahlberg, @johnynek, @ElPicador, @sritchie, and @isnotinvain for the contributions!
Early Scala 2.12 release
This is an early release of some Scala 2.12 Algebird packages that contains some binary-incompatible changes. Please use release 0.13.0 instead: https://github.com/twitter/algebird/releases/tag/0.13.0. It contains the appropriate set of Scala 2.12 Algebird artifacts.
Even Faster SumOption!
The main new feature of this release is a faster (benchmarked!) implementation of sumOption for tuple and product semigroups. This means that if you are aggregating on scalding or spark, you should see a significant speedup (roughly 2x).
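To illustrate why a dedicated sumOption on tuple semigroups can help, here is a minimal sketch in plain Scala. It does not use Algebird: the `Semigroup` trait, `tuple2`, and all other names are hypothetical stand-ins, and it buffers the whole input for simplicity, which is an assumption of this sketch rather than a description of the library's implementation. The idea shown is only that the tuple's sumOption splits the input by component and hands each column to that component's own (possibly optimized) sumOption.

```scala
import scala.collection.mutable.ArrayBuffer

object TupleSumOptionSketch {
  // Minimal stand-in for a semigroup; not Algebird's actual trait.
  trait Semigroup[T] {
    def plus(a: T, b: T): T
    // Default: fold pairwise with plus. Instances may override with a bulk version.
    def sumOption(items: Iterator[T]): Option[T] =
      items.reduceOption((a, b) => plus(a, b))
  }

  val intAddition: Semigroup[Int] = new Semigroup[Int] {
    def plus(a: Int, b: Int): Int = a + b
  }

  // Pairwise concatenation copies repeatedly; the override uses one builder.
  val stringConcat: Semigroup[String] = new Semigroup[String] {
    def plus(a: String, b: String): String = a + b
    override def sumOption(items: Iterator[String]): Option[String] =
      if (items.isEmpty) None
      else {
        val builder = new StringBuilder
        items.foreach(s => builder.append(s))
        Some(builder.toString)
      }
  }

  // Tuple semigroup whose sumOption delegates to each component's sumOption.
  def tuple2[A, B](sa: Semigroup[A], sb: Semigroup[B]): Semigroup[(A, B)] =
    new Semigroup[(A, B)] {
      def plus(x: (A, B), y: (A, B)): (A, B) =
        (sa.plus(x._1, y._1), sb.plus(x._2, y._2))

      override def sumOption(items: Iterator[(A, B)]): Option[(A, B)] = {
        val lefts = ArrayBuffer.empty[A]
        val rights = ArrayBuffer.empty[B]
        items.foreach { case (a, b) => lefts += a; rights += b }
        for {
          a <- sa.sumOption(lefts.iterator)
          b <- sb.sumOption(rights.iterator)
        } yield (a, b)
      }
    }

  val summed: Option[(Int, String)] =
    tuple2(intAddition, stringConcat)
      .sumOption(Iterator((1, "a"), (2, "b"), (3, "c")))
}
```

Here `summed` is `Some((6, "abc"))`, and an empty input yields `None` because both component sums are empty.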
There is a new `Set` membership monoid called `SetDiff`. It can model adding and removing from sets (which can be useful for applications in summingbird).
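As a rough illustration of how a monoid can model set additions and removals, the sketch below defines a hypothetical `Diff` type in plain Scala. It is not the library's `SetDiff` class; the names `Diff`, `merge`, `added`, and `removed` are assumptions made for this example. Each diff records elements to add and elements to remove, and merging two diffs lets the later one win.

```scala
object SetDiffSketch {
  // Hypothetical stand-in for a set-difference value; not Algebird's SetDiff.
  final case class Diff[T](add: Set[T], remove: Set[T]) {
    // Apply this diff to a concrete set.
    def apply(s: Set[T]): Set[T] = (s ++ add) -- remove

    // Combine with a diff that happened later; the later diff takes priority.
    def merge(later: Diff[T]): Diff[T] =
      Diff((add -- later.remove) ++ later.add, (remove -- later.add) ++ later.remove)
  }

  // Identity element of the monoid: change nothing.
  def empty[T]: Diff[T] = Diff(Set.empty[T], Set.empty[T])
  def added[T](t: T): Diff[T] = Diff(Set(t), Set.empty[T])
  def removed[T](t: T): Diff[T] = Diff(Set.empty[T], Set(t))

  // A stream of membership events, summed into a single diff.
  val events: List[Diff[Int]] = List(added(1), added(2), removed(1), added(3))
  val combined: Diff[Int] = events.foldLeft(empty[Int])((acc, d) => acc.merge(d))

  // Applying the combined diff matches applying the events one by one.
  val result: Set[Int] = combined(Set(1, 9))
}
```

In this sketch `result` is `Set(2, 3, 9)`: element 1 is added and then removed, while 9 is untouched by any event.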
We have an exponential histogram Fold, an approximate data structure that can tell you approximate counts over sliding windows (see #568). Future work will add a monoid for this type; however, when possible, using the Fold is better since it has better error properties.
Lastly, there are many new docs.
Huge thanks to @sritchie who was the main contributor to this release.
changelog:
- Add `SetDiff` data structure to `algebird-core`: #555
- Add `Ring[BigDecimal]`, modeled after `Ring[BigInt]`: #553
- "Exponential Histogram" sliding window counter implementation added to `algebird-core` as `ExpHist`: #568
- improve HLLSeries performance: #575
- Add a microsite at https://twitter.github.io/algebird via the `sbt-microsites` plugin, along with docs for all typeclasses and data structures: #576
- Adds lots of scalacheck `Arbitrary` and `Gen` instances to `algebird-test`, under `com.twitter.algebird.scalacheck.{ gen, arbitrary }`: #579
- Add `Monoid[Max[Vector[T]]]`, `Monoid[Max[Stream[T]]]`: #579
- Add `FirstAggregator` and `LastAggregator`, and docs and API / perf improvements for `First`, `Last`, `Min`, `Max`: #579
- Add `LawsEquiv` versions of all laws: #584
- Deprecates broken group/ring for `Future`/`Try`: #584
- Add `metricsLaws[T]` to `BaseProperties` in `algebird-test`: #584
- Modify generated `Tuple2Monoid`, etc to extend `TupleNSemigroup`, giving subclasses access to efficient `sumOption`: #585
- optimize `Generated{Abstract,Product}Algebra.sumOption` with benchmarking #591
- Add an efficient `sumOption`, `+`, `-`, methods and docs to `AveragedValue`: #589
Add me maybe
This is an optimization and bug-fix release that is compatible with `0.12.x`. We add two new features: `Semigroup.maybePlus[T](t: T, o: Option[T]): T` and `Aggregator.numericSum` to convert to double and sum from any `scala.math.Numeric`.
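The behaviour of these two additions can be sketched in plain Scala without the library. The `Semigroup` trait, the implicit parameter on `maybePlus`, and the `numericSumSketch` function below are assumptions made for illustration; in particular, the real `Aggregator.numericSum` is an aggregator, whereas this stand-in is an ordinary function over a collection.

```scala
object MaybePlusSketch {
  // Minimal stand-in for a semigroup; not Algebird's actual trait.
  trait Semigroup[T] { def plus(a: T, b: T): T }

  implicit val intAddition: Semigroup[Int] = new Semigroup[Int] {
    def plus(a: Int, b: Int): Int = a + b
  }

  // Combine a value with an optional value: add it if present, else return t as is.
  def maybePlus[T](t: T, o: Option[T])(implicit sg: Semigroup[T]): T =
    o match {
      case Some(other) => sg.plus(t, other)
      case None        => t
    }

  // Convert every element to Double via scala.math.Numeric, then sum.
  def numericSumSketch[T](items: Iterable[T])(implicit num: Numeric[T]): Double =
    items.iterator.map(x => num.toDouble(x)).sum

  val withSome: Int = maybePlus(3, Some(4))
  val withNone: Int = maybePlus(3, None)
  val total: Double = numericSumSketch(List(1, 2, 3))
}
```

Here `withSome` is 7, `withNone` is 3, and `total` is 6.0.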
The full change log is below. Thanks to all contributors!
- Optimize `CMS.create(Seq[K])` #537
- Add sumOption support to primitive Rings #538
- Add Aggregator.numericSum function #539
- Add code coverage checks #541
- Add Semigroup.maybePlus for combining a value with an optional value. #544
- Clean up compilation warnings #546
- Fix algebird-spark to work with spark 2 (#550)
Better Aggregator Methods, Faster CountMinSketch and Batched!
This release adds many convenience methods to `Aggregator`, adds a new type called `Batched[T]`, and speeds up CMS.
Aggregator now has methods for reservoir sampling, and more top-K (sort*Take) aggregators. Batched allows you to defer doing any work on `plus` until you have a certain size, then it calls `sumOption` internally. This is designed for aggregations that are expensive to do iteratively, but where sumOption can be made efficient. Lastly, CMS was significantly improved in performance, a sumOption method was added, and a mutable builder (CMSSummation) was added (see #533).
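The batching idea described above can be sketched in plain Scala as follows. This is not the library's `Batched[T]` type: `Batcher`, its `add` and `result` methods, and the call counter are hypothetical names introduced for this example. Items accumulate untouched until the batch size is reached, at which point they are collapsed with a single sumOption call.

```scala
object BatchedSketch {
  // Minimal stand-in for a semigroup; not Algebird's actual trait.
  trait Semigroup[T] {
    def plus(a: T, b: T): T
    def sumOption(items: Iterator[T]): Option[T] =
      items.reduceOption((a, b) => plus(a, b))
  }

  // Defers work: pending items are only combined once batchSize is reached.
  final class Batcher[T](batchSize: Int, sg: Semigroup[T]) {
    require(batchSize > 0, "batchSize must be positive")

    def add(pending: Vector[T], t: T): Vector[T] = {
      val next = pending :+ t
      if (next.size >= batchSize) sg.sumOption(next.iterator).toList.toVector
      else next
    }

    // Collapse whatever is still pending into a final value.
    def result(pending: Vector[T]): Option[T] = sg.sumOption(pending.iterator)
  }

  // Counts bulk calls so the deferral is observable.
  var sumOptionCalls: Int = 0

  val concat: Semigroup[String] = new Semigroup[String] {
    def plus(a: String, b: String): String = a + b
    override def sumOption(items: Iterator[String]): Option[String] = {
      sumOptionCalls += 1
      if (items.isEmpty) None else Some(items.mkString)
    }
  }

  val batcher = new Batcher[String](4, concat)
  val pending: Vector[String] =
    List("a", "b", "c", "d", "e", "f")
      .foldLeft(Vector.empty[String])((p, s) => batcher.add(p, s))
  val total: Option[String] = batcher.result(pending)
}
```

With a batch size of 4 and six inputs, sumOption runs once when the fourth item arrives and once more for the final result, so `total` is `Some("abcdef")` after two bulk calls instead of five pairwise concatenations.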
This release should be 100% binary compatible with `0.12.0` (this check is now part of the travis-ci checks we run).
- Add an Identity Monad #511
- Improve toRichTraversable to also work with Iterator #518 #535
- fix several flakey tests #510 #514 #525
- Improve SpaceSaver design #519
- Add sortByTake, sortByReverseTake to Aggregator #527
- Add randomSample and reservoirSample aggregators #529
- Add a Batched type for converting plus to sumOption (defer plus until you have a batch): #530
- Add a default size to approximatePercentile: #531
- Add a .group method to MapAlgebra and RichTraversable #532
- Optimize CountMinSketch, add a mutable Builder for faster construction: #533
Thank you to:
@joshualande @non @dossett @jnievelt @piyushnarang @koertkuipers @Gabriel439 @NathanHowell @johnynek @ianoc
More Speed
- Implement an appendMonoid Aggregator factory which yields aggregators…: #501
- Dealing with probabilistic tests: #478
- Add Applicative.sequenceGen: #498
- Create a sparse Count-Min-Sketch.: #464
- fix name and visibility of negativePowersOfTwo: #492
- Speed up HLL presentation by 100x: #491
- Test Semigroup#sumOption using Iterator instead of List: #490
- Fix tests that were not actually running: #485
- add immutable version of sorted(Reverse)Take: #484
- Cuber/roller macros: #483
- Add sanity requirement for Approximate: #481
- Ioconnell/make develop version have snapshot suffix: #482
- Upgrade scalacheck and scalatest: #480
- Adding scoped top-N CMS monoid: #471
- Fix QTree quantileBounds off-by-one error: #472
- Move benchmarks to JMH: #473
- Ianoc/q tree benchmark more coverage: #474
- Optimize QTree a bunch: #475
- Disable coveralls, shows up as builds pending that are long finished: #476
Sparking up some aggregators
Version 0.11.0
Move CMSHasherByteArray from scalding: #467
Upgrade sbt launcher script (sbt-extras): #469
Create case class macros for algebraic structures: #466
Refactor MapAggregator: #462
Algebird support for spark: #397
Add MapAggregator from 1 (key, aggregator) pair: #452
Remove unnecessary use of scala.math: #455
Don't call deprecated HyperLogLog methods in tests: #456
Update product_generators.rb: #457
Pzheng/gaussian euclidean: #448
QTreeAggregators and Easier to Use HLL
HyperLogLogSeries, CMS enhancements, AdaptiveCache, new Aggregators
- HyperLogLogSeries #295
- CMS: add contramap to convert CMS[K] to CMS[L], add support for String and Bytes, remove Ordering context bound for K #399
- AdaptiveCache #419
- Add MapAggregator to compose tuples of (key, agg) pairs #411
- EventuallyAggregator and variants #407
- Add HLL method to do error-based Aggregator #438
- Speed up QTree #433
- Added function to safely downsize a HyperLogLog sketch #418
- Bumping to bijection 0.8.0 #441
- Now on Scala 2.10.5
Now with better CMS hashing, easier Aggregations, and more!
- Replace mapValues with one single map to avoid serialization in frameworks like Spark. #344
- Add Fold trait for composable incremental processing (for develop) #350
- Add a GC friendly LRU cache to improve SummingCache #341
- BloomFilter should warn or raise on unrealistic input. #355
- GH-345: Parameterize CMS to CMS[K] and decouple counting/querying from heavy hitters #354
- Add Array Monoid & Group. #356
- Improvements to Aggregator #359
- Improve require setup for depth/delta and associated test spec #361
- Bump from 2.11.2 to 2.11.4 #365
- Move to sbt 0.13.5 #364
- Correct wrong comment in estimation function #372
- Add increments to all Summers #373
- removed duplicate semigroup #375
- GH-381: Fix serialization errors when using new CMS implementation in Storm #382
- Fix snoble's name #384
- Lift methods for Aggregator and MonoidAggregator #380
- applyCumulative method on Aggregator #386
- Add Aggregator.zip #389
- GH-388: Fix CMS test issue caused by roundtripping depth->delta->depth #395
- GH-392: Improve hashing of BigInt #394
- add averageFrom to DecayedValue #391
- Freshen up Applicative instances a bit #387
- less noise on DecayedValue tests #405
- Preparer #400
- Upgrade bijection to 0.7.2 #406