Releases: twitter/algebird

Scala 2.12 goodness!

13 Feb 21:43

This is the first Algebird release to publish Scala 2.12 artifacts!
Apart from that, here are some of the changes since the last release:

Various BloomFilter improvements:
  • Remove seed variable in BloomFilter and rename k to hashIndex: #557
  • Polymorphic Bloom filters: #607
  • Optimize BloomFilter significantly: #610
  • Bloom filter distance function: #612
  • Optimize Hamming distance for Bloom Filters: #617

Incorporate more Algebra types:
  • Use standard algebra types: #523
  • Use more algebra types: #620

SpaceSaver updates:
  • Widen the visibility of SpaceSaver.SSMany, SpaceSaver.SSOne: #577
  • SpaceSaver fromBytes & toBytes: #603
  • Catch OOM in SpaceSaverTest: #614

Other changes:
  • Remove typeclass from interval constructor: #605
  • Better toString in ExpHistogram: #604
  • Remove legacy CountMinSketchMonoid: #602
  • Convert all laws to take Equiv instances, deprecate Equiv versions: #595
  • Replace FromIntLike with Ring and toK function: #594
  • Bail out of SemigroupMacro.sumOption for to.isEmpty: #599
  • Handle empty in Generated{Product, Abstract}Algebra: #597
  • Add explicit return types to Group instances for Moments, AveragedValue: #596
  • Remove view bounds on Moments, DecayedValue, AveragedValue: #592
  • Add MonoidAggregator.collectBefore: #611

Thanks to @johandahlberg, @johnynek, @ElPicador, @sritchie, and @isnotinvain for the contributions!

Early Scala 2.12 release

31 Jan 01:43
Pre-release

This early release of some Scala 2.12 Algebird packages contains binary-incompatible changes. Please use release 0.13.0 instead (https://github.com/twitter/algebird/releases/tag/0.13.0), which contains the appropriate set of Scala 2.12 Algebird artifacts.

Even Faster SumOption!

02 Dec 22:53
3c0d3a5

The main new feature of this release is a faster (benchmarked!) implementation of sumOption for tuple and product semigroups. If you are aggregating on Scalding or Spark, you should see a significant speedup (roughly 2x).
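
A minimal sketch of why a tuple-aware sumOption can be faster, assuming simplified stand-ins for Algebird's types: the hypothetical `Semi`, `LongSum`, `StringConcat`, and `Tuple2Semi` below are illustrations, not Algebird's generated TupleNSemigroup code. The default sumOption folds pairwise and builds an intermediate tuple per element; the tuple version instead splits the input into per-component buffers and hands each buffer to that component's own (possibly optimized) sumOption.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical minimal semigroup (not Algebird's Semigroup trait).
trait Semi[T] {
  def plus(a: T, b: T): T
  // Default: pairwise fold, producing one intermediate value per element.
  def sumOption(items: Iterator[T]): Option[T] = items.reduceOption(plus)
}

object LongSum extends Semi[Long] {
  def plus(a: Long, b: Long): Long = a + b
}

object StringConcat extends Semi[String] {
  def plus(a: String, b: String): String = a + b
  // Build the result once instead of creating n - 1 intermediate strings.
  override def sumOption(items: Iterator[String]): Option[String] =
    if (!items.hasNext) None else Some(items.mkString)
}

// Tuple semigroup that delegates bulk summing to each component.
final class Tuple2Semi[A, B](sa: Semi[A], sb: Semi[B]) extends Semi[(A, B)] {
  def plus(x: (A, B), y: (A, B)): (A, B) = (sa.plus(x._1, y._1), sb.plus(x._2, y._2))

  override def sumOption(items: Iterator[(A, B)]): Option[(A, B)] = {
    val lefts = ArrayBuffer.empty[A]
    val rights = ArrayBuffer.empty[B]
    items.foreach { case (a, b) => lefts += a; rights += b }
    // No intermediate tuples: each component is summed in a single call.
    for {
      a <- sa.sumOption(lefts.iterator)
      b <- sb.sumOption(rights.iterator)
    } yield (a, b)
  }
}
```

This sketch buffers the whole input; a production version would likely flush the buffers in fixed-size chunks to bound memory.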

There is a new set membership monoid called SetDiff. It can model adding elements to and removing elements from sets, which can be useful for applications in Summingbird.
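
A rough sketch of the idea behind a set-difference monoid, assuming that later operations override earlier ones. `SetDiffSketch` and its methods are hypothetical and may not match Algebird's actual SetDiff. Each value records the elements added and the elements removed; combining two values keeps those sets disjoint, so applying the combined diff gives the same result as applying the two diffs in order.

```scala
// Hypothetical sketch; Algebird's SetDiff may differ in names and details.
// Invariant: `add` and `remove` are disjoint.
final case class SetDiffSketch[T](add: Set[T], remove: Set[T]) {
  // Combine an earlier diff (this) with a later one (that): later ops win.
  def ++(that: SetDiffSketch[T]): SetDiffSketch[T] =
    SetDiffSketch(
      add = (add -- that.remove) ++ that.add,
      remove = (remove -- that.add) ++ that.remove
    )

  // Replay the accumulated changes onto a concrete set.
  def applyTo(s: Set[T]): Set[T] = (s -- remove) ++ add
}

object SetDiffSketch {
  def empty[T]: SetDiffSketch[T] = SetDiffSketch(Set.empty[T], Set.empty[T])
  def add[T](t: T): SetDiffSketch[T] = SetDiffSketch(Set(t), Set.empty[T])
  def remove[T](t: T): SetDiffSketch[T] = SetDiffSketch(Set.empty[T], Set(t))
}
```

Because `++` is associative with `empty` as its identity, partial diffs can be pre-aggregated in any grouping before being applied, which is what makes this shape usable in distributed aggregation.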

We have added an exponential histogram Fold (ExpHist), an approximate data structure that can report approximate counts over sliding windows (see #568). Future work will add a monoid for this type; however, when possible, prefer the Fold, since it has better error properties.
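
To make the sliding-window idea concrete, here is a simplified DGIM-style exponential histogram for counting unit events. It is a hypothetical sketch (`SlidingCounter`, `EhBucket`, and `maxPerSize` are illustrative names), not Algebird's ExpHist, which is parameterized differently. Events are grouped into buckets whose sizes are powers of two; only the oldest bucket can straddle the window edge, so the uncertainty is confined to that one bucket.

```scala
// Hypothetical DGIM-style sliding-window counter; not Algebird's ExpHist API.
// Each bucket summarizes `size` events (a power of two) and remembers the
// timestamp of its most recent event.
final case class EhBucket(size: Long, lastTs: Long)

final class SlidingCounter(window: Long, maxPerSize: Int) {
  require(window > 0, "window must be positive")
  require(maxPerSize >= 2, "the error bound needs at least two buckets per size")

  // Newest first; sizes are non-decreasing toward the oldest bucket.
  private var buckets = Vector.empty[EhBucket]

  // Drop buckets whose newest event has left the window (now - window, now].
  private def expire(now: Long): Unit =
    buckets = buckets.filter(_.lastTs > now - window)

  // Record one event at time `now` (timestamps must be non-decreasing).
  def inc(now: Long): Unit = {
    expire(now)
    buckets = EhBucket(1L, now) +: buckets
    var size = 1L
    var overflow = true
    while (overflow) {
      val idx = buckets.indices.filter(i => buckets(i).size == size)
      overflow = idx.length > maxPerSize
      if (overflow) {
        // Merge the two oldest buckets of this size into one of twice the
        // size, keeping the newer timestamp; this may cascade upward.
        val newer = idx(idx.length - 2)
        val oldest = idx(idx.length - 1)
        val merged = EhBucket(size * 2, buckets(newer).lastTs)
        buckets = buckets.patch(oldest, Nil, 1).updated(newer, merged)
        size *= 2
      }
    }
  }

  // The true count lies in [total - oldest.size + 1, total]: only the
  // oldest bucket can contain events that have already expired.
  def bounds(now: Long): (Long, Long) = {
    expire(now)
    if (buckets.isEmpty) (0L, 0L)
    else {
      val total = buckets.map(_.size).sum
      (total - buckets.last.size + 1, total)
    }
  }

  // Point estimate: the midpoint of the bounds.
  def estimate(now: Long): Double = {
    val (lo, hi) = bounds(now)
    (lo + hi) / 2.0
  }
}
```

Raising `maxPerSize` keeps more small buckets around, which narrows the bounds at the cost of memory.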

Lastly, there is a lot of new documentation.

Huge thanks to @sritchie who was the main contributor to this release.

changelog:

  • Add SetDiff data structure to algebird-core: #555
  • Add Ring[BigDecimal], modeled after Ring[BigInt]: #553
  • "Exponential Histogram" sliding window counter implementation added to algebird-core as ExpHist: #568
  • improve HLLSeries performance: #575
  • Add a microsite at https://twitter.github.io/algebird via the sbt-microsites plugin, along with docs for all typeclasses and data structures: #576
  • Adds lots of scalacheck Arbitrary and Gen instances to algebird-test, under com.twitter.algebird.scalacheck.{ gen, arbitrary }: #579
  • Add Monoid[Max[Vector[T]]], Monoid[Max[Stream[T]]]: #579
  • Add FirstAggregator and LastAggregator, and docs and API / perf improvements for First, Last, Min, Max: #579
  • Add LawsEquiv versions of all laws: #584
  • Deprecates broken group/ring for Future/Try: #584
  • Add metricsLaws[T] to BaseProperties in algebird-test: #584
  • Modify generated Tuple2Monoid, etc to extend TupleNSemigroup, giving subclasses access to efficient sumOption: #585
  • optimize Generated{Abstract,Product}Algebra.sumOption with benchmarking #591
  • Add an efficient sumOption, +, -, methods and docs to AveragedValue: #589

Add me maybe

23 Sep 22:22
e260fa2

This is an optimization and bug-fix release that is compatible with 0.12.x. It adds two new features: Semigroup.maybePlus[T](t: T, o: Option[T]): T, and Aggregator.numericSum, which converts values of any type with a scala.math.Numeric instance to Double and sums them.
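
A minimal sketch of the two additions, assuming a stripped-down semigroup trait. `Combine`, `maybePlus`, and `numericSumAsDouble` below are hypothetical stand-ins that follow the signatures described above; they are not Algebird's Semigroup.maybePlus or Aggregator.numericSum (the library version is built as an Aggregator, whereas this one is a plain function).

```scala
object MaybePlusSketch {
  // Hypothetical minimal semigroup (not Algebird's Semigroup).
  trait Combine[T] { def plus(a: T, b: T): T }

  object Combine {
    implicit val intCombine: Combine[Int] = new Combine[Int] {
      def plus(a: Int, b: Int): Int = a + b
    }
  }

  // Same shape as maybePlus[T](t: T, o: Option[T]): T. Combines t with the
  // optional value when present (t on the left), otherwise returns t as-is.
  def maybePlus[T](t: T, o: Option[T])(implicit c: Combine[T]): T =
    o match {
      case Some(other) => c.plus(t, other)
      case None        => t
    }

  // Convert each value to Double via its scala.math.Numeric instance, then sum.
  def numericSumAsDouble[N](xs: Iterable[N])(implicit num: Numeric[N]): Double =
    xs.foldLeft(0.0)((acc, x) => acc + num.toDouble(x))
}
```

maybePlus saves the caller from writing the `Option` match by hand whenever a value may or may not have a partial aggregate to merge with.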

The full change log is below. Thanks to all contributors!

  • Optimize CMS.create(Seq[K]) #537
  • Add sumOption support to primitive Rings #538
  • Add Aggregator.numericSum function #539
  • Add code coverage checks #541
  • Add Semigroup.maybePlus for combining a value with an optional value. #544
  • Clean up compilation warnings #546
  • Fix algebird-spark to work with Spark 2 #550

Better Aggregator Methods, Faster CountMinSketch and Batched!

26 Jun 00:43
088bb65

This release adds many convenience methods to Aggregator, adds a new type called Batched[T], and speeds up CMS.

Aggregator now has methods for reservoir sampling and more top-K (sort*Take) aggregators. Batched lets you defer the work of plus until a batch reaches a certain size, at which point it calls sumOption internally. It is designed for aggregations that are expensive to perform one plus at a time but for which sumOption can be made efficient. Lastly, CMS is significantly faster, gained a sumOption method, and has a new mutable builder, CMSSummation (see #533).
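
A simplified sketch of the batching idea, assuming a semigroup whose sumOption is much cheaper than repeated plus (string concatenation here). `BatchSemi`, `Deferred`, and `batchSize` are hypothetical names; Algebird's Batched[T] is structured differently. Combining two deferred values only concatenates their pending items; once the pending items reach batchSize, they are compacted with a single sumOption call.

```scala
// Hypothetical semigroup with an overridable bulk sum (not Algebird's API).
trait BatchSemi[T] {
  def plus(a: T, b: T): T
  def sumOption(xs: Iterable[T]): Option[T] = xs.reduceOption(plus)
}

object BatchSemi {
  // String concatenation: mkString builds the whole batch in one pass.
  implicit val concat: BatchSemi[String] = new BatchSemi[String] {
    def plus(a: String, b: String): String = a + b
    override def sumOption(xs: Iterable[String]): Option[String] =
      if (xs.isEmpty) None else Some(xs.mkString)
  }
}

// Values waiting to be summed; no `plus` work happens until compaction.
final case class Deferred[T](items: Vector[T]) {
  // Cheap combine: concatenate pending items, compacting via sumOption
  // once at least batchSize of them have accumulated.
  def combine(that: Deferred[T], batchSize: Int)(implicit s: BatchSemi[T]): Deferred[T] = {
    val all = items ++ that.items
    if (all.size >= batchSize) Deferred(s.sumOption(all).fold(Vector.empty[T])(Vector(_)))
    else Deferred(all)
  }

  // Final result: sum whatever is still pending.
  def total(implicit s: BatchSemi[T]): Option[T] = s.sumOption(items)
}
```

Order is preserved because items are only ever appended, so this also works for non-commutative semigroups such as concatenation.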

This release should be 100% binary compatible with 0.12.0 (this check is now part of the Travis CI checks we run).

  • Add an Identity Monad #511
  • Improve toRichTraversable to work with Iterator also #518 #535
  • Fix several flaky tests #510 #514 #525
  • Improve SpaceSaver design #519
  • Add sortByTake, sortByReverseTake to Aggregator #527
  • Add a randomSample and reservoirSample aggregators #529
  • Add a Batched type for converting plus to sumOption (defer plus until you have a batch): #530
  • Add a default size to approximatePercentile: #531
  • Add a .group method to MapAlgebra and RichTraversable #532
  • Optimize CountMinSketch, add a mutable Builder for faster construction: #533

Thank you to:
@joshualande @non @dossett @jnievelt @piyushnarang @koertkuipers @Gabriel439 @NathanHowell @johnynek @ianoc

More Speed

02 Feb 03:13
  • Implement an appendMonoid Aggregator factory which yields aggregators…: #501
  • Dealing with probabilistic tests: #478
  • Add Applicative.sequenceGen: #498
  • Create a sparse Count-Min-Sketch.: #464
  • fix name and visibility of negativePowersOfTwo: #492
  • Speed up HLL presentation by 100x: #491
  • Test Semigroup#sumOption using Iterator instead of List: #490
  • Fix tests that were not actually running: #485
  • add immutable version of sorted(Reverse)Take: #484
  • Cuber/roller macros: #483
  • Add sanity requirement for Approximate: #481
  • Make the develop version have a snapshot suffix: #482
  • Upgrade scalacheck and scalatest: #480
  • Adding scoped top-N CMS monoid: #471
  • Fix Qtree quantileBounds off-by-one error: #472
  • Move benchmarks to JMH: #473
  • Add more QTree benchmark coverage: #474
  • Optimize QTree a bunch: #475
  • Disable coveralls, shows up as builds pending that are long finished: #476

Sparking up some aggregators

29 Jul 15:55

Version 0.11.0

  • Move CMSHasherByteArray from scalding: #467
  • Upgrade sbt launcher script (sbt-extras): #469
  • Create case class macros for algebraic structures: #466
  • Refactor MapAggregator: #462
  • Algebird support for spark: #397
  • Add MapAggregator from 1 (key, aggregator) pair: #452
  • Remove unnecessary use of scala.math: #455
  • Don't call deprecated HyperLogLog methods in tests: #456
  • Update product_generators.rb: #457
  • Pzheng/gaussian euclidean: #448

QTreeAggregators and Easier to Use HLL

21 May 17:56
  • Make HLL easier to use, add Hash128 typeclass #440
  • add ! to ApproximateBoolean #442
  • add QTreeAggregator and add approximatePercentileBounds to Aggregator #443
  • Make level configurable in QTreeAggregators #444

HyperLogLogSeries, CMS enhancements, AdaptiveCache, new Aggregators

11 May 21:07
  • HyperLogLogSeries #295
  • CMS: add contramap to convert CMS[K] to CMS[L], add support for String and Bytes, remove Ordering context bound for K #399
  • AdaptiveCache #419
  • Add MapAggregator to compose tuples of (key, agg) pairs #411
  • EventuallyAggregator and variants #407
  • Add HLL method to do error-based Aggregator #438
  • Speed up QTree #433
  • Added function to safely downsize a HyperLogLog sketch #418
  • Bumping to bijection 0.8.0 #441
  • Now on Scala 2.10.5

Now with better CMS hashing, easier Aggregations, and more!

21 Jan 23:20
  • Replace mapValues with one single map to avoid serialization issues in frameworks like Spark. #344
  • Add Fold trait for composable incremental processing (for develop) #350
  • Add a GC friendly LRU cache to improve SummingCache #341
  • BloomFilter should warn or raise on unrealistic input. #355
  • GH-345: Parameterize CMS to CMS[K] and decouple counting/querying from heavy hitters #354
  • Add Array Monoid & Group. #356
  • Improvements to Aggregator #359
  • Improve require setup for depth/delta and associated test spec #361
  • Bump from 2.11.2 to 2.11.4 #365
  • Move to sbt 0.13.5 #364
  • Correct wrong comment in estimation function #372
  • Add increments to all Summers #373
  • removed duplicate semigroup #375
  • GH-381: Fix serialization errors when using new CMS implementation in Storm #382
  • Fix snoble's name #384
  • Lift methods for Aggregator and MonoidAggregator #380
  • applyCumulative method on Aggregator #386
  • Add Aggregator.zip #389
  • GH-388: Fix CMS test issue caused by roundtripping depth->delta->depth #395
  • GH-392: Improve hashing of BigInt #394
  • add averageFrom to DecayedValue #391
  • Freshen up Applicative instances a bit #387
  • less noise on DecayedValue tests #405
  • Preparer #400
  • Upgrade bijection to 0.7.2 #406