
Introduce sampling to StateBackedIterable #33621

Closed
wants to merge 130 commits into from

Conversation

stankiewicz
Contributor

Fixes #33620

StateBackedIterable currently encodes every element to report its size to the observer.
With this change, it takes the isRegisterByteSizeObserverCheap response into account and, depending on the result, either encodes every element or samples. The sampling algorithm is similar to the one used elsewhere, e.g. in [OutputObjectAndByteCounter](runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/common/worker/OutputObjectAndByteCounter.java).
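To make the encode-or-sample idea concrete, here is a minimal, hypothetical Java sketch. It is not the code from this PR or from OutputObjectAndByteCounter: the class name `SampledSizeEstimator`, the constants, and the 1/p scale-up of sampled sizes are assumptions for illustration. A boolean stands in for the isRegisterByteSizeObserverCheap result, and a size function stands in for encoding the element with its coder.

```java
import java.util.Random;
import java.util.function.ToLongFunction;

/**
 * Hypothetical sketch, not the code from this PR: estimates the total encoded
 * size of the elements of an iterable. If observing sizes is cheap, every
 * element is measured; otherwise elements are sampled and scaled up.
 */
final class SampledSizeEstimator<T> {
  // The first SAMPLING_CUTOFF elements are always measured.
  static final int SAMPLING_CUTOFF = 10;
  // Caps the token so the sampling probability never falls below
  // SAMPLING_CUTOFF / SAMPLING_TOKEN_UPPER_BOUND.
  static final int SAMPLING_TOKEN_UPPER_BOUND = 1_000_000;

  private final boolean observerIsCheap; // stands in for isRegisterByteSizeObserverCheap(...)
  private final ToLongFunction<T> sizeOf; // stands in for encoding the element with its coder
  private final Random random;
  private int samplingToken = 0;
  private double estimatedTotal = 0.0;
  private long measuredElements = 0;

  SampledSizeEstimator(boolean observerIsCheap, ToLongFunction<T> sizeOf, Random random) {
    this.observerIsCheap = observerIsCheap;
    this.sizeOf = sizeOf;
    this.random = random;
  }

  /** Called once per element as the iterable is consumed. */
  void observe(T element) {
    if (observerIsCheap) {
      // Cheap path: measure every element exactly.
      estimatedTotal += measure(element);
      return;
    }
    samplingToken = Math.min(samplingToken + 1, SAMPLING_TOKEN_UPPER_BOUND);
    // Element n is measured with probability min(1, SAMPLING_CUTOFF / n),
    // until the token reaches its upper bound.
    if (random.nextInt(samplingToken) < SAMPLING_CUTOFF) {
      // Weight each sample by 1/p so the expected estimate equals the true total.
      double inverseProbability = Math.max(1.0, (double) samplingToken / SAMPLING_CUTOFF);
      estimatedTotal += measure(element) * inverseProbability;
    }
  }

  private long measure(T element) {
    measuredElements++;
    return sizeOf.applyAsLong(element);
  }

  long estimatedBytes() {
    return Math.round(estimatedTotal);
  }

  long measuredElements() {
    return measuredElements;
  }
}
```

With these assumed constants, the first 10 elements are always measured and element n is measured with probability 10/n afterwards, so the number of encodings grows roughly logarithmically with the iterable's length instead of linearly, at the cost of the total becoming an estimate.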

…still encode as is; for others, it will sample.
github-actions bot added the java label on Jan 16, 2025
Contributor

Assigning reviewers. If you would like to opt out of this review, comment "assign to next reviewer".

R: @Abacn for label java.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

dependabot bot and others added 25 commits January 16, 2025 13:42
Bumps [golang.org/x/text](https://github.com/golang/text) from 0.20.0 to 0.21.0.
- [Release notes](https://github.com/golang/text/releases)
- [Commits](golang/text@v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: golang.org/x/text
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#33303)

Bumps [github.com/aws/aws-sdk-go-v2/feature/s3/manager](https://github.com/aws/aws-sdk-go-v2) from 1.17.38 to 1.17.43.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](aws/aws-sdk-go-v2@credentials/v1.17.38...credentials/v1.17.43)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/feature/s3/manager
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* enable managed service

* format fix

* trigger xlang tests
* [yaml] various inline provider doc fixes

Signed-off-by: Jeffrey Kinard <[email protected]>

* Update yaml_combine.py

---------

Signed-off-by: Jeffrey Kinard <[email protected]>
Co-authored-by: Robert Bradshaw <[email protected]>
* Add throttling metrics and retries to vertex embeddings

* Format + run postcommits

* fix + lint
* create unit test

* minimize to not using flatmaptuple

* fix by adding a tuple conversion in flatmaptuple

* add comment referring to ticket

* remove extra pipeline

* manually isort

* retrigger builder

* retrigger builder

* isort?

* try manually isorting again

* Revert "try manually isorting again"

This reverts commit a0fac32.

* manually fix isort
* Reapply "bump hadoop version (apache#33011)" (apache#33257)

This reverts commit 7e25649.

* Fix hbase and hcatalog test dependencies

* Add missing pinned hadoop dependency version for compat test target
* initial benchmark framework code

* Implement Dataflow cost benchmark framework + add wordcount example

* formatting

* move to base wordcount instead

* add comment for pipeline execution in wordcount
* [java] BQ: add missing avro conversions to BQ TableRow

Avro float fields can be used to write BQ FLOAT columns.
Add TableRow conversion for such fields.

Add conversions for the Avro 1.10+ logical types local-timestamp-millis
and local-timestamp-micros.

* Rework tests

* Add map and fixed types conversion

* Fix checkstyle

* Use valid parameters

* Test record nullable field
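The local-timestamp conversions mentioned in this commit could look roughly like the following hypothetical Java helper. It is not Beam's implementation: the class and method names are invented, and it assumes the target BigQuery column is DATETIME and that an ISO-8601-style string with microsecond precision is an acceptable TableRow value.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical helper (not Beam's conversion code): renders Avro
 * local-timestamp-millis / local-timestamp-micros values as DATETIME strings.
 */
final class AvroLocalTimestamps {
  // Assumed output format: date and wall-clock time with microsecond precision.
  private static final DateTimeFormatter DATETIME =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSS");

  private AvroLocalTimestamps() {}

  /** local-timestamp-micros: microseconds since 1970-01-01T00:00:00, no time zone. */
  static String fromLocalTimestampMicros(long micros) {
    // floorDiv/floorMod keep the fractional part non-negative for pre-1970 values.
    long seconds = Math.floorDiv(micros, 1_000_000L);
    int nanos = (int) Math.floorMod(micros, 1_000_000L) * 1_000;
    // ZoneOffset.UTC only decodes the wall-clock fields; no zone conversion happens.
    return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC).format(DATETIME);
  }

  /** local-timestamp-millis: milliseconds since 1970-01-01T00:00:00, no time zone. */
  static String fromLocalTimestampMillis(long millis) {
    return fromLocalTimestampMicros(Math.multiplyExact(millis, 1_000L));
  }
}
```

For example, `fromLocalTimestampMicros(1_500_000L)` yields `1970-01-01T00:00:01.500000`. Decoding with a fixed UTC offset keeps the stored wall-clock value unchanged, which matches the zone-less semantics of Avro's local-timestamp types.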
* Remove use of google-github-actions/auth step.

* Create beam_PostCommit_Java_IO_Performance_Tests.json
* add hadoop auth

* trigger xlang tests

* place dep in expansion service
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.31.0 to 0.32.0.
- [Commits](golang/net@v0.31.0...v0.32.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ks (apache#33352)

Bumps [github.com/nats-io/nats-server/v2](https://github.com/nats-io/nats-server) from 2.10.22 to 2.10.23.
- [Release notes](https://github.com/nats-io/nats-server/releases)
- [Changelog](https://github.com/nats-io/nats-server/blob/main/.goreleaser.yml)
- [Commits](nats-io/nats-server@v2.10.22...v2.10.23)

---
updated-dependencies:
- dependency-name: github.com/nats-io/nats-server/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#32674)

* bump confluent version

Kafka Schema Registry Client has been reported with the following vulnerabilities, due to vulnerable dependencies:
CVE-2024-26308
CVE-2024-25710

* try slightly older version due to unmet dependencies on ThrottlingQuotaExceededException

* try slightly older version due to unmet dependencies on ThrottlingQuotaExceededException

* comment on version
…e#33351)

Bumps [cloud.google.com/go/profiler](https://github.com/googleapis/google-cloud-go) from 0.4.1 to 0.4.2.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@ai/v0.4.1...apps/v0.4.2)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/profiler
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Successfully merging this pull request may close these issues.

[Bug]: StateBackedIterable serializes elements size for every element when ComposedCombine is used