[BugFix] fix ingestion hang because of alter job timeout (backport #55207) #55236

Merged: 1 commit, Jan 20, 2025


mergify[bot]

@mergify mergify bot commented Jan 20, 2025

Why I'm doing:

In a shared-data cluster, an alter job increases the partition's next version and then changes its state to FINISHED_REWRITING. But if the alter job times out, it won't be executed and can't finish, which leaves a version gap and causes errors like:

2025-01-13 05:13:26.728Z ERROR (lake-publish-task-160840|200302) [PublishVersionDaemon.publishPartitionBatch():488] publish partition batch partition.getVisibleVersion() + 1 != version.get(0) 4473684 37071 37073
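The check behind this error can be sketched roughly as follows. This is a minimal illustration, not the actual PublishVersionDaemon code: the class `VersionGapSketch` and the method `canPublish` are hypothetical names, and reading the three numbers in the log as partition id, visible version, and first version to publish is an assumption.

```java
import java.util.List;

// Hypothetical sketch of the contiguity check named in the error message.
// Not StarRocks code; names and structure are assumptions for illustration.
class VersionGapSketch {
    // Returns true only when the first pending version directly follows
    // the partition's visible version, i.e. visibleVersion + 1 == versions.get(0).
    static boolean canPublish(long visibleVersion, List<Long> pendingVersions) {
        return !pendingVersions.isEmpty()
                && visibleVersion + 1 == pendingVersions.get(0);
    }

    public static void main(String[] args) {
        // Assumed reading of the log: visible version 37071, first pending version 37073.
        // Version 37072 was taken by the alter job that never finished, so publish is blocked.
        System.out.println(canPublish(37071L, List.of(37073L)));

        // Without the gap, the batch would be publishable.
        System.out.println(canPublish(37071L, List.of(37072L, 37073L)));
    }
}
```

Under this reading, every later load on the partition stays blocked until the missing version is published, which matches the ingestion hang in the title.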

What I'm doing:

Two changes in this PR:

  1. Only skip an alter job that can be cancelled.
  2. Fix the timeout setting when creating a LakeTableAlterMetaJob.
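The first change can be illustrated with a simplified model. This is a sketch under stated assumptions, not the AlterJobV2 implementation: the class `AlterJobSketch`, its fields, and every state other than FINISHED_REWRITING are hypothetical, and it assumes a job can only be cancelled before it has increased the partition's next version.

```java
// Hypothetical model of "only skip the alter job which can be cancelled".
// Not StarRocks code; all names except FINISHED_REWRITING are assumptions.
class AlterJobSketch {
    enum State { PENDING, FINISHED_REWRITING, FINISHED, CANCELLED }

    State state = State.PENDING;
    final long createTimeMs;
    final long timeoutMs;

    AlterJobSketch(long createTimeMs, long timeoutMs) {
        this.createTimeMs = createTimeMs;
        this.timeoutMs = timeoutMs;
    }

    boolean isTimeout(long nowMs) {
        return nowMs - createTimeMs > timeoutMs;
    }

    // Assumption: once the job has bumped the next version (any state after
    // PENDING), cancelling would leave a version gap, so cancel is refused.
    boolean cancel() {
        if (state != State.PENDING) {
            return false;
        }
        state = State.CANCELLED;
        return true;
    }

    void run(long nowMs) {
        // Skip the job only if it timed out AND the cancel actually succeeded.
        if (isTimeout(nowMs) && cancel()) {
            return;
        }
        // Otherwise keep executing, even past the timeout, so the job can finish.
        switch (state) {
            case PENDING:
                state = State.FINISHED_REWRITING;
                break;
            case FINISHED_REWRITING:
                state = State.FINISHED;
                break;
            default:
                break;
        }
    }
}
```

In this model, a job that times out while still PENDING is cancelled and skipped, while a job that times out in FINISHED_REWRITING still runs to FINISHED, so the version it reserved gets published.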

This pull request includes changes to improve the handling of job cancellation and timeouts in the AlterJobV2 class, as well as a minor adjustment to the timeout parameter in the SchemaChangeHandler class. The most important changes are summarized below:

Improvements to job cancellation handling:

Adjustments to timeout parameter:

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport PR

@wanpengfei-git wanpengfei-git merged commit bdee0e4 into branch-3.4 Jan 20, 2025
35 of 36 checks passed
@wanpengfei-git wanpengfei-git deleted the mergify/bp/branch-3.4/pr-55207 branch January 20, 2025 06:51