Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GOBBLIN-2188] Define Initializer.AfterInitializeMemento for GoT to tunnel state from GenerateWorkUnits to CommitActivity #4091

Merged
merged 2 commits into from
Jan 7, 2025

Conversation

phet
Copy link
Contributor

@phet phet commented Jan 6, 2025

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

Stateful writer/converter Initializers, such as JdbcWriterInitializer, work fine with Gobblin-on-MR, but get disrupted by GoT. While GoMR does also launch an MR application, the remainder of the MRJobLauncher execution is within the same process. An Initializer must execute at the end of WorkDiscovery, before WorkUnit processing may begin, but is .close()d only after Job Commit completes. Crucially, with GoMR, the same Initializer instance remains in memory all throughout. With GoT, in contrast, Work Discovery and Commit execute completely independently - creating new objects/instances, perhaps even on a new host/container.

Some Initializers, such as the JdbcWriter's JdbcWriterInitializer are stateful. (In its case, to maintain the temp/staging table's name, so that may be dropped upon successful Commit.) Specific state originates during Work Discovery (the GenerateWorkUnitsImpl activity in GoT) yet must be available during Commit (the CommitActivityImpl in GoT).

Accordingly we define the Initializer.AfterInitializeMemento interface so any concrete Initializer impl has the opportunity to optionally define an opaque snapshot of its internal state that may be (de)serialized and "revived" as a new Initializer instance of the same concrete type, and holding equivalent internal state.

The evolved Initializer interface/API provides default impls for the new methods commemorate and recall, making it source-compatible with existing implementations. Any existing Initializer that does NOT maintain internal state for its close() to use may be recompiled unchanged, and succeed with GoT.

This GoT enhancement leverages JobState to tunnel AfterInitializeMementos, since it is serialized at the end of Work Discovery (in GenerateWorkUnitsImpl), and is later loaded when the Commit activity begins (in CommitStepActivityImpl).

Review Tip: start first with how Initializer.AfterInitializeMementos are created in GenerateWorkUnitsImpl and used by CommitStepActivityImpl before diving into their impl (with test).

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

includes new MultiInitializerTest

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

…state between `GenerateWorkUnits` and `CommitActivity`
Copy link
Contributor

@Blazer-007 Blazer-007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Looks great 🚀

@phet phet force-pushed the initializer-mementos-for-got branch from 418bf05 to 746e378 Compare January 7, 2025 09:33
@phet phet changed the title Create AfterInitializeMemento for Initializers for GoT to tunnel state between GenerateWorkUnits and CommitActivity [GOBBLIN-2188] Define Initializer.AfterInitializeMemento for GoT to tunnel state from GenerateWorkUnits to CommitActivity Jan 7, 2025
@phet phet merged commit a0cef28 into apache:master Jan 7, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants