Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replica flush the old data after RDB file is ok in disk-based replication #926

Merged
merged 5 commits into from
Sep 14, 2024

Conversation

enjoy-binbin
Copy link
Member

@enjoy-binbin enjoy-binbin commented Aug 20, 2024

Call emptyData right before rdbLoad to prevent errors in the middle
and we drop the replication stream and leaving an empty database.
The real changes is in disk-based part, the rest is just code movement.

…tion

Call emptyData right before rdbLoad to prevent errors in the middle
and we drop the replication stream and leaving an empty database.

Signed-off-by: Binbin <[email protected]>
Copy link

codecov bot commented Aug 20, 2024

Codecov Report

Attention: Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.

Project coverage is 70.63%. Comparing base (7795152) to head (f4316a0).
Report is 80 commits behind head on unstable.

Files with missing lines Patch % Lines
src/replication.c 57.14% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #926      +/-   ##
============================================
+ Coverage     70.37%   70.63%   +0.25%     
============================================
  Files           112      114       +2     
  Lines         61505    61670     +165     
============================================
+ Hits          43286    43559     +273     
+ Misses        18219    18111     -108     
Files with missing lines Coverage Δ
src/replication.c 87.29% <57.14%> (+0.14%) ⬆️

... and 41 files with indirect coverage changes

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
@ranshid
Copy link
Member

ranshid commented Aug 20, 2024

@enjoy-binbin did we have a test failure here, or should we introduce new test for this case?

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

It's not a breaking change?

The behavior change is that if the sync fails, the old data is still there, instead of deleted. It seems better. Nobody relies on the old behaviour I think...?

@enjoy-binbin
Copy link
Member Author

did we have a test failure here, or should we introduce new test for this case?

emmm, i don't think we have a test failure here, we don't have test that will cover this case, i test it in local. We should actually add a new test to verify this, but that will require us to add some debug code. (or i can do try to make the rename fail, i can take a try)

@enjoy-binbin
Copy link
Member Author

It's not a breaking change?
The behavior change is that if the sync fails, the old data is still there, instead of deleted. It seems better. Nobody relies on the old behaviour I think...?

I don't think it is a breaking change (but somehow it can be?). Yes, actually it is, if the following fails, we can still keep the data in memory.

        /* Make sure the new file (also used for persistence) is fully synced
         * (not covered by earlier calls to rdb_fsync_range). */
        if (fsync(server.repl_transfer_fd) == -1) {
            ...
            cancelReplicationHandshake(1);
            return;
        }

        /* Rename rdb like renaming rewrite aof asynchronously. */
        int old_rdb_fd = open(server.rdb_filename, O_RDONLY | O_NONBLOCK);
        if (rename(server.repl_transfer_tmpfile, server.rdb_filename) == -1) {
            ...
            cancelReplicationHandshake(1);
            if (old_rdb_fd != -1) close(old_rdb_fd);
            return;
        }
        /* Close old rdb asynchronously. */
        if (old_rdb_fd != -1) bioCreateCloseJob(old_rdb_fd, 0, 0);

        /* Sync the directory to ensure rename is persisted */
        if (fsyncFileDir(server.rdb_filename) == -1) {
            ...
            cancelReplicationHandshake(1);
            return;
        }

@zuiderkwast
Copy link
Contributor

@valkey-io/core-team Do you think this is a breaking change?

Signed-off-by: Binbin <[email protected]>
@hwware
Copy link
Member

hwware commented Aug 22, 2024

I do not think it is a break change, but it changes some functions call position, but I can not find the reason why we should do this way.

@enjoy-binbin enjoy-binbin added the release-notes This issue should get a line item in the release notes label Sep 4, 2024
@enjoy-binbin
Copy link
Member Author

i am going to merge this one, i don't think it is a breaking change, does any of you have other concerns?

@enjoy-binbin enjoy-binbin merged commit 1739038 into valkey-io:unstable Sep 14, 2024
44 checks passed
@enjoy-binbin enjoy-binbin deleted the flush_db_after_rdb_ok branch September 14, 2024 03:49
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…tion (valkey-io#926)

Call emptyData right before rdbLoad to prevent errors in the middle
and we drop the replication stream and leaving an empty database.
The real changes is in disk-based part, the rest is just code movement.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Ping Xie <[email protected]>
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…tion (valkey-io#926)

Call emptyData right before rdbLoad to prevent errors in the middle
and we drop the replication stream and leaving an empty database.
The real changes is in disk-based part, the rest is just code movement.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Ping Xie <[email protected]>
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…tion (valkey-io#926)

Call emptyData right before rdbLoad to prevent errors in the middle
and we drop the replication stream and leaving an empty database.
The real changes is in disk-based part, the rest is just code movement.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Ping Xie <[email protected]>
enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this pull request Sep 14, 2024
…tion (valkey-io#926)

Call emptyData right before rdbLoad to prevent errors in the middle
and we drop the replication stream and leaving an empty database.
The real changes is in disk-based part, the rest is just code movement.

Signed-off-by: Binbin <[email protected]>
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…tion (valkey-io#926)

Call emptyData right before rdbLoad to prevent errors in the middle
and we drop the replication stream and leaving an empty database.
The real changes is in disk-based part, the rest is just code movement.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Ping Xie <[email protected]>
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…tion (valkey-io#926)

Call emptyData right before rdbLoad to prevent errors in the middle
and we drop the replication stream and leaving an empty database.
The real changes is in disk-based part, the rest is just code movement.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Ping Xie <[email protected]>
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…tion (valkey-io#926)

Call emptyData right before rdbLoad to prevent errors in the middle
and we drop the replication stream and leaving an empty database.
The real changes is in disk-based part, the rest is just code movement.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Ping Xie <[email protected]>
PingXie pushed a commit that referenced this pull request Sep 15, 2024
…tion (#926)

Call emptyData right before rdbLoad to prevent errors in the middle
and we drop the replication stream and leaving an empty database.
The real changes is in disk-based part, the rest is just code movement.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Ping Xie <[email protected]>
PingXie pushed a commit to PingXie/valkey that referenced this pull request Oct 9, 2024
…tion (valkey-io#926)

Call emptyData right before rdbLoad to prevent errors in the middle
and we drop the replication stream and leaving an empty database.
The real changes is in disk-based part, the rest is just code movement.

Signed-off-by: Binbin <[email protected]>
naglera pushed a commit to naglera/placeholderkv that referenced this pull request Oct 10, 2024
…tion (valkey-io#926)

Call emptyData right before rdbLoad to prevent errors in the middle
and we drop the replication stream and leaving an empty database.
The real changes is in disk-based part, the rest is just code movement.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: naglera <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes This issue should get a line item in the release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants