Replica flush the old data after RDB file is ok in disk-based replication #926

enjoy-binbin · 2024-08-20T10:22:10Z

Call emptyData right before rdbLoad, so that an error in between
cannot make us drop the replication stream and leave an empty database.
The real change is in the disk-based part; the rest is just code movement.


codecov · 2024-08-20T10:35:51Z

Codecov Report

Attention: Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.

Project coverage is 70.63%. Comparing base (7795152) to head (f4316a0).
Report is 80 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/replication.c	57.14%	3 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable     #926      +/-   ##
============================================
+ Coverage     70.37%   70.63%   +0.25%     
============================================
  Files           112      114       +2     
  Lines         61505    61670     +165     
============================================
+ Hits          43286    43559     +273     
+ Misses        18219    18111     -108

Files with missing lines	Coverage Δ
src/replication.c	`87.29% <57.14%> (+0.14%)`	⬆️

... and 41 files with indirect coverage changes


ranshid · 2024-08-20T12:39:43Z

@enjoy-binbin did we have a test failure here, or should we introduce a new test for this case?

zuiderkwast

Looks good to me.

It's not a breaking change?

The behavior change is that if the sync fails, the old data is still there, instead of deleted. It seems better. Nobody relies on the old behaviour I think...?

enjoy-binbin · 2024-08-20T15:28:18Z

> did we have a test failure here, or should we introduce a new test for this case?

Hmm, I don't think we had a test failure here. We don't have a test that covers this case; I tested it locally. We should add a new test to verify this, but that will require adding some debug code. (Or I can try to make the rename fail; I can give it a try.)

enjoy-binbin · 2024-08-20T15:31:33Z

> It's not a breaking change?
>
> The behavior change is that if the sync fails, the old data is still there, instead of deleted. It seems better. Nobody relies on the old behaviour I think...?

I don't think it is a breaking change (though arguably it could be?). Yes, that is the behavior change: if any of the following steps fails, we now still keep the old data in memory.

        /* Make sure the new file (also used for persistence) is fully synced
         * (not covered by earlier calls to rdb_fsync_range). */
        if (fsync(server.repl_transfer_fd) == -1) {
            ...
            cancelReplicationHandshake(1);
            return;
        }

        /* Rename rdb like renaming rewrite aof asynchronously. */
        int old_rdb_fd = open(server.rdb_filename, O_RDONLY | O_NONBLOCK);
        if (rename(server.repl_transfer_tmpfile, server.rdb_filename) == -1) {
            ...
            cancelReplicationHandshake(1);
            if (old_rdb_fd != -1) close(old_rdb_fd);
            return;
        }
        /* Close old rdb asynchronously. */
        if (old_rdb_fd != -1) bioCreateCloseJob(old_rdb_fd, 0, 0);

        /* Sync the directory to ensure rename is persisted */
        if (fsyncFileDir(server.rdb_filename) == -1) {
            ...
            cancelReplicationHandshake(1);
            return;
        }
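The effect of the reordering can be sketched with a toy model. This is only an illustration under stated assumptions, not the code in src/replication.c: `toy_db`, `prepare_result`, `sync_flush_first` and `sync_flush_last` are hypothetical names, and `prep` stands in for the combined outcome of the fsync, rename and directory fsync steps quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model; none of these names exist in the Valkey source. */
typedef struct {
    char data[64]; /* stands in for the replica's in-memory dataset */
} toy_db;

/* Combined outcome of the steps that may fail before loading
 * (fsync of the transfer file, rename, fsync of the directory). */
typedef enum { PREPARE_OK = 0, PREPARE_FAIL = -1 } prepare_result;

/* Old ordering: flush first, then run the steps that may fail. */
int sync_flush_first(toy_db *db, prepare_result prep, const char *payload) {
    assert(db != NULL && payload != NULL);
    db->data[0] = '\0';                /* old data is already gone */
    if (prep != PREPARE_OK) return -1; /* abort: database is left empty */
    snprintf(db->data, sizeof(db->data), "%s", payload);
    return 0;
}

/* New ordering: run the steps that may fail, flush right before loading. */
int sync_flush_last(toy_db *db, prepare_result prep, const char *payload) {
    assert(db != NULL && payload != NULL);
    if (prep != PREPARE_OK) return -1; /* abort: old data is untouched */
    db->data[0] = '\0';                /* flush only once the file is ready */
    snprintf(db->data, sizeof(db->data), "%s", payload);
    return 0;
}
```

With a failing `prep`, `sync_flush_first` leaves the toy database empty while `sync_flush_last` leaves the old contents in place, which is the behavior difference discussed in this thread.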

zuiderkwast · 2024-08-20T15:38:37Z

@valkey-io/core-team Do you think this is a breaking change?


hwware · 2024-08-22T14:52:27Z

I do not think it is a breaking change. It does change where some functions are called, though, and I cannot find the reason why we should do it this way.


enjoy-binbin · 2024-09-13T02:46:15Z

I am going to merge this one. I don't think it is a breaking change. Do any of you have other concerns?



Replica flush the old data after RDB file is ok in disk-based replica…

931f6e6


enjoy-binbin requested a review from zuiderkwast August 20, 2024 10:22

enjoy-binbin added 2 commits August 20, 2024 19:22

more logs

b506dfa

Signed-off-by: Binbin <[email protected]>

fix format

d5423f6

Signed-off-by: Binbin <[email protected]>

zuiderkwast approved these changes Aug 20, 2024


add a test

23d1b8d

Signed-off-by: Binbin <[email protected]>

hwware reviewed Aug 22, 2024


src/replication.c (resolved)

enjoy-binbin added the release-notes label (This issue should get a line item in the release notes) Sep 4, 2024

enjoy-binbin commented Sep 14, 2024


tests/integration/replication.tcl (outdated, resolved)

Update tests/integration/replication.tcl

f4316a0

Signed-off-by: Binbin <[email protected]>

enjoy-binbin merged commit 1739038 into valkey-io:unstable Sep 14, 2024
44 checks passed

enjoy-binbin deleted the flush_db_after_rdb_ok branch September 14, 2024 03:49
