Made initial table synchronization parallel. #198
+799 −78
Made initial table synchronization parallel.
(at pglogical_apply.c:1802 in process_syncing_tables())
If no worker slots are available to launch a bgworker, the process just emits a WARNING and retries after a short wait, rather than raising an ERROR and exiting.
(at pglogical_worker.c:142 and line 207 in pglogical_worker_register())
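A minimal sketch of this retry-friendly registration, assuming a hypothetical helper name try_register_sync_worker(); RegisterDynamicBackgroundWorker() and ereport() are the real PostgreSQL APIs, the rest is illustrative:

```c
/* Sketch: report failure instead of erroring out when no bgworker
 * slot is free; the apply worker simply retries on its next pass. */
static bool
try_register_sync_worker(BackgroundWorker *bgw)
{
	BackgroundWorkerHandle *handle;

	if (!RegisterDynamicBackgroundWorker(bgw, &handle))
	{
		ereport(WARNING,
				(errmsg("out of background worker slots"),
				 errhint("You might need to increase max_worker_processes.")));
		return false;			/* caller retries after a short wait */
	}

	return true;
}
```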
Up to max_sync_workers_per_subscription sync workers are created per subscription, so up to that many tables are synced in parallel.
(at pglogical_worker.c:124 in pglogical_worker_register())
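A sketch of how the per-subscription cap might be enforced before launching another worker; the shared-memory layout and field access here are simplified assumptions, not pglogical's exact structures:

```c
/* Illustrative only: the real PGLogicalWorker struct nests the
 * subscription id deeper, and the registry layout differs. */
static bool
can_start_sync_worker(Oid subid)
{
	int		i;
	int		nsync = 0;

	LWLockAcquire(PGLogicalCtx->lock, LW_SHARED);
	for (i = 0; i < PGLogicalCtx->total_workers; i++)
	{
		PGLogicalWorker *w = &PGLogicalCtx->workers[i];

		if (w->worker_type == PGLOGICAL_WORKER_SYNC &&
			w->subid == subid)	/* simplified field access */
			nsync++;
	}
	LWLockRelease(PGLogicalCtx->lock);

	return nsync < max_sync_workers_per_subscription;
}
```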
During initial table data copying, we used to open one more backend connection to the target database, put that backend into COPY FROM stdin mode, and have the bgworker route the CopyData stream from the publisher's "COPY TO" output to this backend. That extra backend connection code has now been removed; instead, the sync worker writes directly into the underlying target database. The sync worker also runs in replication mode, so FK constraint triggers are not fired, which allows us to sync the tables in any arbitrary order.
(at pglogical_sync.c:516 in copy_table_data())
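The direct-write path can be pictured as below: the sync worker pulls the publisher's "COPY ... TO stdout" stream over libpq and feeds it into the local server-side COPY machinery through a data-source callback, so no second local connection is needed. This is a simplified reconstruction, not the PR's code, and BeginCopyFrom()'s exact signature varies across PostgreSQL versions:

```c
static PGconn	 *publisher_conn;	/* "COPY ... TO stdout" already running */
static StringInfo copybuf;			/* leftover bytes from the last chunk */

/* Data-source callback handed to the local COPY FROM machinery. */
static int
copy_read_data(void *outbuf, int minread, int maxread)
{
	int		avail = copybuf->len - copybuf->cursor;

	/* Refill from libpq once the local buffer is drained. */
	if (avail == 0)
	{
		char   *data;
		int		len = PQgetCopyData(publisher_conn, &data, false);

		if (len == -1)
			return 0;			/* publisher finished the COPY stream */
		if (len < 0)
			ereport(ERROR,
					(errmsg("could not receive COPY data: %s",
							PQerrorMessage(publisher_conn))));

		resetStringInfo(copybuf);
		appendBinaryStringInfo(copybuf, data, len);
		PQfreemem(data);
		avail = len;
	}

	avail = Min(avail, maxread);
	memcpy(outbuf, copybuf->data + copybuf->cursor, avail);
	copybuf->cursor += avail;
	return avail;
}
```

The worker then drives the copy itself via BeginCopyFrom(..., copy_read_data, ...) and CopyFrom() inside its own transaction, instead of proxying CopyData messages to a second backend.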
The subscription init code now just fetches the list of tables from the subscribed replication_set, adds each table's metadata to the local catalog, and marks it as being in the INIT state. The apply process then spawns sync workers for the non-READY tables. In contrast, the old subscription init process fetched the list of tables and copied them sequentially.
(at pglogical_sync.c:622 in copy_replication_sets_data() and pglogical_apply.c:1792 in process_syncing_tables())
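A sketch of the new division of labor; set_table_sync_status(), get_unsynced_tables(), and start_sync_worker() are illustrative helper names, not pglogical's actual API:

```c
/* 1. Subscription init: record the tables, copy nothing yet. */
static void
init_subscription_tables(PGLogicalSubscription *sub, List *tables)
{
	ListCell   *lc;

	foreach(lc, tables)
		set_table_sync_status(sub->id, (RangeVar *) lfirst(lc),
							  SYNC_STATUS_INIT);
}

/* 2. Apply worker, run periodically from process_syncing_tables():
 *    spawn a sync worker for every table that is not READY yet,
 *    respecting the per-subscription cap (see can_start_sync_worker()
 *    above). */
static void
launch_pending_sync_workers(PGLogicalSubscription *sub)
{
	ListCell   *lc;

	foreach(lc, get_unsynced_tables(sub->id))
	{
		if (!can_start_sync_worker(sub->id))
			break;				/* cap reached; retry next cycle */
		start_sync_worker(sub, (RangeVar *) lfirst(lc));
	}
}
```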
In the previous code, a sync worker in the CATCHUP state did not exit even once it had replayed up to the required LSN; it exited only after receiving at least one further logical change (WAL) from the publisher, because the exit check lived only in the handle_commit() function. The same code (slightly modified) from handle_commit() has now been copied into process_syncing_tables() as well. Since process_syncing_tables() is called periodically, the sync worker now exits as soon as it has caught up with the apply process.
(at pglogical_apply.c:1808 in process_syncing_tables())
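A sketch of the periodic exit check, with hypothetical helpers (am_sync_worker(), get_sync_status(), sync_required_lsn(), my_sync_table()) standing in for the real state accessors:

```c
/* Called periodically from the sync worker's apply loop, so the worker
 * can exit without waiting for another change from the publisher. */
static void
maybe_exit_sync_worker(Oid subid, XLogRecPtr replayed_lsn)
{
	if (!am_sync_worker() ||
		get_sync_status() != SYNC_STATUS_CATCHUP)
		return;

	/* The same test handle_commit() performs: caught up with apply? */
	if (replayed_lsn >= sync_required_lsn())
	{
		set_table_sync_status(subid, my_sync_table(), SYNC_STATUS_READY);
		ereport(LOG,
				(errmsg("table sync caught up with apply; exiting")));
		proc_exit(0);
	}
}
```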