Cherry-Pick Lock leaf partitions for Insert Statement when GDD disabled #971

Open: yjhjstz wants to merge 2 commits into base: main
@yjhjstz yjhjstz commented Feb 27, 2025

Fixes #930

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


kainwen and others added 2 commits February 28, 2025 00:46
Greenplum locks relations on the QD during the parse-analyze stage and
then dispatches the plan to the QEs for execution. Note that the dispatch
is asynchronous. For an insert statement on the root, we cannot predict
which leaf partitions will receive rows. Thus, if we never lock the leaf
partitions on the QD, the async dispatch means that some of the
session's QEs will execute the insert and hold locks, while others
might be blocked by other sessions. The problematic lock pattern in an
MPP database is: no lock is taken on the QD, so the lock is first
acquired on the QEs.

A global deadlock case is:

First, create a partitioned table

```sql
create table rank (id int, year int)
distributed randomly
partition by range (year)
(start (2006) end (2009) every (1));
```

Second, in a new session (named `sess1`)

```sql
set gp_vmem_idle_resource_timeout = 1000000; -- make sure idle QEs are not recycled
select pg_backend_pid(); -- find the QD's process id
select sess_id from pg_stat_activity where pid = pg_backend_pid(); -- find the session id
select * from gp_dist_random('gp_id'); -- create a gang
-- now run ps -ef | grep postgres | grep con<session id> to get the QEs' pids

-- Use GDB to attach to one of this session's QEs
-- In GDB, set a breakpoint at exec_mpp_query
-- (gdb) b exec_mpp_query
-- then continue
-- (gdb) c

-- now insert into the partitioned table directly through the root
insert into rank select i,i%3+2006 from generate_series(1, 1000)i;
-- the above insert will hang, because we set a breakpoint on one of its QEs
```

Third, start a new session (named `sess2`)

```sql
truncate rank_1_prt_2; -- truncate a leaf
-- the above truncate will not be blocked on the QD, because sess1's insert does not hold locks on the leaves on the QD
-- so the truncate will be dispatched to the segments
-- on the segment where sess1's QE is stopped at the breakpoint, the truncate can execute and hold the lock on the
-- leaf relation; however, on the other segments, the truncate will be blocked by the leaf locks held by the insert
-- it will hang now
```

Finally, quit the GDB session attached to sess1's QE. That QE will then continue, try to lock the leaf, and be
blocked by the truncate.

Thus a deadlock happens:
* on one segment, sess1 waits for sess2 (the insert waits for the truncate)
* on the other segments, sess2 waits for sess1 (the truncate waits for the insert)
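
The two bullets above can be read as a wait-for graph. The following is a minimal Python sketch, not Greenplum code, of why no single segment can see this deadlock while the combined graph contains a cycle. The segment labels (`seg0`, `seg1`, `seg2`) and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical wait-for edges: (segment, waiter, holder).
# seg0 stands for the segment whose QE was stopped in GDB.
local_waits = [
    ("seg0", "sess1", "sess2"),  # insert waits for truncate
    ("seg1", "sess2", "sess1"),  # truncate waits for insert
    ("seg2", "sess2", "sess1"),  # truncate waits for insert
]


def has_cycle(edges):
    """Return True if the directed graph of (waiter, holder) pairs has a cycle."""
    graph = defaultdict(set)
    for waiter, holder in edges:
        graph[waiter].add(holder)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: we returned to a node on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


def per_segment_cycles(waits):
    """Check each segment's local wait-for graph on its own."""
    by_segment = defaultdict(list)
    for segment, waiter, holder in waits:
        by_segment[segment].append((waiter, holder))
    return {segment: has_cycle(edges) for segment, edges in by_segment.items()}


def global_cycle(waits):
    """Check the union of all segments' wait-for edges."""
    return has_cycle([(waiter, holder) for _, waiter, holder in waits])


print(per_segment_cycles(local_waits))
print(global_cycle(local_waits))
```

In this toy model every per-segment check reports no cycle, so a purely local deadlock detector has nothing to break; only the merged graph shows sess1 and sess2 waiting on each other.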

See Issue https://github.com/greenplum-db/gpdb/issues/13652 for
details.

This commit fixes the issue by locking all leaf partitions on the QD
during the parse-analyze stage when GDD is disabled. If GDD is enabled,
the global deadlock detector can break the deadlock.
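
As a rough illustration of the rule described in this commit message, here is a toy Python model (not the actual patch) of which relations the QD locks before dispatching an insert on the root. The function names are hypothetical. The leaf names other than `rank_1_prt_2` are assumed from the naming pattern in the repro. The model relies on one fact from PostgreSQL's lock table: TRUNCATE takes ACCESS EXCLUSIVE, which conflicts with every other lock mode, so any lock the insert holds on a leaf makes the truncate wait.

```python
# Leaf names are assumed from the rank_1_prt_2 pattern shown in the repro.
LEAVES = ["rank_1_prt_1", "rank_1_prt_2", "rank_1_prt_3"]


def qd_locks_for_insert(root, leaves, gdd_enabled):
    """Relations the toy QD locks before dispatching an insert on `root`."""
    locked = {root}
    if not gdd_enabled:
        # No global deadlock detector: lock every leaf up front on the QD
        # so that conflicting statements queue there, in one place.
        locked.update(leaves)
    return locked


def truncate_waits_on_qd(target, locks_held_by_other_sessions):
    """ACCESS EXCLUSIVE conflicts with any existing lock on the same relation."""
    return target in locks_held_by_other_sessions


held = qd_locks_for_insert("rank", LEAVES, gdd_enabled=False)
print(truncate_waits_on_qd("rank_1_prt_2", held))
```

With `gdd_enabled=False` the truncate of the leaf waits on the QD and is never dispatched while the insert is in flight, so the per-segment lock ordering from the repro cannot arise. With `gdd_enabled=True` the toy QD locks only the root, matching the commit message's statement that GDD is then responsible for breaking the deadlock.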

fix lockmode

fix partition_locking
The injected fault 'func_init_plan_end' is set to trigger only once, and
the dtx recovery process may trigger it before the insert statement
does. In that case the insert statement is not blocked, which causes the
pipeline to fail.

Inject the fault 'func_init_plan_end' in the same session as the insert, and
specify the current session id, so that the fault cannot be triggered by other
sessions.

The test insert_root_partition_truncate_deadlock with GDD has the same problem.
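
The race described above can be sketched with a toy one-shot fault in Python. This is only a model of the behaviour the commit message describes, not the fault injector's real interface. The class, its method, and the session ids (0 for the dtx recovery process, 42 for the insert) are all hypothetical.

```python
class OneShotFault:
    """Toy fault that fires once, optionally only for one session id."""

    def __init__(self, name, session_id=None):
        self.name = name
        self.session_id = session_id  # None means any session may trigger it
        self.remaining = 1

    def hit(self, name, session_id):
        """Return True if the fault fires for this caller."""
        if name != self.name or self.remaining == 0:
            return False
        if self.session_id is not None and session_id != self.session_id:
            return False
        self.remaining -= 1
        return True


DTX_RECOVERY, INSERT_SESSION = 0, 42  # hypothetical session ids

# Unscoped fault: whoever reaches the fault point first consumes it.
unscoped = OneShotFault("func_init_plan_end")
stolen = unscoped.hit("func_init_plan_end", DTX_RECOVERY)
insert_blocked_unscoped = unscoped.hit("func_init_plan_end", INSERT_SESSION)

# Scoped fault: only the insert's session can trigger it.
scoped = OneShotFault("func_init_plan_end", session_id=INSERT_SESSION)
ignored = scoped.hit("func_init_plan_end", DTX_RECOVERY)
insert_blocked_scoped = scoped.hit("func_init_plan_end", INSERT_SESSION)

print(stolen, insert_blocked_unscoped, ignored, insert_blocked_scoped)
```

In the unscoped case the background process uses up the single trigger and the insert runs straight through, which is the flaky outcome. Scoping the fault to the insert's session makes the test deterministic regardless of which process reaches the fault point first.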
@yjhjstz yjhjstz marked this pull request as ready for review February 28, 2025 03:30
Successfully merging this pull request may close these issues.

[Cherry-pick] Lock leaf partitions for Insert Statement when GDD disabled.