Develop 3.0 -> 3.1 merge #9

Open
wants to merge 314 commits into
base: develop-3.1
314 commits
cafb81b
Change API to return pause and to configure pause length for test
ThomasArts Jun 4, 2019
20ff0e6
Simplify bucketprefix by recomputing the length when needed
ThomasArts Jun 4, 2019
d349985
Update model
ThomasArts Jun 4, 2019
bd77b56
No warnings
ThomasArts Jun 4, 2019
1befc45
Fix tests to pause return
ThomasArts Jun 4, 2019
9096b54
Send only to oneself, such that if crashes inbetween message does not…
ThomasArts Jun 4, 2019
77a0d0b
Important to remember always to call done_work after fetching
ThomasArts Jun 4, 2019
97aed9a
Add special mocking for eqc
ThomasArts Jun 4, 2019
4191fc8
Initial snk model
ThomasArts Jun 4, 2019
d2144d6
We need a little sleep to force a context switch
ThomasArts Jun 4, 2019
64809c6
Add feature for queue def recording
ThomasArts Jun 10, 2019
a687d16
Seems API failure, found by code inspection
ThomasArts Jun 10, 2019
e2d3a2e
Merge pull request #30 from Quviq/mas-i1691-ttaaefullsync
martinsumner Jun 10, 2019
d2182ae
Add DBid support to leveled backend
martinsumner Jun 15, 2019
95f733b
Some cleanup of replrtq_src model
UlfNorell Jun 10, 2019
21ebeb2
whitespace
UlfNorell Jun 10, 2019
0fe1c0f
Remove EQC ifdefs from replrtq_snk
UlfNorell Jun 10, 2019
86092cb
Change replrtq_snk test to check worker saturation
UlfNorell Jun 10, 2019
6da2b47
BUGFIX: fix crash when more workers than work
UlfNorell Jun 10, 2019
e2bd4f6
BUGFIX: reschedule work immediately on done_work
UlfNorell Jun 10, 2019
2461585
Handle multiple queues in replrtq_snk model
UlfNorell Jun 11, 2019
7baa2c9
Model suspend and resume
UlfNorell Jun 14, 2019
ddcaaa0
Clean up model setup
UlfNorell Jun 14, 2019
0dc705c
Refactor monitor to keep track of peers and workers and allow reusing…
UlfNorell Jun 14, 2019
765f1a9
Model adding queues
UlfNorell Jun 14, 2019
47202fc
Model removing queues
UlfNorell Jun 14, 2019
56761a1
Fix bug in model
UlfNorell Jun 14, 2019
ce68bff
Work around bug when adding a removed queue with different peers
UlfNorell Jun 14, 2019
3881627
More slack after suspend/remove
UlfNorell Jun 14, 2019
3683a31
Add commented options
martinsumner Jun 19, 2019
f27b7f0
Merge pull request #32 from Quviq/mas-i1691-ttaaefullsync
martinsumner Jun 20, 2019
dd00040
simplify rpwl sink configuration
martinsumner Jun 21, 2019
57eb2cb
Fix config typos
martinsumner Jun 25, 2019
5831387
Experiment with a more patient consume
martinsumner Jun 26, 2019
c0f9163
Sleep ms not microsec
martinsumner Jun 26, 2019
a6a4433
Add log on completion of full-sync
martinsumner Jun 26, 2019
d80ea96
Randomise delays for work thrown back
martinsumner Jun 26, 2019
f7f3e53
Experiment
martinsumner Jun 26, 2019
e7a9f37
Experiment pt 2
martinsumner Jun 26, 2019
81bc0ad
Increase pause when exchange
martinsumner Jun 26, 2019
b323f1e
Require repair only of dominated objects
martinsumner Jun 26, 2019
5e8b8a4
Try alternate configuration
martinsumner Jun 27, 2019
a8e36dd
Switch to alternate n compare based kc_index_tictactree
martinsumner Jun 28, 2019
73401e0
Allow for logging of repairs
martinsumner Jun 28, 2019
a4aa1ae
GET_FSM with expected clock
martinsumner Jul 3, 2019
0296518
Reoslve handling of siblings
martinsumner Jul 4, 2019
364f626
Change repl push to w=1
martinsumner Jul 4, 2019
fae7e00
Merge branch 'develop-2.9' into mas-i1691-ttaaefullsync
martinsumner Aug 6, 2019
e7125d8
Update rebar.config
martinsumner Aug 8, 2019
997e30b
Typo tidy-up
martinsumner Aug 13, 2019
d064f6d
Merge pull request #39 from basho/develop-2.9
martinsumner Sep 3, 2019
6355454
Merge branch 'mas-i1691-ttaaefullsync' into develop-2.9
martinsumner Sep 3, 2019
77313b4
Merge pull request #40 from martinsumner/develop-2.9
martinsumner Sep 3, 2019
9fff5d4
Add support for PB fetch from sink
martinsumner Sep 5, 2019
7476d99
Further amendments to support pb repl
martinsumner Sep 5, 2019
cddf3e7
Attempt to reuse PB socket
martinsumner Sep 5, 2019
3ceeb19
Revert "Attempt to reuse PB socket"
martinsumner Sep 5, 2019
8c733c9
Allow for reuse of PB client
martinsumner Sep 6, 2019
d1fa6d7
Merge branch 'mas-i1691-ttaaefullsync' into develop-2.9
martinsumner Nov 6, 2019
f5daaa4
Merge pull request #42 from basho/develop-2.9
martinsumner Nov 6, 2019
042dce7
Revert "Merge branch 'mas-i1691-ttaaefullsync' into develop-2.9"
martinsumner Nov 6, 2019
e77fc61
Reoslve merge issue
martinsumner Nov 6, 2019
694110f
Refactor for dialyzer
martinsumner Nov 6, 2019
48e085b
Update riak_kv_replrtq_snk.erl
martinsumner Nov 6, 2019
7f0eca0
Update rebar.config
martinsumner Nov 7, 2019
53cd307
Initial PB API for aae fold
martinsumner Nov 8, 2019
04df2f5
Update for message changes
martinsumner Nov 11, 2019
74b3ae2
Change to more generic pb api
martinsumner Nov 12, 2019
b7cf019
keycountresp requires type
martinsumner Nov 12, 2019
b128e99
Switch to response_type
martinsumner Nov 12, 2019
645de9d
Undecode branches for consistency
martinsumner Nov 13, 2019
0629233
Typos and format corrections
martinsumner Nov 13, 2019
b339a28
Merge pull request #1717 from basho/mas-i1714-readonlyfs
martinsumner Nov 18, 2019
a292d17
Add pb client support for full-sync
martinsumner Nov 18, 2019
f3940cd
Merge pull request #1732 from basho/mas-i1676-aaefoldpb
martinsumner Nov 18, 2019
e772239
Switch api branch
martinsumner Nov 18, 2019
fac5cea
Update rebar.config
martinsumner Nov 18, 2019
22e16bd
Merge remote-tracking branch 'upstream/develop-2.9' into mas-i1691-tt…
martinsumner Nov 18, 2019
4e228c7
Merged develop-2.9 and mas-1691-ttaaefullsync
martinsumner Nov 18, 2019
f9eeaa7
Merge pull request #1733 from martinsumner/mas-i1691-ttaaefullsync
martinsumner Nov 18, 2019
305ffc6
Switch to basho fork
martinsumner Nov 18, 2019
cee82b3
Bump tags for release
martinsumner Nov 20, 2019
1091a17
Correct tag reference for release
martinsumner Nov 20, 2019
c10eb71
Update rebar.config
martinsumner Nov 20, 2019
5281c68
Update rebar.config
martinsumner Nov 20, 2019
ca97e1f
Add range_repl
martinsumner Nov 26, 2019
c018280
Add reap request
martinsumner Nov 29, 2019
15fb685
Add riak_client API to single object reap
martinsumner Dec 2, 2019
52d896d
Add find_tombs to aae_fold
martinsumner Dec 3, 2019
2f543cb
Add reap_tombs fold
martinsumner Dec 3, 2019
bca4179
riak_client and http/pb api need consistent results
martinsumner Dec 4, 2019
bad3af7
Add reap fold that just counts
martinsumner Dec 4, 2019
6dd69d2
Addition of erase folds
martinsumner Dec 6, 2019
b6259bf
Tidy up errors on client failure
martinsumner Dec 6, 2019
36ee389
Extend PB API for aae_fold
martinsumner Dec 9, 2019
d9309dd
Add missing response handler
martinsumner Dec 9, 2019
3638239
Increase range to include new aae_fold requests
martinsumner Dec 9, 2019
31dc84e
Extend HTTP API for aae_fold
martinsumner Dec 9, 2019
148bc1d
Add aee list_bucket fold
martinsumner Dec 10, 2019
e6f0178
Extend and correct aae list_buckets API
martinsumner Dec 11, 2019
0513c9f
Corrections to HTTP API - list buckets (aae_fold)
martinsumner Dec 11, 2019
03a10e6
Include option for security on nextgen_repl
martinsumner Dec 12, 2019
9c234cb
Fix the spec for init_client
martinsumner Dec 16, 2019
47f4d9c
Merge pull request #1738 from basho/mas-i1691-ssl
martinsumner Jan 9, 2020
0c99b08
Remove repl_cache
martinsumner Jan 9, 2020
20d6780
Change sink worker to allow for multiple queues in config
martinsumner Jan 10, 2020
e6297b9
Adapt to new peers format
ThomasArts Dec 20, 2019
4d544ed
Adapt to queue part of config format
ThomasArts Jan 12, 2020
04f5779
Reduce number workers to max 5
ThomasArts Jan 12, 2020
9bf8207
Push objects to queue
martinsumner Jan 12, 2020
37a50d8
Merge pull request #1740 from Quviq/mas-i1691-snkworker
martinsumner Jan 13, 2020
c747955
Add explanation to set_workercount/2
martinsumner Jan 13, 2020
27e1222
Update comment on tokenised string
martinsumner Jan 13, 2020
9b52077
Merge pull request #1739 from basho/mas-i1691-snkworker
martinsumner Jan 13, 2020
d91d5a0
Add size limit on objects on nextgenrepl queue
martinsumner Jan 13, 2020
cc20cb9
Merge branch 'mas-i1691-ttaaefullsync' into mas-i1691-pushobject
martinsumner Jan 13, 2020
e9005b9
Change schema to have nextgenrepl disabled by default
martinsumner Jan 13, 2020
62b72b1
Tidy-up
martinsumner Jan 14, 2020
b234b99
Add stats to the fetch and push repl process
martinsumner Jan 15, 2020
adf78e1
Enforce priority to be 1 or 2
martinsumner Jan 15, 2020
74d129f
Ping the aae_controller
martinsumner Jan 16, 2020
d8d0854
Add notion of changeable worker size
ThomasArts Jan 16, 2020
4c6220b
Merge pull request #1742 from Quviq/mas-i1691-pushobject
martinsumner Jan 17, 2020
eba6045
Merge pull request #1741 from basho/mas-i1691-pushobject
martinsumner Jan 17, 2020
1c9105d
Merge branch 'mas-i1691-ttaaefullsync' into mas-i1691-pauseonbacklog
martinsumner Jan 17, 2020
9d02654
Stop client after sync
martinsumner Jan 18, 2020
c1d06c0
Stop client .. after it has been used
martinsumner Jan 18, 2020
dce6faf
Need to prompt_work
martinsumner Jan 20, 2020
a0a6e62
Resolve consistency of client count
martinsumner Jan 20, 2020
e27e786
Per-peer limit worker count limit
martinsumner Jan 22, 2020
aee5e21
Close clients if no longer in use
martinsumner Jan 22, 2020
48e543c
Update NextGenREPL.md
martinsumner Jan 23, 2020
77454bd
Doc update
martinsumner Jan 23, 2020
cfa5d21
Enforce more clearly requirement for immediate timeout
martinsumner Jan 29, 2020
67e627b
Avoid ducplicating/mixing logs on non-normal close
martinsumner Jan 29, 2020
d5df071
Remove priority from reap/erase API
martinsumner Jan 29, 2020
286ec0d
Adapt sink model to have peer limit and restrict to one initial queue
ThomasArts Jan 29, 2020
9df75d8
Model erroring connections in sink model
UlfNorell Jan 29, 2020
f62fa99
Make redo_timeout configurable
ThomasArts Jan 29, 2020
3344ce4
Add eraser_eqc
hanssv Jan 29, 2020
39f83cb
Fix Thomas' typos
hanssv Jan 29, 2020
5b5bb0b
Add reaper_eqc
hanssv Jan 29, 2020
7a781b9
Merge pull request #1744 from Quviq/mas-i1691-pauseonbacklog
martinsumner Jan 29, 2020
9143e48
Merge pull request #1743 from basho/mas-i1691-pauseonbacklog
martinsumner Jan 29, 2020
55e5dd1
Typo
martinsumner Jan 29, 2020
889c953
Changes following review
martinsumner Jan 30, 2020
0da2790
Merge pull request #1745 from basho/mas-i1691-releasereview
martinsumner Jan 30, 2020
790076e
Schema Review
martinsumner Jan 31, 2020
852d246
Merge pull request #1746 from basho/mas-i1691-schemaview
martinsumner Jan 31, 2020
03d37d0
Add on-wire zlib compression per object
martinsumner Jan 31, 2020
b23e9a3
Set repl_compress in state not in process
martinsumner Feb 2, 2020
f556593
Merge pull request #1747 from basho/mas-i1691-compressoption
martinsumner Feb 2, 2020
60de4f8
Remove log
martinsumner Feb 4, 2020
6c955a5
Log not crash on remote connection/timeout failure
martinsumner Feb 4, 2020
5260529
Use exported function types
martinsumner Feb 5, 2020
59201a0
Merge pull request #1748 from basho/mas-i1691-logonremotefail
martinsumner Feb 5, 2020
080bdb8
Update NextGenREPL.md
martinsumner Feb 10, 2020
ba40aaa
Create ReapErase.md
martinsumner Feb 11, 2020
5a4d7b5
Switch to develop-2.9 branches
martinsumner Feb 11, 2020
a698341
Merge branch 'develop-2.9' into mas-i1691-ttaaefullsync
martinsumner Feb 11, 2020
d1de9ac
Add travis CI to 2.9
martinsumner Feb 11, 2020
9426d6b
Merge branch 'mas-i1691-ttaaefullsync' of https://github.com/basho/ri…
martinsumner Feb 11, 2020
ce6ec15
Merge pull request #1749 from basho/mas-i1691-ttaaefullsync
martinsumner Feb 11, 2020
0537645
Switch to fixed riak_dt
martinsumner Feb 11, 2020
9e9511e
2.9.1 Release - tagged rebar.config
martinsumner Feb 13, 2020
9127669
Missed riak_pipe tag
martinsumner Feb 13, 2020
aace510
Add GET to handling of node_confirms
martinsumner Feb 18, 2020
11ad7c2
Update API to allow node_confirms per request
martinsumner Feb 19, 2020
3288532
Switch back to develop-2.9
martinsumner Feb 25, 2020
6e8d145
Merge pull request #1751 from basho/mas-i1750-nodeconfirms
martinsumner Feb 25, 2020
38a9046
Add support for recalc reload strategy in leveled
martinsumner Mar 16, 2020
756ce01
Ensure directory present for lock file
martinsumner Mar 17, 2020
df5220e
Switch leveld branch post-merge
martinsumner Mar 30, 2020
982edb1
Merge pull request #1753 from basho/mas-i306-compactrecalc
martinsumner Mar 30, 2020
7fd5297
Update rebar.config
martinsumner Apr 7, 2020
df9a9a6
Switch to 292 merged branches
martinsumner Apr 9, 2020
6f82526
Merge branch 'develop-2.9' into develop-3.0-292
martinsumner Apr 9, 2020
08495da
Resolve dialyzer issues
martinsumner Apr 9, 2020
b2de7d5
reenable warnings as errors
martinsumner Apr 9, 2020
7883c11
Use correct travis script for OTP20 +
martinsumner Apr 9, 2020
328f2b9
Stop use of riak_client not as M:F
martinsumner Apr 10, 2020
e03461f
Avoid potential divide by 0
martinsumner Apr 10, 2020
dddbdd9
Further riak_client changes
martinsumner Apr 10, 2020
82eb70d
Fix spurious lost put coordinator acks
keynslug Apr 12, 2020
29febea
Tidy-up checks
martinsumner Apr 14, 2020
1b32ec4
Expect forward-ack from any node not only from chosen
keynslug Apr 25, 2020
f2e22d9
Resolve issues with eqc tests from 2.9.2
martinsumner Apr 28, 2020
e348703
Stop mocking when exiting eqc tests
martinsumner Apr 28, 2020
3cb90ee
Merge remote-tracking branch 'keynslug/fix/[email protected]' i…
martinsumner Apr 28, 2020
5786564
Remove gen_fsm_compat
martinsumner Apr 28, 2020
b3a9a66
standardise versions
martinsumner May 7, 2020
2c5e843
Merge pull request #1756 from basho/develop-3.0-292
martinsumner May 7, 2020
a84936a
Allow the mbox_check to be disabled via environment, rather than just…
May 10, 2020
f69f7c8
Use the capability as before to ensure it's safe to be enabled.
May 10, 2020
7833f0c
Merge pull request #1757 from basho/patch/allow-mboxcheck-disabling
martinsumner May 29, 2020
742713e
Force clearing of Tictac AAE on vnode delete
martinsumner Jun 9, 2020
b4cb054
Merge pull request #1760 from basho/mas-i1759-clearaae
martinsumner Jun 12, 2020
05ac0b9
Merge branch 'mas-i1759-clearaae' into develop-3.0-29update
martinsumner Jun 12, 2020
d7800fd
Add cs_bucket support
martinsumner Jun 12, 2020
07f2557
Handle object_key_in_range results other than true
martinsumner Jun 15, 2020
c27696a
Merge branch 'mas-i1758-objectoutofrange' into develop-3.0-29update
martinsumner Jun 15, 2020
4731882
Expand comment
martinsumner Jun 15, 2020
67b8483
Clarify comment
martinsumner Jun 15, 2020
17302e5
Merge pull request #1761 from basho/mas-i1758-objectoutofrange
martinsumner Jun 15, 2020
09061c4
Merge branch 'mas-i1758-objectoutofrange' into develop-3.0-29update
martinsumner Jun 15, 2020
db1b2de
Merge pull request #1755 from keynslug/fix/[email protected]
martinsumner Jun 15, 2020
614d9f4
Disable soft-limit check
martinsumner Jun 15, 2020
b825588
Missing .
martinsumner Jun 15, 2020
24ec5d2
Merge pull request #1763 from basho/mas-pr1757-softlimitdisable
martinsumner Jun 17, 2020
f2117df
Merge branch 'mas-pr1757-softlimitdisable' into develop-3.0-29update
martinsumner Jun 17, 2020
669c648
Trigger Travis CI build
martinsumner Jun 17, 2020
56528b7
Merge pull request #1762 from basho/develop-3.0-29update
martinsumner Jun 19, 2020
53ce67c
Update Tictac AAE tag
martinsumner Jun 22, 2020
ddf7359
Remove JS config from schema
martinsumner Jul 1, 2020
ccd1e36
Merge pull request #1767 from basho/develop-3.0-removejs
martinsumner Jul 1, 2020
8314392
Bump kv_index_tictactree
martinsumner Jul 3, 2020
b005659
Add exchange skipping to tictacaae
martinsumner Jul 7, 2020
4ad1d05
Correct spacing in log
martinsumner Jul 7, 2020
acc9663
Don't AAE exchange immediately on restart
martinsumner Jul 8, 2020
a60e0e6
Queue rebuild_trees without taking snapshot
martinsumner Jul 8, 2020
014bbdb
Add logs
martinsumner Jul 8, 2020
6c2f4d3
Protect from div 0 errors
martinsumner Jul 15, 2020
057ee4c
Wait for loading to complete
martinsumner Jul 21, 2020
3c59780
Update riak_kv_vnode.erl
martinsumner Jul 21, 2020
822227c
Update rebar.config
martinsumner Jul 23, 2020
45ff58c
Icrease initial tick delay on startup
martinsumner Jul 24, 2020
0b4bd20
Increase default exchange tick
martinsumner Jul 24, 2020
596f427
Handle cluster change
martinsumner Jul 25, 2020
cf15834
Switch first OnlyIfBrokenBuild to first poke
martinsumner Jul 27, 2020
e393404
Update riak_kv_vnode.erl
martinsumner Jul 27, 2020
1a2423b
Don't kill tick on initial rebuild
martinsumner Jul 27, 2020
bbf2164
Comments and rollback slightly exchange poke increase
martinsumner Jul 28, 2020
2893378
Allow debug of repairs
martinsumner Jul 30, 2020
3ac9ccf
Refine name
martinsumner Jul 30, 2020
5df988a
Remove tabs
martinsumner Jul 30, 2020
12f6593
Use Token Bucket for both AAE
martinsumner Aug 1, 2020
a95d9ea
Add ability to override tictac exchange max_results
martinsumner Aug 3, 2020
6d7b9c6
Add tictacaae constraint
martinsumner Aug 3, 2020
d6f5748
Extend repair for consistency (with exchange_fsm)
martinsumner Aug 4, 2020
1a1d34e
Avoid div 0
martinsumner Aug 4, 2020
6391409
Typo
martinsumner Aug 4, 2020
69c774c
Update src/riak_kv_vnode.erl
martinsumner Aug 8, 2020
e135df6
Correct indentation
martinsumner Aug 8, 2020
5d56912
Merge branch 'mas-i1765-tokenbucket' of https://github.com/basho/riak…
martinsumner Aug 8, 2020
6cb709b
Count in folds for comparison
martinsumner Aug 11, 2020
f9c5351
Merge branch 'mas-i1771-foldcounter' into mas-i1765-develop30
martinsumner Aug 12, 2020
6c545f5
Merge pull request #1773 from basho/mas-i1765-develop30
martinsumner Aug 13, 2020
2b814c8
Use tags for release
martinsumner Aug 18, 2020
d174957
Bump tags to fix plugins
martinsumner Aug 20, 2020
f074aa3
Merge pull request #8 from basho/develop-3.0
Nov 5, 2020
1 change: 1 addition & 0 deletions .gitignore
@@ -25,3 +25,4 @@ src/riak_core_pb.erl
 *@*
 undefined
 log/crash.log
+.eqc/*counterexample.eqc
11 changes: 11 additions & 0 deletions .travis.yml
@@ -0,0 +1,11 @@
language: erlang
otp_release:
  - 20.3.8
  - 21.3
  - 22.3
script:
  - chmod u+x rebar3
  - ./rebar3 do upgrade, compile, dialyzer, xref, eunit
notifications:
  slack:
    secure: SE0EMU9HenZlLBuNg7l6WLMxJPkfyAEGgodvAqMEuQmICtrh1nV019D/A8ykejYYPPsJafWVOfypOSDrSHCndzXvEZvU8l45nJ6XLdUdrDYEmvcSqN3EqmVSsuf9H3g99bvKygXaY27MkTS5ixLil7PzybG+YpwMnQGcQxYo5Eg=
424 changes: 424 additions & 0 deletions docs/NextGenREPL.md

Large diffs are not rendered by default.

75 changes: 75 additions & 0 deletions docs/ReapErase.md
@@ -0,0 +1,75 @@
# Reap and Erase

## Background

The Reaper and Eraser are processes supported in Riak as of release 2.9.1. They are intended as a starting point for improving a number of situations:

* Riak supports [multiple delete modes](https://docs.riak.com/riak/kv/2.2.3/using/reference/object-deletion/index.html#configuring-object-deletion), and the recommended mode for safety is `keep`. However, a delete_mode of `keep` creates an unresolved problem of permanently uncollected garbage - the tombstones are never erased. Note, even with an `interval` delete mode, tombstones may fail to be cleared and then continue to exist forever.

* Riak supports time-to-live for objects configured within the backend, also known as [Global Object Expiration](https://riak.com/products/riak-kv/global-object-expiration/index.html?p=12768.html). However, there are two flaws with this form of automated expiration:

    * It considers only when the object is added to the backend, not when the object is added to the database (i.e. the object's last modified time). An object may have its life extended, and so survive expiry, through handoffs.

    * Object expiry is not coordinated with anti-entropy changes, and so as objects expire they may be resurrected by anti-entropy, which in turn will reset their expiry date to further in the future. It can also lead to huge cascades of false repair action when AAE trees are rebuilt - as one vnode's AAE trees suddenly rebuild without all the expired objects. This has led to some customers attempting to coordinate AAE tree rebuilds to make them occur concurrently, which can have a significant performance impact during the rebuild period.

To begin to address these problems, the `riak_kv_reaper` and `riak_kv_eraser` have been introduced in Riak KV 2.9.1.

## Riak KV Reaper

The `riak_kv_reaper` is a process that receives requests to reap tombstones, queues those requests, and continuously reaps tombstones from that queue when processing time is available.

The reaper queue can be fed via a tictac aae fold from the riak client - `aae_reap_tombs`:

```
-spec aae_reap_tombs(pid(),
                        riakc_obj:bucket(), key_range(),
                        segment_filter(),
                        modified_range(),
                        change_method()) ->
                            {ok, non_neg_integer()} | {error, any()}.
```

Reaping tombstones can be done only against a single specific bucket at a time, and can be further restricted to a key range within the bucket. A segment filter can be added to only reap tombstones within a given part of the AAE tree; the segment filter may be useful when trying to break up reaping activity to do only a proportion of required reaps at a time. A modified range should be passed so that the reap can be restricted only to tombstones which have existed beyond a certain point - for example to only reap tombstones more than one month old.

The reap fold will then discover keys to be reaped, and queue them for reaping (or only count them if `change_method` is set to `count`). To run reaps in parallel across nodes in the cluster, use `local` as the `change_method`. To have a single queue of reaps handled by a single process dedicated to this fold, pass `{job, JobID}` as the `change_method`.
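
As a minimal sketch of how the fold arguments described above might be assembled, the module below builds a modified range covering everything last modified more than a given number of days ago, plus the argument list for a whole-bucket reap. The module and function names are hypothetical and not part of Riak. The `{date, Low, High}` shape (epoch seconds) for `modified_range()` and the atom `all` for an unrestricted `key_range()` or `segment_filter()` are assumptions that should be checked against the client's type definitions.

```erlang
-module(reap_args_sketch).
-export([older_than_days/2, reap_args/3]).

%% Hypothetical helper: a modified range covering objects last modified
%% more than Days days before NowSeconds (seconds since the Unix epoch).
%% ASSUMPTION: modified_range() has the shape {date, Low, High}.
-spec older_than_days(pos_integer(), pos_integer()) ->
          {date, pos_integer(), pos_integer()}.
older_than_days(Days, NowSeconds) when Days > 0, NowSeconds > Days * 86400 ->
    {date, 1, NowSeconds - Days * 86400}.

%% Hypothetical helper: the arguments that follow the client pid for a
%% reap fold over a whole bucket with no segment filter.
%% ASSUMPTION: the atom 'all' means "no key range / no segment filter".
reap_args(Bucket, ModifiedRange, ChangeMethod)
  when ChangeMethod =:= local; ChangeMethod =:= count ->
    [Bucket, all, all, ModifiedRange, ChangeMethod];
reap_args(Bucket, ModifiedRange, {job, JobID}) when is_integer(JobID) ->
    [Bucket, all, all, ModifiedRange, {job, JobID}].
```

A cautious workflow would be to run the fold with `count` first to see how many tombstones qualify, and only then repeat it with `local` or `{job, JobID}`.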

The actual reap removes both the tombstone from the backend and the reference to it from the Active Anti-Entropy system. Before attempting a reap, a check is made to ensure all primary vnodes in the preflist are online - and if not, the reap is deferred by moving it to the back of the queue. If a reap were to proceed without a primary being available, then the tombstone is likely to be eventually resurrected through anti-entropy.
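
The deferral behaviour can be illustrated with a small sketch. This is not the `riak_kv_reaper` implementation: the module name, the return values and the `PrimariesUp` callback (standing in for the preflist check) are all hypothetical.

```erlang
-module(reap_defer_sketch).
-export([next_action/2]).

%% Queue is a queue:queue() of reap references. PrimariesUp is a
%% hypothetical fun(Ref) -> boolean() standing in for the check that all
%% primary vnodes in the preflist are online.
%% Returns {reap, Ref, Rest} when the head of the queue can be reaped now,
%% {deferred, Requeued} when it was moved to the back of the queue, or
%% the atom empty when there is no work.
next_action(Queue, PrimariesUp) ->
    case queue:out(Queue) of
        {empty, _} ->
            empty;
        {{value, Ref}, Rest} ->
            case PrimariesUp(Ref) of
                true -> {reap, Ref, Rest};
                false -> {deferred, queue:in(Ref, Rest)}
            end
    end.
```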

The reaping itself will only act if:

* the object to be reaped is confirmed as a tombstone, and;

* the object to be reaped has the same vector clock as when the reap requirement was discovered (the comparison is based on a hash of the sorted vector clock).
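
The two conditions above can be sketched as a single guard. The names below are hypothetical, and `erlang:phash2/1` is used only for illustration; the hash function Riak actually applies to the sorted vector clock is not stated here.

```erlang
-module(reap_guard_sketch).
-export([clock_hash/1, should_reap/3]).

%% Hash of the sorted vector clock, so that entry order does not matter.
%% ASSUMPTION: erlang:phash2/1 stands in for whatever hash Riak uses.
clock_hash(VClock) when is_list(VClock) ->
    erlang:phash2(lists:sort(VClock)).

%% IsTombstone: whether the object currently stored is a tombstone.
%% CurrentVClock: the vector clock of the stored object at reap time.
%% DiscoveredHash: the hash recorded when the fold found the tombstone.
should_reap(IsTombstone, CurrentVClock, DiscoveredHash) ->
    IsTombstone andalso clock_hash(CurrentVClock) =:= DiscoveredHash.
```

Comparing a hash rather than the full clock keeps each queued reap reference small, at the cost of a very small chance of a hash collision.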

Note that when using `riak_kv_ttaaefs_manager` for full-sync, or any riak_repl full-sync mechanism, if reap jobs are not co-ordinated between clusters then tombstones will be resurrected by full-sync jobs.

## Riak KV Eraser

The `riak_kv_eraser` is a process that receives requests to delete objects, queues those requests, and continuously deletes objects from that queue when processing time is available. The eraser is simply an unscheduled garbage collection process at the moment, but is planned to be extended in 2.9.2 to be part of a more complete TTL management solution.

The eraser queue can be fed via a tictac aae fold from the riak client - `aae_erase_keys`:

```
-spec aae_erase_keys(pid(),
                        riakc_obj:bucket(), key_range(),
                        segment_filter(),
                        modified_range(),
                        change_method()) ->
                            {ok, non_neg_integer()} | {error, any()}.
```

The function inputs are the same as with `aae_reap_tombs`. For this fold, the results will be queued for the `riak_kv_eraser`. If not all primary vnodes are up, then as with the reap the delete will not be attempted, but will be re-queued.

The delete will only complete if the object to be deleted has a vector clock equal to that discovered at the time the delete was queued.


## Outstanding TODO

* Allow for scheduled reap and erase jobs to generate reap and erase activity.

    * Auto-scheduling makes more sense for erase jobs, as reap jobs won't naturally co-ordinate between clusters - so tombstones may resurrect through full-sync.


* Change replication so that it will filter the sending of tombstones beyond a certain modified time, so as not to resurrect old tombstones via full-sync.

* Have a TTL bucket property so that GETs can be filtered beyond that modified time. Need to consider local GETs (e.g. at a vnode before a PUT).
2 changes: 2 additions & 0 deletions eqc/backend_eqc.erl
@@ -62,6 +62,8 @@
          async_put/5,
          init_backend/3]).
 
+-export([fold_keys/5, check/1, check2/1]).
+
 -define(TEST_SECONDS, 120).
 
 -record(qcst, {backend, % Backend module under test
8 changes: 6 additions & 2 deletions eqc/ec_eqc.erl
@@ -670,11 +670,15 @@ get_fsm_proc(ReqId, #params{n = N, r = R}) ->
     NotFoundOk = true,
     AllowMult = true,
     DeletedVclock = true,
+    ExpectedVclock = false,
+    NodeConfirms = 0,
     GetCore = riak_kv_get_core:init(N, R,
                                     0, %% SLF hack
                                     FailThreshold,
                                     NotFoundOk, AllowMult, DeletedVclock,
-                                    [{Idx, primary} || Idx <- lists:seq(1, N)] %% SLF hack
+                                    [{Idx, primary} || Idx <- lists:seq(1, N)], %% SLF hack
+                                    ExpectedVclock,
+                                    NodeConfirms
                                    ),
     #proc{name = {get_fsm, ReqId}, handler = get_fsm,
           procst = #getfsmst{getcore = GetCore}}.
@@ -689,7 +693,7 @@ get_fsm(#msg{from = {kv_vnode, Idx, _}, c = {r, Result, Idx, _ReqId}},
                   reply_to = ReplyTo,
                   responded = Responded,
                   getcore = GetCore} = ProcSt} = P) ->
-    UpdGetCore1 = riak_kv_get_core:add_result(Idx, Result, GetCore),
+    UpdGetCore1 = riak_kv_get_core:add_result(Idx, Result, node(), GetCore),
     {ReplyMsgs, UpdGetCore3, UpdResponded} =
         case riak_kv_get_core:enough(UpdGetCore1) of
             true when Responded == false ->
90 changes: 90 additions & 0 deletions eqc/eraser_eqc.erl
@@ -0,0 +1,90 @@
-module(eraser_eqc).

-include_lib("eqc/include/eqc.hrl").
-include_lib("eqc/include/eqc_component.hrl").

-compile([export_all, nowarn_export_all]).

%% -- State ------------------------------------------------------------------
initial_state() ->
    #{ deletes => [] }.

%% -- Operations -------------------------------------------------------------

%% --- Operation: start ---
start_pre(S) -> not maps:is_key(pid, S).

start_args(_S) ->
    [gen_delete_mode()].

start(DelMode) ->
    {ok, Pid} = gen_server:start_link(riak_kv_eraser, [eqc_job, fun erase/2, DelMode], []),
    Pid.

start_next(S, Pid, [DelMode]) ->
    S#{ delete_mode => DelMode, pid => Pid }.

%% --- Operation: delete ---
delete_pre(S) -> maps:is_key(pid, S).

delete_args(#{ pid := Pid }) ->
    [Pid, gen_delete_ref()].

delete(Pid, {Ref, Retries}) ->
    ets:insert(?MODULE, {Ref, Retries}),
    riak_kv_eraser:request_delete(Pid, Ref).

delete_next(S = #{ deletes := Ds }, _Value, [_Pid, {Ref, Retries}]) ->
    S#{ deletes := [{Ref, Retries} | Ds] }.

delete_features(_, [_, {_, N}], _) ->
    [{retries, N}].

%% -- Generators -------------------------------------------------------------
gen_delete_mode() ->
    elements([keep, immediate]).

gen_delete_ref() ->
    ?LET(Ref, gen_reference(),
         weighted_default({4, {Ref, 1}}, {1, {Ref, choose(2, 5)}})).

gen_reference() ->
    os:timestamp().

%% -- Properties -------------------------------------------------------------

prop_eraser() ->
    application:set_env(riak_kv, eraser_redo_timeout, 20),
    ?FORALL(Cmds, commands(?MODULE),
    begin
        ets:new(?MODULE, [named_table, public]),
        {H, S, Res} = run_commands(Cmds),
        StopRes = stop_job(maps:get(pid, S, undefined)),
        ETSRes = ets:tab2list(?MODULE),
        ets:delete(?MODULE),
        aggregate(call_features(H),
            pretty_commands(?MODULE, Cmds, {H, S, Res},
                conjunction([
                    {result, equals(Res, ok)},
                    {stop, equals(StopRes, ok)},
                    {ets, equals(ETSRes, [])}])))
    end).

stop_job(undefined) -> ok;
stop_job(Pid) ->
    MonRef = monitor(process, Pid),
    riak_kv_eraser:stop_job(Pid),
    receive
        {'DOWN', MonRef, _, _, _} -> ok
    after 1000 ->
        timeout_stop_job
    end.


erase(Ref, _) ->
    case ets:lookup(?MODULE, Ref) of
        [{_, 1}] ->
            ets:delete(?MODULE, Ref), true;
        [{_, N}] ->
            ets:insert(?MODULE, {Ref, N - 1}), false
    end.
26 changes: 16 additions & 10 deletions eqc/fsm_eqc_vnode.erl
@@ -27,9 +27,15 @@
 %% -------------------------------------------------------------------
 
 -module(fsm_eqc_vnode).
--behaviour(gen_fsm_compat).
+-behaviour(gen_fsm).
 -include("include/riak_kv_vnode.hrl").
 
+-compile({nowarn_deprecated_function,
+            [{gen_fsm, start_link, 3},
+                {gen_fsm, start_link, 4},
+                {gen_fsm, sync_send_all_state_event, 2}]}).
+
+
 -export([start_link/0, start_link/1, set_data/2, set_vput_replies/1,
          get_history/0, get_put_history/0,
          get_reply_history/0, log_postcommit/1, get_postcommits/0]).
@@ -66,30 +72,30 @@ start_link() ->
     start_link(?MODULE).
 
 start_link(undefined) ->
-    gen_fsm_compat:start_link(?MODULE, [], []);
+    gen_fsm:start_link(?MODULE, [], []);
 start_link(RegName) ->
-    gen_fsm_compat:start_link({local, RegName}, ?MODULE, [], []).
+    gen_fsm:start_link({local, RegName}, ?MODULE, [], []).
 
 set_data(Objs, PartVals) ->
-    ok = gen_fsm_compat:sync_send_all_state_event(?MODULE, {set_data, Objs, PartVals}).
+    ok = gen_fsm:sync_send_all_state_event(?MODULE, {set_data, Objs, PartVals}).
 
 set_vput_replies(VPutReplies) ->
-    ok = gen_fsm_compat:sync_send_all_state_event(?MODULE, {set_vput_replies, VPutReplies}).
+    ok = gen_fsm:sync_send_all_state_event(?MODULE, {set_vput_replies, VPutReplies}).
 
 get_history() ->
-    gen_fsm_compat:sync_send_all_state_event(?MODULE, get_history).
+    gen_fsm:sync_send_all_state_event(?MODULE, get_history).
 
 get_put_history() ->
-    gen_fsm_compat:sync_send_all_state_event(?MODULE, get_put_history).
+    gen_fsm:sync_send_all_state_event(?MODULE, get_put_history).
 
 get_reply_history() ->
-    gen_fsm_compat:sync_send_all_state_event(?MODULE, get_reply_history).
+    gen_fsm:sync_send_all_state_event(?MODULE, get_reply_history).
 
 log_postcommit(Obj) ->
-    gen_fsm_compat:sync_send_all_state_event(?MODULE, {log_postcommit, Obj}).
+    gen_fsm:sync_send_all_state_event(?MODULE, {log_postcommit, Obj}).
 
 get_postcommits() ->
-    gen_fsm_compat:sync_send_all_state_event(?MODULE, get_postcommits).
+    gen_fsm:sync_send_all_state_event(?MODULE, get_postcommits).
 
 %% ====================================================================
 %% gen_fsm callbacks
87 changes: 87 additions & 0 deletions eqc/reaper_eqc.erl
@@ -0,0 +1,87 @@
-module(reaper_eqc).

-include_lib("eqc/include/eqc.hrl").
-include_lib("eqc/include/eqc_component.hrl").

-compile([export_all, nowarn_export_all]).

%% -- State ------------------------------------------------------------------
initial_state() ->
    #{ reaps => [] }.

%% -- Operations -------------------------------------------------------------

%% --- Operation: start ---
start_pre(S) -> not maps:is_key(pid, S).

start_args(_S) ->
    [].

start() ->
    {ok, Pid} = gen_server:start_link(riak_kv_reaper, [eqc_job, fun reaper/1], []),
    Pid.

start_next(S, Pid, []) ->
    S#{ pid => Pid }.

%% --- Operation: reap ---
reap_pre(S) -> maps:is_key(pid, S).

reap_args(#{ pid := Pid }) ->
    [Pid, gen_reap_ref()].

reap(Pid, {Ref, Retries}) ->
    ets:insert(?MODULE, {Ref, Retries}),
    riak_kv_reaper:request_reap(Pid, Ref).

reap_next(S = #{ reaps := Ds }, _Value, [_Pid, {Ref, Retries}]) ->
    S#{ reaps := [{Ref, Retries} | Ds] }.

reap_features(_, [_, {_, N}], _) ->
    [{retries, N}].

%% -- Generators -------------------------------------------------------------
gen_reap_ref() ->
    ?LET(Ref, gen_reference(),
         weighted_default({4, {Ref, 1}}, {1, {Ref, choose(2, 5)}})).

gen_reference() ->
    os:timestamp().

%% -- Properties -------------------------------------------------------------

prop_reaper() ->
    application:set_env(riak_kv, reaper_redo_timeout, 20),
    ?FORALL(Cmds, commands(?MODULE),
    begin
        ets:new(?MODULE, [named_table, public]),
        {H, S, Res} = run_commands(Cmds),
        StopRes = stop_job(maps:get(pid, S, undefined)),
        ETSRes = ets:tab2list(?MODULE),
        ets:delete(?MODULE),
        aggregate(call_features(H),
            pretty_commands(?MODULE, Cmds, {H, S, Res},
                conjunction([
                    {result, equals(Res, ok)},
                    {stop, equals(StopRes, ok)},
                    {ets, equals(ETSRes, [])}])))
    end).

stop_job(undefined) -> ok;
stop_job(Pid) ->
    MonRef = monitor(process, Pid),
    riak_kv_reaper:stop_job(Pid),
    receive
        {'DOWN', MonRef, _, _, _} -> ok
    after 1000 ->
        timeout_stop_job
    end.


reaper(Ref) ->
    case ets:lookup(?MODULE, Ref) of
        [{_, 1}] ->
            ets:delete(?MODULE, Ref), true;
        [{_, N}] ->
            ets:insert(?MODULE, {Ref, N - 1}), false
    end.