
sql: add Bloom filters for ATXs and malicious identities #6332

Draft

ivan4th wants to merge 3 commits into develop from feature/bloom-filter
Conversation

ivan4th
Contributor

@ivan4th ivan4th commented Sep 13, 2024

Motivation

atxs.Has and identities.IsMalicious require database access, and a lot of such queries are made when fetching ATXs, returning false in the majority of cases. A Bloom filter can be used to avoid database access in most cases.

Description

This adds Bloom filters that are initialized on startup and updated as new ATXs and malicious identities are added. A ~114 MiB Bloom filter has a 1% false positive rate for 100M ATXs, and a 234 KiB Bloom filter has a 0.01% false positive rate for 10K malicious identities. False positives don't mean that the check will yield an incorrect result; they just incur a database query, which is always done when the Bloom filter gives a positive result.

Loading the filters during startup takes about 2 minutes on an old Xeon (E5-2696) machine. This appears to be a worthwhile tradeoff; the false positive rate and expected size values aren't expected to change often, so I didn't add config values for them just yet.
UPD: will update the PR so that the Bloom filters are loaded in the background and only used when they're ready, falling back to the old "always query the DB" behavior while the filters are not ready yet.
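To illustrate that fallback behavior, here is a minimal sketch with toy types (a map stands in for the real Bloom filter; this is not the PR's actual code): a not-yet-loaded filter means every check goes to the DB; once loaded, a negative answer skips the query and a positive one is confirmed by a query, since it may be a false positive.

```go
package main

import "fmt"

// maybeFilter stands in for a lazily loaded Bloom filter.
type maybeFilter struct{ loaded map[string]bool } // nil until background load finishes

// mayContain reports (answer, ok); ok=false means the filter isn't ready
// and the caller must fall back to a database query.
func (m *maybeFilter) mayContain(id string) (answer, ok bool) {
	if m.loaded == nil {
		return false, false
	}
	return m.loaded[id], true
}

func has(f *maybeFilter, dbHas func(string) bool, id string) bool {
	if maybe, ok := f.mayContain(id); ok && !maybe {
		return false // definite negative: no DB access needed
	}
	return dbHas(id) // filter not ready, or possible positive: query the DB
}

func main() {
	db := func(id string) bool { return id == "x" }
	f := &maybeFilter{} // not loaded yet: both lookups hit the DB
	fmt.Println(has(f, db, "x"), has(f, db, "y"))
	f.loaded = map[string]bool{"x": true}
	fmt.Println(has(f, db, "y")) // false, without touching the DB
}
```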

The rqlite/sql SQLite parser/stringifier dependency introduced in this code would of course not be justified if it were only intended for the Bloom filters, as the necessary SQL could be hardcoded instead. However, there are several places where SQL is processed or dynamically generated in go-spacemesh, and the intent is to use the new sql/expr package in several other places too, extending it as needed:

  • normalizing SQL schema for drift detection
  • removing comments from migration scripts (not done correctly right now)
  • replacing sql/builder package
  • syncv2 database-backed sync data structure

For things like Bloom filters, writing out all the queries explicitly may sound like a good "less magic" approach, and that may be a subject for discussion, but repeated SQL queries for "mostly the same" thing do cause issues. For example, #6331 is a bug in equivocation set handling for malicious identities that resulted from not all of the related SQL queries being updated correctly (to be fixed soon in a separate PR, without dynamic SQL).

The intent of the sql/expr package is to hide most of the rqlite/sql functionality we don't need and instead provide a simple, minimalistic interface for the dynamic SQL needs of the codebase. The idea is not to use rqlite/sql directly in other go-spacemesh packages. sql/expr has been extracted from the syncv2 code and thus has slightly more functionality than the Bloom filters use.
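To give a flavor of what such a minimal "build an expression, render it once" interface can look like, here is a purely illustrative toy builder (not the actual sql/expr API), contrasted with hand-writing near-duplicate SQL strings in several places:

```go
package main

import "fmt"

// expr is a toy rendered SQL expression; the real package wraps a parsed AST.
type expr string

// op combines two expressions with a binary operator, parenthesizing the
// result so composed expressions keep their intended precedence.
func op(a expr, operator string, b expr) expr {
	return expr("(" + string(a) + " " + operator + " " + string(b) + ")")
}

func main() {
	where := op(op("pubkey", "=", "?1"), "AND", op("proof", "IS NOT", "NULL"))
	fmt.Println("SELECT 1 FROM identities WHERE " + string(where))
}
```

The point is that one definition of the condition gets rendered everywhere it is needed, rather than maintaining several hand-edited copies that can drift apart (as happened in #6331).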

Test Plan

Verify on a mainnet node

TODO

  • Test changes and document test plan
  • Update changelog as needed

@ivan4th ivan4th force-pushed the feature/bloom-filter branch 3 times, most recently from aa86c6c to 234c6c5 on September 14, 2024 00:06

codecov bot commented Sep 14, 2024

Codecov Report

Attention: Patch coverage is 88.81119% with 32 lines in your changes missing coverage. Please review.

Project coverage is 81.8%. Comparing base (5091028) to head (18f5acb).
Report is 12 commits behind head on develop.

Files with missing lines Patch % Lines
sql/bloom.go 91.3% 6 Missing and 4 partials ⚠️
sql/expr/expr.go 90.0% 6 Missing and 2 partials ⚠️
sql/identities/identities.go 74.1% 4 Missing and 4 partials ⚠️
sql/atxs/atxs.go 75.0% 2 Missing and 2 partials ⚠️
sql/database.go 95.1% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop   #6332    +/-   ##
========================================
  Coverage     81.8%   81.8%            
========================================
  Files          312     314     +2     
  Lines        34606   34890   +284     
========================================
+ Hits         28318   28563   +245     
- Misses        4452    4484    +32     
- Partials      1836    1843     +7     


@dshulyak
Contributor

i was also looking into leveraging bloom filters after realizing that most of the identities are not malicious, so this index scan always has to complete fully but return nothing (#6326). apparently sqlite itself has support for bloom filters, but i am not sure when the query optimizer picks them

one thing to note is that this query was added wrongly; the way it is written will make the situation with atx handling worse if it wasn't yet released.

there should be a more optimal approach that doesn't pay the cost all the time but only when it is actually needed, for example updating all atxs in the equivocation set when equivocation happens.

	SELECT 1 FROM identities
	WHERE (marriage_atx = (
		SELECT marriage_atx FROM identities WHERE pubkey = ?1 AND marriage_atx IS NOT NULL) AND proof IS NOT NULL
	)
	OR (pubkey = ?1 AND marriage_atx IS NULL AND proof IS NOT NULL);

@dshulyak
Contributor

one other thing is that i will likely get rid of the atx warmup and long tortoise loading; adding more time to startup seems like a poor choice. maybe consider loading this bloom filter in the background and using it only once loaded (if the sqlite bloom filter is hard to use)

@ivan4th
Contributor Author

ivan4th commented Sep 14, 2024

maybe consider loading this bloom filter in the background and use it only once loaded

That's a good idea, thanks, will do so

@ivan4th
Contributor Author

ivan4th commented Sep 14, 2024

Updating all the identities in the equivocation set when one of them becomes malicious also makes sense to me, I had similar thoughts too, but this should probably be done in a separate PR

@ivan4th
Contributor Author

ivan4th commented Sep 14, 2024

Switched to background loading of the Bloom filters

@dshulyak
Contributor

dshulyak commented Sep 16, 2024

btw, do the atx optimizations speed things up noticeably for you? i mean, not just measuring how long atxs.Has took before/after, but in terms of syncv2 rate or resource usage. the reason i'm asking is that it doesn't appear to be the largest hotspot in atx processing. this is the distribution of vfs read syscalls made by sqlite on v1.6.8; they don't hit disk because of the large memory on my computer, but this is the largest slowdown after verification

[screenshot: distribution of vfs read syscalls by sqlite on v1.6.8]

it is far from optimal; i would rework it to use smarter access, just changing how/when sqlite is used:

  • cache poet proofs; they are mostly reused but have to be loaded every time
  • load only the data necessary for validation, not whole atx blobs, and preferably in one scan
  • run fetchReferences only for data that can't be loaded, i.e. don't do the work of checking deps twice
  • sql.Commit will likely be optimized with batching, so it won't have to write the same pages to the WAL multiple times
  • atxs.Add will be optimized by reducing the number of indexes on the atxs table
  • sql.WithTx has a lot of reads in it; this makes atx processing significantly slower, as it is all synchronized
  • IsMalicious has to scan the whole table now; it is good luck that it is small, but it will be more optimal if the read has to check only a small index. if after such a change checking the index is still too slow, then a bloom filter is the right solution
  • atxs.Has can use epoch || atxid (an added epoch prefix) to query for existence, as it will improve the scan

that list just addresses "mistakes" and doesn't introduce any additional complexity; once addressed, maybe adding more complexity will not be necessary. but if adding bloom filters helps significantly in the short term, i won't object to adding them

@dshulyak
Contributor

dshulyak commented Sep 16, 2024

i prototyped several optimizations from the list above; the result is that unless the atx verifier hits the disk, the latency is dominated by the post verification (even if it is just a few labels for the vrf nonce).

disk reads for my branch vs v1.6.8
[screenshot: disk reads, my branch vs v1.6.8]

latency distribution, note that all reads are from memory, otherwise the difference would be more substantial
[screenshot: latency distribution]

the rate might be somewhat better but is still dominated by verification; sorry, the timestamps are not adjusted
[screenshot: sync rate comparison]

so i think the goal is just to reduce disk/vfs usage as much as possible; otherwise small optimizations won't matter

Comment on lines +57 to +61
// NewDBBloomFilter creates a new Bloom filter for a database table.
// tableName is the name of the table, idColumn is the name of the column that contains
// the IDs, filter is an optional SQL expression that selects the rows to include in the
// filter, and falsePositiveRate is the desired false positive rate.
func NewDBBloomFilter(
Contributor

It would be great to document how to use it, the tradeoffs, etc. For example, why not set falsePositiveRate=0.0, the extraCoef usage, and so on.

filter is an optional SQL expression that selects the rows to include in the filter

It doesn't take an argument filter. Should this part be removed?

Comment on lines +123 to +125
if bf.minSize > 0 && size < bf.minSize {
size = bf.minSize
}
Contributor

NIT:

Suggested change
if bf.minSize > 0 && size < bf.minSize {
size = bf.minSize
}
if bf.minSize > 0 {
size = max(size, bf.minSize)
}

Comment on lines +133 to +142
var bs []byte
nRows, err := db.Exec(bf.loadSQL(), nil, func(stmt *Statement) bool {
l := stmt.ColumnLen(0)
if cap(bs) < l {
bs = make([]byte, l)
} else {
bs = bs[:l]
}
stmt.ColumnBytes(0, bs)
f.Add(bs)
Contributor

How about using bytes.Buffer to avoid manual size work:

Suggested change
var bs []byte
nRows, err := db.Exec(bf.loadSQL(), nil, func(stmt *Statement) bool {
l := stmt.ColumnLen(0)
if cap(bs) < l {
bs = make([]byte, l)
} else {
bs = bs[:l]
}
stmt.ColumnBytes(0, bs)
f.Add(bs)
var buf bytes.Buffer
nRows, err := db.Exec(bf.loadSQL(), nil, func(stmt *Statement) bool {
buf.Reset()
buf.ReadFrom(stmt.ColumnReader(0))
f.Add(buf.Bytes())

Comment on lines +149 to +152
bf.mtx.Lock()
bf.f = f
bf.mtx.Unlock()
bf.logger.Info("done loading Bloom filter", zap.String("name", bf.name), zap.Int("rows", nRows))
Contributor

Could it be a problem if new ATXs are inserted after db.Exec finishes but before bf.f = f? I.e., could these ATXs added "in between" be "lost" to the filter?

Perhaps it's worth unit-testing such a case, wdyt?
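One possible way to close that window, sketched here with toy types (a map stands in for the real Bloom filter; hypothetical names, not the PR's actual code): buffer Add calls under the same mutex that publishes the loaded filter, and replay the buffer before publishing, so IDs inserted during loading are not lost.

```go
package main

import (
	"fmt"
	"sync"
)

type lazyFilter struct {
	mtx     sync.Mutex
	ready   map[string]bool // stands in for the loaded Bloom filter; nil while loading
	pending [][]byte        // IDs added before loading finished
}

// Add records an ID; while the filter is still loading, the ID is buffered
// instead of being dropped.
func (lf *lazyFilter) Add(id []byte) {
	lf.mtx.Lock()
	defer lf.mtx.Unlock()
	if lf.ready == nil {
		lf.pending = append(lf.pending, id)
		return
	}
	lf.ready[string(id)] = true
}

// publish installs the loaded filter and replays buffered adds under the
// same lock, so no Add can slip between the replay and the installation.
func (lf *lazyFilter) publish(loaded map[string]bool) {
	lf.mtx.Lock()
	defer lf.mtx.Unlock()
	for _, id := range lf.pending {
		loaded[string(id)] = true
	}
	lf.pending = nil
	lf.ready = loaded
}

func main() {
	lf := &lazyFilter{}
	lf.Add([]byte("a")) // arrives while the filter is still loading
	lf.publish(map[string]bool{"b": true})
	fmt.Println(lf.ready["a"], lf.ready["b"]) // true true
}
```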

@@ -1979,6 +1980,8 @@ func (app *App) setupDBs(ctx context.Context, lg log.Log) error {
}
{
warmupLog := app.log.Zap().Named("warmup")
atxs.StartBloomFilter(app.db, warmupLog)
Contributor

The warmup of the Bloom filters will run in parallel with the atxsdata warmup, which is also doing a lot of SQL reads. Wouldn't it hurt performance too much? How about starting the Bloom filter warmup after the atxsdata warmup finishes?

Comment on lines +1163 to +1169
// Contains verifies that the ID exists within the specified set.
func Contains(db Executor, name string, id []byte) (bool, error) {
if set, ok := db.(IDSetCollection); ok {
return set.Contains(name, id)
}
return false, ErrNoSet
}
Contributor

When would it be used with something that does not implement IDSetCollection? Why have this function instead of calling set.Contains?

@@ -0,0 +1,212 @@
// Package expr proviedes a simple SQL expression parser and builder.
Contributor

Suggested change
// Package expr proviedes a simple SQL expression parser and builder.
// Package expr provides a simple SQL expression parser and builder.


"github.com/spacemeshos/go-spacemesh/common/types"
"github.com/spacemeshos/go-spacemesh/sql"
)

const (
// Bloom filter size is < 234 KiB while below 100k identities.
Contributor

Do you mean identities in general or malicious ones?

Suggested change
// Bloom filter size is < 234 KiB while below 100k identities.
// Bloom filter size is < 234 KiB while below 100k malicious identities.

Comment on lines +52 to +58
ids, err := EquivocationSet(db, nodeID)
if err != nil {
return fmt.Errorf("get equivocation set for %v: %w", nodeID, err)
}
for _, id := range ids {
sql.AddToSet(db, "malicious", id[:])
}
Contributor

Based on your discussion with @dshulyak, shouldn't this be:

Suggested change
ids, err := EquivocationSet(db, nodeID)
if err != nil {
return fmt.Errorf("get equivocation set for %v: %w", nodeID, err)
}
for _, id := range ids {
sql.AddToSet(db, "malicious", id[:])
}
sql.AddToSet(db, "malicious", id[:])

provided SetMalicious is called for every ID in the equivocation set? Note: calling SetMalicious is not yet implemented for ATX V2 AFAIR:

func (p *MalfeasancePublisher) Publish(ctx context.Context, id types.NodeID, proof wire.Proof) error {
// TODO(mafa): implement me
return nil
}

@poszu
Contributor

poszu commented Sep 16, 2024

@dshulyak

i prototyped several optimizations from the list above, the result is that unless atx verifier hits the disk the latency is dominated by the post verification (even if it is just a few labels for vrf nonce).
so i think the goal is just not too reduce disk/vfs usage as much as possible, but otherwise small optimizations won't matter

You have a very fast disk, don't you? The results could be very different on a slow one, where the disk would quickly become the limiting factor. Wdyt?

@dshulyak
Contributor

one other thing: if you are focusing on the part that runs the atxs.Has check, it makes more sense to refactor it such that the caller always checks locally whether the atx exists and doesn't redo the work in the fetcher.

more concretely, the atx handler always needs to load the previous atx for validation; if it tries to load it and sql.ErrNotFound is raised, you call that handler with the previous atx id, otherwise you don't. in my opinion it is better to fix the "logic" rather than add more workarounds
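A toy sketch of that control flow (hypothetical names; errNotFound and a plain map stand in for sql.ErrNotFound and the database): fetch only when the load itself reports the ATX is missing, instead of doing a separate existence check up front.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found") // stands in for sql.ErrNotFound

// loadPrev stands in for loading the previous ATX from the local database.
func loadPrev(store map[string]string, id string) (string, error) {
	v, ok := store[id]
	if !ok {
		return "", errNotFound
	}
	return v, nil
}

// ensurePrev loads the previous ATX, fetching it from peers only when the
// load reports it missing, so no separate Has check is needed.
func ensurePrev(store map[string]string, fetch func(string) string, id string) (string, error) {
	atx, err := loadPrev(store, id)
	if errors.Is(err, errNotFound) {
		store[id] = fetch(id)
		return store[id], nil
	}
	return atx, err
}

func main() {
	store := map[string]string{"a": "atx-a"}
	fetched := 0
	fetch := func(id string) string { fetched++; return "atx-" + id }
	v1, _ := ensurePrev(store, fetch, "a") // already local: no fetch
	v2, _ := ensurePrev(store, fetch, "b") // missing: fetched once
	fmt.Println(v1, v2, fetched)
}
```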


@ivan4th ivan4th marked this pull request as draft October 7, 2024 12:28
@ivan4th
Contributor Author

ivan4th commented Oct 7, 2024

Converted to draft for now because more testing may be needed to justify the use of the Bloom filters, with more pressing issues taking priority right now.
