Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQLInterface for dynamic replacement of SQLite implementations #1701

Draft
wants to merge 4 commits into
base: development
Choose a base branch
from

Conversation

marcprux
Copy link

This draft PR replaces the GRDBSQLite systemLibrary with a target that declares all the SQLite3 symbols in SQLiteThunk.swift, which delegate all their implementations through a global public var SQLI: SQLiteInterface protocol, which itself declares the functions and variables for the entire SQLite3 interface. There is also a SystemSQLiteInterface.swift implementation that implements the interface and passes them all through to the system SQLite3 module.

The idea is that by enabling the dynamic replacement of the SQLite3 implementation, a custom build of SQLite can be included in a way that is compatible with pure SwiftPM, and doesn't require any of the special build gymnastics that are currently needed. This would enable, for example, swapping in a custom SQLCipher build build at runtime. My personal motivation is to bring GRDB to Android, which doesn't expose any user-accessible SQLite library and so always needs to include a SQLite source build.

This draft PR is minimally code-intrusive in order to illustrate the salient points, but one might imagine that rather than having a single global public var SQLI: SQLiteInterface variable, the SQLiteInterface could be provided at GRDB initialization time, which opens the possibility of having multiple simultaneous SQLite builds in the same app. But I imagine most environments will only care about a single global implementation, so I doubt it would be worth the trouble.

Two rough spots right now are that I had to disable snapshot and FTS5 support, since they both need to use custom structs from the system SQLite3 library, and I didn't bother creating pass-though wrappers for these. I expect that resolving this would be a straightforward exercise. With these tests disabled, all the tests pass on my machine:

Test Suite 'All tests' passed at 2025-01-19 20:02:28.011.
	 Executed 2729 tests, with 98 tests skipped and 0 failures (0 unexpected) in 111.887 (111.985) seconds

Pull Request Checklist

  • CONTRIBUTING: You have read https://github.com/groue/GRDB.swift/blob/master/CONTRIBUTING.md
  • BRANCH: This pull request is submitted against the development branch.
  • DOCUMENTATION: Inline documentation has been updated.
  • DOCUMENTATION: README.md or another dedicated guide has been updated.
  • TESTS: Changes are tested.
  • TESTS: The make smokeTest terminal command runs without failure.

@groue
Copy link
Owner

groue commented Jan 20, 2025

Hello @marcprux,

Wow, that's ambitious :-) This is a much welcomed exploration. Custom SQLite builds and SQLCipher indeed require some uneasy "gymnastics".

There are two important GRDB features that this pull request does not address, and I'm wondering what could be done in order to fulfill the needs of GRDB users:

  • Performance. GRDB is currently specialized for each SQLite implemention, and supports inlining SQLite calls. The var SQLI: SQLiteInterface is an existential that prevents specialization. I expect the GRDBOSXPerformanceTests target of Tests/Performance/GRDBPerformance/GRDBPerformance.xcodeproj to show important performance regressions.

    (If you want to check it, rebase your branch on development: I've just restored the compilation of performance tests.)

  • Availability Cheks. That's FTS5, snapshots, etc. In general, for a given sqlite3_xxx api that is not available in all implementations, or a feature of the SQL language that is not available in all implementations, the current GRDB is able to:

    • For system SQLite builds, we define the @availability of the Swift APIs that use this feature (e.g. iOS 19+)
    • For custom SQLite builds (known SQLite version), we always expose Swift APIs that use this feature, without availability checks. If the presence of the feature depends on an SQLite compilation option, guard those Swift APIS behind a #if SQLITE_XXX.
    • For SQLCipher, for which we don't know the version, usually we never expose the Swift APIs that depend on sqlite3_xxx, in order to avoid link errors, and always expose the Swift APIs that depend on SQL features, and rely on runtime errors if they are called.

…es to control whether to use the new GRDBSQLiteDynamic module and enable performance tests
@marcprux
Copy link
Author

  • Performance. GRDB is currently specialized for each SQLite implemention, and supports inlining SQLite calls. The var SQLI: SQLiteInterface is an existential that prevents specialization. I expect the GRDBOSXPerformanceTests target of Tests/Performance/GRDBPerformance/GRDBPerformance.xcodeproj to show important performance regressions.
    (If you want to check it, rebase your branch on development: I've just restored the compilation of performance tests.)

I've rigged up the Package.swift so this feature can be enabled/disabled with a GRDB_SQLITE_INLINE environment variable, and also added the performance tests to the package when you set GRDB_PERFORMANCE_TESTS=1, so they can be compared side-by-side more easily.

I've run these performance tests a few times, and the result is that using GRDBSQLiteDynamic is between 8%–10% slower than the inline-able system GRDBSQLite on my machine (2021 M1 macBook Pro).

rm -r .build; GRDB_SQLITE_INLINE=1 GRDB_PERFORMANCE_TESTS=1 swift test --filter GRDBPerformanceTests
…
Test Suite 'GRDBPackageTests.xctest' passed at 2025-01-20 11:46:39.391.
	 Executed 23 tests, with 0 failures (0 unexpected) in 306.197 (306.198) seconds

rm -r .build; GRDB_SQLITE_INLINE=0 GRDB_PERFORMANCE_TESTS=1 swift test --filter GRDBPerformanceTests
…
Test Suite 'GRDBPackageTests.xctest' passed at 2025-01-20 12:53:50.814.
	 Executed 23 tests, with 0 failures (0 unexpected) in 329.812 (329.814) seconds

If this isn't a tolerable degradation, one option could be to add another package product, like GRDB-inline that builds directly against the system SQLite:

    products: [
        .library(name: "GRDBSQLite", targets: ["GRDBSQLite"]),
        .library(name: "GRDB", targets: ["GRDB"]),
        .library(name: "GRDB-dynamic", type: .dynamic, targets: ["GRDB"]),
        .library(name: "GRDB-inline", targets: ["GRDBInline"]),
    ],
  • Availability Cheks. That's FTS5, snapshots, etc. In general, for a given sqlite3_xxx api that is not available in all implementations, or a feature of the SQL language that is not available in all implementations, the current GRDB is able to:

    • For system SQLite builds, we define the @availability of the Swift APIs that use this feature (e.g. iOS 19+)

I think this would translate directly, since we could just add the same @availability annotations to SQLiteThunk.swift. A corner case might be the scenario where an older iOS wants to use a custom SQLite build with a newer feature enabled (like FTS5).

  • For custom SQLite builds (known SQLite version), we always expose Swift APIs that use this feature, without availability checks. If the presence of the feature depends on an SQLite compilation option, guard those Swift APIS behind a #if SQLITE_XXX.

I suppose these checks would need to turn into dynamic feature availability checks. So instead of:

#if SQLITE_ENABLE_FTS5

#endif

we would do something like:

if SQLI.hasFeature(.fts5) {
}

SQLiteThunk.swift would expose a superset of all the (latest) available SQLite options, and then the individual implementations would stub out anything that is unavailable, like:

public enum SystemSQLiteInterface : SQLiteInterface {
    public func hasFeature(_ feature: SQLiteInterface.Feature) -> Bool {
        if feature == .snapshots { return false }
        return true
    }

    public func sqlite3_snapshot_recover(_ db: OpaquePointer!, _ zDb: UnsafePointer<CChar>!) -> Int32 {
        fatalError("unsupported in this build")
    }
}
  • For SQLCipher, for which we don't know the version, usually we never expose the Swift APIs that depend on sqlite3_xxx, in order to avoid link errors, and always expose the Swift APIs that depend on SQL features, and rely on runtime errors if they are called.

Do you mean for non-standard API additions in custom builds, like sqlite3_rekey? In these scenarios, we could assume that the customization has some standard mechanism for achieving the same functionality (like PRAGMA rekey does in this case), so we don't need to surface every non-standard API addition through SQLiteThunk.swift.

Happy to explore this further and sketch out of how these availability checks might look.

@groue
Copy link
Owner

groue commented Jan 20, 2025

  • Performance.

I've rigged up the Package.swift so this feature can be enabled/disabled with a GRDB_SQLITE_INLINE environment variable, and also added the performance tests to the package when you set GRDB_PERFORMANCE_TESTS=1, so they can be compared side-by-side more easily.

Thanks for running the tests!

I've run these performance tests a few times, and the result is that using GRDBSQLiteDynamic is between 8%–10% slower than the inline-able system GRDBSQLite on my machine (2021 M1 macBook Pro).

If this isn't a tolerable degradation, one option could be to add another package product, like GRDB-inline that builds directly against the system SQLite:

    products: [
        .library(name: "GRDBSQLite", targets: ["GRDBSQLite"]),
        .library(name: "GRDB", targets: ["GRDB"]),
        .library(name: "GRDB-dynamic", type: .dynamic, targets: ["GRDB"]),
        .library(name: "GRDB-inline", targets: ["GRDBInline"]),
    ],

A 8% regression is more than I expected, for an I/O library...

Anyway. GRDB targets SQLite, not a remote database server: we can't just assume network swallows everything. Performance counts. It's one of the GRDB selling points. We should aim at improving performances, instead of degrading them.

If I'm correct and the existential is the source of the regression, then database connections would need to become generic in order to enable compiler specialization.

Performance is important, and source compatibility is important as well. I guess we'd need to typealias Database to SQLiteConnection<SystemSQLiteInterface> in order to avoid breaking the compilation of the (most frequent) apps that use the system SQLite. Other apps (custom SQLite, SQLCipher, etc.) would need to perform a major refactoring. In all cases, making this refactoring as easy as possible is important (and the best is when no refactoring is needed at all).

Not easy problems to solve.

  • Availability Cheks.

A corner case might be the scenario where an older iOS wants to use a custom SQLite build with a newer feature enabled (like FTS5).

It is not a corner case. It is the main use case. Why do people use a custom SQLite build, if not for a total control on the enabled SQLite features? Availability checks must absolutely be avoided.

This is not trivial:

I suppose these checks would need to turn into dynamic feature availability checks.

You can check for the availability of SQL features at runtime (those are just strings), but you can't check for the availability of sqlite3_xxx symbols: the linker passes, or not.

@available must stay for the users of system SQLite. They provide clear compiler diagnostics, and excellent ergonomics. They are visible in the documentation. There's a whole ecosystem around availability checks.

And @available must be removed for the users of other SQLite builds, because they would not mean anything.

It can probably be achieved with a rich set of protocols instead of the single SQLiteInterface. The amount of necessary work, and the maintenance burden, make me sweat.

Happy to explore this further and sketch out of how these availability checks might look.

That's very nice of you. Take care that I have exposed some serious issues to solve. Maybe you are on a good track, but maybe this is a dead-end as well. I do not want that you think you're spending time in vain, and that's why I have tried to expose as clearly as possible some design goals above.

@marcprux
Copy link
Author

A 8% regression is more than I expected, for an I/O library...

Agreed, I was a bit surprised as well. I haven't delved into the contents of the performance tests and whether they exemplify real-world use, but regardless I agree that it is a significant degradation.

Performance is important, and source compatibility is important as well. I guess we'd need to typealias Database to SQLiteConnection<SystemSQLiteInterface> […]

Not only that, but any type that references any sqlite3_* function (Row, Statement, and dozens more) would also need this generic specialization. It would be a massive change touching every part of GRDB, and I suspect it is a non-starter.

However, one other option could be to instead have two separate GRDB targets (and associated products) with the same sources, but one that links directly to the system SQLite3 library, and the other that goes through the thunks and enables dynamic replacement. So for existing projects that depend on the GRDB product, everything would behave exactly the same as it does now, and dependents on the new GRDBX product would accept the performance reduction in order to be able to swap in a custom SQLite build.

[…] Availability checks must absolutely be avoided.

I'll ponder this issue some more, but it sounds like it is all moot if we can't find a solution to the performance implications of this change.

@groue
Copy link
Owner

groue commented Jan 21, 2025

Performance is important, and source compatibility is important as well. I guess we'd need to typealias Database to SQLiteConnection<SystemSQLiteInterface> […]

Not only that, but any type that references any sqlite3_* function (Row, Statement, and dozens more) would also need this generic specialization. It would be a massive change touching every part of GRDB, and I suspect it is a non-starter.

I was actually contemplating this solution as acceptable. You keep on:

However, one other option could be to instead have two separate GRDB targets [...] and dependents on the new GRDBX product would accept the performance reduction in order to be able to swap in a custom SQLite build.

I wish we could avoid this performance reduction.

Separate targets are not a bad idea, though. They could help choosing between system and custom SQLite builds, so that only the desired SQLite version would be linked. It's very complicated to deal with duplicated symbols. SQLite build system does not support symbol prefixing (maybe Clang can do it (llvm-objcopy?), but now it's beyond my skills). I'm not sure we need to support multiple different SQLite implementations at the same time (in a single app) anyway.

[…] Availability checks must absolutely be avoided.

I'll ponder this issue some more, but it sounds like it is all moot if we can't find a solution to the performance implications of this change.

I think a forest of protocols that encode the presence of SQLite features could help. I'll think about it a little more and come back.

@groue
Copy link
Owner

groue commented Jan 22, 2025

I think a forest of protocols that encode the presence of SQLite features could help. I'll think about it a little more and come back.

We could probably encode SQLite interfaces in a set of protocols:

// Encodes the minimal supported SQLite version
public protocol SQLiteInterfaceMinimal {
    func sqlite3_libversion_number() -> CInt
}

// Encode SQLite versions
public protocol SQLiteInterface_3035000 {
    // APIs enabled from SQLite 3.35.0
}
public protocol SQLiteInterface_3037000 {
    // APIs enabled from SQLite 3.37.0
}

// Encode compilation options
public protocol SQLiteInterface_3035000_HAS_CODEC {
    // APIs enabled from SQLite 3.35.0 with SQLITE_HAS_CODEC
}
public protocol SQLiteInterface_3035000_ENABLE_SNAPSHOT {
    // APIs enabled from SQLite 3.35.0 with SQLITE_ENABLE_SNAPSHOT
}
public protocol SQLiteInterface_3037000_ENABLE_SNAPSHOT {
    // APIs enabled from SQLite 3.37.0 with SQLITE_ENABLE_SNAPSHOT
}

System Interface

The system interface could be defined as:

import SQLite3

public struct SystemSQLiteInterface: SQLiteInterfaceMinimal {
    @inlinable @_transparent @_alwaysEmitIntoClient @inline(__always)
    func sqlite3_libversion_number() -> CInt {
        SQLite3.sqlite3_libversion_number()
    }
}

@available(iOS 14.0, *) // made up numbers!
extension SystemSQLiteInterface: SQLiteInterface_3035000 { }

@available(iOS 14.0, *) // made up numbers!
extension SystemSQLiteInterface: SQLiteInterface_3035000_ENABLE_SNAPSHOT { }

@available(iOS 15.0, *) // made up numbers!
extension SystemSQLiteInterface: SQLiteInterface_3037000 { }

@available(iOS 15.0, *) // made up numbers!
extension SystemSQLiteInterface: SQLiteInterface_3037000_ENABLE_SNAPSHOT { }

// Typealiases for source compatibility
public typealias Database = SQLiteConnection<SystemSQLiteInterface>
public typealias DatabaseQueue = SQLiteDatabaseQueue<SystemSQLiteInterface>
public typealias DatabasePool = SQLiteDatabasePool<SystemSQLiteInterface>

(Note all annotations on sqlite3_libversion_number(): we'd use whatever is needed that guarantees inlining).

SQLCipher

This module would ship with a pre-build SQLCipher amalgamation, with a fixed set of SQLite compilation options. Users could fork and tweaks compilation options. Hence the #if SQLITE_ENABLE_SNAPSHOT.

import SQLCipher

public struct SQLCipherInterface: SQLiteInterfaceMinimal {
    @inlinable @_transparent @_alwaysEmitIntoClient @inline(__always)
    func sqlite3_libversion_number() -> CInt {
        SQLCipher.sqlite3_libversion_number()
    }
}

extension SQLCipherInterface: SQLiteInterface_3035000 { }
extension SQLCipherInterface: SQLiteInterface_3035000_HAS_CODEC { }
extension SQLCipherInterface: SQLiteInterface_3037000 { }

#if SQLITE_ENABLE_SNAPSHOT // customizable by Package.swift
extension SQLCipherInterface: SQLiteInterface_3035000_ENABLE_SNAPSHOT { }
extension SQLCipherInterface: SQLiteInterface_3037000_ENABLE_SNAPSHOT { }
#endif

// Typealiases for source compatibility
public typealias Database = SQLiteConnection<SQLCipherInterface>
public typealias DatabaseQueue = SQLiteDatabaseQueue<SQLCipherInterface>
public typealias DatabasePool = SQLiteDatabasePool<SQLCipherInterface>

Custom SQLite

This module would ship with a pre-build SQLite amalgamation, with a fixed set of SQLite compilation options. Users could fork and tweaks compilation options. Hence the #if SQLITE_ENABLE_SNAPSHOT.

import CustomSQLite

public struct CustomSQLiteInterface: SQLiteInterfaceMinimal {
    @inlinable @_transparent @_alwaysEmitIntoClient @inline(__always)
    func sqlite3_libversion_number() -> CInt {
        CustomSQLite.sqlite3_libversion_number()
    }
}

extension CustomSQLiteInterface: SQLiteInterface_3035000 { }
extension CustomSQLiteInterface: SQLiteInterface_3037000 { }

#if SQLITE_ENABLE_SNAPSHOT // customizable by Package.swift
extension CustomSQLiteInterface: SQLiteInterface_3035000_ENABLE_SNAPSHOT { }
extension CustomSQLiteInterface: SQLiteInterface_3037000_ENABLE_SNAPSHOT { }
#endif

// Typealiases for source compatibility
public typealias Database = SQLiteConnection<CustomSQLiteInterface>
public typealias DatabaseQueue = SQLiteDatabaseQueue<CustomSQLiteInterface>
public typealias DatabasePool = SQLiteDatabasePool<CustomSQLiteInterface>

@marcprux
Copy link
Author

marcprux commented Jan 22, 2025

I've updated the PR by getting rid of the global var SQLI: SQLiteInterface and instead having a new SQLiteAPI protocol:

public protocol SQLiteAPI {
    associatedtype SQLI: SQLiteInterface
}

The new SQLiteInterfaceSupport.swift file provides retroactive conformance for most of the types that call any sqlite3_ functions. E.g.:

extension Database : SQLiteAPI { public typealias SQLI = DefaultSQLiteInterface }
extension Row : SQLiteAPI { public typealias SQLI = DefaultSQLiteInterface }

DefaultSQLiteInterface is aliased to SystemSQLiteInterface, which just implements it like:

public enum SystemSQLiteInterface : SQLiteInterface {
    case system

    public static var SQLITE_VERSION: String { SQLite3.SQLITE_VERSION }
    public static var SQLITE_VERSION_NUMBER: Int32 { SQLite3.SQLITE_VERSION_NUMBER }
    public static func sqlite3_libversion() -> UnsafePointer<CChar>! { SQLite3.sqlite3_libversion() }
    public static func sqlite3_libversion_number() -> Int32 { SQLite3.sqlite3_libversion_number() }
    // et cetera
}

This enables us to experiment without being too high-touch (although I did need to make a bunch of changes to some of the GRDB types). The type extensions in SQLiteInterfaceSupport.swift are the primary ones that would need to become generified. Plus, the foundation types in there that implement SQLiteAPI would need to be refactored somehow so they know where to get the sqlite3_ functions.

Performance can still be compared between the old and new with:

rm -r .build; GRDB_SQLITE_INLINE=1 GRDB_PERFORMANCE_TESTS=1 swift test --filter GRDBPerformanceTests
rm -r .build; GRDB_SQLITE_INLINE=0 GRDB_PERFORMANCE_TESTS=1 swift test --filter GRDBPerformanceTests

(I'm unable to provide good performance comparisons just now because my machine is loaded down with other processes, but you should be able to run those commands against a checkout of this PR to see for yourself.)

Some notes:

  • I haven't tried adding the inlinable annotations and seeing what difference they make.
  • @available annotations for macOS are added to the SQLiteAPI; the equivalent iOS annotations are TODO
  • I haven't broken the protocol down into version- and feature-specific sub-protocols, but I imagine that would be a relatively easy task
  • FTS5 and Snapshots are still disabled, as we would need to determine how to pass the custom structs (fts5_api, sqlite3_snapshot, etc) through to the SQLite3 build
  • Overall, this wasn't as bad as I had feared. I did need to add a few top-level symbols to SQLiteInterfaceSupport.swift that delegate to SystemSQLiteInterface for global functions that reference things like SQLITE_OK, but that's probably fine: I doubt SQLITE_OK will ever be different in any custom build of SQLite.

Let me know if you think this direction is worth pursuing.

@groue
Copy link
Owner

groue commented Jan 23, 2025

I'm really interested in those explorations, and at the same time:

Let me know if you think this direction is worth pursuing.

Breaking changes are probably unavoidable, and there are many details to be ironed out before an eventual merge. So this will probably not ship in GRDB 7.

I honestly don't know what to suggest to you, because I respect your time very much. I understand you want to use GRDB on Android, and maybe you can do it from your fork already.

That would already be a great achievement. And you'd get a lot of experience from it. Experience that could be shared here, for the benefit of all.

By limiting your fork to your own needs, you can ignore features that you do not use, such as FTS5. I recommend supporting SQLite snapshots if you want to use ValueObservation (without snapshots, observations started from a DatabasePool always stutter: they publish their initial value twice, even if there is no database change).

In all cases, this conversation is precious: it's the first time I start to believe that abstracting the SQLite implementation is possible. I fully agree about the "gymnastics" your have described. Since the vanishing of CocoaPods, supporting SQLCipher has become complicated. Supporting custom SQLite builds has always been complicated. Making the life of GRDB users better is a continuous effort. Some features take time, and I'm OK with this (if I weren't, all pleasure maintaining this library would vanish).

@marcprux
Copy link
Author

maybe you can do it from your fork already

I've scaled back my ambitions and I think I've found a good compromise in #1708. This would make GRDB more easily accommodate a GRDB-SQLCipher fork that could be easily be maintained, either by me or (ideally) by you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants