Is it possible to check which records have been updated when liveQuery runs? #1792

Open
secondl1ght opened this issue Aug 20, 2023 · 16 comments

@secondl1ght

Say I use liveQuery to display records from my Dexie DB in my application and for each record I need to run a decrypt function in order to display the results to a user. With 1000s of records you would want to only run the decrypt function once when liveQuery first loads the data. And then when an update triggers the Observable to run again, you would only want to decrypt the newly added record. Is this type of functionality supported with Dexie? I looked at the liveQuery docs but I didn't see anything that talked about being able to know which records exactly are new when the event is triggered, and then to be able to conditionally run functions on only those records. Thanks!

@dfahlander
Collaborator

dfahlander commented Aug 20, 2023

Currently it's only possible to observe the query results, and if they are encrypted, the encrypted items are what will be observed. Decryption will have to happen on all items whenever a new item is added or updated (unless you solve it with some caching outside Dexie). With the new cache/optimistic-updates functionality in 4.x, we could find a way to optimize this by offering some kind of hooks into the optimistic updates engine.

@dfahlander
Collaborator

One side note: if you use the option { cache: 'immutable' } on the Dexie constructor in [email protected], you could implement this optimization on your end, because any instance that hasn't changed since last time will be the exact same reference. Basically, you could have a static WeakMap that maps encrypted objects to their decrypted versions. Then, instead of decrypting all 1000 items every time, you look up each item in the WeakMap to check whether you already have a decrypted value for it. If not, decrypt it and put it in the WeakMap. This works because with the { cache: 'immutable' } option, Dexie returns frozen unchanged objects directly from its cache instead of deep-cloning them every time.

@secondl1ght
Author

secondl1ght commented Aug 20, 2023

Ok, thanks for the response. This WeakMap idea sounds similar to what I had in mind. I was thinking of something like storing the decrypted query results in a new variable, and then each time the Observable updates, looping over all the records to see if each record ID is included in the decrypted array. If it is, I do nothing; if it isn't, I decrypt it and add it to the array. But I was thinking this might be very inefficient (looping over 1000s of records to find 1 new one), especially if there are many records and they update every few seconds. I think I may run into CPU issues if I run this expensive loop every few seconds; do you think this would be a problem? If liveQuery also returned an array of record IDs that were added or updated, that would be ideal, and then you could run formatting functions on only those few records. I can test the looping method and see how slow/fast it is, to find out whether it is a viable option in practice...

@dfahlander
Collaborator

dfahlander commented Aug 20, 2023

Well, there might be ways to avoid looping over the records, but it's almost a no-op to loop over thousands of records just to check them against a WeakMap. In this snippet I'm looping over one million records, checking each against a WeakMap. It took 11 milliseconds on my laptop, and that is when the WeakMap is also filled with a million objects.
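The snippet itself isn't shown in the thread; a rough reconstruction of such a benchmark (the function name and exact shape here are my own) might look like this:

```javascript
// Build n objects, pre-fill a WeakMap with an entry for every one of them,
// then time how long it takes to check each object against the WeakMap.
function benchmarkWeakMapLookups(n) {
  const items = Array.from({ length: n }, (_, i) => ({ id: i }));
  const cache = new WeakMap();
  for (const item of items) cache.set(item, { decrypted: true });

  const start = Date.now();
  let hits = 0;
  for (const item of items) {
    if (cache.has(item)) hits++;
  }
  const elapsedMs = Date.now() - start;
  return { hits, elapsedMs };
}

const { hits, elapsedMs } = benchmarkWeakMapLookups(1_000_000);
console.log(`${hits} WeakMap lookups in ${elapsedMs} ms`);
```

WeakMap lookups are constant-time hash lookups on object identity, which is why even a million of them completes in milliseconds.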

@secondl1ght
Author

secondl1ght commented Aug 20, 2023

Ok, interesting. I realized, though, that this method is not possible with the data structure of my DB, because checking whether the record exists is not good enough: my table contains an Array-of-Objects column (which contains the key/value that is encrypted) that can be updated. So even if a record already exists in my cloned, decrypted copy of the stored data, I would still want to replace the entire nested Array of Objects, because that would likely be quicker than trying to check which key/value in the Objects was updated. I could replace everything EXCEPT the encrypted field, because I know this field is immutable and cannot be updated. But if I am already replacing basically everything in the cloned, decrypted copy of the DB except for the encrypted content, then I am basically just re-implementing what the Observable does in the first place, right?

For this reason I am leaning towards changing how the app works and seeing if I can just decrypt and update all the records in the DB when the app loads, and then encrypt all the records again when the app closes. This would actually speed up the initial sync slightly by not having to encrypt the data at that point. Then I can use liveQuery normally, just watch for any changes, and render DOM elements without needing to run formatting functions on each update.

I think the only thing I need to figure out to make this work is whether it is possible to ensure that the encryption function runs and updates the records in the DB even if the user does not do a graceful, explicit logout step when exiting the app. If it is possible to run functions in the background when a user force-closes a web app by closing the browser tab/window, then I think this method can work well. The DB can be in a decrypted state while the app is running, as long as it returns to its encrypted state when the app is closed. Running a bulkGet + bulkPut on every record in the DB may also be resource intensive, but it might make more sense to take this approach?

@secondl1ght
Author

I thought of a third solution to this problem, because after looking into running functions when the browser closes I found that there is no very reliable way to do so (there is the unload event, but it is not reliable according to the docs). The third option would be to not use Dexie as the source of truth while the app is active. I could keep a copy of the same data that is in Dexie, but decrypted, living only in browser memory, so that liveQuery is not required to dynamically update the UI. Whenever a new record needs to be added or updated, I run this on the Dexie DB but also add it to the temporary data in memory. So Dexie becomes more of a tool to persist the data locally across sessions rather than the main data source that the UI relies on for live updates during an active session. Maybe this is the best solution?...

@dfahlander
Collaborator

It depends on your concurrency requirements, the size of the database, and memory-consumption limits on small devices. If a user loads everything into memory in one tab, leaves it open, and then opens the app again in another tab, would it load older data? And the tabs might overwrite each other. Also, memory consumption could be a problem if you target mobile devices and the data can grow. The good thing with liveQuery (with or without the 4.0 cache) is that it only observes what you query, not the entire database, so it has a smaller memory footprint than caching the entire database.

Also, if you need to work with the data from a worker, you'd need a way to communicate the loaded data between the worker and the browser window. If you rely on Dexie's liveQuery, you are always working against the realtime data from either the worker or the browser window.

My general advice is to not store large arrays of items in IndexedDB if you care about query performance and data consistency, but instead adopt a relational model where every item is stored in its own row so it can be efficiently indexed and queried without loading large nested objects. IndexedDB is similar to a relational database in that way: it is slow at storing large nested data, and specifically, when updating a single item of an array you have to update the entire object. It is better to flatten the arrays out into objects stored in tables, so that updating an object is a small operation.
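To sketch the flattening idea (the table and field names below are hypothetical, not from the thread): a nested conversations-with-messages object can be split into two sets of rows, each destined for its own table:

```javascript
// Split nested { id, ..., messages: [...] } objects into flat conversation
// rows and message rows, where each message carries a foreign key
// (conversationId) back to its parent conversation.
function flattenConversations(conversations) {
  const conversationRows = [];
  const messageRows = [];
  for (const convo of conversations) {
    const { messages, ...rest } = convo; // strip the nested array
    conversationRows.push(rest);
    for (const msg of messages) {
      messageRows.push({ ...msg, conversationId: convo.id });
    }
  }
  return { conversationRows, messageRows };
}
```

With this shape, the message rows would live in their own Dexie table indexed on `conversationId`, so updating a single message is a small put instead of rewriting the whole parent object.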

If you store the data in individual objects (and not in arrays nested on objects), the encryption performance should not be an issue (this is my humble guess!). Using {cache: 'immutable'} should solve the problem of re-decrypting the same data several times, because any little change to an object will result in a new object reference, so you get the same reference back only when the object is untouched. This would not be possible using a map between IDs and decrypted values, because the same ID could carry different data, so you would still need to re-decrypt.
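The difference between keying a cache on IDs versus on object references can be sketched like this (a minimal illustration, not code from the thread):

```javascript
// With a Map keyed on id, an updated record would wrongly hit the cache.
// With a WeakMap keyed on the object itself, the new reference produced by
// the update misses the cache, so it gets re-decrypted as it should.
const byId = new Map();
const byRef = new WeakMap();

const v1 = { id: 1, payload: 'ciphertext-A' };
byId.set(v1.id, 'decrypted-A');
byRef.set(v1, 'decrypted-A');

// Same id, but the record's data has changed (a new object reference):
const v2 = { id: 1, payload: 'ciphertext-B' };

byId.has(v2.id); // → true: a stale hit; decryption would wrongly be skipped
byRef.has(v2);   // → false: a cache miss; the item is re-decrypted
```

This is exactly why { cache: 'immutable' } matters: Dexie then hands back the identical object reference for unchanged records, so the WeakMap hit is a guarantee that the cached decryption is still valid.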

@secondl1ght
Author

Oh right, great point. I forgot about the concurrency issue... so yeah, the in-memory idea is not a solution.

Moving my Array-of-Objects column into its own table, with a relational ID field I can use to join the data together after querying it from the DB, is something I considered when setting up my data structure. BUT I thought it would be better to get the data into its final format before storing it in the DB, rather than having to join the data every time liveQuery updates.

But you are saying that with those Objects in their own table and using cache: 'immutable', I only need to run the decrypt function once? And this performance benefit will outweigh the additional processing time of joining the tables after each update?

I still don't really understand how cache: 'immutable' will work: inside my liveQuery function I will be looping over the data and running the decrypt, so how are the non-updated records excluded from this loop?

Also, it's worth pointing out that currently my app uses the entire DB all the time. I thought it would be good to have all the data available, because it is a SPA and all the data can be visible within one click; I was thinking this would make the app load very fast. But now that I see how some of the data will need to be processed after querying the DB, I am thinking it might be smart to refactor and only load the data from the DB that is visible in the UI. This should reduce the impact of looping over large numbers of records and running the formatting functions on them. I also might not need to join the data at all if I use this method.

So I am thinking the following steps could be the best path forward:

  1. refactor from having 1 Table into 2 Tables, with the Array of Objects being its own table with a relational ID field
  2. figure out how to use the cache: immutable to only decrypt the encrypted values once
  3. change from querying the entire DB to isolating queries based on the app state

@dfahlander
Collaborator

dfahlander commented Aug 22, 2023

...
So I am thinking the following steps could be the best path forward:

  1. refactor from having 1 Table into 2 Tables, with the Array of Objects being its own table with a relational ID field
  2. figure out how to use the cache: immutable to only decrypt the encrypted values once
  3. change from querying the entire DB to isolating queries based on the app state

Sounds like a good plan! I'm not sure you'll need the optimization of only decrypting things once if your result sets are small, but in case you have a view that shows 1000 items at the same time and they update frequently, you could optimize this using {cache: 'immutable'} in the following way:

const decryptedItems = new WeakMap<Item, DecryptedItem | Promise<DecryptedItem>>(); // Keep this as a global variable.

const decryptedItemsObservable = liveQuery(async () => {
  const encryptedItems = await db.items.toArray();
  const result = encryptedItems.map(item => {
     let decryptedItem = decryptedItems.get(item);
     if (!decryptedItem) {
       decryptedItem = decrypt(item);
       decryptedItems.set(item, decryptedItem);
     }
     return decryptedItem;
  });
  // If decrypt() is a synchronous function, return the array directly:
  // return result;
  // If decrypt() is an asynchronous function, resolve the promises:
  return await Promise.all(result);
});

Note that if your decrypt function is async (as it would be if you use WebCrypto, but not if you use TweetNaCl or another userland crypto library), then the result will be an array of promises that you need to resolve using Promise.all().

You'd probably want to break all this out into a helper function, though. It could be generic, and the WeakMap would work regardless of what type of items you are retrieving; you could put anything in it and it will garbage-collect itself automatically.

// With helper function

const observable = liveQuery(async () => {
  const specificItems = await db.items.where('foo').equals('bar').toArray();
  return await decryptHelper(specificItems);
});

async function decryptHelper(encryptedItems) {
  const result = encryptedItems.map(item => {
     let decryptedItem = decryptedItems.get(item);
     if (!decryptedItem) {
       decryptedItem = decrypt(item);
       decryptedItems.set(item, decryptedItem);
     }
     return decryptedItem;
  });
  // If decrypt() is a synchronous function, return the array directly:
  // return result;
  // If decrypt() is an asynchronous function, resolve the promises:
  return await Promise.all(result);
}

@secondl1ght
Author

Awesome, thanks again for the help! I have begun the refactor; I think it makes much more sense to flatten the data out into multiple tables and be much more granular with my queries based on the current view. I will likely still try to optimize the decryption, even though it will be much less critical now that I am not running it on all the records all the time. :)

@secondl1ght
Author

secondl1ght commented Aug 27, 2023

If I store the decrypted items in a Svelte Store and, inside the liveQuery, use the Store's update method to run a callback that checks the record IDs and only runs the decrypt fn for those that are not present, would I still need to use the { cache: 'immutable' } option? There would be no WeakMap, because I am storing the values in the Svelte Store for reactive use in other components within the app. I basically pass the liveQuery result along into a Svelte Store and use those formatted values instead of the return value of liveQuery directly. Actually, the return value of liveQuery wouldn't even need to be stored in a variable, because I will not be referencing it anywhere. Inside the liveQuery function I just run svelteStoreVar.update(async (v) => ...) and then in my other files I use $svelteStoreVar. This seems like a great way to handle this within Svelte; I am just not sure whether I still need the immutable cache. It would look something like this:

liveQuery(async () => {
  const query = await db.test.where('test').equals(true).toArray();

  svelteStoreVar.update(async (v) => {
    const result = ...
    // loop over the query, check the current store Array values, and only
    // update and decrypt records that are not found in the Store
    return result;
    // the return value updates the Svelte Store
  });

  // nothing is returned from liveQuery
});

EDIT: It looks like I am getting a type error when trying to use an async function as the Store's update callback, because it would set the store to a Promise value, which we don't want. So I think I could adjust the code to use the store's get and set methods instead. But I am still unsure whether I need the cache option or not...
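One way to sidestep that type error, sketched below with a minimal stand-in for Svelte's writable store (the shim and the refreshStore/decryptOne names are hypothetical, purely for illustration): do the async decryption first, then call set() synchronously, so the store never holds an unresolved Promise.

```javascript
// Minimal stand-in for a Svelte writable store; in the real app this would
// be `writable` from 'svelte/store'.
function writable(value) {
  const subscribers = new Set();
  return {
    set(v) { value = v; subscribers.forEach((fn) => fn(v)); },
    get() { return value; },
    subscribe(fn) { subscribers.add(fn); fn(value); return () => subscribers.delete(fn); },
  };
}

// The pattern: await all async work before touching the store, then set()
// the resolved array. Never pass an async callback to update().
async function refreshStore(store, records, decryptOne) {
  const decrypted = await Promise.all(records.map(decryptOne));
  store.set(decrypted);
}
```

Inside the liveQuery querier, the call would be `await refreshStore(conversations, convos, decryptOne)` instead of `store.update(async ...)`.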

@secondl1ght
Author

UPDATE: I am leaning towards taking your advice and not trying to optimize the query decryption, since after my refactor there should only be a few hundred records visible in the UI at any given time, at most. They can update frequently, but I should test the UI first to see how it performs before over-optimizing... Checking that the values exist and updating all non-immutable columns for each record will itself take up some CPU, and might even end up costing about the same as running the decrypt function... (since I don't know which records and columns were updated, I need to do an update on every query anyway).

I think I will continue to keep the values in a Svelte Store, though, so I can access them across components instead of using the return value from liveQuery, which restricts you to using the data in the same file or in components lower in the tree.

So if I am not running this optimization check and am just formatting the data on each query, then I don't need to worry about changing the cache option from its default setting, right?

@secondl1ght
Author

secondl1ght commented Aug 28, 2023

Here is what I ended up going with, if you are interested. I realized after testing that liveQuery will not run if you don't assign it to a variable and use that variable somewhere in the code, so I gave it a variable and just console.logged it below. I tested this and everything seems to be working correctly. One thing I noticed is that the decrypted value is still a Promise, but it should always be resolved since we are using Promise.all before setting the Store value... I used your code above as inspiration for how to do this with the map. :)

	let conversationsQuery = liveQuery(async () => {
		try {
			const convos = await db.conversations
				.where('blocked')
				.equals('false')
				.reverse()
				.sortBy('lastUpdate');

			const msgs = await db.messages.bulkGet(convos.map((c) => c.latestMessage));

			const convosFormatted = convos.map(async (c) => {
				const latestMessage = msgs.find((m) => m?.id === c.latestMessage);

				if (latestMessage) {
					// await here so Promise.all receives promises of plain
					// objects rather than objects with a Promise-valued field
					const decryptedMsg = await decrypt(latestMessage.iv, latestMessage.message);

					return { ...c, message: decryptedMsg };
				} else {
					return c;
				}
			});

			const result = await Promise.all(convosFormatted);

			conversations.set(result);

			return 'Conversations query complete.';
		} catch (error) {
			console.log(error);
			errorToast('Could not query conversations.');
		}
	});

	$: console.log($conversationsQuery);

UPDATE: My decrypt function is throwing an error, but only when used in this liveQuery code; I tested calling it in another file and it works there. When I console.log the error there is no message, though, it only says Error... 🤔 I am using the browser crypto module (WebCrypto) for this.

[screenshot]

It's weird that it says the error is uncaught even though I have it in a try...catch statement.

[screenshot]

@secondl1ght
Author

secondl1ght commented Aug 28, 2023

It somehow seems to be an issue with storing the Uint8Array and ArrayBuffer values needed for the encrypted content in Dexie, and then retrieving those values and passing them into the decrypt function. When I use the same encrypt function without passing the values through Dexie, I am able to decrypt without an error. But when I get them from Dexie, there is an error when trying to decrypt... even though they are typed and I have console.logged them to be sure they are in the right format. I followed the docs when using WebCrypto, and it works outside of Dexie, so I am stuck as to why it's not working. The decrypt function takes an iv and ciphertext, and I pass those values right into Dexie after generating them. Something happens in the process of storing and reading them back that causes the decrypt to fail. :\

UPDATE: It actually seems to be an issue with calling decrypt inside the map loop, but I can't figure out how to get it working. Calling it once doesn't throw an error, so maybe you can only have one WebCrypto decrypt running at a given time?

@dfahlander
Collaborator

dfahlander commented Aug 29, 2023

I don't see what error you get. There's a section in the liveQuery docs you should read: specifically, wrap promises returned from Web Crypto calls in Promise.resolve() or Dexie.waitFor(). Other than that, if you don't find a solution, please create a standalone repro of the issue.

@dfahlander
Collaborator

Sorry pasted the wrong link. Should be https://dexie.org/docs/liveQuery()#rules-for-the-querier-function
