Quick repair increasing database size #934

Open
SmarakNayak opened this issue Jan 14, 2025 · 15 comments
@SmarakNayak

I run a fork of ord and have noticed that enabling quick_repair has greatly increased my database size. My index went from 400GB -> 2+TB.

My index with quick repair enabled has ~525m allocated pages, but only ~57m leaf pages & 2m branch, whereas my index with quick repair off looks a lot more reasonable with 45m allocated, 32m leaf & 1m branch. Please note they're indexed to slightly different heights, so it's not a perfect comparison.

When indexing the standard ord index without my extra tables, there is no size difference between turning quick_repair on & off.

Is this expected behaviour, or am I likely doing something wrong/running into some edge case?

@cberner
Owner

cberner commented Jan 16, 2025

Hmm. It's hard to say. Can you call WriteTransaction::stats(), print it, then call Database::check_integrity(), and then print the stats again?
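In code, that diagnostic would look roughly like this (a minimal sketch; the dump_stats wrapper and error handling are mine, not part of redb):

// Print stats, run an integrity check, then print stats again.
// Assumes `path` points at the affected database file.
fn dump_stats(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let mut db = redb::Database::open(path)?;

    let tx = db.begin_write()?;
    println!("before check_integrity: {:#?}", tx.stats()?);
    tx.abort()?;

    db.check_integrity()?;

    let tx = db.begin_write()?;
    println!("after check_integrity: {:#?}", tx.stats()?);
    tx.abort()?;
    Ok(())
}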

@lhallam

lhallam commented Jan 17, 2025

I've noticed something similar since updating & setting quick repair: db files started growing significantly beyond the size of their content (500GB for 100MB of data in the worst case). I found that calling compact repeatedly (ignoring a return value of false & checking fragmented_bytes instead) fixes it.
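The workaround loop looks something like this (a rough sketch; the function name and the stopping condition are mine, not something redb prescribes):

// Keep calling compact(), ignoring its bool return value, and stop once
// the fragmented_bytes reported by the stats no longer shrinks.
fn compact_until_stable(db: &mut redb::Database) -> Result<(), Box<dyn std::error::Error>> {
    let mut prev = u64::MAX;
    loop {
        let _ = db.compact()?; // a return value of `false` is ignored on purpose
        let tx = db.begin_write()?;
        let fragmented = tx.stats()?.fragmented_bytes();
        tx.abort()?;
        if fragmented >= prev {
            break; // fragmentation stopped shrinking
        }
        prev = fragmented;
    }
    Ok(())
}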

Here are the table & database stats for one such file before compaction:

TableStats { tree_height: 4, leaf_pages: 33190, branch_pages: 485, stored_leaf_bytes: 113678978, metadata_bytes: 3847212, fragmented_bytes: 67129682 }
TableStats { tree_height: 1, leaf_pages: 1, branch_pages: 0, stored_leaf_bytes: 50, metadata_bytes: 8, fragmented_bytes: 4038 }

DatabaseStats { tree_height: 5, allocated_pages: 45086, leaf_pages: 33191, branch_pages: 486, stored_leaf_bytes: 113679028, metadata_bytes: 3847821, fragmented_bytes: 515278544575, page_size: 4096 }

check_integrity had no impact.

@cberner
Owner

cberner commented Jan 18, 2025

Hmm, that sounds like a bug. What's the easiest way to reproduce this?

@cberner
Owner

cberner commented Jan 18, 2025

Also CC'ing @mconst and @raphjaph who might have ideas

@SmarakNayak
Author

I had already called check_integrity on the db (compact didn't work due to a TransactionInProgress error), so I reindexed to get db stats pre and post the integrity check.

Now it's gone back to normal size, so the large file size isn't consistently reproducible.

The large DB stats post integrity check were:

DatabaseStats { 
  tree_height: 9, 
  allocated_pages: 524796201, 
  leaf_pages: 57751073, 
  branch_pages: 1850448, 
  stored_leaf_bytes: 191970331840, 
  metadata_bytes: 11581058874, 
  fragmented_bytes: 153144026118, 
  page_size: 4096 
}

@mconst
Contributor

mconst commented Jan 19, 2025

Hmm, my guess is this is another manifestation of #654 and #810. Those two look like separate issues, but actually the underlying cause is exactly the same: whenever there's an old commit that needs to be kept alive (because the commit contains a savepoint, or because it's the latest durable commit, or because it has a live read), redb becomes too conservative about freeing pages. Specifically, it fails to free pages that are both allocated and released after the commit that's being preserved, even though those pages definitely aren't reachable from it. When this happens, writes to the database continue to allocate new pages like normal but they don't free old pages, so the file grows very quickly.

The issue isn't caused by quick-repair, but quick-repair exacerbates it by causing additional write traffic. Specifically, when quick-repair is enabled, each commit needs to write additional data equal to about 0.02% of the total file size, rounded up to the nearest megabyte. This data gets overwritten each time, so it doesn't normally accumulate -- but whenever you have an old commit that's being kept alive for any reason, that blocks page freeing due to the issue above! So if you (say) create a savepoint and then make a bunch of tiny commits with quick-repair enabled, each commit will cause the file to grow by at least 1 MB, and that space won't be reused until you delete the savepoint. The exact same thing happens without quick-repair if you create a savepoint and then make a bunch of commits that overwrite 1 MB of data each.

Note that in @lhallam's case, the pages had already been freed by the time you ran stats(); that's why they showed up as "fragmented bytes" (i.e. empty space in the file). It would be nice if redb were more aggressive about shrinking the file in this case, but that's a separate issue, and you already found the workaround: call compact() repeatedly until the file stops shrinking.

In @SmarakNayak's case, the pages haven't been freed because the old commit that's keeping them around is still live. That's why compact() is failing, and why redb isn't able to reuse the space yet. Perhaps there's an old savepoint that hasn't been deleted?
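If it is a forgotten persistent savepoint, listing and deleting them looks roughly like this (a sketch only, assuming the fork uses redb's persistent savepoints rather than ephemeral ones):

// List any persistent savepoints still alive in the database and delete them,
// so redb can start reclaiming the pages they were pinning.
fn drop_stale_savepoints(db: &redb::Database) -> Result<(), Box<dyn std::error::Error>> {
    let tx = db.begin_write()?;
    let ids: Vec<u64> = tx.list_persistent_savepoints()?.collect();
    println!("persistent savepoints: {ids:?}");
    for id in ids {
        tx.delete_persistent_savepoint(id)?;
    }
    tx.commit()?;
    Ok(())
}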

@lhallam

lhallam commented Jan 19, 2025

Not sure if this deserves its own issue, but I figured out that the write amplification I observed was caused by Table::retain:

This sequence of transactions results in a 500MB file for 11MB of data. Replacing retain with an iter to gather the keys (and evaluate the predicate) and then removing them all is much faster and uses ~70MB; see the sketch after the test below.

#[test]
fn redb_retain_write_amplification() -> anyhow::Result<()> {
    use redb::*;

    // Fixed seed so the key set and value sizes are reproducible.
    fastrand::seed(0);
    let keys = (0..10_000).map(|_| fastrand::u64(..)).collect::<Vec<_>>();

    let _ = std::fs::remove_file("test.redb");
    let db = redb::Database::create("test.redb")?;

    for _ in 0..5 {
        let tx = db.begin_write()?;
        println!("{:#?}", tx.stats()?);

        let mut table = tx.open_table(TableDefinition::<u64, &[u8]>::new("table"))?;
        // Clear the table via retain(), then reinsert every key with a new
        // randomly sized value; this is the step that amplifies writes.
        table.retain(|_, _| false)?;

        for key in &keys {
            const MAX: usize = 1024 * 10;
            let n = (MAX as f64).powf(fastrand::f64()) as usize;
            table.insert(key, &[0u8; MAX][..n])?;
        }

        drop(table);
        tx.commit()?;
    }

    Ok(())
}
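For comparison, the iterate-then-remove variant mentioned above is sketched below as a drop-in replacement for the table.retain(|_, _| false) line (with that predicate it simply deletes every key; a real predicate would be evaluated while gathering the keys):

// Gather the keys in a first pass, then delete them in a second pass.
let keys_to_delete: Vec<u64> = table
    .iter()?
    .map(|entry| entry.map(|(k, _v)| k.value()))
    .collect::<Result<_, _>>()?;
for key in keys_to_delete {
    table.remove(&key)?;
}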

@cberner
Owner

cberner commented Jan 21, 2025

Ah yes, @mconst 's explanation seems right.

@lhallam thanks for the report. I see what's wrong with it: retain() is poorly optimized.

@cberner
Owner

cberner commented Jan 21, 2025

@lhallam can you try the latest master?

@lhallam

lhallam commented Jan 21, 2025

Yes that reduces the size significantly, thank you!

@emilcondrea

emilcondrea commented Feb 2, 2025

I am wondering if the fix would solve the issue I encountered today with ord. During a reorg, a rollback from a savepoint was initiated; it took 2 hours to roll back and consumed a lot more RAM on ord22 (redb 2.3.0) than on ord19 (redb 2.0.0). Ord19 restored the savepoint in a couple of seconds, while ord22 took 2 hours.

@mconst
Contributor

mconst commented Feb 3, 2025

@emilcondrea I think that's a separate issue. Restoring from a savepoint got much slower in redb 2.1.2 (due to baa86e7); it looks like this is fixed in 2.4.0 (by 51ca54c). Could you try with redb 2.4.0 and see if that helps?

@emilcondrea

Thank you! Will give it a try

@emilcondrea

@mconst thanks for the suggestion, tried 2.4.0 and it fixes the problem with slow restoring from a savepoint.

@mconst
Contributor

mconst commented Feb 10, 2025

Yay, glad to hear it's working!
