Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use u32 for log array indexing and length #42

Open
spl opened this issue Mar 6, 2020 · 2 comments
Open

Use u32 for log array indexing and length #42

spl opened this issue Mar 6, 2020 · 2 comments

Comments

@spl
Copy link
Contributor

spl commented Mar 6, 2020

The log array buffer format uses a u32 in the control word to determine the length (number of elements) in the log array. But this is exposed as a usize publicly for indexing and getting the length.

I'm guessing you have no plans to develop terminus-store for an architecture in which std::mem::size_of::<usize>() < 4 (i.e. the bus width is < 32 bits), so you won't have a problem with usize being too small. But maybe another problem will appear due to casting or running on a 32-bit platform.

It seems like it would be better to more accurately represent the index and length types. For one thing, given that a log array is read from a disk and (I think) should be portable across systems, it will never have a length determined by a particular system's architecture.

Unfortunately, changing usize to u32 in the appropriate places in logarray.rs introduces a cascade of errors around the code. I haven't looked into these. Maybe a more thorough audit of of data structure lengths and indexing is needed.

This is possibly a minor issue, but I just wanted to make a note of it before I moved on and forgot about it.

spl added a commit to spl/terminusdb-store that referenced this issue Mar 6, 2020
The log array expects a 32-bit or greater architecture. This is just a
compiler check to confirm that.

Related to terminusdb#42.
@matko
Copy link
Member

matko commented Mar 14, 2020

Right, I have no intention of supporting smaller architectures.

Much of rust's api dealing with slices or offsets or indexes expect an usize. Exposing len as an usize is therefore convenient. The alternative would be to cast whatever comes out of len in such situations. The problem with smaller architectures doesn't go away by moving the cast to the caller. Making logarray work on 16 bit would require some more effort. So I'm in favor of keeping this a usize.

Your assert is a good idea and should let us get away with this. That said, does rust even support small architectures?

@spl
Copy link
Contributor Author

spl commented Mar 14, 2020

I agree with not supporting anything less than 32-bit. Yes, I believe Rust does support (or would not be averse to supporting) some architectures with smaller bus sizes. There's a lot of movement toward making it work well in embedded systems.

Having looked more at the data structures used in terminus-store, I think they are definitely optimized for 64-bit architectures. I think this is the right way to go. I'm rather concerned surprising things might happen on a 32-bit architecture. I suppose there are too options:

  1. Attempt to support 32-bit and 64-bit. This should involve added 32-bit to CI.
  2. Forget about 32-bit for now and focus on 64-bit. If needed, come back to 32-bit later.

I'm guessing 32-bit would be a rather minor thing to focus on, and so option 2 is probably the better way to go. If that's the case, just in case anybody who happened to try running terminus-store on a 32-bit machine, I think we should add a static check similar to #43. I would put it in bitarray.rs, since that's an obvious case where you have a 64-bit length that is cast to usize.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants