Conversion error when using posit2 #359
@rkriemann Yes, that is to be expected: posit2 is still in development and not functional. It will replace posit when it is done; it was never intended to be a 'native' number system name, I just needed a non-clashing name so the two could coexist in the same tree. posit2 is a multi-limb implementation, so it should be a lot faster than the bit-level implementation of the current posit class. All the other number systems in Universal are now limb-based, but posits are showing their age, as posit was the first number system implemented, way back in 2017. In the posit tree we used specializations to provide fast implementations of the standard posits so that they would be useful in actual application codes, but that left non-standard posit performance two orders of magnitude slower, and that is what the new limb-based implementation is trying to fix. I don't have any time to complete the limb-based posit implementation, and I am looking for somebody interested in completing the work. Would you be interested in helping out?
I thought so, but I had hoped that it at least works reasonably well for representing floating-point numbers, which is all I need (for now). In fact, I'm interested in the optimal storage of arrays of floats at a given precision, i.e., with a bit size that is not necessarily a multiple of 8 (a small packing sketch follows below). Right now I have an IEEE 754-derived scheme, but I was also experimenting with posits.
Time is scarce on my side as well, but I may be inclined to look into the conversion part and make sure that the resulting bits are identical to those of the standard posit implementation (assuming that makes sense). However, I cannot guarantee anything :-(. RGDS
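For illustration, a minimal sketch of what such non-byte-aligned storage involves (hypothetical helpers, not the IEEE 754-derived scheme mentioned above): fixed-width bit codes are appended to a byte buffer at arbitrary bit offsets.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append the low 'nbits' bits of 'code' to a byte buffer, tracking the
// current bit position so consecutive codes can share bytes.
void pack_bits(std::vector<uint8_t>& buf, size_t& bitpos, uint64_t code, unsigned nbits) {
    for (unsigned i = 0; i < nbits; ++i, ++bitpos) {
        if (bitpos % 8 == 0) buf.push_back(0);   // start a new byte
        if ((code >> i) & 1u) buf.back() |= uint8_t(1u << (bitpos % 8));
    }
}

// Read 'nbits' bits back, starting at 'bitpos'.
uint64_t unpack_bits(const std::vector<uint8_t>& buf, size_t& bitpos, unsigned nbits) {
    uint64_t code = 0;
    for (unsigned i = 0; i < nbits; ++i, ++bitpos)
        code |= uint64_t((buf[bitpos / 8] >> (bitpos % 8)) & 1u) << i;
    return code;
}
```

Real implementations batch this per limb/word rather than per bit; the per-bit loop here is only for clarity.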
It is sad to say, but posit is the only number system that isn't optimized to be limb-based; all the other types are. The community has been asking for types like FP8, FP16, TensorFloat, and BF16, as well as the more advanced lns and dbns, but nobody has been asking for posits.
@rkriemann If you have a reference to compare against, I'd love to hear whether you can leverage it. I was planning to add a fast …
I implemented a storage scheme based on cfloats and compared it to the closest existing implementation (aflp). The source code can be found at https://gitlab.mis.mpg.de/rok/hlr/-/blob/master/include/hlr/utils/detail/cfloat.hh

As for the application (hierarchical matrices): a given dense matrix is partitioned into many blocks (of different sizes), and for most blocks a low-rank approximation is computed and the dense data replaced by it, thereby already introducing an error. The data in all blocks is then represented (independently) via cfloat/aflp with the (minimal) precision bits chosen such that the overall error is not increased. The mantissa bits are then increased such that the total number of bits is a multiple of 8, for byte-aligned storage. The result is a data representation with a very small memory footprint that still permits full matrix arithmetic (the arithmetic is still done in FP64!).

The implementation in both cases is very similar: first determine the dynamic range for the exponent bits, then scale the data to fit into the chosen exponent, and finally write the truncated results to memory. (A sketch of this pipeline follows after the results below.)

I picked a standard model problem with a matrix size of 524,288 x 524,288 and different errors (overall uncompressed memory is between 7.5 GB and 20 GB). Hardware is a 2-CPU AMD Epyc 9554; the compiler is GCC 12.1 (full optimization activated). Timings are the median of 10 runs. The compression ratio is identical in both cases, as the chosen precision/exponent bits are equal.

Results (eps defines the error)

double -> aflp/cfloat speed in sec.
cfloat is about two times slower than aflp. I did not optimize aflp much, but I tried to arrange everything such that the compiler is able to auto-vectorize; maybe this explains the difference.

aflp/cfloat -> double speed in sec.
Decompression speed should be similar to (or lower than) the compression speed. However, cfloat is much slower here, which seems strange.

overall error vs. uncompressed data
Only minor differences in the error (as expected).
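The pipeline referenced above, as a minimal sketch (hypothetical names, a simplified stand-in rather than the real aflp or cfloat code; it keeps full binary32 values instead of a compact bit layout, so it only illustrates the scan, scale, and truncate steps):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

struct Compressed {
    double scale;               // needed to undo the scaling on decompression
    std::vector<float> values;  // mantissa-truncated binary32 payload
};

// (1) scan the dynamic range, (2) scale into [-1,1] so the exponent range
// is bounded, (3) truncate each mantissa to 'mant_bits' bits by masking
// the binary32 encoding. Illustrative only, not the aflp/cfloat API.
Compressed compress_truncate(const std::vector<double>& data, unsigned mant_bits) {
    assert(mant_bits <= 23);
    double vmax = 0.0;
    for (double v : data) vmax = std::max(vmax, std::fabs(v));  // (1)
    const double scale = (vmax > 0.0) ? 1.0 / vmax : 1.0;

    const uint32_t mask = ~((uint32_t(1) << (23 - mant_bits)) - 1u);
    Compressed c{scale, {}};
    c.values.reserve(data.size());
    for (double v : data) {
        float f = static_cast<float>(v * scale);                // (2)
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);
        bits &= mask;                                           // (3)
        std::memcpy(&f, &bits, sizeof f);
        c.values.push_back(f);
    }
    return c;  // decompress: data[i] is approximately c.values[i] / c.scale
}
```

A real variant would also choose the exponent bit count from the scanned range and emit only sign, reduced exponent, and mantissa bits, as the aflp and cfloat schemes discussed above do.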
Hi,
I'm experimenting with the posit2 implementation in an approximate-storage setup (no arithmetic needed), and the error when converting from double/float seems to be way off.
Example:
with output
I'm using universal v3.72.1.e6ef6d76 with g++ v13.2
RGDS
Ronald
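For reference, a minimal round-trip conversion test of the kind described (a sketch with illustrative values, not the original example): convert a double to a posit and back, and inspect the error. The posit<16,2> parameters and the test value are assumptions, and the posit2 include path in the comment is an assumption about the development tree.

```cpp
#include <cmath>
#include <iostream>
#include <universal/number/posit/posit.hpp>     // stable, bit-level posit
// #include <universal/number/posit2/posit.hpp> // limb-based posit under development

int main() {
    using Posit = sw::universal::posit<16, 2>;  // 16-bit posit, es = 2
    const double x = 0.3333333333333333;
    Posit p = x;                    // double -> posit conversion
    const double back = double(p);  // posit -> double
    std::cout << "value : " << x << '\n'
              << "posit : " << back << '\n'
              << "error : " << std::fabs(x - back) << '\n';
}
```

Per the report above, the posit2 variant of this round trip produces errors that are way off, while the goal stated earlier in the thread is for its conversion to be bit-identical to the standard posit implementation.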