-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#15889: Fix handling of mantissa rounding to respect ties round to even #16997
base: main
Are you sure you want to change the base?
Conversation
ad64313
to
842747f
Compare
float float_input; | ||
uint32_t expected_mantissa; | ||
float expected_float_output; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please init
// Set shared exponent as original float exponent (ie. skip logic for handling shared exponents) | ||
auto shared_exp = uint32_input >> 23 & 0xFF; | ||
|
||
auto output_mantissa = convert_u32_to_bfp<tt::DataFormat::Bfp8_b, false>(uint32_input, shared_exp, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what false
means here, in both cases?
auto output_mantissa = convert_u32_to_bfp<tt::DataFormat::Bfp8_b, false>(uint32_input, shared_exp, false); | ||
EXPECT_EQ(output_mantissa, expected_mantissa); | ||
|
||
uint32_t uint32_output = convert_bfp_to_u32(tt::DataFormat::Bfp8_b, output_mantissa, shared_exp, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wonder why one function is templated and the other accepts bfp type as argument
|
||
void roundtrip_test_for_mantissa_rounding_with_bfp8( | ||
float float_input, uint8_t expected_mantissa, float expected_float_output) { | ||
auto uint32_input = *reinterpret_cast<const uint32_t*>(&float_input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we allow conversion functions to accept/return or float or uint32t?
- Add tt_metal gtest to test convert_u32_to_bfp rounding logic for bfp8 - Remove unused functions in tt_metal/api/tt-metalium/bfloat8.hpp
842747f
to
6828830
Compare
Ticket
Link to Github Issue
Problem description
See issue for more context and examples. When we handle rounding of mantissa when going from float to block float, we always round up, which doesn't folow the "ties round to even" rule. This PR fixes that and adds lower c++ tests to test the different cases.
What's changed
Checklist