Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Floating Point Square Root Module with Variable Odd Mantissa and No Rounding #188

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lucasphillips
Copy link

Description & Motivation

Sometimes we want to be able to do square root for floating points.
This is especially true for ML hardware.
This PR addresses that by adding a Floating Point Square Root module and extending
Fixed Point to have multiplication

Related Issue(s)

#171

Testing

Test cases were created for the Fixed Point Square Root and Floating Point.

Fixed Square Root tested:

  1. Tested case of 1.0
  2. Tested case of 1.5
  3. Tested case of 1.7
  4. Tested case of 1.125
  5. Tested case of 2.25
  6. Tested case of 3.999
  7. Tested case of 3.999998

Floating Point Tested:

  1. Non-normalized numbers
  2. Error conditions (negative numbers and negative infinity)
  3. Targeted normalized numbers
  4. Randomized 4Ki values within rounding error

Backwards-compatibility

Is this a breaking change that will not be backwards-compatible? If yes, how so?

No.

Documentation

Does the change require any updates to documentation? If so, where? Are they included?

Yes. This required changes to the Floating Point and Fixed Point docs within the HCL library. They are included within this change

Add a base class for floating point square root.

Implement a basic floating point square root module. Mantissa is
variable, but limited to odd numbers from 1 to 51. No rounding
implemented.

Implement a fixed point square root for values with three integral bits
and up to 51 fractional bits.

Add a fixed point square root method.

Add tests for fixed-point and floating-point square root.

Implements intel#171

Co-authored-by: James Farwell <[email protected]>
Co-authored-by: Curtis Anderson <[email protected]>
Signed-off-by: Stephen Weeks <[email protected]>
Signed-off-by: James Farwell <[email protected]>
Signed-off-by: Curtis Anderson <[email protected]>
@mkorbel1 mkorbel1 requested review from desmonddak and mkorbel1 March 7, 2025 23:49
@mkorbel1 mkorbel1 linked an issue Mar 7, 2025 that may be closed by this pull request
/// Abstract base class
abstract class FixedPointSqrtBase extends Module {
/// Width of the input and output fields.
final int numWidth;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just 'width'?

final expError = Const(0);

expect(compResult.floatingPointValue.withinRounding(expResult), true,
reason: '\t${fp.floatingPointValue} '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an opportunity to use

reason:   '''
${fp.floatingPointValue} 
<avoiding having to inject newlines into a string>
'''
);


final sqrtDUT = FloatingPointSqrtSimple(fp);
final rand = Random(513);
for (var i = 0; i < 4096; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you compute the limit (4096) from eWidth and mWidth?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly? Not 100% the question here. I wanted to choose something reasonably large since we do end up discarding negative numbers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking ahead to a variable width component for narrow and wider sqrts -- it would be great if the testing framework scaled.

// An abstract base class defining the API for floating-point square root.
//
// 2025 March 5
// Authors: James Farwell <[email protected]>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Align your names to get top billing -- like to see your names clearly.

: super(
definitionName: 'FloatingPointSquareRootSimple_'
'E${a.exponent.width}M${a.mantissa.width}') {
final outputSqrt = FloatingPoint(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed the 'normal' tests: are subnormals supported? If not, you should check and throw.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I forgot that! I'll fix that this weekend though I may need pointers on how to do that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have an error flag, so you would just need to assign it to ~fp.isNormal();


// use fixed sqrt unit
final aFixed = FixedPoint(signed: false, m: 3, n: a.mantissa.width);
aFixed <= [Const(1, width: 3), a.mantissa.getRange(0)].swizzle();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a chance, in a test you can check out your naming of Verilog upon output. You can create good names by attaching .named('aFixed'); especially after swizzles.

File('name.sv').writeAsStringSync(mod.generateSynth());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if that was "too many names" when writing this, so tried to keep it to named structures.

Copy link
Contributor

@mkorbel1 mkorbel1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome work! Got me thinking about expanding for multi-cycle or pipelined versions :)

// Weeks <[email protected]>

/// An abstract API for fixed point square root.
library;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this library keyword here has no effect and is unnecessary? Or is there a reason?

import 'package:rohd/rohd.dart';
import 'package:rohd_hcl/rohd_hcl.dart';

/// Abstract base class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these doc comments go into API docs, in general it's best to describe more about what the class does rather than why the class is here in the context of the file. i.e. this is not just an "abstract base class", but a base class for fixed-point square root

some tips: https://dart.dev/effective-dart/documentation

late final FixedPoint a;

/// getter for the computed output.
late final FixedPoint sqrtF = a.clone(name: 'sqrtF')..gets(output('sqrtF'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: why the name "sqrtF" rather than just "sqrt"?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this answers your question below! We did originally have both do sqrt. However, dart was having naming conflicts with math.sqrt() that we needed to run our tests. And honestly, sqrt didn't seem great after that. So I opted for "sqrtF" for sqrt fixed and "sqrtR" which is sqrt result. Though that one should probably be "sqrtFP".

late final FpType a;

/// getter for the computed [FloatingPoint] output.
late final FloatingPoint sqrtR = (a.clone(name: 'sqrtR') as FpType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: what's the "R" mean? why not "sqrt"?

late final FloatingPoint sqrtR = (a.clone(name: 'sqrtR') as FpType)
..gets(output('sqrtR'));

/// getter for the [error] output.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by this setup here with error and errorSig. Since the error is already a single-bit Logic, there's no need for this style with an internal and external version of the signal, like there is with the "casting" of LogicStructures. Unless I'm misunderstanding, I think you can just do a single late final Logic error = output('error'); and use it for both internal and external purposes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally was planning to use the internal error signal to also be used for logic testing throughout when we had our first rough draft of the block diagram. I think I just missed this.

final int mantissaWidth;

/// The [clk] : if a non-null clock signal is passed in, a pipestage is added
/// to the square root to help optimize frequency.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the clk and reset used anywhere? No sequentials in here (yet?)? Did you mean to use it or is this plumbed for future upgrades?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to make future work easier. In most cases, like all ALUs, you get your best bang for buck when it is pipelined. I wasn't sure how to do that yet, but I wanted to have the ground work. It also keeps it consistent with the other FloatingPoint modules that are also capable of pipeline features. So setup for future work to make it a bit easier (and once I figure out how to do it in ROHD).

@@ -123,6 +123,13 @@ class FixedPoint extends Logic {
}
}

/// Multiply
Logic fpMultiply(dynamic other) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can either move this directly into the implementation of operator *, or at least make this private. If you keep it as a separate function, why not just _multiply since it's already applicable to FixedPoint?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I will probably need some Dart education on. I like how we can use .eq() and .lte() methods or the override. That could just be my preference so not married to it. Would like to set it private though.

Logic fpMultiply(dynamic other) {
_verifyCompatible(other);
final product = Multiply(this, other).out;
return FixedPoint.of(product, signed: false, m: product.width - n, n: n);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since Multiply has the same output width as inputs, product.width - n should be the same as m right? Or is it 1 bigger due to sign bit (if signed)?

Something to consider: do we want FixedPoint to widen or keep the same width when multiplied by another FixedPoint? To keep with API consistency, it seems reasonable to keep the same width, since FixedPoint is a type of Logic. But do we need a way to appropriately "expand" a FixedPoint, sort of like a signExtend but that makes it the "right" shape (m and n) for an output of a multiplication?

This is probably beyond the scope of this PR, but since this change is here and you're the first user of this new *, thought I'd ask for thoughts :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I spent a lot of time on this last night. You do want your Fixed point to grow, or you loose A LOT of precision. Specifically, when we multiply to N values we end up with 2*N - 1 result. Because our n doesn't move, that means our m grows by the difference of the output width and n: m = total_width - n. So you could sign extend, but you will lose data. If you look at most multipliers in general your product usually grows with it. You can than leave it to the user to truncate how they will.

I'm hesitant to force it to the same size as the incoming products because you just loose so much information. Instead of opting to give precise precision to the user to then truncate as they want (or make an override to do this?)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did my math here wrong (darn being out of practice for so long). It is still 2*N-1 but the results are 2*m and 2*n. So yes, we either need to sign extend or not. Going to push a fix for that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for example, if we do 3.5 * 1.25 we expect 4.375. These numbers are (11.10 *01.01) = 0100.0110. If we truncate that from the Q4.4 format back to Q2.2 format we get 00.01 which is 0.5 and inaccurate. So I DO need to fix this to be m: 2*m, n: 2*n but I think we should keep it widened. (Never not double your check after coding late at night so I'm glad you called this out).

tearDown(() async {
await Simulator.reset();
});
test('sqrt(1)', () async {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: since most of these tests look very similar, you could consider making a list of "test vectors" and then generate all these tests, to avoid copy-paste

final fpvExpected = FixedPointValue.ofDouble(sqrt(3.9999998),
signed: fixed.signed, m: fixed.m, n: fixed.n);
expect(fpvResult, fpvExpected);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: how did you arrive at these numbers for the tests? are there any special corner cases to consider? what if there's a negative input?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Floating point square root
4 participants