The following is mentioned in the FINN-R paper:
The paper also says that a sliding window unit is included in a max pool layer, but its LUT cost should be fairly small. However, in the model I am working on, the LUT utilization of the StreamingMaxPool_hls layers is significantly larger than the above estimate. The layers use 4-bit data and the resource utilization after synthesis with PE = 1 is as follows.
In the first 3 layers, the actual utilization is almost 7x the A * C estimate. I'm not sure whether a sliding window unit is included in this or not, but even so the utilization is considerably higher than what the paper's formula predicts. I'm guessing that it comes from other logic implemented alongside the parallel comparators. Do you know whether there are any methods to reduce this LUT usage? A way to control the number of parallel comparators would be useful, but I didn't see such an option anywhere.
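To make the trade-off concrete, here is a small back-of-the-envelope sketch of how comparator count (and hence LUT cost) scales with the degree of folding. The per-comparator LUT cost and the layer shapes below are illustrative assumptions, not values from the FINN-R paper:

```python
# Hedged sketch: rough LUT / latency trade-off when folding a max-pool layer.
# lut_per_cmp (LUTs per 4-bit comparator) and the shapes are illustrative
# assumptions, not measured or published values.

def maxpool_estimates(channels, kernel, pe, lut_per_cmp):
    """Return (parallel comparators, LUT estimate, fold factor).

    A fully parallel max pool processes all channels at once (pe == channels);
    a folded pool with pe < channels reuses comparators across channel groups,
    trading extra cycles for fewer LUTs.
    """
    comparators = pe * (kernel * kernel - 1)  # max of k*k values needs k*k-1 compares
    luts = comparators * lut_per_cmp
    fold = channels // pe                     # extra cycles per output pixel
    return comparators, luts, fold

# Fully parallel (StreamingMaxPool-like): 64 channels, 2x2 kernel
full = maxpool_estimates(64, kernel=2, pe=64, lut_per_cmp=4)
# Folded with PE = 8: ~8x fewer comparators at ~8x the cycle count
folded = maxpool_estimates(64, kernel=2, pe=8, lut_per_cmp=4)
print(full, folded)
```

This is only the comparator tree; control logic, stream plumbing, and the sliding window buffer would come on top of it, which may account for part of the gap you are seeing.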
Replies: 1 comment
I found a solution to this. As far as I understand from the source code, StreamingMaxPool does not have a way to reduce folding below the number of channels. However, this layer can be replaced by a Pool layer preceded by a ConvolutionInputGenerator, and these two layers support any folding factor. To make this change, I modified the following part of `step_convert_to_hw` in the build dataflow steps as follows.
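The modified snippet itself isn't shown above, so purely as a hedged sketch: a custom build step along these lines could skip `InferStreamingMaxPool` so that `InferPool` lowers the MaxPool nodes instead. The module path and class names below are assumptions based on FINN v0.10-era code (`finn.transformation.fpgadataflow.convert_to_hw_layers`) and may differ in other versions:

```python
# Hedged sketch of a custom step_convert_to_hw, NOT the exact change from
# the reply above. Class names / module paths are assumptions and may
# differ between FINN versions.
import finn.transformation.fpgadataflow.convert_to_hw_layers as to_hw
from qonnx.transformation.general import GiveUniqueNodeNames


def step_convert_to_hw_pool(model, cfg):
    # Deliberately do NOT apply to_hw.InferStreamingMaxPool() here, so the
    # MaxPool nodes remain in the graph for InferPool to pick up.
    # InferPool lowers them to a Pool op fed by an Im2Col node ...
    model = model.transform(to_hw.InferPool())
    # ... and InferConvInpGen turns that Im2Col into a
    # ConvolutionInputGenerator; both resulting layers accept PE < channels.
    model = model.transform(to_hw.InferConvInpGen())
    model = model.transform(GiveUniqueNodeNames())
    return model
```

The custom step would then replace the default `step_convert_to_hw` in the build config's step list, after which the PE attribute of the resulting Pool node can be set below the channel count during folding.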