The following is mentioned in the FINN-R paper:
The paper also says that a sliding window unit is included in a max pool layer, but its LUT cost should be fairly small. However, in the model I am working on, the LUT utilization of the StreamingMaxPool_hls layers is significantly larger than the above estimate. The layers use 4-bit data and the resource utilization after synthesis with PE = 1 is as follows.
In the first 3 layers, the actual utilization is almost 7x the A * C estimate. I'm not sure whether a sliding window unit is included in this or not, but even so the utilization is considerably higher than what the paper's formula predicts. I'm guessing that it comes from other logic implemented alongside the parallel comparators. Do you know whether there are any methods to reduce this LUT usage? A way to control the number of parallel comparators would be useful, but I didn't see such an option anywhere.
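To make the trade-off concrete, here is a small back-of-the-envelope sketch of how comparator count (and hence LUT cost) scales with the degree of folding. The per-comparator LUT cost and the layer shapes below are illustrative assumptions, not values from the FINN-R paper:

```python
# Hedged sketch: rough LUT / latency trade-off when folding a max-pool layer.
# lut_per_cmp (LUTs per 4-bit comparator) and the shapes are illustrative
# assumptions, not measured or published values.

def maxpool_estimates(channels, kernel, pe, lut_per_cmp):
    """Return (parallel comparators, LUT estimate, fold factor).

    A fully parallel max pool processes all channels at once (pe == channels);
    a folded pool with pe < channels reuses comparators across channel groups,
    trading extra cycles for fewer LUTs.
    """
    comparators = pe * (kernel * kernel - 1)  # max of k*k values needs k*k-1 compares
    luts = comparators * lut_per_cmp
    fold = channels // pe                     # extra cycles per output pixel
    return comparators, luts, fold

# Fully parallel (StreamingMaxPool-like): 64 channels, 2x2 kernel
full = maxpool_estimates(64, kernel=2, pe=64, lut_per_cmp=4)
# Folded with PE = 8: ~8x fewer comparators at ~8x the cycle count
folded = maxpool_estimates(64, kernel=2, pe=8, lut_per_cmp=4)
print(full, folded)
```

This is only the comparator tree; control logic, stream plumbing, and the sliding window buffer would come on top of it, which may account for part of the gap you are seeing.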
Replies: 1 comment
I found a solution to this. As far as I understand from the source code, StreamingMaxPool does not have a way to reduce folding below the number of channels. However, this layer can be replaced by a Pool layer preceded by a ConvolutionInputGenerator, and these two layers support any folding factor. To make this change, I modified the following part of `step_convert_to_hw` in the build dataflow steps as follows.
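The modified snippet itself isn't shown above, so purely as a hedged sketch: a custom build step along these lines could skip `InferStreamingMaxPool` so that `InferPool` lowers the MaxPool nodes instead. The module path and class names below are assumptions based on FINN v0.10-era code (`finn.transformation.fpgadataflow.convert_to_hw_layers`) and may differ in other versions:

```python
# Hedged sketch of a custom step_convert_to_hw, NOT the exact change from
# the reply above. Class names / module paths are assumptions and may
# differ between FINN versions.
import finn.transformation.fpgadataflow.convert_to_hw_layers as to_hw
from qonnx.transformation.general import GiveUniqueNodeNames


def step_convert_to_hw_pool(model, cfg):
    # Deliberately do NOT apply to_hw.InferStreamingMaxPool() here, so the
    # MaxPool nodes remain in the graph for InferPool to pick up.
    # InferPool lowers them to a Pool op fed by an Im2Col node ...
    model = model.transform(to_hw.InferPool())
    # ... and InferConvInpGen turns that Im2Col into a
    # ConvolutionInputGenerator; both resulting layers accept PE < channels.
    model = model.transform(to_hw.InferConvInpGen())
    model = model.transform(GiveUniqueNodeNames())
    return model
```

The custom step would then replace the default `step_convert_to_hw` in the build config's step list, after which the PE attribute of the resulting Pool node can be set below the channel count during folding.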