nextpnr-himbaechel placement fails on small design #1425
Comments
The GW1NR-9 has 45 columns (excluding the 2 IO columns) and each cell accommodates 6 ALUs, so one row fits ~270 ALUs (45 × 6). |
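The row-capacity arithmetic above can be written out as a quick check. This is only a minimal sketch: the function names and the `overhead` parameter (extra chain cells beyond the N data bits) are assumptions made for illustration, not values taken from nextpnr or the Gowin documentation.

```python
def row_capacity(columns: int, alus_per_cell: int) -> int:
    """Number of ALUs that fit in a single row."""
    return columns * alus_per_cell


def chain_fits(chain_bits: int, capacity: int, overhead: int = 0) -> bool:
    """True if a carry chain of chain_bits ALUs, plus an assumed number
    of extra head/tail cells, fits in one row."""
    return chain_bits + overhead <= capacity


# Figures quoted in the comment above: 45 columns, 6 ALUs per cell.
GW1NR9_ROW = row_capacity(columns=45, alus_per_cell=6)

print(GW1NR9_ROW)                                # 270
print(chain_fits(260, GW1NR9_ROW))               # True
print(chain_fits(270, GW1NR9_ROW, overhead=1))   # False (overhead=1 is hypothetical)
```

With any nonzero overhead, a 270-bit chain no longer fits in a 270-ALU row, which would be consistent with the reported failure at N=270. The actual overhead is not stated in this thread.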
It fails even if --freq is set to a low value.
Is a total failure more rational than a successful completion with higher
propagation delay?
EDIT: The Gowin IDE has no problem handling this successfully.
|
It has nothing to do with frequency - we just don't build ALU chains longer than what fits in one row of the chip. |
FYI, the Gowin IDE is able to fit with N=1000 (I didn't try a higher
number), so there must be a way to fit it.
|
All other architectures will do this if they have to. |
Not necessarily all the way to the beginning of another row; only to the leftmost part of the next row such that this non-carry, slow wire is shortest.
Only if the chain is longer than 2 full rows should this be routed up to the beginning of the next row.
(Such a hard-coded decision for splitting long chains is the obvious starting point, but ultimately there will be conflicts with other long chains that must be tightly placed in a small device... tasks for the future.)
Another way of solving the issue is to change the RTL code to adapt to the targeted device, such that the design contains only chains that are short enough. This is for the cases where logic synthesis does not do it automatically, or for reliable portability of source code. |
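The idea of splitting a long chain into row-sized pieces can be sketched as follows. This is a hypothetical illustration, not how nextpnr or any other placer actually does it: `split_chain` is an invented name, and the sketch ignores any extra cells that might be needed to take the carry out to general routing and bring it back in at the start of the next segment.

```python
def split_chain(chain_len: int, row_capacity: int) -> list[int]:
    """Split a carry chain into segments of at most row_capacity ALUs.

    Each segment would occupy (part of) one row; consecutive segments
    would be linked through ordinary, slower routing.
    """
    if chain_len < 0 or row_capacity <= 0:
        raise ValueError("chain_len must be >= 0 and row_capacity > 0")
    segments = []
    remaining = chain_len
    while remaining > 0:
        seg = min(remaining, row_capacity)
        segments.append(seg)
        remaining -= seg
    return segments


print(split_chain(260, 270))    # [260]
print(split_chain(1000, 270))   # [270, 270, 270, 190]
```

The same arithmetic applies to the RTL-side alternative: a designer could cap each adder at a width that fits one row and connect the partial results explicitly.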
A couple of layman questions:
1. Why use a chain (linear time) and not a tree (logarithmic time)?
2. Why does each element in the chain process only 1 input and not let's
say 3 (using a LUT)?
…On Sun, Jan 19, 2025 at 7:03 AM Adrien Prost-Boucle < ***@***.***> wrote:
Dragging the carry from the right end to the beginning of the row on the
left edge with regular wires seems irrational to me.
|
A ripple-carry adder is much simpler than a carry-lookahead adder and its performance is still mostly sufficient, so a lot of digital logic devices implement hardware support for the RCA. A single 4-LUT can process the addition of one bit position since it requires three inputs (A, B, Ci); output O is provided by the LUT and output Co is provided by the hardware carry block. |
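A bit-level model may make the A, B, Ci / O, Co description above more concrete. This is a plain software sketch of a ripple-carry adder; the function names are illustrative and nothing here reflects the internals of any particular FPGA.

```python
def full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """One bit position: O is what the LUT computes, Co is what the
    dedicated carry block computes."""
    o = a ^ b ^ ci
    co = (a & b) | (ci & (a ^ b))
    return o, co


def ripple_carry_add(x: int, y: int, width: int, ci: int = 0) -> tuple[int, int]:
    """Add two width-bit numbers; the carry ripples from bit 0 upward.
    Returns (sum modulo 2**width, carry out)."""
    result = 0
    carry = ci
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        o, carry = full_adder(a, b, carry)
        result |= o << i
    return result, carry


print(ripple_carry_add(5, 3, 4))    # (8, 0)
print(ripple_carry_add(15, 1, 4))   # (0, 1)
```

A counter increment such as `counter <= counter + 1` is the same structure with one operand fixed, so an N-bit counter needs a carry chain about N positions long.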
This is a case where the hidden constant factor in big-O matters. Because the ripple-carry has dedicated logic and routing between LUTs, carry-in to carry-out propagation time is about a tenth of the propagation time through a LUT (not even including routing delay, which is probably double the LUT propagation delay at minimum). That means your adder needs to be huge for the tree to pay for itself. |
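A back-of-the-envelope model shows how the constant factor plays out. The ratios (carry at about a tenth of a LUT delay, routing at about twice a LUT delay) come from the comment above; the tree model, with ceil(log2(n)) levels each costing one LUT plus one routing hop, is a simplifying assumption and not a description of any real carry-lookahead or prefix adder.

```python
# Arbitrary integer time units, chosen so the ratios match the comment.
LUT = 10       # propagation through one LUT
CARRY = 1      # carry-in to carry-out, about a tenth of a LUT
ROUTING = 20   # one general routing hop, about double a LUT (lower bound)


def ripple_delay(n_bits: int) -> int:
    """Delay of a dedicated carry chain: one carry hop per bit."""
    return n_bits * CARRY


def tree_delay(n_bits: int) -> int:
    """Delay of an idealised log-depth tree built from LUTs and routing."""
    levels = (n_bits - 1).bit_length()  # equals ceil(log2(n_bits)) for n_bits >= 1
    return levels * (LUT + ROUTING)


for n in (16, 64, 260, 1024):
    print(n, ripple_delay(n), tree_delay(n))
```

Under these assumed numbers the chain is far ahead at 64 bits (64 versus 180 units), still slightly ahead at 260 bits (260 versus 270), and the tree only wins clearly for much wider adders (1024 versus 300 at 1024 bits).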
@whitequark, if my math is correct, each 4-input LUT contributes on average about 3 inputs. The table is for a tree but should be about the same for 4-input LUTs in a linear chain. |
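The "about 3 inputs per LUT" estimate for a wide XOR can be checked with a small model. The greedy grouping below is an assumption made for illustration; it is not what abc or any synthesis tool is known to produce, and it says nothing about adders.

```python
def xor_tree(bits: list[int], k: int = 4) -> tuple[int, int, int]:
    """Reduce bits with k-input XOR LUTs, level by level.

    Returns (parity, number of LUTs used, tree depth). A leftover
    single signal is passed to the next level without using a LUT.
    """
    level = list(bits)
    luts = 0
    depth = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), k):
            group = level[i:i + k]
            if len(group) == 1:
                nxt.append(group[0])   # pass-through, no LUT needed
            else:
                value = 0
                for b in group:
                    value ^= b
                nxt.append(value)
                luts += 1
        level = nxt
        depth += 1
    return level[0], luts, depth


parity, luts, depth = xor_tree([1] * 260)
print(parity, luts, depth)   # 0 87 5
```

For 260 inputs this uses 87 LUTs in 5 levels, so each LUT removes 259 / 87, or roughly 3, signals on average.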
Averages mean nothing; you should look into how adder synthesis works. |
Why an adder? The operation here is not a sum but a wide exclusive-or.
|
This is a 260-bit adder. |
I see.
I assumed that the difficulty here is due to the wide exclusive-or
operation rather than the counter but you are right, there is also a wide
addition. I will try with a wide counter alone and see what happens.
…On Sun, Jan 19, 2025 at 11:57 AM Catherine ***@***.***> wrote:
counter <= counter + 1;
|
I'm fairly sure the wide XOR does get synthesized as a tree. At least, abc should do that. |
Consider the design below. It includes N flip-flops and a parity tree with N inputs and one output. When running with N=260, the build completes successfully and the PnR reports very low utilization.
However, when increasing N from 260 to 270, the placement fails with this error:
The sample design used:
parity.v
parity.cst
Software version used (on macOS, arm64)