-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Lookahead map status #517
Comments
I have resumed the work on getting the lookahead in a mergeable state with upstream. First I successfully reproduced the results I have obtained a couple of weeks ago, than I have moved to use also the latest yosys master+wip and got improvements on placer run-time and quality of results as well. By further analyzing the router behaviour with debugging and profiler enabled I still see a non-trivial amount of connections that take way too long compared to others as well as several backtracking step due to incorrect lookahead costs retrieval. I need to collect more data to have a clearer view of what is happening. One behaviour that I have seen, which should not happen IMO, is that the placer produces a solution with a ~1.5 ns lower than the final CPD obtained after the routing step. |
By analyzing the routing debug information when running some of the symbiflow-arch-defs tests, I could see that the
One thing that I need to do to have a clearer view of what is happening is to have a version of VPR that implements both the connection box lookahead and the new lookahead map. This because I have tested the whole flow (packing, placement and routing) with two different VtR versions. In order to better compare the results, it is needed to have same .place and .net files in input to the routing step. |
After some debugging of the router behaviour, I have applied some fixes to the new lookahead map code. The fixes I have added are:
I have run the baselitex test, once with the current master+wip and the arch-defs-master (baseline) and the second time with the new lookahead VPR code (https://github.com/acomodi/vtr-verilog-to-routing/tree/new-lookahead-map-fixes) and the archdefs master with some fixes to have it running with the new lookahead map (https://github.com/antmicro/symbiflow-arch-defs/tree/new-lookahead-map-integration) (changes). Both the tests were run with the latest yosys master+wip which introduced a worsening in CPD and runtime as well. Baseline:
Router run-time: 252.15 seconds Changes:
Router run-time: 183.15 seconds To sum up, the tests performed show a reduce in run-time equal to 27.36% and an increase in CPD equal to 3.10% w.r.t. the baseline.
|
I think this is close enough that we really should work on getting the lookahead changes on upstream. I expect that the reduction is runtime could be converted into the same runtime with same or better CPD by lowing the astar factor. Overall, great work. Let's start to get your changes in upstream VPR. |
To be clear, the new lookahead is based on an extension of the upstream map lookahead, comparing against the downstream connection box lookahead? |
Indeed. Actually the new lookahead uses code from both upstream map and connection box, where the connection boxes are not used anymore. From upstream
From downstream connection boxes:
|
Okay, so the sampling method and Dijkstra expansion (for place delay) should be both go upstream, yes? The penalty cost stuff is something we need to handle separately. |
For my own clarity, all the changes you have that are not upstream only affect the generation of the lookahead files, not in their use? |
Djikstra expansion is actually related to the lookahead generation, when expanding the sample nodes to find the costs. The upstream lookahead uses a different way of expanding nodes which is less performant than the connection box expansion.
To be clear, the penalty cost stuff is related to assigning a cost to the lookahead map entry based on their distance from the origin node, here
The only change in the use of the lookahead file is an addition to get the CHAN -> IPIN delay cost as well as the OPIN -> CHAN. Apart from that, the ways the lookahead is used do not change. |
PR upstream open here: verilog-to-routing#1449 Additional note: the fixes resulted in an increase of the |
Update: I have encountered one issue though that regards QoR and runtime with the extended lookahead and symbiflow targets. The problem is that I am not able to reproduce the results reported in the above comment. |
Results differences I was able to identify the issue related to the differences in the results obtained recently with the one reported in the above comment The difference is due to the place delay matrix calculation. For some reason, while running the test that got me the results of the new lookahead, the delay matrix that was used was the one generated with the For what I have seen. the dijkstra delay matrix generation, instead, uses no lookahead to get the delays, and I think it makes sense as otherwise we would fall back to use the astar method to calculate the delays. Other issues:
During debugging, I also found a bug with the new lookahead code. The lookahead matrix is indexed as follows:
The problem was that CHAN type was always the same. This was resulting by the fact that I have erroneously got the rr_type from the target node instead of the source node. This bug got undetected as the node type was wrong both when generating the lookahead and when querying it to get the cost entry. By fixing this issue and having a disctinction between CHANX and CHANY of the source node, the router performs worse than the case in which CHANX and CHANY were merged in a unique entry.
The other thing I debugged was the aster method to get the delay matrix huge runtime. For instance, I get the following unroutable connection:
Clearly, there CLBs cannot be connected to the REFCLK of the IDELAYCTRL as that pin is within the clocking network and can be connected to only a clock tile. In the case of using the connection box lookahead, the excpected cost returned is equal to infinite. This happens here vtr-verilog-to-routing/vpr/src/route/connection_box_lookahead_map.cpp Lines 417 to 431 in 8980e46
I suppose that the for loop never gets executed and the lookahead results to be infinite, with the router exiting and returning an In the case of the new lookahead, instead, the expected cost does not result to be infinite and the cost entry appears to be valid. Therefore the router tries to find a path, taking a huge amount of time before understanding that the connection is unroutable. Under these newly discovered data, I am unsure how to proceed. |
@litghost I think that, with the new information I got and summarized above, there are still issues to be solved with the lookahead if we want to stop using the connection boxes. Currently, I am unsure of how to deal with the place_delay_matrix issue. |
The existing baseline usage of the connection box lookahead should be using Dijkstra for computing the place_delay_matrix, so we should leave that variable fixed. |
Ok, we should then enable that in archdefs, as it is not currently set. From what I have seen, this will likely result in a worsening of QoR and runtime |
It already is enabled, see https://github.com/SymbiFlow/symbiflow-arch-defs/blob/f0e7b4212544e1d55da776fb7a2ff79117e01454/xc/common/cmake/arch_define.cmake#L64
|
Actually I think we might have missed to upgrade also the device_define parameters, hence the behaviour I have seen:
|
Good catch, let me review the CI logs. |
You are right, but this was an oversight. I believe the cache args were overlooked when we incorporated the Dijkstra expansion upstream :(
|
I am currently running tests and trying to get better results out of the extended lookahead map. Even though the VTR version is the same for both To overcome this issue, and given that the packed netlist is the same in both the test sets, I could use the placer output of the Below there is a summary of these tests:
The run-time hit is still present, with the exception of the baselitex design, while the CPD is mostly similar in both the tests set. Regarding the VTR regression tests, even by fixing the CHANX/CHANY collapse issue, the some of the tests end up failing the QoR checks. |
Related issues/PRs
There have been efforts to push the lookahead used in this fork upstream. All the related work/information can be found in the following links:
verilog-to-routing#1325
verilog-to-routing#1351
verilog-to-routing#1367
Current status
The integration of the connection box based lookahead and the upstream lookahead has been partially done. To reproduce the results obtained so far, the following versions of the tools are needed:
The VTR branch is still under development as it is not yet 100% functional.
In fact, there still are two assertions that do fail and have been temporarily disabled to continue in the P&R flow.
They are probably due to some incorrect lookahead values for some delta_x/y and some segment types and/or channels.
The results obtained by using the above commits are the following for the base litex design:
The text was updated successfully, but these errors were encountered: