-
-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Duplicate signal strength to all outputs #129
Conversation
Wow wouldn't have thought that such huge performance improvements were still possible. Confirmed similar performance increases (37-40% on my windows intel i7-6700HQ machine). |
Btw on line 272 of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! I'm happy to merge this after a quick cargo fmt
.
@@ -565,33 +599,26 @@ fn schedule_tick( | |||
scheduler.schedule_tick(node_id, delay, priority); | |||
} | |||
|
|||
fn link_strength(link: DirectLink, nodes: &Nodes) -> u8 { | |||
nodes[link.to].output_power.saturating_sub(link.weight) | |||
const INPUT_MASK: u128 = u128::from_ne_bytes([0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe a better name being BOOL_INPUT_MASK
?
Better to be specific since it's module scoped.
alright I'l do that there is also one other minor fix I will add shortly |
It's normally good practice not to, but I'll allow it in this case. Eventually I'd like to have CI fail if code isn't formatted properly so that it stops seeping into the repo. |
alright that should be it then I also changed ForwardLink::new and tested it with chungus2 and all is fine and well (or at least if you ignore the known issue with chungus2). |
Maybe just putting a |
Like in the ClampWeights pass? |
Please rebase and I'll merge |
You're right, completly forgot about that pass. In that case I think making the ClampWeights pass mandatory (or just merging it into the InputSearch pass) and keeping the assert would be a good idea, not that it will make much of a difference. |
I don't really know how rebase works so I merged it instead. |
Your previous merge commit was empty, there are still conflicts with master. |
f995fba
to
5f9d510
Compare
I do not understand what exactly went wrong, but I got the rebase to work, turnsout I just had to force push to make it work. |
This allows input states to be checked with simd because the input nodes don't need to be fetched
when creating a forward link
This isn't really neccesary right now but if we ever add optimizations which combine nodes it might become likely that a node will have more than 255 inputs.
56a75a0
to
9c9e22f
Compare
* decrease size of DirectLink to 4 bytes * duplicate signal strength to all outputs This allows input states to be checked with simd because the input nodes don't need to be fetched * cleanup old code * remove erroring line * make index of NodeId private again * Make TickScheduler::NUM_QUEUES constant * remove from_ne_bytes from last_index_positive * only get needed repeater inputs * replace magic number with NUM_QUEUES * don't crash the program when ss is larger than 15 when creating a forward link * fix minor typo * rename INPUT_MASK to BOOL_INPUT_MASK * Run cargo fmt * address nits * Add check for maximum inputs. This isn't really neccesary right now but if we ever add optimizations which combine nodes it might become likely that a node will have more than 255 inputs. * make poper use of input counters in from_compile_node * Make clamp_weights pass mandatory * Use get_unchecked_mut() for incrementing signal strength counters * skip updating if output power did not change * wrap input arrays into a struct to ensure they are aligned * run cargo fmt
We duplicate the signal strength of a component to all outputs, so we can get the input signal strength without having to fetch the inputs each time. Additionally by having separate buckets for each signal strength the combined input strength is calculated with just a few u128 instructions instead of having to loop over all the input.
This results in the first tenth of the chungus benchmark to only take ~7.6 on my machine while without this change it takes ~9.9 seconds, so ~30% faster.
Tested on: Ubuntu 22.04.3 LTS, intel core I7-10750H