
return of ondemand governor #6507

Merged: 5 commits into armbian:main on Apr 26, 2024

Conversation

@lanefu (Member) commented Apr 21, 2024

Description

  • This reverts "switch default cpu governor to schedutil" #6120, after extensive feedback from @ThomasKaiser, by bringing back ondemand as the default cpu governor.
  • It also applies @ThomasKaiser's adjustments in armbian-hardware-optimize so that the needed optimizations are correctly applied to the ondemand governor.
  • The changes rely on /etc/default/cpufrequtils as a config file, but not on the deprecated service itself (see the sketch below).
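
For reference, a minimal sketch of what /etc/default/cpufrequtils typically contains (the values below are illustrative only, not the ones shipped by this PR):

    # /etc/default/cpufrequtils -- illustrative values, not the PR's defaults
    ENABLE=true
    GOVERNOR=ondemand
    MIN_SPEED=408000    # kHz, board-specific
    MAX_SPEED=1800000   # kHz, board-specific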

future considerations

Although prudent, the changes above should be considered band-aids, as the following realities remain:

  • cpufrequtils is deprecated and a long-term solution is needed
  • armbian-hardware-optimize is an unmaintained dumpster fire that should be replaced with something config driven
  • performance customizations should be more intuitively driven by board and family configs.
  • armbian-config or its replacement also needs to be updated due to cpufrequtils deprecation.

@lanefu (Member, Author) commented Apr 23, 2024

🏳️ Other than taking a sworn oath of maintainership for armbian-hardware-optimize and immediately demonstrating extreme levels of competence and commitment, there's nothing I can do to address the remaining concerns from @ThomasKaiser.

6a41aa9 points some arrows for others to possibly be inspired by.

@EvilOlaf EvilOlaf added the Needs review label Apr 23, 2024
@lanefu lanefu force-pushed the BUGFIX/tk-ondemand branch from 6a41aa9 to 303e486 on April 24, 2024 21:59
@ThomasKaiser (Contributor)

https://github.com/ThomasKaiser/Knowledge/blob/master/articles/schedutil-and-EAS.md

@igorpecovnik (Member)

Any other changes planned, or are we ready to merge?

@lanefu (Member, Author) commented Apr 26, 2024

Any other changes planned, or are we ready to merge?

No changes planned. Ready.

@igorpecovnik igorpecovnik added the Ready to merge and 05 Milestone: Second quarter release labels and removed the Needs review label Apr 26, 2024
@igorpecovnik igorpecovnik merged commit 6f33e4f into armbian:main Apr 26, 2024
@SteeManMI SteeManMI mentioned this pull request May 5, 2024
@SteeManMI (Contributor)

There is a report (https://forum.armbian.com/topic/30074-helios64-armbian-2308-bookworm-issues-solved/page/7/#comment-189993) of the setting of io_is_busy to 1 causing problems on the Helios64. So that user is disabling the armbian-hardware-optimization service.
The problem I see with how this was implemented is that there is no way to override the values being set for io_is_busy and the other parameters. I think we need to write these parameters to /etc/default/cpufrequtils and then load them from there once set, so that the end user can change these values if necessary by editing that file.
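
A rough sketch of what that could look like inside armbian-hardware-optimization (the variable names IO_IS_BUSY, UP_THRESHOLD, SAMPLING_DOWN_FACTOR and SAMPLING_RATE are made up here for illustration; they are not existing keys in that file):

    # defaults, overridable by the user via /etc/default/cpufrequtils
    IO_IS_BUSY=1
    UP_THRESHOLD=25
    SAMPLING_DOWN_FACTOR=10
    SAMPLING_RATE=200000
    [ -r /etc/default/cpufrequtils ] && . /etc/default/cpufrequtils

    for dir in /sys/devices/system/cpu/cpufreq/policy*/ondemand; do
        [ -d "$dir" ] || continue
        echo "$IO_IS_BUSY" > "$dir/io_is_busy"
        echo "$UP_THRESHOLD" > "$dir/up_threshold"
        echo "$SAMPLING_DOWN_FACTOR" > "$dir/sampling_down_factor"
        echo "$SAMPLING_RATE" > "$dir/sampling_rate"
    done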

@lanefu (Member, Author) commented May 6, 2024

There is a report (https://forum.armbian.com/topic/30074-helios64-armbian-2308-bookworm-issues-solved/page/7/#comment-189993) of the setting of io_is_busy to 1 causing problems on the Helios64. So that user is disabling the armbian-hardware-optimization service.

Disabling the service is a reasonable solution given its unmaintained state.

The problem I see with how this was implemented is that there is no way to override the values being set for io_is_busy and the other parameters. I think we need to write these parameters to /etc/default/cpufrequtils and then load them from there once set, so that the end user can change these values if necessary by editing that file.

Refactoring the script to be more config-driven would be ideal. If we go that far, it might be worth renaming the config file to something other than /etc/default/cpufrequtils to eliminate confusion with the old tool.

@ThomasKaiser (Contributor) commented May 7, 2024

There is a report (https://forum.armbian.com/topic/30074-helios64-armbian-2308-bookworm-issues-solved/page/7/#comment-189993) of the setting of io_is_busy to 1 causing problems on the Helios64.

So still none of you guys tries to wrap their head around this.

First nonsense, quoting 'SteeMan': 'This is now included in armbian-hardware-optimization as of the 24.05 release of Armbian'. This is not 'now included'; it has been there since I added it back in 2015 or even earlier (the functionality moved into this script back in 2017), and it was active the whole time on any RK3399 device as long as Armbian shipped with ondemand as the default governor. So exactly nothing has changed other than defaulting to the broken schedutil for a few months.

Thanks to your labelling the old functionality as brand new, users now come to the conclusion: 'The reason why I am disabling armbian-hardware-optimize is that I need a stable system and I want to avoid any random and untested changes on my system.'

Unbelievable.

As for 'The above cpufreqpolicy does not work if io_is_busy is set to 1 on Helios64 at least if the 2.5G interface is used (transmit queue timeouts)': all the 'io_is_busy=1' setting does is ramp up clockspeeds when there is a higher %iowait percentage instead of remaining at low clockspeeds. If this (speeding up the CPUs) causes 'transmit queue timeouts', then it sounds like a race condition that should be investigated. Or, more likely, 'testing went wrong' and the user wrongly blamed 'random and untested changes'.

@ebin-dev (Contributor) commented May 7, 2024

Here is what is going on:

The cpufreqpolicy loop sets a generic sampling_rate of 200000 (this is how often the governor's worker routine runs, in microseconds). The sampling_rate should instead be set to a nominal value derived from cpuinfo_transition_latency (49000 nanoseconds for rk3399) such that it is effectively about 1000 times as large; numerically, that means copying the nanosecond latency value into the microsecond sampling_rate field (have a look at the linked document, section sampling_rate).

For rk3399 boards a sampling_rate of 50000 is very responsive and not too demanding for the cpu. In that case no problems occur if io_is_busy is set to 1, since a shitty generic default value no longer prevents clockspeeds from ramping up properly!

So the 'transmit queue timeouts' were in fact caused by a wrong sampling rate (200000) - it should not be set to a generic value but to the cpuinfo_transition_latency per cpu.

for cpufreqpolicy in 0 4 ; do
    echo 1 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/io_is_busy
    echo 25 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/up_threshold
    echo 10 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/sampling_down_factor
    echo 50000 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/sampling_rate
done

Edit: clarification of the issue ...

@ThomasKaiser (Contributor)

So the issue was in fact a wrong sampling rate (200000)

Which 'issue'? 'Transmit queue timeouts'?

it should not be set to a generic value but to the cpuinfo_transition_latency per cpu.

Well known, see especially the first link to the forum: 52bef7d#diff-2c19c701b3de842a09c7974bbec333fc9e605b18d05c0ca6be939de0c918d7be

Zero feedback happened and I left the project at around that time. Since then nobody seems to care any more.

@ThomasKaiser (Contributor)

So the 'transmit queue timeouts' were in fact caused by a wrong sampling rate (200000)

How? This is cpufreq stuff. Isn't it more likely that $something else happened which was then quickly attributed to the 'random and untested changes' in Armbian (which are nothing new but exactly the same for over half a decade now)?

Does a simple switch between 50000 and 200000 (which should only slightly increase CPU load and nothing else) without any other modifications generate 'transmit queue timeouts'? I don't think so.
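
(For what it's worth, an A/B test limited to this single parameter could look roughly like the sketch below; the policy numbers 0 and 4 match the RK3399 layout used earlier in this thread.)

    # toggle only the ondemand sampling_rate, retest the 2.5G transfer after each change
    for rate in 50000 200000; do
        for p in 0 4; do
            echo "$rate" > /sys/devices/system/cpu/cpufreq/policy${p}/ondemand/sampling_rate
        done
        read -r -p "sampling_rate is now ${rate}; run the transfer test, then press enter"
    done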

@ebin-dev (Contributor) commented May 7, 2024

@ThomasKaiser
"Does a simple switch between 50000 and 200000 (which should only slightly increase CPU load and nothing else) w/o any other modifications generate 'transmit queue timeouts'? I don't think so."

What kind of substantial argument is that? With a sampling_rate of 50000, cpufreq ondemand looks at the current load every 50 ms instead of every 200 ms. By changing this single parameter back and forth you get 'transmit queue timeouts' (200 ms) or alternatively everything runs smoothly (50 ms).
Edit: This was tested various times with linux 6.6.29 and 6.6.30 and the result was beyond any doubt.

I think the timeouts may be caused by buffers filling up before the core frequency is increased. But anyway, the important fact is that with this simple change it works. It even delivers the expected 280 MB/s to a MBP (via netatalk).

Edit: Have a look at the explanation of the tunable sampling_rate in the document mentioned above (it is from 2017).

@ThomasKaiser (Contributor)

What kind of substantial argument is that?

I was just asking whether you properly retested. Since according to the forums you tried to hunt down bugs introduced by a 'change recently in Armbian', which was no change at all, since exactly nothing has changed regarding Armbian and the ondemand governor between July 2018 and now. First you identified io_is_busy, and now a sampling_rate set to a value resulting in slower cpufreq adjustments is supposed to be the culprit.

I was asking for a proper retest since this is something I personally experience quite often: some problem occurs, the latest change I remember gets blamed for it, a quick test 'confirms' that, and only later do I realize it was something else.

Again: the Armbian ondemand settings have been unchanged for almost six years now and you seem to be the first to report a problem with them (I don't really know, because I don't follow the forum any more since it has been more or less destroyed by the Blockwart).

Other than that, if you think the sampling_rate should be coupled in some way to cpuinfo_transition_latency (hint), why not come up with a modified script and a test setup, and then check it on maybe 5 different SoCs of different performance grades (Allwinner A20, Phytium, Snapdragon) and, most importantly, on SoCs with only a single cluster as well as big.LITTLE (since the sysfs nodes differ)?
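
(A starting point for such a script, explicitly untested and only a sketch: it iterates over whatever policy nodes exist, so single-cluster and big.LITTLE layouts are both covered, and it falls back to the old generic value when the latency cannot be read.)

    # untested sketch: derive sampling_rate from cpuinfo_transition_latency per policy
    for policy in /sys/devices/system/cpu/cpufreq/policy*; do
        [ -d "$policy/ondemand" ] || continue   # skip policies not using ondemand
        latency=$(cat "$policy/cpuinfo_transition_latency" 2>/dev/null)
        if [ -n "$latency" ] && [ "$latency" -gt 0 ]; then
            echo "$latency" > "$policy/ondemand/sampling_rate"
        else
            echo 200000 > "$policy/ondemand/sampling_rate"   # old generic fallback
        fi
    done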

@ebin-dev (Contributor) commented May 8, 2024

@ThomasKaiser

I was just asking whether you properly retested. Since according to the forums you tried to hunt down bugs introduced by a 'change recently in Armbian', which was no change at all, since exactly nothing has changed regarding Armbian and the ondemand governor between July 2018 and now.

Nobody said that the issues observed are due to changes recently introduced into armbian-hardware-optimization. The issues existed before. But changing governors, deleting performance-critical code (for a good reason) and reintroducing it obviously without testing upsets me. This triggered me to disable armbian-hardware-optimization and have a closer look at it in order to determine the essential bits to take over. In doing so I observed that a more responsive cpufreq/ondemand sampling_rate made the current kernels 6.6.29 and 6.6.30 usable on Helios64, resolved the timeout issues (specific to the 2.5G interface) and made the entire OS more responsive to server tasks (some more details below).

The current cpufreqpolicy loop sets one value (200000) for the cpufreq ondemand sampling_rate for all clusters on all processors. That can't be right. Setting it separately for each cluster and each processor is a no-brainer.
However, the default values configured for cpuinfo_transition_latency have to be correct (that is the case, e.g., for the Helios64 but is unlikely for the Rock 5B; kernel 5.10.160 outputs 'cpuinfo_transition_latency: 324000').

I have tested that again on the Helios64, and simply copying the cpuinfo_transition_latency to the sampling_rate per cluster (without any further scaling) gives the best result.

To replace armbian-hardware-optimize for the Helios64 I just need to run the following code (currently in /etc/rc.local); the kernel defaults are good otherwise:

for cpufreqpolicy in 0 4 ; do
    echo 1 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/io_is_busy
    echo 25 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/up_threshold
    echo 10 > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/sampling_down_factor
    echo $(cat /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/cpuinfo_transition_latency) > /sys/devices/system/cpu/cpufreq/policy${cpufreqpolicy}/ondemand/sampling_rate
done

for i in $(awk -F":" '/xhci/ {print $1}' < /proc/interrupts | sed 's/\ //g'); do
    echo 20 > /proc/irq/$i/smp_affinity
done
for i in $(awk -F":" '/ahci/ {print $1}' < /proc/interrupts | sed 's/\ //g'); do
    echo 10 > /proc/irq/$i/smp_affinity
done

Sampling rates are set to the following values on Helios64:
cat /sys/devices/system/cpu/cpufreq/policy*/ondemand/sampling_rate
40000
49000

@PeterFalken (Contributor)

@ebin-dev, @ThomasKaiser - will there be an update to the sampling_rate value? Will it be set per CPU cluster (in the case of big.LITTLE SoCs)?

@ebin-dev (Contributor) commented May 21, 2024

The current kernel 6.9.0 (-collabora-rockchip-rk3588) outputs the following values for cpuinfo_transition_latency per cpu cluster on a Rock 5B:

cat /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_transition_latency
62000
182000
182000

However, there is only one tunable /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate available with the following default value:

cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
6666

So tuning the sampling_rate per cluster no longer appears possible on the current mainline kernel (and may not be necessary). Please correct me if I am wrong.

Still the following settings would make some sense:

cd /sys/devices/system/cpu/cpufreq/ondemand
echo 1 > io_is_busy
echo 25 > up_threshold
echo 10 > sampling_down_factor
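
(A hedged sketch of how to apply these regardless of whether the kernel exposes one global ondemand node, as on this 6.9.0 kernel, or per-policy nodes, as discussed further below.)

    # apply the ondemand tunables wherever the running kernel exposes them
    apply_ondemand() {
        echo 1  > "$1/io_is_busy"
        echo 25 > "$1/up_threshold"
        echo 10 > "$1/sampling_down_factor"
    }
    if [ -d /sys/devices/system/cpu/cpufreq/ondemand ]; then
        apply_ondemand /sys/devices/system/cpu/cpufreq/ondemand      # single global node
    else
        for p in /sys/devices/system/cpu/cpufreq/policy*/ondemand; do
            [ -d "$p" ] && apply_ondemand "$p"                       # per-policy nodes
        done
    fi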

@ThomasKaiser (Contributor)

With RK's 6.1 BSP kernel it's as follows:

root@rock-5c:~# cat /sys/devices/system/cpu/cpufreq/policy?/cpuinfo_transition_latency
84000
324000
324000
root@rock-5c:~# cat /sys/devices/system/cpu/cpufreq/policy?/ondemand/sampling_rate
10000
10000
10000

Not surprising for anyone who followed the references to the Armbian forum from 6 years ago: stuff like this happens from time to time, and after a kernel upgrade some 'nice algorithm' relying on kernel defaults doesn't work any more (which is the reason I set sampling_rate to 200000 back then). And then I lost interest, or simply joined the 'nobody gives a sh*t about important stuff on ARM' team.

Talking about this, I reached out to Rockchip asking how they determined the DT properties needed to let schedutil/EAS work properly. Their first answer is documented here. Since I was away from those ARM thingies for a few weeks there's a bit more to explore, but right now the simple conclusion is: nobody gives a sh*t.

@ebin-dev (Contributor) commented Jun 23, 2024

Just a little update: the sampling_rate can be set per cluster in Linux 6.9.6 again, in exactly the same way as before:

uname -r
6.9.6-edge-rockchip64

cd /sys/devices/system/cpu/cpufreq
for cpufreqpolicy in 0 4 ; do
    echo 1 > policy${cpufreqpolicy}/ondemand/io_is_busy
    echo 25 > policy${cpufreqpolicy}/ondemand/up_threshold
    echo 10 > policy${cpufreqpolicy}/ondemand/sampling_down_factor
    echo $(cat policy${cpufreqpolicy}/cpuinfo_transition_latency) > policy${cpufreqpolicy}/ondemand/sampling_rate
done

cat policy?/ondemand/sampling_rate
40000
51000
