From 0f6966490981be103a3df0779b35cf5725b29b45 Mon Sep 17 00:00:00 2001
From: Sven Eckelmann
Date: Wed, 6 Nov 2024 15:36:23 +0100
Subject: [PATCH] ath9k: Add RX inactivity detection and reset chip when it occurs

Some ath9k chips can, seemingly at random, end up in a state which can
be described as "deaf". No or nearly no interrupts are generated anymore
for incoming packets. Existing links break down after a while and new
links will not be established.

The circumstances leading to this "deafness" are still unclear, but some
particular chips (especially 2-stream 11n SoCs, but also others) can go
'deaf' when running AP or mesh (or both) after some time. It's probably
a hardware issue, and doing a channel scan to trigger a chip
reset (which one normally can't do on an AP interface) recovers the
hardware.

The only way the driver can detect this state is by detecting if there
has been no RX activity for a while. In this case we can proactively
reset the chip (which only takes a small number of milliseconds, so
shouldn't interrupt things too much if it has been idle for several
seconds), which functions as a workaround.

OpenWrt, and various derivatives, have been carrying versions of this
workaround for years that were never upstreamed. One version[0],
written by Felix Fietkau, used a simple counter and only reset if there
was precisely zero RX activity for a long period of time. This had the
problem that in some cases a small number of interrupts would appear
even if the device was otherwise not responsive. For this reason,
another version[1], written by Simon Wunderlich and Sven Eckelmann, used
a time-based approach to calculate the average number of RX interrupts
over a longer (four-second) interval, and reset the chip when seeing
fewer than one interrupt per second over this period. However, that
version relied on debugfs counters to keep track of the number of
interrupts, which means it didn't work at all if debugfs was not
enabled.

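As an illustration only, the time-based windowing idea described above
can be sketched outside the driver roughly as follows. This is a
minimal sketch under assumptions: the struct, the function name and the
millisecond timestamps are hypothetical and are not taken from ath9k;
only the four-second window and the one-event-per-second threshold come
from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHECK_INTERVAL_MS 4000u /* four-second window, as described above */
#define MS_PER_SEC        1000u

/* Hypothetical bookkeeping; the names are illustrative, not ath9k's. */
struct rx_window {
	uint32_t window_start_ms; /* when the current window began */
	uint32_t rx_events;       /* RX events counted since then */
};

/*
 * Returns true when a completed window saw fewer than one RX event per
 * elapsed second, i.e. when a chip reset would be warranted.
 */
static bool rx_window_is_deaf(struct rx_window *w, uint32_t now_ms)
{
	uint32_t elapsed = now_ms - w->window_start_ms;
	uint32_t events = w->rx_events;

	if (elapsed < CHECK_INTERVAL_MS)
		return false; /* window not complete yet, keep counting */

	/* Start a new window before judging the finished one. */
	w->window_start_ms = now_ms;
	w->rx_events = 0;

	return events < elapsed / MS_PER_SEC;
}
```

A caller would increment rx_events from its receive path and call
rx_window_is_deaf() from a periodic check. Averaging over a window
rather than testing for exactly zero events is what lets the check
still trigger when a few stray interrupts arrive.
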
This patch unifies the two versions: it uses the same approach as Felix'
patch to count the number of RX handler invocations, but uses the same
time-based windowing approach as Simon and Sven's patch to still handle
the case where occasional interrupts appear but the device is otherwise
deaf.

[0] https://patchwork.kernel.org/project/linux-wireless/patch/20170125163654.66431-3-nbd@nbd.name/
[1] https://patchwork.kernel.org/project/linux-wireless/patch/20161117083614.19188-2-sven.eckelmann@open-mesh.com/

Signed-off-by: Sven Eckelmann
---
 ...41ND-N-8M-and-16M-variants-to-ath79.patch} |   0
 ...k-around-AR_CFG-0xdeadbeef-chip-hang.patch | 362 ------------------
 ...ection-and-reset-chip-when-it-occurs.patch | 221 +++++++++++
 ...k-Reset-chip-on-potential-deaf-state.patch | 145 -------
 ...9k-Disable-HW-support-for-group-keys.patch |  49 ---
 5 files changed, 221 insertions(+), 556 deletions(-)
 rename patches/openwrt/{0013-add-TL-WR841ND-N-8M-and-16M-variants-to-ath79.patch => 0010-add-TL-WR841ND-N-8M-and-16M-variants-to-ath79.patch} (100%)
 delete mode 100644 patches/openwrt/0010-ath9k-work-around-AR_CFG-0xdeadbeef-chip-hang.patch
 create mode 100644 patches/openwrt/0011-ath9k-Add-RX-inactivity-detection-and-reset-chip-when-it-occurs.patch
 delete mode 100644 patches/openwrt/0011-ath9k-Reset-chip-on-potential-deaf-state.patch
 delete mode 100644 patches/openwrt/0012-mac80211-ath9k-Disable-HW-support-for-group-keys.patch

diff --git a/patches/openwrt/0013-add-TL-WR841ND-N-8M-and-16M-variants-to-ath79.patch b/patches/openwrt/0010-add-TL-WR841ND-N-8M-and-16M-variants-to-ath79.patch
similarity index 100%
rename from patches/openwrt/0013-add-TL-WR841ND-N-8M-and-16M-variants-to-ath79.patch
rename to patches/openwrt/0010-add-TL-WR841ND-N-8M-and-16M-variants-to-ath79.patch
diff --git a/patches/openwrt/0010-ath9k-work-around-AR_CFG-0xdeadbeef-chip-hang.patch b/patches/openwrt/0010-ath9k-work-around-AR_CFG-0xdeadbeef-chip-hang.patch
deleted file mode 100644
index 2fde8d304a..0000000000
---
a/patches/openwrt/0010-ath9k-work-around-AR_CFG-0xdeadbeef-chip-hang.patch +++ /dev/null @@ -1,362 +0,0 @@ -From: Sven Eckelmann -Date: Fri, 2 Dec 2016 15:36:08 +0100 -Subject: ath9k: work around AR_CFG 0xdeadbeef chip hang - -QCA 802.11n chips (especially AR9330/AR9340) sometimes end up in a state in -which a read of AR_CFG always returns 0xdeadbeef. This should not happen -when when the power_mode of the device is ATH9K_PM_AWAKE. - -This problem is not yet detected by any other workaround in ath9k. No way -is known to reproduce the problem easily. - -Signed-off-by: Simon Wunderlich -[sven.eckelmann@open-mesh.com: port to recent ath9k, add commit message] -Signed-off-by: Sven Eckelmann - -diff --git a/package/kernel/mac80211/patches/ath/950-Revert-355-ath9k_hw-check-if-the-chip-failed-to-wake.patch b/package/kernel/mac80211/patches/ath/950-Revert-355-ath9k_hw-check-if-the-chip-failed-to-wake.patch -new file mode 100644 -index 0000000000000000000000000000000000000000..146d0a9cf8ada26a6ff7d3ed32b4041fbadba822 ---- /dev/null -+++ b/package/kernel/mac80211/patches/ath/950-Revert-355-ath9k_hw-check-if-the-chip-failed-to-wake.patch -@@ -0,0 +1,19 @@ -+From: Sven Eckelmann -+Date: Thu, 23 Apr 2020 18:43:56 +0200 -+Subject: Revert 355-ath9k_hw-check-if-the-chip-failed-to-wake.patch -+ -+diff --git a/drivers/net/wireless/ath/ath9k/hw.c b/drivers/net/wireless/ath/ath9k/hw.c -+index f0a773850289e8d9478d28d33f29e8d02102053a..e4e7271e70d66c1bd0499db377a7be6ae1bdd7ec 100644 -+--- a/drivers/net/wireless/ath/ath9k/hw.c -++++ b/drivers/net/wireless/ath/ath9k/hw.c -+@@ -1680,10 +1680,6 @@ bool ath9k_hw_check_alive(struct ath_hw *ah) -+ int count = 50; -+ u32 reg, last_val; -+ -+- /* Check if chip failed to wake up */ -+- if (REG_READ(ah, AR_CFG) == 0xdeadbeef) -+- return false; -+- -+ if (AR_SREV_9300(ah)) -+ return !ath9k_hw_detect_mac_hang(ah); -+ -diff --git a/package/kernel/mac80211/patches/ath/951-Revert-354-ath9k-rename-tx_complete_work-to-hw_check.patch 
b/package/kernel/mac80211/patches/ath/951-Revert-354-ath9k-rename-tx_complete_work-to-hw_check.patch -new file mode 100644 -index 0000000000000000000000000000000000000000..e9f45c7398c8be5241e15e848d540146756b7668 ---- /dev/null -+++ b/package/kernel/mac80211/patches/ath/951-Revert-354-ath9k-rename-tx_complete_work-to-hw_check.patch -@@ -0,0 +1,168 @@ -+From: Sven Eckelmann -+Date: Thu, 23 Apr 2020 18:44:45 +0200 -+Subject: Revert 354-ath9k-rename-tx_complete_work-to-hw_check.patch -+ -+--- a/drivers/net/wireless/ath/ath9k/ath9k.h -++++ b/drivers/net/wireless/ath/ath9k/ath9k.h -+@@ -108,7 +108,7 @@ int ath_descdma_setup(struct ath_softc * -+ #define ATH_AGGR_MIN_QDEPTH 2 -+ /* minimum h/w qdepth for non-aggregated traffic */ -+ #define ATH_NON_AGGR_MIN_QDEPTH 8 -+-#define ATH_HW_CHECK_POLL_INT 1000 -++#define ATH_TX_COMPLETE_POLL_INT 1000 -+ #define ATH_TXFIFO_DEPTH 8 -+ #define ATH_TX_ERROR 0x01 -+ -+@@ -738,7 +738,7 @@ void ath9k_csa_update(struct ath_softc * -+ #define ATH_PAPRD_TIMEOUT 100 /* msecs */ -+ #define ATH_PLL_WORK_INTERVAL 100 -+ -+-void ath_hw_check_work(struct work_struct *work); -++void ath_tx_complete_poll_work(struct work_struct *work); -+ void ath_reset_work(struct work_struct *work); -+ bool ath_hw_check(struct ath_softc *sc); -+ void ath_hw_pll_work(struct work_struct *work); -+@@ -1040,7 +1040,7 @@ struct ath_softc { -+ #ifdef CPTCFG_ATH9K_DEBUGFS -+ struct ath9k_debug debug; -+ #endif -+- struct delayed_work hw_check_work; -++ struct delayed_work tx_complete_work; -+ struct delayed_work hw_pll_work; -+ struct timer_list sleep_timer; -+ -+--- a/drivers/net/wireless/ath/ath9k/init.c -++++ b/drivers/net/wireless/ath/ath9k/init.c -+@@ -787,7 +787,6 @@ static int ath9k_init_softc(u16 devid, s -+ INIT_WORK(&sc->hw_reset_work, ath_reset_work); -+ INIT_WORK(&sc->paprd_work, ath_paprd_calibrate); -+ INIT_DELAYED_WORK(&sc->hw_pll_work, ath_hw_pll_work); -+- INIT_DELAYED_WORK(&sc->hw_check_work, ath_hw_check_work); -+ -+ ath9k_init_channel_context(sc); 
-+ -+--- a/drivers/net/wireless/ath/ath9k/link.c -++++ b/drivers/net/wireless/ath/ath9k/link.c -+@@ -20,13 +20,20 @@ -+ * TX polling - checks if the TX engine is stuck somewhere -+ * and issues a chip reset if so. -+ */ -+-static bool ath_tx_complete_check(struct ath_softc *sc) -++void ath_tx_complete_poll_work(struct work_struct *work) -+ { -++ struct ath_softc *sc = container_of(work, struct ath_softc, -++ tx_complete_work.work); -+ struct ath_txq *txq; -+ int i; -++ bool needreset = false; -++ -+ -+- if (sc->tx99_state) -+- return true; -++ if (sc->tx99_state) { -++ ath_dbg(ath9k_hw_common(sc->sc_ah), RESET, -++ "skip tx hung detection on tx99\n"); -++ return; -++ } -+ -+ for (i = 0; i < IEEE80211_NUM_ACS; i++) { -+ txq = sc->tx.txq_map[i]; -+@@ -34,36 +41,25 @@ static bool ath_tx_complete_check(struct -+ ath_txq_lock(sc, txq); -+ if (txq->axq_depth) { -+ if (txq->axq_tx_inprogress) { -++ needreset = true; -+ ath_txq_unlock(sc, txq); -+- goto reset; -++ break; -++ } else { -++ txq->axq_tx_inprogress = true; -+ } -+- -+- txq->axq_tx_inprogress = true; -+ } -+ ath_txq_unlock(sc, txq); -+ } -+ -+- return true; -+- -+-reset: -+- ath_dbg(ath9k_hw_common(sc->sc_ah), RESET, -+- "tx hung, resetting the chip\n"); -+- ath9k_queue_reset(sc, RESET_TYPE_TX_HANG); -+- return false; -+- -+-} -+- -+-void ath_hw_check_work(struct work_struct *work) -+-{ -+- struct ath_softc *sc = container_of(work, struct ath_softc, -+- hw_check_work.work); -+- -+- if (!ath_hw_check(sc) || -+- !ath_tx_complete_check(sc)) -++ if (needreset) { -++ ath_dbg(ath9k_hw_common(sc->sc_ah), RESET, -++ "tx hung, resetting the chip\n"); -++ ath9k_queue_reset(sc, RESET_TYPE_TX_HANG); -+ return; -++ } -+ -+- ieee80211_queue_delayed_work(sc->hw, &sc->hw_check_work, -+- msecs_to_jiffies(ATH_HW_CHECK_POLL_INT)); -++ ieee80211_queue_delayed_work(sc->hw, &sc->tx_complete_work, -++ msecs_to_jiffies(ATH_TX_COMPLETE_POLL_INT)); -+ } -+ -+ /* -+--- a/drivers/net/wireless/ath/ath9k/main.c -++++ 
b/drivers/net/wireless/ath/ath9k/main.c -+@@ -184,7 +184,7 @@ void ath9k_ps_restore(struct ath_softc * -+ static void __ath_cancel_work(struct ath_softc *sc) -+ { -+ cancel_work_sync(&sc->paprd_work); -+- cancel_delayed_work_sync(&sc->hw_check_work); -++ cancel_delayed_work_sync(&sc->tx_complete_work); -+ cancel_delayed_work_sync(&sc->hw_pll_work); -+ -+ #ifdef CPTCFG_ATH9K_BTCOEX_SUPPORT -+@@ -201,8 +201,7 @@ void ath_cancel_work(struct ath_softc *s -+ -+ void ath_restart_work(struct ath_softc *sc) -+ { -+- ieee80211_queue_delayed_work(sc->hw, &sc->hw_check_work, -+- msecs_to_jiffies(ATH_HW_CHECK_POLL_INT)); -++ ieee80211_queue_delayed_work(sc->hw, &sc->tx_complete_work, 0); -+ -+ if (AR_SREV_9340(sc->sc_ah) || AR_SREV_9330(sc->sc_ah)) -+ ieee80211_queue_delayed_work(sc->hw, &sc->hw_pll_work, -+@@ -2200,7 +2199,7 @@ void __ath9k_flush(struct ieee80211_hw * -+ int timeout; -+ bool drain_txq; -+ -+- cancel_delayed_work_sync(&sc->hw_check_work); -++ cancel_delayed_work_sync(&sc->tx_complete_work); -+ -+ if (ah->ah_flags & AH_UNPLUGGED) { -+ ath_dbg(common, ANY, "Device has been unplugged!\n"); -+@@ -2238,8 +2237,7 @@ void __ath9k_flush(struct ieee80211_hw * -+ ath9k_ps_restore(sc); -+ } -+ -+- ieee80211_queue_delayed_work(hw, &sc->hw_check_work, -+- msecs_to_jiffies(ATH_HW_CHECK_POLL_INT)); -++ ieee80211_queue_delayed_work(hw, &sc->tx_complete_work, 0); -+ } -+ -+ static bool ath9k_tx_frames_pending(struct ieee80211_hw *hw) -+--- a/drivers/net/wireless/ath/ath9k/xmit.c -++++ b/drivers/net/wireless/ath/ath9k/xmit.c -+@@ -2837,6 +2837,8 @@ int ath_tx_init(struct ath_softc *sc, in -+ return error; -+ } -+ -++ INIT_DELAYED_WORK(&sc->tx_complete_work, ath_tx_complete_poll_work); -++ -+ if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA) -+ error = ath_tx_edma_init(sc); -+ -diff --git a/package/kernel/mac80211/patches/ath/952-ath9k-work-around-deadbeef-wifi-chip-hang.patch b/package/kernel/mac80211/patches/ath/952-ath9k-work-around-deadbeef-wifi-chip-hang.patch -new file mode 
100644 -index 0000000000000000000000000000000000000000..b35eb842b75eb4f97d1f10e13caad3cb7c0d846a ---- /dev/null -+++ b/package/kernel/mac80211/patches/ath/952-ath9k-work-around-deadbeef-wifi-chip-hang.patch -@@ -0,0 +1,142 @@ -+From: Simon Wunderlich -+Date: Thu, 17 Nov 2016 09:36:13 +0100 -+Subject: ath9k: work around AR_CFG 0xdeadbeef chip hang -+ -+QCA 802.11n chips (especially AR9330/AR9340) sometimes end up in a state in -+which a read of AR_CFG always returns 0xdeadbeef. This should not happen -+when when the power_mode of the device is ATH9K_PM_AWAKE. -+ -+This problem is not yet detected by any other workaround in ath9k. No way -+is known to reproduce the problem easily. -+ -+Signed-off-by: Simon Wunderlich -+[sven.eckelmann@open-mesh.com: port to recent ath9k, add commit message] -+Signed-off-by: Sven Eckelmann -+ -+diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h -+index e38065b2e79f3df92c4679c310a1de871134460f..7fb33b64ec7e24be444c59b22af0b5f526ddfc67 100644 -+--- a/drivers/net/wireless/ath/ath9k/ath9k.h -++++ b/drivers/net/wireless/ath/ath9k/ath9k.h -+@@ -746,11 +746,13 @@ void ath9k_csa_update(struct ath_softc *sc); -+ #define ATH_ANI_MAX_SKIP_COUNT 10 -+ #define ATH_PAPRD_TIMEOUT 100 /* msecs */ -+ #define ATH_PLL_WORK_INTERVAL 100 -++#define ATH_HANG_WORK_INTERVAL 30000 -+ -+ void ath_tx_complete_poll_work(struct work_struct *work); -+ void ath_reset_work(struct work_struct *work); -+ bool ath_hw_check(struct ath_softc *sc); -+ void ath_hw_pll_work(struct work_struct *work); -++void ath_hw_hang_work(struct work_struct *work); -+ void ath_paprd_calibrate(struct work_struct *work); -+ void ath_ani_calibrate(struct timer_list *t); -+ void ath_start_ani(struct ath_softc *sc); -+@@ -1082,6 +1084,7 @@ struct ath_softc { -+ #endif -+ struct delayed_work tx_complete_work; -+ struct delayed_work hw_pll_work; -++ struct delayed_work hw_hang_work; -+ struct timer_list sleep_timer; -+ -+ #ifdef 
CPTCFG_ATH9K_BTCOEX_SUPPORT -+diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c -+index f20364bd20273acc0023394ca55a00b35407494c..1aeabb58dc022127ba1857262b82f52403c8e4ac 100644 -+--- a/drivers/net/wireless/ath/ath9k/debug.c -++++ b/drivers/net/wireless/ath/ath9k/debug.c -+@@ -765,6 +765,7 @@ static int read_file_reset(struct seq_file *file, void *data) -+ [RESET_TYPE_CALIBRATION] = "Calibration error", -+ [RESET_TX_DMA_ERROR] = "Tx DMA stop error", -+ [RESET_RX_DMA_ERROR] = "Rx DMA stop error", -++ [RESET_TYPE_DEADBEEF] = "deadbeef hang", -+ }; -+ int i; -+ -+diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h -+index 80cf8b3782d419ea7de5315fa148d6ccea9f306f..60ada5dd21c9d456f15c8c1f785aeeba4e1acc60 100644 -+--- a/drivers/net/wireless/ath/ath9k/debug.h -++++ b/drivers/net/wireless/ath/ath9k/debug.h -+@@ -52,6 +52,7 @@ enum ath_reset_type { -+ RESET_TYPE_CALIBRATION, -+ RESET_TX_DMA_ERROR, -+ RESET_RX_DMA_ERROR, -++ RESET_TYPE_DEADBEEF, -+ __RESET_TYPE_MAX -+ }; -+ -+diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c -+index 67b4457d0019325ca4919e96d5a5820b357cd9cc..de1c6ba12390c307e28af3ac3f54b83da27aadc7 100644 -+--- a/drivers/net/wireless/ath/ath9k/init.c -++++ b/drivers/net/wireless/ath/ath9k/init.c -+@@ -744,6 +744,7 @@ static int ath9k_init_softc(u16 devid, struct ath_softc *sc, -+ INIT_WORK(&sc->hw_reset_work, ath_reset_work); -+ INIT_WORK(&sc->paprd_work, ath_paprd_calibrate); -+ INIT_DELAYED_WORK(&sc->hw_pll_work, ath_hw_pll_work); -++ INIT_DELAYED_WORK(&sc->hw_hang_work, ath_hw_hang_work); -+ -+ ath9k_init_channel_context(sc); -+ -+diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c -+index 321136767c99951682374a632f94476089d57219..1ed6f2fd27b4db1adc9a0e8489b7e937ca068f22 100644 -+--- a/drivers/net/wireless/ath/ath9k/link.c -++++ b/drivers/net/wireless/ath/ath9k/link.c -+@@ -138,6 +138,38 @@ 
void ath_hw_pll_work(struct work_struct *work) -+ msecs_to_jiffies(ATH_PLL_WORK_INTERVAL)); -+ } -+ -++static bool ath_hw_hang_deadbeef(struct ath_softc *sc) -++{ -++ struct ath_common *common = ath9k_hw_common(sc->sc_ah); -++ u32 reg; -++ -++ /* check for stucked MAC */ -++ ath9k_ps_wakeup(sc); -++ reg = REG_READ(sc->sc_ah, AR_CFG); -++ ath9k_ps_restore(sc); -++ -++ if (reg != 0xdeadbeef) -++ return false; -++ -++ ath_dbg(common, RESET, -++ "0xdeadbeef hang is detected. Schedule chip reset\n"); -++ -++ ath9k_queue_reset(sc, RESET_TYPE_DEADBEEF); -++ -++ return true; -++} -++ -++void ath_hw_hang_work(struct work_struct *work) -++{ -++ struct ath_softc *sc = container_of(work, struct ath_softc, -++ hw_hang_work.work); -++ -++ ath_hw_hang_deadbeef(sc); -++ -++ ieee80211_queue_delayed_work(sc->hw, &sc->hw_hang_work, -++ msecs_to_jiffies(ATH_HANG_WORK_INTERVAL)); -++} -++ -+ /* -+ * PA Pre-distortion. -+ */ -+diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c -+index 1434cebd2ed0d8573ca664d27103c52473959e72..f08e22d6ea17b2876e2e1ca102f0c083f78330dd 100644 -+--- a/drivers/net/wireless/ath/ath9k/main.c -++++ b/drivers/net/wireless/ath/ath9k/main.c -+@@ -186,6 +186,7 @@ static void __ath_cancel_work(struct ath_softc *sc) -+ cancel_work_sync(&sc->paprd_work); -+ cancel_delayed_work_sync(&sc->tx_complete_work); -+ cancel_delayed_work_sync(&sc->hw_pll_work); -++ cancel_delayed_work_sync(&sc->hw_hang_work); -+ -+ #ifdef CPTCFG_ATH9K_BTCOEX_SUPPORT -+ if (ath9k_hw_mci_is_enabled(sc->sc_ah)) -+@@ -207,6 +208,9 @@ void ath_restart_work(struct ath_softc *sc) -+ ieee80211_queue_delayed_work(sc->hw, &sc->hw_pll_work, -+ msecs_to_jiffies(ATH_PLL_WORK_INTERVAL)); -+ -++ ieee80211_queue_delayed_work(sc->hw, &sc->hw_hang_work, -++ msecs_to_jiffies(ATH_HANG_WORK_INTERVAL)); -++ -+ ath_start_ani(sc); -+ } -+ diff --git a/patches/openwrt/0011-ath9k-Add-RX-inactivity-detection-and-reset-chip-when-it-occurs.patch 
b/patches/openwrt/0011-ath9k-Add-RX-inactivity-detection-and-reset-chip-when-it-occurs.patch new file mode 100644 index 0000000000..5a22e6ede3 --- /dev/null +++ b/patches/openwrt/0011-ath9k-Add-RX-inactivity-detection-and-reset-chip-when-it-occurs.patch @@ -0,0 +1,221 @@ +From: Toke Høiland-Jørgensen +Date: Wed, 6 Nov 2024 13:41:44 +0100 +Subject: ath9k: Add RX inactivity detection and reset chip when it occurs + +Some ath9k chips can, seemingly at random, end up in a state which can +be described as "deaf". No or nearly no interrupts are generated anymore +for incoming packets. Existing links either break down after a while and +new links will not be established. + +The circumstances leading to this "deafness" is still unclear, but some +particular chips (especially 2-stream 11n SoCs, but also others) can go +'deaf' when running AP or mesh (or both) after some time. It's probably +a hardware issue, and doing a channel scan to trigger a chip +reset (which one normally can't do on an AP interface) recovers the +hardware. + +The only way the driver can detect this state, is by detecting if there +has been no RX activity for a while. In this case we can proactively +reset the chip (which only takes a small number of milliseconds, so +shouldn't interrupt things too much if it has been idle for several +seconds), which functions as a workaround. + +OpenWrt, and various derivatives, have been carrying versions of this +workaround for years, that were never upstreamed. One version[0], +written by Felix Fietkau, used a simple counter and only reset if there +was precisely zero RX activity for a long period of time. This had the +problem that in some cases a small number of interrupts would appear +even if the device was otherwise not responsive. 
For this reason, +another version[1], written by Simon Wunderlich and Sven Eckelmann, used +a time-based approach to calculate the average number of RX interrupts +over a longer (four-second) interval, and reset the chip when seeing +less than one interrupt per second over this period. However, that +version relied on debugfs counters to keep track of the number of +interrupts, which means it didn't work at all if debugfs was not +enabled. + +This patch unifies the two versions: it uses the same approach as Felix' +patch to count the number of RX handler invocations, but uses the same +time-based windowing approach as Simon and Sven's patch to still handle +the case where occasional interrupts appear but the device is otherwise +deaf. + +Since this is based on ideas by all three people, but not actually +directly derived from any of the patches, I'm including Suggested-by +tags from Simon, Sven and Felix below, which should hopefully serve as +proper credit. + +[0] https://patchwork.kernel.org/project/linux-wireless/patch/20170125163654.66431-3-nbd@nbd.name/ +[1] https://patchwork.kernel.org/project/linux-wireless/patch/20161117083614.19188-2-sven.eckelmann@open-mesh.com/ + +Suggested-by: Simon Wunderlich +Suggested-by: Sven Eckelmann +Suggested-by: Felix Fietkau +Signed-off-by: Toke Høiland-Jørgensen +Reviewed-by: Sven Eckelmann +Signed-off-by: Sven Eckelmann + +diff --git a/package/kernel/mac80211/patches/ath9k/554-ath9k-Add-RX-inactivity-detection-and-reset-chip-whe.patch b/package/kernel/mac80211/patches/ath9k/554-ath9k-Add-RX-inactivity-detection-and-reset-chip-whe.patch +new file mode 100644 +index 0000000000000000000000000000000000000000..17b3ae9df72d20dfe4ac9070d3fb20cc66b716de +--- /dev/null ++++ b/package/kernel/mac80211/patches/ath9k/554-ath9k-Add-RX-inactivity-detection-and-reset-chip-whe.patch +@@ -0,0 +1,158 @@ ++From: Toke Høiland-Jørgensen ++Date: Wed, 6 Nov 2024 13:41:44 +0100 ++Subject: ath9k: Add RX inactivity detection and reset chip when it 
occurs ++ ++Some ath9k chips can, seemingly at random, end up in a state which can ++be described as "deaf". No or nearly no interrupts are generated anymore ++for incoming packets. Existing links either break down after a while and ++new links will not be established. ++ ++The circumstances leading to this "deafness" is still unclear, but some ++particular chips (especially 2-stream 11n SoCs, but also others) can go ++'deaf' when running AP or mesh (or both) after some time. It's probably ++a hardware issue, and doing a channel scan to trigger a chip ++reset (which one normally can't do on an AP interface) recovers the ++hardware. ++ ++The only way the driver can detect this state, is by detecting if there ++has been no RX activity for a while. In this case we can proactively ++reset the chip (which only takes a small number of milliseconds, so ++shouldn't interrupt things too much if it has been idle for several ++seconds), which functions as a workaround. ++ ++OpenWrt, and various derivatives, have been carrying versions of this ++workaround for years, that were never upstreamed. One version[0], ++written by Felix Fietkau, used a simple counter and only reset if there ++was precisely zero RX activity for a long period of time. This had the ++problem that in some cases a small number of interrupts would appear ++even if the device was otherwise not responsive. For this reason, ++another version[1], written by Simon Wunderlich and Sven Eckelmann, used ++a time-based approach to calculate the average number of RX interrupts ++over a longer (four-second) interval, and reset the chip when seeing ++less than one interrupt per second over this period. However, that ++version relied on debugfs counters to keep track of the number of ++interrupts, which means it didn't work at all if debugfs was not ++enabled. 
++ ++This patch unifies the two versions: it uses the same approach as Felix' ++patch to count the number of RX handler invocations, but uses the same ++time-based windowing approach as Simon and Sven's patch to still handle ++the case where occasional interrupts appear but the device is otherwise ++deaf. ++ ++Since this is based on ideas by all three people, but not actually ++directly derived from any of the patches, I'm including Suggested-by ++tags from Simon, Sven and Felix below, which should hopefully serve as ++proper credit. ++ ++[0] https://patchwork.kernel.org/project/linux-wireless/patch/20170125163654.66431-3-nbd@nbd.name/ ++[1] https://patchwork.kernel.org/project/linux-wireless/patch/20161117083614.19188-2-sven.eckelmann@open-mesh.com/ ++ ++Suggested-by: Simon Wunderlich ++Suggested-by: Sven Eckelmann ++Suggested-by: Felix Fietkau ++Signed-off-by: Toke Høiland-Jørgensen ++Reviewed-by: Sven Eckelmann ++Signed-off-by: Sven Eckelmann ++ ++diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h ++index 49c9596e6edbedb6f0a298573ff5b1518b60b300..dea42d3ad15d613451f6618a04a43636d240a931 100644 ++--- a/drivers/net/wireless/ath/ath9k/ath9k.h +++++ b/drivers/net/wireless/ath/ath9k/ath9k.h ++@@ -1039,6 +1039,8 @@ struct ath_softc { ++ ++ u8 gtt_cnt; ++ u32 intrstatus; +++ u32 rx_active_check_time; +++ u32 rx_active_count; ++ u16 ps_flags; /* PS_* */ ++ bool ps_enabled; ++ bool ps_idle; ++diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c ++index 4665a20b9b0666f23ef190ed4e2387cea350af1a..5b504a5dfbded71710f80e1299bcd30584d95ab8 100644 ++--- a/drivers/net/wireless/ath/ath9k/debug.c +++++ b/drivers/net/wireless/ath/ath9k/debug.c ++@@ -765,6 +765,7 @@ static int read_file_reset(struct seq_file *file, void *data) ++ [RESET_TYPE_CALIBRATION] = "Calibration error", ++ [RESET_TX_DMA_ERROR] = "Tx DMA stop error", ++ [RESET_RX_DMA_ERROR] = "Rx DMA stop error", +++ [RESET_TYPE_RX_INACTIVE] = 
"Rx path inactive", ++ }; ++ int i; ++ ++diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h ++index 2a122e0db8af057acb64937f20f4d38788990d71..fdf992deb90be5677d4302dd4785ec4c797fc09d 100644 ++--- a/drivers/net/wireless/ath/ath9k/debug.h +++++ b/drivers/net/wireless/ath/ath9k/debug.h ++@@ -53,6 +53,7 @@ enum ath_reset_type { ++ RESET_TYPE_CALIBRATION, ++ RESET_TX_DMA_ERROR, ++ RESET_RX_DMA_ERROR, +++ RESET_TYPE_RX_INACTIVE, ++ __RESET_TYPE_MAX ++ }; ++ ++diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c ++index 9d84003db800e9513f07ee85a4203dbb7636edf0..fcc346c46d01ff4bedb3695b14c3e53873931f83 100644 ++--- a/drivers/net/wireless/ath/ath9k/link.c +++++ b/drivers/net/wireless/ath/ath9k/link.c ++@@ -50,7 +50,36 @@ reset: ++ "tx hung, resetting the chip\n"); ++ ath9k_queue_reset(sc, RESET_TYPE_TX_HANG); ++ return false; +++} ++ +++#define RX_INACTIVE_CHECK_INTERVAL (4 * MSEC_PER_SEC) +++ +++static bool ath_hw_rx_inactive_check(struct ath_softc *sc) +++{ +++ struct ath_common *common = ath9k_hw_common(sc->sc_ah); +++ u32 interval, count; +++ +++ interval = jiffies_to_msecs(jiffies - sc->rx_active_check_time); +++ count = sc->rx_active_count; +++ +++ if (interval < RX_INACTIVE_CHECK_INTERVAL) +++ return true; /* too soon to check */ +++ +++ sc->rx_active_count = 0; +++ sc->rx_active_check_time = jiffies; +++ +++ /* Need at least one interrupt per second, and we should only react if +++ * we are within a factor two of the expected interval +++ */ +++ if (interval > RX_INACTIVE_CHECK_INTERVAL * 2 || +++ count >= interval / MSEC_PER_SEC) +++ return true; +++ +++ ath_dbg(common, RESET, +++ "RX inactivity detected. 
Schedule chip reset\n"); +++ ath9k_queue_reset(sc, RESET_TYPE_RX_INACTIVE); +++ +++ return false; ++ } ++ ++ void ath_hw_check_work(struct work_struct *work) ++@@ -58,8 +87,8 @@ void ath_hw_check_work(struct work_struct *work) ++ struct ath_softc *sc = container_of(work, struct ath_softc, ++ hw_check_work.work); ++ ++- if (!ath_hw_check(sc) || ++- !ath_tx_complete_check(sc)) +++ if (!ath_hw_check(sc) || !ath_tx_complete_check(sc) || +++ !ath_hw_rx_inactive_check(sc)) ++ return; ++ ++ ieee80211_queue_delayed_work(sc->hw, &sc->hw_check_work, ++diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c ++index 9b186ab30214658fa5b26b1f0ca778309af34600..697a9658490950df64418f392db885c5b087cc97 100644 ++--- a/drivers/net/wireless/ath/ath9k/main.c +++++ b/drivers/net/wireless/ath/ath9k/main.c ++@@ -454,6 +454,7 @@ void ath9k_tasklet(struct tasklet_struct *t) ++ ath_rx_tasklet(sc, 0, true); ++ ++ ath_rx_tasklet(sc, 0, false); +++ sc->rx_active_count++; ++ } ++ ++ if (status & ATH9K_INT_TX) { diff --git a/patches/openwrt/0011-ath9k-Reset-chip-on-potential-deaf-state.patch b/patches/openwrt/0011-ath9k-Reset-chip-on-potential-deaf-state.patch deleted file mode 100644 index 3ff1ae7803..0000000000 --- a/patches/openwrt/0011-ath9k-Reset-chip-on-potential-deaf-state.patch +++ /dev/null @@ -1,145 +0,0 @@ -From: Sven Eckelmann -Date: Fri, 2 Dec 2016 15:36:33 +0100 -Subject: ath9k: Reset chip on potential deaf state - -The chip is switching seemingly random into a state which can be described -as "deaf". No or nearly no interrupts are generated anymore for incoming -packets. Existing links either break down after a while and new links will -not be established. - -The driver doesn't know if there is no other device available or if it -ended up in an "deaf" state. Resetting the chip proactively avoids -permanent problems in case the chip really was in its "deaf" state but -maybe causes unnecessary resets in case it wasn't "deaf". 
- -Signed-off-by: Simon Wunderlich -[sven.eckelmann@open-mesh.com: port to recent ath9k, add commit message] -Signed-off-by: Sven Eckelmann - -diff --git a/package/kernel/mac80211/patches/ath/953-ath9k-add-workaround-for-hanging-chip-not-enough-int.patch b/package/kernel/mac80211/patches/ath/953-ath9k-add-workaround-for-hanging-chip-not-enough-int.patch -new file mode 100644 -index 0000000000000000000000000000000000000000..7604ea0849072b4299d89d98b946b9286ee2bca9 ---- /dev/null -+++ b/package/kernel/mac80211/patches/ath/953-ath9k-add-workaround-for-hanging-chip-not-enough-int.patch -@@ -0,0 +1,121 @@ -+From: Simon Wunderlich -+Date: Thu, 17 Nov 2016 09:36:14 +0100 -+Subject: ath9k: Reset chip on potential deaf state -+ -+The chip is switching seemingly random into a state which can be described -+as "deaf". No or nearly no interrupts are generated anymore for incoming -+packets. Existing links either break down after a while and new links will -+not be established. -+ -+The driver doesn't know if there is no other device available or if it -+ended up in an "deaf" state. Resetting the chip proactively avoids -+permanent problems in case the chip really was in its "deaf" state but -+maybe causes unnecessary resets in case it wasn't "deaf". 
-+ -+Signed-off-by: Simon Wunderlich -+[sven.eckelmann@open-mesh.com: port to recent ath9k, add commit message] -+Signed-off-by: Sven Eckelmann -+ -+diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h -+index 7fb33b64ec7e24be444c59b22af0b5f526ddfc67..62072e4dbb5825f0d7d16ae6317071614278cad1 100644 -+--- a/drivers/net/wireless/ath/ath9k/ath9k.h -++++ b/drivers/net/wireless/ath/ath9k/ath9k.h -+@@ -1061,6 +1061,9 @@ struct ath_softc { -+ -+ u16 airtime_flags; /* AIRTIME_* */ -+ -++ unsigned long last_check_time; -++ u32 last_check_interrupts; -++ -+ struct ath_rx rx; -+ struct ath_tx tx; -+ struct ath_beacon beacon; -+diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c -+index 1aeabb58dc022127ba1857262b82f52403c8e4ac..c9e7aeb765fa649344340ee3880f38ee9f9ef40e 100644 -+--- a/drivers/net/wireless/ath/ath9k/debug.c -++++ b/drivers/net/wireless/ath/ath9k/debug.c -+@@ -766,6 +766,7 @@ static int read_file_reset(struct seq_file *file, void *data) -+ [RESET_TX_DMA_ERROR] = "Tx DMA stop error", -+ [RESET_RX_DMA_ERROR] = "Rx DMA stop error", -+ [RESET_TYPE_DEADBEEF] = "deadbeef hang", -++ [RESET_TYPE_DEAF] = "deaf hang", -+ }; -+ int i; -+ -+diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h -+index 60ada5dd21c9d456f15c8c1f785aeeba4e1acc60..d8e618aead7f70211456767fdcef77fb6b634fa4 100644 -+--- a/drivers/net/wireless/ath/ath9k/debug.h -++++ b/drivers/net/wireless/ath/ath9k/debug.h -+@@ -53,6 +53,7 @@ enum ath_reset_type { -+ RESET_TX_DMA_ERROR, -+ RESET_RX_DMA_ERROR, -+ RESET_TYPE_DEADBEEF, -++ RESET_TYPE_DEAF, -+ __RESET_TYPE_MAX -+ }; -+ -+diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c -+index 1ed6f2fd27b4db1adc9a0e8489b7e937ca068f22..3302a30adc98f52bcb4f2e79f1ab4f0bf0e96e68 100644 -+--- a/drivers/net/wireless/ath/ath9k/link.c -++++ b/drivers/net/wireless/ath/ath9k/link.c -+@@ -159,13 +159,59 @@ static bool 
ath_hw_hang_deadbeef(struct ath_softc *sc) -+ return true; -+ } -+ -++static bool ath_hw_hang_deaf(struct ath_softc *sc) -++{ -++#if !defined(CPTCFG_ATH9K_DEBUGFS) || defined(CPTCFG_ATH9K_TX99) -++ return false; -++#else -++ struct ath_common *common = ath9k_hw_common(sc->sc_ah); -++ u32 interrupts, interrupt_per_s; -++ unsigned int interval; -++ -++ /* get historic data */ -++ interval = jiffies_to_msecs(jiffies - sc->last_check_time); -++ if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA) -++ interrupts = sc->debug.stats.istats.rxlp + sc->debug.stats.istats.rxhp; -++ else -++ interrupts = sc->debug.stats.istats.rxok; -++ -++ interrupts -= sc->last_check_interrupts; -++ -++ /* save current data */ -++ sc->last_check_time = jiffies; -++ if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA) -++ sc->last_check_interrupts = sc->debug.stats.istats.rxlp + sc->debug.stats.istats.rxhp; -++ else -++ sc->last_check_interrupts = sc->debug.stats.istats.rxok; -++ -++ /* sanity check, should be 30 seconds */ -++ if (interval > 40000 || interval < 20000) -++ return false; -++ -++ /* should be at least one interrupt per second */ -++ interrupt_per_s = interrupts / (interval / 1000); -++ if (interrupt_per_s >= 1) -++ return false; -++ -++ ath_dbg(common, RESET, -++ "RX deaf hang is detected. 
Schedule chip reset\n"); -++ ath9k_queue_reset(sc, RESET_TYPE_DEAF); -++ -++ return true; -++#endif -++} -++ -+ void ath_hw_hang_work(struct work_struct *work) -+ { -+ struct ath_softc *sc = container_of(work, struct ath_softc, -+ hw_hang_work.work); -+ -+- ath_hw_hang_deadbeef(sc); -++ if (ath_hw_hang_deadbeef(sc)) -++ goto requeue_worker; -+ -++ ath_hw_hang_deaf(sc); -++ -++requeue_worker: -+ ieee80211_queue_delayed_work(sc->hw, &sc->hw_hang_work, -+ msecs_to_jiffies(ATH_HANG_WORK_INTERVAL)); -+ } diff --git a/patches/openwrt/0012-mac80211-ath9k-Disable-HW-support-for-group-keys.patch b/patches/openwrt/0012-mac80211-ath9k-Disable-HW-support-for-group-keys.patch deleted file mode 100644 index 2d048c6777..0000000000 --- a/patches/openwrt/0012-mac80211-ath9k-Disable-HW-support-for-group-keys.patch +++ /dev/null @@ -1,49 +0,0 @@ -From: Sven Eckelmann -Date: Sat, 17 Apr 2021 20:08:53 +0200 -Subject: mac80211: ath9k: Disable HW support for group keys - -It is known that the HW key "cache" gets destroyed from time to time when -group keys are modified. This was observed in an extreme way when having -two devices connected via 11s then switching to wds mode and after that -switching back to 11s again. - -It can also happen easily when more devices are in 11s and the keys are -refreshed. - -Signed-off-by: Sven Eckelmann - -diff --git a/package/kernel/mac80211/patches/ath/954-ath9k-Disable-HW-support-for-group-keys.patch b/package/kernel/mac80211/patches/ath/954-ath9k-Disable-HW-support-for-group-keys.patch -new file mode 100644 -index 0000000000000000000000000000000000000000..b75b859003c69520eaee52f94c027716086d30f1 ---- /dev/null -+++ b/package/kernel/mac80211/patches/ath/954-ath9k-Disable-HW-support-for-group-keys.patch -@@ -0,0 +1,29 @@ -+From: Sven Eckelmann -+Date: Fri, 7 Dec 2018 12:06:49 +0100 -+Subject: ath9k: Disable HW support for group keys -+ -+It is known that the HW key "cache" gets destroyed from time to time when -+group keys are modified. 
This was observed in an extreme way when having -+two devices connected via 11s then switching to wds mode and after that -+switching back to 11s again. -+ -+It can also happen easily when more devices are in 11s and the keys are -+refreshed. -+ -+diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c -+index f08e22d6ea17b2876e2e1ca102f0c083f78330dd..dc228325d1367bf3ab17723eac1548130d160a54 100644 -+--- a/drivers/net/wireless/ath/ath9k/main.c -++++ b/drivers/net/wireless/ath/ath9k/main.c -+@@ -1698,11 +1698,7 @@ static int ath9k_set_key(struct ieee80211_hw *hw, -+ if (ath9k_modparam_nohwcrypt) -+ return -ENOSPC; -+ -+- if ((vif->type == NL80211_IFTYPE_ADHOC || -+- vif->type == NL80211_IFTYPE_MESH_POINT) && -+- (key->cipher == WLAN_CIPHER_SUITE_TKIP || -+- key->cipher == WLAN_CIPHER_SUITE_CCMP) && -+- !(key->flags & IEEE80211_KEY_FLAG_PAIRWISE)) { -++ if (!(key->flags & IEEE80211_KEY_FLAG_PAIRWISE)) { -+ /* -+ * For now, disable hw crypto for the RSN IBSS group keys. This -+ * could be optimized in the future to use a modified key cache