10 Dec 2023
Last week we walked through the Serial Console for Pine64 Ox64 BL808 64-bit RISC-V Single-Board Computer (pic below)...
And we hit some illogical impossible problems on Apache NuttX RTOS (Real-Time Operating System)...
-
Console Input is always empty
(Can't enter any Console Commands)
-
Interrupt Claim is forever 0
(Ox64 won't tell us which Interrupt was fired!)
-
Leaky Writes are mushing up adjacent Interrupt Registers
(Or maybe Leaky Reads?)
Today we discover the One Single Culprit behind all this rowdy mischief...
Weak Ordering in the MMU! (Memory Management Unit)
Here's how we solved the baffling mystery...
Sorry TLDR: What's this PLIC? What's Serial Console gotta do with it?
Platform-Level Interrupt Controller (PLIC) is the hardware inside our SBC that controls the forwarding of Peripheral Interrupts to our 64-bit RISC-V CPU.
(Like Interrupts for UART, I2C, SPI, ...)
Why should we bother with PLIC?
Suppose we're typing something in the Serial Console on Ox64 SBC...
-
Every single key that we press...
(Pic above)
-
Is received by the UART Controller in our RISC-V SoC...
(Bouffalo Lab BL808 SoC)
-
Which fires an Interrupt through the PLIC to our RISC-V CPU
(T-Head C906 RISC-V Core)
Without the PLIC, it's impossible to enter commands in the Serial Console!
Tell me more...
Let's run through the steps to handle a UART Interrupt on a RISC-V SBC...
-
At Startup: We set Interrupt Priority to 1.
(Lowest Priority)
-
And Interrupt Threshold to 0.
(Allow all Interrupts to fire later)
-
We flip Bit 20 of Interrupt Enable Register to 1.
(To enable RISC-V IRQ 20 for UART3)
-
Suppose we press a key on the Serial Console...
Our UART Controller will fire an Interrupt for IRQ 20.
(IRQ means Interrupt Request Number)
-
Our Interrupt Handler will read the Interrupt Number (20) from the Interrupt Claim Register...
Call the UART Driver to read the keypress...
Then write the Interrupt Number (20) back into the same old Interrupt Claim Register...
Which will Complete the Interrupt.
-
Non-Essential But Useful: Interrupt Pending Register says which Interrupts are awaiting Claiming and Completion.
(We'll use it for troubleshooting)
That's the Textbook Recipe for PLIC, according to the Official RISC-V PLIC Spec. (If Julia Child wrote a PLIC Textbook)
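To make the Claim and Complete steps concrete, here's a minimal sketch of the recipe as a Software Model in C. It's not the NuttX PLIC Driver and it doesn't touch real PLIC Registers: the `plic_model` struct, the function names and `PLIC_MAX_IRQ` are all hypothetical, made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PLIC_MAX_IRQ 64  // Hypothetical size for this model

// Software Model of the PLIC Registers for one Hart (not real hardware)
struct plic_model {
  uint32_t priority[PLIC_MAX_IRQ];  // Interrupt Priority, one per IRQ
  uint64_t enable;                  // Interrupt Enable, one bit per IRQ
  uint64_t pending;                 // Interrupt Pending, one bit per IRQ
  uint64_t in_service;              // Claimed but not yet Completed
  uint32_t threshold;               // Interrupt Threshold
};

// At Startup: Set Priority to 1, Threshold to 0, then enable the IRQ
static void plic_setup(struct plic_model *plic, uint32_t irq) {
  assert(irq > 0 && irq < PLIC_MAX_IRQ);
  plic->priority[irq] = 1;
  plic->threshold = 0;
  plic->enable |= (1ULL << irq);
}

// A Peripheral (like UART3) fires its Interrupt
static void plic_raise(struct plic_model *plic, uint32_t irq) {
  assert(irq > 0 && irq < PLIC_MAX_IRQ);
  plic->pending |= (1ULL << irq);
}

// Read the Interrupt Claim Register: Returns the Interrupt Number,
// or 0 if nothing is waiting to be claimed
static uint32_t plic_claim(struct plic_model *plic) {
  uint32_t best = 0;  // IRQ 0 means "No Interrupt", its priority stays 0
  for (uint32_t irq = 1; irq < PLIC_MAX_IRQ; irq++) {
    bool ready = (plic->pending & plic->enable & (1ULL << irq)) != 0;
    if (ready && plic->priority[irq] > plic->threshold &&
        plic->priority[irq] > plic->priority[best]) {
      best = irq;  // On equal priority, the lowest IRQ wins
    }
  }
  if (best != 0) {
    plic->pending &= ~(1ULL << best);
    plic->in_service |= (1ULL << best);
  }
  return best;
}

// Write the Interrupt Number back to the Claim Register: Complete the Interrupt
static void plic_complete(struct plic_model *plic, uint32_t irq) {
  assert(irq > 0 && irq < PLIC_MAX_IRQ);
  plic->in_service &= ~(1ULL << irq);
}
```

In this model: After `plic_setup(&plic, 20)` and `plic_raise(&plic, 20)`, calling `plic_claim(&plic)` returns 20. That's the value our Interrupt Handler expects to read on Ox64.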
But it doesn't work on Ox64 BL808 SBC and T-Head C906 Core...
What happens when we run the PLIC Recipe on Ox64?
Absolute Disaster! (Pic above)
-
Interrupt Priorities get mushed into 0
(Instead of 1)
-
When we set the Interrupt Enable Register...
The value leaks over into the next 32-bit word
(Hence the "Leaky Write")
-
Interrupt Claim Register is always 0
(Can't read the Actual Interrupt Number!)
-
Our UART Driver says that the UART Input is Empty
(We verified the UART Registers)
Our troubles are all Seemingly Unrelated. However there's actually only One Sinister Culprit causing all these headaches...
BL808 UART Receive Status (Page 405)
How to track down the culprit?
We begin with the simplest bug: UART Input is always Empty.
In our UART Driver, this is how we read the UART Input: bl808_serial.c
// Receive one character from the UART Port.
// Called (indirectly) by the UART Interrupt Handler: __uart_interrupt
int bl808_receive(...) {
...
// If there's Pending UART Input...
// (FIFO_CONFIG_1 is 0x30002084)
if (getreg32(BL808_UART_FIFO_CONFIG_1(uart_idx)) & UART_FIFO_CONFIG_1_RX_CNT_MASK) {
// Then read the Actual UART Input
// (FIFO_RDATA is 0x3000208c)
rxdata = getreg32(BL808_UART_FIFO_RDATA(uart_idx)) & UART_FIFO_RDATA_MASK;
Which says that we...
-
Check if there's any Pending UART Input...
(At address 0x3000_2084)
-
Before reading the Actual UART Input
(At address 0x3000_208C)
Or simply...
// Check for Pending UART Input
uintptr_t pending = getreg32(0x30002084);
// Read the Actual UART Input
uintptr_t rx = getreg32(0x3000208c);
// Dump the values
_info("pending=%p, rx=%p\n", pending, rx);
What happens when we run this?
Something strange happens...
// Yep there's Pending UART Input...
pending=0x7070120
// But Actual UART Input is empty!
rx=0
UART Controller says there's UART Input to be read... And it's totally empty!
How is that possible?
The only logical explanation: Someone has already read the UART Input!
UART Input gets Auto-Reset to 0, right after it's read. Someone must have read it, unintentionally.
Hmmm this sounds like a Leaky Read...
Exactly! (Pic below)
-
When we check if there's any Pending UART Input...
(At address 0x3000_2084)
-
It causes the neighbouring Actual UART Input to be read unintentionally...
(At address 0x3000_208C)
-
Which auto-erases the Actual UART Input...
Before we actually read it!
Yep indeed we have Leaky Read + Leaky Write that are causing all our UART + PLIC woes.
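Here's a tiny Software Model in C that reproduces the symptom. It's only a sketch of the Leaky Read theory, not the real BL808 UART: `uart_model`, `read_rx_count`, `read_rx_data` and `leaky_read_rx_count` are hypothetical names, and we assume the Data Register pops one byte on every read.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_SIZE 4  // Hypothetical FIFO depth for this model

// Software Model of a UART Receive FIFO (not the real BL808 hardware)
struct uart_model {
  uint8_t fifo[FIFO_SIZE];  // Received bytes, oldest first
  uint32_t count;           // Number of bytes waiting in the FIFO
};

// Reading the Pending Count has no side effects
static uint32_t read_rx_count(const struct uart_model *uart) {
  assert(uart->count <= FIFO_SIZE);
  return uart->count;
}

// Reading the Data Register pops one byte: A read with side effects.
// Returns 0 when the FIFO is empty.
static uint32_t read_rx_data(struct uart_model *uart) {
  assert(uart->count <= FIFO_SIZE);
  if (uart->count == 0) { return 0; }
  uint32_t data = uart->fifo[0];
  for (uint32_t i = 1; i < uart->count; i++) {
    uart->fifo[i - 1] = uart->fifo[i];  // Shift the remaining bytes forward
  }
  uart->count--;
  return data;
}

// Hypothetical "Leaky Read": Reading the Pending Count
// also touches the neighbouring Data Register
static uint32_t leaky_read_rx_count(struct uart_model *uart) {
  uint32_t count = read_rx_count(uart);  // The read we asked for
  (void) read_rx_data(uart);             // The read we never asked for
  return count;
}
```

With one byte (`0x31`) in the FIFO: `leaky_read_rx_count` still reports 1 pending byte, but the next `read_rx_data` returns 0. That's the same "Pending but Empty" pattern we saw on Ox64.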
Things are looking mighty illogical and incoherent. Why oh why?
But Linux runs OK on Ox64 BL808...
Something special about Linux on T-Head C906?
We search for "T-Head" in the Linux Kernel Repo. And we see this vital clue: errata_list.h
// T-Head Errata for Linux
#ifdef CONFIG_ERRATA_THEAD_PBMT
// IO/NOCACHE memory types are handled together with svpbmt,
// so on T-Head chips, check if no other memory type is set,
// and set the non-0 PMA type if applicable.
...
asm volatile(... _PAGE_MTMASK_THEAD ...)
(Svpbmt Extension defines Page-Based Memory Types)
Aha! A Linux Errata for T-Head CPU!
We track down _PAGE_MTMASK_THEAD: pgtable-64.h
// T-Head Memory Type Definitions in Linux
#define _PAGE_PMA_THEAD ((1UL << 62) | (1UL << 61) | (1UL << 60))
#define _PAGE_NOCACHE_THEAD ((1UL << 61) | (1UL << 60))
#define _PAGE_IO_THEAD ((1UL << 63) | (1UL << 60))
#define _PAGE_MTMASK_THEAD (_PAGE_PMA_THEAD | _PAGE_IO_THEAD | (1UL << 59))
Which is annotated with...
[63:59] T-Head Memory Type definitions:
Bit[63] SO - Strong Order
Bit[62] C - Cacheable
Bit[61] B - Bufferable
Bit[60] SH - Shareable
Bit[59] Sec - Trustable
00110 - NC: Weakly-Ordered, Non-Cacheable, Bufferable, Shareable, Non-Trustable
01110 - PMA: Weakly-Ordered, Cacheable, Bufferable, Shareable, Non-Trustable
10010 - IO: Strongly-Ordered, Non-Cacheable, Non-Bufferable, Shareable, Non-Trustable
Something sus about I/O Memory?
The last line suggests we should configure the T-Head Memory Type specifically to support I/O Memory: _PAGE_IO_THEAD
Memory Attribute | Page Table Entry |
---|---|
Strongly-Ordered | Bit 63 is 1 |
Non-Cacheable | Bit 62 is 0 (Default) |
Non-Bufferable | Bit 61 is 0 (Default) |
Shareable | Bit 60 is 1 |
Non-Trustable | Bit 59 is 0 (Default) |
With the above evidence, we deduce that "Strong Order" is the Magical Bit that we need for UART and PLIC!
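To double-check our reading of the bits, here's a small sketch in C that decodes the T-Head Memory Type from a 64-bit Page Table Entry. The bit positions come from the Linux annotation above. The `thead_memtype` struct and `thead_decode` function are hypothetical helpers, not part of Linux or NuttX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// T-Head Memory Type Bits, as annotated in the Linux Kernel
#define THEAD_SO  (1ULL << 63)  // Strong Order
#define THEAD_C   (1ULL << 62)  // Cacheable
#define THEAD_B   (1ULL << 61)  // Bufferable
#define THEAD_SH  (1ULL << 60)  // Shareable
#define THEAD_SEC (1ULL << 59)  // Trustable

// I/O Memory Type (10010) shows up as "0x9000..." in a Page Table Entry
static_assert((THEAD_SO | THEAD_SH) == 0x9000000000000000ULL,
              "I/O Memory Type should be 0x9000...");

// Decoded Memory Attributes (hypothetical helper type)
struct thead_memtype {
  bool strong_order;
  bool cacheable;
  bool bufferable;
  bool shareable;
  bool trustable;
};

// Extract the Memory Attributes from Bits 59 to 63 of a Page Table Entry
static struct thead_memtype thead_decode(uint64_t pte) {
  struct thead_memtype t = {
    .strong_order = (pte & THEAD_SO)  != 0,
    .cacheable    = (pte & THEAD_C)   != 0,
    .bufferable   = (pte & THEAD_B)   != 0,
    .shareable    = (pte & THEAD_SH)  != 0,
    .trustable    = (pte & THEAD_SEC) != 0,
  };
  return t;
}
```

Decoding a Page Table Entry that begins with `0x9000...` gives us Strongly-Ordered and Shareable, with everything else cleared. That matches the I/O row (10010) in the Linux annotation.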
What's "Strong Order"?
"Strong Order" means "All Reads and All Writes are In-Order".
Apparently T-Head C906 will (by default) Disable Strong Order and read / write memory Out-of-Sequence. (So that it performs better)
Which will surely mess up our UART and PLIC Registers!
They should've warned us about Strong Order and I/O Memory!
Ahem they did...
"A Device Driver written to rely on I/O Strong Ordering rules will not operate correctly if the Address Range is mapped with PBMT=NC [Weakly Ordered]"
"As such, this configuration is discouraged"
Though that warning comes from the New Svpbmt Extension. Which isn't supported by T-Head C906.
(Svpbmt Bits 61 to 62 will conflict with T-Head Bits 59 to 63. Oh boy)
How to enable Strong Order?
We do it in the T-Head C906 MMU...
(Strong Order appears briefly in C906 User Manual, Pages 24 & 53)
(What's "Shareable"? It's not documented)
UPDATE: Shareable might support Strong Ordering across Multiple Cores
Level 1 Page Table for Ox64 MMU
Wow the soup gets too salty. What's MMU?
Memory Management Unit (MMU) is the hardware inside our SBC that does...
-
Memory Protection: Prevent Applications (and Kernel) from meddling with things (in System Memory) that they're not supposed to
-
Virtual Memory: Allow Applications to access chunks of "Imaginary Memory" at Exotic Addresses (0x8000_0000!)
But in reality: They're System RAM recycled from boring old addresses (like 0x5060_4000)
(Kinda like "The Matrix")
For Ox64: We switched on the MMU to protect the Kernel Memory from the Apps. And to protect the Apps from each other.
How does it work?
The pic above shows the Level 1 Page Table that we configured for our MMU. The Page Table has a Page Table Entry that says...
-
V: It's a Valid Page Table Entry
-
G: It's a Global Mapping
-
R: Allow Kernel Reads for 0x0 to 0x3FFF_FFFF
-
W: Allow Kernel Writes for 0x0 to 0x3FFF_FFFF
(Including the UART Registers at 0x3000_2000)
What about _PAGE_IO_THEAD and Strong Order?
Memory Attribute | Page Table Entry |
---|---|
SO: Strongly-Ordered | Bit 63 is 1 |
SH: Shareable | Bit 60 is 1 |
We'll set the SO and SH Bits in our Page Table Entries. Hopefully UART and PLIC won't get mushed up no more...
We need to set the Strong Order Bit...
How will we enable it in our Page Table Entry?
Memory Attribute | Page Table Entry |
---|---|
SO: Strongly-Ordered | Bit 63 is 1 |
SH: Shareable | Bit 60 is 1 |
For testing, we patched our MMU Code to set the Strong Order Bit in our Page Table Entries (pic above): riscv_mmu.c
// Set a Page Table Entry in a Page Table for the MMU
void mmu_ln_setentry(
uint32_t ptlevel, // Level of Page Table: 1, 2 or 3
uintptr_t lntable, // Page Table Address
uintptr_t paddr, // Physical Address
uintptr_t vaddr, // Virtual Address (For Kernel: Same as Physical Address)
uint32_t mmuflags // MMU Flags (V / G / R / W)
) {
...
// Set the Page Table Entry:
// Physical Page Number and MMU Flags (V / G / R / W)
lntable[index] = (paddr | mmuflags);
// Now we set the T-Head Memory Type in Bits 59 to 63.
// For I/O and PLIC Memory, we set...
// SO (Bit 63): Strong Order
// SH (Bit 60): Shareable
#define _PAGE_IO_THEAD ((1UL << 63) | (1UL << 60))
// If this is a Leaf Page Table Entry
// for I/O Memory or PLIC Memory...
if ((mmuflags & PTE_R) && // Leaf Page Table Entry
(vaddr < 0x40000000UL || // I/O Memory
vaddr >= 0xe0000000UL)) { // PLIC Memory
// Then set the Strong Order
// and Shareable Bits
lntable[index] = lntable[index]
| _PAGE_IO_THEAD;
}
The code above will set the Strong Order and Shareable Bits for...
-
I/O Memory: 0x0 to 0x3FFF_FFFF
(Including the UART Registers at 0x3000_2000)
-
PLIC Memory: 0xE000_0000 to 0xEFFF_FFFF
map I/O regions
vaddr=0, lntable[index]=0x90000000000000e7
// "0x9000..." means Strong Order (Bit 63) and Shareable (Bit 60) are set
map PLIC as Interrupt L2
vaddr=0xe0000000, lntable[index]=0x90000000380000e7
vaddr=0xe0200000, lntable[index]=0x90000000380800e7
vaddr=0xe0400000, lntable[index]=0x90000000381000e7
vaddr=0xe0600000, lntable[index]=0x90000000381800e7
...
vaddr=0xefc00000, lntable[index]=0x900000003bf000e7
vaddr=0xefe00000, lntable[index]=0x900000003bf800e7
// "0x9000..." means Strong Order (Bit 63) and Shareable (Bit 60) are set
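We can cross-check the log above with a standalone sketch in C. It assumes the standard Sv39 layout (Physical Page Number is the Physical Address shifted right by 12 bits, stored from Bit 10 of the Page Table Entry) and assumes the MMU Flags are `0xe7` (V, R, W, G, A, D), which we inferred from the log. `make_pte` is a hypothetical helper, not the NuttX function.

```c
#include <assert.h>
#include <stdint.h>

// Standard RISC-V Page Table Entry Flags
#define PTE_V (1ULL << 0)  // Valid
#define PTE_R (1ULL << 1)  // Readable
#define PTE_W (1ULL << 2)  // Writable
#define PTE_G (1ULL << 5)  // Global Mapping
#define PTE_A (1ULL << 6)  // Accessed
#define PTE_D (1ULL << 7)  // Dirty

// Assumption: These are the flags behind the "...e7" in the log
#define MMU_IO_FLAGS (PTE_V | PTE_R | PTE_W | PTE_G | PTE_A | PTE_D)
static_assert(MMU_IO_FLAGS == 0xe7, "MMU Flags should be 0xe7");

// T-Head I/O Memory Type: Strong Order (Bit 63) and Shareable (Bit 60)
#define PAGE_IO_THEAD ((1ULL << 63) | (1ULL << 60))

// Build a Leaf Page Table Entry (hypothetical helper)
static uint64_t make_pte(uint64_t paddr, uint64_t vaddr, uint64_t mmuflags) {
  // Sv39: Physical Page Number goes into Bit 10 onwards
  uint64_t pte = ((paddr >> 12) << 10) | mmuflags;

  // Same test as our patch: Leaf Entry for I/O Memory or PLIC Memory
  if ((mmuflags & PTE_R) &&
      (vaddr < 0x40000000ULL || vaddr >= 0xe0000000ULL)) {
    pte |= PAGE_IO_THEAD;
  }
  return pte;
}
```

Under these assumptions, `make_pte(0xe0000000, 0xe0000000, MMU_IO_FLAGS)` produces `0x90000000380000e7`, the same value as the first PLIC entry in the log. An address in RAM (like `0x50200000`) won't get the Strong Order and Shareable Bits.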
If we don't specify MMU Caching for T-Head C906... Is MMU Caching enabled by default?
Nope, we need to explicitly enable MMU Caching ourselves! Otherwise Memory Accesses (Kernel and Apps) will become really slooooow...
We test our patched code...
NOTE: T-Head MMU Flags (Strong Order / Shareable) are available only if OpenSBI has set the MAEE Bit in the MXSTATUS Register to 1. Otherwise the MMU will crash when we set the flags!
UPDATE: NuttX Mainline now supports T-Head C906 Memory Types
(Shareable Bit doesn't affect anything. We're keeping it to be consistent with Linux)
What happens when we run our patched MMU code?
Our UART and PLIC Troubles are finally over!
-
Interrupt Priorities are set correctly to 1
PLIC Interrupt Priority: After (0xe0000004):
0000  01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00  ................
0010  01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00  ................
0020  01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00  ................
-
Interrupt Enable doesn't leak to the next word
PLIC Hart 0 S-Mode Interrupt Enable (0xe0002080):
0000  00 00 10 00 00 00 00 00  ........
-
Interrupt Claim returns the correct Interrupt Number
riscv_dispatch_irq: claim=0x14
-
Our UART Driver returns the correct UART Input
bl808_receive: rxdata=0x31
Is NuttX usable on Ox64?
Yep! NuttX RTOS on Ox64 now boots OK to the NuttX Shell (NSH).
And happily accepts commands through the Serial Console yay! (Pic above)
NuttShell (NSH) NuttX-12.0.3
nsh> uname -a
NuttX 12.0.3 fd05b07 Nov 24 2023 07:42:54 risc-v star64
nsh> ls /dev
/dev:
console
null
ram0
zero
nsh> hello
Hello, World!!
Phew that was some quick intense debugging...
Yeah we're really fortunate to get NuttX RTOS running OK on Ox64. Couple of things that might have helped...
-
Write up Everything about our troubles
(And share them publicly)
-
(They might inspire the solution!)
-
Re-Read and Re-Think everything we wrote
(Challenge all our Assumptions)
-
Head to the Beach. Have a Picnic.
(Never know when the solution might pop up!)
-
Sounds like an Agatha Christie Mystery...
But sometimes it's indeed One Single Culprit (Weak Ordering) behind all the Seemingly Unrelated Problems!
Will NuttX officially support Ox64?
We plan to...
-
Take a brief break from writing
(No new article next week)
-
Clean up our code
(Rename the JH7110 things to BL808)
-
Upstream our code to NuttX Mainline
(Delicate Regression Operation because we're adding MMU Flags)
And Apache NuttX RTOS shall officially support Ox64 BL808 SBC real soon!
UPDATE: NuttX officially supports Ox64 BL808 SBC!
Are we hunky dory with Ox64 BL808 and T-Head C906?
We said this last time...
"If RISC-V ain't RISC-V on SiFive vs T-Head: We'll find out!"
As of Today: Yep RISC-V is indeed RISC-V on SiFive vs T-Head... Just beware of C906 MMU, C906 PLIC and T-Head Errata!
(New T-Head Cores will probably migrate to Svpbmt Extension)
Thank you so much for reading my adventures of NuttX on Ox64... You're my inspiration for solving this sticky mystery!
-
Previously the Console Input was always empty
(Couldn't enter any Console Commands)
-
And Interrupt Claim wasn't working correctly
(Ox64 wouldn't say which Interrupt was fired)
-
Because Leaky Reads and Writes were contaminating our UART and PLIC Registers
(Something was doing phantom reads and writes)
-
But when we Enabled Strong Ordering in the T-Head C906 MMU...
(Memory Management Unit)
-
Everything became OK
(No more worries!)
Apache NuttX RTOS for Ox64 BL808 shall be Upstreamed to Mainline real soon. Stay tuned for updates!
UPDATE: NuttX officially supports Ox64 BL808 SBC!
Many Thanks to my GitHub Sponsors (and the awesome NuttX Community) for supporting my work! This article wouldn't have been possible without your support.
Got a question, comment or suggestion? Create an Issue or submit a Pull Request here...
lupyuen.github.io/src/plic3.md
If we don't specify MMU Caching for T-Head C906... Is MMU Caching enabled by default?
Nope, we need to explicitly enable MMU Caching ourselves! Otherwise Memory Accesses (Kernel and Apps) will become really slooooow.
According to Linux Kernel, this is how we define the Cache Flags for T-Head C906: bl808_mm_init.c
// T-Head C906 MMU Extensions
#define MMU_THEAD_SHAREABLE (1ul << 60)
#define MMU_THEAD_BUFFERABLE (1ul << 61)
#define MMU_THEAD_CACHEABLE (1ul << 62)
// T-Head C906 MMU requires Kernel Memory
// to be explicitly cached with these flags
#define MMU_THEAD_PMA_FLAGS \
(MMU_THEAD_SHAREABLE | \
MMU_THEAD_BUFFERABLE | \
MMU_THEAD_CACHEABLE)
Then we cache the Kernel Text, Data and Heap, by passing MMU_THEAD_PMA_FLAGS: bl808_mm_init.c
// Cache the Kernel Text, Data and Page Pool
map_region(KFLASH_START, KFLASH_START, KFLASH_SIZE,
MMU_KTEXT_FLAGS | MMU_THEAD_PMA_FLAGS);
map_region(KSRAM_START, KSRAM_START, KSRAM_SIZE,
MMU_KDATA_FLAGS | MMU_THEAD_PMA_FLAGS);
mmu_ln_map_region(2, PGT_L2_VBASE, PGPOOL_START, PGPOOL_START, PGPOOL_SIZE,
MMU_KDATA_FLAGS | MMU_THEAD_PMA_FLAGS);
(See the Pull Request for Ox64 and SG2000)
What about User Text and Data? For NuttX Apps?
Yep they need to be explicitly cached too!
This is how we cache the User Text and Data, by setting the Extra MMU Flags: arch/risc-v/src/common/riscv_mmu.h
// T-Head MMU needs Text and Data to be Shareable, Bufferable, Cacheable
#ifdef CONFIG_ARCH_MMU_EXT_THEAD
# define PTE_SEC (1UL << 59) /* Security */
# define PTE_SHARE (1UL << 60) /* Shareable */
# define PTE_BUF (1UL << 61) /* Bufferable */
# define PTE_CACHE (1UL << 62) /* Cacheable */
# define PTE_SO (1UL << 63) /* Strong Order */
# define EXT_UTEXT_FLAGS (PTE_SHARE | PTE_BUF | PTE_CACHE)
# define EXT_UDATA_FLAGS (PTE_SHARE | PTE_BUF | PTE_CACHE)
#else
# define EXT_UTEXT_FLAGS (0)
# define EXT_UDATA_FLAGS (0)
#endif
// Flags for user FLASH (RX) and user RAM (RW)
#define MMU_UTEXT_FLAGS (PTE_R | PTE_X | PTE_U | EXT_UTEXT_FLAGS)
#define MMU_UDATA_FLAGS (PTE_R | PTE_W | PTE_U | EXT_UDATA_FLAGS)
Then we enable ARCH_MMU_EXT_THEAD for SG2000 and BL808: arch/risc-v/Kconfig
config ARCH_CHIP_SG2000
select ARCH_MMU_TYPE_SV39
select ARCH_MMU_EXT_THEAD
...
config ARCH_CHIP_BL808
select ARCH_MMU_TYPE_SV39
select ARCH_MMU_EXT_THEAD
(See the Pull Request for SG2000)
(See the Pull Request for BL808)
Does MMU Caching affect NuttX Performance?
Really it does!
-
SG2000 CoreMark without MMU Caching:
21
-
SG2000 CoreMark with MMU Caching:
2,422
Will we have issues with MMU Flags: T-Head vs Svpbmt?
Well eventually we need to handle (non-standard) T-Head MMU Flags and (standard) Svpbmt MMU Flags. According to Linux Kernel...
-
T-Head MMU Flags are in Bits 59 to 63 (upper 5 bits)
-
Svpbmt MMU Flags are in Bits 61 and 62 (upper 3 bits)
T-Head and Svpbmt disagree on the MMU Bits. (And we may have more MMU Bits in future)
Thankfully Svpbmt already caches by default (because PMA=0). So we can ignore Svpbmt for now.
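Here's a short sketch in C of why the two schemes can't be mixed up. The T-Head values come from the Linux definitions quoted earlier. The Svpbmt values (NC is Bit 61, IO is Bit 62, both clear means PMA) follow the Svpbmt definitions in Linux. The `memtype` enum and the two classifier functions are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

// T-Head Memory Types (non-standard): Bits 59 to 63
#define THEAD_MTMASK (0x1FULL << 59)
#define THEAD_PMA    ((1ULL << 62) | (1ULL << 61) | (1ULL << 60))
#define THEAD_NC     ((1ULL << 61) | (1ULL << 60))
#define THEAD_IO     ((1ULL << 63) | (1ULL << 60))

// Svpbmt Memory Types (standard): Bits 61 to 62
#define SVPBMT_MTMASK (3ULL << 61)
#define SVPBMT_PMA    (0ULL)
#define SVPBMT_NC     (1ULL << 61)
#define SVPBMT_IO     (1ULL << 62)

// Svpbmt Bits sit inside the T-Head Bits, so the two encodings collide
static_assert((THEAD_MTMASK & SVPBMT_MTMASK) == SVPBMT_MTMASK,
              "Svpbmt Bits overlap the T-Head Bits");

// Hypothetical classification for this sketch
enum memtype { MEMTYPE_PMA, MEMTYPE_NC, MEMTYPE_IO, MEMTYPE_OTHER };

// Interpret a Page Table Entry the T-Head way
static enum memtype thead_memtype(uint64_t pte) {
  switch (pte & THEAD_MTMASK) {
    case THEAD_PMA: return MEMTYPE_PMA;
    case THEAD_NC:  return MEMTYPE_NC;
    case THEAD_IO:  return MEMTYPE_IO;
    default:        return MEMTYPE_OTHER;  // Not one of the 3 named types
  }
}

// Interpret the same Page Table Entry the Svpbmt way
static enum memtype svpbmt_memtype(uint64_t pte) {
  switch (pte & SVPBMT_MTMASK) {
    case SVPBMT_PMA: return MEMTYPE_PMA;
    case SVPBMT_NC:  return MEMTYPE_NC;
    case SVPBMT_IO:  return MEMTYPE_IO;
    default:         return MEMTYPE_OTHER;  // Both bits set: Reserved
  }
}
```

In this sketch: A Page Table Entry with no Memory Type Bits is already PMA under Svpbmt, but it's not a cached PMA type under T-Head. And the T-Head PMA Flags (Bits 60 to 62) land on the Reserved encoding under Svpbmt. So whichever scheme the CPU supports, we'll have to pick the flags to match.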
In this article, we ran a Work-In-Progress Version of Apache NuttX RTOS for Ox64, with PLIC and Console Input working OK.
This is how we download and build NuttX for Ox64 BL808 SBC...
## Download the WIP NuttX Source Code
git clone \
--branch ox64c \
https://github.com/lupyuen2/wip-nuttx \
nuttx
git clone \
--branch ox64c \
https://github.com/lupyuen2/wip-nuttx-apps \
apps
## Build NuttX
cd nuttx
tools/configure.sh star64:nsh
make
## Export the NuttX Kernel
## to `nuttx.bin`
riscv64-unknown-elf-objcopy \
-O binary \
nuttx \
nuttx.bin
## Dump the disassembly to nuttx.S
riscv64-unknown-elf-objdump \
--syms --source --reloc --demangle --line-numbers --wide \
--debugging \
nuttx \
>nuttx.S \
2>&1
(Remember to install the Build Prerequisites and Toolchain)
Then we build the Initial RAM Disk that contains NuttX Shell and NuttX Apps...
## Build the Apps Filesystem
make -j 8 export
pushd ../apps
./tools/mkimport.sh -z -x ../nuttx/nuttx-export-*.tar.gz
make -j 8 import
popd
## Generate the Initial RAM Disk `initrd`
## in ROMFS Filesystem Format
## from the Apps Filesystem `../apps/bin`
## and label it `NuttXBootVol`
genromfs \
-f initrd \
-d ../apps/bin \
-V "NuttXBootVol"
## Prepare a Padding with 64 KB of zeroes
head -c 65536 /dev/zero >/tmp/nuttx.pad
## Append Padding and Initial RAM Disk to NuttX Kernel
cat nuttx.bin /tmp/nuttx.pad initrd \
>Image
Next we prepare a Linux microSD for Ox64 as described in the previous article.
(Remember to flash OpenSBI and U-Boot Bootloader)
Then we do the Linux-To-NuttX Switcheroo: Overwrite the microSD Linux Image by the NuttX Kernel...
## Overwrite the Linux Image
## on Ox64 microSD
cp Image \
"/Volumes/NO NAME/Image"
diskutil unmountDisk /dev/disk2
Insert the microSD into Ox64 and power up Ox64.
Ox64 boots OpenSBI, which starts U-Boot Bootloader, which starts NuttX Kernel and the NuttX Shell (NSH).
NuttX Commands will run OK in NuttX Shell. (Pic above)
Quick dip in the sea + Picnic on the beach... Really helps with NuttX + Ox64 troubleshooting!