Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Host Driver support for synopsys dwc2 #2870

Merged
merged 49 commits into from
Nov 8, 2024
Merged

Implement Host Driver support for synopsys dwc2 #2870

merged 49 commits into from
Nov 8, 2024

Conversation

hathach
Copy link
Owner

@hathach hathach commented Nov 8, 2024

Describe the PR
Add host driver for synopsys dwc2 (esp32-s2/s3/p4, stm32, etc..)

  • Support cpu slave and dma architecture
  • Support both OTG full speed and highspeed
  • Support Control/Bulk/Interrupt . ISO isn't tested
  • Support HS/FS/LS devices
  • Support hub with multiple devices
  • Support split control/bulk/interrupt i.e FS/LS device attached via highspeed hub
  • Following family is updated and test: stm32 f4/f7/h7, esp32p4
  • Add an new dwc2_common.c (required for both device/host)
  • API: adding new tusb_time_millis_api() and tusb_time_delay_ms_api() (default to blocking with millis_api()` which is reuired for no rtos when using with some port/configuration e.g hcd dwc2.
  • fix an duplicated attach issue which cause USBH Defer Attach until current enumeration complete message

Known issue: For esp32-p4, buffer DMA doees not work (use slave for now). Not entirely sure why, chip support desc dma, but we haven't enabled it yet. I notice hcddma address is increased corretly when sending setup, but data is random (not correct). fix or implement ddma later

image

check request queue available before making usb attempt. Though there is no handling when queue is full.
device_info example work well
…asier to manage. Fix channel disable/deallocated.
…orer example read work, but write10 still wip
src/portable/synopsys/dwc2/dcd_dwc2.c Dismissed Show dismissed Hide dismissed
src/portable/synopsys/dwc2/dcd_dwc2.c Fixed Show fixed Hide fixed
src/portable/synopsys/dwc2/dwc2_common.c Dismissed Show dismissed Hide dismissed
Comment on lines +235 to +246
// void dwc2_core_handle_common_irq(uint8_t rhport, bool in_isr) {
// (void) in_isr;
// dwc2_regs_t * const dwc2 = DWC2_REG(rhport);
// const uint32_t int_mask = dwc2->gintmsk;
// const uint32_t int_status = dwc2->gintsts & int_mask;
//
// // Device disconnect
// if (int_status & GINTSTS_DISCINT) {
// dwc2->gintsts = GINTSTS_DISCINT;
// }
//
// }

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
uint32_t speed : 2;
uint32_t next_pid : 2;
uint32_t do_ping : 1;
// uint32_t : 9;

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +821 to +825
// if (channel->hcsplt_bm.split_en) {
// if (edpt->hcchar_bm.ep_num == 1) {
// TU_LOG1("Frame %u, ch %u: ep %u, hcint 0x%04lX ", dwc2->hfnum_bm.num, ch_id, channel->hcchar_bm.ep_num, hcint);
// print_hcint(hcint);
// }

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.

bool is_done = false;

// TU_LOG1("in hcint = %02lX\r\n", hcint);

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.

bool is_done = false;

// TU_LOG1("out hcint = %02lX\r\n", hcint);

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
// Port enable
port0_enable(dwc2);
} else {
// TU_ASSERT(false, );

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
const uint32_t gintmsk = dwc2->gintmsk;
const uint32_t gintsts = dwc2->gintsts & gintmsk;

// TU_LOG1_HEX(gintsts);

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
@roma-jam
Copy link
Contributor

roma-jam commented Nov 8, 2024

@hathach

For esp32-p4, buffer DMA doees not work (use slave for now). Not entirely sure why

Hint: cache synchronization is absent.

@hathach
Copy link
Owner Author

hathach commented Nov 8, 2024

@hathach

For esp32-p4, buffer DMA doees not work (use slave for now). Not entirely sure why

Hint: cache synchronization is absent.

Thank you for the hint, I am a bit exausted with hcd dwc2 feature for now and not used with debugging esp32p4 (got openocd run, but one it hit breakpoint any futher steps cause it to panic), also printf in isr may also cause panic. I will call it a day for now, since it still function well with slave mode. I will come back to this later on.

@hathach hathach merged commit 9d86ca1 into master Nov 8, 2024
166 checks passed
@hathach hathach deleted the hcd-dwc2 branch November 8, 2024 17:10
@cleverca22
Copy link

of note, cache coherency is also a problem with my own dwc2 hcd on the rpi (need to test this PR out and see how it behaves)

in my case, i have cache management hooks in the HCD, but one of my buffers was stack allocated, and some activity on the stack caused it to get pulled back into the cache before a control-in could complete
so the ARM core saw a stale version in the cache, despite the HCD flushing the cache before the operation started

@HiFiPhile
Copy link
Collaborator

HiFiPhile commented Nov 10, 2024

@cleverca22 for ARM cores please use MPU to mark the buffer area as non cacheable instead of doing cache clean/invalidate, similar to what #2865 does.

PS: MCHP has nice write-up on cache https://ww1.microchip.com/downloads/en/DeviceDoc/Managing-Cache-Coherency-on-Cortex-M7-Based-MCUs-DS90003195A.pdf

@cleverca22
Copy link

cleverca22 commented Nov 11, 2024

@HiFiPhile in my case, i happen to be on cortex-a, so thats handled in the MMU

the problem is more, that tinyusb and other applications, are putting the DMA buffers in places like the stack or .bss
and you cant just permanently flag that whole page to never cache

so you need to get everything using tinyusb to allocate buffers from a fixed pool

edit:
ah yes, when searching for an example within tinyusb, i ran into CFG_TUH_MEM_SECTION
so if i define that to put things into a special pool, that will solve the tinyusb half of the buffers

@HiFiPhile
Copy link
Collaborator

the problem is more, that tinyusb and other applications, are putting the DMA buffers in places like the stack or .bss
and you cant just permanently flag that whole page to never cache

Doesn't using CFG_TUSB_MEM_SECTION help ?

@hathach
Copy link
Owner Author

hathach commented Nov 11, 2024

@HiFiPhile actually, we can also support dcache clean/invalidate as well, application only need to have the CFG_TUH_MEM_ALIGN to 64 for cache line. It can cost ram, but for some port such as broadcom bcm2711 (pi4) it does not matter. So yeah, I don't rule out option for dcache support. Though I haven't added dcahe for dwc2 yet (I think we did it for chipidea), will do later.

@HiFiPhile
Copy link
Collaborator

@HiFiPhile actually, we can also support dcache clean/invalidate as well, application only need to have the CFG_TUH_MEM_ALIGN to 64 for cache line. It can cost ram, but for some port such as broadcom bcm2711 (pi4) it does not matter. So yeah, I don't rule out option for dcache support. Though I haven't added dcahe for dwc2 yet (I think we did it for chipidea), will do later.

The buffer size also has to be multiple of cache line size, otherwise cached data will be crashed doing cache clean. There is an example in my linked document.

@hathach
Copy link
Owner Author

hathach commented Nov 11, 2024

yeah, if it is application buffer, it should also be cachline aligned. For us, we need to make sure any buffer for communicated must be CFG_TUH_MEM_ALIGN.

PS: I test this dcache later on. really exausted with hcd dwc2 for now .

@cleverca22
Copy link

something ive been thinking, is that a cache discard (no clean allowed) would also be of use

basically, you have buffers in 2 pools, cpu->device, and device->cpu

any time you write to a cpu->device buffer, you can do a cache clean, to flush the changes to dram, because the region is only ever written by the cpu, there is never dirty data at risk of being caught up, so you dont have to align by cache line size

any time the device is done doing dma and your informing tinyusb, you discard the cache for the entire area
because this second region is only ever written by the device, no dirty data is at risk of loss, your just causing a cache miss on the first hit of any buffers in the same line

but this idea now complicates things, because you need 2 versions of CFG_TUH_MEM_SECTION, to allocate things into different pools

@roma-jam
Copy link
Contributor

@hathach @HiFiPhile
Hi,
sorry for asking this in current PR, but is there any quick&easy way to make a speed measurement for cdc and ncm device classes?

@HiFiPhile
Copy link
Collaborator

@hathach @HiFiPhile Hi, sorry for asking this in current PR, but is there any quick&easy way to make a speed measurement for cdc and ncm device classes?

For CDC there is a script #920
For NCM it has a iperf2 server #2227

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants