
DRAM: more cold functions #9850

Open: lyakh wants to merge 2 commits into main

Conversation

lyakh (Collaborator) commented Feb 21, 2025

Move all of IPC and some initialisation code to DRAM.

Move several initialisation functions to run from DRAM directly.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Mark all IPC functions as "cold" to run them directly in DRAM.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
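
For context, this is roughly how such a "cold" marker is typically wired up (a minimal sketch, not the actual SOF definition; the section name is an assumption, and CONFIG_COLD_STORE_EXECUTE_DRAM is the option discussed later in this thread):

#include <stdint.h>

struct ipc;	/* opaque here; the real definition lives in the SOF headers */

/* When CONFIG_COLD_STORE_EXECUTE_DRAM is enabled, tagged functions are
 * emitted into a dedicated text section that the linker script maps to
 * DRAM instead of HPSRAM; otherwise the marker is a no-op and the code
 * stays in hot SRAM.
 */
#ifdef CONFIG_COLD_STORE_EXECUTE_DRAM
#define __cold __attribute__((__section__(".cold_text")))	/* section name assumed */
#else
#define __cold
#endif

/* usage, as in this PR's diffs */
__cold int ipc_pipeline_complete(struct ipc *ipc, uint32_t comp_id);
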
marcinszkudlinski (Contributor):

I understand why init functions should go to DRAM, but why IPC?

lyakh (Collaborator, Author) commented Feb 24, 2025

> I understand why init functions should go to DRAM, but why IPC?

@marcinszkudlinski the idea is that only the audio paths are "hot" - only schedulers and audio processing threads. Everything else can be "cold", and IPC processing is one such large code area. But if you have concerns that this could break something, let's discuss - maybe we're overlooking some use cases?

marcinszkudlinski (Contributor):

@lyakh not really.
We're already facing performance problems: when starting multiple sophisticated pipelines, some LL cycles get lost because of long operations like "prepare" for each component.
We need to be careful about what goes to DRAM. It is slower and, worse, the access time is not guaranteed, since the physical memory is shared with Linux/Windows/Chrome and our requests go last.

I think that as long as we have enough HPSRAM, we should use it.

abonislawski (Member):

The IPC part looks really suspicious - do you have any data on what the gain and the performance drop are? Especially when the main CPU is under high load and DRAM access will lag even more.

lgirdwood (Member):

HPSRAM is precious, so I agree we need to be really careful about what we put in DRAM - it should only be the parts of IPC that are not time critical; i.e. trigger is time critical, but load module is not. We need to find this balance. Linux only really cares about the prepare()/trigger() driver ops and any associated IPCs. Don't know about Windows?
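
In concrete terms the split could look like this (a sketch only, reusing function names from this PR's diffs; whether each one is actually hot is exactly the open question of this review):

#include <stdbool.h>
#include <stdint.h>

#ifndef __cold
#define __cold __attribute__((__section__(".cold_text")))	/* assumed definition */
#endif

struct ipc;		/* opaque stand-ins; real definitions are in the SOF headers */
struct ipc_comp_dev;

/* time critical (on the Linux prepare()/trigger() path): left un-annotated
 * so it stays in hot SRAM
 */
int ipc4_pipeline_trigger(struct ipc_comp_dev *ppl_icd, uint32_t cmd, bool *delayed);

/* not time critical (pipeline construction): tagged __cold so it may be
 * placed in DRAM
 */
__cold int ipc_pipeline_complete(struct ipc *ipc, uint32_t comp_id);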

lgirdwood (Member) left a comment

Some functions are really obvious pipeline construction/free APIs, but some utility APIs could be used in the stream triggering flow. Best to check.

@@ -197,7 +198,7 @@ int comp_buffer_connect(struct comp_dev *comp, uint32_t comp_core,
 	return pipeline_connect(comp, buffer, dir);
 }

-int ipc_pipeline_complete(struct ipc *ipc, uint32_t comp_id)
+__cold int ipc_pipeline_complete(struct ipc *ipc, uint32_t comp_id)
lgirdwood (Member):

Can you check this? I'm not sure if it's done in prepare() - maybe for IPC3 only?

lyakh (Collaborator, Author):

@lgirdwood this is only called from

ipc4_pipeline_complete()
ipc4_pipeline_prepare()
ipc4_set_pipeline_state()				idc_ppl_state()
ipc4_process_glb_message()				idc_cmd()
ipc_cmd()						idc_handler()
ipc_platform_do_cmd()		idc_ipc()		P4WQ
ipc_do_cmd()			idc_cmd()
EDF scheduler			idc_handler()
				P4WQ

So it's only called from the EDF scheduler or from the IDC P4WQ, both of which use the EDF_ZEPHYR_PRIORITY priority (currently 1).

lgirdwood (Member):

OK, so are you confirming it's not called as part of an LL or DP process()? I assume it's only used in EDF for non-process() usage.

lyakh (Collaborator, Author):

@lgirdwood correct, as it stands there don't seem to be any paths leading to this function being called from an audio processing context.

lyakh (Collaborator, Author) commented Feb 25, 2025

@lgirdwood @marcinszkudlinski @abonislawski as far as I understand, the worst case would be when we're running close to 100% of our performance capacity and at that moment the user issues some IPCs - maybe to start an additional light stream. In principle we still have a couple of free DSP cycles to run an additional stream, but while preparing it, IPC processing adds significant DSP load. So if we process IPCs from DRAM, that processing becomes slower. As long as we don't disable interrupts during IPC processing for too long, we still shouldn't disturb the higher-priority audio processing running in parallel, but the IPC response time will become longer. Is that what we're worried about? Is that important?

Replying to @marcinszkudlinski - do we really lose LL cycles because of IPC processing? That shouldn't happen AFAICS. If we have code locking interrupts, we have to identify and improve it...

lgirdwood (Member):

> Replying to @marcinszkudlinski - do we really lose LL cycles because of IPC processing? That shouldn't happen AFAICS. If we have code locking interrupts, we have to identify and improve it...

We don't lose LL cycles, since LL preempts low-priority workloads/threads (even if the workload's TEXT is in DRAM, the stack/heap will be in SRAM). @jsarha can you share some data soon? Thanks
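
A minimal Zephyr sketch of that preemption argument (not SOF code; the priorities are illustrative assumptions, except that 1 matches the EDF_ZEPHYR_PRIORITY value quoted earlier in this thread): the cooperative, higher-priority thread standing in for LL runs whenever it becomes ready, even while the lower-priority IPC thread is executing slower code from DRAM.

#include <zephyr/kernel.h>

#define LL_PRIO		(-5)	/* assumed: cooperative priority standing in for the LL scheduler */
#define IPC_PRIO	1	/* EDF_ZEPHYR_PRIORITY as quoted above */

static void ll_thread_fn(void *a, void *b, void *c)
{
	ARG_UNUSED(a); ARG_UNUSED(b); ARG_UNUSED(c);

	for (;;) {
		/* periodic audio deadline work; preempts ipc_thread whenever it wakes */
		k_sleep(K_MSEC(1));
	}
}

static void ipc_thread_fn(void *a, void *b, void *c)
{
	ARG_UNUSED(a); ARG_UNUSED(b); ARG_UNUSED(c);

	for (;;) {
		/* long-running IPC handling, possibly executing from DRAM: only its
		 * own latency grows; LL deadlines are unaffected as long as it does
		 * not lock interrupts for long stretches
		 */
		k_busy_wait(100);
		k_yield();
	}
}

K_THREAD_DEFINE(ll_thread, 1024, ll_thread_fn, NULL, NULL, NULL, LL_PRIO, 0, 0);
K_THREAD_DEFINE(ipc_thread, 1024, ipc_thread_fn, NULL, NULL, NULL, IPC_PRIO, 0, 0);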

kv2019i (Collaborator) left a comment

Hmm, @lgirdwood, you mention in the comments that "trigger is time critical, but load module is not time critical". The current PR doesn't seem to make any provision to keep trigger-related code in hot memory. Not sure how to review this - is this intentional or not?

@@ -404,7 +405,7 @@ int ipc4_pipeline_prepare(struct ipc_comp_dev *ppl_icd, uint32_t cmd)
 	return ret;
 }

-int ipc4_pipeline_trigger(struct ipc_comp_dev *ppl_icd, uint32_t cmd, bool *delayed)
+__cold int ipc4_pipeline_trigger(struct ipc_comp_dev *ppl_icd, uint32_t cmd, bool *delayed)
kv2019i (Collaborator):

Weren't the trigger ops supposed to be kept on the warm path?

@@ -496,15 +497,15 @@ int ipc4_pipeline_trigger(struct ipc_comp_dev *ppl_icd, uint32_t cmd, bool *dela
 	return ret;
 }

-static void ipc_compound_pre_start(int msg_id)
+__cold static void ipc_compound_pre_start(int msg_id)
kv2019i (Collaborator):

Part of the trigger/start/stop set...?

 {
 	/* ipc thread will wait for all scheduled tasks to be complete
 	 * Use a reference count to check status of these tasks.
 	 */
 	atomic_add(&msg_data.delayed_reply, 1);
 }

-static void ipc_compound_post_start(uint32_t msg_id, int ret, bool delayed)
+__cold static void ipc_compound_post_start(uint32_t msg_id, int ret, bool delayed)
kv2019i (Collaborator):

Part of the trigger/start/stop set...?

 {
 	struct ipc4_message_request in;

 	in.primary.dat = msg_data.msg_in.pri;
 	ipc_compound_msg_done(in.primary.r.type, reply->error);
 }

-void ipc_cmd(struct ipc_cmd_hdr *_hdr)
+__cold void ipc_cmd(struct ipc_cmd_hdr *_hdr)
kv2019i (Collaborator):

If we want to separate trigger/start from less timing critical IPCs, then we need to keep this top-level ipc_cmd as warm.
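
What that could look like (a hypothetical sketch of this suggestion, not code from the PR; every name below except ipc_cmd() is invented for the example, and the struct layout is a stand-in):

#include <stdint.h>

#ifndef __cold
#define __cold __attribute__((__section__(".cold_text")))	/* assumed definition */
#endif

struct ipc_cmd_hdr { uint32_t type; };	/* stand-in layout for the example */

enum { EX_MSG_TRIGGER = 1 };		/* example message id */

/* warm: stays in SRAM because triggers are time critical */
static void handle_trigger(struct ipc_cmd_hdr *hdr)
{
	(void)hdr;
}

/* cold: everything off the trigger/start/stop path may run from DRAM */
__cold static void handle_other(struct ipc_cmd_hdr *hdr)
{
	(void)hdr;
}

/* top-level dispatcher kept warm, as suggested above */
void ipc_cmd(struct ipc_cmd_hdr *hdr)
{
	if (hdr->type == EX_MSG_TRIGGER)
		handle_trigger(hdr);
	else
		handle_other(hdr);
}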

jsarha (Contributor) commented Feb 27, 2025

> Replying to @marcinszkudlinski - do we really lose LL cycles because of IPC processing? That shouldn't happen AFAICS. If we have code locking interrupts, we have to identify and improve it...

> We don't lose LL cycles, since LL preempts low-priority workloads/threads (even if the workload's TEXT is in DRAM, the stack/heap will be in SRAM). @jsarha can you share some data soon? Thanks

[Screenshot at 2025-02-27 16-22-11: MCPS measurements]

There is indeed some impact on MCPS, at least in 44.1 kHz playback through SRC. SRC playback was chosen because it's readily available in the nocodec topology and SRC has a lot of __cold-tagged functions in its configuration code. In addition to this PR I also merged #9844 on top of it. The test is a 5 min 44.1 kHz playback using the branch built with xcc, with both CONFIG_COLD_STORE_EXECUTE_DRAM=n and =y. It was run on an LNL RVP using the nocodec topology. The original mtrace files are here:
testb-dram-y-hw02-300s-mtrace.log
testb-dram-n-hw02-300s-mtrace.log

lgirdwood (Member):

> There is indeed some impact on MCPS, at least in 44.1 kHz playback through SRC. SRC playback was chosen because it's readily available in the nocodec topology and SRC has a lot of __cold-tagged functions in its configuration code. In addition to this PR I also merged #9844 on top of it. The test is a 5 min 44.1 kHz playback using the branch built with xcc, with both CONFIG_COLD_STORE_EXECUTE_DRAM=n and =y. It was run on an LNL RVP using the nocodec topology. The original mtrace files are here:
> testb-dram-y-hw02-300s-mtrace.log
> testb-dram-n-hw02-300s-mtrace.log

Thanks @jsarha - there is a 20 kcps delta with DRAM=y and this PR on LNL. I think the peaks are related to L1 exit work, and the 20 kcps is due to the relocatable code used for llext. @lyakh do you concur?
@jsarha btw - can you upstream the script that scrapes the logs and produces the plots? :)
