Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ershephe/update tuning error handling 6.6 nr #190

Draft
wants to merge 4 commits into
base: nilrt/master/6.6
Choose a base branch
from

Conversation

erickshepherdNI
Copy link
Contributor

@erickshepherdNI erickshepherdNI commented Jan 21, 2025

Fixes cases in the MMC driver where an SD card incapable of tuning is requested to tune more than once, which would result in I/O errors. This required a change in sdhci_execute_tuning() to return the error code returned by __sdhci_execute_tuning(), which disables any further tuning. I believe this error not being returned is an oversight since the error from platform_execute_tuning() in the same function is returned this way and the calling function, mmc_sd_init_uhs_card(), has an if statement handling the returned error for the DDR50 card we are using.

The new field in the mmc_host struct is used to prevent a card incapable of tuning from being tuned during reinitialization. This prevents I/O errors for DDR50 cards that are reset for any reason. Currently this field is only set for DDR50 cards that fail to tune to limit the impact of these changes.

Commits to revert our fix for tuning error interrupts and reapply the upstream fix are also included.

AB#2805437

Testing:

I tested these changes on a cRIO 9053 with a DDR50 SD card by building the linux kernel locally and applying it to the device. I confirmed that no I/O errors were thrown, even upon a hardware reset of the card, and that the card was properly mounted and could be written to.

Fixes I/O errors when using a DDR50 SD card that does not support tuning.

@erickshepherdNI erickshepherdNI force-pushed the ershephe/update-tuning-error-handling-6.6-nr branch from a745fe4 to a99a2bc Compare January 23, 2025 21:14
Copy link
Contributor

@chaitu236 chaitu236 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the patch would be submitted to upstream master, more specifically the development tree of whoever maintains these patches this part of codebase, did you check if these changes apply cleanly on that tree and build?

drivers/mmc/core/sd.c Outdated Show resolved Hide resolved
@@ -681,6 +682,7 @@ static int mmc_sd_init_uhs_card(struct mmc_card *card)
if (err && card->host->ios.timing == MMC_TIMING_UHS_DDR50) {
pr_warn("%s: ddr50 tuning failed\n",
mmc_hostname(card->host));
card->host->never_retune = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check that the only reason mmc_execute_tuning can fail is when the card doesn't support tuning?
Or can there be a situation where a card is capable of tuning but fails tuning for another reason or in just one instance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmc_execute_tuning has two error codes it can return. One is a timeout code and the other is a try again code. I'm not sure if the only way the tuning can time out is if it is unsupported but that seems to be the assumption in place for this if-statement since it was added to handle the case when tuning is not supported on some DDR50 cards by clearing the error.

I could update this to only set the never_retune field in the case of a timeout error code. It may even be correct to apply that to the whole if-statement since I'm not sure if we should be clearing a try again code. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it was added to handle the case when tuning is not supported on some DDR50 cards by clearing the error.

Do we know this for certain? Couldn't it also be because the tuning may sometimes fail and retrying would fix it?
I guess even if we're not sure about this, we could send this upstream and see if we get any pushback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could also be the case. Unfortunately the commit that added the if-statement, as well as tuning for DDR50 cards, doesn't specify it's exact purpose and the comment above it doesn't either. This is the change I'm most unsure about but I think sending it upstream and including the people who added the commits that enabled DDR50 tuning will provide more clarity.

@erickshepherdNI erickshepherdNI force-pushed the ershephe/update-tuning-error-handling-6.6-nr branch from a99a2bc to 9c4ebaa Compare January 27, 2025 19:14
@erickshepherdNI
Copy link
Contributor Author

Since the patch would be submitted to upstream master, more specifically the development tree of whoever maintains these patches this part of codebase, did you check if these changes apply cleanly on that tree and build?

The changes do apply cleanly and I was able to build the upstream kernel with them. I've attached the patch files I tested to the original user story and can get them sent upstream today.

@chaitu236
Copy link
Contributor

Since the patch would be submitted to upstream master, more specifically the development tree of whoever maintains these patches this part of codebase, did you check if these changes apply cleanly on that tree and build?

The changes do apply cleanly and I was able to build the upstream kernel with them. I've attached the patch files I tested to the original user story and can get them sent upstream today.

GitHub's UI doesn't make this obvious but just noticed this from the patch files - commit msg line lengths seem a little too long.

Try to follow 50/72 rule i.e., first line 50 chars, followed by new line followed by lines within 70-75 chars. You may also want to check https://docs.kernel.org/process/submitting-patches.html.

Updates the sdhci_execute_tuning function to return the error code
that was returned by the __sdhci_execute_tuning function.
Previously this code was only stored in host->tuning_err and not
actually returned.

Signed-off-by: Erick Shepherd <[email protected]>
Add a new field to the mmc_host struct to track when the card should
skip the initial tuning and use it to conditionally stop tuning in the
mmc_sd_init_uhs_card function. Currently the new field only gets set
when a DDR50 card fails to tune, which indicates the card does not
support tuning.

Signed-off-by: Erick Shepherd <[email protected]>
@erickshepherdNI erickshepherdNI force-pushed the ershephe/update-tuning-error-handling-6.6-nr branch from 9c4ebaa to 77f9179 Compare January 27, 2025 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants