Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade lengths of RichString to size_t & many related changes. #1592

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

Explorer09
Copy link
Contributor

The main goal of this PR is to upgrade the "length" parameters and properties of RichString class from int to size_t. While doing the change, there are other code quality improvements that are related or required to the data type migrations.

Important changes of these include:

  1. The /proc/*/cmdline basename matching algorithm is slightly changed to utilize the /proc/*/exe matching when possible. (Commit d384376)
  2. Add a special case to /proc/*/comm string highlighting. (This is required to prevent an internal variable commStart to go negative.)
  3. Convert to size_t for cmdlineBasenameStart, cmdlineBasenameEnd and procExeBasenameOffset values in Process class. (Use size_t for Process offset values #1588)
  4. Add additional limit clamping to Panel.cursorX value.
  5. Convert to size_t for selectedLen and scrollH values in Panel class. (This incorporates Panel.scrollH clamp logic improvement #1585)
  6. Incorporate Fix mbstowcs_nonfatal() that might convert string shorter than desired #1586 and fix mbstowcs_nonfatal(). (It's better for that PR to apply after the RichString lengths migrated to size_t type.)

Improve readability of local variables.

Signed-off-by: Kang-Che Sung <[email protected]>
The matchCmdlinePrefixWithExeSuffix() internal function (in Process.c)
can match basenames in multiple tries in case the
"cmdlineBasenameStart" input value is unreliable. Take advantage of
this and update "cmdlineBasenameStart" with the new offset detected by
the function.

Also make the matching behavior consistent regardless of
"showMergedCommand" setting.

Signed-off-by: Kang-Che Sung <[email protected]>
Add a special case when the process "comm" string is found in
"cmdline", but at the starting basename position that would be stripped
by "stripExeFromCmdline" setting. This special case will now highlight
the basename of "exe" string as well as the first token of "cmdline"
(as the two strings are concatenated together when displaying).

Signed-off-by: Kang-Che Sung <[email protected]>
Convert the members "cmdlineBasenameStart", "cmdlineBasenameEnd" and
"procExeBasenameOffset" in "Process" to size_t type. Also upgrade many
routines that search for "basenames", COMM, and PROC_EXE string to use
size_t for iterators.

The "cmdlineBasenameEnd" value is no longer allowed to negative. It is
now set to 0 during Process_init().

Signed-off-by: Kang-Che Sung <[email protected]>
Compare "<= RICHSTRING_MAXLEN" rather than "< RICHSTRING_MAXLEN".
RICHSTRING_MAXLEN does not include the terminating L'\0' character and
the internal buffer of a RichString instance has
(RICHSTRING_MAXLEN + 1) elements.

Signed-off-by: Kang-Che Sung <[email protected]>
The computed length could wrap to a negative integer, causing undefined
behavior.

(Currently it would cause out-of-bound array access due to a signed
length property in RichString. The length property should be migrated
to an unsigned size_t type, but that would be done in a later commit.)

Signed-off-by: Kang-Che Sung <[email protected]>
The text buffer contains no characters outside ASCII range, thus there
is no need to use RichString_appendnWide().

Signed-off-by: Kang-Che Sung <[email protected]>
`assert(width <= INT_MAX)` even though the "width" argument is unsigned
int type.

Signed-off-by: Kang-Che Sung <[email protected]>
Limit the maximum of the "cursorX" value to "x + w". In case of integer
overflow, cap the value to INT_MAX.

This code is written with an assumption that the "selectedLen" and
"scrollH" properties may be larger than the type of `int` (for example,
`size_t`).

Signed-off-by: Kang-Che Sung <[email protected]>
Because RichString_printoffnVal() will never print strings beyond their
NUL-terminators. Note that the routines in Panel_draw() currently
assumes each character occupies one terminal column, which does not
hold for Unicode strings. The routines might need to be rewritten in
the future to properly support Unicode.

Signed-off-by: Kang-Che Sung <[email protected]>
The Panel.scrollH value should have a minimum of 0 in the KEY_LEFT
event.

Signed-off-by: Kang-Che Sung <[email protected]>
Prevent arithmetic overflows when computing new string sizes.
For RichString_appendnWideColumns(), the "*columns" argument is now
allowed to be -1 on input, which would be interpreted as "unlimited
terminal columns". On output, the "*columns" value will now cap to
INT_MAX if the input string is extremely large, and the width
counted can overflow an `int` type. (The "*columns" buffer is
intentionally not changed to using `size_t` type.)

Signed-off-by: Kang-Che Sung <[email protected]>
The "n" parameter of mbstowcs_nonfatal() refers to number of wide
characters. But the logic seemed to be confused with "n" parameter of
mbrtowc() (the number of bytes).

Signed-off-by: Kang-Che Sung <[email protected]>
Copy link
Member

@BenBE BenBE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There'S a lot of noise introduced due to the (size_t)len casts on RichString_appendnAscii calls, most of which could be avoided if we made xSnprintf return size_t. Can you take a look in that direction?

@@ -137,44 +137,44 @@ static void CPUMeter_display(const Object* cast, RichString* out) {

len = xSnprintf(buffer, sizeof(buffer), "%5.1f%% ", this->values[CPU_METER_NORMAL]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance to reduce noise by having size_t len and thus avoiding all the subsequent casts? We know xSnprintf can't be negative, which we already check for.

Suggested change
len = xSnprintf(buffer, sizeof(buffer), "%5.1f%% ", this->values[CPU_METER_NORMAL]);
size_t len = (size_t)xSnprintf(buffer, sizeof(buffer), "%5.1f%% ", this->values[CPU_METER_NORMAL]);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach just moves the casts from the RichString append functions to xSnprintf. Unless we alter the API of xSnprintf, I don't think it reduces any noise overall.

How about this alternate solution: Create a RichString_appendAsciiFormat() function that internally combines xSnprintf and RichString_appendnAscii()?

The prototype could look something like this:

/* The "buf" argument is optional. If "buf" is NULL, then this function uses an internally allocated buffer (using xAsprintf).
It's still recommended to supply "buf" for performance reasons. */
ATTR_NONNULL_N(1, 5)
size_t RichString_appendAsciiFormat(RichString* this, int attrs, char* buf, size_t bufsize, const char* format, ...);

ATTR_NONNULL_N(1, 5)
size_t RichString_appendAsciiVFormat(RichString* this, int attrs, char* buf, size_t bufsize, const char* format, va_list vl);

@@ -137,7 +137,7 @@ static void DiskIOMeter_display(ATTR_UNUSED const Object* cast, RichString* out)

int color = cached_utilisation_diff > 40.0 ? METER_VALUE_NOTICE : METER_VALUE;
int len = xSnprintf(buffer, sizeof(buffer), "%.1f%%", cached_utilisation_diff);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar consideration re size_t. Maybe update signature of xSnprintf overall?

@@ -81,7 +82,7 @@ static HandlerResult MainPanel_eventHandler(Panel* super, int ch) {

if (EVENT_IS_HEADER_CLICK(ch)) {
int x = EVENT_HEADER_CLICK_GET_X(ch);
int hx = super->scrollH + x + 1;
size_t hx = super->scrollH + (unsigned int)x + 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the second cast also be size_t?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenBE Let me explain a bit:

  • signed int cast to size_t: Perform a sign extension.
  • unsigned int cast to size_t: Perform a zero extension.

When I made this patch, I made a personal rule that, if a zero extension would suffice, I would avoid sign extension. This also applies to the noisy-as-you-called-it changes about me casting the xSnprintf results to size_t.

@@ -96,46 +96,44 @@ static void BarMeterMode_draw(Meter* this, int x, int y, int w) {
// The text in the bar is right aligned;
// Pad with maximal spaces and then calculate needed starting position offset
RichString_begin(bar);
RichString_appendChr(&bar, 0, ' ', w);
RichString_appendChr(&bar, 0, ' ', (unsigned int)w);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks overall kinda noisy. The source code shouldn't be a casting show … ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the problem of policy and whether the compiler would complain about implicit casts to unsigned. Given there is a conditional if (w < 1) return; a few lines earlier, compiler should detect that the w value cannot go negative in this case, but I can't say for sure.

If htop is compiled with gcc or clang's -Wsign-conversion flag, will that cause a warning?

@@ -67,10 +68,10 @@ struct Panel_ {
Vector* items;
int selected;
int oldSelected;
int selectedLen;
size_t selectedLen;
void* eventHandlerState;
int scrollV;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, this would need to change to size_t too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering that changing the scrollH type to size_t would be a bad move. It should have been unsigned int instead. It doesn't make sense to allow the UI to display 2 GiB of characters nor allow user to scroll past 2 GiB characters. In other words, it might be better to cap the scrollH value to INT_MAX whenever we have such a large data.

@@ -21,7 +22,11 @@ in the source distribution for its full text.

#define charBytes(n) (sizeof(CharType) * (n))

static void RichString_extendLen(RichString* this, int len) {
static void RichString_extendLen(RichString* this, size_t len) {
if (len > (SIZE_MAX - 1) / sizeof(CharType)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (len > (SIZE_MAX - 1) / sizeof(CharType)) {
if (len > SIZE_MAX / sizeof(CharType) - 1) {

Also, why the special handling here? Other places check only for max number of characters, not max number of bytes for storing these characters.

Copy link
Contributor Author

@Explorer09 Explorer09 Jan 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why the special handling here? Other places check only for max number of characters, not max number of bytes for storing these characters.

You don't want the total size in bytes overflow a size_t type, do you?
Because of the +1 on allocated length, I cannot use xMallocArray to simplify things.

@BenBE BenBE added code quality ♻️ Code quality enhancement Linux 🐧 Linux related issues FreeBSD 👹 FreeBSD related issues MacOS 🍏 MacOS / Darwin related issues BSD 🐡 Issues related to *BSD PCP PCP related issues labels Jan 19, 2025
@BenBE BenBE added Solaris Solaris, Illumos, OmniOS, OpenIndiana NetBSD 🎏 NetBSD related issues OpenBSD 🐡 OpenBSD related issues labels Jan 19, 2025
@BenBE BenBE added this to the 3.4.0 milestone Jan 19, 2025
@Explorer09
Copy link
Contributor Author

There'S a lot of noise introduced due to the (size_t)len casts on RichString_appendnAscii calls, most of which could be avoided if we made xSnprintf return size_t. Can you take a look in that direction?

Because snprintf(3) only returns int, the return type that makes more sense for xSnprintf is unsigned int, not size_t.

Even though I can review all calls of xSnprintf and update the return types, I fear that would create more noise for the changes, not less. More importantly, this would take more time of me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BSD 🐡 Issues related to *BSD code quality ♻️ Code quality enhancement FreeBSD 👹 FreeBSD related issues Linux 🐧 Linux related issues MacOS 🍏 MacOS / Darwin related issues NetBSD 🎏 NetBSD related issues OpenBSD 🐡 OpenBSD related issues PCP PCP related issues Solaris Solaris, Illumos, OmniOS, OpenIndiana
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants