Add functions to convert integer/logical values to character values #336

Merged
awvwgk merged 8 commits into fortran-lang:master from the int-to-char branch
Apr 11, 2021

Conversation

awvwgk
Member

@awvwgk awvwgk commented Mar 12, 2021

  • use fypp in stdlib_ascii to generate functions for all kind values

@awvwgk awvwgk added the topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... label Mar 12, 2021
Member

@ivan-pi ivan-pi left a comment

In general I agree with the implementation and tests. I'm not sure about the name char_value. I think it is too long to be practical in building strings using concatenation, e.g. filename = basename//char_value(timestep)//".txt".

I guess the name str would be the typical name, but it risks clashing with variables. char cannot be used due to overlap with the intrinsic function.
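To make the concatenation use case concrete, here is a minimal sketch. The function int_to_chars is a hypothetical stand-in written with an internal write; it is not the implementation in this PR, and the module and program names are assumptions made for illustration only.

```fortran
module filename_sketch
  implicit none
  private
  public :: int_to_chars
contains
  !> Hypothetical stand-in for the conversion function under discussion.
  function int_to_chars(val) result(chars)
    integer, intent(in) :: val
    character(len=:), allocatable :: chars
    character(len=32) :: buffer
    ! Internal write: format the integer with minimal width into a buffer
    write(buffer, '(i0)') val
    chars = trim(buffer)
  end function int_to_chars
end module filename_sketch

program demo_filename
  use filename_sketch, only : int_to_chars
  implicit none
  character(len=:), allocatable :: filename
  integer :: timestep
  timestep = 17
  ! The function name sits in the middle of the expression,
  ! which is why a short name matters here
  filename = "output_"//int_to_chars(timestep)//".txt"
  print '(a)', filename
end program demo_filename
```

The shorter the function name, the more readable such concatenations stay when several values are joined in one expression.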

src/stdlib_ascii.fypp (outdated, resolved)
Comment on lines +338 to +369
integer, parameter :: ik = ${kind}$
logical(ik), intent(in) :: val
Member

Is this just for improved clarity of the generated code?

Member Author

Yes, I didn't like the look with the ${kind}$ preprocessor all over the place

@awvwgk
Member Author

awvwgk commented Mar 12, 2021

I was thinking about something short, str or string would qualify but are likely to clash. char would be nice, but we can't have that one. Therefore, I went with char_value which can easily be renamed as wished by the user, but I'm open for better suggestions which are not usual variable names for strings.

@epagone

epagone commented Mar 12, 2021

How about to_char or 2char? (:thinking: actually I'm not sure if the latter is valid...apparently it is not). I like str too but I agree that it can clash easily; to_char is reasonably short and consistent with the other procedures to_upper, to_lower, etc...

@ivan-pi
Member

ivan-pi commented Mar 12, 2021

I just wanted to suggest to_char too. It is also very relatable to the C++ std::to_string function.

Edit: in analogy, we could have to_string return a type(string_type) instance. This would essentially be a shortcut of string_type(to_char(ival)).

@epagone

epagone commented Mar 12, 2021

I was thinking that there is also to_str (one character shorter than to_char) but the Fortran data type is character, so I still vote for to_char.

@awvwgk
Member Author

awvwgk commented Mar 12, 2021

Should we extend the constructor for the string_type as well? With the overloaded assignment we can already do this:

use stdlib_string_type, only : string_type, assignment(=), write(formatted)
use stdlib_ascii, only : to_char
implicit none
type(string_type) :: string
string = to_char(42)
print*, string
print*, string_type(to_char(42))
end

Do we want to extend the string_type constructor/assignment to allow this?

use stdlib_string_type, only : string_type, assignment(=), write(formatted)
implicit none
type(string_type) :: string
string = 42  ! possibly harmful
print*, string
print*, string_type(42)  ! maybe useful
end

@epagone

epagone commented Mar 12, 2021

Do we want to extend the string_type constructor/assignment to allow this?

use stdlib_string_type, only : string_type, assignment(=), write(formatted)
implicit none
type(string_type) :: string
string = 42  ! possibly harmful
print*, string
print*, string_type(42)  ! maybe useful
end

I'd love to allow the above. I agree that allowing the assignment string = 42 is quite loose and might be dangerous, but I would vote in favour of both (assignment and constructor).

@milancurcic
Member

milancurcic commented Mar 12, 2021

I was thinking about something short, str or string would qualify but are likely to clash.

IMO str is the obvious choice. What would this clash with?

@ivan-pi
Member

ivan-pi commented Mar 12, 2021

I was thinking about something short, str or string would qualify but are likely to clash.

IMO str is the obvious choice. What would this clash with?

type(string_type) :: str

@milancurcic
Member

I was thinking that there is also to_str (one character shorter than to_char) but the Fortran data type is character, so I still vote for to_char.

You're right, but the Fortran type name is surprising in general. "Character" is commonly used to refer to a single character, like in a C char, and "string" to refer to a character string of any length. But the fact that Fortran character type can be a multi-char string is an oddity and can be confusing to newcomers.

So the question is whether we should follow an existing, suboptimal convention, or choose the most appropriate and commonly used term. I'd prefer to use str over to_char here, because we're not only converting single characters.

@milancurcic
Member

type(string_type) :: str

@ivan-pi I'm confused. This is a snippet of a potential user code, right? So, an stdlib function name clashes with a user snippet?

Here's a scenario:

User: "I can't use the str function because I want to name my variable str."
stdlib: "Don't name your variable str! :)"

I totally get that this is a common and convenient pattern for single string variables, but if you use this argument, you're throwing out many perfectly good function names.

Is there some other scenario of a name clash?

@epagone

epagone commented Mar 12, 2021

But the fact that Fortran character type can be a multi-char string is an oddity and can be confusing to newcomers.

I agree that Fortran character type is quite unique. On the other hand, newcomers have to learn it anyway, if they want to code in Fortran. to_str or str is inconsistent with the name of the intrinsic data type that it refers to and creates confusion, IMO.

So the question is whether we should follow an existing, suboptimal convention, or choose the most appropriate and commonly used term.

I would agree with you if we were also planning to change the intrinsic name from character to string for multi-char strings, so that we would have name consistency with the functions of this PR. However, I would pass on such an endeavour: it would be too disruptive even for me 😄

ADDENDUM: I see the point about the ambiguity of single and multi-char character variables. The other idea that comes to mind to disambiguate this aspect is to suggest to_chars or chars (plural), but I'm not so sure about it.

@ivan-pi
Member

ivan-pi commented Mar 13, 2021

@milan, That is precisely the scenario I had in mind. I recall there being a discussion somewhere of how str is a common variable name.

I found two examples where str is a variable name in Discourse:

Here are a few Fortran libraries that use str as a variable name:

And here are a few libraries which already use str for numerical to character conversion:

Let's look at what other languages have:

| Language | Integer to string idiom | String type |
| --- | --- | --- |
| Clojure | (let [s (str i)]) | clojure.string |
| C++ | auto s = std::to_string(i); | std::string |
| C# | string s = i.ToString() | string |
| D | string s = i.to!string; | string |
| Elixir | s = to_string(42) | - |
| Erlang | s = integer_to_list(42) | List of characters |
| Go | s := fmt.Sprintf("%d", i) | String |
| Javascript | var s = i.toString(); | String |
| Java | String s=((Integer)i).toString(); | String |
| Julia | string(42) | String |
| Kotlin | val x = 42.toString() | String |
| Lua | s = tostring(i) | string |
| Nim | s = $(42) | - |
| PHP | $s = (string)$i; | string |
| Python | str(42) | string |
| Ruby | s = i.to_s | String |
| Rust | let s = i.to_string(); | String |
| Swift | let str2 = String(myInt) or let str1 = "\(myInt)" | String |
| Zig | s = std.fmt.bufPrint(buf, "{}", .{42}) | Null-terminated array of u8 |
| Fortran | write(s,'(I0)') 42 | character(len,kind) |

@ivan-pi
Member

ivan-pi commented Mar 13, 2021

You're right, but the Fortran type name is surprising in general. "Character" is commonly used to refer to a single character, like in a C char, and "string" to refer to a character string of any length. But the fact that Fortran character type can be a multi-char string is an oddity and can be confusing to newcomers.

I'm not sure if using C is a good counter-argument. Fortran is the older language of the two, and the motivation for using Fortran is partially to avoid the pitfalls of C pointers and tedious array management. Isn't it then better to maintain the same root morpheme which is already part of language?

@milancurcic
Member

I'm not sure if using C is a good counter-argument.

Sorry for the confusion with that, I only meant to use C char as an illustration, not a counter-argument. The argument is that in the English language "character" means just one character; in computing, "character" refers to one character and "string" refers to a sequence of characters. When I wrote "C char" I didn't mean to say "because C does it like this, Fortran should too". I meant it as "C char is an example of a programming term mapping to a term in the English language".

@ivan-pi
Member

ivan-pi commented Mar 13, 2021

I understand your viewpoint. Quoting Computer Hope:

The abbreviation char is used as a reserved keyword in some programming languages, such as C, C++, C#, and Java. It is short for character, which is a data type that holds one character (letter, number, etc.) of data.

I also amended my list above with the name of the string object in different languages. Fortran is the odd one 😂, which is exactly what would make the name str (or to_str) confusing. This is even more so once you consider we just added a type(string_type).

@milancurcic
Member

@ivan-pi Thank you for the comprehensive review of names in other languages. I think that's very helpful.

Regarding the Fortran libraries that use str as a variable, I don't think it's as easy to make the argument. If a program or procedure uses str as a variable name, there's no name clash unless there is also conversion to string happening in the same scope, and which would eventually motivate importing str from stdlib. So listing all the codes that have str as a variable name is not helpful on its own for the argument.

Instead, let's narrow down to a scenario that would cause a name-clash:

  1. Does a program or procedure declare a variable str? If yes, then
  2. Is there a conversion from a non-string to a string in the same scope? If yes, then
  3. There is a name clash if the user does use stdlib_ascii or use stdlib_ascii, only: str, and the user also wants to use str function to replace their home-cooked conversion. Then, the user has a choice:
    a) Rename their variable str in that scope, or
    b) Import str under a different name: use stdlib_ascii, only: to_str => str

3a could be a major nuisance for the user, and 3b is a minor nuisance. Do you agree with my description of the scenario, and is my assessment of the level of nuisance fair?
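Option 3b can be sketched as follows. The module conversion_sketch and its str function are hypothetical placeholders (stdlib_ascii is not used here, so the snippet compiles on its own); only the rename-on-import syntax is the point.

```fortran
module conversion_sketch
  implicit none
  private
  public :: str
contains
  !> Hypothetical conversion function that happens to be called str.
  function str(val) result(chars)
    integer, intent(in) :: val
    character(len=:), allocatable :: chars
    character(len=32) :: buffer
    write(buffer, '(i0)') val
    chars = trim(buffer)
  end function str
end module conversion_sketch

program demo_rename
  ! Rename on import: the module's str is visible here only as to_str
  use conversion_sketch, only : to_str => str
  implicit none
  ! The user's own variable may keep the name str without a clash
  character(len=:), allocatable :: str
  str = "step "//to_str(3)
  print '(a)', str
end program demo_rename
```

Because the rename is local to the importing scope, other program units are free to import the function under its original name.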

But I think the consistency argument for to_char is good and it would probably be less confusing to experienced Fortran programmers than str or to_str. I'm not strongly convinced for the "popularity" (use string because that's what you mean) over the "consistency" (use char because Fortran calls it that) argument, but I lean toward the popular choice.

From your list, 9/15 use some kind of to_ prefix, so I think to_str would also be a good name (and is consistent with existing conversion functions in stdlib_ascii). Currently, my preference is:

  1. str
  2. to_str
  3. to_char

@ivan-pi
Member

ivan-pi commented Mar 13, 2021

I did take the list of Fortran libraries a notch too far. I merely wanted to demonstrate that str is used in many real-world projects. As you suggest, existing codes that would need to import the conversion routine can always rename it.

Regarding points 3a and 3b when developing new codes, as long as the code is modular, renaming is not a big issue with modern source code editors. In fact when preparing the list I noticed many codes use variables such as file_str, key_str, etc., which include a descriptive prefix (good practice), so 3a might be the "better" option. But in general I agree with your assessment.

One further downside of to_char is that it can easily be mistaken for the intrinsic function char(i [,kind]), which returns the character represented by the integer i. At least for integer arguments this could go unnoticed until something breaks (e.g. strange symbols in output file names).
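The difference between the two operations can be seen in a small sketch. It uses achar rather than char so that the ASCII result is guaranteed, and an internal write as a stand-in for the conversion function under discussion.

```fortran
program demo_char_vs_conversion
  implicit none
  character(len=32) :: buffer
  integer :: i

  i = 65
  ! Intrinsic: the single character whose ASCII code is 65
  print '(a)', "achar(65)      -> "//achar(i)

  ! Conversion: the decimal digits of the value 65
  write(buffer, '(i0)') i
  print '(a)', "decimal digits -> "//trim(buffer)
end program demo_char_vs_conversion
```

With small integers such as a time step of 7, the intrinsic would yield a control character, which is how odd symbols can end up in a file name.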

This has me quite torn now.

@milancurcic
Member

I added Elixir, Erlang, Nim, and Zig to your table.

@ivan-pi
Member

ivan-pi commented Mar 13, 2021

Just to throw a few other options on the table:

  • A custom operator like (.tc. 42) for to character
  • repr like the Python represent
  • jot as in "jot down"
  • rec as in "record"
  • Functions named write() or print()

Addendum: one more (unfriendly) option is separate functions:

  • itoc, rtoc, ltoc (an itoc function was given in the book Software Tools by Kernighan and Plauger in the Ratfor Fortran dialect)

Addendum 2: borrow from C and gnuplot

  • sprintf()
  • itoa, ftoa, dtoa (these are non-standard, but available in some compilers; Fortran versions were given in the book Fortran Tools for VAX/VMS and MS-DOS)

@urbanjost

Speaking as someone who already uses a STR() function name, one version I have that might be relevant (it just uses internal reads) is in M_msg, but it uses class(*) to allow the useful ability to compose a CHARACTER variable with something like STR(10.4,' is between',10,'and',11) and to create something similar to WRITE, so you can call WRT([10,6],'the value is',V/2.0) in a way somewhat similar to list-directed output.

There are other versions that use the old ITOA, FTOA, DTOA routines that are used for arrays as they are considerably faster than internal I/O; but I do not think I have a public version. It raises the interesting issue of whether you want the routines to be elemental or not.

I find overloading helps make things much more compact, so assuming you have a function like STR, overloading // with that allows you to do things like "MESSAGE='the value is '//10//' and the limit is '//300.4//'.'". Some would probably like to overload + to do that, but I prefer to overload + - / * to convert strings to numbers, and to overload INT(), REAL(), and DBLE() to take character values and assume they are numeric. So that makes for a symmetric set for converting to and from strings and numeric values.

I originally overloaded CHAR() to do something other than it does by default but decided that was a bad idea (I actually do not remember what I tried, but I remember I decided it was too confusing).

So I did not see anything about overloading '//' or using class(*) variables so thought I would mention that works well for me.

@epagone

epagone commented Mar 13, 2021

Any comments about to_chars or chars to clarify it's a multi-character function? How do you feel about it?

@ivan-pi
Member

ivan-pi commented Mar 13, 2021

So I did not see anything about overloading '//' or using class(*) variables so thought I would mention that works well for me.

Both Java and Javascript support the idiom of using the concatenation operator to build a string using

String s = "" + i;

If operator(//) were overloaded then one could do the same thing in Fortran:

character(len=:), allocatable :: s
s = ''//42   ! or 42//''

Addendum: inside the overloaded operator one still needs to use internal I/O or a function like str()/to_char()/to_str() to convert the integer, so in some way this idiom only enforces the need for a function interface.
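A minimal sketch of such an overload, assuming default-kind integers only. The module name concat_sketch and the helper int_to_chars are hypothetical and not part of stdlib; the helper plays the role of the conversion function that the operator still needs internally.

```fortran
module concat_sketch
  implicit none
  private
  public :: operator(//)

  ! Extend the intrinsic concatenation operator to mixed operands
  interface operator(//)
    module procedure concat_char_int
    module procedure concat_int_char
  end interface operator(//)

contains

  !> Hypothetical helper doing the actual conversion via internal I/O.
  function int_to_chars(val) result(chars)
    integer, intent(in) :: val
    character(len=:), allocatable :: chars
    character(len=32) :: buffer
    write(buffer, '(i0)') val
    chars = trim(buffer)
  end function int_to_chars

  function concat_char_int(lhs, rhs) result(res)
    character(len=*), intent(in) :: lhs
    integer, intent(in) :: rhs
    character(len=:), allocatable :: res
    res = lhs//int_to_chars(rhs)  ! intrinsic character//character
  end function concat_char_int

  function concat_int_char(lhs, rhs) result(res)
    integer, intent(in) :: lhs
    character(len=*), intent(in) :: rhs
    character(len=:), allocatable :: res
    res = int_to_chars(lhs)//rhs  ! intrinsic character//character
  end function concat_int_char

end module concat_sketch

program demo_concat
  use concat_sketch, only : operator(//)
  implicit none
  character(len=:), allocatable :: s
  s = ''//42
  print '(a)', s
  s = "the limit is "//300//"."
  print '(a)', s
end program demo_concat
```

The operator is only sugar on top of the helper, which matches the point above that a function interface is needed either way.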

@ivan-pi
Member

ivan-pi commented Mar 13, 2021

Any comments about to_chars or chars to clarify it's a multi-character function? How do you feel about it?

FWIW, I find to_chars okay. The to_ prefix helps clarify the intent IMO. Maybe the even shorter Ruby versions to_s or to_c would be of interest, even if they are a bit cryptic (tos/toc just seem confusing).

I think ultimately we will need to go with the popular choice.

@awvwgk
Member Author

awvwgk commented Mar 13, 2021

Thanks for all the feedback, naming functions is hard, especially with the oddity of the Fortran character. I prefer to keep consistent with the Fortran intrinsic names here.

I also added the integer/logical to character conversion to the constructor of the string_type, but not to the assignment yet, because I'm still undecided whether it will cause more harm than good. I'm not taking the overloaded assignment off the table, but would prefer to discuss it separately, because we might think about including an assignment from integer/logical to character as well.

@milancurcic
Member

I realize now, both arguments--for to_char and for to_str or str--are "consistency" arguments. to_char is consistent with Fortran names. to_str or str are consistent with broader use. I see merits to both.

Even though "experimental" and can change, it's important to make a good choice here because these are such high impact functions.

Let's get more eyes on this. @certik @jvdp1 @everythingfunctional @arjenmarkus @rouson @LKedward @smeskos what do you think?

@jvdp1
Member

jvdp1 commented Mar 13, 2021

Nice and important discussion! IMO the prefix "to_" is needed, as it explains the purpose of the function (like for to_upper, ...). I would be in favor of "to_char" because the result is a character variable. Similarly, to_string could be used for the string DT. I find to_str a bit confusing: based on the name you could think that it would return a string DT, instead of a character.

@smeskos

smeskos commented Mar 14, 2021

I am in favor of the prefix "to_" as well. Now as far as the rest, I have given it much thought and I am leaning towards "to_string", keeping in mind that it can be used for both characters as well as the string DT.

Reasoning: I am inspired by C++ std::to_string. In the same way that C++ has static_cast<char>(49) or char(49), which both return the ascii "1", Fortran has transfer(49,'a') and char(49) or achar(49) (char() being the compiler's default, which I believe is almost always ascii, and achar() for ascii). Therefore Fortran is missing something that converts (un)signed integers and reals to a string with the same content, and it's only logical to find something different from what Fortran programmers are already used to (the aforementioned options). For me, this leaves us with to_char or to_string, and since the concept of string DT with the stdlib is also a recent proposal, why not use to_string for that purpose too. Bonus: it's natural to everyone who uses C++ as well, and I emphasize that since many programmers who write Fortran know and write in C++ as well, and every single one who writes in C++ for scientific purposes knows, or/and writes, or/and learned Fortran first...

@everythingfunctional
Member

I vote for to_char. I've used the name to_string in my strff library to return varying_string. I assume that stdlib will want to use to_string to return its own string type, and thus you can't use that name to return a character, or you'll have clashing interfaces for the generic resolution.

@milancurcic
Member

I assume that stdlib will want to use to_string to return its own string type, and thus you can't use that name to return a character, or you'll have clashing interfaces for the generic resolution.

With d835ad8 (this PR), it will be string_type(42).

@awvwgk
Member Author

awvwgk commented Mar 14, 2021

For derived types the natural choice is to extend the constructor IMO, this way no additional function name is required.

src/stdlib_ascii.fypp (outdated, resolved)
@LKedward
Member

I realize now, both arguments--for to_char and for to_str or str--are "consistency" arguments. to_char is consistent with Fortran names. to_str or str are consistent with broader use. I see merits to both.

My preference is for str or to_str since I personally associate char with a single character and have always conceptualised the Fortran character() declaration as a string anyway. str also matches closely with what you would expect if coming from another common language. I don't consider potential variable name clashes a strong argument against it, since for small codebases the variable can be trivially renamed and for large codebases the function can be imported as an alias with use <module>, only:. Seeking consistency with the Fortran language (for to_char) seems like a bad precedent to set since it potentially puts a lot of constraint on future naming schemes.

@awvwgk
Member Author

awvwgk commented Mar 17, 2021

From the discussion I conclude that to_string/to_str seems to find more acceptance than to_char/to_chars. So we arrive at the two new functions:

use stdlib_ascii, only : to_string  ! returns character(len=*)
use stdlib_string_type, only : string_type, &  ! overloaded constructor, returns type(string_type)
    & write(formatted), assignment(=)
implicit none
character(len=:), allocatable :: dlc
type(string_type) :: str
! Intrinsic character case
dlc = to_string(7)
print*, dlc, to_string(42)
! Derived type case
str = string_type(7)  ! str = to_string(7) works as well due to overloaded assignment(=)
print*, str, string_type(42)
end

Member

@LKedward LKedward left a comment

LGTM 👍

Member

@jvdp1 jvdp1 left a comment

LGTM! Thank you.

@awvwgk
Member Author

awvwgk commented Apr 11, 2021

Thanks everybody for the feedback. With four approvals I'll go ahead and merge.

@awvwgk awvwgk merged commit 3621b03 into fortran-lang:master Apr 11, 2021
@awvwgk awvwgk deleted the int-to-char branch April 11, 2021 14:49
@certik
Member

certik commented Apr 11, 2021

I think that looks good. to_string is a good choice.

Labels
topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ...
10 participants