Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement strip and chomp as supplement to trim #343

Merged
merged 4 commits into from
Apr 10, 2021

Conversation

awvwgk
Copy link
Member

@awvwgk awvwgk commented Mar 13, 2021

Originally proposed by @zbeekman in #69 (comment):

Strip

"    hello    ".strip   #=> "hello"
"\tgoodbye\r\n".strip   #=> "goodbye"
"\x00\t\n\v\f\r ".strip #=> ""
"hello".strip           #=> "hello"
  • Fortran:
strip("\tgoodbye\r\n") ! "goodbye", where \t is a tab char, \n a new line etc.
strip("   hello     ") ! "hello"
strip("Hello")         ! "Hello"
!! Or maybe for a string class
s%strip() ! returns "hello" from "   hello   ", etc. as above

Chomp

"hello".chomp                #=> "hello"
"hello\n".chomp              #=> "hello"
"hello\r\n".chomp            #=> "hello"
"hello\n\r".chomp            #=> "hello\n"
"hello\r".chomp              #=> "hello"
"hello \n there".chomp       #=> "hello \n there"
"hello".chomp("llo")         #=> "he"
"hello\r\n\r\n".chomp('')    #=> "hello"
"hello\r\n\r\r\n".chomp('')  #=> "hello\r\n\r"
!! Or use a TBP on a string class with passed first argument
  • Fortran:
chomp("hello")                   ! "hello"
chomp("hello\n")                 ! "hello"
chomp("hello\r\n")               ! "hello"
chomp("hello\n\r")               ! "hello\n"
chomp("hello\r")                 ! "hello"
chomp("hello \n there")          ! "hello \n there"
chomp("hello", sep="llo")        ! "he"
chomp("hello\r\n\r\n", sep="")   ! "hello"
chomp("hello\r\n\r\r\n", sep="") ! "hello\r\n\r"
!! Or use a TBP on a string class with passed first argument

Changes to the original proposal

chomp takes also a set which is provided as array of length one characters and the default chomp uses the whitespace set from stdlib_ascii rather than removing only newlines.

use stdlib_strings, only : chomp
implicit none
print'(a)', chomp("hello", set=["l", "o"])  ! "he"
end

@awvwgk awvwgk added the topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... label Mar 13, 2021
@milancurcic
Copy link
Member

chomp takes a set rather than a substring which is more in line with intrinsic functions like scan and verify

There are probably use cases for both. Could we design chomp to take one or the other?

print'(a)', chomp("hello", set="lo")  ! "he"
print'(a)', chomp("hello", substring="lo")  ! "hel"

But that unfortunately complicates the API--set and substring would need to be optional and we'd need to handle the cases of chomp with one or three dummy arguments. Probably not worth it and I think set instead of substring is a good choice. If there's user demand for the substring variant, we can implement a separate function.

@awvwgk
Copy link
Member Author

awvwgk commented Mar 23, 2021

But that unfortunately complicates the API--set and substring would need to be optional and we'd need to handle the cases of chomp with one or three dummy arguments. Probably not worth it and I think set instead of substring is a good choice. If there's user demand for the substring variant, we can implement a separate function.

We would have to force keyword arguments by some hacks, it's not impossible, but quite ugly in Fortran...

function chomp_set(string, force_kwargs, set)
  character(len=*), intent(in) :: string
  ! dummy type to force key-value argument, make sure signature is different from substring variant
  type :: force_kwargs_type
    class(*), private, allocatable :: set
  end type force_kwargs_type
  type(force_kwargs_type), intent(in), optional :: force_kwargs
  character(len=*), intent(in) :: set
end function
function chomp_substring(string, force_kwargs, substring)
  character(len=*), intent(in) :: string
  ! dummy type to force key-value argument, make sure signature is different from set variant
  type :: force_kwargs_type
    class(*), private, allocatable :: substring(:)
  end type force_kwargs_type
  type(force_kwargs_type), intent(in), optional :: force_kwargs
  character(len=*), intent(in) :: substring
end function

@ivan-pi
Copy link
Member

ivan-pi commented Mar 28, 2021

I don't think that hack works. Since the "dummy" argument is optional in both cases, the interfaces are ambiguous.

@awvwgk
Copy link
Member Author

awvwgk commented Mar 28, 2021

I don't think that hack works. Since the "dummy" argument is optional in both cases, the interfaces are ambiguous.

Yeah, you're right, this interface doesn't work, that's why I put it up as puzzle at the discourse. Maybe there is a way to make it work in Fortran?

- empty set array, [character(len=1)::], does missmatch due to
  string length of zero (!) to the actual dummy set argument
@awvwgk
Copy link
Member Author

awvwgk commented Apr 5, 2021

Thanks for feedback so far. With two approvals I will merge later this week unless there is more to discuss on this proposal.

Copy link
Member

@ivan-pi ivan-pi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments:

  • Do we want examples in the spec for a type(string_type) variable or is this supposed to be self-explanatory?
  • I found the name of the test driver string_trim a bit confusing given that the functions tested are called strip and chomp. My only alternative would be string_strip_chomp.
  • I wasn't sure what was the purpose of testing the functions with and without the keyword argument, e.g:
          call check(chomp(string_type("hello"), "lo") == "hel")
          call check(chomp(string_type("hello"), substring="lo") == "hel")
    
  • I noticed you arranged the functions in "opposite" order of how they interact, in other words functions never call functions above them. Is there some underlying reason for this?

But otherwise, I think it is good to go!

The functions are a great example of how versatile verify is. On the other hand the high ratio of boiler-plate code for interfaces vs actual code doing stuff is somewhat disheartening property of Fortran.

@awvwgk
Copy link
Member Author

awvwgk commented Apr 10, 2021

Do we want examples in the spec for a type(string_type) variable or is this supposed to be self-explanatory?

Using string_type in the example would lead to wrapping the literals in the constructor, which feels redundant. Generally, I would think about any string related function as if it supports string_type on an equal footing to intrinsic character variables.

I wasn't sure what was the purpose of testing the functions with and without the keyword argument, e.g:

Mainly to pin the API down in the test, thinking about this, there is no “test” to check whether the first argument can be accessed by string=. Renaming a dummy argument can be considered a breaking API change and I was looking for a way to catch those in the testing.

I noticed you arranged the functions in "opposite" order of how they interact, in other words functions never call functions above them. Is there some underlying reason for this?

Mainly the order I wrote them, no particular reason on ordering them this way.

On the other hand the high ratio of boiler-plate code for interfaces vs actual code doing stuff is somewhat disheartening property of Fortran.

It is indeed quite verbose, but the wrappers for the implementation are too different to create them from a fypp macro. Since the wrappers are quite similar in general we might come up with some common skeleton for all of them as the stdlib_strings module grows and use this instead as a fypp macro.

@ivan-pi
Copy link
Member

ivan-pi commented Apr 10, 2021

Thanks for the answers.

With 3 approvals it is more than ready to be merged. 👍

@awvwgk awvwgk merged commit 0886501 into fortran-lang:master Apr 10, 2021
@awvwgk awvwgk deleted the strings branch April 10, 2021 11:51
@sgeard
Copy link

sgeard commented Apr 10, 2021

I'm not convinced this is the right approach. String handling is really the realm of scripting languages and I think that the way Tcl handles these issues is better.

Strip: why not trim, trimleft and trimright?

Chomp: in over 30 years of writing scripts (cross platform) I've never needed such a function and Tcl doesn't have one - and the examples just don't seem real. I'd like to understand what real problem is being solved with this function. The name is terrible as well - reminds me of Perl to which I seem to be allergic!
One thing Tcl does do is interpret \n as a newline in a platform independent way so when you read a file you always get '\n' irrespective of platform. Embedding platform-dependent line-endings in strings is poor practice - they are better handled in the i/o layer.

@ivan-pi ivan-pi mentioned this pull request Apr 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ...
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants