-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement sorting algorithms #98
Comments
Is it also an option to link to the algorithms provided by BLAS/LAPACK? |
Here are inefficient implementations from fortran-utils: https://github.com/certik/fortran-utils/blob/b43bd24cd421509a5bc6d3b9c3eeae8ce856ed88/src/sorting.f90, the API however might be useful to get inspiration from. Here is more efficient implementation of quicksort: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/sorting.f90#L165. I think there will be much better implementations out there, including in Lapack. |
Here is a mutli-threaded one: |
The following routines implement the version of quicksort outlined in Hanson's and Hopkins book and on the associated web site (http://www.siam.org/books/ot134). The publishers license is very permissive (at least as I read it). This version supports REAL32, REAL64, INT8, INT16, INT32, INT64, Character strings and arrays, and a user defined type. Unfortunately, the file tool for this page does not know anything about .f90 or .F90 file extensions. Here is the code: Main quicksort module: type specific routines are overloaded with generic qsort interface. Sorting can be in place or return a integer permutation array leaving original array untouched. Sorting can be in either ascending or descending order. A base user class that can be extended to define user specific classes. Specifies dummy methods for relational operators needed for sorting etc, a print method and an assignment method. A test program that tests sorting integers, reals, characters, and a user point class that is sorted on distance (Euclidean Norm) The point class used in test program. It is extended from the base User class |
Here are some of mine:
|
A few more: Sorting routines in various libraries
Various user routines
Internet forums and discussion boards
Interfaces to C
|
FWIW I needed a sorting procedure to sort the results of of hashing functions to estimate their tendency to produce conflicting hashes. I have implemented four sorting algorithms for sorting rank one arrays of kinds INT8, INT16, INT32, INT64, SP, DP, and QP.
I have implemented eight sets of arrays for testing the codes, all of length 2**20.
The quicksort array sometimes exhibits quadratic behavior on the Identical, Randon-3, and Random-10 arrays so I turned off testing for this sort on these three arrays. The results of the testing are summarized by the following table. The times are averages of eight consecutive runs.
In summary,
FWIW timsort puts a lot of effort into reducing the number of comparisons of elements of the array/list at the expense of some convoluted logic. I suspect this effort is wasted on intrinsic elements and some simplification of the code would improve its performance on random data with minimal impact on its performance on largely sorted data. |
I have implemented a fifth sorting routine, a translation of the Rust sorting routine to Fortran 2008. The Rust sort is also inspired by Tim Peters' Timsort algorithm, but drops the "galloping" used to rapidly find the minimum range to be sorted. It is released under the Apache License, Version 2.0 or the MIT license so the translation can be easily incorporated into the standard library. Rust sort is comparable to the simplified Tim Sort on random data, and the true Tim Sort on partially sorted data. I suggest that that a standard library sorting routines include at least two routines: one like Introsort for data that is expected to be random; and one like Rust sort for data that will often have presorted sections.
|
I think having two or even three of these would be great additions to stdlib. The occurence of pre-sorted sections is typical in indexes of sparse matrices for discretized PDE problems using irregular meshes. I just uploaded a copy of a hybrid sort (quick sort + insertion sort) by Houlsby & Sloan (both are professors of civil engineering) to a public repository: https://github.com/ivan-pi/hs-sort/. Since I could not figure out the journal licensing conditions, I have sent an email to the original author asking for permission to share the routines under a permissive license. For an array of 1048576 random ordered doubles, the timing is roughly 0.2 seconds, so on the same order as your routines. |
hey what is it alll about, i have just started in open source and came across this, could you guide me ahead? |
@MUzairS15 This page is an issue in a Git repository devoted to providing a Standard Library for Fortran, similar to standard libraries for C, C++, Java etc. This repository may be of interest to you if you are a Fortran programmer interested extending the language's capabilities. Most of the library code is in Fortran extended by the FYPP preprocessor. Most uses of FYPP in the library are rather simple and easy to learn. The Fortran code can be sophisticated and requires a good knowledge of most of Fortran 2008 other than coarrays. Some limited C++/C processing may also be present using C interoperability. The issue this page addresses is providing sorting subroutines for Fortran. I have developed routines for sorting arrays of intrinsic type elements, other than character, and am exploring their incorporation into the standard library. |
I have tentatively renamed the sorting routines: Testing of
|
A few more user wishes concerning sorting are in j3-fortran/fortran_proposals#101. |
@certik @ivan-pi @rweed @jacobwilliams @jvdp1 You have all expressed interest in having the standard library include a module for sorting data. I now have a reasonable draft of such a module and a testing program, but before it is pulled into the standard library the following issues should be addressed:
Several things of minor note:
The current draft specification document follows: title: Sorting ProceduresThe
|
I was wondering if a |
Work arrays would certainly help stack memory usage of
|
I have implemented optional work arrays and the current testing code now does not run out of stack space when processing |
@wclodius2 Thank you for this impressive work. Sorting algorithms are highly needed for multiple procedures in To answer to (some of) your questions in a previous comment:
OK for me
If there is no size limit on the size of the array to be sorted, then I am in favor of
These names are fine for me, and are helpful for users who don't care about the sorting algorithm but want the best algorithm for sorting their array. |
snip
There are no obvious limits on the size of the arrays that can be processed. Maybe more thorough testing would find some. For example,
The current procedures are hybrid sorts, and most clearly identified by names that would be meaningless to most users.
|
thanks for the detailed explanations. Based on these, the proposed names make sense to me. |
I have started a pull request on the |
Should stdlib also provide an interface to C's qsort function? The motivation is flexibility -- it allows sorting arbitrary derived types with arbitrary user-defined comparison functions. The downside is the user needs to provide a comparison function, and the call requires using some iso_c_binding features. To me this makes it somewhat "non-fortranic", and I prefer to use alternatives such as those suggested above, if possible. But for "very general" cases, I don't know a better technique than using C's qsort. If we were to do this, the sorting module would need to have an interface to C's qsort: interface
subroutine qsort_C(array, elem_count, elem_size, compare) bind(C,name="qsort")
use iso_c_binding, only: c_ptr, c_size_t, c_funptr
implicit none
type(c_ptr), value :: array ! C-pointer to the first entry of the array
integer(c_size_t), value :: elem_count ! Number of elements in the array
integer(c_size_t), value :: elem_size ! Size of each element, according to c_sizeof()
type(c_funptr), value :: compare ! c_funptr to the user-provided comparison function
end subroutine qsort_C
end interface To use it, the user would need to provide a function function test_integer_compare(i1ptr, i2ptr) result(sgn) bind(C)
type(c_ptr), value, intent(in) :: i1ptr, i2ptr
integer, pointer :: i1, i2 ! In more general cases these would be of type(my_derived_type)
integer(c_int) :: sgn
call c_f_pointer(i1ptr, i1)
call c_f_pointer(i2ptr, i2)
! The user defines what 'less than', 'equal', 'greater than' means by setting 'sgn'
if(i1 < i2) sgn = -1_c_int
if(i1 == i2) sgn = 0_c_int
if(i1 > i2) sgn = 1_c_int
end function The call to qsort_C is a bit ugly because one needs to make use of subroutine test_qsort_C
integer, target :: array(5) ! In more general cases this could be of type(my_derived_type)
! To be sorted
array = [4, 3, 5, 1, 2]
! In-place sort of the array
call qsort_C( &
c_loc(array(1)), &
size(array, kind=c_size_t), &
c_sizeof(array(1)), &
c_funloc(test_integer_compare) )
if(all(array == [1, 2, 3, 4, 5])) then
print*, 'PASS'
else
print*, 'FAIL'
end if
end subroutine My sense is that stdlib should provide something with this level of generality. Is there any better approach? |
I now have three requested changes in the API. Let me know if you have strong preferences.
|
Upon looking at the fortran standard in more detail, I've become uncertain about the correct use of C's qsort. I've raised a question here to clarify. |
Based on the fortran discourse discussion I have corrected the |
@gareth-nx , @jvdp1 and I are having a discussion of my results and its possible implications for the naming conventions of the |
@ivan-pi on #386 has expressed concerns that the implemented sortings for Also note that the results of the testings may be of interest to @certik for his Fortran benchmarks project. |
I have committed and pushed my source code to #386 . The testing harness found a couple of errors that didn't show up with my compilers on my computer. I have committed and pushed the fixed codes. The testing harness for the fixed codes does not report any further errors. |
@gareth-nx Do you mind emailing me at [email protected]? I can't find your contact info. I apologize for the off-topic message in this PR, but I can't find a better way on GitHub. :) |
I have started up a new branch at #408 to deal with some merge issues. |
At the very least there needs to be some quick algorithm for sorting integers and reals. But I think we can implement several algorithms and the user can choose.
This will also be needed to implement #38.
The text was updated successfully, but these errors were encountered: