Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize LSBRadixSorter and use it in SortingPostingsEnum #12800

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

gf2121
Copy link
Contributor

@gf2121 gf2121 commented Nov 13, 2023

Description

In #12114, we had great numbers for LSB radix sorter when sorting random docs in SortingDocsEnum . But we can not take advantage of the LSB radix sorter in SortingPostingsEnum because ((I guess) we need to swap an parallel offset array. This PR proposes to do the following:

  • Extract a generalized LSB radix sorter called BaseLSBRadixSorter, and change the origin LSBRadixSorter to UnsignedIntLSBRadixSorter.

  • Implement a custom LSB radix sorter fo SortingPostingsEnum.

Some Notes

  • I plan to mark the LSBRadixSorter deprecated in 9.x and delete in 10.0. So I named the abstract class BaseLSBRadixSorter instead of LSBRadixSorter to help backport.

  • I run a benchmark to make sure we do not have regression for the unsigned ints sorter. I attach the benchmark code in the PR and will remove before merge.

Benchmark                  (bit)  (order)  (size)   Mode  Cnt  Score   Error   Units
LSBSorterBenchmark.newLSB     24  natural  100000  thrpt    5  2.802 ± 0.038  ops/ms
LSBSorterBenchmark.newLSB     24  reverse  100000  thrpt    5  2.800 ± 0.040  ops/ms
LSBSorterBenchmark.newLSB     24   random  100000  thrpt    5  2.768 ± 0.039  ops/ms
LSBSorterBenchmark.newLSB     31  natural  100000  thrpt    5  2.125 ± 0.021  ops/ms
LSBSorterBenchmark.newLSB     31  reverse  100000  thrpt    5  2.128 ± 0.033  ops/ms
LSBSorterBenchmark.newLSB     31   random  100000  thrpt    5  2.101 ± 0.014  ops/ms
LSBSorterBenchmark.oldLSB     24  natural  100000  thrpt    5  2.847 ± 0.030  ops/ms
LSBSorterBenchmark.oldLSB     24  reverse  100000  thrpt    5  2.848 ± 0.013  ops/ms
LSBSorterBenchmark.oldLSB     24   random  100000  thrpt    5  2.787 ± 0.129  ops/ms
LSBSorterBenchmark.oldLSB     31  natural  100000  thrpt    5  2.154 ± 0.033  ops/ms
LSBSorterBenchmark.oldLSB     31  reverse  100000  thrpt    5  2.150 ± 0.018  ops/ms
LSBSorterBenchmark.oldLSB     31   random  100000  thrpt    5  2.121 ± 0.011  ops/ms
  • I run a benchmark to compare tim sorter and LSB radix sorter impl in SortingPostingsEnum. LSB radix sorter is much faster in most cases except for the completely sorted docs, which looks fine to me. I attached the benchmark code in the PR and will remove before merge as well.
Benchmark                       (bit)  (order)  (size)   Mode  Cnt   Score    Error   Units
DocSorterBenchmark.radixSorter     24  natural  100000  thrpt    5   2.063 ±  0.088  ops/ms
DocSorterBenchmark.radixSorter     24  reverse  100000  thrpt    5   2.082 ±  0.016  ops/ms
DocSorterBenchmark.radixSorter     24   random  100000  thrpt    5   2.110 ±  0.014  ops/ms
DocSorterBenchmark.radixSorter     24  partial  100000  thrpt    5   2.091 ±  0.026  ops/ms
DocSorterBenchmark.radixSorter     31  natural  100000  thrpt    5   1.633 ±  0.014  ops/ms
DocSorterBenchmark.radixSorter     31  reverse  100000  thrpt    5   1.636 ±  0.023  ops/ms
DocSorterBenchmark.radixSorter     31   random  100000  thrpt    5   1.641 ±  0.009  ops/ms
DocSorterBenchmark.radixSorter     31  partial  100000  thrpt    5   1.640 ±  0.010  ops/ms
DocSorterBenchmark.timSorter       24  natural  100000  thrpt    5  49.726 ±  1.452  ops/ms
DocSorterBenchmark.timSorter       24  reverse  100000  thrpt    5   0.724 ±  0.006  ops/ms
DocSorterBenchmark.timSorter       24   random  100000  thrpt    5   0.061 ±  0.001  ops/ms
DocSorterBenchmark.timSorter       24  partial  100000  thrpt    5   0.203 ±  0.023  ops/ms
DocSorterBenchmark.timSorter       31  natural  100000  thrpt    5  49.475 ±  2.480  ops/ms
DocSorterBenchmark.timSorter       31  reverse  100000  thrpt    5   2.236 ±  0.192  ops/ms
DocSorterBenchmark.timSorter       31   random  100000  thrpt    5   0.058 ±  0.001  ops/ms
DocSorterBenchmark.timSorter       31  partial  100000  thrpt    5   0.209 ±  0.012  ops/ms

long[] off = new long[size];
for (int i = 0; i < size; ++i) {
doc[i] = random.nextInt(maxDoc);
off[i] = random.nextLong(Long.MAX_VALUE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for improving our sorting @gf2121!

When possible I much prefer benchmarks based on real data/corpora -- if you run benchmarks on random data you draw random conclusions :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggestion @mikemccand !

When possible I much prefer benchmarks based on real data/corpora.

Benchmark on real data is great. Since this PR is optimizing the docIds sorting in SortingTermsEnum,
I guess the factor that determines the doc id distribution is not the data itself, but the order in which the sort field is written. I tried the great benchmark @jpountz posted in #12114 (comment). The result shows ~17% speed up for flushing docs indexed in random order, and ~2% regression for asc/desc.

Benchmark Detail

Baseline

Using order: RANDOM
DWPT 0 [2023-11-14T12:52:36.633402Z; main]: flush time 1214.070083 ms
DWPT 0 [2023-11-14T12:52:38.534253Z; main]: flush time 899.796208 ms
DWPT 0 [2023-11-14T12:52:40.119878Z; main]: flush time 799.165958 ms
DWPT 0 [2023-11-14T12:52:41.562375Z; main]: flush time 654.701 ms
DWPT 0 [2023-11-14T12:52:42.998111Z; main]: flush time 650.918208 ms
DWPT 0 [2023-11-14T12:52:44.433020Z; main]: flush time 647.062042 ms
DWPT 0 [2023-11-14T12:52:45.871244Z; main]: flush time 649.0795 ms
DWPT 0 [2023-11-14T12:52:47.309565Z; main]: flush time 652.586583 ms
DWPT 0 [2023-11-14T12:52:48.747658Z; main]: flush time 646.734333 ms
DWPT 0 [2023-11-14T12:52:50.196127Z; main]: flush time 654.810541 ms
Using order: ASC
DWPT 1 [2023-11-14T12:52:51.564601Z; main]: flush time 525.553291 ms
DWPT 1 [2023-11-14T12:52:52.708573Z; main]: flush time 349.199958 ms
DWPT 1 [2023-11-14T12:52:53.803105Z; main]: flush time 303.043541 ms
DWPT 1 [2023-11-14T12:52:54.896132Z; main]: flush time 299.856667 ms
DWPT 1 [2023-11-14T12:52:55.995053Z; main]: flush time 300.801292 ms
DWPT 1 [2023-11-14T12:52:57.082434Z; main]: flush time 295.964542 ms
DWPT 1 [2023-11-14T12:52:58.170703Z; main]: flush time 299.771375 ms
DWPT 1 [2023-11-14T12:52:59.258876Z; main]: flush time 296.914333 ms
DWPT 1 [2023-11-14T12:53:00.373431Z; main]: flush time 303.516583 ms
DWPT 1 [2023-11-14T12:53:01.483840Z; main]: flush time 300.98975 ms
Using order: DESC
DWPT 2 [2023-11-14T12:53:02.748401Z; main]: flush time 422.169666 ms
DWPT 2 [2023-11-14T12:53:03.935493Z; main]: flush time 387.440917 ms
DWPT 2 [2023-11-14T12:53:05.112949Z; main]: flush time 384.38025 ms
DWPT 2 [2023-11-14T12:53:06.291248Z; main]: flush time 387.05 ms
DWPT 2 [2023-11-14T12:53:07.480404Z; main]: flush time 391.184375 ms
DWPT 2 [2023-11-14T12:53:08.691136Z; main]: flush time 383.774625 ms
DWPT 2 [2023-11-14T12:53:09.875488Z; main]: flush time 391.470083 ms
DWPT 2 [2023-11-14T12:53:11.058662Z; main]: flush time 386.547958 ms
DWPT 2 [2023-11-14T12:53:12.245281Z; main]: flush time 384.264833 ms
DWPT 2 [2023-11-14T12:53:13.423061Z; main]: flush time 383.677958 ms

Candidate

Using order: RANDOM
DWPT 0 [2023-11-14T12:51:01.872202Z; main]: flush time 1087.182375 ms
DWPT 0 [2023-11-14T12:51:03.804886Z; main]: flush time 879.66 ms
DWPT 0 [2023-11-14T12:51:05.190083Z; main]: flush time 603.431084 ms
DWPT 0 [2023-11-14T12:51:06.508227Z; main]: flush time 536.702625 ms
DWPT 0 [2023-11-14T12:51:07.826303Z; main]: flush time 537.334458 ms
DWPT 0 [2023-11-14T12:51:09.145597Z; main]: flush time 534.974792 ms
DWPT 0 [2023-11-14T12:51:10.479141Z; main]: flush time 540.493958 ms
DWPT 0 [2023-11-14T12:51:11.807347Z; main]: flush time 540.360834 ms
DWPT 0 [2023-11-14T12:51:13.135150Z; main]: flush time 532.705959 ms
DWPT 0 [2023-11-14T12:51:14.452301Z; main]: flush time 533.977417 ms
Using order: ASC
DWPT 1 [2023-11-14T12:51:15.752945Z; main]: flush time 476.6145 ms
DWPT 1 [2023-11-14T12:51:16.841999Z; main]: flush time 310.893584 ms
DWPT 1 [2023-11-14T12:51:17.941010Z; main]: flush time 320.484458 ms
DWPT 1 [2023-11-14T12:51:19.032024Z; main]: flush time 306.014333 ms
DWPT 1 [2023-11-14T12:51:20.121276Z; main]: flush time 306.314708 ms
DWPT 1 [2023-11-14T12:51:21.203727Z; main]: flush time 301.988375 ms
DWPT 1 [2023-11-14T12:51:22.295764Z; main]: flush time 304.992917 ms
DWPT 1 [2023-11-14T12:51:23.383463Z; main]: flush time 304.863875 ms
DWPT 1 [2023-11-14T12:51:24.466930Z; main]: flush time 305.871583 ms
DWPT 1 [2023-11-14T12:51:25.554916Z; main]: flush time 308.523833 ms
Using order: DESC
DWPT 2 [2023-11-14T12:51:26.776390Z; main]: flush time 400.262708 ms
DWPT 2 [2023-11-14T12:51:27.954697Z; main]: flush time 399.378459 ms
DWPT 2 [2023-11-14T12:51:29.135740Z; main]: flush time 399.318917 ms
DWPT 2 [2023-11-14T12:51:30.313135Z; main]: flush time 393.676084 ms
DWPT 2 [2023-11-14T12:51:31.516547Z; main]: flush time 407.419875 ms
DWPT 2 [2023-11-14T12:51:32.692559Z; main]: flush time 392.623833 ms
DWPT 2 [2023-11-14T12:51:33.880394Z; main]: flush time 399.269125 ms
DWPT 2 [2023-11-14T12:51:35.051656Z; main]: flush time 393.07625 ms
DWPT 2 [2023-11-14T12:51:36.223096Z; main]: flush time 391.553584 ms
DWPT 2 [2023-11-14T12:51:37.404842Z; main]: flush time 392.61575 ms
Code
  enum Order {
    RANDOM,
    ASC,
    DESC;
  }

  public static void main(String[] args) throws IOException {
    for (Order order : Order.values()) {
      System.out.println("Using order: " + order.name());
      Directory dir = FSDirectory.open(Paths.get("/tmp/a"));
      IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
      cfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
      cfg.setInfoStream(new PrintStreamInfoStream(System.out));
      cfg.setMaxBufferedDocs(1_000_000);
      cfg.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
      cfg.setIndexSort(
          new Sort(LongField.newSortField("sort_field", false, SortedNumericSelector.Type.MIN)));
      IndexWriter w = new IndexWriter(dir, cfg);
      Document doc = new Document();
      LongField sortField = new LongField("sort_field", 0);
      doc.add(sortField);
      TextField stringField1 = new TextField("string_field", "", Field.Store.NO);
      doc.add(stringField1);
      TextField stringField2 = new TextField("string_field", "", Field.Store.NO);
      doc.add(stringField2);
      TextField stringField3 = new TextField("string_field", "", Field.Store.NO);
      doc.add(stringField3);
      for (int i = 0; i < 10_000_000; ++i) {
        long sortValue =
            switch (order) {
              case RANDOM -> i % 15;
              case ASC -> i;
              case DESC -> -i;
            };
        sortField.setLongValue(sortValue);
        stringField1.setStringValue(Integer.toBinaryString(i % 10));
        stringField2.setStringValue(Integer.toBinaryString(i % 100));
        stringField3.setStringValue(Integer.toBinaryString(i % 1000));
        w.addDocument(doc);
      }
      w.flush();
      w.commit();
      w.close();
    }
  }

@mikemccand
Copy link
Member

I wonder whether Arrays.sort might be a good choice instead of making our own powerful sorting classes? OpenJDK is (gradually?) taking advantage of fast SIMD sorting so at some point / on some CPUs / for certain JDK versions, Arrays.sort should become faster than our own sorts. (But, I don't think Arrays.sort has a parallel array option?).

@jpountz
Copy link
Contributor

jpountz commented Nov 16, 2023

I like the idea, but this seems to come with greater heap requirements as well?

@gf2121
Copy link
Contributor Author

gf2121 commented Nov 20, 2023

Thanks for feedback @jpountz !

but this seems to come with greater heap requirements as well?

Yes, +1 for the concern. The original approach requires at most maxDoc/8 extra tmp slots while the lsb sorter requires n. I also considered the MSBRadixSorter (not need tmp slots) but it is 5x slower the LSB sorter. I'll think more about this and try to find out the better trade-off.

msb benchmark result
Benchmark                     (bit)            (order)  (size)   Mode  Cnt   Score    Error   Units
DocSorterBenchmark.lsbSorter     24            natural  100000  thrpt    5   2.066 ±  0.074  ops/ms
DocSorterBenchmark.lsbSorter     24            reverse  100000  thrpt    5   2.086 ±  0.048  ops/ms
DocSorterBenchmark.lsbSorter     24             random  100000  thrpt    5   2.073 ±  0.015  ops/ms
DocSorterBenchmark.lsbSorter     24            partial  100000  thrpt    5   2.072 ±  0.010  ops/ms
DocSorterBenchmark.lsbSorter     24  natural_exception  100000  thrpt    5   2.074 ±  0.037  ops/ms
DocSorterBenchmark.lsbSorter     24  reverse_exception  100000  thrpt    5   2.079 ±  0.064  ops/ms
DocSorterBenchmark.lsbSorter     31            natural  100000  thrpt    5   1.641 ±  0.030  ops/ms
DocSorterBenchmark.lsbSorter     31            reverse  100000  thrpt    5   1.646 ±  0.013  ops/ms
DocSorterBenchmark.lsbSorter     31             random  100000  thrpt    5   1.650 ±  0.014  ops/ms
DocSorterBenchmark.lsbSorter     31            partial  100000  thrpt    5   1.640 ±  0.027  ops/ms
DocSorterBenchmark.lsbSorter     31  natural_exception  100000  thrpt    5   1.641 ±  0.008  ops/ms
DocSorterBenchmark.lsbSorter     31  reverse_exception  100000  thrpt    5   1.642 ±  0.019  ops/ms
DocSorterBenchmark.msbSorter     24            natural  100000  thrpt    5   0.481 ±  0.007  ops/ms
DocSorterBenchmark.msbSorter     24            reverse  100000  thrpt    5   0.422 ±  0.005  ops/ms
DocSorterBenchmark.msbSorter     24             random  100000  thrpt    5   0.468 ±  0.003  ops/ms
DocSorterBenchmark.msbSorter     24            partial  100000  thrpt    5   0.432 ±  0.003  ops/ms
DocSorterBenchmark.msbSorter     24  natural_exception  100000  thrpt    5   0.464 ±  0.010  ops/ms
DocSorterBenchmark.msbSorter     24  reverse_exception  100000  thrpt    5   0.422 ±  0.003  ops/ms
DocSorterBenchmark.msbSorter     31            natural  100000  thrpt    5   0.565 ±  0.013  ops/ms
DocSorterBenchmark.msbSorter     31            reverse  100000  thrpt    5   0.442 ±  0.009  ops/ms
DocSorterBenchmark.msbSorter     31             random  100000  thrpt    5   0.447 ±  0.002  ops/ms
DocSorterBenchmark.msbSorter     31            partial  100000  thrpt    5   0.428 ±  0.002  ops/ms
DocSorterBenchmark.msbSorter     31  natural_exception  100000  thrpt    5   0.518 ±  0.011  ops/ms
DocSorterBenchmark.msbSorter     31  reverse_exception  100000  thrpt    5   0.444 ±  0.007  ops/ms
DocSorterBenchmark.timSorter     24            natural  100000  thrpt    5  50.030 ±  0.728  ops/ms
DocSorterBenchmark.timSorter     24            reverse  100000  thrpt    5   1.005 ±  0.007  ops/ms
DocSorterBenchmark.timSorter     24             random  100000  thrpt    5   0.077 ±  0.001  ops/ms
DocSorterBenchmark.timSorter     24            partial  100000  thrpt    5   0.238 ±  0.014  ops/ms
DocSorterBenchmark.timSorter     24  natural_exception  100000  thrpt    5   2.640 ±  0.077  ops/ms
DocSorterBenchmark.timSorter     24  reverse_exception  100000  thrpt    5   0.775 ±  0.007  ops/ms
DocSorterBenchmark.timSorter     31            natural  100000  thrpt    5  49.712 ±  2.189  ops/ms
DocSorterBenchmark.timSorter     31            reverse  100000  thrpt    5   2.511 ±  0.165  ops/ms
DocSorterBenchmark.timSorter     31             random  100000  thrpt    5   0.076 ±  0.001  ops/ms
DocSorterBenchmark.timSorter     31            partial  100000  thrpt    5   0.239 ±  0.012  ops/ms
DocSorterBenchmark.timSorter     31  natural_exception  100000  thrpt    5   2.494 ±  0.097  ops/ms
DocSorterBenchmark.timSorter     31  reverse_exception  100000  thrpt    5   1.180 ±  0.021  ops/ms

@gf2121
Copy link
Contributor Author

gf2121 commented Nov 21, 2023

I did some more work to find out the balance between memory / performance in various data distribution. The way i'm thinking now is that we keep the timsorter here, but make the run length larger and use LSB radix sorter to sort these runs.

  • From the memory perspective, we limit the memory usage to a single run. So the memory usage should be generally similar to before. Here is the tmp slot required in a single shot:
  tim sorter lsb sorter off sorter
natural 0 112500 0
reverse 0 112500 0
random 56052 112500 56248
partial 52506 112500 56248
natural_exception 32780 112500 20356
reverse_exception 45724 112500 56250

 

  • From the performance perspective, we have great performance for sorted data as before, and 4x faster on random data. But we will lose some performance for "sorted exception" cases as we are using larger run length.
Benchmark                     (bit)            (order)  (size)   Mode  Cnt   Score   Error   Units
DocSorterBenchmark.lsbSorter     24            natural  100000  thrpt    5   2.067 ± 0.037  ops/ms
DocSorterBenchmark.lsbSorter     24            reverse  100000  thrpt    5   2.067 ± 0.047  ops/ms
DocSorterBenchmark.lsbSorter     24             random  100000  thrpt    5   2.045 ± 0.031  ops/ms
DocSorterBenchmark.lsbSorter     24            partial  100000  thrpt    5   2.049 ± 0.040  ops/ms
DocSorterBenchmark.lsbSorter     24  natural_exception  100000  thrpt    5   2.074 ± 0.020  ops/ms
DocSorterBenchmark.lsbSorter     24  reverse_exception  100000  thrpt    5   2.063 ± 0.070  ops/ms
DocSorterBenchmark.lsbSorter     31            natural  100000  thrpt    5   1.593 ± 0.020  ops/ms
DocSorterBenchmark.lsbSorter     31            reverse  100000  thrpt    5   1.577 ± 0.037  ops/ms
DocSorterBenchmark.lsbSorter     31             random  100000  thrpt    5   1.586 ± 0.017  ops/ms
DocSorterBenchmark.lsbSorter     31            partial  100000  thrpt    5   1.591 ± 0.018  ops/ms
DocSorterBenchmark.lsbSorter     31  natural_exception  100000  thrpt    5   1.584 ± 0.032  ops/ms
DocSorterBenchmark.lsbSorter     31  reverse_exception  100000  thrpt    5   1.563 ± 0.033  ops/ms
DocSorterBenchmark.offSorter     24            natural  100000  thrpt    5  48.221 ± 2.295  ops/ms
DocSorterBenchmark.offSorter     24            reverse  100000  thrpt    5  16.023 ± 0.833  ops/ms
DocSorterBenchmark.offSorter     24             random  100000  thrpt    5   0.375 ± 0.166  ops/ms
DocSorterBenchmark.offSorter     24            partial  100000  thrpt    5   0.487 ± 0.233  ops/ms
DocSorterBenchmark.offSorter     24  natural_exception  100000  thrpt    5   1.304 ± 0.055  ops/ms
DocSorterBenchmark.offSorter     24  reverse_exception  100000  thrpt    5   0.943 ± 0.603  ops/ms
DocSorterBenchmark.offSorter     31            natural  100000  thrpt    5  48.030 ± 2.616  ops/ms
DocSorterBenchmark.offSorter     31            reverse  100000  thrpt    5  15.940 ± 0.961  ops/ms
DocSorterBenchmark.offSorter     31             random  100000  thrpt    5   0.363 ± 0.094  ops/ms
DocSorterBenchmark.offSorter     31            partial  100000  thrpt    5   0.486 ± 0.093  ops/ms
DocSorterBenchmark.offSorter     31  natural_exception  100000  thrpt    5   0.999 ± 0.691  ops/ms
DocSorterBenchmark.offSorter     31  reverse_exception  100000  thrpt    5   0.942 ± 0.058  ops/ms
DocSorterBenchmark.timSorter     24            natural  100000  thrpt    5  46.111 ± 4.226  ops/ms
DocSorterBenchmark.timSorter     24            reverse  100000  thrpt    5  15.828 ± 0.571  ops/ms
DocSorterBenchmark.timSorter     24             random  100000  thrpt    5   0.082 ± 0.015  ops/ms
DocSorterBenchmark.timSorter     24            partial  100000  thrpt    5   0.515 ± 0.260  ops/ms
DocSorterBenchmark.timSorter     24  natural_exception  100000  thrpt    5   2.681 ± 0.164  ops/ms
DocSorterBenchmark.timSorter     24  reverse_exception  100000  thrpt    5   1.940 ± 0.138  ops/ms
DocSorterBenchmark.timSorter     31            natural  100000  thrpt    5  46.916 ± 4.504  ops/ms
DocSorterBenchmark.timSorter     31            reverse  100000  thrpt    5  15.718 ± 0.865  ops/ms
DocSorterBenchmark.timSorter     31             random  100000  thrpt    5   0.085 ± 0.001  ops/ms
DocSorterBenchmark.timSorter     31            partial  100000  thrpt    5   0.502 ± 0.036  ops/ms
DocSorterBenchmark.timSorter     31  natural_exception  100000  thrpt    5   2.711 ± 0.134  ops/ms
DocSorterBenchmark.timSorter     31  reverse_exception  100000  thrpt    5   1.944 ± 0.149  ops/ms

@gf2121 gf2121 requested a review from jpountz November 21, 2023 08:38
* Build histogram with specified shift. Implementations should witten like:
*
* <pre>{@code
* protected void buildHistogram(int from, int to, int[] histogram, int shift) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When run jmh i find that the low level abstractions like bucket and save decreased the performance 4-5x when there are more than one implementations in single JVM. So i pull up the abstraction layer here.

@gf2121
Copy link
Contributor Author

gf2121 commented Nov 21, 2023

I also run the index script to see flush time with this new approach, result in ~15% faster for random data and no regression on asc/desc :)

Benchmark Detail

Baseline
Candidate

DWPT 0 [2023-11-21T09:26:53.204254Z; main]: flush time 825.381875 ms
DWPT 0 [2023-11-21T09:26:54.820677Z; main]: flush time 759.469459 ms
DWPT 0 [2023-11-21T09:26:56.294950Z; main]: flush time 685.044875 ms
DWPT 0 [2023-11-21T09:26:57.643239Z; main]: flush time 562.0595 ms
DWPT 0 [2023-11-21T09:26:58.977843Z; main]: flush time 555.718625 ms
DWPT 0 [2023-11-21T09:27:00.310891Z; main]: flush time 554.023042 ms
DWPT 0 [2023-11-21T09:27:01.645311Z; main]: flush time 556.796709 ms
DWPT 0 [2023-11-21T09:27:02.984576Z; main]: flush time 552.745084 ms
DWPT 0 [2023-11-21T09:27:04.327895Z; main]: flush time 550.138166 ms
DWPT 0 [2023-11-21T09:27:05.667026Z; main]: flush time 552.581875 ms
Using order: ASC
DWPT 1 [2023-11-21T09:27:06.963483Z; main]: flush time 444.065625 ms
DWPT 1 [2023-11-21T09:27:08.116839Z; main]: flush time 346.094542 ms
DWPT 1 [2023-11-21T09:27:09.223496Z; main]: flush time 297.886417 ms
DWPT 1 [2023-11-21T09:27:10.330354Z; main]: flush time 294.814708 ms
DWPT 1 [2023-11-21T09:27:11.437615Z; main]: flush time 296.89775 ms
DWPT 1 [2023-11-21T09:27:12.543693Z; main]: flush time 294.519125 ms
DWPT 1 [2023-11-21T09:27:13.650317Z; main]: flush time 294.359458 ms
DWPT 1 [2023-11-21T09:27:14.755599Z; main]: flush time 296.021333 ms
DWPT 1 [2023-11-21T09:27:15.856800Z; main]: flush time 295.717875 ms
DWPT 1 [2023-11-21T09:27:16.961871Z; main]: flush time 297.089042 ms
Using order: DESC
DWPT 2 [2023-11-21T09:27:18.189055Z; main]: flush time 388.857166 ms
DWPT 2 [2023-11-21T09:27:19.378069Z; main]: flush time 380.438708 ms
DWPT 2 [2023-11-21T09:27:20.566168Z; main]: flush time 379.276208 ms
DWPT 2 [2023-11-21T09:27:21.765879Z; main]: flush time 380.318458 ms
DWPT 2 [2023-11-21T09:27:22.955527Z; main]: flush time 380.022583 ms
DWPT 2 [2023-11-21T09:27:24.149801Z; main]: flush time 381.808792 ms
DWPT 2 [2023-11-21T09:27:25.335294Z; main]: flush time 379.536084 ms
DWPT 2 [2023-11-21T09:27:26.522849Z; main]: flush time 379.696875 ms
DWPT 2 [2023-11-21T09:27:27.708379Z; main]: flush time 378.25275 ms
DWPT 2 [2023-11-21T09:27:28.888002Z; main]: flush time 376.322417 ms

Candidate
Baseline

Using order: RANDOM
DWPT 0 [2023-11-21T09:28:48.701620Z; main]: flush time 907.094125 ms
DWPT 0 [2023-11-21T09:28:50.423239Z; main]: flush time 871.719292 ms
DWPT 0 [2023-11-21T09:28:51.959144Z; main]: flush time 792.907334 ms
DWPT 0 [2023-11-21T09:28:53.357324Z; main]: flush time 656.680334 ms
DWPT 0 [2023-11-21T09:28:54.744187Z; main]: flush time 646.845625 ms
DWPT 0 [2023-11-21T09:28:56.267555Z; main]: flush time 657.262625 ms
DWPT 0 [2023-11-21T09:28:57.654678Z; main]: flush time 646.135625 ms
DWPT 0 [2023-11-21T09:28:59.056354Z; main]: flush time 647.583209 ms
DWPT 0 [2023-11-21T09:29:00.456401Z; main]: flush time 648.0325 ms
DWPT 0 [2023-11-21T09:29:01.861058Z; main]: flush time 651.569625 ms
Using order: ASC
DWPT 1 [2023-11-21T09:29:03.110932Z; main]: flush time 439.131041 ms
DWPT 1 [2023-11-21T09:29:04.217419Z; main]: flush time 341.144292 ms
DWPT 1 [2023-11-21T09:29:05.280921Z; main]: flush time 300.120542 ms
DWPT 1 [2023-11-21T09:29:06.333502Z; main]: flush time 293.290125 ms
DWPT 1 [2023-11-21T09:29:07.386953Z; main]: flush time 293.251333 ms
DWPT 1 [2023-11-21T09:29:08.445562Z; main]: flush time 294.627333 ms
DWPT 1 [2023-11-21T09:29:09.500210Z; main]: flush time 294.796584 ms
DWPT 1 [2023-11-21T09:29:10.551829Z; main]: flush time 292.79775 ms
DWPT 1 [2023-11-21T09:29:11.610377Z; main]: flush time 297.584417 ms
DWPT 1 [2023-11-21T09:29:12.673562Z; main]: flush time 294.24025 ms
Using order: DESC
DWPT 2 [2023-11-21T09:29:13.863704Z; main]: flush time 386.06825 ms
DWPT 2 [2023-11-21T09:29:15.012659Z; main]: flush time 382.778625 ms
DWPT 2 [2023-11-21T09:29:16.159515Z; main]: flush time 381.255458 ms
DWPT 2 [2023-11-21T09:29:17.307500Z; main]: flush time 383.2325 ms
DWPT 2 [2023-11-21T09:29:18.456641Z; main]: flush time 384.675875 ms
DWPT 2 [2023-11-21T09:29:19.600958Z; main]: flush time 381.510875 ms
DWPT 2 [2023-11-21T09:29:20.748250Z; main]: flush time 382.213333 ms
DWPT 2 [2023-11-21T09:29:21.902089Z; main]: flush time 378.915958 ms
DWPT 2 [2023-11-21T09:29:23.045118Z; main]: flush time 379.851583 ms
DWPT 2 [2023-11-21T09:29:24.187161Z; main]: flush time 378.107708 ms
Code
  enum Order {
    RANDOM,
    ASC,
    DESC;
  }

  public static void main(String[] args) throws IOException {
    for (Order order : Order.values()) {
      System.out.println("Using order: " + order.name());
      Directory dir = FSDirectory.open(Paths.get("/tmp/a"));
      IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
      cfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
      cfg.setInfoStream(new PrintStreamInfoStream(System.out));
      cfg.setMaxBufferedDocs(1_000_000);
      cfg.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
      cfg.setIndexSort(
          new Sort(LongField.newSortField("sort_field", false, SortedNumericSelector.Type.MIN)));
      IndexWriter w = new IndexWriter(dir, cfg);
      Document doc = new Document();
      LongField sortField = new LongField("sort_field", 0);
      doc.add(sortField);
      TextField stringField1 = new TextField("string_field", "", Field.Store.NO);
      doc.add(stringField1);
      TextField stringField2 = new TextField("string_field", "", Field.Store.NO);
      doc.add(stringField2);
      TextField stringField3 = new TextField("string_field", "", Field.Store.NO);
      doc.add(stringField3);
      for (int i = 0; i < 10_000_000; ++i) {
        long sortValue =
            switch (order) {
              case RANDOM -> i % 15;
              case ASC -> i;
              case DESC -> -i;
            };
        sortField.setLongValue(sortValue);
        stringField1.setStringValue(Integer.toBinaryString(i % 10));
        stringField2.setStringValue(Integer.toBinaryString(i % 100));
        stringField3.setStringValue(Integer.toBinaryString(i % 1000));
        w.addDocument(doc);
      }
      w.flush();
      w.commit();
      w.close();
    }
  }

@mikemccand
Copy link
Member

I also run the index script to see flush time with this new approach, result in ~15% faster for random data and no regression on asc/desc :)

Hmm it looks like random got a bit slower in candidate? Flush time ~550 ish ms in baseline and maybe ~650 ish ms in candidate? And it's not quite random right -- it's sort of a sawtooth between 0 and 14? Or am I reading the results backwards?

@gf2121
Copy link
Contributor Author

gf2121 commented Nov 21, 2023

Thanks for feedback @mikemccand !

Hmm it looks like random got a bit slower in candidate? Flush time ~550 ish ms in baseline and maybe ~650 ish ms in candidate?

Ohhh! I reconfirmed and it turns out i paste the benchmark result in wrong place, sorry!

And it's not quite random right -- it's sort of a sawtooth between 0 and 14? Or am I reading the results backwards?

Thanks for pointing out this, it is indeed not random enough. I changed the RANDOM to use java.util.Random and renamed the origin RANDOM to ROUND. The new random distribution get improved a bit more, 20+%.

Benchmark Detail

Baseline

Using order: RANDOM
DWPT 0 [2023-11-21T10:49:03.532762Z; main]: flush time 1138.599958 ms
DWPT 0 [2023-11-21T10:49:05.455861Z; main]: flush time 1059.658875 ms
DWPT 0 [2023-11-21T10:49:07.224901Z; main]: flush time 991.167625 ms
DWPT 0 [2023-11-21T10:49:08.847679Z; main]: flush time 850.281875 ms
DWPT 0 [2023-11-21T10:49:10.468672Z; main]: flush time 848.403625 ms
DWPT 0 [2023-11-21T10:49:12.104744Z; main]: flush time 861.536542 ms
DWPT 0 [2023-11-21T10:49:13.731466Z; main]: flush time 851.316958 ms
DWPT 0 [2023-11-21T10:49:15.350887Z; main]: flush time 847.584917 ms
DWPT 0 [2023-11-21T10:49:16.963837Z; main]: flush time 843.849042 ms
DWPT 0 [2023-11-21T10:49:18.579249Z; main]: flush time 843.092666 ms
Using order: ROUND
DWPT 1 [2023-11-21T10:49:20.051903Z; main]: flush time 648.484542 ms
DWPT 1 [2023-11-21T10:49:21.472366Z; main]: flush time 643.642417 ms
DWPT 1 [2023-11-21T10:49:22.889719Z; main]: flush time 644.994541 ms
DWPT 1 [2023-11-21T10:49:24.307484Z; main]: flush time 642.117958 ms
DWPT 1 [2023-11-21T10:49:25.727815Z; main]: flush time 642.38 ms
DWPT 1 [2023-11-21T10:49:27.143574Z; main]: flush time 639.769875 ms
DWPT 1 [2023-11-21T10:49:28.562387Z; main]: flush time 644.234375 ms
DWPT 1 [2023-11-21T10:49:29.975009Z; main]: flush time 639.01125 ms
DWPT 1 [2023-11-21T10:49:31.396969Z; main]: flush time 643.216 ms
DWPT 1 [2023-11-21T10:49:32.810467Z; main]: flush time 639.049041 ms
Using order: ASC
DWPT 2 [2023-11-21T10:49:34.100537Z; main]: flush time 473.33425 ms
DWPT 2 [2023-11-21T10:49:35.236826Z; main]: flush time 352.816167 ms
DWPT 2 [2023-11-21T10:49:36.312917Z; main]: flush time 293.915917 ms
DWPT 2 [2023-11-21T10:49:37.386792Z; main]: flush time 290.221458 ms
DWPT 2 [2023-11-21T10:49:38.463960Z; main]: flush time 287.046708 ms
DWPT 2 [2023-11-21T10:49:39.537561Z; main]: flush time 287.051709 ms
DWPT 2 [2023-11-21T10:49:40.610809Z; main]: flush time 287.296375 ms
DWPT 2 [2023-11-21T10:49:41.686863Z; main]: flush time 290.536083 ms
DWPT 2 [2023-11-21T10:49:42.751377Z; main]: flush time 289.183375 ms
DWPT 2 [2023-11-21T10:49:43.824249Z; main]: flush time 289.238584 ms
Using order: DESC
DWPT 3 [2023-11-21T10:49:45.039267Z; main]: flush time 394.276959 ms
DWPT 3 [2023-11-21T10:49:46.203835Z; main]: flush time 365.40575 ms
DWPT 3 [2023-11-21T10:49:47.359253Z; main]: flush time 364.55 ms
DWPT 3 [2023-11-21T10:49:48.548749Z; main]: flush time 385.198 ms
DWPT 3 [2023-11-21T10:49:49.715963Z; main]: flush time 366.247083 ms
DWPT 3 [2023-11-21T10:49:50.881628Z; main]: flush time 372.473333 ms
DWPT 3 [2023-11-21T10:49:52.037239Z; main]: flush time 367.666041 ms
DWPT 3 [2023-11-21T10:49:53.192338Z; main]: flush time 364.390834 ms
DWPT 3 [2023-11-21T10:49:54.346795Z; main]: flush time 367.417208 ms
DWPT 3 [2023-11-21T10:49:55.506692Z; main]: flush time 374.948625 ms

Candidate

Using order: RANDOM
DWPT 0 [2023-11-21T10:31:14.638348Z; main]: flush time 926.650958 ms
DWPT 0 [2023-11-21T10:31:16.527778Z; main]: flush time 983.61375 ms
DWPT 0 [2023-11-21T10:31:18.105650Z; main]: flush time 745.283416 ms
DWPT 0 [2023-11-21T10:31:19.545346Z; main]: flush time 614.212208 ms
DWPT 0 [2023-11-21T10:31:20.986866Z; main]: flush time 621.046833 ms
DWPT 0 [2023-11-21T10:31:22.418842Z; main]: flush time 613.169292 ms
DWPT 0 [2023-11-21T10:31:23.843488Z; main]: flush time 608.060375 ms
DWPT 0 [2023-11-21T10:31:25.289972Z; main]: flush time 633.770083 ms
DWPT 0 [2023-11-21T10:31:26.729025Z; main]: flush time 617.815 ms
DWPT 0 [2023-11-21T10:31:28.152042Z; main]: flush time 606.253292 ms
Using order: ROUND
DWPT 1 [2023-11-21T10:31:29.546556Z; main]: flush time 540.889709 ms
DWPT 1 [2023-11-21T10:31:30.891868Z; main]: flush time 534.34825 ms
DWPT 1 [2023-11-21T10:31:32.235487Z; main]: flush time 529.94025 ms
DWPT 1 [2023-11-21T10:31:33.585848Z; main]: flush time 538.600959 ms
DWPT 1 [2023-11-21T10:31:34.926304Z; main]: flush time 535.212458 ms
DWPT 1 [2023-11-21T10:31:36.261841Z; main]: flush time 529.868792 ms
DWPT 1 [2023-11-21T10:31:37.612535Z; main]: flush time 532.926375 ms
DWPT 1 [2023-11-21T10:31:38.950114Z; main]: flush time 531.968 ms
DWPT 1 [2023-11-21T10:31:40.283548Z; main]: flush time 529.449208 ms
DWPT 1 [2023-11-21T10:31:41.621569Z; main]: flush time 531.614458 ms
Using order: ASC
DWPT 2 [2023-11-21T10:31:42.931710Z; main]: flush time 466.0205 ms
DWPT 2 [2023-11-21T10:31:44.110242Z; main]: flush time 361.563833 ms
DWPT 2 [2023-11-21T10:31:45.270395Z; main]: flush time 344.598167 ms
DWPT 2 [2023-11-21T10:31:46.391066Z; main]: flush time 297.298416 ms
DWPT 2 [2023-11-21T10:31:47.508596Z; main]: flush time 292.465833 ms
DWPT 2 [2023-11-21T10:31:48.619912Z; main]: flush time 294.3465 ms
DWPT 2 [2023-11-21T10:31:49.733508Z; main]: flush time 294.211834 ms
DWPT 2 [2023-11-21T10:31:50.844318Z; main]: flush time 292.396292 ms
DWPT 2 [2023-11-21T10:31:51.957632Z; main]: flush time 294.951792 ms
DWPT 2 [2023-11-21T10:31:53.060245Z; main]: flush time 293.81875 ms
Using order: DESC
DWPT 3 [2023-11-21T10:31:54.309892Z; main]: flush time 397.02825 ms
DWPT 3 [2023-11-21T10:31:55.507951Z; main]: flush time 375.452125 ms
DWPT 3 [2023-11-21T10:31:56.705769Z; main]: flush time 379.94275 ms
DWPT 3 [2023-11-21T10:31:57.916353Z; main]: flush time 374.742583 ms
DWPT 3 [2023-11-21T10:31:59.098488Z; main]: flush time 370.185083 ms
DWPT 3 [2023-11-21T10:32:00.286668Z; main]: flush time 373.631208 ms
DWPT 3 [2023-11-21T10:32:01.479051Z; main]: flush time 369.689833 ms
DWPT 3 [2023-11-21T10:32:02.665413Z; main]: flush time 370.781 ms
DWPT 3 [2023-11-21T10:32:03.841312Z; main]: flush time 372.006916 ms
DWPT 3 [2023-11-21T10:32:05.019313Z; main]: flush time 374.449833 ms
Code
enum Order {
    RANDOM,
    ROUND,
    ASC,
    DESC;
  }

  public static void main(String[] args) throws IOException {
    Random random = new Random(4317849138248L);
    for (Order order : Order.values()) {
      System.out.println("Using order: " + order.name());
      Directory dir = FSDirectory.open(Paths.get("/tmp/a"));
      IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
      cfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
      cfg.setInfoStream(new PrintStreamInfoStream(System.out));
      cfg.setMaxBufferedDocs(1_000_000);
      cfg.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
      cfg.setIndexSort(
          new Sort(LongField.newSortField("sort_field", false, SortedNumericSelector.Type.MIN)));
      IndexWriter w = new IndexWriter(dir, cfg);
      Document doc = new Document();
      LongField sortField = new LongField("sort_field", 0);
      doc.add(sortField);
      TextField stringField1 = new TextField("string_field", "", Field.Store.NO);
      doc.add(stringField1);
      TextField stringField2 = new TextField("string_field", "", Field.Store.NO);
      doc.add(stringField2);
      TextField stringField3 = new TextField("string_field", "", Field.Store.NO);
      doc.add(stringField3);
      for (int i = 0; i < 10_000_000; ++i) {
        long sortValue =
            switch (order) {
              case RANDOM -> random.nextLong(15);
              case ROUND -> i % 15;
              case ASC -> i;
              case DESC -> -i;
            };
        sortField.setLongValue(sortValue);
        stringField1.setStringValue(Integer.toBinaryString(i % 10));
        stringField2.setStringValue(Integer.toBinaryString(i % 100));
        stringField3.setStringValue(Integer.toBinaryString(i % 1000));
        w.addDocument(doc);
      }
      w.flush();
      w.commit();
      w.close();
    }
  }

Copy link

github-actions bot commented Jan 8, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 8, 2024
@github-actions github-actions bot removed the Stale label Jan 9, 2024
Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants