
Asynchronous, but still janking on large (>2k) lists #35

Open
krnlde opened this issue Dec 11, 2015 · 85 comments

@krnlde commented Dec 11, 2015

As mentioned in knockout/knockout/issues/1954, your fastForEach implementation shows recognizable improvements over the original ko foreach. But I still experience big janks of 1–2 s when iterating over a list with 5k entries.

See the Chrome Timeline capture here:
[Chrome Timeline screenshot]

The biggest part of the 2876 ms frame is the purple "Recalculate Style". I didn't dig too deep into the "Animation Frame Fired" JS part, but it doesn't feel like that is what's janking.

@brianmhunt (Owner)

Interesting, thanks @krnlde. I'd really like to fix that – it's precisely the kind of scenario I'd like FFE to be comfortable with.

Can you give any more info on what's happening in the recalculating part? Or perhaps share a jsFiddle/CodePen/etc?

@brianmhunt (Owner)

Incidentally, if you're using an <li> tag, does li { display: block; } help? Per http://stackoverflow.com/questions/19796057

@brianmhunt (Owner)

Incidentally, at one point I was adding/removing 100,000 items from lists, without much janking. I wonder if there's been a regression.

@krnlde (Author) commented Dec 11, 2015

It is exactly your example: http://output.jsbin.com/dakihezega/2

  • Open the site in Chrom[e|ium]
  • Open its dev tools
  • navigate to "Timeline"
  • Hit Record
  • check "use fastForEach"
  • Enter 5000 in the text box
  • Hit add items
  • Wait for ~5 seconds
  • Stop recording
  • Analyze

Regarding your hint with display: block: I would very much like to use fastForEach in SVG. SVG elements do have display: block, but it doesn't make any difference there.

@krnlde (Author) commented Dec 11, 2015

btw Firefox behaves exactly the same. Same jank time.

@brianmhunt (Owner)

Oh yes, I see. Thanks for clarifying. :)

So the good news is that the time is linear, i.e. inserting 500 items takes 150 ms and 5000 items takes 1500 ms. The problem, then, is not the insertion of items but the rendering of off-screen items (whose rendering needn't be immediate).

I'm mulling solutions along the lines of http://stackoverflow.com/a/12613687/19212 – basically delaying the rendering of items that are off-screen. My thinking is, right after insertion into the DOM:

// Per http://stackoverflow.com/a/7557433/19212
function isVisible(node) {
    var rect = node.getBoundingClientRect();
    return (
        rect.top >= 0 &&
        rect.left >= 0 &&
        rect.bottom <= window.innerHeight &&
        rect.right <= window.innerWidth
    );
}

childNodes.forEach(function (node) {
    if (isVisible(node)) {
        // leave it alone
    } else {
        // insert a vertical placeholder the same height as the node,
        // then delay insertion of the node for a few milliseconds
        // so we don't get jank.
    }
});

Does that strategy sound like it'd hit the mark?

@krnlde (Author) commented Dec 11, 2015

The idea of deferring the actual draw is exactly what I need. But how would you tell the browser not to render the node? By deferring the Node.appendChild()? My guess is, if you defer node insertions into time frames of ~25 ms each, it would kill the jank entirely – and as a result the screen would build up part by part. This could, however, lead to unnecessary repaints of surrounding (dependent) objects, but only if you are in the page flow and not positioning everything absolutely, which I won't do in an SVG environment anyway.

@brianmhunt (Owner)

Hmm. A simpler option might be to batch childNodes in chunks of e.g. 50.

In that case we're adding an O(n) splicing operation to the node list, but the splicing is probably far less overhead than the also-linear-time DOM node insertion.
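
A minimal sketch of that chunking idea (illustrative only; insertByChunks and CHUNK_SIZE are not FFE internals):

var CHUNK_SIZE = 50;

function insertByChunks(container, nodes) {
  if (nodes.length === 0) { return; }
  // O(n) splice, but cheap compared to the DOM insertion itself.
  var chunk = nodes.splice(0, CHUNK_SIZE);
  var frag = document.createDocumentFragment();
  for (var i = 0; i < chunk.length; i++) {
    frag.appendChild(chunk[i]);
  }
  container.appendChild(frag);
  // Defer the remaining chunks so the browser can paint in between.
  window.requestAnimationFrame(function () {
    insertByChunks(container, nodes);
  });
}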

@krnlde (Author) commented Dec 11, 2015

👍 SGTM

@brianmhunt (Owner)

Thanks for reporting this @krnlde – It's an interesting use case.

I'll mull for a bit on the best way to do this. @cervengoc Do you have any thoughts on this?

@krnlde (Author) commented Dec 11, 2015

My only objection: rather than using a fixed chunk size (like the 50 you mentioned) I would listen to the heartbeat of the browser: take var start = Date.now() before the operation, then check in every iteration whether Date.now() - start < 25 still holds, and call deferFollowingChunksToNextLoop() once it doesn't.
Browsers on faster CPUs can profit, while slower ones still don't jank.

@brianmhunt (Owner)

@krnlde I agree.

@cervengoc (Collaborator)

@brianmhunt I don't completely understand the use case and the problem on first read, and unfortunately I currently don't have time to dig deeper into it.

One note: chunking up the batch insertions shouldn't be hard to implement. But for the sake of simplicity I would stay with a built-in constant chunk size instead of measuring the timing, because the code that batches insertions runs far before the nodes are actually inserted. So I would probably limit batch insertions to, say, 300 – that should cover almost every usual scenario, I think.

Hope I didn't misunderstand anything or write something completely silly :)

@brianmhunt (Owner)

Thanks @cervengoc – appreciate the insight. :)

@krnlde (Author) commented Dec 11, 2015

I agree with the simplicity argument. But what if you have a costly DOM insertion which itself needs 300 ms? Then you would have 300 × 300 ms of blocking iterations. With the time measurement the browser would insert one node per loop and, yes, still jank – but not for a solid 90 seconds :-P

Don't get me wrong, I'm exaggerating the consequences of course. But the time measurement is just a few characters and one line more than checking whether the chunk size is reached. Plus, if your CPU is capable of rendering your full (say 10k) array in one 25 ms time frame, applying a chunk size regardless would prevent it from doing so and thus slow down the computation.

@cervengoc (Collaborator)

The cost of a DOM insertion can vary widely, even from data item to data item, based on data structure, bindings, etc. It's really hard to measure, I think.

In my personal opinion it should be very rare that someone has a template where insertion costs 300 ms. Actually, I wouldn't add the complexity to support that dynamically. Anyone with such a DOM structure could set an explicit chunk-size option.

@krnlde (Author) commented Dec 11, 2015

Not complicated at all, it's that easy:

var start = Date.now();
while (queue.length) {
  if (Date.now() - start < 25) appendNode(queue.shift());
  else return deferToNextEventLoop();
}

(pseudocode)
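
A runnable version of that pseudocode might look like this (queue, container, and the 25 ms budget are illustrative assumptions, not FFE internals):

var TIME_BUDGET_MS = 25;

function drainQueue(container, queue) {
  var start = Date.now();
  while (queue.length > 0) {
    if (Date.now() - start < TIME_BUDGET_MS) {
      // Still within this frame's budget: insert the next node.
      container.appendChild(queue.shift());
    } else {
      // Budget spent: yield so the browser can render, continue next frame.
      window.requestAnimationFrame(function () {
        drainQueue(container, queue);
      });
      return;
    }
  }
}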

@cervengoc (Collaborator)

Okay, but with batch insertion the inserted node is actually a documentFragment with several hundred list items. It's possible that I'm missing something trivial, but it doesn't seem trivial to me to implement.

If we didn't use the documentFragment it would be simpler, but that optimization is key in far more common scenarios IMHO.

@brianmhunt (Owner)

@krnlde It's not quite that simple, as @cervengoc points out.

I think it's doable, but it'll take some rather cerebral thinking. :)

@krnlde (Author) commented Dec 11, 2015

Touché. For the ko world it might be true. That's why I'm requesting help here :)

Thank you guys for your time and your help so far.

@cervengoc (Collaborator)

By the way @brianmhunt, how can it be that the browser still janks even when using a documentFragment? Isn't inserting a documentFragment kind of an atomic thing?

@brianmhunt (Owner)

@cervengoc It totally is atomic, but as you can see from the "Recalculate Style" in the timeline capture @krnlde posted above, it can be a very long atomic thing. :)
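
For context, a bare-bones illustration of the pattern under discussion (the list id is hypothetical): the insertion itself is a single operation, but the style recalculation it triggers scales with the number of new nodes.

var frag = document.createDocumentFragment();
for (var i = 0; i < 5000; i++) {
  var li = document.createElement('li');
  li.textContent = 'Item ' + i;
  frag.appendChild(li);
}
// One atomic insertion – followed by one very long "Recalculate Style".
document.getElementById('list').appendChild(frag);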

@cervengoc (Collaborator)

Still, I think the best way is to give the ability to set a chunk size with an option. I think it's rare that a data collection is so heterogeneous that one item takes 5 ms and another takes 100 ms.

The default could be no chunking at all, to keep the current performance, and developers could pass a desired value based on their knowledge of their data structure.

What do you think?

@krnlde (Author) commented Dec 11, 2015

it can be a very long atomic thing

lol. Which consequently can't be spread across multiple event loops. WebWorkers won't work here either, because they don't have access to the DOM AFAIK.

@krnlde (Author) commented Dec 11, 2015

The default could be to not use any chunking for keeping the performance, and developers could pass a desired value based on their knowledge about their data structure.

I can definitely commit to that.

@cervengoc (Collaborator)

Okay, so I'll start thinking about it in the background, but for time reasons don't expect a solution this year, at least not from me :)

@krnlde (Author) commented Dec 11, 2015

No probs. Contributions are always welcome, whenever they come :) Thank you!

@krnlde (Author) commented Dec 15, 2015

In the meantime, maybe I can have a look at this. Can you provide some details on how you would solve it around the whole insertion? Promises sound like a good bet here. After a short look it appears to me that the only function depending on the node insertion is the afterAdd method, which may or may not be provided through a spec. Am I right so far? That would mean we could insert the nodes completely asynchronously by first inserting the ko.virtualElements.insertAfter(containerNode, frag, insertAfterNode); (#L309) and then, step by step, appending frag.appendChild(nodeOrNodeArrayToInsert[i]); on top of it asynchronously (#L307). After that we call the afterAdd method of the spec, either through a Promise or directly callback-style.

Also, from what I can see, the afterQueueFlushed method is called after each cycle rather than at the end of the whole queue, which would inform listeners about a partially ready list instead of the whole list – maybe that was you guys' intention, so you can react to the nodes immediately. But then not only is the name "queue flushed" wrong, the latter use case is also missing.
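
A hedged sketch of the asynchronous insertion idea above (paraphrased, not actual FFE source). One caveat: a documentFragment empties itself once inserted, so the step-by-step appends would have to target the insertion point directly:

function insertNodesAsync(containerNode, insertAfterNode, nodesToInsert, afterAdd) {
  return new Promise(function (resolve) {
    (function step() {
      if (nodesToInsert.length === 0) {
        if (afterAdd) { afterAdd(); } // the spec callback discussed above
        return resolve();
      }
      var node = nodesToInsert.shift();
      ko.virtualElements.insertAfter(containerNode, node, insertAfterNode);
      insertAfterNode = node; // keep document order
      window.requestAnimationFrame(step);
    })();
  });
}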

@cervengoc (Collaborator)

@krnlde Thank you for your interest and efforts. However, before looking into this I would really like to see #34 merged, because I've made major changes to the internal logic which will surely affect a proposed implementation of this as well. @brianmhunt will probably merge it soon :)

@krnlde (Author) commented Dec 15, 2015

I'll stay tuned and subscribe to that issue :)

@krnlde (Author) commented Jan 8, 2016

I hate IE so much. ;)

@brianmhunt (Owner)

Yeah, if it works in Safari + Chrome + Firefox, then I'd be inclined to report the problem upstream... if there were an upstream to report it to. :)

@cervengoc (Collaborator)

I think nowadays one can almost always afford such minor IE issues. It's important to think about the "price/performance" of a feature/fix/workaround. In short, I wouldn't care about it too much if it works like this in the other major browsers.

@krnlde (Author) commented Jan 8, 2016

I'll test it in Safari as well. I think I'll be good to go in an hour.

@krnlde (Author) commented Jan 8, 2016

[Chrome Timeline screenshot]

I tested it again with around 2k dots. Right now I can only provide a Timeline from Chrome, but the same thing happens here: processQueue is called very often.

Here is a detail of a single rAF call:

[Timeline detail of a single rAF call]

What makes me wonder is that the frame is so long (2115 ms). I'm sure it is not time the browser spends janking; it is just the time it needed to compute the array (while staying responsive, thanks to rAF). The flame chart breaks down into really nice, short atomic instructions – so at least that is no concern.

@brianmhunt (Owner)

Weird. The rAF should be at most 1/60th of a second. The chain should look like this:

- processQueue
  - animateFrame
    - ...
    - insertAllAfterByChunks
      - animateFrame
        - insertAllAfterByChunks
          - animateFrame

In other words, there should be one, and only one, top-level processQueue for each arrayChange event.

Perhaps this is what is happening, but the presentation (for the rAF) is "up-ended" so that animateFrame appears to be the caller – when in reality it's processQueue. In other words, it visually looks like many calls to processQueue when in reality it is one call spread out over many frames.

What's the output when you do a console.log in processQueue? (i.e. to count the number of times it's called)
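
To count the calls, a quick diagnostic monkey-patch like this would do (assuming FastForEach.prototype.processQueue is reachable in the page):

var calls = 0;
var origProcessQueue = FastForEach.prototype.processQueue;
FastForEach.prototype.processQueue = function () {
  console.log('processQueue call #' + (++calls));
  return origProcessQueue.apply(this, arguments);
};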

@krnlde (Author) commented Jan 8, 2016

When I console.log inside processQueue, it logs exactly as often as the flame chart shows. oO

OK, but this could be due to the nature of my app, since I have other things going on and also nested foreachs.

@krnlde (Author) commented Jan 8, 2016

Alright, just clearing things up: when I also put a console.log inside FastForEach.prototype.insertAllAfterByChunks, I see exactly what should happen. As said, the number of processQueue calls is due to my particular setup. The pure dots are computed in exactly one processQueue with lots of insertAllAfterByChunks calls. It is just funny that I can't find that function in the flame chart; I only see FastForEach.prototype.added, which is as big as the processQueue.

@krnlde (Author) commented Jan 8, 2016

Found it – it's at the end of the FastForEach.prototype.added function:
[Flame chart screenshot]

@krnlde (Author) commented Jan 8, 2016

A test with ~12k dots scales the same way in Chrome: no jank, just a little more time to draw the dots. We (I) should definitely reduce the chunk size, because it only draws 1–2 frames per second.

Edge has bigger problems, however. Those frames I mentioned earlier seem to block the browser entirely – for a solid 5–6 s on ~12k dots. :-/

[Edge timeline screenshot]

But as @cervengoc said, it's just IE/Edge... I think we've made awesome progress on fast-foreach and I definitely appreciate the work you guys have done (fortunately not only for me 👍)! Thank you!

@cervengoc (Collaborator)

Hi, I've updated my fork to allow configuring the chunk size. I've also fixed a critical issue where the DOM node chunks got inserted in reverse order.

However, I'm still not sure whether we should allow chunk-size configuration at a per-binding level, or whether it's enough to test and find an optimal value for "everyone". I think this option is too "technical" to include as a binding option, but we'll see.

For now you can globally set FastForEach.DOM_INSERTION_CHUNKSIZE, which has a default value of 50.

Also, I've found some cases where the internal logic is affected and I have to think it over a few more times, so this is still far from stable and ready to be merged.
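
Usage would then presumably be a one-time global assignment (per the description above; the exact fork API may differ):

FastForEach.DOM_INSERTION_CHUNKSIZE = 100; // the fork's default is 50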

@krnlde (Author) commented Jan 11, 2016

The biggest concern would be a modification of the array while insertAllAfterByChunks is running. You should lock the array for that time and apply the modification as soon as the insertion is done.

@cervengoc (Collaborator)

Yes, something like that. Removes are not a problem, but any insert at an index which has a pending DOM insertion could break it all, at first glance.
The safest way would be to delay the processQueue call until the insertions complete. This could easily be implemented with an internal state flag: processQueue would delay itself until it has a "free" frame. That way we could handle every "conflict" scenario. @brianmhunt what do you think?
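
A minimal sketch of that flag-based delay (illustrative names, not the fork's actual code):

FastForEach.prototype.processQueue = function () {
  if (this.domInsertionInProgress) {
    // Chunked insertions are still running: retry on the next "free" frame.
    var self = this;
    window.requestAnimationFrame(function () { self.processQueue(); });
    return;
  }
  // ... normal queue processing; the chunked inserts would set
  // this.domInsertionInProgress = true until they finish ...
};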

@krnlde (Author) commented Apr 14, 2016

Hey guys! It's been a while since we talked, so I just wanted to bump this topic – all the precious work we've done gets lost if we don't merge it back into master.

How can I help?

@cervengoc (Collaborator)

@krnlde Unfortunately I'm completely drowning in my current work and don't have time for it now. Possibly in a few weeks it will get lighter.

As I wrote earlier, I think delaying processQueue would be a good solution for handling concurrent DOM inserts and array modifications.

Also, some cases would need to be covered by new tests, and I have a feeling that some of them will not be trivial.

@krnlde (Author) commented Apr 14, 2016

I could write some tests, I guess. Concurrent modifications are indeed a problem, but that should be testable too.

I don't feel comfortable enough to edit the plugin itself though, so unfortunately no library-specific help with processQueue from my side :-/

@cervengoc (Collaborator)

I can make that modification – the coding itself should not be a big deal – but I have no time to design and write tests for it. So if you can help with that, I would appreciate it. I will try to make these changes in the next few days.

@cervengoc (Collaborator) commented May 9, 2016

@krnlde @brianmhunt Sorry for the huge delay; I finally had some time to implement the aforementioned synchronization between queue processing and DOM manipulations (insertions for now). You can find it in my fork's latest code.

The implementation is simply based on a flag which is set until the chunked DOM insertions finish. If this flag is set, the processQueue method simply delays itself until the next frame, and so on.

Note that I unfortunately didn't have time to prepare tests for all of this stuff; it would be great if someone could join the party :)

@krnlde (Author) commented May 10, 2016

Hey @cervengoc, thank you for your implementation. Where should I post issues for this – here or in your repo?

@cervengoc (Collaborator)

I think we should move this topic there for clarity.

@cervengoc (Collaborator) commented May 11, 2016

@brianmhunt @krnlde Sorry guys, I messed some things up. I completely reverted the previous implementation and found another one, which you can find in my fork. I had to recreate the whole fork because I'm such a noob with git – I suffer a lot with more complex tasks.

The problem with the previous implementation was that it partitioned only the DOM additions of large node sets. However, in many cases the whole processing can be a lot of work (e.g. applying bindings to all children when bindings are complex, for example with nested fastForEach bindings, etc.).

So the new implementation partitions the whole processing. I'll write some comments on the commit.

@krnlde (Author) commented May 11, 2016

You were faster than me :) I'll hold back the bug report and test the new implementation. Thank you again!

@krnlde (Author) commented May 20, 2016

@cervengoc unfortunately I can't create issues in your fork, so this'll have to do: I created an isolated scenario with your new fastForEach implementation here: https://jsfiddle.net/krnlde/huhz0p9x/

Each block contains 3200 circles.

You can see in line 485 that I am timing the actual draw by starting the timer and stopping it on the (quasi) next event loop.

Now open the console in Chrome. When you hover over the gray blocks you'll see a lot of white circles appear, and as you mouse out they disappear. The process is still synchronous and blocks the browser for around 250 ms on average on my PC. Plus, in my real-world example the second mouseover on a block (after a fresh page reload) is incredibly slow to draw (around a factor of 10). After that slow draw everything goes back to normal. Could this have anything to do with the nested FFE?
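
The timing trick referenced above boils down to something like this (triggerRedraw is a placeholder for whatever kicks off the update, e.g. the observableArray mutation):

var start = performance.now();
triggerRedraw(); // placeholder for the actual update trigger
setTimeout(function () {
  // Runs on the (quasi) next event loop, after the synchronous work finishes.
  console.log('draw took ~' + (performance.now() - start).toFixed(1) + ' ms');
}, 0);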

Hope I could help a bit

@cervengoc (Collaborator)

@krnlde Thanks for the feedback. I'm sorry, I forgot to mention that the new implementation is an opt-in setting, which means that by default it doesn't use partitioning.

You have to set the partition option on the binding, like

data-bind="fastForEach: { data: items, partition: 50 }"

Also, note that measuring time can take some resources itself, so as a first step please test without it.

@krnlde (Author) commented Jun 9, 2016

This works really well! Thank you for this. The only issue right now is that in a nested fastForEach the partitions run in parallel. For example, if you have 10 rows and 20 cols and you iterate over them with fastForEach like so:

<!-- ko fastForEach: {data: rows, partition: 5} -->
<div class="row">
  <!-- ko fastForEach: {data: cols, partition: 5} -->
  <div class="col">
    ...
  </div>
  <!-- /ko -->
</div>
<!-- /ko -->

in the first iteration it'll draw 5 rows and 0 cols. In the second iteration it'll draw another 5 rows, but now it'll also draw 5 cols in each of the previously drawn rows, so that's 5 * 5 = 25 cols in the second iteration.

What I would expect is that every partition is globally queued and processed. Where should I put this issue, @cervengoc? Could you open the "Issues" tab in your forked FFE so we can write there? Also, I need some advice on how to write and execute the tests.

@krnlde (Author) commented Jun 17, 2016

Bump

@cervengoc (Collaborator)

@krnlde I'll try to enable Issues in my fork.

However, this nested behavior is IMO the expected one. I think it's a very rare real-world need to nest two fast-foreach instances that each have a defined partition size, because in every case I can think of you have one "dimension" which is "heavy" and the others are not.

Also, purely technically speaking, it would add a lot of complexity to introduce some kind of global queue handling, and IMO it's far from worth it. My current implementation is, I think, simple enough to find its way into the main fork.

@cervengoc (Collaborator)

Any news here? I had to restructure my fork, but the implementation is still there in its own branch.

@krnlde (Author) commented Aug 3, 2016

From my side everything is fine. I wrote up the things that bothered me; apart from that, only the tests still need to be written. I was asking how to do that – could you clarify?

@cervengoc (Collaborator)

OK. This is not easy to test, in my opinion. One option would be to fake the animateFrame function of fastForEach so that it provides some kind of callback where we can check after each cycle whether the partitioning really happens, e.g. that the appropriate number of child elements is present after each cycle.

What I did for the sake of a quick check was duplicate all the tests with a non-zero partition size to see if anything breaks. But I don't like this testing technique, so we'll have to prepare some proper tests.

For test running we use karma, and for the unit tests and assertions currently mocha and chai. All the tests are in one big js file in the spec folder. If you look at the existing tests, you'll get an idea of what's happening.

But anyway, when I have some time I can prepare those tests too. I just don't know when yet.
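
A rough mocha/chai sketch along those lines (the markup mirrors the partition example above; the timings and assertions are assumptions, not the real spec file):

it('renders all items when a partition size is set', function (done) {
  var target = document.createElement('div');
  target.setAttribute('data-bind', "fastForEach: { data: items, partition: 2 }");
  target.innerHTML = '<i data-bind="text: $data"></i>';
  document.body.appendChild(target);
  ko.applyBindings({ items: ['a', 'b', 'c', 'd', 'e'] }, target);
  // Right after applyBindings only part of the list should be rendered...
  chai.assert.isBelow(target.querySelectorAll('i').length, 5);
  // ...and after a few frames the whole list should be there.
  setTimeout(function () {
    chai.assert.equal(target.querySelectorAll('i').length, 5);
    document.body.removeChild(target);
    done();
  }, 100);
});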
