Improve the performance of certain GraphNode methods and avoid recalculating the AABB multiple times during rendering. #7245

querielo · 2025-01-03T12:17:42Z

I experimented with placing multiple animated skinned meshes in Playcanvas and Unity 6.1, which is going to support WebGPU to address huge performance issues with skinned meshes.

I understand that it's best to bake animations for skinned meshes instead of using the Anim component when dealing with high number of animated skinned meshes. But still...

The PR proposes replacing the recursive implementation of core GraphNode methods with an iterative approach and avoid recalculating the AABB multiple times during rendering to improve performance of skinned meshes.

Here are my results:

Main branch	PR

…e aabb a few times while rendering

Maksims · 2025-01-03T13:30:47Z

Please test the memory footprint and allocation/garbage collection difference in dynamic scenes with/without this PR. Glancing into the code, this PR will introduce a lot of allocations (arrays, pop, unshift, etc) into hot code (the code that might be running multiple times on every frame).

This might lead to a minor/major GC stalls.

querielo · 2025-01-03T13:44:21Z

@Maksims It is a good point, that's why _tmpGraphNodeStack was added. It doesn's looks like there is a big difference in terms of memory

Main	PR

src/scene/graph-node.js

willeastcott · 2025-01-03T13:48:09Z

src/scene/graph-node.js

@@ -98,6 +73,12 @@ function findNode(node, test) {
 * a powerful set of features that are leveraged by the `Entity` class.
 */
 class GraphNode extends EventHandler {
+    /**
+     * @type {GraphNode[]}
+     * @ignore


Should this be @private instead?

willeastcott · 2025-01-03T13:54:03Z

This is very cool. I was wondering...since this is such performance critical code, would it be more efficient to store the stack pointer (index)? Then, a pop is a decrement and a push is an array 'write at index'. And to start using the stack, just set the pointer number to 0. Obviously, Array#push and Array#pop are very optimized...so I don't know if this would really make a tangible difference.

willeastcott · 2025-01-03T16:37:41Z

src/scene/graph-node.js

@@ -594,11 +593,22 @@ class GraphNode extends EventHandler {
        const results = [];
        const test = createTest(attr, value);

-        this.forEach((node) => {
+        const stack = GraphNode._tmpGraphNodeStack;


Since forEach is now optimized to execute iteratively, why duplicate the iterative loop again here?

AlexAPPi · 2025-01-03T17:00:15Z

This approach eats up a lot of memory, maybe we should move away from storing references to objects in arrays on indexes?

about: https://www.mattzeunert.com/2018/01/25/v8-javascript-memory-quiz.html

jsbench: https://jsbench.me/cvm5h498pk/1

export interface Node {
    get parent(): Node;
    get children(): Node[];
}

export type SKIP_DIVE_OPERATION = 1;
export type ABORT_OPERATION = 2;
export type TECWDC<T> = (child: T, index: number, depth: number) => void | SKIP_DIVE_OPERATION | ABORT_OPERATION;

export function everyChildWithDive<T extends Node>(node: T, callback: TECWDC<T>, offset?: number): void | ABORT_OPERATION;
export function everyChildWithDive(node: Node, callback: TECWDC<Node>, offset: number = 0): void | ABORT_OPERATION {

    let children = node.children;
    let length   = children.length;

    if (offset < 0) {
        offset += length;
    }

    if (length < 1 || offset < 0) {
        return;
    }

    let parent = node;
    let stackIndex = 0;

    const stack = [offset];

    while (offset < length) {

        const child  = children[offset];
        const result = callback(child, offset, stackIndex);

        if (result === 2) {
            return 2;
        }

        offset++;

        if (result !== 1) {

            const childChildren = child.children;
            const childChildrenLength = childChildren.length;

            if (childChildrenLength > 0) {

                stack.push(0);
                stack[stackIndex++] = offset;

                offset   = 0;
                parent   = child;
                children = childChildren;
                length   = childChildrenLength;

                continue;
            }
        }
        
        while (offset === length && stackIndex > 0) {
            stack.pop();
            offset   = stack[--stackIndex];
            parent   = parent.parent;
            children = parent.children;
            length   = children.length;
        }
    }
}

querielo · 2025-01-03T23:21:58Z

@AlexAPPi I'll look at your suggestion tomorrow, ok?

Regarding my solution, as noted in the comment, I'm using shift() in the forEach method to implement a breadth-first approach (it is a queue instead of a stack):

const queue = [node];
while (queue.length > 0) {
    // shift() is used for breadth-first approach; use pop() for depth-first
    const current = queue.shift();
    
    // . . .
    
    const children = current.children;
    for (let i = 0; i < children.length; i++) {
        queue.push(children[i]);
    }
}

While breadth-first traversal tends to be slower and consume more memory than depth-first, it's necessary here to pass unit tests that expect a specific order. I'd prefer to use pop() for a depth-first approach, but this would require modifying some unit tests. I'm hesitant to make such changes, as they might disrupt components that depend on the current order.

Note: It appears that the third solution from https://jsbench.me/cvm5h498pk/1 has a bug (I'm not sure about purpose of the break statement).

I cannot run the fourth solution:

But as I can see the depth-first solution is faster than the second solution:

Thank you for this article!

@willeastcott can I change breadth-first to depth-first? And a unit test that requires breadth-first? I'll try to use an offset instead of shift() but it probably consumes more memory or implement a fast queue.

querielo · 2025-01-04T12:27:55Z

I'll try to implement a solution with a queue pool (breadth-first). It is 20% slower than using a pure offset, but requires less memory. For example, on my scene with 21_943 nodes a capacity of an auto extending circular queue is 4096.

… tests for them, use the classes in the forEach method of GraphNode

querielo · 2025-01-05T13:40:40Z

Update:

Introduce a new Queue class that implements a circular buffer, automatically resizing its capacity as needed.
- Added tests for Queue.
Add a free method to ObjectPool to allow deallocation of objects back into the pool.
- Refactor internal _resize logic in ObjectPool to track object indices.
- Added tests for ObjectPool.
Update GraphNode.forEach to use a pooled Queue for iteration, reducing memory allocations in breadth-first traversals.

Checked the method on a large-scale scene to confirm its performance and accuracy. The first run is slow (about 8ms), but other runs are fast.

Main	PR

willeastcott · 2025-01-05T20:26:17Z

Thanks @querielo - hopefully @mvaligursky can do a review tomorrow. 😄

Maksims · 2025-01-06T07:40:20Z

Changing the order of execution in public APIs definitely will break people's project, so we always avoid any breaking changes of public APIs.

Maksims · 2025-01-06T09:26:36Z

I went, and did various tests for specific mathods of a GraphNode: findByName (which uses findNode) and forEach.

And the performance execution of findByName and forEach is reduced in some cases, and is statistically insignificant in overall.

On my machine (windows, gaming laptop, chome), here are the stats:

Master Branch: 15,355ms
This PR: 15,341ms
Master Branch + Simplified findByName: 14,303ms (on that later).

Please run the test provided below.

Observations:

Performance of a very simple case of findByName - is reduced by ~60% (not good).
Performance of a very simple case of forEach - is reduced by ~50% (not good).
Performance of early hierarchy match - is dramatically reduced, it is now inverse with what it used to be when matching late. In complex hierarchies this will lead to a magnitude of longer lookup timings when match would be early.
Performance on complex hierarchies, does not change, but has a reversed cost between early/late matches.
forEach cost is downgraded also, between any complexity of the hierarchy.

The complexity introduced by this PR in certain methods is not justified by increased memory usage and lack of performance benefits.

It would be best to tackle AABB's issue in a different way, so it does not bring complexity to core methods of GraphNode, or at least it is significantly beneficial to it.

Also, there is a behaviour difference on forEach and potentially related methods, where modifications to immediate children during forEach execution with this PR will be different. As it collects an array of children before executing callback on them. So if you delete subsequent entities during forEach, with this PR it will call forEach for already removed entities.

Regarding findByName - it constructs a method to test for the property, this can be optimized, by not creating test function and avoiding extra executions. A simple change for that, makes findByName ~12% faster.

Here is the test:

import { deviceType, rootPath } from 'examples/utils';
import * as pc from 'playcanvas';

const canvas = /** @type {HTMLCanvasElement} */ (document.getElementById('application-canvas'));
window.focus();

const gfxOptions = {
    deviceTypes: [deviceType],
    glslangUrl: `${rootPath}/static/lib/glslang/glslang.js`,
    twgslUrl: `${rootPath}/static/lib/twgsl/twgsl.js`
};

const device = await pc.createGraphicsDevice(canvas, gfxOptions);
device.maxPixelRatio = Math.min(window.devicePixelRatio, 2);

const createOptions = new pc.AppOptions();
createOptions.graphicsDevice = device;

createOptions.componentSystems = [pc.RenderComponentSystem, pc.CameraComponentSystem, pc.LightComponentSystem];
createOptions.resourceHandlers = [pc.TextureHandler, pc.ContainerHandler];

const app = new pc.AppBase(canvas);
app.init(createOptions);
app.start();

// Set the canvas to fill the window and automatically change resolution to be the same as the canvas size
app.setCanvasFillMode(pc.FILLMODE_FILL_WINDOW);
app.setCanvasResolution(pc.RESOLUTION_AUTO);

// Ensure canvas is resized when window changes size
const resize = () => app.resizeCanvas();
window.addEventListener('resize', resize);
app.on('destroy', () => {
    window.removeEventListener('resize', resize);
});

// create box entity
const box = new pc.Entity('cube');
box.addComponent('render', {
    type: 'box'
});
app.root.addChild(box);

// create camera entity
const camera = new pc.Entity('camera');
camera.addComponent('camera', {
    clearColor: new pc.Color(0.5, 0.6, 0.9)
});
app.root.addChild(camera);
camera.setPosition(0, 0, 3);

// create directional light entity
const light = new pc.Entity('light');
light.addComponent('light');
app.root.addChild(light);
light.setEulerAngles(45, 0, 0);


const entitySingle = new pc.Entity('single');
app.root.addChild(entitySingle);

const entitySimple16 = new pc.Entity('simple-16');
for(let i = 0; i < 16; i++) {
    const child = new pc.Entity(`child-${i}`);
    entitySimple16.addChild(child);
}
app.root.addChild(entitySimple16);

const entityComplex = new pc.Entity('complex');
for(let a = 0; a < 32; a++) {
    const childA = new pc.Entity(`child-${a}`);
    entityComplex.addChild(childA);

    for(let b = 0; b < 16; b++) {
        const childB = new pc.Entity(`child-${a}-${b}`);
        childA.addChild(childB);

        for(let c = 0; c < 8; c++) {
            const childC = new pc.Entity(`child-${a}-${b}-${c}`);
            childB.addChild(childC);

            for(let d = 0; d < 4; d++) {
                const childD = new pc.Entity(`child-${a}-${b}-${c}-${d}`);
                childC.addChild(childD);
            }
        }
    }
}
app.root.addChild(entityComplex);


// rotate the box according to the delta time since the last frame
app.on('update', (/** @type {number} */ dt) => box.rotate(10 * dt, 20 * dt, 30 * dt));


const samplesSingle = 1024 * 1024;
const samplesSimple = 512 * 512;
const samplesComplex = 64 * 64;

let timingSingle = 0;
let timingSingleMiss = 0;
let timingSingleForEach = 0;

let timingSimpleEarly = 0;
let timingSimpleMid = 0;
let timingSimpleLate = 0;
let timingSimpleMiss = 0;
let timingSimpleForEach = 0;

let timingComplexEarly = 0;
let timingComplexDeepEarly = 0;
let timingComplexMid = 0;
let timingComplexDeepMid = 0;
let timingComplexLate = 0;
let timingComplexDeepLate = 0;
let timingComplexMiss = 0;
let timingComplexForEach = 0;

let timingTotal = 0;

// run test only at 10th frame
let frame = 10;
app.on('update', () => {
    if (--frame === 0) {
        // run test
        let time = 0;
        let timeStart = performance.now();

        
        // single
        time = performance.now();
        for(let i = 0; i < samplesSingle; i++) {
            entitySingle.findByName('single');
        }
        timingSingle += performance.now() - time;

        //      miss
        time = performance.now();
        for(let i = 0; i < samplesSingle; i++) {
            entitySingle.findByName('does-not-exist');
        }
        timingSingleMiss += performance.now() - time;

        //       for each
        time = performance.now();
        for(let i = 0; i < samplesSingle; i++) {
            entitySingle.forEach(() => { });
        }
        timingSingleForEach += performance.now() - time;


        // simple
        //      early
        time = performance.now();
        for(let i = 0; i < samplesSimple; i++) {
            entitySimple16.findByName('child-0');
        }
        timingSimpleEarly += performance.now() - time;

        //      mid
        time = performance.now();
        for(let i = 0; i < samplesSimple; i++) {
            entitySimple16.findByName('child-8');
        }
        timingSimpleMid += performance.now() - time;

        //      late
        time = performance.now();
        for(let i = 0; i < samplesSimple; i++) {
            entitySimple16.findByName('child-15');
        }
        timingSimpleLate += performance.now() - time;

        //      miss
        time = performance.now();
        for(let i = 0; i < samplesSimple; i++) {
            entitySimple16.findByName('does-not-exist');
        }
        timingSimpleMiss += performance.now() - time;

        //       for each
        time = performance.now();
        for(let i = 0; i < samplesSimple; i++) {
            entitySimple16.forEach(() => { });
        }
        timingSimpleForEach += performance.now() - time;


        // complex
        //      early
        time = performance.now();
        for(let i = 0; i < samplesComplex; i++) {
            entityComplex.findByName('child-0');
        }
        timingComplexEarly += performance.now() - time;

        //      deep early
        time = performance.now();
        for(let i = 0; i < samplesComplex; i++) {
            entityComplex.findByName('child-0-0-0-0');
        }
        timingComplexDeepEarly += performance.now() - time;
        
        //      mid
        time = performance.now();
        for(let i = 0; i < samplesComplex; i++) {
            entityComplex.findByName('child-16');
        }
        timingComplexMid += performance.now() - time;
        
        //      deep mid
        time = performance.now();
        for(let i = 0; i < samplesComplex; i++) {
            entityComplex.findByName('child-16-8-4-2');
        }
        timingComplexDeepMid += performance.now() - time;
        
        //      late
        time = performance.now();
        for(let i = 0; i < samplesComplex; i++) {
            entityComplex.findByName('child-31');
        }
        timingComplexLate += performance.now() - time;
        
        //      deep late
        time = performance.now();
        for(let i = 0; i < samplesComplex; i++) {
            entityComplex.findByName('child-31-15-7-3');
        }
        timingComplexDeepLate += performance.now() - time;
        
        //      deep late
        time = performance.now();
        for(let i = 0; i < samplesComplex; i++) {
            entityComplex.findByName('does-not-exist');
        }
        timingComplexMiss += performance.now() - time;

        //       for each
        time = performance.now();
        for(let i = 0; i < samplesComplex; i++) {
            entityComplex.forEach(() => { });
        }
        timingComplexForEach += performance.now() - time;

        timingTotal = performance.now() - timeStart;


        console.log('samples single', samplesSingle);
        console.log('timingSingle total', Math.round(timingSingle + timingSingleMiss + timingSingleForEach));
        console.log('timingSingle', Math.round(timingSingle));
        console.log('timingSingleMiss', Math.round(timingSingleMiss));
        console.log('timingSingleForEach', Math.round(timingSingleForEach));

        console.log('samples simple', samplesSimple);
        console.log('timingSimple total', Math.round(timingSimpleEarly + timingSimpleMid + timingSimpleLate + timingSimpleMiss + timingSimpleForEach));
        console.log('timingSimpleEarly', Math.round(timingSimpleEarly));
        console.log('timingSimpleMid', Math.round(timingSimpleMid));
        console.log('timingSimpleLate', Math.round(timingSimpleLate));
        console.log('timingSimpleMiss', Math.round(timingSimpleMiss));
        console.log('timingSimpleForEach', Math.round(timingSimpleForEach));

        console.log('samples complex', samplesComplex);
        console.log('timingComplex total', Math.round(timingComplexEarly + timingComplexDeepEarly + timingComplexMid + timingComplexDeepMid + timingComplexLate + timingComplexDeepLate + timingComplexMiss + timingComplexForEach));
        console.log('timingComplexEarly', Math.round(timingComplexEarly));
        console.log('timingComplexDeepEarly', Math.round(timingComplexDeepEarly));
        console.log('timingComplexMid', Math.round(timingComplexMid));
        console.log('timingComplexDeepMid', Math.round(timingComplexDeepMid));
        console.log('timingComplexLate', Math.round(timingComplexLate));
        console.log('timingComplexDeepLate', Math.round(timingComplexDeepLate));
        console.log('timingComplexMiss', Math.round(timingComplexMiss));
        console.log('timingComplexForEach', Math.round(timingComplexForEach));

        console.log('total', Math.round(timingTotal));
    }
});

export { app };

querielo · 2025-01-06T10:31:26Z

@Maksims Yes, I moved the PR to draft because there are several scenarios where we see performance drops. In some synthetic tests, there’s a slowdown, while in others there’s an improvement. I’m still investigating and looking for ways to fix it. I checked the performance in real PlayCanvas projects --- most of them show gains, but some do experience a slowdown and I understand the reasons behind it. I’m looking into possible solutions now.

I can separate the AABB and GraphNode changes into a separate PR.

willeastcott · 2025-01-06T10:33:18Z

@querielo It's definitely a great move to isolate incremental improvements into different PRs, yeah.

Maksims · 2025-01-06T10:54:06Z

I can separate the AABB and GraphNode changes into a separate PR.

Is it possible to keep core GraphNode functions as they are, as they are extremely simple and straight forward currently.
And implement a specific AABB's optimisations?

Also, it is not clear from PR start what is a current AABB's bottleneck, and what is optimized specifically.
Is it unnecessary AABB's recalculations of non-dirty AABB's, or what exactly is optimized here?

Please provide tests and numbers for isolated parts (AABB's only), and some numbers related to them, for example if there are unnecessary AABB's calculations, is it possible to show the number of them, and how many are avoided with this optimisation. And similar numbers, to understand the problem better.

If AABB's recalculations require a different way to iterate through hierarchy, then instead of changing existing iteration methods, introduce a specific one for optimized functionality. Also it should be probably in a specific Component, not GraphNode - as it is a very generic an slim class that knows nothing about rendering.

willeastcott · 2025-01-06T10:58:06Z

Is it possible to keep core GraphNode functions as they are, as they are extremely simple and straight forward currently.

That depends on whether a meaningful speed up is still achievable. 😄

mvaligursky · 2025-01-06T11:06:09Z

My gut feeling when I had a quick look at this during holidays was that the aabb calculations could be a nice win, but I wasn't sold on big wins based on recursive vs non-recursive walk over the hierarchy. I agree these two are separate optimisations and need separate PRs.

Based on one screenshot, I think you're testing this on a large number of characters. By default, characters have a high cost update of aabb, as we make sure it's precise, and the aabbs of bones are transformed by their matrices.
Typically, to optimize this out completely, this should be used: https://api.playcanvas.com/classes/Engine.RenderComponent.html#customaabb
This basically allows users to set a fixed aabb in object space, avoiding the engine calculating it from animated bones each frame. It still gets converted to world space as needed.

This was originally added to model component here, but render component supports it as well, which was added after this.

willeastcott · 2025-01-16T15:30:56Z

@querielo Any progress with the PRs you were planning?

MAG-AdrianMeredith · 2025-01-17T15:52:08Z

We got around these issues by using lods and a deferred execution system to reduce the framerate of lower quality lods which mostly solved the performance issue. Is it possible to have a fully gpu skinning? Whatabout skinning and instancing?
In my experience, microbenchmarking is totally useless and can lead you down a wrong path

mvaligursky · 2025-01-17T16:05:37Z

Is it possible to have a fully gpu skinning?

In general yes: https://stoyan3d.wordpress.com/2021/07/23/vertex-animation-texture-vat/
But we don't have this implemented. I've used it in the past and it's crazy fast.

Whatabout skinning and instancing?

Doable but not done. Are you trying to save draw calls?

querielo · 2025-01-18T13:32:54Z

@willeastcott Unfortunately, I had not enough time. Still WIP

I experiment with this benchmark. My best result is here:

Test Case	Main (ops/sec ± %)	Research result WIP [not in this PR] (ops/sec ± %)
complex-16: findByName (p0)	41,631,363 ±17.40% (84 runs sampled)	82,724,354 ±10.49% (85 runs sampled)
complex-16: findByName (p5)	17,509 ±0.45% (98 runs sampled)	38,081 ±1.38% (92 runs sampled)
complex-16: findByName (p10)	8,430 ±1.46% (88 runs sampled)	21,226 ±0.93% (92 runs sampled)
complex-16: findByName (p25)	1,865 ±1.77% (89 runs sampled)	8,564 ±0.72% (96 runs sampled)
complex-16: findByName (p50)	607 ±1.43% (90 runs sampled)	1,787 ±3.40% (89 runs sampled)
complex-16: findByName (p75)	387 ±1.52% (90 runs sampled)	787 ±1.87% (88 runs sampled)
complex-16: findByName (p90)	276 ±2.44% (83 runs sampled)	636 ±2.01% (93 runs sampled)
complex-16: findByName (p95)	230 ±8.46% (78 runs sampled)	685 ±1.17% (95 runs sampled)
complex-16: findByName (p100)	225 ±1.55% (79 runs sampled)	568 ±1.32% (94 runs sampled)
complex-16: forEach - count "child"	422 ±1.91% (90 runs sampled)	3,062 ±1.37% (98 runs sampled)
deep-narrow: findByName (p0)	51,111,560 ±0.59% (100 runs sampled)	75,038,398 ±0.72% (99 runs sampled)
deep-narrow: findByName (p5)	15,686,485 ±1.24% (99 runs sampled)	17,354,031 ±0.55% (96 runs sampled)
deep-narrow: findByName (p10)	7,481,730 ±0.77% (99 runs sampled)	7,871,519 ±0.34% (99 runs sampled)
deep-narrow: findByName (p25)	4,719,099 ±0.30% (98 runs sampled)	5,284,981 ±0.44% (98 runs sampled)
deep-narrow: findByName (p50)	1,830,604 ±0.29% (97 runs sampled)	1,719,027 ±4.15% (94 runs sampled)
deep-narrow: findByName (p75)	1,237,125 ±0.41% (95 runs sampled)	1,259,369 ±0.87% (93 runs sampled)
deep-narrow: findByName (p90)	987,077 ±0.80% (100 runs sampled)	1,003,850 ±0.39% (98 runs sampled)
deep-narrow: findByName (p95)	879,433 ±4.00% (98 runs sampled)	925,820 ±0.84% (95 runs sampled)
deep-narrow: findByName (p100)	828,371 ±3.59% (95 runs sampled)	869,562 ±0.43% (98 runs sampled)
deep-narrow: forEach - count "child"	1,410,805 ±0.47% (97 runs sampled)	1,260,831 ±0.95% (97 runs sampled)
wide-shallow: findByName (p0)	44,312,496 ±3.26% (93 runs sampled)	68,127,657 ±1.31% (89 runs sampled)
wide-shallow: findByName (p5)	11,677,169 ±0.63% (96 runs sampled)	11,708,320 ±0.54% (95 runs sampled)
wide-shallow: findByName (p10)	7,529,309 ±0.34% (98 runs sampled)	8,772,932 ±0.35% (101 runs sampled)
wide-shallow: findByName (p25)	2,439,713 ±0.28% (100 runs sampled)	2,325,104 ±0.96% (96 runs sampled)
wide-shallow: findByName (p50)	1,198,470 ±0.42% (96 runs sampled)	1,059,050 ±0.31% (100 runs sampled)
wide-shallow: findByName (p75)	782,697 ±0.23% (99 runs sampled)	703,360 ±2.61% (96 runs sampled)
wide-shallow: findByName (p90)	856,695 ±0.80% (97 runs sampled)	936,670 ±0.80% (93 runs sampled)
wide-shallow: findByName (p95)	816,399 ±0.44% (93 runs sampled)	849,755 ±0.31% (101 runs sampled)
wide-shallow: findByName (p100)	750,973 ±0.47% (99 runs sampled)	732,667 ±2.51% (94 runs sampled)
wide-shallow: forEach - count "child"	868,954 ±2.90% (94 runs sampled)	817,848 ±0.74% (98 runs sampled)
ultra-deep: findByName (p0)	44,595,915 ±1.08% (96 runs sampled)	84,905,349 ±1.67% (95 runs sampled)
ultra-deep: findByName (p5)	148,084 ±0.30% (98 runs sampled)	150,822 ±0.67% (97 runs sampled)
ultra-deep: findByName (p10)	58,782 ±0.40% (96 runs sampled)	64,269 ±0.22% (100 runs sampled)
ultra-deep: findByName (p25)	30,092 ±0.40% (96 runs sampled)	32,706 ±0.40% (99 runs sampled)
ultra-deep: findByName (p50)	17,164 ±0.24% (100 runs sampled)	17,338 ±0.41% (99 runs sampled)
ultra-deep: findByName (p75)	11,885 ±0.56% (99 runs sampled)	11,500 ±4.24% (95 runs sampled)
ultra-deep: findByName (p90)	10,111 ±0.51% (99 runs sampled)	9,268 ±4.93% (94 runs sampled)
ultra-deep: findByName (p95)	12,231 ±0.47% (100 runs sampled)	12,603 ±2.66% (85 runs sampled)
ultra-deep: findByName (p100)	13,294 ±1.06% (98 runs sampled)	14,171 ±3.56% (94 runs sampled)
ultra-deep: forEach - count "child"	15,832 ±0.54% (100 runs sampled)	14,877 ±2.46% (99 runs sampled)

The test cases measure the performance of different hierarchical structures and operations on them.

complex-16: A mixed-depth structure with levels of varying branching factors. Simulates realistic complex structures, such as file systems or dependency trees.
deep-narrow: A tall and thin structure, where depth is emphasized. Useful for testing operations where recursion or depth-first search could be expensive.
wide-shallow: A flat and wide structure, emphasizing breadth. Tests operations that deal with many siblings in a single level.
ultra-deep: An extremely deep structure, testing the limits of recursion or stack depth in operations.

p* represents the node at *% of the total hierarchy traversal order, calculated after converting the tree into a linear array.
For example, p10, p25, represent nodes at 10%, 25%, etc., of the array, providing a way to test performance across different levels or positions in the tree.

Result: my WIP code solves nearly all of @Maksims' concerns. However, the current implementation (it is not in this PR) is somewhat unrefined and requires further improvement before publishing. Also, it's essential to conduct additional testing to enhance code quality and reliability and expand the suite of unit tests. It is interesting that could'n

I believe I should close the pull request until it's ready. :-(

But I added this PR about AABB.

Improve performance of some GraphNode's methods, and don't recalculat…

edcca59

…e aabb a few times while rendering

vercel bot deployed to Preview January 3, 2025 12:19 View deployment

querielo added 2 commits January 3, 2025 13:24

Fix linter comment

c85208b

Merge branch 'main' into kirill/graph-node

8410c48

vercel bot deployed to Preview January 3, 2025 12:28 View deployment

Minor fix

00dce36

vercel bot deployed to Preview January 3, 2025 13:35 View deployment

Linter comment

d96f15e

willeastcott reviewed Jan 3, 2025

View reviewed changes

src/scene/graph-node.js Outdated Show resolved Hide resolved

vercel bot deployed to Preview January 3, 2025 13:46 View deployment

Linter comment

484dba6

willeastcott reviewed Jan 3, 2025

View reviewed changes

vercel bot deployed to Preview January 3, 2025 13:49 View deployment

willeastcott added the performance Relating to load times or frame rate label Jan 3, 2025

willeastcott reviewed Jan 3, 2025

View reviewed changes

querielo added 2 commits January 3, 2025 18:12

Minor fix: playcanvas#7245 (comment)

7c18f99

Minor fix: playcanvas#7245 (comment)

6beff46

vercel bot deployed to Preview January 3, 2025 17:14 View deployment

Add the Queue class, add the free method to the ObjectPool class, add…

352be44

… tests for them, use the classes in the forEach method of GraphNode

vercel bot deployed to Preview January 5, 2025 13:26 View deployment

Fix linter comment

9e12552

vercel bot deployed to Preview January 5, 2025 13:35 View deployment

willeastcott requested a review from mvaligursky January 5, 2025 20:21

Update Queue. Use offsets

776dfa2

vercel bot deployed to Preview January 6, 2025 03:39 View deployment

Minor fix

c38b981

vercel bot deployed to Preview January 6, 2025 03:59 View deployment

Minor fix

7f66759

vercel bot deployed to Preview January 6, 2025 04:34 View deployment

querielo marked this pull request as draft January 6, 2025 05:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve the performance of certain GraphNode methods and avoid recalculating the AABB multiple times during rendering. #7245

Improve the performance of certain GraphNode methods and avoid recalculating the AABB multiple times during rendering. #7245

querielo commented Jan 3, 2025 •

edited

Loading

Maksims commented Jan 3, 2025 •

edited

Loading

querielo commented Jan 3, 2025

willeastcott Jan 3, 2025

willeastcott commented Jan 3, 2025 •

edited

Loading

willeastcott Jan 3, 2025

AlexAPPi commented Jan 3, 2025 •

edited

Loading

querielo commented Jan 3, 2025 •

edited

Loading

querielo commented Jan 4, 2025 •

edited

Loading

querielo commented Jan 5, 2025 •

edited

Loading

willeastcott commented Jan 5, 2025

Maksims commented Jan 6, 2025

Maksims commented Jan 6, 2025 •

edited

Loading

querielo commented Jan 6, 2025

willeastcott commented Jan 6, 2025 •

edited

Loading

Maksims commented Jan 6, 2025 •

edited

Loading

willeastcott commented Jan 6, 2025

mvaligursky commented Jan 6, 2025 •

edited

Loading

willeastcott commented Jan 16, 2025

MAG-AdrianMeredith commented Jan 17, 2025 •

edited

Loading

mvaligursky commented Jan 17, 2025 •

edited

Loading

querielo commented Jan 18, 2025 •

edited

Loading

Improve the performance of certain GraphNode methods and avoid recalculating the AABB multiple times during rendering. #7245

Are you sure you want to change the base?

Improve the performance of certain GraphNode methods and avoid recalculating the AABB multiple times during rendering. #7245

Conversation

querielo commented Jan 3, 2025 • edited Loading

Maksims commented Jan 3, 2025 • edited Loading

querielo commented Jan 3, 2025

willeastcott Jan 3, 2025

Choose a reason for hiding this comment

willeastcott commented Jan 3, 2025 • edited Loading

willeastcott Jan 3, 2025

Choose a reason for hiding this comment

AlexAPPi commented Jan 3, 2025 • edited Loading

querielo commented Jan 3, 2025 • edited Loading

querielo commented Jan 4, 2025 • edited Loading

querielo commented Jan 5, 2025 • edited Loading

willeastcott commented Jan 5, 2025

Maksims commented Jan 6, 2025

Maksims commented Jan 6, 2025 • edited Loading

And the performance execution of findByName and forEach is reduced in some cases, and is statistically insignificant in overall.

Observations:

The complexity introduced by this PR in certain methods is not justified by increased memory usage and lack of performance benefits.

querielo commented Jan 6, 2025

willeastcott commented Jan 6, 2025 • edited Loading

Maksims commented Jan 6, 2025 • edited Loading

willeastcott commented Jan 6, 2025

mvaligursky commented Jan 6, 2025 • edited Loading

willeastcott commented Jan 16, 2025

MAG-AdrianMeredith commented Jan 17, 2025 • edited Loading

mvaligursky commented Jan 17, 2025 • edited Loading

querielo commented Jan 18, 2025 • edited Loading

querielo commented Jan 3, 2025 •

edited

Loading

Maksims commented Jan 3, 2025 •

edited

Loading

willeastcott commented Jan 3, 2025 •

edited

Loading

AlexAPPi commented Jan 3, 2025 •

edited

Loading

querielo commented Jan 3, 2025 •

edited

Loading

querielo commented Jan 4, 2025 •

edited

Loading

querielo commented Jan 5, 2025 •

edited

Loading

Maksims commented Jan 6, 2025 •

edited

Loading

willeastcott commented Jan 6, 2025 •

edited

Loading

Maksims commented Jan 6, 2025 •

edited

Loading

mvaligursky commented Jan 6, 2025 •

edited

Loading

MAG-AdrianMeredith commented Jan 17, 2025 •

edited

Loading

mvaligursky commented Jan 17, 2025 •

edited

Loading

querielo commented Jan 18, 2025 •

edited

Loading