Split results in 0 selected tests, which leads to failed GitHub action #95
Interesting! Probably not an open source repo, as you didn't include a link to a GHA run? If it's private, could you share the durations file? The test names can be anonymised. I'd mainly be interested in seeing what kind of duration values there are.
@jerry-git thanks for the quick reply. Indeed, the repo is not open source, so I can't share the GitHub Actions run. These were the different splits: And this was the durations file for that run: Most of the tests are fast, while there are only a few which take more than 5 seconds. Thank you!
Thanks for the data! I think I know what's up. There are a couple of tests with 10+ second durations, while around 500 of the 608 tests take 0.01 seconds or less. If the split into 7 groups were optimal, each group would take around 17.92 seconds to run, assuming we ran all the tests listed in tests_empty_split.txt. The duration based chunks algo basically adds tests to a single group until the optimal time is reached (17.92 seconds in this case), then moves on to fill the next group. The tests are looped over in the same order in which pytest collects them (alphabetical order AFAIK). Same as the code here: pytest-split/src/pytest_split/algorithms.py Lines 109 to 118 in c7a3272
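The greedy loop described above can be sketched roughly like this. This is a simplified reconstruction of the idea for illustration, not the actual pytest-split source; the function and variable names are my own:

```python
from typing import Dict, List


def duration_based_chunks(splits: int, items: List[str],
                          durations: Dict[str, float]) -> List[List[str]]:
    """Greedily fill each group until the per-group time budget is reached.

    Sketch of the duration-based-chunks idea: the budget is the ideal
    average group runtime, and tests are consumed in collection order.
    """
    time_budget = sum(durations[t] for t in items) / splits
    groups: List[List[str]] = [[] for _ in range(splits)]
    group_idx = 0
    group_time = 0.0
    for test in items:  # items arrive in pytest collection order
        if group_time >= time_budget and group_idx < splits - 1:
            group_idx += 1   # budget reached: start filling the next group
            group_time = 0.0
        groups[group_idx].append(test)
        group_time += durations[test]
    return groups


# One chunky test overshooting a group boundary is never compensated for,
# so the final group can end up empty -- the failure mode in this issue:
durations = {"a": 1.0, "b": 10.0, "c": 1.0, "d": 1.0}
print(duration_based_chunks(3, ["a", "b", "c", "d"], durations))
# the last of the three groups is left empty
```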
I believe those longer tests coincidentally happen to land at the end of their groups. For example, the very first group could have 472 shorter tests with a total execution time just a hair below 17.92 seconds; then a test which takes 10+ seconds comes along and still gets added to the group. The fact that the estimated runtime for that first group is notably longer than 17.92 seconds is not taken into account while filling the next groups. So, in the end, the last group is left with no tests at all. Here's a quickly hacked analysis of tests_empty_splits.txt:
If your test suite is robust enough to run the tests in semi-random order, the best would be to use . I'll see how I could improve the splits with the duration based chunks algo. Taking into account the estimated total runtime of the previous groups should be relatively low hanging fruit for improving the behaviour of the algorithm.
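One way that "low hanging fruit" could look is to recompute the per-group budget from the remaining time and the remaining groups whenever a group closes, so earlier overshoot shrinks the later budgets instead of being ignored. Again, this is only a sketch of the suggested direction, not code from pytest-split:

```python
from typing import Dict, List


def chunks_with_carryover(splits: int, items: List[str],
                          durations: Dict[str, float]) -> List[List[str]]:
    """Greedy chunking, but the budget adapts to earlier overshoot."""
    remaining_time = sum(durations[t] for t in items)
    remaining_groups = splits
    groups: List[List[str]] = [[] for _ in range(splits)]
    group_idx = 0
    group_time = 0.0
    budget = remaining_time / remaining_groups
    for test in items:
        if group_time >= budget and group_idx < splits - 1:
            # Close the group and recompute the budget from what is left,
            # so a group that overshot cannot starve the later groups.
            remaining_time -= group_time
            remaining_groups -= 1
            budget = remaining_time / remaining_groups
            group_idx += 1
            group_time = 0.0
        groups[group_idx].append(test)
        group_time += durations[test]
    return groups


# Same pathological input as before: with the carried-over budget,
# every group now receives at least one test.
durations = {"a": 1.0, "b": 10.0, "c": 1.0, "d": 1.0}
print(chunks_with_carryover(3, ["a", "b", "c", "d"], durations))
```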
Thanks for the support and the detailed explanation, Jerry. Indeed, our test suite can run in random order, as there are no dependencies between tests. We have changed the . Moving forward, I believe this issue can be closed, unless you want to keep it open to track any changes related to it. Once again, thank you.
Hi @jerry-git. I encountered a similar issue. Perhaps an algorithm like the following would help. Though mainly, the _get_minimum_split would be a more accurate way of determining the max duration for test suites with chunky tests.
Hi,

We have integrated pytest-split as part of our CI pipeline and noticed the following behaviour. We were running our tests in 7 splits, but after updating the .test_durations file (new tests being added), the 7th split did not select any test. This led to a failed CI run since all the tests were deselected (see below). Any idea why this could happen?

We tried to create more splits after this error was raised (increased from 7 to 10) and everything worked fine. We would like to understand why the 7th split did not select any test with --splits 7, but everything worked well with --splits 10.

Thanks in advance. Your package has become very useful in speeding up our CI/CD pipelines.