How to use --store-durations in GitHub Actions? #20
Comments
That's exactly what I ended up doing 😄 Here's a functional example (what I use):

```yaml
test:
  ...
  steps:
    ...
    - run: pip install awscli
    - name: Download test-durations
      run: aws s3 cp s3://<bucket>/.test_durations .test_durations
      env:
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    - name: Run tests
      run: pytest --splits 8 --group ${{ matrix.group }} --store-durations
    - name: Upload partial durations
      uses: actions/upload-artifact@v1
      with:
        name: split-${{ matrix.group }}
        path: .test_durations

upload-timings:
  needs: test
  steps:
    - uses: actions/checkout@v2
    - run: pip install awscli
    - name: Download artifacts
      uses: actions/download-artifact@v2
    # This will generate a single .test_durations file in the root dir
    - name: Combine test-durations
      run: python .github/scripts/combine_dict.py 8
    - name: Upload test-durations
      run: |
        aws s3 cp .test_durations s3://<bucket>/.test_durations
      env:
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```

The combine-dict script is just a simple dict merge:

```python
if __name__ == '__main__':
    import json
    import sys

    splits = int(sys.argv[1])
    x = {}
    # Each downloaded artifact lands in its own split-<N>/ directory;
    # merge every partial durations file into a single dict.
    for split in range(1, splits + 1):
        with open(f'split-{split}/.test_durations') as f:
            x.update(json.load(f))
    with open('.test_durations', 'w') as f:
        json.dump(x, f)
```

Maybe you could try it out and add some documentation if you get it to work? 🙂 |
Just as a follow-up, I ended up using S3 since that was already available to me, but if you manage to get this to work with the cache itself that would be even better 👍 |
Thanks @sondrelg, good to know my suggestion seems to work and you did all the hard work already 🙂 |
Yeah you're probably right 🤔 I hadn't considered that tbh. Could we maybe pass the |
I also recognised the problem that you are flagging here. The downside of this approach is that the old values get a large weight. Maybe a solution could be to change the meaning of the I would actually really like a feature like this, and when this gets introduced also immediately add the |
Could the same thing maybe be achieved by specifying which file to read from, and which to write to, @mbkroese? Since we only write durations for the tests run, the output file (if you're not writing back to the same file as you read from) would contain only the new durations, right 🙂 I definitely wouldn't mind built-in combine-duration logic 👏 |
@sondrelg I think that could work - right now we update the data that we read before and write the result, but it could be changed to just create a new dictionary which we can write to the write location. However, I think this should be considered a breaking change, because in your scheme, if the read and write location are the same, we would overwrite our current durations with those of the group. Whereas in the current situation (where read and write location are already the same) we would just update the durations of tests we ran. |
Isn't this the same thing? Can you elaborate a little bit more on what's different in this case? 🙂 |
Suppose currently we have these durations:
And we run one group that only executes test 'a' with new duration 5, then (in the current implementation) we'll write:
Whereas in your proposal we'd be writing:
right? |
If we said …
But if we say …, then …
And we would write … So the current implementation would be the equivalent of specifying the same file for input and output, while breaking them up into two options (I think) should give us the flexibility of writing partial outputs. Does that make sense or am I missing something? 🙂 |
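(For concreteness, here is a minimal sketch of the two behaviours being compared; the test names and numbers below are assumed for illustration, not values from this thread:)

```python
# Illustrative only: an assumed starting durations file.
stored = {"test_a": 1.0, "test_b": 2.0}

# One group runs only test_a and measures a new duration of 5.0.
measured = {"test_a": 5.0}

# Current behaviour (read and write the same file): update the stored durations in place.
updated_in_place = {**stored, **measured}  # {"test_a": 5.0, "test_b": 2.0}

# Proposed behaviour with separate read/write locations: write only what this group measured.
partial_output = dict(measured)  # {"test_a": 5.0}

print(updated_in_place)
print(partial_output)
```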
I see what you mean now. I assumed you always just wanted to write the durations of tests we ran. But instead you want to update the durations at the write location of the tests we ran. Makes sense 👍 I think both proposed solutions are valid. |
If … @michamos or @jerry-git, do you have any pros/cons/preferences? |
Actually I managed to make it work correctly afaict with the current implementation, by modifying @sondrelg's combining script to also take into account the previous version of the test durations and only update the changed values. I also managed to use the GitHub cache to store the test durations across runs. This is the (simplified) workflow I use:

```yaml
test:
  # additional config, like matrix, omitted here
  steps:
    # test setup omitted here
    - name: Get durations from cache
      uses: actions/cache@v2
      with:
        path: test_durations
        # the key must never match, even when restarting workflows, as that
        # will cause durations to get out of sync between groups; the
        # combined durations will be loaded if available
        key: test-durations-split-${{ github.run_id }}-${{ github.run_number }}-${{ matrix.group }}
        restore-keys: |
          test-durations-combined-${{ github.sha }}
          test-durations-combined
    - name: Run tests
      run: pytest --splits 6 --group ${{ matrix.group }} --store-durations
    - name: Upload partial durations
      uses: actions/upload-artifact@v2
      with:
        name: split-${{ matrix.group }}
        path: .test_durations

update_durations:
  name: Combine and update integration test durations
  runs-on: ubuntu-latest
  needs: test
  steps:
    - name: Checkout
      uses: actions/checkout@v2
    - name: Get durations from cache
      uses: actions/cache@v2
      with:
        path: .test_durations
        # key won't match during the first run for the given commit, but
        # restore-key will if there's a previous stored durations file,
        # so the cache will both be loaded and stored
        key: test-durations-combined-${{ github.sha }}
        restore-keys: test-durations-combined
    - name: Download artifacts
      uses: actions/download-artifact@v2
    - name: Combine test durations
      uses: ./.github/actions/combine-durations
      with:
        split-prefix: split-
```

The tricky point when using the … For clarity, I've split …:

```yaml
name: Combine durations
description: Combine pytest-split durations from multiple groups
inputs:
  durations-path:
    description: The path to the durations file (must match the `--durations-path` arg to pytest)
    required: false
    default: .test_durations
  split-prefix:
    description: The path to the split durations (must match the artifacts name)
    required: true
runs:
  using: composite
  steps:
    - name: Combine durations
      shell: bash
      run: >
        python3 $GITHUB_ACTION_PATH/combine_durations.py ${{ inputs.split-prefix }} ${{ inputs.durations-path }}
```

and the combine_durations.py:

```python
import json
import sys
from pathlib import Path

split_prefix = sys.argv[1]
durations_path = Path(sys.argv[2])
split_paths = Path(".").glob(f"{split_prefix}*/{durations_path.name}")

# Load the previously combined durations, if any.
try:
    previous_durations = json.loads(durations_path.read_text())
except FileNotFoundError:
    previous_durations = {}

# Only pick up durations that actually changed in this run.
new_durations = previous_durations.copy()
for path in split_paths:
    durations = json.loads(path.read_text())
    new_durations.update(
        {
            name: duration
            for (name, duration) in durations.items()
            if previous_durations.get(name) != duration
        }
    )
durations_path.parent.mkdir(parents=True, exist_ok=True)
durations_path.write_text(json.dumps(new_durations))
```

It would be good if (a less quick-and-dirty version of) this script became part of pytest-split. |
@jerry-git what do you think about adding a command to combine outputs directly to
the |
Sounds great 👍
Does this mean that one would do something like |
That's what I had in mind, yes. Do you have another suggestion? |
If you set up the functionality with argparse, you can use the Poetry scripts feature to make the command runnable with any command you want 🙂 https://python-poetry.org/docs/pyproject/#scripts |
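As a rough sketch of that suggestion, the snippet below shows what an argparse-based combine entry point might look like. The function name, the flags, and the `[tool.poetry.scripts]` target mentioned in the comments (e.g. `combine-durations = "pytest_split.cli:combine"`) are hypothetical and not part of pytest-split's actual API.

```python
# Hypothetical sketch only, not pytest-split's real CLI. With Poetry, a
# [tool.poetry.scripts] entry such as
#   combine-durations = "pytest_split.cli:combine"
# would expose combine() as a console command.
import argparse
import json
from pathlib import Path


def combine() -> None:
    parser = argparse.ArgumentParser(description="Merge partial pytest-split duration files")
    parser.add_argument("splits", nargs="+", help="partial .test_durations files to merge")
    parser.add_argument("--output", default=".test_durations", help="where to write the combined file")
    args = parser.parse_args()

    combined: dict = {}
    for split_file in args.splits:
        # Later files win when the same test appears in several partial files.
        combined.update(json.loads(Path(split_file).read_text()))
    Path(args.output).write_text(json.dumps(combined, indent=2, sort_keys=True))


if __name__ == "__main__":
    combine()
```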
FYI there's now a |
for me the solution was to use `--clean-durations`. That will make pytest only output the durations for the tests it ran in that group :) |
Some great tips. I wanted to add that there is no additional Python script required to combine the files:

```sh
jq '. + input' .test_durations1 .test_durations2
```

(When a test appears in both files, jq's object addition keeps the value from the second file.) |
The solution suggested by @michamos will push out cache keys that might be important. I implemented the following, which should avoid pushing out cached items. Note that we only run 2 splits, and I haven't found a nicer way to restore from multiple keys.

```yaml
- name: "[Test Duration] Restore test duration (1)"
  id: test-duration-cache-restore-1
  uses: actions/cache/restore@v3
  with:
    path: .test_durations.1
    key: test-durations-1
- name: "[Test Duration] Restore test duration (2)"
  id: test-duration-cache-restore-2
  uses: actions/cache/restore@v3
  with:
    path: .test_durations.2
    key: test-durations-2
- name: "[Test Duration] Combine test duration cache files"
  if: steps.test-duration-cache-restore-1.outputs.cache-hit == 'true' && steps.test-duration-cache-restore-2.outputs.cache-hit == 'true'
  run: |
    jq '. + input' .test_durations.1 .test_durations.2 > .test_durations
- name: Run Tests
  timeout-minutes: 10
  run: |
    poetry run pytest ... --splits 2 --group ${{ matrix.group }} --store-durations --durations-path=.test_durations.${{ matrix.group }}
- name: "[Test Duration] Delete test duration cache (${{ matrix.group }})"
  continue-on-error: true
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    gh extension install actions/gh-actions-cache
    gh actions-cache delete test-durations-${{ matrix.group }} -R ${{ github.repository }} --confirm
- name: "[Test Duration] Save test duration"
  id: test-duration-cache-save
  uses: actions/cache/save@v3
  with:
    path: .test_durations.${{ matrix.group }}
    key: test-durations-${{ matrix.group }}
```
|
Maybe another option here which could help the ergonomics would be for pytest-split to accept multiple test durations files? Like …
For anyone using @michamos's solution, note that the paths must match (at least under …). The line … |
For anyone using @michamos's solution, note that hidden files are now ignored by default in actions/upload-artifact. To solve this, add `include-hidden-files: true` to the upload steps. |
Hey guys, here is my solution with the …:

```yaml
on: push

jobs:
  test:
    runs-on: self-hosted # You need to tweak that to use your own Docker image
    strategy:
      fail-fast: false
      matrix:
        job-index: [ 1, 2, 3, 4, 5, 6, 7, 8 ]
    steps:
      # It's mandatory to use the exact same path when saving/restoring cache, otherwise it won't work
      # (the same key is not enough - see documentation:
      # https://github.com/actions/cache/blob/main/README.md#cache-version).
      # I went with `/tmp/.test_durations`.
      - name: Restore test durations
        id: restore-test-durations
        uses: actions/cache/restore@v4
        with:
          path: /tmp/.test_durations
          key: tests-durations-${{ github.sha }}-${{ github.run_attempt }}
          restore-keys: |
            tests-durations-${{ github.sha }}-
            tests-durations-

      # This step will be executed only when there IS a cache hit (exact match or not).
      # See documentation about the `cache-hit` output:
      # https://github.com/actions/cache/blob/main/README.md#outputs
      # > cache-hit - A string value to indicate an exact match was found for the key.
      # > If there's a cache hit, this will be 'true' or 'false' to indicate if there's an exact match
      # > for key.
      # > If there's a cache miss, this will be an empty string.
      - name: Use cached test durations
        if: steps.restore-test-durations.outputs.cache-hit != ''
        working-directory: /path/to/your/app
        run: mv /tmp/.test_durations .test_durations

      # This step will be executed only when there is NO cache hit.
      # You need to commit file `.test_durations_fallback`.
      # You can also refresh it manually from time to time to keep an up-to-date fallback
      # (see step "Upload final test durations" below).
      - name: Use fallback test durations
        if: steps.restore-test-durations.outputs.cache-hit == ''
        working-directory: /path/to/your/app
        run: mv .test_durations_fallback .test_durations

      # When running pytest, we write the new test durations using options
      # `--store-durations --clean-durations`.
      # Option `--clean-durations` is undocumented but you can check its implementation here:
      # https://github.com/jerry-git/pytest-split/blob/fb9af7e0122c18a96a7c01ca734c4ab01027f8d9/src/pytest_split/plugin.py#L68-L76
      # > Removes the test duration info for tests which are not present while running the suite with
      # > '--store-durations'.
      - name: Run pytest
        working-directory: /path/to/your/app
        run: |
          pytest \
            --retries 1 --cumulative-timing 1 \
            --splits ${{ strategy.job-total }} --group ${{ matrix.job-index }} \
            --store-durations --clean-durations

      # Each matrix job uploads its freshly updated partial test durations. We regroup them all
      # within one final file in the "Merge all partial test durations" step below.
      - name: Upload test durations
        uses: actions/upload-artifact@v4
        with:
          name: test-durations-${{ matrix.job-index }}
          path: /path/to/your/app/.test_durations
          if-no-files-found: error
          include-hidden-files: true

  cache-test-durations:
    name: Cache test durations
    needs: test
    if: success() || failure()
    runs-on: self-hosted # You need to tweak that to use your own runner
    steps:
      # By default, this action will download all the artifacts that have been created previously
      # (we created one per matrix job):
      - name: Download all partial test durations
        uses: actions/download-artifact@v4

      # This step regroups the 8 partial files and sorts keys alphabetically:
      - name: Merge all partial test durations
        run: |
          jq -s 'add' test-durations-*/.test_durations \
            | jq 'to_entries | sort_by(.key) | from_entries' \
            > /tmp/.test_durations

      # This step uploads the final file as an artifact. You can then download it from the GitHub GUI,
      # and use it to manually commit file `.test_durations_fallback` from time to time,
      # to keep an up-to-date fallback:
      - name: Upload final test durations
        uses: actions/upload-artifact@v4
        with:
          name: test-durations
          path: /tmp/.test_durations
          if-no-files-found: error
          include-hidden-files: true

      # Finally, we cache the new test durations. This file will be restored in the next CI execution
      # (see step "Restore test durations" above).
      - name: Cache final test durations
        uses: actions/cache/save@v4
        with:
          path: /tmp/.test_durations
          key: tests-durations-${{ github.sha }}-${{ github.run_attempt }}
```

Useful info: … |
@samylaumonier Thanks a lot for your sophisticated and well documented solution! I wonder if we could create a composite action to simplify the usage and maintain it collaboratively? :) |
Let me first thank you for this great tool, which makes it super easy to decrease testing time in GitHub Actions.

I have some questions about how the new feature to combine --store-durations with --groups is supposed to be used to update test timings while running the split test suite during CI. I couldn't find any documentation, but I assume the idea is to do as suggested by @sondrelg in #11 (comment) and basically use the GitHub actions/cache to cache the .test_durations file. I see two problems with that approach, both due to the fact that, as far as I understand, loading the cache is done at the beginning of the job and storing the cache at the end.

Unless I misunderstood how this whole thing works, I think the more robust approach would be to store the group durations as artifacts, and then have an additional job in the workflow, depending on the group runs, that consolidates all the separate artifacts into one durations file and caches that. That would solve problem 1, as all group durations will be taken into account, and problem 2, as the caching will happen only after all test runs finish. The missing piece to implement such a strategy is a tool that can combine the test durations from the different groups, and a way to annotate in the .test_durations file whether a duration has been updated in the current run, so that the tool knows which durations need to be put into the combined file.