Nam's Thoughts

Determining Optimal Core Count with `hyperfine` and `pytest`

Problem

At SEEK, we use Buildkite to run pytest inside a Dockerised PySpark ETL container. However, with ~1,500 tests and counting, our test suite has been taking more than 5 minutes to run, resulting in slow feedback times.

By default, pytest uses only one core to run tests. With the pytest plugin pytest-xdist, you can spread tests across as many cores as are available, but the worker count has to be chosen carefully:

  • Set too low, and pytest will not fully utilise the parallelism available on the machine.
  • Set too high, and pytest will schedule more workers than there are cores, and the resulting contention degrades performance.
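To make the trade-off concrete, here is a minimal sketch (the `choose_workers` helper is my own illustration, not part of pytest-xdist) of clamping a requested worker count to what the machine actually has:

```python
import os

def choose_workers(requested: int) -> int:
    """Clamp a requested pytest-xdist worker count to the cores
    actually available, so the machine is never oversubscribed."""
    available = os.cpu_count() or 1
    return max(1, min(requested, available))

# An over-sized request is capped at the machine's core count,
# and any request is honoured with at least one worker.
print(choose_workers(10**9))
```

pytest-xdist's own `-n auto` similarly picks the number of available CPUs for you, but as the benchmark below shows, the fastest setting is not always the full core count.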

As such, for the machine I planned to run on (AWS's c7g.4xlarge, an ARM Graviton instance), I used hyperfine to experimentally determine the optimal number of CPU cores.

Setup

To set the context, unit tests run via a bash script inside a Docker container, which is invoked by the Makefile, with the number of cores for pytest passed in as an environment variable. Here are the relevant files:

Makefile

.PHONY: test
test:
    docker-compose -f cicd/tests/docker-compose-test.yml up --abort-on-container-exit $(service)

cicd/tests/docker-compose-test.yml

version: '3.8'
services:
  unit:
    image: # ecr image redacted for privacy
    entrypoint: /usr/local/app_user/test_pipelines.sh
    volumes:
      - ../../test_pipelines.sh:/usr/local/app_user/test_pipelines.sh
    environment:
      - pytest_cores

bash script

#!/usr/bin/env bash
set -eux

echo "Setting up environment"
PYTEST_OPTIONS="--durations=0 --no-cov-on-fail --cov-branch --cov-report term-missing --cov-fail-under=0.00"
pytest_cores=${pytest_cores:-4} # passed in as an env var from docker-compose
echo "pytest_cores: ${pytest_cores}"

echo "Unit Test - All Pipelines"
pytest -n $pytest_cores $PYTEST_OPTIONS src/

Benchmarking with Hyperfine

The primary aim was to find out the optimal number of cores required to make our tests run faster. That’s where hyperfine comes in.

To determine the best core count, I used the following hyperfine command:

hyperfine --runs 3 -P cores 2 10 'make test service=pipelines pytest_cores={cores}' --export-markdown pytest-cores.md

Here’s a breakdown of the command:

  • --runs 3: Run each command 3 times and take the average.
  • -P cores 2 10: Vary the cores parameter from 2 to 10.
  • make test service=pipelines pytest_cores={cores}: The command being benchmarked, where {cores} is a placeholder that hyperfine replaces on each iteration.
  • --export-markdown pytest-cores.md: Export the results into a markdown file for further analysis.
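As a quick sketch of what can be done with that export, the per-core means can be pulled back out of the markdown file for further analysis. The column layout here is assumed from the results table in this post, and the regex is my own:

```python
import re

# A few rows in the shape hyperfine's --export-markdown produces
# (layout assumed from the results table in this post).
SAMPLE = """\
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `make test service=pipelines pytest_cores=2` | 234.301 ± 4.784 | 230.408 | 239.641 | 1.72 ± 0.05 |
| `make test service=pipelines pytest_cores=6` | 136.448 ± 2.899 | 133.942 | 139.623 | 1.00 |
"""

def parse_means(markdown: str) -> dict[int, float]:
    """Map each pytest_cores value to its mean runtime in seconds."""
    means = {}
    for line in markdown.splitlines():
        m = re.search(r"pytest_cores=(\d+)`?\s*\|\s*([\d.]+)", line)
        if m:
            means[int(m.group(1))] = float(m.group(2))
    return means

print(parse_means(SAMPLE))  # {2: 234.301, 6: 136.448}
```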

Analysis

Results

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| make test service=pipelines pytest_cores=2 | 234.301 ± 4.784 | 230.408 | 239.641 | 1.72 ± 0.05 |
| make test service=pipelines pytest_cores=3 | 178.949 ± 2.660 | 177.008 | 181.980 | 1.31 ± 0.03 |
| make test service=pipelines pytest_cores=4 | 151.718 ± 3.913 | 149.024 | 156.206 | 1.11 ± 0.04 |
| make test service=pipelines pytest_cores=5 | 140.316 ± 3.379 | 138.189 | 144.212 | 1.03 ± 0.03 |
| make test service=pipelines pytest_cores=6 | 136.448 ± 2.899 | 133.942 | 139.623 | 1.00 |
| make test service=pipelines pytest_cores=7 | 136.451 ± 2.571 | 134.383 | 139.329 | 1.00 ± 0.03 |
| make test service=pipelines pytest_cores=8 | 141.022 ± 2.241 | 139.504 | 143.596 | 1.03 ± 0.03 |
| make test service=pipelines pytest_cores=9 | 147.063 ± 1.624 | 145.626 | 148.825 | 1.08 ± 0.03 |
| make test service=pipelines pytest_cores=10 | 152.674 ± 2.731 | 150.211 | 155.611 | 1.12 ± 0.03 |
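Given the means, picking the winner programmatically is a one-liner. A small sketch with the numbers copied from the table above:

```python
# Mean runtimes [s] per pytest_cores value, copied from the table above.
mean_times = {
    2: 234.301, 3: 178.949, 4: 151.718, 5: 140.316, 6: 136.448,
    7: 136.451, 8: 141.022, 9: 147.063, 10: 152.674,
}

# The core count with the lowest mean runtime wins.
best = min(mean_times, key=mean_times.get)
speedup = mean_times[2] / mean_times[best]
print(best)               # 6
print(round(speedup, 2))  # 1.72 (vs the 2-core baseline)
```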

Graph

From the data, we can clearly see a general decrease in the mean execution time as the number of cores increases from 2 to 6. However, after 6 cores, there’s an increase in the time it takes for the tests to execute.

It’s essential to recognise that while increasing concurrency (by adding more cores) can speed up execution, it doesn’t always lead to linear improvements. There’s a sweet spot, which in this case is around 6 cores for the c7g.4xlarge instance. Beyond this point, any gains from parallel execution are offset by scheduling and contention overheads.

Conclusion

By combining the power of hyperfine and pytest, it became straightforward to determine the optimal number of cores needed for my tests. Not only does this help in reducing test run times, but it also aids in optimising resource utilization.

References

Check out my related posts:

Code used to generate results

import matplotlib.pyplot as plt

# Results copied from the hyperfine table above
cores = list(range(2, 11))
mean_times = [234.301, 178.949, 151.718, 140.316, 136.448,
              136.451, 141.022, 147.063, 152.674]

plt.figure(figsize=(12, 7))
plt.plot(cores, mean_times, marker='o', linestyle='-', color='b')
plt.title("Mean Execution Time vs. pytest_cores on c7g.4xlarge with pytest-xdist")
plt.xlabel("pytest_cores")
plt.ylabel("Mean Time [s]")
plt.grid(True, which='both', linestyle='--', linewidth=0.5)

# Annotations
plt.annotate('Optimum at 6 cores', xy=(6, 136.448), xytext=(4, 200),
             arrowprops=dict(facecolor='red', arrowstyle='->'),
             color='red')
plt.annotate('Possible overhead & contention beyond this point',
             xy=(7, 140), xytext=(7.5, 180),
             arrowprops=dict(facecolor='red', arrowstyle='->'),
             color='red')

plt.tight_layout()
plt.show()
