Benchmarking file uploads with Locust

April 7, 2020 3-minute read

locust • benchmarking • web performance • http load testing

I had to perform some functional load testing of some file upload features. The last time I played with load testing, JMeter was the incumbent, with an entire team dedicated to its usage. My first crack was to harness curl and launch a bunch of loops into the background. This worked well to generate load but offered little in terms of process control and reporting. To change the number of clients I had to rerun my wrappers with new variables. Reporting involved several scripts to parse hundreds of thousands of lines of text, which is hardly a consumable format.

Off to DDG. I stumbled upon Locust, a load testing Python module that conveniently uses Python Requests. Perfect.

Installation

The quick and dirty installation steps:

python3 -mvenv ~/.python_envs/locust
source ~/.python_envs/locust/bin/activate
pip install wheel
pip install -e git+https://github.com/locustio/locust.git@master#egg=locustio

You should now be able to run locust.

locust --help

Configuration

Locust’s configuration lives in a locustfile.py file. Here’s mine:

from locust import HttpLocust, TaskSet, task, between
import uuid

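class UserBehavior(TaskSet):
    @task
    def uploads(self):
        url = '/uploads'  # path only; the host is supplied separately
        filename = '50m.tar.gz'
        headers = {
            'User-Agent': 'curl/7.58.0',
            # Auth cookies copied from a browser request
            'Cookie': 'authsomething=datastring; othercookie=datastring',
        }

        # Close the file handle once the request has been sent
        with open(filename, 'rb') as payload:
            files = [
                # requests does not guess the part's Content-Type; declare it
                ('file', (filename, payload, 'application/x-gtar')),
                ('guid', (None, str(uuid.uuid4()))),
            ]
            self.client.post(url, headers=headers, files=files)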

class MyLocust(HttpLocust):
    tasks = [UserBehavior]
    wait_time = between(0.1, 0.2)

A few notes

The application I was testing required some auth cookies. Rather than overcomplicate the task at hand, I chose to just use the Cookie header from a browser request. If you wanted full login/logout, you’d add those as separate tasks. Locust will save the cookies from a login request and send them on an upload request.
Curl derives the content type from the file extension. Python Requests does not; you need to declare it explicitly. This swallowed a bunch of time, as I had to compare packet dumps to figure out why my requests in Locust would fail but curl worked. The culprit was the Content-Type header.
User-Agent can be tossed. It was an attempt to troubleshoot the curl/Locust differences. I’m leaving it in as an example.
The file referenced here is simply a dd of urandom. To generate your own, run:

dd if=/dev/urandom of=50m.tar.gz bs=1024 count=50000
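The Content-Type note above can be made a little less manual. Here is a minimal sketch, assuming you would rather derive the type from the filename than hard-code it. Python's standard mimetypes module does the lookup, and the result goes into the third slot of the files tuple. The helpers content_type_for and build_files are hypothetical names used for illustration and are not part of Locust or Requests. The guessed value may also differ from what your server expects: for a .tar.gz name the built-in table should give application/x-tar rather than the application/x-gtar used above.

```python
import mimetypes
import uuid

# Use only Python's built-in extension table, so results do not depend on
# whatever mime.types files happen to be installed on the machine.
_MIME_DB = mimetypes.MimeTypes()


def content_type_for(filename, default='application/octet-stream'):
    """Guess a Content-Type from the file extension (hypothetical helper)."""
    guessed, _encoding = _MIME_DB.guess_type(filename)
    return guessed or default


def build_files(filename, fileobj):
    """Build multipart fields in the same shape the locustfile passes to post()."""
    return [
        ('file', (filename, fileobj, content_type_for(filename))),
        ('guid', (None, str(uuid.uuid4()))),
    ]
```

Inside the task you would then call something like self.client.post(url, headers=headers, files=build_files(filename, payload)), with payload being the open file object.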

Running Locust

You’ll notice that the URL I post to is just the path. Locust manages the host separately: you can define it in the locustfile, but I preferred to specify it at run time. There are two options to run Locust:

1. Web Frontend

This gives you a very nice interface complete with summary charts and graphs. You can also change the parameters of the load test live, for example going from 200 users to 300, then back down to 150.

To start the web frontend, run the following, where --host is the prefix to the URL path specified in the locustfile. Locust binds to *:8089 by default, so visit http://machineip:8089 in your browser.

locust -f locustfile.py --host https://yourhost.com
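To make the host/path split concrete, here is a rough sketch of how the two pieces combine into the URL that actually gets requested. It assumes plain concatenation of the --host value and the path from the locustfile, with already-absolute URLs passed through untouched. That matches my understanding of Locust's behaviour, but full_url is a hypothetical helper written for illustration, not Locust's API.

```python
import re

# Matches paths that are already absolute http(s) URLs
_ABSOLUTE_URL = re.compile(r'^https?://', re.IGNORECASE)


def full_url(host, path):
    """Combine a --host value and a locustfile path (illustrative only)."""
    if _ABSOLUTE_URL.match(path):
        return path  # absolute URLs are used as-is
    return f'{host}{path}'


print(full_url('https://yourhost.com', '/uploads'))
```

Under this assumption a trailing slash on the host would produce a double slash in the final URL, so it is safest to leave it off.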

2. CLI

I mainly used the “headless” operation for debugging the locustfile.

To run Locust in headless mode and have it start the test right away, run the following, where -c is the number of clients to simulate and -r is the hatch rate (clients spawned per second).

locust -f locustfile.py --host https://yourhost.com -c 5 -r 1 --headless
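For a feel of how -c and -r interact, the arithmetic is simple: clients are added at the hatch rate until the target count is reached. Below is a back-of-the-envelope sketch, assuming a steady spawn rate, so real ramp-up timing will only be approximately this. Both function names are made up for illustration.

```python
def ramp_up_seconds(clients, hatch_rate):
    """Approximate time until all clients are running."""
    return clients / hatch_rate


def clients_running(clients, hatch_rate, elapsed_seconds):
    """Approximate number of clients spawned after elapsed_seconds."""
    return min(clients, int(hatch_rate * elapsed_seconds))


print(ramp_up_seconds(5, 1))         # the -c 5 -r 1 case
print(clients_running(5, 1, 3))      # partway through the ramp-up
print(clients_running(300, 10, 60))  # long after the ramp-up has finished
```

With -c 5 -r 1 the test reaches full load after roughly five seconds, which is worth keeping in mind when reading the first few lines of stats.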

If you want more verbose output:

locust -L DEBUG -f locustfile.py --host https://yourhost.com -c 5 -r 1 --headless