How it works
It’s helpful to understand first what dbworkload does:
-
At runtime,
dbworkloadfirst imports the class you pass,bank.py. -
It spawns n threads for concurrent execution (see next section on Concurrency).
- By default, it sets the connection to
autocommitmode. - Each thread creates a database connection - no need for a connection pool.
- In a loop, each
dbworkloadthread will: - execute function
loop()which returns a list of functions. - execute each function in the list sequentially. Each function, typically, executes a SQL statement/transaction.
- Execution stats are funneled back to the MainThread, which aggregates and, optionally, prints them to stdout and saves them to a CSV file.
- If the connection drops, it will recreate it. You can also program how long you want the connection to last.
dbworkloadstops once a limit has been reached (iteration/duration), or you Ctrl+C.
Concurrency - processes and threads
dbworkload uses both the multiprocessing and threading library to achieve high concurrency, that is, opening multiple connections to the DBMS.
There are 2 parameters that can be used to configure how many processes you want to create, and for each process, how many threads:
--procs/-x, to configure the count of processes (defaults to the CPU count)--concurrency/-c, to configure the total number of connections (also referred to as executing threads)
dbworkload will spread the load across the processes, so that each process has an even amount of threads.
Example: if we set --procs 4 and --concurrency 10, dbworkload will create as follows:
- Process-1: MainThread + 1 extra thread. Total = 2
- Process-2: MainThread + 1 extra thread. Total = 2
- Process-3: MainThread + 2 extra threads. Total = 3
- Process-4: MainThread + 2 extra threads. Total = 3
Total executing threads/connections = 10
This allows you to fine tune the count of Python processes and threads to fit your system.
Furthermore, each executing thread receives a unique ID (an integer).
The ID is passed to the workload class with function setup(), along with the total count of threads, i.e. the value passed to -c/--concurrency.
You can leverage the ID and the thread count in various ways, for example, to have each thread process a subset of a dataset.