Lbench is a simple multithreaded benchmark tool for Linux.
It measures various aspects of CPU and OS performance and shows how well that
performance scales when multiple threads run on multiple processor cores.
Lbench can make the following measurements for 1 to 9 parallel threads:
- integer-32 and float-64 arithmetic performance
- performance of common engineering functions (sqrt, sin, etc.)
- memory throughput for processor cache and main memory, for block moves,
4-byte sequential get/put moves, and 4-byte random get/put moves
- time required to acquire and release a mutex lock
- time required to start and complete a process thread
- time required to start and complete a sub-process
- disk throughput for serial and random I/O using various block sizes
(direct disk I/O, without OS caching)
- classic Whetstone benchmark for type double (64 bit floating point)
- classic Linpack benchmark for type double
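
As a rough illustration of what two of the measurements above involve (thread start/complete time and mutex acquire/release time), here is a minimal Python sketch. It is not Lbench's own code: the function names are hypothetical, and Python's interpreter overhead means the absolute numbers are not comparable with Lbench results.

```python
import threading
import time


def time_thread_start_complete(iterations=200):
    """Average seconds to start a do-nothing thread and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(iterations):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / iterations


def time_lock_acquire_release(iterations=100_000):
    """Average seconds for one uncontended acquire/release of a lock."""
    lock = threading.Lock()
    start = time.perf_counter()
    for _ in range(iterations):
        lock.acquire()
        lock.release()
    return (time.perf_counter() - start) / iterations


if __name__ == "__main__":
    print(f"thread start/complete: {time_thread_start_complete() * 1e6:.1f} us")
    print(f"lock acquire/release:  {time_lock_acquire_release() * 1e9:.0f} ns")
```

Both functions time a whole loop and divide by the iteration count, because a single operation is too short to time reliably on its own.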
Lbench has two other functions not related to benchmarking:
- Cooling performance: run multiple CPU-bound threads, report processor core temperatures,
and detect whether the CPU clock is throttled down due to thermal overload.
- Memory test and burn-in: continuously fill memory with random values, read back and compare.
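
The memory test idea (fill with random values, read back, compare) can be sketched as follows. This is only an illustration under stated assumptions, not Lbench's implementation: the names `memory_test_pass` and `burn_in` are hypothetical, the buffer is small, and the number of passes is bounded rather than continuous.

```python
import random


def memory_test_pass(size_bytes, seed):
    """Fill a buffer with pseudo-random bytes, then read back and compare.

    Returns the number of mismatched bytes (0 means the pass succeeded).
    """
    rng = random.Random(seed)
    buf = bytearray(size_bytes)
    for i in range(size_bytes):
        buf[i] = rng.getrandbits(8)

    # Regenerate the same sequence from the seed instead of keeping a second copy.
    rng = random.Random(seed)
    errors = 0
    for i in range(size_bytes):
        if buf[i] != rng.getrandbits(8):
            errors += 1
    return errors


def burn_in(passes, size_bytes=1 << 16):
    """Run several passes, each with a different seed; return total mismatches."""
    return sum(memory_test_pass(size_bytes, seed) for seed in range(passes))


if __name__ == "__main__":
    print("mismatched bytes:", burn_in(passes=4))
```

Re-seeding the generator for the compare phase means the expected values never have to be stored in the memory under test.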
Conclusions, based on an Intel Core i5 at 2.7 GHz with 4 processor cores and Linux kernel 4.2:
- For pure calculations, 4 threads have nearly 4 times the aggregate throughput of one thread.
- The cache memory throughput for 4 threads is nearly 4 times that of one thread.
- The main memory throughput for 4 threads is not much greater than that of one thread.
- Block memory moves are 15-48 times faster than 4-byte sequential get/put moves.
- 4-byte sequential get/put moves are 1-6 times faster than 4-byte random get/put moves.
- 4-byte sequential get/put moves have about the same speed for cache and main memory.
- Four threads contending for a mutex-controlled resource can swap ownership 9 million times per second.
- The time to create a thread which does nothing but exit is about 5.7
microseconds, increasing to 15 microseconds if four threads are
continuously starting and exiting.
- The time to create a sub-process which does nothing but exit is about 0.8
milliseconds, increasing to 1.1 milliseconds if four threads are doing
the same.
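
The arithmetic behind these conclusions can be worked through with a few helper functions. This is a minimal sketch: the function names are hypothetical, and the 3.8 throughput figure is an assumed stand-in for "nearly 4 times", not a measured value. The other inputs are the figures quoted above.

```python
def speedup(throughput_n, throughput_1):
    """Aggregate throughput with N threads relative to one thread."""
    return throughput_n / throughput_1


def efficiency(throughput_n, throughput_1, n_threads):
    """Speedup divided by thread count; 1.0 means perfect scaling."""
    return speedup(throughput_n, throughput_1) / n_threads


def seconds_per_op(ops_per_second):
    """Convert a rate into the average time per operation."""
    return 1.0 / ops_per_second


def slowdown(loaded_time, single_time):
    """How much longer an operation takes under load than alone."""
    return loaded_time / single_time


if __name__ == "__main__":
    # 9 million mutex ownership swaps per second is about 111 ns per swap.
    print(f"{seconds_per_op(9e6) * 1e9:.0f} ns per mutex swap")
    # Thread creation: 5.7 us alone, 15 us with four threads starting and exiting.
    print(f"thread creation slowdown: {slowdown(15.0, 5.7):.2f}x")
    # Sub-process creation: 0.8 ms alone, 1.1 ms under load.
    print(f"sub-process creation slowdown: {slowdown(1.1, 0.8):.3f}x")
    # Assumed example: 4 threads reaching 3.8x the throughput of one thread.
    print(f"scaling efficiency: {efficiency(3.8, 1.0, 4):.2f}")
```

Under these figures, thread creation slows down by a larger factor under contention than sub-process creation does, although a sub-process still costs far more in absolute terms.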
The user guide is available from the [help] button and goes into more
detail about each benchmark and its configurable parameters.