Measuring Execution Time
In a programmer's daily work, there are times when it's necessary to improve an application's performance. This often involves identifying bottlenecks, and the best way to do so is by measuring their execution time. From another perspective, we might need to present the application's execution time for various reasons—whether for marketing purposes or as part of a school or university report. Let’s dive into this topic: how execution time is measured, the different approaches depending on specific needs, and how to implement these methods with concrete examples.
Before We Dive In...
The following article is a translation of a text I originally wrote in Polish for my blog. If you speak Polish, you might prefer reading it in its original language: https://swistak.codes/post/mierzenie-czasu-wykonania.
I’ve also written another article that covers the topic of hardware time measurement. If you’re curious about how computers measure time, you can check it out here: https://swistak.codes/post/jak-komputer-mierzy-czas. Unfortunately, this article is also only available in Polish, but you could try using a translator.
Profilers
If you are a professional programmer, the best way to measure application performance and identify problematic areas is by using profilers. These tools, often bundled with development environments, are designed specifically for this purpose.
The profilers I use (or have used) work by recording the application's activity. You start recording in the tool, perform the problematic action in the application, and then stop the recording. After analyzing the raw data, the tool generates a timeline of events, showing what happened step by step. You can usually examine everything in great detail to understand what triggered each action and how long it took.
An example of such a tool, and the one I use most frequently, is the built-in profiler in Firefox. Specifically, I use its recording feature, while the results are visualized on a dedicated web page. This helps me locate problematic areas in web applications.
It’s important to note that different profilers may operate differently and focus on various aspects of performance. If you’re working in a different environment, you might encounter completely different measurement methods. However, the general concept remains similar.
While I highly recommend using profilers for professional purposes, I understand why you might not always want to rely on them. Here’s why:
- Profilers provide an overwhelming amount of information, which might not always be necessary.
- Once you identify a problematic area, you may prefer to focus on testing just that part.
- Profilers can slow down the entire application, making them unsuitable for scenarios like demonstrating how fast something works.
Manual Time Measurement
Let’s move on to manually measuring execution time. This method primarily involves calling appropriate functions to capture the time both before and after a function’s execution. By subtracting the two timestamps, we can calculate the execution time. It’s as simple as that.
However, it’s important to understand how to measure time effectively. Time can be obtained from various sources. The simplest method is to retrieve calendar time, but this comes with low resolution—accurate only to seconds or milliseconds. While milliseconds are acceptable in many cases, seconds often are not. Computers offer other options for handling this, using various hardware counters. Accessing these counters directly via assembly language is possible, but it’s generally not recommended. Instead, we’ll focus on solutions provided by programming languages or even operating systems. Later, I’ll explain why using calendar time for this purpose is usually a poor choice.
For testing, we will measure the execution time of the Ackermann function in its optimized iterative version (as described by J.W. Grossman and R.S. Zeitman in doi:10.1016/0304-3975(88)90046-1). I adapted the code to match the programming language being used. For the operating-system examples, I rewrote the algorithm in scripting languages available from their command lines, to avoid requiring compilation or additional installations. I haven’t included the implementations directly in this article, but you can find them in the linked Replit projects.
Operating Systems
Let’s start with operating systems, as this is the most universal method—it doesn’t require access to the code. It’s also the simplest way to measure the total execution time of an application.
We’ll look at two approaches that between them cover the three most popular operating systems: Windows, Linux, and macOS.
Linux and macOS
Both Linux and macOS provide the time command. You simply prepend this command to the one you want to measure (script, application, or anything else), and once the application finishes executing, it outputs data about how long the execution took.
For testing purposes, I rewrote the algorithm as a Bash script, which you can find on Replit. If you don’t have access to Linux or macOS, you can fork this script on Replit and run it in the Shell tab. Just remember to grant execution permissions using chmod +x ./main.sh. Alternatively, if you're using Windows, you can run the script via WSL.
The output of the command on Linux looks as follows:
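With Bash’s built-in time, the report takes roughly this form (the timings below are placeholders; yours will differ):
time ./main.sh
real    0m0.307s
user    0m0.295s
sys     0m0.011s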
And on macOS:
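Under zsh (the default shell on macOS), time prints everything on a single line instead (again, placeholder numbers):
time ./main.sh
./main.sh  0.29s user 0.01s system 99% cpu 0.307 total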
Here’s a brief explanation of the times displayed:
- real/total — The total time elapsed from the start to the end of the command execution.
- user — The time the CPU spent executing the command’s code directly.
- sys — The time the CPU spent performing system-level operations during the command execution.
- cpu — The percentage of available CPU power utilized by the command during its execution.
It’s important to note that, while it might seem like the user and sys times should add up to real, this isn’t always the case in practice. I encourage you to investigate why this discrepancy occurs—it's an interesting topic to explore!
Windows
On Windows, we can use Measure-Command, which functions similarly to the time command.
For testing purposes, I rewrote the algorithm as a PowerShell script, available on Replit. This script can also be executed on Linux (including Replit) and macOS by installing PowerShell and using it in the same way as demonstrated below for Windows. Before running the script, you need to enter PowerShell by executing the command pwsh.
On Windows, if you haven’t run a PowerShell script before, you may need to change the execution policy. To allow scripts from any source, run Set-ExecutionPolicy unrestricted as an administrator. For safety, remember to restore the original setting afterward by running Set-ExecutionPolicy default.
Let’s get started. Below is an example of invoking the command and its output:
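A minimal sketch of the invocation (the script path is my assumption) and the listing it prints; the numbers are illustrative and chosen to match the tick value interpreted below:
Measure-Command { ./main.ps1 }
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 85
Ticks             : 851907
TotalDays         : 9.86003472222222E-07
TotalHours        : 2.36640833333333E-05
TotalMinutes      : 0.001419845
TotalSeconds      : 0.0851907
TotalMilliseconds : 85.1907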
Here’s how to interpret the results:
- Days, Hours, Minutes, Seconds, Milliseconds — The time it took to execute the command, broken down into these units. Note that these values are rounded to whole numbers.
- Ticks — The most precise available unit of time used to measure the command’s execution. One tick equals 100 nanoseconds (0.1 microseconds), so in the example above, 851,907 ticks correspond to 85.1907 milliseconds.
- TotalDays, TotalHours, TotalMinutes, TotalSeconds, TotalMilliseconds — The total execution time expressed in the given unit, without rounding (which is why the values for days and hours are tiny fractions).
Programming Languages
If we want to measure the execution time of a single function, it’s much simpler to implement time measurement directly within the code.
Here, I’ll demonstrate how to write a function that measures the execution time of another function. The function itself, along with its arguments, will be passed as parameters. I chose this approach to showcase how to handle argument passing in strongly typed languages, as it will also make it easier for you to adapt this to a no-argument version. Additionally, we’ll always measure time in the most precise unit available.
The programming languages I’ll demonstrate were chosen based on my experience writing algorithms where execution time measurement was essential. I believe this selection covers the most popular use cases, and the languages are presented in alphabetical order.
C
We’ll start with the most challenging case, as the C language is not particularly known for being developer-friendly. It’s also a common misconception that passing a function as an argument to another function isn’t possible in C. However, this is entirely incorrect—it’s achievable by properly defining the function type.
Below is the code for a function that measures the execution time of another function. It’s important to note that this method measures only the time during which the CPU is actively processing (similar to the user time reported by time), meaning it doesn’t account for time spent waiting on input/output operations.
#include <stdio.h>
#include <time.h>
// The first argument is a pointer to a function; the remaining are its arguments
// int (*func)(int, int) is a function returning int, taking two int arguments
void measure(int (*func)(int, int), int m, int n) {
    // ts1 and ts2 are of type timespec structure
    // They store time split into seconds and nanoseconds
    struct timespec ts1, ts2;
    // time will hold our calculated result
    // The variable will store a 64-bit unsigned number
    unsigned long long int time;
    // Capture the initial time moment
    // CLOCK_PROCESS_CPUTIME_ID retrieves CPU usage time for the process
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts1);
    // Execute the passed function
    int f_result = func(m, n);
    // Capture the time after execution
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts2);
    // Calculate the time difference
    time = 1e9 * ts2.tv_sec + ts2.tv_nsec - (1e9 * ts1.tv_sec + ts1.tv_nsec);
    // Do whatever you want with the time; here, I print it to the console with the result
    printf("Result: %d; time: %llu ns\n", f_result, time);
}
You can find the code along with an example of its usage on Replit. Note that this code has been tested on Linux and should also work on other POSIX-compliant systems (Unix, BSD, macOS, etc.). I haven’t tested it on Windows, but it likely won’t work with the MSVC compiler. However, it should function correctly under MinGW.
I’ll also mention that the CLOCK_PROCESS_CPUTIME_ID clock, at least on Linux, is managed by the system kernel and provides the most precise possible measurement. In earlier versions of Linux (prior to 2.6), it utilized the TSC (Time Stamp Counter). However, directly reading the TSC could produce inaccurate results when running code on multi-core architectures (essentially all modern architectures).
If you’re more interested in measuring the total execution time rather than just the time the CPU spends executing the function, consider using CLOCK_MONOTONIC (or, for Linux-specific use, the even more precise CLOCK_MONOTONIC_RAW). There are many other available clocks you can explore, which are described in the C documentation.
C#
In C#, unlike in C, we don’t need to worry about which specific counter to use for retrieving time or whether the method is supported by the operating system. The System.Diagnostics namespace provides the Stopwatch class, specifically designed for measuring time.
Similar to the example in C, I’ll demonstrate how to implement a method that measures the execution time of a passed function:
using System.Diagnostics;
// ...
// The first argument is the function whose time we will measure
// Func<int,int,int> means it returns int (the last argument) and takes two int arguments
public static void Measure(Func<int, int, int> func, int m, int n)
{
    // Create a new stopwatch and start it immediately
    var watch = Stopwatch.StartNew();
    // Execute the passed function
    var result = func(m, n);
    // Stop the stopwatch
    watch.Stop();
    // From the TimeSpan object in Elapsed, we can retrieve
    // the elapsed milliseconds as a floating-point value
    var timeMs = watch.Elapsed.TotalMilliseconds;
    // Alternatively, directly retrieve the number of ticks from the stopwatch (1 tick = 100 ns)
    // Choose which representation you prefer; the results are the same
    var timeTicks = watch.ElapsedTicks;
    // Do whatever you want with the time; here, I print it to the console
    Console.WriteLine($"Result: {result}; time: {timeMs}ms, {timeTicks} ticks");
}
You can find the code on Replit. However, if you run it there, keep in mind that the C# compiler tends to be relatively slow on that platform.
You may notice that the results in the code are very similar to those we observed in PowerShell. This is because PowerShell is built on the .NET platform, just like C#. As a result, both use the same highest-precision unit (ticks) and provide the TotalMilliseconds property. The object holding this data is of type TimeSpan, which is also exactly what PowerShell’s Measure-Command returns.
JavaScript
In JavaScript, I’ll demonstrate two different methods for measuring time: one that works universally across browsers and Node.js (and is widely recommended), and another that is specific to Node.js.
Web Performance APIs
The first method uses the Web Performance APIs, a set of W3C-defined standards for accurately measuring the performance of JavaScript applications. For our purposes, we’ll use the performance object, specifically its now() method, which provides high-resolution and precise time measurements in milliseconds.
Here’s an example in code:
// Import is required only if we are in Node.js
// Browsers have `window.performance` available by default
const { performance } = require("node:perf_hooks");
// The function accepts any function and any number of arguments
function measure(func, ...args) {
  // Retrieve the current time from the appropriate counter
  const start = performance.now();
  // Execute the function
  const result = func(...args);
  // Retrieve the time again
  const end = performance.now();
  // Subtract the start time from the end time
  const time = end - start;
  // At this point, you can do whatever you want with the measurement
  // Here, I print it to the console
  console.log(`Result: ${result}; time: ${time} ms`);
}
You can find the code on Replit in a version runnable in Node.js, though it will also work in the browser. If you’re working with multiple threads, it’s worth noting that to ensure accurate measurements, you might want to include performance.timeOrigin in your calculations to synchronize measurements.
Alternatively, if you need to measure times in various places or configurations, it may be more convenient to use the performance.mark() function to create named timestamps, and then calculate metrics with performance.measure().
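A minimal sketch of that approach, assuming a browser or a recent Node.js version (the mark names and the ackermann function are placeholders):
// Create named timestamps around the code being measured
performance.mark("ack-start");
const result = ackermann(3, 3); // placeholder for the measured code
performance.mark("ack-end");
// Store the duration between the two marks under its own name, then read it back
performance.measure("ack", "ack-start", "ack-end");
const [entry] = performance.getEntriesByName("ack");
console.log(`${entry.name}: ${entry.duration} ms`);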
It’s also important to note that the accuracy of measurements depends on the environment. In Node.js, the underlying measurement is taken with nanosecond precision, so the returned millisecond value has a meaningful fractional part. In browsers, the timer’s resolution is deliberately reduced for security reasons (to the order of tenths of a millisecond, or in some configurations even whole milliseconds, depending on the browser).
process.hrtime
If you’re working in Node.js, another option for measuring performance is process.hrtime. This method, available since the early days of Node.js, remains reliable and provides time measurements in nanoseconds.
Starting with Node.js 10, a new version of this method was introduced based on BigInt: process.hrtime.bigint(). Below is an example of how to use this function:
// Import hrtime from the process module
const { hrtime } = require("node:process");
// The function accepts any function and any number of arguments
function measure(func, ...args) {
  // Retrieve the current time from the appropriate counter
  const start = hrtime.bigint();
  // Execute the function
  const result = func(...args);
  // Retrieve the time again
  const end = hrtime.bigint();
  // Subtract the start time from the end time
  const time = end - start;
  // At this point, you can do whatever you want with the measurement
  // Here, I print it to the console
  console.log(`Result: ${result}; time: ${time} ns`);
}
As always, you can find the code on Replit. The usage is similar to other methods, differing only in the function being called. Keep in mind that the result here is of type BigInt.
The older version, process.hrtime(), which uses standard numbers rather than BigInt, is marked as deprecated and may be removed in the future. However, it is still commonly encountered in existing projects. One notable feature of this version is its ability to accept a previous time measurement as an argument, in which case it returns the difference between the two. This is convenient because the result is a [seconds, nanoseconds] array, so having hrtime() compute the difference saves you from doing that arithmetic yourself.
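For completeness, here is a sketch of that older API (the ackermann call is a placeholder):
const start = process.hrtime();
const result = ackermann(3, 3); // placeholder for the measured code
// Passing the previous reading makes hrtime() return the difference as [seconds, nanoseconds]
const [seconds, nanoseconds] = process.hrtime(start);
console.log(`Result: ${result}; time: ${seconds} s ${nanoseconds} ns`);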
Python
time
Python, like the previous languages, supports time measurement with nanosecond precision. The following functions are available for this purpose:
- time.perf_counter_ns(): Returns a time measurement that accounts for everything happening on the computer. A version without the _ns suffix is also available, which returns the time in seconds (while still maintaining nanosecond precision).
- time.process_time_ns(): Works similarly to time measurement in C, measuring only the time during which the CPU was actively processing your application. As with the previous method, a version without the _ns suffix is also available.
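A quick way to see the difference between the two counters is to measure something that mostly waits; sleeping shows up in perf_counter_ns but contributes almost nothing to process_time_ns:
import time
start_wall = time.perf_counter_ns()
start_cpu = time.process_time_ns()
time.sleep(0.1)  # waiting does not consume CPU time
print(time.perf_counter_ns() - start_wall)  # roughly 100,000,000 ns
print(time.process_time_ns() - start_cpu)   # close to 0 ns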
In the code below, we’ll use the second method to replicate the behavior we implemented in C:
import time
# The function accepts a function and its arguments as parameters
def measure(func, *args):
    # Retrieve the start time from the process_time counter
    start = time.process_time_ns()
    # Execute the function
    result = func(*args)
    # Retrieve the end time
    end = time.process_time_ns()
    # Calculate the execution time
    total_time = end - start
    # You can do whatever you want with total_time here
    # I print it to the console
    print(f"Result: {result}; time: {total_time} ns")
You can find the complete code on Replit.
timeit
In Python, another way to measure time is with timeit. This module uses time.perf_counter under the hood to measure the performance of a selected piece of code by repeatedly running it and calculating the average execution time. It can be used from the command line, within code, or as a magic command in Jupyter (%timeit), making it especially convenient in Jupyter environments.
Although this approach slightly deviates from the previously shown pattern of measuring the execution time of a single run, its practicality warrants a demonstration. Below is an example of how to use this feature in Jupyter as a magic command:
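In a notebook cell this boils down to a single line; the function name and the reported statistics below are only illustrative:
%timeit ackermann(3, 3)
13.4 µs ± 0.4 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)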
You can explore the complete notebook on Google Colab, where you can run it yourself and see how it works.
This method is particularly useful because it provides ready-made statistics. In addition to the average execution time (in nanoseconds, microseconds, or milliseconds, depending on the result), it also calculates the standard deviation, offering a clearer picture of performance. More on this later.
Why Don’t We Subtract Calendar Time?
Looking at the code snippets above, you might wonder—why don’t we simply subtract the date after executing a function from the date before executing it? After all, we often have access to the date in the form of Unix time in seconds or even milliseconds. This question is particularly valid since many articles aimed at beginner programmers demonstrate time measurement in exactly this way.
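The pattern in question typically looks something like this (JavaScript here, with a placeholder function):
// Naive measurement based on calendar (wall-clock) time
const start = Date.now();
const result = ackermann(3, 3); // placeholder for the measured code
const elapsedMs = Date.now() - start;
console.log(`Result: ${result}; time: ${elapsedMs} ms`);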
Let me make this clear—measuring time this way isn’t inherently wrong. If you’re not looking for precise measurements and only need an approximate value, it’s perfectly adequate. However, when writing reports that compare performance or when precise measurements are crucial, it’s better to rely on solutions specifically designed for such tasks.
So, why is calendar time unsuitable? In short, it’s because it doesn’t increase monotonically. Not only can it move backward, but the duration of a time unit isn’t always consistent. This inconsistency arises because the operating system synchronizes the clock using the NTP protocol. And what does this synchronization cause?
- Time jumps (forward or backward): From an ordinary user’s perspective, these changes may seem minor, typically within a few milliseconds. However, when measuring execution time, we operate precisely in these units, making such jumps problematic.
- Daylight saving time and leap seconds: Calendar time includes two predictable jumps—switching between standard time and daylight saving time. Additionally, there is a third, less common jump caused by leap seconds. If measurements are run overnight on specific days, the results could be significantly distorted. Applications must be designed to handle these scenarios correctly, regardless of when they are executed.
- Subtle synchronization adjustments: Synchronization doesn’t always result in a sudden time jump. In some systems, when the time difference is small, the clock is adjusted gradually by speeding it up or slowing it down instead of directly setting a new time. For example, Linux’s ntpd behaves this way for differences below 128 milliseconds.
- Leap smear: Speaking of gradual adjustments, leap seconds are sometimes handled similarly through a process known as leap smear. While rare, this typically occurs in cloud environments and affects only one day. Nevertheless, it’s another reason why calendar time is unreliable for measuring execution time.
How often does synchronization occur? On Linux, for example, the background ntpd service synchronizes the system clock every 64 to 1024 seconds, with the exact frequency varying based on system conditions.
In contrast to calendar time, time retrieved from other counters is monotonic. This means it always increases (or stays the same) and is not subject to corrections like synchronization adjustments. Additionally, monotonic time has no direct connection to calendar time.
The difference between calendar time and the reading from the high-precision timer in Node.js converted to a calendar date. In this case, the hrtime counter has only measured 11 hours since the start of the Unix epoch, which is very far from the actual date when the command was executed.
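You can reproduce a comparison along those lines in Node.js by interpreting the monotonic reading as if it were a Unix timestamp in milliseconds:
const { hrtime } = require("node:process");
// Treat the monotonic counter (nanoseconds) as milliseconds since the Unix epoch
const monotonicAsDate = new Date(Number(hrtime.bigint() / 1000000n));
console.log(monotonicAsDate.toISOString()); // lands shortly after 1970-01-01
console.log(new Date().toISOString());      // the actual current date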
How to Correctly Interpret Results?
Once we’ve measured the execution time of a program or algorithm, the results can be used for various purposes. However, it’s crucial to correctly interpret the results, particularly when the measurements are intended for a report.
I’ll share some guidelines shortly, which will include a few statistical terms. Don’t worry—I’ll explain these terms later, along with the relevant formulas.
Guidelines
Below are some tips on how to make the most of your time measurements:
- Repeat measurements multiple times: Perform at least 10 repetitions; the more, the better. Repeating measurements helps reduce random variations and provides more reliable results.
- Represent results meaningfully:
- Graphical representation: A box plot is an excellent choice. It doesn’t require data processing and visually highlights outliers (as whiskers or dots, depending on settings), the interquartile range (the box), and the median (a line inside the box).
- Textual representation: For numerical presentation, calculate the arithmetic mean and the standard deviation (estimated using the square root of the unbiased variance estimator). This method is simple and offers a low estimation error.
- Handling many values: If a box plot becomes unreadable due to numerous data points, consider calculating the arithmetic mean for other plot types (e.g., scatter plots). However, be cautious of outliers, especially with fewer measurements. To mitigate their impact, you can discard the smallest and largest values or focus on values between the first and third quartile.
- Maintain consistent conditions: Run measurements on the same hardware with a similar system load to ensure comparable results.
- Focus on trends, not absolute values: Time measurements depend on various factors, so it’s better to focus on trends or percentage differences.
- For a single algorithm with different inputs: Analyze how execution times grow. For example, use scatter or line plots to observe whether growth is linear, logarithmic, exponential, etc. Consider using logarithmic scaling for the time axis if necessary.
- When comparing algorithms:
- If both algorithms can accept the same inputs, measure execution times for identical inputs and compare their growth patterns, preferably on the same chart.
- If arguments differ or growth patterns are the same, compare their relative performance, e.g., "Algorithm X is, on average, twice as fast as Algorithm Y."
A great example of effective time measurements and their presentation is Python’s timeit, as demonstrated earlier. It repeats the algorithm’s execution multiple times and reports the average execution time along with the standard deviation, providing meaningful and accurate results.
Explanation of Terms
Now I’ll quickly explain the statistical terms mentioned above, in the order they appeared. These definitions are intended to be intuitive and practical, based on how I use these terms in daily practice. Where necessary, I’ll also include relevant formulas.
- Quartiles: Imagine all the measurements (there are $n$ of them) are sorted. Then calculate the positions $\frac{n}{4}$, $\frac{n}{2}$, $\frac{3n}{4}$ and find the measurements at these positions (counting from 1).
- The value at $\frac{n}{4}$ is the first quartile ($Q_1$), at $\frac{n}{2}$ is the second quartile ($Q_2$), and at $\frac{3n}{4}$ is the third quartile ($Q_3$).
- Note: The positions described here are not universal. Different methods for calculating quartile values can result in slight variations in the positions or values.
- Some tools (e.g., Excel) may also include a zeroth quartile (the minimum value) and a fourth quartile (the maximum value).
- Important: There’s a similar concept called a quantile, but it is not the same. A quantile is a value corresponding to a specific fraction of the data. For example, a quantile of order $\frac{1}{4}$ is the same as the first quartile, but a quantile of an order that is not a multiple of $\frac{1}{4}$ (say, $\frac{1}{10}$) doesn’t correspond to any quartile.
- Median: The value located exactly in the middle of the sorted list of measurements—equivalent to the second quartile ($Q_2$).
- If the number of elements ($n$) is even, the median is typically calculated as the arithmetic mean of the two middle elements.
- Arithmetic Mean: The most commonly known average, calculated as the sum of all numbers divided by their count: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Standard Deviation: A measure of how widely values are dispersed around the mean. Since we’re working with a statistical sample (a subset of all possible values), the standard deviation is approximate and calculated using estimators. Various formulas exist for this purpose.
- Square Root of the Unbiased Variance Estimator: In my view, this is the simplest standard deviation estimator. It is calculated using the formula: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
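If you would rather not compute these by hand, Python’s standard statistics module covers all of them; a small sketch with made-up measurements in milliseconds:
import statistics
times_ms = [85.2, 87.1, 84.9, 90.3, 86.0, 85.7]  # made-up measurements
print(statistics.mean(times_ms))            # arithmetic mean
print(statistics.stdev(times_ms))           # square root of the unbiased variance estimator
print(statistics.quantiles(times_ms, n=4))  # first quartile, median, third quartile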
Summary
This brings us to the end of what I wanted to share about measuring the execution time of applications and algorithms. In my opinion, the most important takeaway is to use the right tool for the specific purpose of the measurement.
If the goal is to identify problematic areas in an application, profilers are the best choice. However, if the problematic areas are already known and the objective is to evaluate improvements, or if the focus is on performance research, specialized clocks that provide monotonic time should be used. Remember, calendar time is generally unreliable in these contexts.
Literature
- Profiling (computer programming), https://en.wikipedia.org/w/index.php?title=Profiling_(computer_programming)&oldid=1154740224 (last visited: 2023-12-26).
- time(1) - Linux man page, https://linux.die.net/man/1/time (last visited: 2023-12-26).
- Measure-Command (Microsoft.PowerShell.Utility) - PowerShell | Microsoft Learn, https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/measure-command?view=powershell-7.4 (last visited: 2023-12-26).
- clock_gettime(3) - Linux manual page, https://man7.org/linux/man-pages/man3/clock_gettime.3.html (last visited: 2023-12-26).
- Stopwatch Class (System.Diagnostics) | Microsoft Learn, https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.stopwatch?view=net-8.0 (last visited: 2023-12-26).
- A Primer for Web Performance Timing APIs, https://w3c.github.io/perf-timing-primer/ (last visited: 2023-12-26).
- Performance measurement APIs | Node.js v21.5.0 Documentation, https://nodejs.org/api/perf_hooks.html (last visited: 2023-12-26).
- Process | Node.js v21.5.0 Documentation, https://nodejs.org/api/process.html (last visited: 2023-12-26).
- time — Time access and conversions — Python 3.12.1 documentation, https://docs.python.org/3.12/library/time.html (last visited: 2023-12-26).
- timeit — Measure execution time of small code snippets — Python 3.12.1 documentation, https://docs.python.org/3.12/library/timeit.html (last visited: 2023-12-26).
- Built-in magic commands — IPython 8.19.0 documentation, https://ipython.readthedocs.io/en/stable/interactive/magics.html (last visited: 2023-12-26).
- High precision timing - Web APIs | MDN, https://developer.mozilla.org/en-US/docs/Web/API/Performance_API/High_precision_timing (last visited: 2023-12-26).
- Pandzic I., Clock Synchronization and Monotonic Clocks, https://inelpandzic.com/articles/clock-synchronization-and-monotonic-clocks/ (last visited: 2023-12-26).
- ntpd(8): Network Time Protocol daemon - Linux man page, https://linux.die.net/man/8/ntpd (last visited: 2023-12-26).