Introducing etrace – a multi-purpose application profiling tool
by Ian Johnson on 30 October 2020
These days, the internal workings of Linux applications involve many different moving parts. Sometimes, it can be rather difficult to debug them when things go wrong or run slower than expected. Tracing an application’s execution is one way of understanding potential issues without diving into the source code. To this end, we wrote an app-tracing tool called etrace, designed to detect performance bottlenecks and runtime issues in snaps.
In this article, we will be taking a look at etrace with an overview of the basic functionality of etrace, and highlight its usage through several representative examples.
Etrace at a glance
Etrace is a generic tracing application, useful for three broad measurement and debugging purposes:
- How much time it takes for an application to display a (graphical/UI) window on a screen.
- The sequence of tasks created and executed by the main program during its runtime.
- The list of files accessed during a program’s execution.
These metrics can be used to debug potential problems in snaps, understand what a snap is trying to do, or find performance bottlenecks in a snap. Of course, it also works with native Linux packages or any executable program as well, albeit with slightly reduced out of the box functionality – it doesn’t reinstall the native package for example – but can still trace the application and measure how long it takes to display a window.
Basic usage
Etrace is available as a snap – first we have to install it. Because etrace is used to run arbitrary programs, including other snaps and even traditional linux packages, it needs system-wide permissions via classic confinement, which can be accepted by using the –classic flag when running the below command. To install etrace:
snap install etrace --candidate --classic
Note that currently, etrace only works with X11 systems.
The first use case of etrace is to measure how long it takes for a graphical snap application to display a window on a screen. We included this functionality in etrace partly because there were not any sufficiently effective tools or options to achieve this available out there. With etrace, it is very easy to do this.
Let’s start out with a simple snap, gnome-calculator, and run it in a loop 10 times to see how much time this execution takes. Please note you need to have gnome-calculator installed – snap install gnome-calculator. Here we use the –no-trace option because we don’t want the full tracing stack, we just want etrace to measure how long it takes to launch – we will get into the full tracing capabilities later on.
$ etrace --repeat=10 exec --use-snap-run --no-trace gnome-calculator --cmd-stderr=/dev/null
Total startup time: 1.531152957s
Total startup time: 513.948576ms
Total startup time: 512.980061ms
Total startup time: 515.576753ms
Total startup time: 508.354472ms
Total startup time: 515.734329ms
Total startup time: 508.414271ms
Total startup time: 514.258788ms
Total startup time: 508.407346ms
Total startup time: 511.950964ms
When you run this command, you should see the gnome-calculator window show up very briefly 10 times. If this is the first time you have run gnome-calculator on this boot, it may be slower for reasons we explain in another blog post. The numbers shown indicate elapsed time it takes for the application to display a window. For non-graphical applications, it will be the time it takes before the application exits by itself.
This use case itself is great if you want to check snap startup time. Snap developers can use this first etrace functionality to help optimize their startup times and compare different application settings. Etrace also supports JSON output for integration with other scripts as well as CI, build systems for testing and metrics collection/data gathering.
Useful Snap Options
As mentioned earlier, snap applications may be slower to start on the very first execution due to system caches not being set up. Re-running the application multiple times will usually erase this one-time cost, but when measuring how long it takes for a snap to start, we want to specifically look at this worst case scenario when data is not cached in memory and the system has to do the maximum amount of work before the user sees the application. To accommodate this frequent use-case with etrace, we have two options which make this much easier. The first is the ability to reinstall the snap:
--reinstall-snap
This option will remove the snap (and thus delete its root-owned data in system directories, but not user-data), and then reinstall the snap. This clears out system caches like fonts, internal databases, and other data that is typically owned by root, but does not delete data that is specific to a user, such as browsing history.
The second option allows you to handle user data:
--clean-snap-user-data
This flag will create a snapshot of the snap before deleting the user data. As mentioned, an example of user data would be a Web browser profile.
With both of these flags specified in the etrace command we clear more of the system’s cache and thus see more consistent startup times for gnome-calculator on the same machine:
$ etrace --repeat=10 exec --use-snap-run --no-trace gnome-calculator --reinstall-snap --clean-snap-user-data --cmd-stderr=/dev/null
Total startup time: 1.524554867s
Total startup time: 1.531155857s
Total startup time: 1.526059853s
Total startup time: 1.533270808s
Total startup time: 1.535193187s
Total startup time: 1.538566123s
Total startup time: 1.529733654s
Total startup time: 1.528315879s
Total startup time: 1.53689303s
Total startup time: 1.023516385s
Here, we see that the startup times are slower than before as well as more consistent, due to the elimination of the caching that the system was doing in between runs. It is also worth noting that in the previous example, the startup was much slower than all the other ones because it was the first execution of gnome-calculator in this desktop session.
If your application requires additional things to be done before/after starting it up either to test a specific scenario or to clear out more caching that might be taking place, etrace supports running generic scripts via the –prepare-script and –restore-script options as well as –prepare-script-args and –restore-script-args.
Tracing task execution
The second use case for etrace is to check what tasks or processes a snap runs during its execution. This is useful to examine the snap’s behavior and troubleshoot snaps that do run well, fail to launch or provide no meaningful output.
While etrace normally will automatically close the graphical window created by the program, sometimes it’s necessary to not have this behavior, for example to test a specific action in the application or when working with a command line application which does not create a window. For these use cases, there is the –no-window-wait option to instruct etrace to wait for the program to exit by itself.
Here we use etrace exec with full tracing turned on (by omitting the –no-trace option):
$ etrace exec telegram-desktop --no-window-wait
56 exec calls during snap run:
Start Stop Elapsed Exec
0 95417 95.417022ms /snap/bin/telegram-desktop
14991 20267 5.276918ms /usr/lib/snapd/snap-seccomp
38522 39649 1.127004ms /usr/lib/snapd/snap-device-helper
40294 41350 1.055955ms /usr/lib/snapd/snap-device-helper
41984 43035 1.051902ms /usr/lib/snapd/snap-device-helper
43688 44741 1.053094ms /usr/lib/snapd/snap-device-helper
...
82329 83477 1.148939ms /usr/lib/snapd/snap-device-helper
88778 93460 4.682064ms snap-update-ns
95417 100613 5.196094ms /usr/lib/snapd/snap-exec
100613 212749 112.13684ms /snap/telegram-desktop/1708/bin/desktop-launch
105275 107645 2.36988ms /usr/bin/date
115309 118616 3.30615ms /usr/bin/getent
...
120239 122471 2.232074ms /usr/bin/md5sum
192968 196316 3.347873ms /usr/bin/head
199725 203120 3.395795ms /usr/bin/ln
204533 207864 3.331899ms /usr/bin/rm
208199 211477 3.277063ms /usr/bin/ln
212749 6000720 5.787970066s /snap/telegram-desktop/1708/usr/bin/telegram-desktop
Total time: 6.000720024s
Total startup time: 6.008373172s
This is a table showing the total set of tasks that were run during the program’s full execution, including the start time and end time (in microseconds) as well as the total elapsed time for the task execution. This may help in debugging problems with broken snaps, for example an intermediate shell script may be misbehaving before the final program is executed, and this would be apparent to the snap developer who has an expectation of what the the snap should be doing in the normal case.
Tracing file accesses
The third use-case of etrace is to see what files are accessed by a snap at runtime. For example, classic snaps, since they are not confined, may access any file on a user’s system and thus can sometimes run into bugs where they are reading the ‘wrong’ file from the host and not those shipped with the snap itself, the way that snaps are designed to work. This third functionality is also helpful if you’re trying to understand the full scope of a program’s execution. It can also be used to get an explicit list of files that a graphical snap tries to access before displaying a window.
To do this, use the file subcommand. Here we will look at a simple program, hello-world, which accesses a small number of files (many snaps end up accessing hundreds of files):
$ ./etrace file --use-snap-run hello-world --no-window-wait
Hello World!
6 files accessed during snap run:
Filename Size (bytes)
/dev/pts/0 0
/etc/ld.so.cache 157250
/home/user/ 4096
/lib/x86_64-linux-gnu/libc-2.23.so
/lib/x86_64-linux-gnu/libc.so.6 2030544
/snap/hello-world/29/bin/echo 31
Total startup time: 87.038215ms
This will show us the files that were accessed as well as their sizes. There are options to also show what program accessed which file, filter the files by their location, or filter the files by what programs accessed those files, similar to the lsof command. Etrace also supports a rich JSON output format with details such as when the file was accessed, what syscall accessed the file and others.
Summary
Hopefully you find this blog post and etrace useful the next time you need to measure how long snaps take to startup, or debug problems with your snap applications, or even just peek at what is going on under the hood of your favorite snap.
Etrace is undergoing active development and welcomes contributions! We will be posting more about etrace in the coming future with some more use cases and stories. As always, come join us on the snapcraft forum if you have comments or suggestions.
Photo by Steve Harvey on Unsplash.