Linux Static Tracepoints

Linux has had dynamic trace functionality for a long time in the form of kprobes. Kprobes provides a kernel API for placing probes on kernel instructions and they can be exploited directly via a kernel module, or via systemtap which provides a high level scripting language. Dynamic tracing has a number of advantages – it has zero overhead when disabled and probes can be placed on almost any instruction in the kernel, not just where a kernel developer thinks you should.

All this flexibility does have some downsides. An executed kprobe has a significant overhead since it uses breakpoints and exception handlers. Having said that, there are patches that avoid the breakpoint and instead branch directly to the handler. Another issue is probe placement; kprobes are easily placed at function entry and exit but if you need to probe inside a function or probe local variables then you really need systemtap and a kernel compiled with CONFIG_DEBUG_INFO. On the other hand a static tracepoint can be placed anywhere in a function and can be passed any important local variables. Various static tracepoint patches have been available for Linux, but as of 2.6.32 a complete implementation is in mainline.

Adding a static tracepoint is very simple, an example can be found here. In this case I am adding to an existing trace group (irq), so I only need the tracepoint definitions and the tracepoints themselves. An explanation of the 5 parts of a tracepoint definition can be found in linux/samples/trace_events/trace-events-sample.h. For more complicated scenarios, refer to the files in linux/samples/trace_events/

Using static tracepoints

There are only a few steps to make use of static tracepoints. First ensure that debugfs is mounted. Most distros mount it on /sys/kernel/debug:

# mount | grep debugfs

debugfs on /sys/kernel/debug type debugfs (rw)

A list of available tracepoints can be found in tracing/available_events:

# cat /sys/kernel/debug/tracing/available_events

skb:skb_copy_datagram_iovec
skb:kfree_skb
block:block_rq_remap
block:block_remap
block:block_split
block:block_unplug_io
block:block_unplug_timer
...

Since we added our tracepoints to the irq group, we can find them in tracing/events/irq:

# ls /sys/kernel/debug/tracing/events/irq/

enable  irq_handler_entry  softirq_entry  tasklet_entry
filter  irq_handler_exit   softirq_exit   tasklet_exit

Enable the tasklet tracepoints:

# echo 1 >  /sys/kernel/debug/tracing/events/irq/tasklet_entry/enable
# echo 1 >  /sys/kernel/debug/tracing/events/irq/tasklet_exit/enable

And the output is available in the trace buffer:

# cat /sys/kernel/debug/tracing/trace

# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
-0     [000]   327.349213: tasklet_entry: func=.rpavscsi_task
-0     [000]   327.349217: tasklet_exit: func=.rpavscsi_task

When finished, we can disable the tracepoints. There are enable files at all levels of the hierarchy, so we can disable all tracepoints in one go:

# echo 0 > /sys/kernel/debug/tracing/events/enable

Using static tracepoints in kernel modules

Kernel modules can also make use of static tracepoints. A simple module that hooks the tasklet_entry tracepoint and printks the function name of the tasklet might look like (I’ve called it tracepoint-example.c):

#include <linux/module.h>
#include <trace/events/irq.h>

static void probe_tasklet_entry(struct tasklet_struct *t)
{
        printk("tasklet_entry %pf\n", t->func);
}

static int __init trace_init(void)
{
        WARN_ON(register_trace_tasklet_entry(probe_tasklet_entry));
        return 0;
}

static void __exit trace_exit(void)
{
        unregister_trace_tasklet_entry(probe_tasklet_entry);
}

module_init(trace_init)
module_exit(trace_exit)
MODULE_LICENSE("GPL");

If you are wondering, %pf is a printk formatter trick to pretty print a function name so you don’t have to go searching for the address in System.map.

Here is a Makefile to go with it:

obj-m := tracepoint-example.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
default:
        $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules