Linux has had dynamic trace functionality for a long time in the form of kprobes. Kprobes provides a kernel API for placing probes on kernel instructions and they can be exploited directly via a kernel module, or via systemtap which provides a high level scripting language. Dynamic tracing has a number of advantages – it has zero overhead when disabled and probes can be placed on almost any instruction in the kernel, not just where a kernel developer thinks you should.
All this flexibility does have some downsides. An executed kprobe has a significant overhead since it uses breakpoints and exception handlers. Having said that, there are patches that avoid the breakpoint and instead branch directly to the handler. Another issue is probe placement; kprobes are easily placed at function entry and exit but if you need to probe inside a function or probe local variables then you really need systemtap and a kernel compiled with CONFIG_DEBUG_INFO. On the other hand a static tracepoint can be placed anywhere in a function and can be passed any important local variables. Various static tracepoint patches have been available for Linux, but as of 2.6.32 a complete implementation is in mainline.
Adding a static tracepoint is very simple, an example can be found here. In this case I am adding to an existing trace group (irq), so I only need the tracepoint definitions and the tracepoints themselves. An explanation of the 5 parts of a tracepoint definition can be found in linux/samples/trace_events/trace-events-sample.h. For more complicated scenarios, refer to the files in linux/samples/trace_events/
Using static tracepoints
There are only a few steps to make use of static tracepoints. First ensure that debugfs is mounted. Most distros mount it on /sys/kernel/debug:
# mount | grep debugfs debugfs on /sys/kernel/debug type debugfs (rw)
A list of available tracepoints can be found in tracing/available_events:
# cat /sys/kernel/debug/tracing/available_events skb:skb_copy_datagram_iovec skb:kfree_skb block:block_rq_remap block:block_remap block:block_split block:block_unplug_io block:block_unplug_timer ...
Since we added our tracepoints to the irq group, we can find them in tracing/events/irq:
# ls /sys/kernel/debug/tracing/events/irq/ enable irq_handler_entry softirq_entry tasklet_entry filter irq_handler_exit softirq_exit tasklet_exit
Enable the tasklet tracepoints:
# echo 1 > /sys/kernel/debug/tracing/events/irq/tasklet_entry/enable # echo 1 > /sys/kernel/debug/tracing/events/irq/tasklet_exit/enable
And the output is available in the trace buffer:
# cat /sys/kernel/debug/tracing/trace # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | -0 [000] 327.349213: tasklet_entry: func=.rpavscsi_task -0 [000] 327.349217: tasklet_exit: func=.rpavscsi_task
When finished, we can disable the tracepoints. There are enable files at all levels of the hierarchy, so we can disable all tracepoints in one go:
# echo 0 > /sys/kernel/debug/tracing/events/enable
Using static tracepoints in kernel modules
Kernel modules can also make use of static tracepoints. A simple module that hooks the tasklet_entry tracepoint and printks the function name of the tasklet might look like (I’ve called it tracepoint-example.c):
#include <linux/module.h> #include <trace/events/irq.h> static void probe_tasklet_entry(struct tasklet_struct *t) { printk("tasklet_entry %pf\n", t->func); } static int __init trace_init(void) { WARN_ON(register_trace_tasklet_entry(probe_tasklet_entry)); return 0; } static void __exit trace_exit(void) { unregister_trace_tasklet_entry(probe_tasklet_entry); } module_init(trace_init) module_exit(trace_exit) MODULE_LICENSE("GPL");
If you are wondering, %pf is a printk formatter trick to pretty print a function name so you don’t have to go searching for the address in System.map.
Here is a Makefile to go with it:
obj-m := tracepoint-example.o KDIR := /lib/modules/$(shell uname -r)/build PWD := $(shell pwd) default: $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules