{"id":5,"date":"2009-09-04T11:34:00","date_gmt":"2009-09-04T01:34:00","guid":{"rendered":"http:\/\/anton.ozlabs.org\/blog\/?p=5"},"modified":"2010-10-21T09:14:12","modified_gmt":"2010-10-20T23:14:12","slug":"using-performance-counters-for-linux","status":"publish","type":"post","link":"https:\/\/anton.ozlabs.org\/blog\/2009\/09\/04\/using-performance-counters-for-linux\/","title":{"rendered":"Using Performance Counters for Linux"},"content":{"rendered":"<p>The 2.6.31 Linux kernel will add a new performance counter subsystem called Performance Counters for Linux (or perfcounters for short). To use perfcounters, build a kernel with:<\/p>\n<pre>CONFIG_PERF_COUNTERS=y<\/pre>\n<p>You will need elfutils and optionally binutils (for C++ function demangling). On Debian or Ubuntu:<\/p>\n<pre>apt-get install libelf-dev binutils-dev<\/pre>\n<p>The tools must be built 64-bit on a 64-bit kernel. If you have a mixed 64-bit kernel\/32-bit userspace (like some amd64 and ppc64 distros), build a 64-bit version of elfutils. I usually don&#8217;t bother building the optional 64-bit binutils in this case and just put up with mangled C++ names (hint: feed them into c++filt to demangle them). Now build the perf tool:<\/p>\n<pre># cd tools\/perf\r\n# make<\/pre>\n<p>Now we can use the tools to debug a performance issue I was seeing in 2.6.31. A simple page fault <a href=\"http:\/\/ozlabs.org\/~anton\/junkcode\/page_fault.c\">microbenchmark<\/a> was showing scalability issues when running multiple copies at once. When looking into performance issues in the kernel, perf top is a good place to start. 
It gives a constantly updating kernel profile:<\/p>\n<pre># perf top<\/pre>\n<p style=\"text-align: center;\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-8 aligncenter\" title=\"perf top output\" src=\"http:\/\/anton.ozlabs.org\/wp-uploads\/2009\/08\/perf_top.png\" alt=\"perf top output\" width=\"667\" height=\"414\" srcset=\"https:\/\/anton.ozlabs.org\/wp-uploads\/2009\/08\/perf_top.png 667w, https:\/\/anton.ozlabs.org\/wp-uploads\/2009\/08\/perf_top-300x186.png 300w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/p>\n<p>We are spending over 70% of total time in _spin_lock, so we definitely have an issue that warrants further investigation. To get a more detailed view we can use the perf record tool. The -g option records backtraces, which lets us look at the call graphs responsible for the performance issue:<\/p>\n<pre># perf record -g .\/pagefault<\/pre>\n<p>You can either let the profiled application run to completion or, since this microbenchmark will run forever, just wait 10 seconds and hit Ctrl-C. Two more perf record options you will find useful are -p, to profile a running process, and -a, to profile the entire system.<\/p>\n<p>Now we have a perf.data output file. 
I like to start with a high level summary of the recorded data first:<\/p>\n<pre># perf report -g none<\/pre>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-18\" title=\"perf report\" src=\"http:\/\/anton.ozlabs.org\/wp-uploads\/2009\/08\/perf_report.png\" alt=\"perf report\" width=\"747\" height=\"414\" srcset=\"https:\/\/anton.ozlabs.org\/wp-uploads\/2009\/08\/perf_report.png 747w, https:\/\/anton.ozlabs.org\/wp-uploads\/2009\/08\/perf_report-300x166.png 300w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/p>\n<p>The perf report tool gives us some important information that perf top does not: it shows the task associated with each function, and it also profiles userspace.<\/p>\n<p>Now that we have confirmed that our trace has captured the _spin_lock issue, we can look at the call graph data to see which path is causing the problem:<\/p>\n<pre># perf report<\/pre>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-20\" title=\"perf_report_callgraph\" src=\"http:\/\/anton.ozlabs.org\/wp-uploads\/2009\/08\/perf_report_callgraph.png\" alt=\"perf_report_callgraph\" width=\"979\" height=\"782\" srcset=\"https:\/\/anton.ozlabs.org\/wp-uploads\/2009\/08\/perf_report_callgraph.png 979w, https:\/\/anton.ozlabs.org\/wp-uploads\/2009\/08\/perf_report_callgraph-300x239.png 300w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/p>\n<p>At this point it&#8217;s clear that the problem is a spin lock in the memory cgroup code. In order to keep accurate memory usage statistics, the current code uses a global spinlock. 
One way we can fix this is to use percpu_counters, which Balbir has been working on <a href=\"http:\/\/lwn.net\/Articles\/346304\/\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The 2.6.31 Linux kernel will add a new performance counter subsystem called Performance Counters for Linux (or perfcounters for short). To use perfcounters, build a kernel with: CONFIG_PERF_COUNTERS=y You will need elfutils and optionally binutils (for c++ function unmangling). On debian or ubuntu: apt-get install libelf-dev binutils-dev The tools must be built 64bit on a &hellip; <a href=\"https:\/\/anton.ozlabs.org\/blog\/2009\/09\/04\/using-performance-counters-for-linux\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Using Performance Counters for Linux&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[5,4],"_links":{"self":[{"href":"https:\/\/anton.ozlabs.org\/blog\/wp-json\/wp\/v2\/posts\/5"}],"collection":[{"href":"https:\/\/anton.ozlabs.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/anton.ozlabs.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/anton.ozlabs.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/anton.ozlabs.org\/blog\/wp-json\/wp\/v2\/comments?post=5"}],"version-history":[{"count":57,"href":"https:\/\/anton.ozlabs.org\/blog\/wp-json\/wp\/v2\/posts\/5\/revisions"}],"predecessor-version":[{"id":65,"href":"https:\/\/anton.ozlabs.org\/blog\/wp-json\/wp\/v2\/posts\/5\/revisions\/65"}],"wp:attachment":[{"href":"https:\/\/anton.ozlabs.org\/blog\/wp-json\/wp\/v2\/media?parent=5"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/anton.ozlabs.org\/blog\/wp-json\/wp\/v2\/categories?post=5"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/anton.ozla
bs.org\/blog\/wp-json\/wp\/v2\/tags?post=5"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}