linux/samples/bpf/tracex3_kern.c

90 lines
2.1 KiB
C
Raw Normal View History

samples/bpf: Add IO latency analysis (iosnoop/heatmap) tool BPF C program attaches to blk_mq_start_request()/blk_update_request() kprobe events to calculate IO latency. For every completed block IO event it computes the time delta in nsec and records in a histogram map: map[log10(delta)*10]++ User space reads this histogram map every 2 seconds and prints it as a 'heatmap' using gray shades of text terminal. Black spaces have many events and white spaces have very few events. Left most space is the smallest latency, right most space is the largest latency in the range. Usage: $ sudo ./tracex3 and do 'sudo dd if=/dev/sda of=/dev/null' in other terminal. Observe IO latencies and how different activity (like 'make kernel') affects it. Similar experiments can be done for network transmit latencies, syscalls, etc. '-t' flag prints the heatmap using normal ascii characters: $ sudo ./tracex3 -t heatmap of IO latency # - many events with this latency - few events |1us |10us |100us |1ms |10ms |100ms |1s |10s *ooo. *O.#. # 221 . *# . # 125 .. .o#*.. # 55 . . . . .#O # 37 .# # 175 .#*. # 37 # # 199 . . *#*. # 55 *#..* # 42 # # 266 ...***Oo#*OO**o#* . # 629 # # 271 . .#o* o.*o* # 221 . . o* *#O.. # 50 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1427312966-8434-9-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-25 20:49:25 +01:00
/* Copyright (c) 2013-2015 PLUMgrid, http://plumgrid.com
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/version.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(long),
.value_size = sizeof(u64),
.max_entries = 4096,
};
/* kprobe is NOT a stable ABI. If kernel internals change this bpf+kprobe
* example will no longer be meaningful
*/
SEC("kprobe/blk_mq_start_request")
int bpf_prog1(struct pt_regs *ctx)
{
long rq = PT_REGS_PARM1(ctx);
samples/bpf: Add IO latency analysis (iosnoop/heatmap) tool BPF C program attaches to blk_mq_start_request()/blk_update_request() kprobe events to calculate IO latency. For every completed block IO event it computes the time delta in nsec and records in a histogram map: map[log10(delta)*10]++ User space reads this histogram map every 2 seconds and prints it as a 'heatmap' using gray shades of text terminal. Black spaces have many events and white spaces have very few events. Left most space is the smallest latency, right most space is the largest latency in the range. Usage: $ sudo ./tracex3 and do 'sudo dd if=/dev/sda of=/dev/null' in other terminal. Observe IO latencies and how different activity (like 'make kernel') affects it. Similar experiments can be done for network transmit latencies, syscalls, etc. '-t' flag prints the heatmap using normal ascii characters: $ sudo ./tracex3 -t heatmap of IO latency # - many events with this latency - few events |1us |10us |100us |1ms |10ms |100ms |1s |10s *ooo. *O.#. # 221 . *# . # 125 .. .o#*.. # 55 . . . . .#O # 37 .# # 175 .#*. # 37 # # 199 . . *#*. # 55 *#..* # 42 # # 266 ...***Oo#*OO**o#* . # 629 # # 271 . .#o* o.*o* # 221 . . o* *#O.. # 50 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1427312966-8434-9-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-25 20:49:25 +01:00
u64 val = bpf_ktime_get_ns();
bpf_map_update_elem(&my_map, &rq, &val, BPF_ANY);
return 0;
}
static unsigned int log2l(unsigned long long n)
{
#define S(k) if (n >= (1ull << k)) { i += k; n >>= k; }
int i = -(n == 0);
S(32); S(16); S(8); S(4); S(2); S(1);
return i;
#undef S
}
#define SLOTS 100
struct bpf_map_def SEC("maps") lat_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u64),
.max_entries = SLOTS,
};
SEC("kprobe/blk_update_request")
int bpf_prog2(struct pt_regs *ctx)
{
long rq = PT_REGS_PARM1(ctx);
samples/bpf: Add IO latency analysis (iosnoop/heatmap) tool BPF C program attaches to blk_mq_start_request()/blk_update_request() kprobe events to calculate IO latency. For every completed block IO event it computes the time delta in nsec and records in a histogram map: map[log10(delta)*10]++ User space reads this histogram map every 2 seconds and prints it as a 'heatmap' using gray shades of text terminal. Black spaces have many events and white spaces have very few events. Left most space is the smallest latency, right most space is the largest latency in the range. Usage: $ sudo ./tracex3 and do 'sudo dd if=/dev/sda of=/dev/null' in other terminal. Observe IO latencies and how different activity (like 'make kernel') affects it. Similar experiments can be done for network transmit latencies, syscalls, etc. '-t' flag prints the heatmap using normal ascii characters: $ sudo ./tracex3 -t heatmap of IO latency # - many events with this latency - few events |1us |10us |100us |1ms |10ms |100ms |1s |10s *ooo. *O.#. # 221 . *# . # 125 .. .o#*.. # 55 . . . . .#O # 37 .# # 175 .#*. # 37 # # 199 . . *#*. # 55 *#..* # 42 # # 266 ...***Oo#*OO**o#* . # 629 # # 271 . .#o* o.*o* # 221 . . o* *#O.. # 50 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1427312966-8434-9-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-25 20:49:25 +01:00
u64 *value, l, base;
u32 index;
value = bpf_map_lookup_elem(&my_map, &rq);
if (!value)
return 0;
u64 cur_time = bpf_ktime_get_ns();
u64 delta = cur_time - *value;
bpf_map_delete_elem(&my_map, &rq);
/* the lines below are computing index = log10(delta)*10
* using integer arithmetic
* index = 29 ~ 1 usec
* index = 59 ~ 1 msec
* index = 89 ~ 1 sec
* index = 99 ~ 10sec or more
* log10(x)*10 = log2(x)*10/log2(10) = log2(x)*3
*/
l = log2l(delta);
base = 1ll << l;
index = (l * 64 + (delta - base) * 64 / base) * 3 / 64;
if (index >= SLOTS)
index = SLOTS - 1;
value = bpf_map_lookup_elem(&lat_map, &index);
if (value)
__sync_fetch_and_add((long *)value, 1);
return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;