perf-report (1) - Linux Manuals

perf-report: Read perf.data (created by perf record) and display the profile

Command to display perf-report manual in Linux: $ man 1 perf-report

NAME

perf-report - Read perf.data (created by perf record) and display the profile

SYNOPSIS

perf report [-i <file> | --input=file]

DESCRIPTION

This command displays the performance counter profile information recorded via perf record.

OPTIONS

-i, --input=

: Input file name. (default: perf.data unless stdin is a fifo)

-v, --verbose

: Be more verbose. (show symbol address, etc)

-n, --show-nr-samples

: Show the number of samples for each symbol

--showcpuutilization

: Show sample percentage for different cpu modes.

-T, --threads

: Show per-thread event counters

-c, --comms=

: Only consider symbols in these comms. CSV that understands m[blue]file://filenamem[] entries. This option will affect the percentage of the overhead column. See --percentage for more info.

-d, --dsos=

: Only consider symbols in these dsos. CSV that understands m[blue]file://filenamem[] entries. This option will affect the percentage of the overhead column. See --percentage for more info.

-S, --symbols=

: Only consider these symbols. CSV that understands m[blue]file://filenamem[] entries. This option will affect the percentage of the overhead column. See --percentage for more info.

--symbol-filter=

: Only show symbols that match (partially) with this filter.

-U, --hide-unresolved

: Only display entries resolved to a symbol.

-s, --sort=

Sort histogram entries by given key(s) - multiple keys can be specified in CSV format. Following sort keys are available: pid, comm, dso, symbol, parent, cpu, srcline, weight, local_weight.

Each key has following meaning:

: • comm: command (name) of the task which can be read via /proc/<pid>/comm

: • pid: command and tid of the task

: • dso: name of library or module executed at the time of sample

: • symbol: name of function executed at the time of sample

: • parent: name of function matched to the parent regex filter. Unmatched entries are displayed as "[other]".

: • cpu: cpu number the task ran at the time of sample

: • srcline: filename and line number executed at the time of sample. The DWARF debugging info must be provided.

: • weight: Event specific weight, e.g. memory latency or transaction abort cost. This is the global weight.

: • local_weight: Local weight version of the weight above.

: • transaction: Transaction abort flags.

: • overhead: Overhead percentage of sample

: • overhead_sys: Overhead percentage of sample running in system mode

: • overhead_us: Overhead percentage of sample running in user mode

: • overhead_guest_sys: Overhead percentage of sample running in system mode on guest machine

: • overhead_guest_us: Overhead percentage of sample running in user mode on guest machine

: • sample: Number of sample

• period: Raw number of event count of sample

By default, comm, dso and symbol keys are used.
(i.e. --sort comm,dso,symbol)

If --branch-stack option is used, following sort keys are also
available:
dso_from, dso_to, symbol_from, symbol_to, mispredict.

: • dso_from: name of library or module branched from

: • dso_to: name of library or module branched to

: • symbol_from: name of function branched from

: • symbol_to: name of function branched to

: • mispredict: "N" for predicted branch, "Y" for mispredicted branch

: • in_tx: branch in TSX transaction

• abort: TSX transaction abort.

And default sort keys are changed to comm, dso_from, symbol_from, dso_to
and symbol_to, see '--branch-stack'.

-F, --fields=

Specify output field - multiple keys can be specified in CSV format. Following fields are available: overhead, overhead_sys, overhead_us, overhead_children, sample and period. Also it can contain any sort key(s).

By default, every sort keys not specified in -F will be appended
automatically.

If --mem-mode option is used, following sort keys are also available
(incompatible with --branch-stack):
symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.

: • symbol_daddr: name of data symbol being executed on at the time of sample

: • dso_daddr: name of library or module containing the data being executed on at the time of sample

: • locked: whether the bus was locked at the time of sample

: • tlb: type of tlb access for the data at the time of sample

: • mem: type of memory access for the data at the time of sample

: • snoop: type of snoop (if any) for the data at the time of sample

• dcacheline: the cacheline the data address is on at the time of sample

And default sort keys are changed to local_weight, mem, sym, dso,
symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.

-p, --parent=<regex>

: A regex filter to identify parent. The parent is a caller of this function and searched through the callchain, thus it requires callchain information recorded. The pattern is in the exteneded regex format and defaults to "^sys_|^do_page_fault", see --sort parent.

-x, --exclude-other

: Only display entries with parent-match.

-w, --column-widths=<width[,width...]>

: Force each column width to the provided list, for large terminal readability. 0 means no limit (default behavior).

-t, --field-separator=

: Use a special separator character and don't pad with spaces, replacing all occurrences of this separator in symbol names (and other output) with a . character, that thus it's the only non valid separator.

-D, --dump-raw-trace

: Dump raw trace in ASCII.

-g [type,min[,limit],order[,key]], --call-graph

Display call chains using type, min percent threshold, optional print limit and order. type can be either:

: • flat: single column, linear exposure of call chains.

: • graph: use a graph tree, displaying absolute overhead rates.

• fractal: like graph, but displays relative rates. Each branch of the tree is considered as a new profiled object.

order can be either:
- callee: callee based call graph.
- caller: inverted caller based call graph.

key can be:
- function: compare on functions
- address: compare on individual code addresses

Default: fractal,0.5,callee,function.

--children

: Accumulate callchain of children to parent entry so that then can show up in the output. The output will have a new "Children" column and will be sorted on the data. It requires callchains are recorded.

--max-stack

Set the stack depth limit when parsing the callchain, anything beyond the specified depth will be ignored. This is a trade-off between information loss and faster processing especially for workloads that can have a very long callchain stack.

Default: 127

-G, --inverted

: alias for inverted caller based call graph.

--ignore-callees=<regex>

: Ignore callees of the function(s) matching the given regex. This has the effect of collecting the callers of each such function into one place in the call-graph tree.

--pretty=<key>

: Pretty printing style. key: normal, raw

--stdio

: Use the stdio interface.

--tui

: Use the TUI interface, that is integrated with annotate and allows zooming into DSOs or threads, among other features. Use of --tui requires a tty, if one is not present, as when piping to other commands, the stdio interface is used.

--gtk

: Use the GTK2 interface.

-k, --vmlinux=<file>

: vmlinux pathname

--kallsyms=<file>

: kallsyms pathname

-m, --modules

: Load module symbols. WARNING: This should only be used with -k and a LIVE kernel.

-f, --force

: Don't complain, do it.

--symfs=<directory>

: Look for files with symbols relative to this directory.

-C, --cpu

: Only report samples for the list of CPUs provided. Multiple CPUs can be provided as a comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. Default is to report samples on all CPUs.

-M, --disassembler-style=

: Set disassembler style for objdump.

--source

: Interleave source code with assembly code. Enabled by default, disable with --no-source.

--asm-raw

: Show raw instruction encoding of assembly instructions.

--show-total-period

: Show a column with the sum of periods.

-I, --show-info

: Display extended information about the perf.data file. This adds information which may be very large and thus may clutter the display. It currently includes: cpu and numa topology of the host system.

-b, --branch-stack

: Use the addresses of sampled taken branches instead of the instruction address to build the histograms. To generate meaningful output, the perf.data file must have been obtained using perf record -b or perf record --branch-filter xxx where xxx is a branch filter option. perf report is able to auto-detect whether a perf.data file contains branch stacks and it will automatically switch to the branch view mode, unless --no-branch-stack is used.

--objdump=<path>

: Path to objdump binary.

--group

: Show event group information together.

--demangle

: Demangle symbol names to human readable form. It's enabled by default, disable with --no-demangle.

--demangle-kernel

: Demangle kernel symbol names to human readable form (for C++ kernels).

--mem-mode

: Use the data addresses of samples in addition to instruction addresses to build the histograms. To generate meaningful output, the perf.data file must have been obtained using perf record -d -W and using a special event -e cpu/mem-loads/ or -e cpu/mem-stores/. See perf mem for simpler access.

--percent-limit

: Do not show entries which have an overhead under that percent. (Default: 0).

--percentage

Determine how to display the overhead percentage of filtered entries. Filters can be applied by --comms, --dsos and/or --symbols options and Zoom operations on the TUI (thread, dso, etc).

"relative" means it's relative to filtered entries only so that the
sum of shown entries will be always 100%.  "absolute" means it retains
the original value before and after the filter is applied.

--header

: Show header information in the perf.data file. This includes various information like hostname, OS and perf version, cpu/mem info, perf command line, event list and so on. Currently only --stdio output supports this feature.

--header-only

: Show only perf.data header (forces --stdio).