async-profiler / async-profiler Goto Github PK

Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

Home Page: https://github.com/async-profiler/async-profiler/releases

License: Apache License 2.0

C++ 76.79% Makefile 0.74% Shell 0.52% C 4.89% Java 14.86% HTML 2.20%

async-profiler's Introduction

async-profiler

This project is a low overhead sampling profiler for Java that does not suffer from Safepoint bias problem. It features HotSpot-specific APIs to collect stack traces and to track memory allocations. The profiler works with OpenJDK and other Java runtimes based on the HotSpot JVM.

async-profiler can trace the following kinds of events:

CPU cycles
Hardware and Software performance counters like cache misses, branch misses, page faults, context switches etc.
Allocations in Java Heap
Contented lock attempts, including both Java object monitors and ReentrantLocks

See our Wiki or 3 hours playlist to learn about all features.

Download

Current release (3.0):

Linux x64: async-profiler-3.0-linux-x64.tar.gz
Linux arm64: async-profiler-3.0-linux-arm64.tar.gz
macOS x64/arm64: async-profiler-3.0-macos.zip
Converters between profile formats: converter.jar
(JFR to Flame Graph, JFR to pprof, collapsed stacks to Flame Graph)

Previous releases

async-profiler also comes bundled with IntelliJ IDEA Ultimate 2018.3 and later.
For more information refer to IntelliJ IDEA documentation.

Supported platforms

	Officially maintained builds	Other available ports
Linux	x64, arm64	x86, arm32, ppc64le, riscv64, loongarch64
macOS	x64, arm64

CPU profiling

In this mode profiler collects stack trace samples that include Java methods, native calls, JVM code and kernel functions.

The general approach is receiving call stacks generated by perf_events and matching them up with call stacks generated by AsyncGetCallTrace, in order to produce an accurate profile of both Java and native code. Additionally, async-profiler provides a workaround to recover stack traces in some corner cases where AsyncGetCallTrace fails.

This approach has the following advantages compared to using perf_events directly with a Java agent that translates addresses to Java method names:

Does not require -XX:+PreserveFramePointer, which introduces performance overhead that can be sometimes as high as 10%.
Does not require generating a map file for translating Java code addresses to method names.
Displays interpreter frames.
Does not produce large intermediate files (perf.data) for further processing in user space scripts.

If you wish to resolve frames within libjvm, the debug symbols are required.

ALLOCATION profiling

The profiler can be configured to collect call sites where the largest amount of heap memory is allocated.

async-profiler does not use intrusive techniques like bytecode instrumentation or expensive DTrace probes which have significant performance impact. It also does not affect Escape Analysis or prevent from JIT optimizations like allocation elimination. Only actual heap allocations are measured.

The profiler features TLAB-driven sampling. It relies on HotSpot-specific callbacks to receive two kinds of notifications:

when an object is allocated in a newly created TLAB;
when an object is allocated on a slow path outside TLAB.

Sampling interval can be adjusted with --alloc option. For example, --alloc 500k will take one sample after 500 KB of allocated space on average. Prior to JDK 11, intervals less than TLAB size will not take effect.

Installing Debug Symbols

Prior to JDK 11, the allocation profiler required HotSpot debug symbols. Some OpenJDK distributions (Amazon Corretto, Liberica JDK, Azul Zulu) already have them embedded in libjvm.so, other OpenJDK builds typically provide debug symbols in a separate package. For example, to install OpenJDK debug symbols on Debian / Ubuntu, run:

# apt install openjdk-17-dbg

(replace 17 with the desired version of JDK).

On CentOS, RHEL and some other RPM-based distributions, this could be done with debuginfo-install utility:

# debuginfo-install java-1.8.0-openjdk

On Gentoo the icedtea OpenJDK package can be built with the per-package setting FEATURES="nostrip" to retain symbols.

The gdb tool can be used to verify if debug symbols are properly installed for the libjvm library. For example, on Linux:

$ gdb $JAVA_HOME/lib/server/libjvm.so -ex 'info address UseG1GC'

This command's output will either contain Symbol "UseG1GC" is at 0xxxxx or No symbol "UseG1GC" in current context.

Wall-clock profiling

-e wall option tells async-profiler to sample all threads equally every given period of time regardless of thread status: Running, Sleeping or Blocked. For instance, this can be helpful when profiling application start-up time.

Wall-clock profiler is most useful in per-thread mode: -t.

Example: asprof -e wall -t -i 5ms -f result.html 8983

Java method profiling

-e ClassName.methodName option instruments the given Java method in order to record all invocations of this method with the stack traces.

Example: -e java.util.Properties.getProperty will profile all places where getProperty method is called from.

Only non-native Java methods are supported. To profile a native method, use hardware breakpoint event instead, e.g. -e Java_java_lang_Throwable_fillInStackTrace

Be aware that if you attach async-profiler at runtime, the first instrumentation of a non-native Java method may cause the deoptimization of all compiled methods. The subsequent instrumentation flushes only the dependent code.

The massive CodeCache flush doesn't occur if attaching async-profiler as an agent.

Here are some useful native methods that you may want to profile:

G1CollectedHeap::humongous_obj_allocate - trace humongous allocations of the G1 GC,
JVM_StartThread - trace creation of new Java threads,
Java_java_lang_ClassLoader_defineClass1 - trace class loading.

Building

Build status:

Make sure the JAVA_HOME environment variable points to your JDK installation, and then run make. GCC or Clang is required. After building, the profiler binaries will be in the build subdirectory.

Basic Usage

As of Linux 4.6, capturing kernel call stacks using perf_events from a non-root process requires setting two runtime variables. You can set them using sysctl or as follows:

# sysctl kernel.perf_event_paranoid=1
# sysctl kernel.kptr_restrict=0

async-profiler works in the context of the target Java application, i.e. it runs as an agent in the process being profiled. asprof is a tool to attach and control the agent.

A typical workflow would be to launch your Java application, attach the agent and start profiling, exercise your performance scenario, and then stop profiling. The agent's output, including the profiling results, will be displayed on the console where you've started asprof.

Example:

$ jps
9234 Jps
8983 Computey
$ asprof start 8983
$ asprof stop 8983

The following may be used in lieu of the pid (8983):

The keyword jps, which will use the most recently launched Java process.
The application name as it appears in the jps output: e.g. Computey

Alternatively, you may specify -d (duration) argument to profile the application for a fixed period of time with a single command.

$ asprof -d 30 8983

By default, the profiling frequency is 100Hz (every 10ms of CPU time). Here is a sample of the output printed to the Java application's terminal:

--- Execution profile ---
Total samples:           687
Unknown (native):        1 (0.15%)

--- 6790000000 (98.84%) ns, 679 samples
  [ 0] Primes.isPrime
  [ 1] Primes.primesThread
  [ 2] Primes.access$000
  [ 3] Primes$1.run
  [ 4] java.lang.Thread.run

... a lot of output omitted for brevity ...

          ns  percent  samples  top
  ----------  -------  -------  ---
  6790000000   98.84%      679  Primes.isPrime
    40000000    0.58%        4  __do_softirq

... more output omitted ...

This indicates that the hottest method was Primes.isPrime, and the hottest call stack leading to it comes from Primes.primesThread.

Launching as an Agent

If you need to profile some code as soon as the JVM starts up, instead of using the asprof, it is possible to attach async-profiler as an agent on the command line. For example:

$ java -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=profile.html ...

Agent library is configured through the JVMTI argument interface. The format of the arguments string is described in the source code. asprof actually converts command line arguments to that format.

For instance, -e wall is converted to event=wall, -f profile.html is converted to file=profile.html, and so on. However, some arguments are processed directly by asprof. E.g. -d 5 results in 3 actions: attaching profiler agent with start command, sleeping for 5 seconds, and then attaching the agent again with stop command.

Multiple events

It is possible to profile CPU, allocations, and locks at the same time. Instead of CPU, you may choose any other execution event: wall-clock, perf event, tracepoint, Java method, etc.

The only output format that supports multiple events together is JFR. The recording will contain the following event types:

jdk.ExecutionSample
jdk.ObjectAllocationInNewTLAB (alloc)
jdk.ObjectAllocationOutsideTLAB (alloc)
jdk.JavaMonitorEnter (lock)
jdk.ThreadPark (lock)

To start profiling cpu + allocations + locks together, specify

asprof -e cpu,alloc,lock -f profile.jfr ...

or use --alloc and --lock parameters with the desired threshold:

asprof -e cpu --alloc 2m --lock 10ms -f profile.jfr ...

The same, when starting profiler as an agent:

-agentpath:/path/to/libasyncProfiler.so=start,event=cpu,alloc=2m,lock=10ms,file=profile.jfr

Flame Graph visualization

async-profiler provides out-of-the-box Flame Graph support. Specify -o flamegraph argument to dump profiling results as an interactive HTML Flame Graph. Also, Flame Graph output format will be chosen automatically if the target filename ends with .html.

$ jps
9234 Jps
8983 Computey
$ asprof -d 30 -f /tmp/flamegraph.html 8983

Profiler Options

asprof command-line options.

start - starts profiling in semi-automatic mode, i.e. profiler will run until stop command is explicitly called.
resume - starts or resumes earlier profiling session that has been stopped. All the collected data remains valid. The profiling options are not preserved between sessions, and should be specified again.
stop - stops profiling and prints the report.
dump - dump collected data without stopping profiling session.
check - check if the specified profiling event is available.
status - prints profiling status: whether profiler is active and for how long.
meminfo - prints used memory statistics.
list - show the list of profiling events available for the target process (if PID is specified) or for the default JVM.
-d N - the profiling duration, in seconds. If no start, resume, stop or status option is given, the profiler will run for the specified period of time and then automatically stop.
Example: asprof -d 30 8983
-e event - the profiling event: cpu, alloc, lock, cache-misses etc. Use list to see the complete list of available events.

In allocation profiling mode the top frame of every call trace is the class of the allocated object, and the counter is the heap pressure (the total size of allocated TLABs or objects outside TLAB).

In lock profiling mode the top frame is the class of lock/monitor, and the counter is number of nanoseconds it took to enter this lock/monitor.

Two special event types are supported on Linux: hardware breakpoints and kernel tracepoints:
- -e mem:<func>[:rwx] sets read/write/exec breakpoint at function <func>. The format of mem event is the same as in perf-record. Execution breakpoints can be also specified by the function name, e.g. -e malloc will trace all calls of native malloc function.
- -e trace:<id> sets a kernel tracepoint. It is possible to specify tracepoint symbolic name, e.g. -e syscalls:sys_enter_open will trace all open syscalls.
-i N - sets the profiling interval in nanoseconds or in other units, if N is followed by ms (for milliseconds), us (for microseconds), or s (for seconds). Only CPU active time is counted. No samples are collected while CPU is idle. The default is 10000000 (10ms).
Example: asprof -i 500us 8983
--alloc N - allocation profiling interval in bytes or in other units, if N is followed by k (kilobytes), m (megabytes), or g (gigabytes).
--live - retain allocation samples with live objects only (object that have not been collected by the end of profiling session). Useful for finding Java heap memory leaks.
--lock N - lock profiling threshold in nanoseconds (or other units). In lock profiling mode, record contended locks that the JVM has waited for longer than the specified duration.
-j N - sets the maximum stack depth. The default is 2048.
Example: asprof -j 30 8983
-t - profile threads separately. Each stack trace will end with a frame that denotes a single thread.
Example: asprof -t 8983
-s - print simple class names instead of FQN.
-n - normalize names of hidden classes / lambdas.
-g - print method signatures.
-a - annotate JIT compiled methods with _[j], inlined methods with _[i], interpreted methods with _[0] and C1 compiled methods with _[1].
-l - prepend library names to symbols, e.g. libjvm.so`JVM_DefineClassWithSource.
-o fmt - specifies what information to dump when profiling ends. fmt can be one of the following options:
- traces[=N] - dump call traces (at most N samples);
- flat[=N] - dump flat profile (top N hot methods);
  can be combined with traces, e.g. traces=200,flat=200
- jfr - dump events in Java Flight Recorder format readable by Java Mission Control. This does not require JDK commercial features to be enabled.
- collapsed - dump collapsed call traces in the format used by FlameGraph script. This is a collection of call stacks, where each line is a semicolon separated list of frames followed by a counter.
- flamegraph - produce Flame Graph in HTML format.
- tree - produce Call Tree in HTML format.
  --reverse option will generate backtrace view.
--total - count the total value of the collected metric instead of the number of samples, e.g. total allocation size.
--chunksize N, --chunktime N - approximate size and time limits for a single JFR chunk. A new chunk will be started whenever either limit is reached. The default chunksize is 100MB, and the default chunktime is 1 hour.
Example: asprof -f profile.jfr --chunksize 100m --chunktime 1h 8983
-I include, -X exclude - filter stack traces by the given pattern(s). -I defines the name pattern that must be present in the stack traces, while -X is the pattern that must not occur in any of stack traces in the output. -I and -X options can be specified multiple times. A pattern may begin or end with a star * that denotes any (possibly empty) sequence of characters.
Example: asprof -I 'Primes.*' -I 'java/*' -X '*Unsafe.park*' 8983
-L level - log level: debug, info, warn, error or none.
-F features - comma separated list of HotSpot-specific features to include in stack traces. Supported features are:
- vtable - display targets of megamorphic virtual calls as an extra frame on top of vtable stub or itable stub.
- comptask - display current compilation task (a Java method being compiled) in a JIT compiler stack trace.
--title TITLE, --minwidth PERCENT, --reverse - FlameGraph parameters.
Example: asprof -f profile.html --title "Sample CPU profile" --minwidth 0.5 8983
-f FILENAME - the file name to dump the profile information to.
%p in the file name is expanded to the PID of the target JVM;
%t - to the timestamp;
%n{MAX} - to the sequence number;
%{ENV} - to the value of the given environment variable.
Example: asprof -o collapsed -f /tmp/traces-%t.txt 8983
--loop TIME - run profiler in a loop (continuous profiling). The argument is either a clock time (hh:mm:ss) or a loop duration in seconds, minutes, hours, or days. Make sure the filename includes a timestamp pattern, or the output will be overwritten on each iteration.
Example: asprof --loop 1h -f /var/log/profile-%t.jfr 8983
--all-user - include only user-mode events. This option is helpful when kernel profiling is restricted by perf_event_paranoid settings.
--sched - group threads by Linux-specific scheduling policy: BATCH/IDLE/OTHER.
--cstack MODE - how to walk native frames (C stack). Possible modes are fp (Frame Pointer), dwarf (DWARF unwind info), lbr (Last Branch Record, available on Haswell since Linux 4.1), vm (HotSpot VM Structs) and no (do not collect C stack).

By default, C stack is shown in cpu, ctimer, wall-clock and perf-events profiles. Java-level events like alloc and lock collect only Java stack.
--signal NUM - use alternative signal for cpu or wall clock profiling. To change both signals, specify two numbers separated by a slash: --signal SIGCPU/SIGWALL.
--clock SOURCE - clock source for JFR timestamps: tsc (default) or monotonic (equivalent for CLOCK_MONOTONIC).
--begin function, --end function - automatically start/stop profiling when the specified native function is executed.
--ttsp - time-to-safepoint profiling. An alias for
--begin SafepointSynchronize::begin --end RuntimeService::record_safepoint_synchronized
It is not a separate event type, but rather a constraint. Whatever event type you choose (e.g. cpu or wall), the profiler will work as usual, except that only events between the safepoint request and the start of the VM operation will be recorded.
--jfropts OPTIONS - comma separated list of JFR recording options. Currently, the only available option is mem supported on Linux 3.17+. mem enables accumulating events in memory instead of flushing synchronously to a file.
--jfrsync CONFIG - start Java Flight Recording with the given configuration synchronously with the profiler. The output .jfr file will include all regular JFR events, except that execution samples will be obtained from async-profiler. This option implies -o jfr.
- CONFIG is a predefined JFR profile or a JFR configuration file (.jfc) or a list of JFR events started with +
Example: asprof -e cpu --jfrsync profile -f combined.jfr 8983
--fdtransfer - runs a background process that provides access to perf_events to an unprivileged process. --fdtransfer is useful for profiling a process in a container (which lacks access to perf_events) from the host. See Profiling Java in a container.
-v, --version - prints the version of profiler library. If PID is specified, gets the version of the library loaded into the given process.

Profiling Java in a container

It is possible to profile Java processes running in a Docker or LXC container both from within a container and from the host system.

When profiling from the host, pid should be the Java process ID in the host namespace. Use ps aux | grep java or docker top <container> to find the process ID.

async-profiler should be run from the host by a privileged user - it will automatically switch to the proper pid/mount namespace and change user credentials to match the target process. Also make sure that the target container can access libasyncProfiler.so by the same absolute path as on the host.

By default, Docker container restricts the access to perf_event_open syscall. There are 3 alternatives to allow profiling in a container:

You can modify the seccomp profile or disable it altogether with --security-opt seccomp=unconfined option. In addition, --cap-add SYS_ADMIN may be required.
You can use "fdtransfer": see the help for --fdtransfer.
Last, you may fall back to -e ctimer profiling mode, see Troubleshooting.

Restrictions/Limitations

macOS profiling is limited to user space code only.
On most Linux systems, perf_events captures call stacks with a maximum depth of 127 frames. On recent Linux kernels, this can be configured using sysctl kernel.perf_event_max_stack or by writing to the /proc/sys/kernel/perf_event_max_stack file.
Profiler allocates 8kB perf_event buffer for each thread of the target process. Make sure /proc/sys/kernel/perf_event_mlock_kb value is large enough (more than 8 * threads) when running under unprivileged user. Otherwise the message "perf_event mmap failed: Operation not permitted" will be printed, and no native stack traces will be collected.
You will not see the non-Java frames preceding the Java frames on the stack, unless --cstack vm is specified. For example, if start_thread called JavaMain and then your Java code started running, you will not see the first two frames in the resulting stack. On the other hand, you will see non-Java frames (user and kernel) invoked by your Java code.
No Java stacks will be collected if -XX:MaxJavaStackTraceDepth is zero or negative. The exception is --cstack vm mode, which does not take MaxJavaStackTraceDepth into account.
Too short profiling interval may cause continuous interruption of heavy system calls like clone(), so that it will never complete; see #97. The workaround is simply to increase the interval.
When agent is not loaded at JVM startup (by using -agentpath option) it is highly recommended to use -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints JVM flags. Without those flags the profiler will still work correctly but results might be less accurate. For example, without -XX:+DebugNonSafepoints there is a high chance that simple inlined methods will not appear in the profile. When the agent is attached at runtime, CompiledMethodLoad JVMTI event enables debug info, but only for methods compiled after attaching.

Troubleshooting

Failed to change credentials to match the target process: Operation not permitted

Due to limitation of HotSpot Dynamic Attach mechanism, the profiler must be run by exactly the same user (and group) as the owner of target JVM process. If profiler is run by a different user, it will try to automatically change current user and group. This will likely succeed for root, but not for other users, resulting in the above error.

Could not start attach mechanism: No such file or directory

The profiler cannot establish communication with the target JVM through UNIX domain socket.

Usually this happens in one of the following cases:

Attach socket /tmp/.java_pidNNN has been deleted. It is a common practice to clean /tmp automatically with some scheduled script. Configure the cleanup software to exclude .java_pid* files from deletion.
How to check: run lsof -p PID | grep java_pid
If it lists a socket file, but the file does not exist, then this is exactly the described problem.
JVM is started with -XX:+DisableAttachMechanism option.
/tmp directory of Java process is not physically the same directory as /tmp of your shell, because Java is running in a container or in chroot environment. jattach attempts to solve this automatically, but it might lack the required permissions to do so.
Check strace build/jattach PID properties
JVM is busy and cannot reach a safepoint. For instance, JVM is in the middle of long-running garbage collection.
How to check: run kill -3 PID. Healthy JVM process should print a thread dump and heap info in its console.

Target JVM failed to load libasyncProfiler.so

The connection with the target JVM has been established, but JVM is unable to load profiler shared library. Make sure the user of JVM process has permissions to access libasyncProfiler.so by exactly the same absolute path. For more information see #78.

No access to perf events. Try --fdtransfer or --all-user option or 'sysctl kernel.perf_event_paranoid=1'

Perf events unavailable

perf_event_open() syscall has failed.

Typical reasons include:

/proc/sys/kernel/perf_event_paranoid is set to restricted mode (>=2).
seccomp disables perf_event_open API in a container.
OS runs under a hypervisor that does not virtualize performance counters.
perf_event_open API is not supported on this system, e.g. WSL.

For permissions-related reasons (such as 1 and 2), using --fdtransfer while running the profiler as a privileged user may solve the issue.

If changing the configuration is not possible, you may fall back to -e ctimer profiling mode. It is similar to cpu mode, but does not require perf_events support. As a drawback, there will be no kernel stack traces.

No AllocTracer symbols found. Are JDK debug symbols installed?

The OpenJDK debug symbols are required for allocation profiling. See Installing Debug Symbols for more details. If the error message persists after a successful installation of the debug symbols, it is possible that the JDK was upgraded when installing the debug symbols. In this case, profiling any Java process which had started prior to the installation will continue to display this message, since the process had loaded the older version of the JDK which lacked debug symbols. Restarting the affected Java processes should resolve the issue.

VMStructs unavailable. Unsupported JVM?

JVM shared library does not export gHotSpotVMStructs* symbols - apparently this is not a HotSpot JVM. Sometimes the same message can be also caused by an incorrectly built JDK (see #218). In these cases installing JDK debug symbols may solve the problem.

Could not parse symbols from <libname.so>

Async-profiler was unable to parse non-Java function names because of the corrupted contents in /proc/[pid]/maps. The problem is known to occur in a container when running Ubuntu with Linux kernel 5.x. This is the OS bug, see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1843018.

Could not open output file

Output file is written by the target JVM process, not by the profiler script. Make sure the path specified in -f option is correct and is accessible by the JVM.

async-profiler's People

Contributors

Stargazers

Watchers

Forkers

khrupalyk kofemann serkan-ozal xsires goldshtn drozd0 gregggreggreg incubos sjanulonoks isalnikov fuud yongjianxu huangyoje innorules taskx sattishv rajeshbalamohan kowalczykbartek cookingcodewithme retronym goryszewskig cheremin fallinsky sergey-nbl nitsanw ismlsmile chinamerp ikalachy vsarunas fdesu apoorv007 stincmale dponomarenko luliu27 sosojustdo duanshuaimin zhaiyuyong feng-tao binlijin ricordo dchuyko zouzhirong hackbaby drupanchal inikolaev mihn getsudhagar himanshpal ngammoudi georgecao epickrram alexspeter jeroentervoorde xinglongbing icebloom lixiaojiee nishantmonu51 caojiguang alanfx linpingchuan yuanyuanlove timhartley fengjiachun tfedyanin taratutastan hl123123 cbin07 fengfu ashwin-n leventov cxz peterwoj qianqianbigzhang sammso suhuaguo rpulle jesperpedersen olehtarapata dsyer wonechao chenk008 ibuystuff nero520 yajxh qianqian20170929 sergei-doroshenko kirilltim suqcnn zhangweixin tangpeng18 alvison saranganj mylovetop hefei1986 zhsker warrior701 hanqian200705 tinswzy turbanoff archerny

async-profiler's Issues

Parse external debug symbols

Migrate Travis CI to build the new repository

Compile error on alpine linux

Compilation fails with this error:

g++ -O2 -I/opt/jdk/include -I/opt/jdk/include/linux -fPIC -shared -o build/libasyncProfiler.so src/*.cpp -ldl -lpthread
In file included from /usr/include/limits.h:8:0,
from /usr/include/sys/param.h:33,
from src/profiler.cpp:23:
src/arch.h:23:21: error: expected unqualified-id before numeric constant
const unsigned long PAGE_SIZE = 4096;
^
make: *** [Makefile:34: build/libasyncProfiler.so] Error 1

Page size is defined in /usr/include/bits/limits.h and set to 4096.

This fixes it:
#85

CI build

Would we want to integrate with some CI build solution so that we can 1) verify pull requests automatically before merging, and 2) produce binary packages for Debian/RH?

ReentrantLock contention

Hi guys,

First on above: thanks for this awesome project and for the hard work on it!!!

Now my question: would be possible to have a contention profiling for Java's ReentrantLocks similar to the -e lock option?

I was looking http://www.brendangregg.com/offcpuanalysis.html to find something similar, given that under contention ReentrantLock uses Unsafe::park/unpark, but having it on the async-profiler toolkit would be awesome.
If there is something I could help to make it possible I would be very happy to contribute 👍

Thanks,
Franz

"Failed to inject profiler into <pid>"

I'm running the following command:

$ ./profiler.sh list 5015

And am getting the following output:

Failed to inject profiler into 5015
	linux-vdso.so.1 (0x00007ffd4ddcb000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f62c2080000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f62c1e63000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f62c1ae1000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f62c17dd000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f62c15c6000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f62c1227000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f62c291a000)

The process I'm trying to attach to is running:

$ java -version
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-1~deb9u1-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

The OS is:

$ uname -a
Linux ip-<snip> 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux

Profiler doesn't work when MaxJavaStackTraceDepth set to -1

When I run my application with -XX:MaxJavaStackTraceDepth=-1 the profiler doesn't print any stack trace. The output consists of the following.

--- Execution profile ---
>Total samples:         38441
>Non-Java:              3681 (9.58%)
>GC active:             490 (1.27%)
>Unknown (native):      21475 (55.86%)
>Not walkable (native): 9409 (24.48%)
>Unknown (Java):        838 (2.18%)
>Not walkable (Java):   2533 (6.59%)
>Deopt:                 15 (0.04%)
>
>Frame buffer usage:    0%

Removing the option solves the problem. If it's expected behavior, it'd be great to document it in README.

Could not start attach mechanism: No such file or directory

sudo ./profiler.sh -d 15 2381

Could not start attach mechanism: No such file or directory

Cannot figure out what's wrong.

Automatically stop profiling after a specified duration

Reintroduce the "duration:N" parameter support in the profiler by having it detect that the specified number of seconds has elapsed and stop automatically. This requires stopping the profiler from a separate thread, or after returning from the signal handler, because it is not safe to stop the profiler from the signal handler that detects the timeout.

Perhaps we should consider leaving this up to the user, and removing the "duration" parameter altogether? Alternatively, the helper script could accept a duration parameter and do this automatically:

$ ./profiler.sh -p 8983 -a start -d 10      # duration: 10 seconds, script stops automatically

Implement stack walking without AsyncGetCallTrace

There is a number of problems with AsyncGetCallTrace API.

The idea is to implement Java stack walking on our own without relying on AGCT. Since the agent runs in the context of JVM, it can access VM structures, especially those exported through VMStructs. It should be possible to replicate stack walking logic of the VM inside async-profiler, though it might be challenging. The main risk is that differrent versions of JVM may have different stack layout, but VMStructs along with special handling of the known versions is likely to help.

Automatic tests

I'd like to tackle some automatic tests for the agent, so that we have a better assurance nothing is being broken. While ideally I'd like to have unit tests for the various components (e.g. the symbol parsing, the code cache, the perf_events layer etc.), I think a good start would be to just do a black-box test that verifies we haven't broken anything important.

To that end, I propose implementing a small Java application (no dependencies outside of JDK), which the test will build, run, attach the profiler, start profiling, dump traces to a file, and verify that we are getting something sensible. Not a line-by-line diff, but just that symbols are being resolved, and that we're getting the obviously-hottest methods in the trace. This can also serve as documentation, in a way.

Add "info" command

Please to add "jattach.sh -p XXXX -i" - print info (profile was (not)attached and any other information).

Compilation warning

mkdir -p build
g++ -O2 -I/usr/lib/jvm/java-7-oracle/include -I/usr/lib/jvm/java-7-oracle/include/linux -fPIC -shared -o build/libasyncProfiler.so -Wl,-soname,libasyncProfiler.so src/*.cpp
src/asyncProfiler.cpp: In member function 'void Profiler::recordSample(void*)':
src/asyncProfiler.cpp:166:52: warning: ignoring return value of 'ssize_t write(int, const void*, size_t)', declared with attribute warn_unused_result [-Wunused-result]
         write(STDOUT_FILENO, ERROR, sizeof (ERROR));

Profiler options (-o, -t) are "persisted" between profiler runs (pairs of attach-stop)

async-profiler version
8b2fd87

Description
Profiler options (at least -o and -t) are "persisted" between profiler runs (pairs of attach-stop). As a result, one can not disable them once they have been set.

Steps to reproduce

Start or restart a Java process that will be profiled.
Run profiler.sh -d 5 -f prof.txt -o summary,traces=1 -t 1234.
Try turning off -t option and disable traces: profiler.sh -d 5 -f prof.txt -o summary 1234.
Observe that profiler's output still works as if options traces=1 -t have been set.

Workaround
Restart the Java process that is being profiled. This means that the profiler options are either stored in the memory of the process, or in a temporary file that is deleted once the process quits.

Allow redirecting all output to a file

Currently, only the raw stack dumps can be redirected to a file. The rest of the output goes to the Java program's stdout, which can be undesired in case that output is parsed or redirected in the first place.

I propose another agent parameter: file:<path-to-file>, which can be appended to the start command. For example: jattach ... file:/tmp/profile.log,interval:1000000,start.

JVM Frames seem to appear only as "softnet_data"

I am using the Oralce JDK/ServerJRE tar.gz distributions from java.oracle.com to run my applications on SLES11 SP4. When I profile my applications, everything works fine, except that all frames withing the JVM seem to appear as "softnet_data" in the profiler output. Some examples (collapsed format):

# Application code that ends up in Arrays.hashCode()
<some application methods>;java/util/Objects.hash;java/util/Arrays.hashCode;java/util/AbstractList.hashCode;java/util/AbstractList$Itr.next;java/util/Arrays$ArrayList.get;softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k] 8

# Application code that writes to a file
<some application methods>;java/io/BufferedWriter.flush;java/io/OutputStreamWriter.flush;sun/nio/cs/StreamEncoder.flush;sun/nio/cs/StreamEncoder.implFlush;sun/nio/cs/StreamEncoder.implFlushBuffer;sun/nio/cs/StreamEncoder.writeBytes;java/io/FileOutputStream.write;java/io/FileOutputStream.writeBytes;softnet_data_[k];softnet_data_[k];softnet_data_[k];system_call_fastpath_[k];sys_write_[k];vfs_write_[k];do_sync_write_[k];generic_file_aio_write_[k];__generic_file_aio_write_[k];generic_file_buffered_write_[k];generic_perform_write_[k];ext3_write_begin   [ext3]_[k];__block_write_begin_[k];ext3_get_block       [ext3]_[k];ext3_get_blocks_handle       [ext3]_[k];ext3_alloc_branch    [ext3]_[k];ext3_new_blocks      [ext3]_[k];__ext3_journal_get_write_access      [ext3]_[k];journal_get_write_access     [jbd]_[k];journal_put_journal_head      [jbd]_[k] 8

# Application code doing reflection. Ends up in String.intern()
<some application methods>;org/springframework/aop/support/AopUtils.getMostSpecificMethod;org/springframework/util/ClassUtils.getMostSpecificMethod;java/lang/Class.getMethod;java/lang/Class.getMethod0;java/lang/Class.privateGetMethodRecursive;java/lang/Class.getMethod0;java/lang/Class.privateGetMethodRecursive;java/lang/Class.searchMethods;java/lang/String.intern;softnet_data_[k];softnet_data_[k];softnet_data_[k] 88
...

# Probably GC or JIT
softnet_data_[k];softnet_data_[k];[unknown];softnet_data_[k];softnet_data_[k];system_call_fastpath_[k];sys_futex_[k];do_futex_[k];futex_wait_[k];futex_wait_queue_me_[k];__hrtimer_start_range_ns_[k];_raw_spin_unlock_irqrestore_[k] 209
softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k] 6
...

Could the installation location of the JDK/Server-JRE be the problem? We just unzip the distro into a directory on our server, e.g. /app/installation/java/jdk1.8.0_144/.

Wall-clock profiling

It may be helpful to measure not only the time that method spent consuming the CPU but also the time method was waiting for lock or I/O. Please implement 'wall clock' profiling mode.

Alloc smoke test fails with java9

It seems with Java 9 the heap profile is missing stacktraces of where objects are allocated. Is this expected behavior? I can't share anything, but the alloc-smoke-test.sh file fails on my CentOS machine using Java9

Java -version output:

java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)

Support Allocation profiler on JDK 7

Currently Allocation profiler works only on JDK 8+.
The problem is KlassHandle argument in AllocTracert functions.

In JDK 8 KlassHandle is Klass*
In JDK 7 KlassHandle is a pointer to klassOop which has one more level of indirection

It seems that huge amount of sampling is omited from dump (and flame graph)

Hello,

I am profiling Cassandra.
At the end of profiling I get:

--- Execution profile ---
Total:                 230393
Non-Java:              26372 (11.45%)
GC active:             62 (0.03%)
Unknown (native):      83399 (36.20%)
Not walkable (native): 6 (0.00%)
Unknown (Java):        1016 (0.44%)
Not walkable (Java):   7 (0.00%)
Deopt:                 322 (0.14%)

As I can understand, ~50% of perf events was captured at points where asyncGetCallTrace can not reconstruct Stack Traces. Am I right?
And this events does not included in Flame Graph. May be them should be included as "unknown" method to indicate that signigicant amount of CPU usage can not be recognized.

Thanks.

An option to profile threads separately

It is suggested to add -t option which will add an extra frame with a thread ID to the bottom of each stack trace. For Java threads a thread name can be printed instead of thread ID.

Snapshot publishing?

First of all, thanks a ton for this profiler - I have been playing with it at $work and it seems to produce quite reasonable results. We're looking into ways to adopt it in our simulation runs (to identify performance problems earlier).

The problem I'm facing has probably nothing to do with the project itself - I'm quite terrible at building native packages. Is there a way to publish snapshot on regular basis (I'm not sure how this supposed to be distributed through)? I'd probably be just as happy with a new release containing binaries for Linux.

Report for number of allocations, in addition to aggregate size

Ideally, both would be recorded and available to the output formatter.

A new command line option -a size|count could control which counter the current formatters choose to report.

More advanced formatters/visualization tools could display both.

Measure and document overhead

The overhead of the profiler should be measured and documented. We should test CPU-bound code as well as I/O-bound code, with varying stack depths (up to 1000s of frames).

Consider moving async-profiler repo under the jvm-profiling-tools

This has been setup recently to host JVM profiling tools under an org rather than personal repos and thus encourage more community investment.
For your consideration.

Some JVM processes return -1 on attach attempt

I cannot attach to some JVM processes.
I made profiler.sh output more info:

    collect)
        $JATTACH $PID load $PROFILER true start,file=$FILE$INTERVAL$FRAMEBUF #> /dev/null
        if [ $? -ne 0 ]; then
            exit 1
        fi

Got this:

~/async-profiler $ sudo -u xncore ./profiler.sh -d 10 3515
Connected to remote JVM
Response code = -1

Is there a way to debug it further?

Marry async-profiler and JMH

It would be nice to filter async-profiler's output when profiling JMH runs to include only traces from the benchmarked methods, excluding e. g. @Setup and @Teardown.

Can't build project

Hi!
I can't compile project on CentOS release 6.6
g++ version:

g++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)

error:

g++ -O2 -I/opt/jdk/jdk1.7.0_75//include -I/opt/jdk/jdk1.7.0_75//include/linux -fPIC -shared -o build/libasyncProfiler.so src/*.cpp -ldl -lpthread
In file included from src/profiler.h:22,
                 from src/javaApi.cpp:18:
src/spinLock.h: In member function ‘void SpinLock::spinLock()’:
src/spinLock.h:39: error: ‘__builtin_ia32_pause’ was not declared in this scope
In file included from src/perfEvent.h:23,
                 from src/perfEvent.cpp:25:
src/spinLock.h: In member function ‘void SpinLock::spinLock()’:
src/spinLock.h:39: error: ‘__builtin_ia32_pause’ was not declared in this scope
In file included from src/profiler.h:22,
                 from src/profiler.cpp:24:
src/spinLock.h: In member function ‘void SpinLock::spinLock()’:
src/spinLock.h:39: error: ‘__builtin_ia32_pause’ was not declared in this scope
In file included from src/profiler.h:22,
                 from src/vmEntry.cpp:22:
src/spinLock.h: In member function ‘void SpinLock::spinLock()’:
src/spinLock.h:39: error: ‘__builtin_ia32_pause’ was not declared in this scope
make: *** [build/libasyncProfiler.so] Error 1

linux binaries in 1.1 release seems not to support svg format

I download binaries from here: https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.1/async-profiler-1.1-linux-x64.zip

After some trial and error I got to this:

Unrecognized option: --title
Usage: ./profiler.sh [action] [options]
Actions:
start start profiling and return immediately
stop stop profiling
status print profiling status
list list profiling events supported by the target JVM
collect collect profile for the specified period of time
and then stop (default action)
Options:
-e event profiling event: cpu|alloc|lock|cache-misses etc.
-d duration run profiling for seconds
-f filename dump output to
-i interval sampling interval in nanoseconds
-b bufsize frame buffer size
-t profile different threads separately
-o fmt[,fmt...] output format: summary|traces|flat|collapsed

is a numeric process ID of the target JVM
or 'jps' keyword to find running JVM automatically using jps tool

Example: ./profiler.sh -d 30 -f profile.fg -o collapsed 3456
./profiler.sh start -i 999000 jps
./profiler.sh stop -o summary,flat jps

note svg format is not there.

(time to) safepoint profiler

Profile safepoints / time to safepoint with perf uprobes.
See https://groups.google.com/forum/#!topic/mechanical-sympathy/tepoA7PRFRU

I forgot to say in the message that the cpu burned by the not-stopping thread (here 17998) can be sampled to get an accurate picture of why was preventing it from reaching safepoint. The final 'block' event will likely provide a good idea but the cpu samples data should avoid any guessing.

Accuracy of heap profile

I've got a profile where 90%+ of allocations are object array allocations in one method which does not do any allocations in bytecode. What kind of inaccuracies are expected when doing heap profiling? Could it be allocated in inlined methods which are not shown?

async-profiler causes SIGSEGV nmethod::get_deopt_original_pc

Environment

async-profiler version
1063a82
java -version

java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

uname -a
Linux app1-us-va.blabla.com 3.13.0-101-generic #148-Ubuntu SMP Thu Oct 20 22:08:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Description
We are repeatedly launching async-profiler on one of our production servers as follows:
profiler.sh -d 600 -i 1000000 -f prof.txt -o summary,traces=200,flat=200,collapsed=samples -e cpu -b 5000000 <pid>
And it always eventually (in about 1-2 hours) leads to the same fatal error (for full log see https://gist.github.com/stIncMale/802da958cb928758029e4567b7e943cd):

#  SIGSEGV (0xb) at pc=0x00007f142aaf166e, pid=5951, tid=139719484446464
#
# JRE version: Java(TM) SE Runtime Environment (8.0_25-b17) (build 1.8.0_25-b17)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x8ab66e]  nmethod::get_deopt_original_pc(frame const*)+0x6e

Allocation profiling reports in sample unit other than bytes

To test I used a benchmark workload and compared JMC reporting to AP, benchmark code:

public class AllocationProfilingBench
{
    static class A {} // 16b
    static class B extends A {long f1,f2;} // 32b
    static class C extends B {long g1,g2,g3,g4;} // 64b

    A newA() { return new A();}
    B newB() { return new B();}
    C newC() { return new C();}

    @Benchmark
    public Object allocate()
    {
        Object result;
        double rnd = ThreadLocalRandom.current().nextDouble();
        if (rnd < 0.333)
            result = newA();
        else if (rnd < 0.666)
            result = newB();
        else
            result = newC();
        return result;
    }
}

In JMC I see the following breakdown:

Class				Size		Pressure(%)
AllocationProfilingBench$C	64b		57%
AllocationProfilingBench$B	32b		28%
AllocationProfilingBench$A	16b		14%

This is as expected. In AP I would expect an output in the object size multiple, but I'm getting this (trimmed the stacks):

AllocationProfilingBench$A_[k] 89
AllocationProfilingBench$A_[i] 5368
AllocationProfilingBench$B_[k] 213
AllocationProfilingBench$B_[i] 10599
AllocationProfilingBench$C_[k] 422
AllocationProfilingBench$C_[i] 20001

I assume the _[i] are in TLAB and _[k] are out of TLAB (nice touch, I'll add a wiki page on alloc profiling to give some details). The ratios look right, but the quantities reported are not in the unit I expected.

Dumping profiles can take a long time

First of, I would thank you for creating this great tool. It is really invaluable for diagnosis JVM performance issues.

When I run the profiler for a long time writing a collapsed profile can take a long time. For example a 30 second run can create a file of ~150 MB which takes 10 min to write. Which is kinda slow (250 kb/s). I did take a look at the source, and cannot find any obvious problem with it (no excessive flushing).

I am wondering if this can be sped up. A viable alternative solution (for me) would be to dump the traces and the method names to disk and to build the collapsed profiles (or any other profile) later on. That should relatively compact and should easy to dump to disk.

Collapsed output format could better represent missing stacktraces

When first using this tool I was confused about the "Ignored .... lines with invalid format" error reported by the flamegraph script.

Inspection of the generated file showed lines with just a frequency and no stacktrace.

Running without -o collapsed -f ... revealed that the framebuffer was undersized. Increasing its size fixed the problem.

Usability suggestions:

Render <unknown>, with a reason if known, as the stacktrace in the output format
Refer back tothe -b option in the Troubleshooting section of the README, describing what to do when <unknown> traces are reported
Perhaps explicitly mention to run -o summary as a troubleshooting step for people like me who dived straight into flamegraph generation.

Rare deadlock in safepoint code during allocation profiling

I've seen the following assertion once:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (safepoint.cpp:687), pid=1919, tid=0x00007f385cb40700
#  fatal error: Deadlock in safepoint code.  Should have called back to the VM before blocking.
#
# JRE version: Java(TM) SE Runtime Environment (8.0_152-b16) (build 1.8.0_152-b16)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.152-b16 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x00007f385400a000):  JavaThread "main" [_thread_in_native_trans, id=1920, stack(0x00007f385ca40000,0x00007f385cb41000)]

Stack: [0x00007f385ca40000,0x00007f385cb41000],  sp=0x00007f385cb3e3e0,  free space=1016k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xacf62a]  VMError::report_and_die()+0x2ba
V  [libjvm.so+0x500479]  report_fatal(char const*, int, char const*)+0x59
V  [libjvm.so+0x9c4b38]  SafepointSynchronize::block(JavaThread*)+0x268
V  [libjvm.so+0xa7a72d]  JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread*)+0x2ad
V  [libjvm.so+0x74795a]  jvmti_GetStackTrace+0xfa
C  [libasyncProfiler.so+0xa210]  Profiler::getJavaTraceJVMTI(_jvmtiFrameInfo*, ASGCT_CallFrame*, int)+0x40

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 131  java.lang.Object.clone()Ljava/lang/Object; (0 bytes) @ 0x00007f38451321f8 [0x00007f38451321c0+0x38]
J 178 C1 sun.util.calendar.ZoneInfo.clone()Ljava/lang/Object; (15 bytes) @ 0x00007f384514145c [0x00007f3845141420+0x3c]
J 228% C1 Allocate.main([Ljava/lang/String;)V (153 bytes) @ 0x00007f384515e724 [0x00007f384515e160+0x5c4]
v  ~StubRoutines::call_stub

The problem is understood: it happens because JVMTI GetStackTrace is called from VM context. Need to switch thread state before calling JVMTI functions. To fix this correctly I need a reliable reproducer. I wasn't able to repeat the problem so far.

Option to elide package names from output, etc.

jfr-flamegraph has a convenient option:

             -s, --use-simple-names
              Use simple names instead of qualified names in the stack
              Default: false

This reduces the size of dumped CSV files and the resulting SVG files, and makes the flamegraph easier to read (at the cost of losing some accuracy).

One can post process the files with a regex to achieve this, but now that async-profiler supports direct flamegraph output, it would be handy to support it directly.

Java Mission Control has further options to control how stack trace elements are grouped/displayed:

Simple vs FQN name for the method owner, as above
show/hide method parameters
show/hide line number
collapse or separate interpreted vs compiled frames.

perf-map-agent displays inlined vs non-inlined frames as distinct (which lets you get a nicer visualization of inlining that -XX:PrintInlining)

Oracle Performance Studio lets you group the profiles by line-number, byte-code-index, or assembly PC address.

Where do these features lie on the spectrum of "you're planning to implement it" vs "you'd accept a contribution" vs "out of scope")?

Flame Graph visualization doesn't work out of box

When I execute ./profiler.sh -d 10 -f /tmp/flamegraph.svg {precess id}
/tmp/flamegraph.svg contains following text:
`--- Execution profile ---
Total samples: 17
Non-Java: 16 (94.12%)
Unknown (native): 1 (5.88%)

Frame buffer usage: 0%
`

Expected: file containing SVG data.

Any plans to support macOS at least for heap-profiling ?

Not sure if this is the right place for asking but I was wondering if you are considering adding support for heap profiling on macOS or will this stay linux only ?

Licensing and distribution question

Hi, I'd like to create a Clojure wrapper library around async-profiler so that the profiler can be used from within the JVM process itself. This would require me to package the pre-built libAsyncprofiler.so file into a JAR, later to be deployed to https://clojars.org/. The wrapper itself is going to be open-source and hosted on Github, and it goes without saying that I will provide links and references to the original async-profiler library wherever necessary. My questions are:

Am I allowed to do that at all?
What license should I pick for that project then? Most Clojure projects are licensed under EPL, can I do that, or should I stick to Apache-2.0?

Incorrect behavior when JVM exits before profiling duration elapsed

Hi! Thank you for an awesome tool, it is a pleasure to use and has been very helpful to discover some subtle bugs in my code :)

I noticed a few minor issues when using time-capped collect command. All of them happen, when JVM exits before the configured duration of profiling session elapses:

profiler does not exit immediately after JVM, but remains alive until the time elapses
upon exiting the profiler prints Could not start attach mechanism: No such file or directory (I've commented redirection of output to /dev/null for debug purposes)
if an output file was specified, then it contains only one line: CPU profiling started

Cheers,
Andrei

Update ReadMe.md with Example about "How to use this profile on remote JVM"

Steps I follow:

download
compile
Write Wrap class which call throw JNI start and stop profiler method.
Run java application
invoke wrap class methods
See in output stream summary of profilling

How can I do this for remote jvm?

"-o flamegraph" should be "-o collapsed"

The flamegraphs script takes collapsed stacks as input. The output of async-profiler is the collapsed stacks data, not the flame graph SVG. The collapsed stacks format is a common summary profile that can be presented many different ways.
In future it should be possible to feed collapsed stacks data to other views/tools, see for instance jvm-profiling-tools/honest-profiler#199

comparison with Honest Profiler

I was wondering if there is any documented comparison of Honest Profiler vs. Async Profiler. I can see from the README that Async Profiler offers some features like heap profiling that are not present in Honest Profiler. OTOH, Honest Profiler has a GUI interface that I don't see in Async Profiler. It would be awesome if there were a high-level comparison in the README somewhere, indicating when each tool is most appropriate, differences in approach, etc. I think many users will have this question, given that both tools live in the jvm-profiling-tools GitHub namespace.

Thanks for all the awesome work! Very excited to try out Async Profiler on our code base.

Make FlameGraph parameters configurable

Add command-line option to tune certain FlameGraph attributes:

imagewidth
frameheight
minwidth
etc.

Revise usage of jmethodID to distinguish between Java methods and native symbols

jmethodID has ambiguous meaning in the profiler: it can denote either Java method or a native symbol. Currently this type is encoded in the highest bit of jmethodID pointer. This works well on 64-bit Linux, but may not be appropriate for 32-bit platforms (e.g. linux-arm).

additional view

Hello!

Nice work!

In addition to cmdline and flamegraph you might consider using this as view generator: https://github.com/01org/IntelSEAPI/wiki

It supports JIT symbols, stacks and many more things.

Please contact me if interested.

Hard crash (core dump) while dumping traces

I have a fairly reproducible hard crash when dumping traces. This happened as I was working on an overhead testing script. To try and reproduce, clone the goldshtn:overhead branch from my repo, build the agent, and then run overhead-test.sh from the test directory. After a couple of lines, I get a hard crash. Some basic analysis:

SIGSEGV in "Attach Listener" thread
Crash address is in strlen called by vsnprintf
After building with -g and looking at the core file in gdb, here's the full stack trace:

$ gdb java -c core.25942
GNU gdb (GDB) Fedora 7.12.1-47.fc25
...
Core was generated by `java Busy 1 1100 7'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
58	}
[Current thread is 1 (Thread 0x7f5e0c7f7700 (LWP 25958))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x00007f5e2333951a in __GI_abort () at abort.c:89
#2  0x00007f5e22be74d9 in os::abort (dump_core=<optimized out>) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1500
#3  0x00007f5e22e12968 in VMError::report_and_die (this=this@entry=0x7f5e0c7f3bd0) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060
#4  0x00007f5e22bf09b5 in JVM_handle_linux_signal (sig=sig@entry=11, info=info@entry=0x7f5e0c7f3e70, ucVoid=ucVoid@entry=0x7f5e0c7f3d40, abort_if_unrecognized=abort_if_unrecognized@entry=1)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:556
#5  0x00007f5e22be4758 in signalHandler (sig=11, info=0x7f5e0c7f3e70, uc=0x7f5e0c7f3d40) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:4341
#6  <signal handler called>
#7  strlen () at ../sysdeps/x86_64/strlen.S:106
#8  0x00007f5e23351ad1 in _IO_vfprintf_internal (s=s@entry=0x7f5e0c7f4840, format=<optimized out>, format@entry=0x7f5e0c2ab75a "%s.%s", ap=ap@entry=0x7f5e0c7f49a8) at vfprintf.c:1637
#9  0x00007f5e2337a489 in _IO_vsnprintf (string=0x7f5e0c7f4af0 ".usy.eatCpu", maxlen=<optimized out>, format=0x7f5e0c2ab75a "%s.%s", args=args@entry=0x7f5e0c7f49a8) at vsnprintf.c:114
#10 0x00007f5e233587b2 in __snprintf (s=s@entry=0x7f5e0c7f4af0 ".usy.eatCpu", maxlen=maxlen@entry=520, format=format@entry=0x7f5e0c2ab75a "%s.%s") at snprintf.c:33
#11 0x00007f5e0c2a6fc0 in MethodName::MethodName (this=0x7f5e0c7f4af0, method=<optimized out>, dotted=<optimized out>) at src/codeCache.cpp:54
#12 0x00007f5e0c2a94f1 in Profiler::dumpTraces (this=this@entry=0x7f5e0c4ae480 <Profiler::_instance>, out=..., max_traces=max_traces@entry=500) at src/profiler.cpp:385
#13 0x00007f5e0c2ab1e0 in Agent_OnAttach (vm=<optimized out>, options=<optimized out>, reserved=<optimized out>) at src/vmEntry.cpp:174
#14 0x00007f5e22a554ac in JvmtiExport::load_agent_library (op=0x7f5df0039e90, st=0x7f5e0c7f6d30) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/share/vm/prims/jvmtiExport.cpp:2275
#15 0x00007f5e225fba78 in attach_listener_thread_entry (thread=<optimized out>, __the_thread__=<optimized out>) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/share/vm/services/attachListener.cpp:454
#16 0x00007f5e22db6bfb in JavaThread::thread_main_inner (this=this@entry=0x7f5df4001000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:1710
#17 0x00007f5e22db70df in JavaThread::run (this=0x7f5df4001000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:1690
#18 0x00007f5e22be6492 in java_start (thread=0x7f5df4001000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:782
#19 0x00007f5e23cf86ca in start_thread (arg=0x7f5e0c7f7700) at pthread_create.c:333
#20 0x00007f5e23409f7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
(gdb) frame 7
#7  strlen () at ../sysdeps/x86_64/strlen.S:106
106		movdqu	(%rax), %xmm4
(gdb) x/s $rax
0x1e4:	<error: Cannot access memory at address 0x1e4>
(gdb) frame 11
#11 0x00007f5e0c2a6fc0 in MethodName::MethodName (this=0x7f5e0c7f4af0, method=<optimized out>, dotted=<optimized out>) at src/codeCache.cpp:54
54	        snprintf(_buf, sizeof(_buf), "%s.%s", fixClassName(class_name, dotted), method_name);
(gdb) p method_name
$7 = 0x1e4 <error: Cannot access memory at address 0x1e4>
(gdb) p class_name
$8 = 0x7f5df000e5a0 "/etc/localtime"
(gdb) p _buf
$9 = ".usy.eatCpu\000read.run\000\000\000\000NH*\f^\177\000\000\020\064*\f^\177\000\000\210K\177\f^\177\000\000\307R\273\365\000\000\000\000K\355\326\003\000\000\000\000\a\000\000\000\003V\000\000\000\070*\f^\177\000\000`L\177\f^\177\000\000\240%*\f^\177\000\000\204K\177\f^\177\000\000!\264\063#^\177\000\000\020P\000\360]\177\000\000\000\000\000\000\000\000\000\000\030\000\000\000\000\000\000\000\340M\177\f^\177\000\000x\214J\360]\177\000\000`\214J\360]\177\000\000\260s*\f^\177\000\000\002\000\000\000\000\000\000\000\360L\177\f^\177\000\000\030\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\000"...
(gdb) quit

Here's the full HotSpot error log, but I haven't seen anything interesting. I can share the core file too if needed.

Environment:

$ uname -a
Linux fedora25-bcc_default_1487706588981_73712 4.10.0-1.vanilla.knurd.1.fc25.x86_64 #1 SMP Mon Feb 20 06:57:37 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ java -version
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-b14)
OpenJDK 64-Bit Server VM (build 25.121-b14, mixed mode)

show detailed info in stacktrace

I mean the actual filename and source line number should be showed, like:
Counter: 5871 (51.35%)
[ 0] jshort_disjoint_arraycopy( XXX.java: xx)
[ 1] java.util.Arrays.copyOf( XXX.java: xx)
[ 2] java.lang.AbstractStringBuilder.ensureCapacityInternal( XXX.java: xx)
[ 3] java.lang.AbstractStringBuilder.append( XXX.java: xx)
[ 4] java.lang.StringBuilder.append( XXX.java: xx)
[ 5] com.company.Main.main( Main.java: 12)

Thus it is easy to debug which line code is the problem.

Document interaction of profiler with DebugNonSafepoints

The README calls out the use of AsyncGetStackTrace to avoid the safepoint bias problem.

It isn't clear to me if I have to remember to add -XX:+DebugNonSafepoints to the JVM options to profit from this. (My assumption is that I do.)

https://github.com/jvm-profiling-tools/honest-profiler/blob/master/src/main/cpp/agent.cpp#L35 mentions this JVM option in a comment about why an empty method is needed.

async-profiler / async-profiler Goto Github PK

async-profiler's Introduction

async-profiler

Download

Supported platforms

CPU profiling

ALLOCATION profiling

Installing Debug Symbols

Wall-clock profiling

Java method profiling

Building

Basic Usage

Launching as an Agent

Multiple events

Flame Graph visualization

Profiler Options

Profiling Java in a container

Restrictions/Limitations

Troubleshooting

async-profiler's People

Contributors

Stargazers

Watchers

Forkers

async-profiler's Issues

Recommend Projects

Recommend Topics

Recommend Org