async-profiler / async-profiler

Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

Home Page: https://github.com/async-profiler/async-profiler/releases

License: Apache License 2.0

C++ 77.19% Makefile 0.68% Shell 0.52% C 4.95% Java 14.42% HTML 2.23%

async-profiler's Issues

Measure and document overhead

The overhead of the profiler should be measured and documented. We should test CPU-bound code as well as I/O-bound code, with varying stack depths (up to 1000s of frames).

Allocation profiling reports in a sample unit other than bytes

To test this, I used a benchmark workload and compared JMC's reporting to async-profiler's. Benchmark code:

import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;

public class AllocationProfilingBench
{
    static class A {} // 16b
    static class B extends A {long f1,f2;} // 32b
    static class C extends B {long g1,g2,g3,g4;} // 64b

    A newA() { return new A();}
    B newB() { return new B();}
    C newC() { return new C();}

    @Benchmark
    public Object allocate()
    {
        Object result;
        double rnd = ThreadLocalRandom.current().nextDouble();
        if (rnd < 0.333)
            result = newA();
        else if (rnd < 0.666)
            result = newB();
        else
            result = newC();
        return result;
    }
}

In JMC I see the following breakdown:

Class				Size		Pressure(%)
AllocationProfilingBench$C	64b		57%
AllocationProfilingBench$B	32b		28%
AllocationProfilingBench$A	16b		14%

This is as expected. In async-profiler I would expect output in multiples of the object size, but I'm getting this (stacks trimmed):

AllocationProfilingBench$A_[k] 89
AllocationProfilingBench$A_[i] 5368
AllocationProfilingBench$B_[k] 213
AllocationProfilingBench$B_[i] 10599
AllocationProfilingBench$C_[k] 422
AllocationProfilingBench$C_[i] 20001

I assume the _[i] entries are in-TLAB and the _[k] entries are out-of-TLAB allocations (nice touch; I'll add a wiki page on allocation profiling to give some details). The ratios look right, but the quantities reported are not in the unit I expected.

It seems that a huge amount of samples is omitted from the dump (and flame graph)

Hello,

I am profiling Cassandra.
At the end of profiling I get:

--- Execution profile ---
Total:                 230393
Non-Java:              26372 (11.45%)
GC active:             62 (0.03%)
Unknown (native):      83399 (36.20%)
Not walkable (native): 6 (0.00%)
Unknown (Java):        1016 (0.44%)
Not walkable (Java):   7 (0.00%)
Deopt:                 322 (0.14%)

As I understand it, ~50% of the perf events were captured at points where AsyncGetCallTrace could not reconstruct stack traces. Am I right?
These events are not included in the flame graph. Maybe they should be included as an "unknown" method to indicate that a significant amount of CPU usage cannot be attributed.

Thanks.

Automatically stop profiling after a specified duration

Reintroduce the "duration:N" parameter support in the profiler by having it detect that the specified number of seconds has elapsed and stop automatically. This requires stopping the profiler from a separate thread, or after returning from the signal handler, because it is not safe to stop the profiler from the signal handler that detects the timeout.

Perhaps we should consider leaving this up to the user, and removing the "duration" parameter altogether? Alternatively, the helper script could accept a duration parameter and do this automatically:

$ ./profiler.sh -p 8983 -a start -d 10      # duration: 10 seconds, script stops automatically

Flame Graph visualization doesn't work out of the box

When I execute ./profiler.sh -d 10 -f /tmp/flamegraph.svg {process id},
/tmp/flamegraph.svg contains the following text:
--- Execution profile ---
Total samples: 17
Non-Java: 16 (94.12%)
Unknown (native): 1 (5.88%)

Frame buffer usage: 0%

Expected: file containing SVG data.

Compile error on alpine linux

Compilation fails with this error:

g++ -O2 -I/opt/jdk/include -I/opt/jdk/include/linux -fPIC -shared -o build/libasyncProfiler.so src/*.cpp -ldl -lpthread
In file included from /usr/include/limits.h:8:0,
from /usr/include/sys/param.h:33,
from src/profiler.cpp:23:
src/arch.h:23:21: error: expected unqualified-id before numeric constant
const unsigned long PAGE_SIZE = 4096;
^
make: *** [Makefile:34: build/libasyncProfiler.so] Error 1

Page size is defined in /usr/include/bits/limits.h and set to 4096.

This fixes it:
#85

async-profiler causes SIGSEGV in nmethod::get_deopt_original_pc

Environment

  • async-profiler version
    1063a82
  • java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
  • uname -a
    Linux app1-us-va.blabla.com 3.13.0-101-generic #148-Ubuntu SMP Thu Oct 20 22:08:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Description
We are repeatedly launching async-profiler on one of our production servers as follows:
profiler.sh -d 600 -i 1000000 -f prof.txt -o summary,traces=200,flat=200,collapsed=samples -e cpu -b 5000000 <pid>
And it always eventually (in about 1-2 hours) leads to the same fatal error (for full log see https://gist.github.com/stIncMale/802da958cb928758029e4567b7e943cd):

#  SIGSEGV (0xb) at pc=0x00007f142aaf166e, pid=5951, tid=139719484446464
#
# JRE version: Java(TM) SE Runtime Environment (8.0_25-b17) (build 1.8.0_25-b17)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x8ab66e]  nmethod::get_deopt_original_pc(frame const*)+0x6e

Rare deadlock in safepoint code during allocation profiling

I've seen the following assertion once:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (safepoint.cpp:687), pid=1919, tid=0x00007f385cb40700
#  fatal error: Deadlock in safepoint code.  Should have called back to the VM before blocking.
#
# JRE version: Java(TM) SE Runtime Environment (8.0_152-b16) (build 1.8.0_152-b16)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.152-b16 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x00007f385400a000):  JavaThread "main" [_thread_in_native_trans, id=1920, stack(0x00007f385ca40000,0x00007f385cb41000)]

Stack: [0x00007f385ca40000,0x00007f385cb41000],  sp=0x00007f385cb3e3e0,  free space=1016k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xacf62a]  VMError::report_and_die()+0x2ba
V  [libjvm.so+0x500479]  report_fatal(char const*, int, char const*)+0x59
V  [libjvm.so+0x9c4b38]  SafepointSynchronize::block(JavaThread*)+0x268
V  [libjvm.so+0xa7a72d]  JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread*)+0x2ad
V  [libjvm.so+0x74795a]  jvmti_GetStackTrace+0xfa
C  [libasyncProfiler.so+0xa210]  Profiler::getJavaTraceJVMTI(_jvmtiFrameInfo*, ASGCT_CallFrame*, int)+0x40

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 131  java.lang.Object.clone()Ljava/lang/Object; (0 bytes) @ 0x00007f38451321f8 [0x00007f38451321c0+0x38]
J 178 C1 sun.util.calendar.ZoneInfo.clone()Ljava/lang/Object; (15 bytes) @ 0x00007f384514145c [0x00007f3845141420+0x3c]
J 228% C1 Allocate.main([Ljava/lang/String;)V (153 bytes) @ 0x00007f384515e724 [0x00007f384515e160+0x5c4]
v  ~StubRoutines::call_stub

The problem is understood: it happens because JVMTI GetStackTrace is called from a VM context. We need to switch the thread state before calling JVMTI functions. To fix this correctly I need a reliable reproducer; I haven't been able to reproduce the problem so far.

Allow redirecting all output to a file

Currently, only the raw stack dumps can be redirected to a file. The rest of the output goes to the Java program's stdout, which can be undesirable if that output is itself parsed or redirected.

I propose another agent parameter: file:<path-to-file>, which can be appended to the start command. For example: jattach ... file:/tmp/profile.log,interval:1000000,start.

Incorrect behavior when JVM exits before profiling duration elapsed

Hi! Thank you for an awesome tool, it is a pleasure to use and has been very helpful to discover some subtle bugs in my code :)

I noticed a few minor issues when using the time-capped collect command. All of them happen when the JVM exits before the configured duration of the profiling session elapses:

  • the profiler does not exit immediately after the JVM does, but remains alive until the time elapses
  • upon exiting the profiler prints Could not start attach mechanism: No such file or directory (I've commented redirection of output to /dev/null for debug purposes)
  • if an output file was specified, then it contains only one line: CPU profiling started

Cheers,
Andrei

Automatic tests

I'd like to tackle some automatic tests for the agent, so that we have better assurance that nothing is broken. While ideally I'd like to have unit tests for the various components (e.g. the symbol parsing, the code cache, the perf_events layer, etc.), I think a good start would be to just do a black-box test that verifies we haven't broken anything important.

To that end, I propose implementing a small Java application (no dependencies outside of the JDK); the test will build and run it, attach the profiler, start profiling, dump traces to a file, and verify that we are getting something sensible. Not a line-by-line diff, but just that symbols are being resolved and that we're getting the obviously-hottest methods in the trace. This can also serve as documentation, in a way.

Profiler options (-o, -t) are "persisted" between profiler runs (pairs of attach-stop)

async-profiler version
8b2fd87

Description
Profiler options (at least -o and -t) are "persisted" between profiler runs (pairs of attach-stop). As a result, one cannot disable them once they have been set.

Steps to reproduce

  1. Start or restart a Java process that will be profiled.
  2. Run profiler.sh -d 5 -f prof.txt -o summary,traces=1 -t 1234.
  3. Try turning off -t option and disable traces: profiler.sh -d 5 -f prof.txt -o summary 1234.
  4. Observe that profiler's output still works as if options traces=1 -t have been set.

Workaround
Restart the Java process that is being profiled. This means that the profiler options are either stored in the memory of the process, or in a temporary file that is deleted once the process quits.

Implement stack walking without AsyncGetCallTrace

There are a number of problems with the AsyncGetCallTrace API.

  • It is not accurate: sometimes it can point to neighbouring methods. See #51 and JDK-8022893.

  • It incorrectly handles certain corner cases. See JDK-8178287.

  • It fails to walk through VM frames in Oracle JDK 9. See #60.

  • It requires preloading of all jmethodIDs thus resulting in startup time overhead. See #14 (comment).

  • It does not distinguish between interpreted, compiled and inlined frames.

  • In rare cases it may crash JVM. See #73.

  • -XX:MaxJavaStackTraceDepth affects collected stack traces in a non-obvious way. See #89.

  • Unloaded methods appear in the profile as jvmtiError 23.

  • Cannot walk stack traces from methods compiled with Graal.

  • = has workaround in async-profiler

The idea is to implement Java stack walking on our own without relying on AGCT. Since the agent runs in the context of the JVM, it can access VM structures, especially those exported through VMStructs. It should be possible to replicate the stack walking logic of the VM inside async-profiler, though it might be challenging. The main risk is that different versions of the JVM may have different stack layouts, but VMStructs, along with special handling of the known versions, is likely to help.

Support Allocation profiler on JDK 7

Currently the allocation profiler works only on JDK 8+.
The problem is the KlassHandle argument in the AllocTracer functions.

  • In JDK 8 KlassHandle is Klass*
  • In JDK 7 KlassHandle is a pointer to klassOop which has one more level of indirection

Profiler doesn't work when MaxJavaStackTraceDepth set to -1

When I run my application with -XX:MaxJavaStackTraceDepth=-1, the profiler doesn't print any stack traces. The output consists of the following:

--- Execution profile ---
Total samples:         38441
Non-Java:              3681 (9.58%)
GC active:             490 (1.27%)
Unknown (native):      21475 (55.86%)
Not walkable (native): 9409 (24.48%)
Unknown (Java):        838 (2.18%)
Not walkable (Java):   2533 (6.59%)
Deopt:                 15 (0.04%)

Frame buffer usage:    0%

Removing the option solves the problem. If this is expected behavior, it would be great to document it in the README.

comparison with Honest Profiler

I was wondering if there is any documented comparison of Honest Profiler vs. Async Profiler. I can see from the README that Async Profiler offers some features like heap profiling that are not present in Honest Profiler. OTOH, Honest Profiler has a GUI interface that I don't see in Async Profiler. It would be awesome if there were a high-level comparison in the README somewhere, indicating when each tool is most appropriate, differences in approach, etc. I think many users will have this question, given that both tools live in the jvm-profiling-tools GitHub namespace.

Thanks for all the awesome work! Very excited to try out Async Profiler on our code base.

An option to profile threads separately

It is suggested to add -t option which will add an extra frame with a thread ID to the bottom of each stack trace. For Java threads a thread name can be printed instead of thread ID.

Alloc smoke test fails with Java 9

It seems that with Java 9 the heap profile is missing the stack traces of where objects are allocated. Is this expected behavior? I can't share anything, but alloc-smoke-test.sh fails on my CentOS machine using Java 9.

Java -version output:

java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)

Snapshot publishing?

First of all, thanks a ton for this profiler - I have been playing with it at $work and it seems to produce quite reasonable results. We're looking into ways to adopt it in our simulation runs (to identify performance problems earlier).

The problem I'm facing probably has nothing to do with the project itself; I'm quite terrible at building native packages. Is there a way to publish snapshots on a regular basis (I'm not sure how this is supposed to be distributed, though)? I'd probably be just as happy with a new release containing binaries for Linux.

Linux binaries in the 1.1 release seem not to support the svg format

I downloaded the binaries from here: https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.1/async-profiler-1.1-linux-x64.zip

After some trial and error I got to this:

Unrecognized option: --title
Usage: ./profiler.sh [action] [options]
Actions:
start start profiling and return immediately
stop stop profiling
status print profiling status
list list profiling events supported by the target JVM
collect collect profile for the specified period of time
and then stop (default action)
Options:
-e event profiling event: cpu|alloc|lock|cache-misses etc.
-d duration run profiling for <duration> seconds
-f filename dump output to <filename>
-i interval sampling interval in nanoseconds
-b bufsize frame buffer size
-t profile different threads separately
-o fmt[,fmt...] output format: summary|traces|flat|collapsed

<pid> is a numeric process ID of the target JVM
or 'jps' keyword to find running JVM automatically using jps tool

Example: ./profiler.sh -d 30 -f profile.fg -o collapsed 3456
./profiler.sh start -i 999000 jps
./profiler.sh stop -o summary,flat jps

Note that the svg format is not there.

CI build

Would we want to integrate with some CI build solution so that we can 1) verify pull requests automatically before merging, and 2) produce binary packages for Debian/RH?

Wall-clock profiling

It may be helpful to measure not only the time a method spent consuming the CPU, but also the time it spent waiting for a lock or I/O. Please implement a 'wall clock' profiling mode.

ReentrantLock contention

Hi guys,

First of all: thanks for this awesome project and for the hard work on it!!!

Now my question: would it be possible to have contention profiling for Java's ReentrantLocks, similar to the -e lock option?

I was looking at http://www.brendangregg.com/offcpuanalysis.html to find something similar. Given that under contention ReentrantLock uses Unsafe::park/unpark, having this in the async-profiler toolkit would be awesome.
If there is something I could do to help make it possible, I would be very happy to contribute 👍

Thanks,
Franz

Dumping profiles can take a long time

First off, I would like to thank you for creating this great tool. It is really invaluable for diagnosing JVM performance issues.

When I run the profiler for a long time, writing a collapsed profile can take a long time. For example, a 30-second run can create a file of ~150 MB which takes 10 minutes to write, which is kinda slow (~250 kB/s). I did take a look at the source and cannot find any obvious problem with it (no excessive flushing).

I am wondering if this can be sped up. A viable alternative solution (for me) would be to dump the traces and the method names to disk and to build the collapsed profiles (or any other profile) later on. That should be relatively compact and easy to dump to disk.

Compilation warning

mkdir -p build
g++ -O2 -I/usr/lib/jvm/java-7-oracle/include -I/usr/lib/jvm/java-7-oracle/include/linux -fPIC -shared -o build/libasyncProfiler.so -Wl,-soname,libasyncProfiler.so src/*.cpp
src/asyncProfiler.cpp: In member function 'void Profiler::recordSample(void*)':
src/asyncProfiler.cpp:166:52: warning: ignoring return value of 'ssize_t write(int, const void*, size_t)', declared with attribute warn_unused_result [-Wunused-result]
         write(STDOUT_FILENO, ERROR, sizeof (ERROR));

JVM Frames seem to appear only as "softnet_data"

I am using the Oracle JDK/Server JRE tar.gz distributions from java.oracle.com to run my applications on SLES11 SP4. When I profile my applications, everything works fine, except that all frames within the JVM seem to appear as "softnet_data" in the profiler output. Some examples (collapsed format):

# Application code that ends up in Arrays.hashCode()
<some application methods>;java/util/Objects.hash;java/util/Arrays.hashCode;java/util/AbstractList.hashCode;java/util/AbstractList$Itr.next;java/util/Arrays$ArrayList.get;softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k] 8

# Application code that writes to a file
<some application methods>;java/io/BufferedWriter.flush;java/io/OutputStreamWriter.flush;sun/nio/cs/StreamEncoder.flush;sun/nio/cs/StreamEncoder.implFlush;sun/nio/cs/StreamEncoder.implFlushBuffer;sun/nio/cs/StreamEncoder.writeBytes;java/io/FileOutputStream.write;java/io/FileOutputStream.writeBytes;softnet_data_[k];softnet_data_[k];softnet_data_[k];system_call_fastpath_[k];sys_write_[k];vfs_write_[k];do_sync_write_[k];generic_file_aio_write_[k];__generic_file_aio_write_[k];generic_file_buffered_write_[k];generic_perform_write_[k];ext3_write_begin   [ext3]_[k];__block_write_begin_[k];ext3_get_block       [ext3]_[k];ext3_get_blocks_handle       [ext3]_[k];ext3_alloc_branch    [ext3]_[k];ext3_new_blocks      [ext3]_[k];__ext3_journal_get_write_access      [ext3]_[k];journal_get_write_access     [jbd]_[k];journal_put_journal_head      [jbd]_[k] 8

# Application code doing reflection. Ends up in String.intern()
<some application methods>;org/springframework/aop/support/AopUtils.getMostSpecificMethod;org/springframework/util/ClassUtils.getMostSpecificMethod;java/lang/Class.getMethod;java/lang/Class.getMethod0;java/lang/Class.privateGetMethodRecursive;java/lang/Class.getMethod0;java/lang/Class.privateGetMethodRecursive;java/lang/Class.searchMethods;java/lang/String.intern;softnet_data_[k];softnet_data_[k];softnet_data_[k] 88
...

# Probably GC or JIT
softnet_data_[k];softnet_data_[k];[unknown];softnet_data_[k];softnet_data_[k];system_call_fastpath_[k];sys_futex_[k];do_futex_[k];futex_wait_[k];futex_wait_queue_me_[k];__hrtimer_start_range_ns_[k];_raw_spin_unlock_irqrestore_[k] 209
softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k];softnet_data_[k] 6
... 

Could the installation location of the JDK/Server-JRE be the problem? We just unzip the distro into a directory on our server, e.g. /app/installation/java/jdk1.8.0_144/.

Add "info" command

Please add "jattach.sh -p XXXX -i" to print info (whether the profiler was attached, and any other information).

Accuracy of heap profile

I've got a profile where 90%+ of allocations are object array allocations in one method that does not do any allocations in bytecode. What kind of inaccuracies are expected when doing heap profiling? Could the allocations come from inlined methods that are not shown?

"Failed to inject profiler into <pid>"

I'm running the following command:

$ ./profiler.sh list 5015

And am getting the following output:

Failed to inject profiler into 5015
	linux-vdso.so.1 (0x00007ffd4ddcb000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f62c2080000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f62c1e63000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f62c1ae1000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f62c17dd000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f62c15c6000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f62c1227000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f62c291a000)

The process I'm trying to attach to is running:

$ java -version
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-1~deb9u1-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

The OS is:

$ uname -a
Linux ip-<snip> 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux

"-o flamegraph" should be "-o collapsed"

The flamegraph script takes collapsed stacks as input. The output of async-profiler is the collapsed stacks data, not the flame graph SVG. The collapsed stacks format is a common summary profile that can be presented in many different ways.
In the future it should be possible to feed collapsed stacks data to other views/tools; see for instance jvm-profiling-tools/honest-profiler#199

Licensing and distribution question

Hi, I'd like to create a Clojure wrapper library around async-profiler so that the profiler can be used from within the JVM process itself. This would require me to package the pre-built libasyncProfiler.so file into a JAR, later to be deployed to https://clojars.org/. The wrapper itself is going to be open-source and hosted on GitHub, and it goes without saying that I will provide links and references to the original async-profiler library wherever necessary. My questions are:

  1. Am I allowed to do that at all?
  2. What license should I pick for that project then? Most Clojure projects are licensed under EPL, can I do that, or should I stick to Apache-2.0?

Collapsed output format could better represent missing stacktraces

When first using this tool I was confused about the "Ignored .... lines with invalid format" error reported by the flamegraph script.

Inspection of the generated file showed lines with just a frequency and no stacktrace.

Running without -o collapsed -f ... revealed that the frame buffer was undersized. Increasing its size fixed the problem.

Usability suggestions:

  • Render <unknown>, with a reason if known, as the stacktrace in the output format
  • Refer back to the -b option in the Troubleshooting section of the README, describing what to do when <unknown> traces are reported
  • Perhaps explicitly mention to run -o summary as a troubleshooting step for people like me who dived straight into flamegraph generation.

Marry async-profiler and JMH

It would be nice to filter async-profiler's output when profiling JMH runs to include only traces from the benchmarked methods, excluding e.g. @Setup and @Teardown methods.

show detailed info in stacktrace

I mean the actual filename and source line number should be shown, like:
Counter: 5871 (51.35%)
[ 0] jshort_disjoint_arraycopy( XXX.java: xx)
[ 1] java.util.Arrays.copyOf( XXX.java: xx)
[ 2] java.lang.AbstractStringBuilder.ensureCapacityInternal( XXX.java: xx)
[ 3] java.lang.AbstractStringBuilder.append( XXX.java: xx)
[ 4] java.lang.StringBuilder.append( XXX.java: xx)
[ 5] com.company.Main.main( Main.java: 12)

This makes it easy to see which line of code is the problem.

Option to elide package names from output, etc.

jfr-flamegraph has a convenient option:

             -s, --use-simple-names
              Use simple names instead of qualified names in the stack
              Default: false

This reduces the size of dumped CSV files and the resulting SVG files, and makes the flamegraph easier to read (at the cost of losing some accuracy).

One can post-process the files with a regex to achieve this, but now that async-profiler supports direct flamegraph output, it would be handy to support it directly.

Java Mission Control has further options to control how stack trace elements are grouped/displayed:

  • Simple vs FQN name for the method owner, as above
  • show/hide method parameters
  • show/hide line number
  • collapse or separate interpreted vs compiled frames.

perf-map-agent displays inlined vs non-inlined frames as distinct (which lets you get a nicer visualization of inlining than -XX:+PrintInlining).

Oracle Performance Studio lets you group the profiles by line-number, byte-code-index, or assembly PC address.

Where do these features lie on the spectrum of "you're planning to implement it" vs "you'd accept a contribution" vs "out of scope"?

Some JVM processes return -1 on attach attempt

I cannot attach to some JVM processes.
I made profiler.sh output more info:

    collect)
        $JATTACH $PID load $PROFILER true start,file=$FILE$INTERVAL$FRAMEBUF #> /dev/null
        if [ $? -ne 0 ]; then
            exit 1
        fi

Got this:

~/async-profiler $ sudo -u xncore ./profiler.sh -d 10 3515
Connected to remote JVM
Response code = -1

Is there a way to debug it further?

Hard crash (core dump) while dumping traces

I have a fairly reproducible hard crash when dumping traces. This happened as I was working on an overhead-testing script. To try to reproduce, clone the goldshtn:overhead branch from my repo, build the agent, and then run overhead-test.sh from the test directory. After a couple of lines, I get a hard crash. Some basic analysis:

  • SIGSEGV in "Attach Listener" thread
  • Crash address is in strlen called by vsnprintf
  • After building with -g and looking at the core file in gdb, here's the full stack trace:
$ gdb java -c core.25942
GNU gdb (GDB) Fedora 7.12.1-47.fc25
...
Core was generated by `java Busy 1 1100 7'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
58	}
[Current thread is 1 (Thread 0x7f5e0c7f7700 (LWP 25958))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x00007f5e2333951a in __GI_abort () at abort.c:89
#2  0x00007f5e22be74d9 in os::abort (dump_core=<optimized out>) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1500
#3  0x00007f5e22e12968 in VMError::report_and_die (this=this@entry=0x7f5e0c7f3bd0) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060
#4  0x00007f5e22bf09b5 in JVM_handle_linux_signal (sig=sig@entry=11, info=info@entry=0x7f5e0c7f3e70, ucVoid=ucVoid@entry=0x7f5e0c7f3d40, abort_if_unrecognized=abort_if_unrecognized@entry=1)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:556
#5  0x00007f5e22be4758 in signalHandler (sig=11, info=0x7f5e0c7f3e70, uc=0x7f5e0c7f3d40) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:4341
#6  <signal handler called>
#7  strlen () at ../sysdeps/x86_64/strlen.S:106
#8  0x00007f5e23351ad1 in _IO_vfprintf_internal (s=s@entry=0x7f5e0c7f4840, format=<optimized out>, format@entry=0x7f5e0c2ab75a "%s.%s", ap=ap@entry=0x7f5e0c7f49a8) at vfprintf.c:1637
#9  0x00007f5e2337a489 in _IO_vsnprintf (string=0x7f5e0c7f4af0 ".usy.eatCpu", maxlen=<optimized out>, format=0x7f5e0c2ab75a "%s.%s", args=args@entry=0x7f5e0c7f49a8) at vsnprintf.c:114
#10 0x00007f5e233587b2 in __snprintf (s=s@entry=0x7f5e0c7f4af0 ".usy.eatCpu", maxlen=maxlen@entry=520, format=format@entry=0x7f5e0c2ab75a "%s.%s") at snprintf.c:33
#11 0x00007f5e0c2a6fc0 in MethodName::MethodName (this=0x7f5e0c7f4af0, method=<optimized out>, dotted=<optimized out>) at src/codeCache.cpp:54
#12 0x00007f5e0c2a94f1 in Profiler::dumpTraces (this=this@entry=0x7f5e0c4ae480 <Profiler::_instance>, out=..., max_traces=max_traces@entry=500) at src/profiler.cpp:385
#13 0x00007f5e0c2ab1e0 in Agent_OnAttach (vm=<optimized out>, options=<optimized out>, reserved=<optimized out>) at src/vmEntry.cpp:174
#14 0x00007f5e22a554ac in JvmtiExport::load_agent_library (op=0x7f5df0039e90, st=0x7f5e0c7f6d30) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/share/vm/prims/jvmtiExport.cpp:2275
#15 0x00007f5e225fba78 in attach_listener_thread_entry (thread=<optimized out>, __the_thread__=<optimized out>) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/share/vm/services/attachListener.cpp:454
#16 0x00007f5e22db6bfb in JavaThread::thread_main_inner (this=this@entry=0x7f5df4001000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:1710
#17 0x00007f5e22db70df in JavaThread::run (this=0x7f5df4001000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:1690
#18 0x00007f5e22be6492 in java_start (thread=0x7f5df4001000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.121-10.b14.fc25.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:782
#19 0x00007f5e23cf86ca in start_thread (arg=0x7f5e0c7f7700) at pthread_create.c:333
#20 0x00007f5e23409f7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
(gdb) frame 7
#7  strlen () at ../sysdeps/x86_64/strlen.S:106
106		movdqu	(%rax), %xmm4
(gdb) x/s $rax
0x1e4:	<error: Cannot access memory at address 0x1e4>
(gdb) frame 11
#11 0x00007f5e0c2a6fc0 in MethodName::MethodName (this=0x7f5e0c7f4af0, method=<optimized out>, dotted=<optimized out>) at src/codeCache.cpp:54
54	        snprintf(_buf, sizeof(_buf), "%s.%s", fixClassName(class_name, dotted), method_name);
(gdb) p method_name
$7 = 0x1e4 <error: Cannot access memory at address 0x1e4>
(gdb) p class_name
$8 = 0x7f5df000e5a0 "/etc/localtime"
(gdb) p _buf
$9 = ".usy.eatCpu\000read.run\000\000\000\000NH*\f^\177\000\000\020\064*\f^\177\000\000\210K\177\f^\177\000\000\307R\273\365\000\000\000\000K\355\326\003\000\000\000\000\a\000\000\000\003V\000\000\000\070*\f^\177\000\000`L\177\f^\177\000\000\240%*\f^\177\000\000\204K\177\f^\177\000\000!\264\063#^\177\000\000\020P\000\360]\177\000\000\000\000\000\000\000\000\000\000\030\000\000\000\000\000\000\000\340M\177\f^\177\000\000x\214J\360]\177\000\000`\214J\360]\177\000\000\260s*\f^\177\000\000\002\000\000\000\000\000\000\000\360L\177\f^\177\000\000\030\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\000"...
(gdb) quit

Here's the full HotSpot error log, though I didn't find anything interesting in it. I can share the core file too if needed.

Environment:

$ uname -a
Linux fedora25-bcc_default_1487706588981_73712 4.10.0-1.vanilla.knurd.1.fc25.x86_64 #1 SMP Mon Feb 20 06:57:37 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ java -version
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-b14)
OpenJDK 64-Bit Server VM (build 25.121-b14, mixed mode)

Can't build project

Hi!
I can't compile the project on CentOS release 6.6.
g++ version:

g++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)

error:

g++ -O2 -I/opt/jdk/jdk1.7.0_75//include -I/opt/jdk/jdk1.7.0_75//include/linux -fPIC -shared -o build/libasyncProfiler.so src/*.cpp -ldl -lpthread
In file included from src/profiler.h:22,
                 from src/javaApi.cpp:18:
src/spinLock.h: In member function ‘void SpinLock::spinLock()’:
src/spinLock.h:39: error: ‘__builtin_ia32_pause’ was not declared in this scope
In file included from src/perfEvent.h:23,
                 from src/perfEvent.cpp:25:
src/spinLock.h: In member function ‘void SpinLock::spinLock()’:
src/spinLock.h:39: error: ‘__builtin_ia32_pause’ was not declared in this scope
In file included from src/profiler.h:22,
                 from src/profiler.cpp:24:
src/spinLock.h: In member function ‘void SpinLock::spinLock()’:
src/spinLock.h:39: error: ‘__builtin_ia32_pause’ was not declared in this scope
In file included from src/profiler.h:22,
                 from src/vmEntry.cpp:22:
src/spinLock.h: In member function ‘void SpinLock::spinLock()’:
src/spinLock.h:39: error: ‘__builtin_ia32_pause’ was not declared in this scope
make: *** [build/libasyncProfiler.so] Error 1
