Recently I’ve run into some misterious Java VM crashes. After narrowing down the crashing code, I’ve ended up with this POC:
package org.axt.cr;
public class Crash {
@SuppressWarnings("unused")
private static int val;
private static double arr[] = new double[2048];
public static void cr() {
for(int i=0; i<arr.length; i++) {
arr[i] = Double.NaN;
}
val = (arr[0] == 0 ? 1: 2);
}
public static void main(String[] args) {
cr();
try {Thread.sleep(2000);} catch(InterruptedException e) {}
cr();
}
}
It must be run with -XX:CompileThreshold=100 VM option, to enable the JIT compiler to do it’s work “early”. With this setting the ‘cr’ method is JITed during the sleep, and the second time the JITed code gets executed. Note that you can trace what the JIT compiler is doing with the -XX:+PrintCompilation option. Adding some sysout lines, we can see, that the ‘cr’ method is fully executed, and the crash occurs while returning from the method.
# # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x0000000000000000, pid=13364, tid=140737327884032 # # JRE version: 6.0_26-b03 # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 compressed oops) # Problematic frame: # C [error occurred during error reporting (printing problematic frame), id 0xb] # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp #
It looks like a stack corruption, and has something to do with NaN-s.
After running the POC with different JavaVM-s, it turned out that only 64bit builds are affected since 1.6.0_25. I figured out if I replace the ?: operator with an if-else statement:
if(arr[0] == 0) val = 1; else val = 2;
the bug disappears. First I’ve dumped the java bytecode, to check for differences (Here you can find an overview of the bytecode instructions):
$ javap -c org.axt.cr.Crash
This is the bytecode with the ?: operator:
[...] 24: getstatic #12; //Field arr:[D 27: iconst_0 28: daload 29: dconst_0 30: dcmpl 31: ifne 38 34: iconst_1 35: goto 39 38: iconst_2 39: putstatic #24; //Field val:I 42: return
And this with the if-else statement:
[...] 24: getstatic #12; //Field arr:[D 27: iconst_0 28: daload 29: dconst_0 30: dcmpl 31: ifne 41 34: iconst_1 35: putstatic #24; //Field val:I 38: goto 45 41: iconst_2 42: putstatic #24; //Field val:I 45: return
You can see that after the comparison (24-30) the new value for the variable 'val' is pushed onto the stack, and 'putstatic' is called (#24 is the reference of field 'val' in the constant pool). In the first case it is called after the branching is done, in the second case it is called in the branches. Not a big difference. Afterall in the POC, the condition is always false, so exactly the same code get executed in both of the cases:
dcmpl -> ifne -> iconst_2 -> putstatic -> return
It's time to check the JITed code. JDK has support for dumping the compiled code. You need a disassembler plugin which you can download from Project Kenai's homepage or you can build your own, it is part of OpenJDK. You need to rename it to 'hsdis-amd64.so' and add it to your library path. You will also need some VM options, so our launcher code will look like this:
LD_LIBRARY_PATH=. java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+PrintCompilation -XX:CompileThreshold=100 -cp . org.axt.cr.Crash
Here is the dumped JITed code (the second block is the interesting part):
Decoding compiled method 0x00007fffed05e1d0:
[...]
# {method} 'cr' '()V' in 'org/axt/cr/Crash'
0x00007fffed05e360: callq 0x00007ffff6e53710 ; {no_reloc}
0x00007fffed05e365: data32 data32 nopw 0x0(%rax,%rax,1)
0x00007fffed05e370: mov %eax,-0x6000(%rsp)
0x00007fffed05e377: push %rbp
0x00007fffed05e378: sub $0x20,%rsp
0x00007fffed05e37c: rex mov (%rsi),%ebp
0x00007fffed05e37f: mov %rsi,%rdi
0x00007fffed05e382: movabs $0x7ffff6ed2880,%r10
0x00007fffed05e38c: callq *%r10
0x00007fffed05e38f: movabs $0xc1269410,%r10 ; {oop('org/axt/cr/Crash')}
0x00007fffed05e399: mov 0x258(%r10),%edi ;*getstatic arr
0x00007fffed05e3a0: mov 0xc(%rdi),%r8d
0x00007fffed05e3a4: cmp %r8d,%ebp
0x00007fffed05e3a7: jae 0x00007fffed05e511 ;*dastore
0x00007fffed05e3ad: mov %ebp,%r9d
0x00007fffed05e3b0: inc %r9d ;*iinc
0x00007fffed05e3b3: movsd -0x7b(%rip),%xmm0
0x00007fffed05e3bb: movsd %xmm0,0x10(%rdi,%rbp,8) ;*dastore
0x00007fffed05e3c1: cmp %r8d,%r9d
0x00007fffed05e3c4: jge 0x00007fffed05e4b2 ;*if_icmplt
0x00007fffed05e3ca: add $0x2,%ebp
0x00007fffed05e3cd: xor %ecx,%ecx
0x00007fffed05e3cf: cmp %ecx,%ebp
0x00007fffed05e3d1: cmovl %ecx,%ebp
0x00007fffed05e3d4: cmp %r8d,%r9d
0x00007fffed05e3d7: jae 0x00007fffed05e4f7
0x00007fffed05e3dd: movsd %xmm0,0x10(%rdi,%r9,8) ;*dastore
0x00007fffed05e3e4: inc %r9d ;*iinc
0x00007fffed05e3e7: cmp %ebp,%r9d
0x00007fffed05e3ea: jl 0x00007fffed05e3d4
0x00007fffed05e3ec: mov %r8d,%ecx
0x00007fffed05e3ef: sub %r9d,%ecx
0x00007fffed05e3f2: and $0xfffffff0,%ecx
0x00007fffed05e3f5: add %r9d,%ecx
0x00007fffed05e3f8: cmp %ecx,%r9d
0x00007fffed05e3fb: jl 0x00007fffed05e413
0x00007fffed05e3fd: mov %r9d,%ebx
0x00007fffed05e400: jmpq 0x00007fffed05e49a
0x00007fffed05e405: data32 data32 nopw 0x0(%rax,%rax,1)
0x00007fffed05e410: mov %ebx,%r9d
0x00007fffed05e413: movsd %xmm0,0x10(%rdi,%r9,8) ;*dastore
0x00007fffed05e41a: mov %r9d,%ebx
0x00007fffed05e41d: add $0x10,%ebx ;*iinc
0x00007fffed05e420: movslq %r9d,%r11
0x00007fffed05e423: movsd %xmm0,0x18(%rdi,%r11,8)
0x00007fffed05e42a: movsd %xmm0,0x20(%rdi,%r11,8)
0x00007fffed05e431: movsd %xmm0,0x28(%rdi,%r11,8)
0x00007fffed05e438: movsd %xmm0,0x30(%rdi,%r11,8)
0x00007fffed05e43f: movsd %xmm0,0x38(%rdi,%r11,8)
0x00007fffed05e446: movsd %xmm0,0x40(%rdi,%r11,8)
0x00007fffed05e44d: movsd %xmm0,0x48(%rdi,%r11,8)
0x00007fffed05e454: movsd %xmm0,0x50(%rdi,%r11,8)
0x00007fffed05e45b: movsd %xmm0,0x58(%rdi,%r11,8)
0x00007fffed05e462: movsd %xmm0,0x60(%rdi,%r11,8)
0x00007fffed05e469: movsd %xmm0,0x68(%rdi,%r11,8)
0x00007fffed05e470: movsd %xmm0,0x70(%rdi,%r11,8)
0x00007fffed05e477: movsd %xmm0,0x78(%rdi,%r11,8)
0x00007fffed05e47e: movsd %xmm0,0x80(%rdi,%r11,8)
0x00007fffed05e488: movsd %xmm0,0x88(%rdi,%r11,8)
0x00007fffed05e492: cmp %ecx,%ebx
0x00007fffed05e494: jl 0x00007fffed05e410
0x00007fffed05e49a: cmp %r8d,%ebx
0x00007fffed05e49d: jge 0x00007fffed05e4b2
0x00007fffed05e49f: nop
0x00007fffed05e4a0: cmp %r8d,%ebx
0x00007fffed05e4a3: jae 0x00007fffed05e4fa
0x00007fffed05e4a5: movsd %xmm0,0x10(%rdi,%rbx,8) ;*dastore
0x00007fffed05e4ab: inc %ebx ;*iinc
0x00007fffed05e4ad: cmp %r8d,%ebx
0x00007fffed05e4b0: jl 0x00007fffed05e4a0
0x00007fffed05e4b2: test %r8d,%r8d
0x00007fffed05e4b5: jbe 0x00007fffed05e525
0x00007fffed05e4b7: movsd 0x10(%rdi),%xmm0 ;*daload 0x00007fffed05e4bc: mov $0x1,%r11d 0x00007fffed05e4c2: mov $0x2,%r9d 0x00007fffed05e4c8: ucomisd -0x188(%rip),%xmm0 0x00007fffed05e4d0: jnp 0x00007fffed05e4df 0x00007fffed05e4d6: pushfq 0x00007fffed05e4d7: and $0xffffffffffffff2b,%rsp 0x00007fffed05e4de: popfq 0x00007fffed05e4df: nop 0x00007fffed05e4e0: cmovne %r9d,%r11d 0x00007fffed05e4e4: mov %r11d,0x25c(%r10) ;*getstatic arr 0x00007fffed05e4eb: add $0x20,%rsp 0x00007fffed05e4ef: pop %rbp 0x00007fffed05e4f0: test %eax,0xaf99b0a(%rip) 0x00007fffed05e4f6: retq [...]
I’ve increased the sleep time to 20 seconds, so I could easily attach with ‘gdb’. I’ve set a breakpont to the entry point of the JIT-ed function, and traced it line-by-line. The first part of the function executes SharedRuntime::OSR_migration_end, the next few dozens of instruction are responsible for filling the array, and the last part is responsible for evaluating the condition and the branching. I’ve noticed that during the last part the code the stack pointer is changed so at the end retq will read some garbage from the stack and the VM crashes while trying to execute code from an unmapped address. Just for comparison, here is the dump of the if-else code:
0x00007fffed06c3f7: movsd 0x10(%rdi),%xmm0 0x00007fffed06c3fc: ucomisd -0x17c(%rip),%xmm0 0x00007fffed06c404: jp 0x00007fffed06c408 0x00007fffed06c406: je 0x00007fffed06c41f ;*ifne 0x00007fffed06c408: movl $0x2,0x25c(%r10) ;*getstatic arr 0x00007fffed06c413: add $0x20,%rsp 0x00007fffed06c417: pop %rbp 0x00007fffed06c418: test %eax,0xaf8bbe2(%rip) 0x00007fffed06c41e: retq 0x00007fffed06c41f: movl $0x1,0x25c(%r10) ;*putstatic val 0x00007fffed06c42a: jmp 0x00007fffed06c413
Funny fact: the JITed code of the if-else is using fewer instructions than the code with ?: operator.
The main difference is in these lines:
0x00007fffed05e4d6: pushfq 0x00007fffed05e4d7: and $0xffffffffffffff2b,%rsp 0x00007fffed05e4de: popfq
We can guess that this lines intended to fixup the flags after the ucomisd instruction, after the result of the comparison is ‘unordered’ (at least one of the operands is NaN or Inf), which is indicated through the parity flag. But instead of operating on the flags register , the code operates on the stack pointer. The good code should be like this:
0x00007fffed05e4d6: pushfq 0x00007fffed05e4d7: and $0xffffffffffffff2b,(%rsp) 0x00007fffed05e4de: popfq
After hunting down the problem, I’ve tried to search for the bug, and found it in Oracle’s bugzilla, and the fix also in OpenJDK7. Grepping through the source it turned out, the following “code blobs” are affected:
instruct cmpF_cc_reg(rFlagsRegU cr, regF src1, regF src2) instruct cmpF_cc_mem(rFlagsRegU cr, regF src1, memory src2) instruct cmpD_cc_reg(rFlagsRegU cr, regD src1, regD src2) instruct cmpD_cc_mem(rFlagsRegU cr, regD src1, memory src2)
In conclusion we can see, that the error is caused by wrongly JITed double or float comparison, the for loop was only needed to trigger the JIT compilation.
If we add ‘-Xcomp’ VM option to the command line, which causes a pre-init JIT compilation of all the functions, we can trigger the bug with the following simplified POC too:
package org.axt.cr;
public class Cr2 {
@SuppressWarnings("unused")
private static int val;
public static void cr(double x) {
val = (x == 0 ? 1: 2);
}
public static void main(String[] args) {
cr(Double.NaN);
}
}
Four questions left:
- why we don’t need the cleanup code in the if-else case? is it normal, or is it a bug?
- can we turn this into an infoleak?
- can we control the stack to achieve code execution?
- why Oracle didn’t fix this in the last 10 months?
I will try to investigate the first three questions, and hopefully you will see the results in a followup post.

that’s cool bro!
Hey there, I think your blog might be having browser compatibility issues.
When I look at your website in Firefox, it looks fine but when opening
in Internet Explorer, it has some overlapping. I just wanted to give you
a quick heads up! Other then that, awesome blog!