Debugging Java VM’s JITed code

Recently I’ve run into some misterious Java VM crashes. After narrowing down the crashing code, I’ve ended up with this POC:

package org.axt.cr;

public class Crash {
	@SuppressWarnings("unused")
	private static int val;
	private static double arr[] = new double[2048];
	
	public static void cr() {
		for(int i=0; i<arr.length; i++) {
			arr[i] = Double.NaN;
		}
		val = (arr[0] == 0 ? 1: 2);
	}
	public static void main(String[] args) {
		cr();
		try {Thread.sleep(2000);} catch(InterruptedException e) {}
		cr();
	}
}

It must be run with -XX:CompileThreshold=100 VM option, to enable the JIT compiler to do it’s work “early”. With this setting the ‘cr’ method is JITed during the sleep, and the second time the JITed code gets executed. Note that you can trace what the JIT compiler is doing with the -XX:+PrintCompilation option. Adding some sysout lines, we can see, that the ‘cr’ method is fully executed, and the crash occurs while returning from the method.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000000000, pid=13364, tid=140737327884032
#
# JRE version: 6.0_26-b03
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C
[error occurred during error reporting (printing problematic frame), id 0xb]
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

It looks like a stack corruption, and has something to do with NaN-s.

After running the POC with different JavaVM-s, it turned out that only 64bit builds are affected since 1.6.0_25. I figured out if I replace the ?: operator with an if-else statement:

if(arr[0] == 0) val = 1; else val = 2;

the bug disappears. First I’ve dumped the java bytecode, to check for differences (Here you can find an overview of the bytecode instructions):

$ javap -c org.axt.cr.Crash

This is the bytecode with the ?: operator:

   [...]
   24:  getstatic       #12; //Field arr:[D
   27:  iconst_0
   28:  daload
   29:  dconst_0
   30:  dcmpl
   31:  ifne    38
   34:  iconst_1
   35:  goto    39
   38:  iconst_2
   39:  putstatic       #24; //Field val:I
   42:  return

And this with the if-else statement:

   [...]
   24:  getstatic       #12; //Field arr:[D
   27:  iconst_0
   28:  daload
   29:  dconst_0
   30:  dcmpl
   31:  ifne    41
   34:  iconst_1
   35:  putstatic       #24; //Field val:I
   38:  goto    45
   41:  iconst_2
   42:  putstatic       #24; //Field val:I
   45:  return

You can see that after the comparison (24-30) the new value for the variable ‘val’ is pushed onto the stack, and ‘putstatic’ is called (#24 is the reference of field ‘val’ in the constant pool). In the first case it is called after the branching is done, in the second case it is called in the branches. Not a big difference. Afterall in the POC, the condition is always false, so exactly the same code get executed in both of the cases:

dcmpl -> ifne -> iconst_2 -> putstatic -> return

It’s time to check the JITed code. JDK has support for dumping the compiled code. You need a disassembler plugin which you can download from Project Kenai’s homepage or you can build your own, it is part of OpenJDK. You need to rename it to ‘hsdis-amd64.so’ and add it to your library path. You will also need some VM options, so our launcher code will look like this:

LD_LIBRARY_PATH=. java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+PrintCompilation -XX:CompileThreshold=100 -cp . org.axt.cr.Crash

Here is the dumped JITed code (the second block is the interesting part):

Decoding compiled method 0x00007fffed05e1d0:
[...]
  # {method} 'cr' '()V' in 'org/axt/cr/Crash'
  0x00007fffed05e360: callq  0x00007ffff6e53710  ;   {no_reloc}
  0x00007fffed05e365: data32 data32 nopw 0x0(%rax,%rax,1)
  0x00007fffed05e370: mov    %eax,-0x6000(%rsp)
  0x00007fffed05e377: push   %rbp
  0x00007fffed05e378: sub    $0x20,%rsp
  0x00007fffed05e37c: rex mov (%rsi),%ebp
  0x00007fffed05e37f: mov    %rsi,%rdi
  0x00007fffed05e382: movabs $0x7ffff6ed2880,%r10
  0x00007fffed05e38c: callq  *%r10
  0x00007fffed05e38f: movabs $0xc1269410,%r10   ;   {oop('org/axt/cr/Crash')}
  0x00007fffed05e399: mov    0x258(%r10),%edi   ;*getstatic arr
  0x00007fffed05e3a0: mov    0xc(%rdi),%r8d     
  0x00007fffed05e3a4: cmp    %r8d,%ebp
  0x00007fffed05e3a7: jae    0x00007fffed05e511  ;*dastore
  0x00007fffed05e3ad: mov    %ebp,%r9d
  0x00007fffed05e3b0: inc    %r9d               ;*iinc
  0x00007fffed05e3b3: movsd  -0x7b(%rip),%xmm0  
  0x00007fffed05e3bb: movsd  %xmm0,0x10(%rdi,%rbp,8)  ;*dastore
  0x00007fffed05e3c1: cmp    %r8d,%r9d
  0x00007fffed05e3c4: jge    0x00007fffed05e4b2  ;*if_icmplt
  0x00007fffed05e3ca: add    $0x2,%ebp
  0x00007fffed05e3cd: xor    %ecx,%ecx
  0x00007fffed05e3cf: cmp    %ecx,%ebp
  0x00007fffed05e3d1: cmovl  %ecx,%ebp
  0x00007fffed05e3d4: cmp    %r8d,%r9d
  0x00007fffed05e3d7: jae    0x00007fffed05e4f7
  0x00007fffed05e3dd: movsd  %xmm0,0x10(%rdi,%r9,8)  ;*dastore
  0x00007fffed05e3e4: inc    %r9d               ;*iinc
  0x00007fffed05e3e7: cmp    %ebp,%r9d
  0x00007fffed05e3ea: jl     0x00007fffed05e3d4
  0x00007fffed05e3ec: mov    %r8d,%ecx
  0x00007fffed05e3ef: sub    %r9d,%ecx
  0x00007fffed05e3f2: and    $0xfffffff0,%ecx
  0x00007fffed05e3f5: add    %r9d,%ecx
  0x00007fffed05e3f8: cmp    %ecx,%r9d
  0x00007fffed05e3fb: jl     0x00007fffed05e413
  0x00007fffed05e3fd: mov    %r9d,%ebx
  0x00007fffed05e400: jmpq   0x00007fffed05e49a
  0x00007fffed05e405: data32 data32 nopw 0x0(%rax,%rax,1)
  0x00007fffed05e410: mov    %ebx,%r9d
  0x00007fffed05e413: movsd  %xmm0,0x10(%rdi,%r9,8)  ;*dastore
  0x00007fffed05e41a: mov    %r9d,%ebx
  0x00007fffed05e41d: add    $0x10,%ebx         ;*iinc
  0x00007fffed05e420: movslq %r9d,%r11
  0x00007fffed05e423: movsd  %xmm0,0x18(%rdi,%r11,8)
  0x00007fffed05e42a: movsd  %xmm0,0x20(%rdi,%r11,8)
  0x00007fffed05e431: movsd  %xmm0,0x28(%rdi,%r11,8)
  0x00007fffed05e438: movsd  %xmm0,0x30(%rdi,%r11,8)
  0x00007fffed05e43f: movsd  %xmm0,0x38(%rdi,%r11,8)
  0x00007fffed05e446: movsd  %xmm0,0x40(%rdi,%r11,8)
  0x00007fffed05e44d: movsd  %xmm0,0x48(%rdi,%r11,8)
  0x00007fffed05e454: movsd  %xmm0,0x50(%rdi,%r11,8)
  0x00007fffed05e45b: movsd  %xmm0,0x58(%rdi,%r11,8)
  0x00007fffed05e462: movsd  %xmm0,0x60(%rdi,%r11,8)
  0x00007fffed05e469: movsd  %xmm0,0x68(%rdi,%r11,8)
  0x00007fffed05e470: movsd  %xmm0,0x70(%rdi,%r11,8)
  0x00007fffed05e477: movsd  %xmm0,0x78(%rdi,%r11,8)
  0x00007fffed05e47e: movsd  %xmm0,0x80(%rdi,%r11,8)
  0x00007fffed05e488: movsd  %xmm0,0x88(%rdi,%r11,8)
  0x00007fffed05e492: cmp    %ecx,%ebx
  0x00007fffed05e494: jl     0x00007fffed05e410
  0x00007fffed05e49a: cmp    %r8d,%ebx
  0x00007fffed05e49d: jge    0x00007fffed05e4b2
  0x00007fffed05e49f: nop
  0x00007fffed05e4a0: cmp    %r8d,%ebx
  0x00007fffed05e4a3: jae    0x00007fffed05e4fa
  0x00007fffed05e4a5: movsd  %xmm0,0x10(%rdi,%rbx,8)  ;*dastore
  0x00007fffed05e4ab: inc    %ebx               ;*iinc
  0x00007fffed05e4ad: cmp    %r8d,%ebx
  0x00007fffed05e4b0: jl     0x00007fffed05e4a0
  0x00007fffed05e4b2: test   %r8d,%r8d
  0x00007fffed05e4b5: jbe    0x00007fffed05e525
  0x00007fffed05e4b7: movsd  0x10(%rdi),%xmm0   ;*daload
  0x00007fffed05e4bc: mov    $0x1,%r11d
  0x00007fffed05e4c2: mov    $0x2,%r9d
  0x00007fffed05e4c8: ucomisd -0x188(%rip),%xmm0  
  0x00007fffed05e4d0: jnp    0x00007fffed05e4df
  0x00007fffed05e4d6: pushfq 
  0x00007fffed05e4d7: and    $0xffffffffffffff2b,%rsp
  0x00007fffed05e4de: popfq  
  0x00007fffed05e4df: nop
  0x00007fffed05e4e0: cmovne %r9d,%r11d
  0x00007fffed05e4e4: mov    %r11d,0x25c(%r10)  ;*getstatic arr
  0x00007fffed05e4eb: add    $0x20,%rsp
  0x00007fffed05e4ef: pop    %rbp
  0x00007fffed05e4f0: test   %eax,0xaf99b0a(%rip) 
  0x00007fffed05e4f6: retq   
[...]

I’ve increased the sleep time to 20 seconds, so I could easily attach with ‘gdb’. I’ve set a breakpont to the entry point of the JIT-ed function, and traced it line-by-line. The first part of the function executes SharedRuntime::OSR_migration_end, the next few dozens of instruction are responsible for filling the array, and the last part is responsible for evaluating the condition and the branching. I’ve noticed that during the last part the code the stack pointer is changed so at the end retq will read some garbage from the stack and the VM crashes while trying to execute code from an unmapped address. Just for comparison, here is the dump of the if-else code:

  0x00007fffed06c3f7: movsd  0x10(%rdi),%xmm0
  0x00007fffed06c3fc: ucomisd -0x17c(%rip),%xmm0  
  0x00007fffed06c404: jp     0x00007fffed06c408
  0x00007fffed06c406: je     0x00007fffed06c41f  ;*ifne
  0x00007fffed06c408: movl   $0x2,0x25c(%r10)   ;*getstatic arr
  0x00007fffed06c413: add    $0x20,%rsp
  0x00007fffed06c417: pop    %rbp
  0x00007fffed06c418: test   %eax,0xaf8bbe2(%rip) 
  0x00007fffed06c41e: retq   
  0x00007fffed06c41f: movl   $0x1,0x25c(%r10)   ;*putstatic val
  0x00007fffed06c42a: jmp    0x00007fffed06c413

Funny fact: the JITed code of the if-else is using fewer instructions than the code with ?: operator.

The main difference is in these lines:

  0x00007fffed05e4d6: pushfq 
  0x00007fffed05e4d7: and    $0xffffffffffffff2b,%rsp
  0x00007fffed05e4de: popfq  

We can guess that this lines intended to fixup the flags after the ucomisd instruction, after the result of the comparison is ‘unordered’ (at least one of the operands is NaN or Inf), which is indicated through the parity flag. But instead of operating on the flags register , the code operates on the stack pointer. The good code should be like this:

  0x00007fffed05e4d6: pushfq 
  0x00007fffed05e4d7: and    $0xffffffffffffff2b,(%rsp)
  0x00007fffed05e4de: popfq  

After hunting down the problem, I’ve tried to search for the bug, and found it in Oracle’s bugzilla, and the fix also in OpenJDK7. Grepping through the source it turned out, the following “code blobs” are affected:

instruct cmpF_cc_reg(rFlagsRegU cr, regF src1, regF src2)
instruct cmpF_cc_mem(rFlagsRegU cr, regF src1, memory src2)
instruct cmpD_cc_reg(rFlagsRegU cr, regD src1, regD src2)
instruct cmpD_cc_mem(rFlagsRegU cr, regD src1, memory src2)

In conclusion we can see, that the error is caused by wrongly JITed double or float comparison, the for loop was only needed to trigger the JIT compilation.

If we add ‘-Xcomp’ VM option to the command line, which causes a pre-init JIT compilation of all the functions, we can trigger the bug with the following simplified POC too:

package org.axt.cr;

public class Cr2 {
	@SuppressWarnings("unused")
	private static int val;
	
	public static void cr(double x) {
		val = (x == 0 ? 1: 2);
	}

	public static void main(String[] args) {
		cr(Double.NaN);
	}
}

Four questions left:

  • why we don’t need the cleanup code in the if-else case? is it normal, or is it a bug?
  • can we turn this into an infoleak?
  • can we control the stack to achieve code execution?
  • why Oracle didn’t fix this in the last 10 months?

I will try to investigate the first three questions, and hopefully you will see the results in a followup post.

Advertisements
This entry was posted in Bugs and tagged , , , , , , . Bookmark the permalink.

2 Responses to Debugging Java VM’s JITed code

  1. cecan basescu says:

    that’s cool bro!

  2. Lori says:

    Hey there, I think your blog might be having browser compatibility issues.
    When I look at your website in Firefox, it looks fine but when opening
    in Internet Explorer, it has some overlapping. I just wanted to give you
    a quick heads up! Other then that, awesome blog!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s