Lambda Notes 1 - Getting Started
(Last updated Sep 2, 2018)
Motivation
Here are my notes on what I learned about the implementation of InvokeDynamic (JSR 292) and Lambda in JDK 10, from the point of view of a HotSpot JVM engineer.
These notes assume that you already have a working knowledge of the HotSpot JVM and a fairly good understanding of the “classic” JVM specification (i.e., before JDK 7):
- Before JDK 7, the life of a JVM engineer wasn’t too complicated. Yes, the JIT, GC and Verifier were not for the faint of heart, but in most cases you could treat them as black boxes. By reading the bytecode specification, you could pretty much explain to yourself how Java programs are executed.
- However, after the invokedynamic bytecode was introduced in JDK 7 and Lambda expressions were introduced in JDK 8, it appears that only the privileged and gifted can follow a listing of Java bytecodes like the one below and tell you what it does, much less what makes it work, and yet much less how to make it work better.
invokedynamic InvokeDynamic REF_invokeStatic:\
java/lang/invoke/LambdaMetafactory.metafactory:\
"(Ljava/lang/invoke/MethodHandles$Lookup;\
Ljava/lang/String;\
Ljava/lang/invoke/MethodType;\
Ljava/lang/invoke/MethodType;\
Ljava/lang/invoke/MethodHandle;\
Ljava/lang/invoke/MethodType;\
)Ljava/lang/invoke/CallSite;"\
:run:\
"()Ljava/lang/Runnable;" \
MethodType "()V", \
MethodHandle REF_invokeStatic:LambHello.lambda$main$0:"()V", \
MethodType "()V";
So here it is – a record of my journey to figure out how a Lambda expression is stored and executed by the HotSpot JVM, so that I can implement JDK-8198698 - a caching mechanism to remove the initialization overhead of using Lambda expressions.
I hope these notes will be useful for other engineers who are trying to understand or improve the related areas in the HotSpot JVM.
Links
Here are some good pointers to get you started. Click on each link below. Some of them point to a specific part of the document, such as the invokedynamic bytecode specification in The Java® Virtual Machine Specification (SE 7 Edition), Chapter 6.
- java.lang.invoke package (MethodHandle, MethodType, CallSite, etc)
- Java Virtual Machine Spec
- Chapter 4 - The class File Format
- Chapter 5 - Loading, Linking, and Initializing
- Chapter 6 - The Java Virtual Machine Instruction Set
- JEP 109: Enhance Core Libraries with Lambda (JSR 335)
- JEP 126: Lambda expressions and virtual extension methods (JSR 335)
- JEP 160: Lambda-Form Representation for Method Handles
- Brian Goetz’s notes on translating Lambda expressions into Java bytecodes
Setup and Tools
JDK 10
In these pages, I assume that you’re running on 64-bit Linux and have already built the JDK yourself. All examples below use JDK 10 (http://hg.openjdk.java.net/jdk/jdk10/).
If you’re new to building JDK 10, see this link.
asmtools
asmtools has a better Java disassembler than javap. You can download it from here, e.g., asmtools-6.0.tar.gz.
export ASMTOOLS_JAR=${HOME}/asmtools-6.0/lib/asmtools.jar
alias jdis='java -jar ${ASMTOOLS_JAR} jdis'
alias jasm='java -jar ${ASMTOOLS_JAR} jasm'
alias jdec='java -jar ${ASMTOOLS_JAR} jdec'
hsdis
hsdis is a tool for disassembling the code generated by the JVM’s JIT compiler. You won’t need to use hsdis now, but we will use it in subsequent blog posts. See this blog post for details.
Hello Lambda
Here’s a Java Lambda expression example.
It passes a block of code, which prints a message, as a parameter to the doit method.
(If you’re new to Lambda, here’s a very good intro video.)
public class LambHello {
    public static void main(String[] args) {
        doit(() -> {
            System.out.println("Hello from Lambda");
        });
    }
    static void doit(Runnable t) {
        t.run();
    }
}
$ javac -g LambHello.java
$ java -cp . LambHello
Hello from Lambda
Compiling Lambda Expressions using the InvokeDynamic Bytecode
You might have thought, “Oh, we can translate the above Lambda expression into an inner class that captures the code block { System.out.println("Hello from Lambda"); }”.
And you would be correct, as that’s how Retrolambda does it.
However, this document explains why it isn’t done that way in JDK 8 and up.
Instead, for various reasons, we have a much more complex design that relies on invokedynamic.
$ javap -c LambHello.class
public class LambHello {
public LambHello();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: invokedynamic #2, 0 // InvokeDynamic #0:run:()Ljava/lang/Runnable;
^^^^^ HERE ^^^^^
5: invokestatic #3 // Method doit:(Ljava/lang/Runnable;)V
8: return
static void doit(java.lang.Runnable);
Code:
0: aload_0
1: invokeinterface #4, 1 // InterfaceMethod java/lang/Runnable.run:()V
6: return
}
But the above output from javap is a simplified form, which doesn’t really tell you what’s happening. Here’s the real thing, dumped using jdis (see notes on asmtools above):
# Long lines have been broken up to improve readability
$ jdis LambHello.class
super public class LambHello
version 54:0
{
public Method "<init>":"()V"
stack 1 locals 1
{
aload_0;
invokespecial Method java/lang/Object."<init>":"()V";
return;
}
public static Method main:"([Ljava/lang/String;)V"
stack 1 locals 1
{
invokedynamic InvokeDynamic REF_invokeStatic:\
java/lang/invoke/LambdaMetafactory.metafactory:\
"(Ljava/lang/invoke/MethodHandles$Lookup;\
Ljava/lang/String;\
Ljava/lang/invoke/MethodType;\
Ljava/lang/invoke/MethodType;\
Ljava/lang/invoke/MethodHandle;\
Ljava/lang/invoke/MethodType;\
)Ljava/lang/invoke/CallSite;"\
:run:\
"()Ljava/lang/Runnable;" \
MethodType "()V", \
MethodHandle REF_invokeStatic:LambHello.lambda$main$0:"()V", \
MethodType "()V";
invokestatic Method doit:"(Ljava/lang/Runnable;)V";
return;
}
static Method doit:"(Ljava/lang/Runnable;)V"
stack 1 locals 1
{
aload_0;
invokeinterface InterfaceMethod java/lang/Runnable.run:"()V", 1;
return;
}
private static synthetic Method lambda$main$0:"()V"
stack 2 locals 0
{
getstatic Field java/lang/System.out:"Ljava/io/PrintStream;";
ldc String "Hello from Lambda";
invokevirtual Method java/io/PrintStream.println:"(Ljava/lang/String;)V";
return;
}
public static final InnerClass Lookup= \
class java/lang/invoke/MethodHandles$Lookup of \
class java/lang/invoke/MethodHandles;
}
Compiling Non-capturing Lambdas
So what’s happening inside the main method?
First of all, the JVM doesn’t really treat a “function” as a first-class citizen.
Although our intention is “let’s pass a function to the doit method”, the actual implementation is “put our function into a class that implements the Runnable interface, and pass an instance of this class to doit.”
We can get a better idea with this test program:
public class LambHello2 {
    static Runnable last = null;
    public static void main(String[] args) {
        for (int i=0; i<2; i++) {
            System.out.println("Loop: " + i);
            doit(() -> {
                System.out.println("Hello from Lambda");
            });
        }
    }
    static void doit(Runnable t) {
        String s = (t == last) ? "reused" : "new ";
        System.out.println("Got a " + s + " instance\n " +
                           t + " of\n " + t.getClass());
        t.run();
        last = t;
    }
}
$ java -cp . LambHello2
Loop: 0
Got a new instance
LambHello2$$Lambda$1/232824863@17d99928 of
class LambHello2$$Lambda$1/232824863
Hello from Lambda
Loop: 1
Got a reused instance
LambHello2$$Lambda$1/232824863@17d99928 of
class LambHello2$$Lambda$1/232824863
Hello from Lambda
Here, you can see that doit’s parameter t is LambHello2$$Lambda$1/232824863@17d99928, which is an instance of the LambHello2$$Lambda$1/232824863 class.
Our Lambda expression is “non capturing” – i.e., it doesn’t refer to any local variables in its context. The JVM is smart enough to reuse the same instance across different invocations. This is an advantage over comparable code that uses anonymous inner classes, which would create an instance for every invocation.
doit(new Runnable() {
    public void run() {
        System.out.println("Hello from inner class");
    }
});
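To make the comparison concrete, here is a minimal sketch that checks object identity for both forms. The class and method names (InstanceReuse, makeLambda, makeInner) are made up for this illustration. Note that reuse of a non-capturing Lambda instance is what the HotSpot runs above show, not something the Java language guarantees, whereas a `new` expression always creates a fresh object.

```java
public class InstanceReuse {
    // Each call executes the same invokedynamic call site.
    static Runnable makeLambda() {
        return () -> {};
    }

    // Each call executes a 'new' bytecode, so a fresh object is allocated.
    static Runnable makeInner() {
        return new Runnable() {
            public void run() {}
        };
    }

    public static void main(String[] args) {
        // Observed as true on the HotSpot builds discussed here; an
        // implementation detail, not a language guarantee.
        System.out.println("lambda reused: " + (makeLambda() == makeLambda()));
        // Always false: two distinct allocations of the same class.
        System.out.println("inner reused:  " + (makeInner() == makeInner()));
    }
}
```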
Compiling Capturing Lambdas
Here’s a Lambda that captures the local variable x:
public class LambHello3 {
    static Runnable last = null;
    static Class lastCls = null;
    public static void main(String[] args) {
        for (int i=0; i<2; i++) {
            System.out.println("Loop: " + i);
            final int x = i;
            doit(() -> {
                System.out.println("Hello from Lambda: " + x);
            });
        }
    }
    static void doit(Runnable t) {
        String s1 = (t == last) ? "reused" : "new";
        String s2 = (t.getClass() == lastCls) ? "reused" : "new";
        System.out.println("Got a " + s1 + " instance\n " +
                           t + " of a " + s2 + " class\n " + t.getClass());
        t.run();
        last = t;
        lastCls = t.getClass();
    }
}
$ java -cp . LambHello3
Loop: 0
Got a new instance
LambHello3$$Lambda$1/611437735@1ae369b7 of a new class
class LambHello3$$Lambda$1/611437735
Hello from Lambda: 0
Loop: 1
Got a new instance
LambHello3$$Lambda$1/611437735@6fffcba5 of a reused class
class LambHello3$$Lambda$1/611437735
Hello from Lambda: 1
Here, a different instance of the same LambHello3$$Lambda$1/611437735 class is created for each invocation of doit.
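One way to see why a capturing Lambda cannot share a single instance is to create several of them first and run them later. The sketch below (CaptureLater and collect are names invented for this example) stores three Lambdas that each capture a different value of x. Every instance has to remember its own value until run() is finally called.

```java
import java.util.ArrayList;
import java.util.List;

public class CaptureLater {
    // Creates three capturing Lambdas, then runs them after the loop is over.
    static List<String> collect() {
        List<Runnable> tasks = new ArrayList<>();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int x = i;                 // captured by the Lambda below
            tasks.add(() -> out.add("x=" + x));
        }
        for (Runnable t : tasks) {
            t.run();                         // the body executes only now
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect());       // prints [x=0, x=1, x=2]
    }
}
```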
Using Lambdas without Calling a Method
Most “howto” Lambda examples are like what I presented above – putting a Lambda expression as a parameter to a method. However, this doesn’t have to be the case. You can create a Lambda without calling a method.
public class LambHello4 {
    static Object x;
    public static void main(String[] args) {
        final Runnable a = () -> {};
        final Runnable b = () -> { x = a;};
    }
}
$ jdis LambHello4.class
super public class LambHello4
version 54:0
{
static Field x:"Ljava/lang/Object;";
public Method "<init>":"()V"
stack 1 locals 1
{
aload_0;
invokespecial Method java/lang/Object."<init>":"()V";
return;
}
public static Method main:"([Ljava/lang/String;)V"
stack 1 locals 3
{
/* final Runnable a = () -> {}; */
invokedynamic InvokeDynamic REF_invokeStatic:\
java/lang/invoke/LambdaMetafactory.metafactory:\
"(Ljava/lang/invoke/MethodHandles$Lookup;\
Ljava/lang/String;\
Ljava/lang/invoke/MethodType;\
Ljava/lang/invoke/MethodType;\
Ljava/lang/invoke/MethodHandle;\
Ljava/lang/invoke/MethodType;\
)Ljava/lang/invoke/CallSite;"\
:run:"()Ljava/lang/Runnable;" \
MethodType "()V", \
MethodHandle REF_invokeStatic:LambHello4.lambda$main$0:\
"()V",\
MethodType "()V";
astore_1;
/* final Runnable b = () -> { x = a;}; */
aload_1;
invokedynamic InvokeDynamic REF_invokeStatic:\
java/lang/invoke/LambdaMetafactory.metafactory:\
"(Ljava/lang/invoke/MethodHandles$Lookup;\
Ljava/lang/String;\
Ljava/lang/invoke/MethodType;\
Ljava/lang/invoke/MethodType;\
Ljava/lang/invoke/MethodHandle;\
Ljava/lang/invoke/MethodType;\
)Ljava/lang/invoke/CallSite;"\
:run:"(Ljava/lang/Runnable;)Ljava/lang/Runnable;"\
MethodType "()V",\
MethodHandle REF_invokeStatic:LambHello4.lambda$main$1:\
"(Ljava/lang/Runnable;)V",\
MethodType "()V";
astore_2;
return;
}
private static synthetic Method lambda$main$1:"(Ljava/lang/Runnable;)V"
stack 1 locals 1
{
aload_0;
putstatic Field x:"Ljava/lang/Object;";
return;
}
private static synthetic Method lambda$main$0:"()V"
stack 0 locals 0
{
return;
}
public static final InnerClass Lookup=\
class java/lang/invoke/MethodHandles$Lookup of \
class java/lang/invoke/MethodHandles;
}
From the above experiments, we can see that for each (params...) -> {code...} expression in the source code:
- Each -> operator is translated into one invokedynamic bytecode, whose behavior is defined by the InvokeDynamic constant pool entry.
- Any captured local variables are pushed onto the stack before the invokedynamic is executed. In our example, the aload_1 bytecode pushes the captured variable, a, onto the stack.
- The code in the Lambda is compiled into a private method, such as LambHello4.lambda$main$1.
- The first time the invokedynamic bytecode is executed, a class is dynamically generated (see more below). This class implements the specified interface.
- When the invokedynamic bytecode finishes execution, it returns an instance of this generated class.
  - For non-capturing Lambdas, the same instance is always returned.
  - For capturing Lambdas, each execution of the invokedynamic bytecode returns a different instance.
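The steps above can be approximated by hand with the public java.lang.invoke API. The following is only a rough sketch for a non-capturing Lambda, not what the JVM literally executes: ManualLambda, body and link are hypothetical names, and body stands in for the synthetic lambda$main$0 method. The six arguments passed to LambdaMetafactory.metafactory line up with the operands shown in the jdis output: the lookup, the name run, the call site type ()Ljava/lang/Runnable;, and the three static arguments (MethodType, MethodHandle, MethodType).

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ManualLambda {
    static String message = "";

    // Stand-in for the private synthetic method javac generates for the Lambda body.
    private static void body() {
        message = "Hello from manual Lambda";
    }

    // Does by hand roughly what linking the invokedynamic call site does.
    static Runnable link() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType voidType = MethodType.methodType(void.class);      // "()V"
            MethodHandle impl = lookup.findStatic(ManualLambda.class, "body", voidType);
            CallSite site = LambdaMetafactory.metafactory(
                lookup,                                  // caller's Lookup
                "run",                                   // interface method name
                MethodType.methodType(Runnable.class),   // "()Ljava/lang/Runnable;"
                voidType,                                // signature of Runnable.run
                impl,                                    // REF_invokeStatic handle to the body
                voidType);                               // instantiated signature
            // Invoking the call site's target yields the Runnable instance.
            return (Runnable) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Runnable r = link();
        r.run();
        System.out.println(message);
    }
}
```

As with the real bytecode, creating the Runnable does not run the body; message stays empty until run() is called.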
So why do we need different instances for capturing Lambdas?
In the LambHello4.java example, the code inside the Lambda expressions isn’t actually executed – we just created a and b, but we have not executed a.run() or b.run().
We can see that the invokedynamic bytecode does not execute the code in the Lambda expression. Rather, it just prepares the code to be executed in the future. In particular, the captured variable a for the second Lambda is stored as a private field inside b. For example, a program like this:
final Runnable b = () -> { x = a;};
b.run();
is conceptually executed like this:
class Dummy implements Runnable {
    Runnable tmp;                        // holds the captured variable
    public void run() {
        LambHello4.lambda$main$1(tmp);   // conceptual: this method is private
    }
}
Dummy b = new Dummy();
b.tmp = a;
b.run();
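The conceptual listing cannot be compiled as written, because lambda$main$1 is private to LambHello4. A compilable variation of the same idea is sketched below. All names (DesugarSketch, lambdaBody, Capturing, capture) are invented for this illustration, and the real generated class differs in detail. The point is only that constructing the object stores the captured value, and the body runs later.

```java
public class DesugarSketch {
    static Object x;

    // Stand-in for the synthetic method that holds the Lambda body "x = a".
    private static void lambdaBody(Runnable captured) {
        x = captured;
    }

    // Hand-written stand-in for the class generated at run time.
    private static final class Capturing implements Runnable {
        private final Runnable arg;      // the captured variable

        Capturing(Runnable arg) {
            this.arg = arg;
        }

        @Override
        public void run() {
            lambdaBody(arg);             // forward to the Lambda body
        }
    }

    // Plays the role of the invokedynamic: captures 'a' without running the body.
    static Runnable capture(Runnable a) {
        return new Capturing(a);
    }

    public static void main(String[] args) {
        Runnable a = () -> {};
        Runnable b = capture(a);
        System.out.println("before run: " + (x == a));   // false
        b.run();
        System.out.println("after run:  " + (x == a));   // true
    }
}
```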
Dynamically Generated Classes
The following example shows how the class LambHello5$$Lambda$1 (which is generated by the LambdaMetafactory) connects the t.run() call to the LambHello5.lambda$main$0 method:
public class LambHello5 {
    public static void main(String[] args) {
        doit(() -> {
            Thread.dumpStack();
        });
    }
    static void doit(Runnable t) {
        t.run();
    }
}
$ java -cp . -XX:+UnlockDiagnosticVMOptions -XX:+ShowHiddenFrames LambHello5
java.lang.Exception: Stack trace
at java.base/java.lang.Thread.dumpStack(Thread.java:1435)
at LambHello5.lambda$main$0(LambHello5.java:4)
at LambHello5$$Lambda$1/874088044.run(<Unknown>:1000000)
at LambHello5.doit(LambHello5.java:8)
at LambHello5.main(LambHello5.java:3)
You can tell the JVM to save the dynamically generated classes by using the -Djdk.internal.lambda.dumpProxyClasses command-line option:
$ mkdir -p DUMP_CLASS_FILES
$ java -Djdk.internal.lambda.dumpProxyClasses=DUMP_CLASS_FILES \
-XX:+UnlockDiagnosticVMOptions -XX:+ShowHiddenFrames \
-cp ~/tmp LambHello5
java.lang.Exception: Stack trace
at java.base/java.lang.Thread.dumpStack(Thread.java:1435)
at LambHello5.lambda$main$0(LambHello5.java:4)
at LambHello5$$Lambda$1/33524623.run(<Unknown>:1000000)
at LambHello5.doit(LambHello5.java:8)
at LambHello5.main(LambHello5.java:3)
$ find DUMP_CLASS_FILES/ -type f
DUMP_CLASS_FILES/LambHello5$$Lambda$1.class
$ jdis 'DUMP_CLASS_FILES/LambHello5$$Lambda$1.class'
super final synthetic class LambHello5$$Lambda$1
implements java/lang/Runnable
version 52:0
{
private Method "<init>":"()V"
stack 1 locals 1
{
aload_0;
invokespecial Method java/lang/Object."<init>":"()V";
return;
}
@+java/lang/invoke/LambdaForm$Hidden { }
public Method run:"()V"
stack 0 locals 1
{
invokestatic Method LambHello5.lambda$main$0:"()V";
return;
}
} // end Class LambHello5$$Lambda$1
Note that in the stack trace the generated class is printed as LambHello5$$Lambda$1/ followed by an integer. This indicates that the class is a JVM anonymous class (not to be confused with an Anonymous Inner Class at the Java language level :-( ). E.g., see this code in instanceKlass.cpp that prints out this special form of class name.
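You can also inspect the generated class from Java code through reflection. The small sketch below (LambdaClassName and lambdaClass are made-up names) prints the class name and a few properties. The exact name format is a HotSpot implementation detail and varies between JDK releases, so treat the printed name as informational only.

```java
public class LambdaClassName {
    // Returns the run-time class of a non-capturing Lambda.
    static Class<?> lambdaClass() {
        Runnable r = () -> {};
        return r.getClass();
    }

    public static void main(String[] args) {
        Class<?> c = lambdaClass();
        // On JDK 10 the name has the form LambdaClassName$$Lambda$N/<integer>.
        System.out.println("name:      " + c.getName());
        System.out.println("synthetic: " + c.isSynthetic());
        System.out.println("runnable:  " + Runnable.class.isAssignableFrom(c));
    }
}
```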
Summary
So by now, you should have a pretty good idea of how Lambda expressions are translated to bytecodes. Of course, you may still wonder, “what the heck is in that big lump of InvokeDynamic constant?” We’ll try to figure that out in subsequent blog posts.