Have you ever wondered how Java compilation works and what the JVM is doing under the hood? Why is HotSpot even called HotSpot, what is Tiered Compilation, and how does it relate to Java?
Answering such questions will be the main focus of today’s article. I will begin by explaining a few things about compilation itself and the theory behind it.
Types of Compilation
In general, we can differentiate two basic ways of translating human-readable code into instructions that can be understood by our computers:
- Static (native, AOT) compilation: after the code is written, a compiler takes it and produces a binary executable file. This file contains a set of machine code instructions targeted at a particular CPU architecture.
Of course, the same binary should be able to run on CPUs with a similar instruction set, but in more complex cases your binary may fail to run and may require recompilation for the target platform.
We lose the ability to run on multiple platforms for the benefit of faster execution on a dedicated platform.
- Interpretation: the source code is translated into machine instructions line by line by an interpreter, at the moment each line is executed. Thanks to this, the application may run on every CPU that has the correct interpreter.
On the other hand, execution will be slower than in the case of statically compiled languages. We benefit from the ability to run on multiple platforms but lose on execution time.
As you can see, both types have their advantages and disadvantages: each is dedicated to specific use cases and will probably underperform outside of them. You may ask — if there are only two ways, does that mean Java is an interpreted or a statically compiled language?
JIT Compilation To The Rescue
We may have two basic ways of translating the code, but we as humans always want to improve the things we use. That is why we created a thing called JIT. It stands for just-in-time compilation and is an attempt to combine the pros of static compilation and interpretation.
In most cases, such a compiler creates some sort of intermediate-level code — in the case of Java, it is known as bytecode — which is further read and translated by a specific interpreter — for Java, it is the JVM.
Thanks to just-in-time compilation, we still have the ability to run the application on multiple platforms, as long as they have the correct interpreter, while the performance overhead is far lower than for standard interpreted code. A minimal sketch of this pipeline follows.
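To make the bytecode step concrete, here is a minimal sketch (the file and class names are my own illustration, not from any particular project):

```java
// Add.java — compile with `javac Add.java`, which produces Add.class
// containing bytecode. Running `java Add` lets the JVM interpret that
// bytecode and JIT-compile the hot parts to native machine code.
public class Add {
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

You can inspect the intermediate form with `javap -c Add`: the add method shows up as a handful of bytecode instructions, roughly iload_0, iload_1, iadd, ireturn.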
How Many Compilers Do We Have in Java?
A quick answer to this one — we have two compilers in Java. The most commonly used names for them are C1 and C2 so I will use these names throughout this article.
For a start, a short story of Java — a long, long time ago (in times before Java 8) the JVM was shipped in two flavors: client and server, each having its own compiler. C1 is the name of the client compiler, while C2 is the server compiler’s name.
Moreover, we had to specify which one we wanted when starting the JVM (with the -client or -server flag). Nowadays this distinction is not as important as it used to be, but it is still good to know the difference between these two compilers.
The main one is the moment when they start compiling code, and how that affects code performance:
- C1 starts compilation sooner than C2 and does not try to perform many costly performance optimizations. At the beginning of program execution, the C1 compiler will be faster simply because it compiles more code in the same amount of time as C2.
- C2 starts compiling later than C1, but it collects plenty of useful information while waiting. Thanks to all this information, it can perform complex optimizations. In the end, C2-compiled code will be much faster than the one compiled by C1 — its performance can even compete with compiled C++ code.
In the old days, if you cared about optimizing application startup time, the C1 compiler was the best option, but if you preferred a long-running application with strict performance requirements, C2 was the choice for you.
Additionally, it is widely believed that, at this point and for various reasons, no more major enhancements to the C2 compiler are possible.
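As a side note, you can still approximate the old client-only behavior on a modern HotSpot JVM with the -XX:TieredStopAtLevel flag. A minimal sketch (the benchmark class is my own illustration):

```java
// Run the same program twice and compare timings (flags are HotSpot-specific):
//   java -XX:TieredStopAtLevel=1 Warmup   // C1 only: fast startup, lower peak
//   java Warmup                           // full tiered: C1 first, then C2
public class Warmup {
    public static void main(String[] args) {
        long start = System.nanoTime();
        long acc = 0;
        for (int i = 0; i < 50_000_000; i++) {
            acc += Integer.bitCount(i);   // hot loop that benefits from C2
        }
        System.out.printf("result=%d, took %d ms%n",
                acc, (System.nanoTime() - start) / 1_000_000);
    }
}
```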
The next logical questions are:
- Does this differentiation even have to exist?
- Can we maybe optimize it even further and use both compilers at the same time?
It turns out that Java creators asked themselves very similar questions (at least I suppose so).
Therefore, since Java 8 this differentiation has been removed and Tiered Compilation, introduced in Java 7, became the default technique of compiling code in the JVM. In Java 7 it was not enabled by default and you could turn it on with the -XX:+TieredCompilation flag.
However, in Java 8 it can still be disabled with the -XX:-TieredCompilation flag.
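If you want to check what the running JVM is actually doing, you can query the flag at runtime. A small sketch using the HotSpot-specific diagnostic MXBean (the class name TieredCheck is mine):

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class TieredCheck {
    public static void main(String[] args) {
        // HotSpotDiagnosticMXBean lives in com.sun.management and is
        // HotSpot-specific, so this will not work on every JVM implementation.
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Prints the current value of the TieredCompilation VM option.
        System.out.println(hotspot.getVMOption("TieredCompilation"));
    }
}
```

Running it with -XX:-TieredCompilation should report the option as false.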
What Is Tiered Compilation?
It is a technique which combines the C1 and C2 compilers. The JVM will start with C1 as the default compiler and then use C2 to compile code as it gets hotter.
There are five tiers of Tiered Compilation in Java:
0 — Interpreted code (bytecode after javac command)
1 — Simple C1 code
2 — Limited C1 code
3 — Full C1 code
4 — C2 compiled code
From the performance point of view, the most profitable transition is 0 → 3 → 4. Our code gains the most in performance while spending as few CPU cycles as possible. In fact, it is also the most common case for methods to be compiled to level 3 after the first C1 compilation. Transitions between 1 ↔ 2 ↔ 3 and other states are more complex and will not be covered in this article.
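You can observe these tiers yourself with the -XX:+PrintCompilation flag. A minimal sketch (class and method names are my own):

```java
// Run with: java -XX:+PrintCompilation TierDemo
// HotSpot logs each compilation with its tier level: hot methods typically
// show up first at level 3 (full C1 with profiling) and later at level 4 (C2).
// The exact log format varies between JVM versions.
public class TierDemo {
    static long mix(long x) {
        return x * 31 + (x >>> 7);   // cheap work to make the method hot
    }

    public static void main(String[] args) {
        long acc = 0;
        for (int i = 0; i < 10_000_000; i++) {
            acc = mix(acc + i);
        }
        System.out.println(acc);
    }
}
```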
Tiered Compilation Pros
Firstly, Tiered Compilation optimizes startup time beyond what C1 alone can achieve, because C2-compiled versions of the hottest methods may already be available during the early stages of application initialization.
Moreover, C1 generates instrumented (profiling) versions of compiled methods, and because compiled code is faster than interpreted code, the C2 compiler is able to gather more information during the profiling phase.
Due to this small feature of Tiered Compilation, we may achieve better peak performance than by using a regular C2 compiler alone. The more information C2 has, the more complex optimizations it is able to perform.
When Will the JVM Use C2?
The JVM uses two counters to determine if a method is “worthy” of C2 compilation. Before each execution of a method, the JVM will check these counters and decide if the method is worth compiling.
The first counter is a simple method-call counter, while the second one stores how many times each loop within a method has been executed. Here a concept known as back branching shows up: a loop is said to branch back when it reaches its own end or executes a statement like continue.
If a particular loop within a method exceeds a defined threshold, it is marked as “worthy” of compilation. It is important to note that only that loop will be compiled, not the entire method.
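Compiling a loop that is still running is known as on-stack replacement (OSR). A hedged sketch to observe it (with -XX:+PrintCompilation, HotSpot marks OSR compilations with a '%'):

```java
// main() is called only once, so its method-call counter stays at 1, but the
// loop's back-branch counter grows quickly, so the loop body gets compiled
// and swapped in mid-execution (on-stack replacement).
public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 100_000_000; i++) {
            sum += i % 7;   // trivial work to keep the loop hot
        }
        System.out.println(sum);
    }
}
```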
To be clear: if Tiered Compilation is enabled, the compilation thresholds (the maximum values of the counters from the previous paragraph) are computed dynamically and cannot be changed. If you want to change them, you first need to disable Tiered Compilation and then set the -XX:CompileThreshold flag.
Tiered Compilation JVM Settings
There are more flags connected to thresholds and Tiered Compilation, but I will not mention them here as I believe them to be too complex. Just remember that in most cases such JVM tuning will not bring you any performance benefits and may even result in a performance decrease.
Furthermore, keep in mind that not all bytecode will eventually be compiled by C2, because in very specific situations the JVM will reduce the value of both counters.
In short, the JVM measures only recent method “hotness”, not overall “hotness”, so even if our application runs forever, not the whole code base will be compiled by C2.
Why Is HotSpot Even Called HotSpot?
This name is connected to the way of compiling the code by the JVM. In general, every application has fragments of code that are executed with very high frequency.
These fragments play the biggest part in overall application performance. Such places are called “hotspots” — the more frequently a particular fragment is executed, the “hotter” it gets from JVM perspective.
Essentially, not every piece of our bytecode is going to be compiled. Mostly for performance reasons, in the case of sections with a low call frequency, or with only one call, it may be more efficient to simply interpret and run the bytecode directly.
On the other hand, if a particular section is called frequently, then compiling it is worth the CPU cycles. An additional benefit of frequently called methods is that the JVM is able to gather more information about them, and based on this information it can make more complex optimizations.
GraalVM Revolution
Now it is time to move on to more recent times. In this and the following sections, I will describe some of the ideas behind the new VM (the first production-ready release took place in May 2019), which is built on top of the existing JVM.
First of all, it is named GraalVM and brings a few interesting new features to the Java ecosystem. GraalVM comes in two versions — Enterprise and Community. Both include support for Java 11 and 17 (current LTS). The Enterprise version is based on Oracle JDK while Community is based on OpenJDK.
Below I have listed (with some more insight) the three GraalVM features which, in my opinion, are the most notable ones:
- Polyglot VM: it provides a runtime for applications written in many languages, so you are able to run Python applications on the same VM as Ruby and Java ones (see the sketch after this list).
- Native Image: with Graal, we are able to compile our jar to a native platform executable, which can greatly reduce memory footprint and startup time — crucial parameters in the cloud-based world.
- New implementation of the C2 compiler: it should provide a notable performance increase, especially in the case of new features from the latest Java releases.
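As a hedged illustration of the polyglot idea, GraalVM ships a polyglot API (org.graalvm.polyglot) that lets Java code evaluate guest-language snippets, in this case JavaScript (assuming a GraalVM with the JS language component installed):

```java
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

public class PolyglotDemo {
    public static void main(String[] args) {
        // Context manages the guest-language runtime; close it when done.
        try (Context context = Context.create()) {
            // Evaluate a JavaScript expression from Java.
            Value result = context.eval("js", "6 * 7");
            System.out.println(result.asInt());   // prints 42
        }
    }
}
```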
Native Image In Detail
With Native Image technology, we are able to compile our jars to a native platform executable. Such executable files are called native images and contain all the code necessary for the application to run.
Moreover, they bundle the necessary runtime components, like memory management and thread scheduling, that would normally be provided by the JVM.
Furthermore, this AOT compilation greatly reduces the memory footprint and startup time, which can be a great advantage if you are using the cloud or prefer a microservice architecture.
On the other hand, code optimization is not as good as the one done by the C2 compiler, so we can expect some decrease in terms of peak performance.
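A minimal sketch of the workflow, assuming a GraalVM distribution with the native-image tool installed (the class name is my own):

```java
// Hello.java — build and run as a native executable:
//   javac Hello.java
//   native-image Hello      // produces a standalone binary named 'hello'
//   ./hello                 // starts in milliseconds, no JVM required
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello from a native image!");
    }
}
```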
Graal New Compiler
The new implementation is written in Java, as opposed to the previous one written in C++. Graal is able to achieve a performance increase by utilizing new and more aggressive/complex compiler optimizations. Quoting the Graal page — “the compiler in GraalVM Enterprise includes 62 optimizations”.
Additionally, the Graal compiler is able to remove costly object allocations, so applications running on GraalVM need to spend less time on memory management and garbage collection.
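As a hedged illustration of the kind of allocation an optimizing compiler can remove (this is generic escape-analysis behavior, not a Graal-specific API):

```java
public class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point never escapes this method, so an optimizing JIT may replace
    // the heap allocation with plain local variables (scalar replacement).
    static int manhattan(int x, int y) {
        Point p = new Point(x, y);
        return Math.abs(p.x) + Math.abs(p.y);
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += manhattan(i, -i);   // hot loop: allocation likely eliminated
        }
        System.out.println(sum);
    }
}
```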
Summing Up
That is all for today. I hope I gave you a better understanding of what is going on inside your JVMs and how you can turn it to your advantage.
Thank you for your time.