Based on Stephen Toub’s post “Performance Improvements in .NET 8”, I want to highlight an important part that makes .NET 8 a very interesting release! First, let’s recap some fundamentals.


What is the JIT?

When you write a .NET application, the code you author in languages like C# or F# isn’t directly executed by the machine. Instead, it undergoes a transformation into what we call “Intermediate Language” (IL), which is a CPU-agnostic instruction set. This IL code is then compiled into native code that the CPU can execute. This final stage of compilation happens through a Just-In-Time (JIT) compiler, which is a fundamental component of the .NET runtime environment.
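
To make the C#-to-IL step concrete, here is a trivial method together with (in comments) roughly the IL the C# compiler emits for it; the exact instructions can vary with compiler version and settings:

```csharp
public static class MathOps
{
    // C# source that the compiler lowers to IL:
    public static int Add(int a, int b) => a + b;

    // Roughly equivalent IL (viewable with ildasm or ILSpy):
    //   ldarg.0   // push argument 'a' onto the evaluation stack
    //   ldarg.1   // push argument 'b'
    //   add       // pop both, push the sum
    //   ret       // return the top of the stack
}
```

IL is stack-based and CPU-agnostic; the JIT later maps these stack operations onto the registers and instructions of whatever machine the code actually runs on.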

What Does the JIT Do?

The JIT compiler’s role is to convert the IL code of your .NET application into native machine code. Unlike traditional compilers that perform this task before an application is ever run (known as Ahead-Of-Time, or AOT, compilation), the JIT compiles code at the point of execution, i.e., while the application is running. This strategy brings several advantages:

  1. Platform-Specific Optimization: The JIT compiler tailors the native machine code to the specific capabilities of the CPU running the application, optimizing for speed and efficiency.
  2. Memory Efficiency: Because only the parts of the application that are actually executed get compiled, unused code is never translated into native code and therefore never occupies memory for compiled output.
  3. Adaptive Performance: The JIT can observe the behavior of the application at runtime and optimize the machine code based on actual usage patterns, a feature known as Profile-Guided Optimization (PGO).
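
Point 1 can be observed directly from managed code: the `System.Runtime.Intrinsics` and `System.Numerics` APIs report which CPU features the JIT detected on the current machine. A minimal sketch (the output depends on your hardware):

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics.X86;

class CpuCapabilities
{
    static void Main()
    {
        // The JIT resolves these IsSupported properties to constants for the
        // current CPU, so the dead branches are removed from the native code.
        Console.WriteLine($"SSE2 supported: {Sse2.IsSupported}");
        Console.WriteLine($"AVX2 supported: {Avx2.IsSupported}");
        Console.WriteLine($"SIMD-accelerated vectors: {Vector.IsHardwareAccelerated}");
    }
}
```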

What is Profile-Guided Optimization (PGO)?

Profile-Guided Optimization is a sophisticated technique used by compilers to improve the performance of applications. It involves collecting data about the application’s runtime behavior, identifying hot paths (frequently executed paths), and optimizing those paths for better performance during subsequent executions.

Here’s how it works in a typical PGO process:

  1. An application is initially compiled with instrumentation added to gather execution data.
  2. The instrumented application is run, and it collects data on various aspects of execution, such as which methods are called most often.
  3. This profile data is then fed back into the compiler.
  4. The compiler recompiles the application, this time using the profile data to optimize the execution paths that are used most frequently.
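
.NET’s static-PGO tooling has its own commands, but the four steps above are easiest to see in a C toolchain. A hypothetical sketch using clang’s real profiling flags (the file and workload names are invented):

```shell
# 1. Compile with instrumentation added.
clang -O2 -fprofile-instr-generate app.c -o app_instrumented

# 2. Run the instrumented binary on a representative workload;
#    it writes raw profile data on exit.
LLVM_PROFILE_FILE=app.profraw ./app_instrumented --typical-workload

# 3. Merge the raw data into a profile the compiler accepts.
llvm-profdata merge -output=app.profdata app.profraw

# 4. Recompile, letting the compiler optimize the hot paths using the profile.
clang -O2 -fprofile-instr-use=app.profdata app.c -o app_optimized
```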

Dynamic PGO: Taking Optimization to the Next Level

Dynamic PGO is the next evolution of PGO, and it’s particularly exciting in the .NET 8 environment. While traditional PGO relies on static profile data collected from prior runs of an application, Dynamic PGO can collect and apply profile data on-the-fly, in real-time as the application executes.

This has several powerful implications:

  • No Separate Compilation Steps: Unlike traditional PGO, which requires a separate profile run, Dynamic PGO integrates profiling into the regular execution of the application, eliminating the need for separate compilation steps.
  • Continuous Optimization: Dynamic PGO can adjust optimizations as the application’s execution patterns change, making it more adaptable to the way the application is actually used by end-users.
  • Up-to-Date Performance Tuning: Since the profile data is always current, the optimizations are based on the most recent usage patterns, which can significantly improve performance, especially in applications where usage patterns change over time.
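
One concrete optimization Dynamic PGO enables is guarded devirtualization: the JIT observes which concrete type actually flows through an interface call site and emits a fast path for it. A minimal sketch (the speedup itself is only visible in a profiler or benchmark):

```csharp
using System;

interface IShape { double Area(); }

sealed class Circle : IShape
{
    public double Radius;
    public double Area() => Math.PI * Radius * Radius;
}

class Program
{
    // Tier 0 instruments this call site; if profiling shows the receiver is
    // almost always a Circle, the optimized code becomes roughly:
    //   if (shape is Circle c) { /* inlined Circle.Area() */ }
    //   else                   { /* fallback interface call */ }
    public static double TotalArea(IShape[] shapes)
    {
        double total = 0;
        foreach (var shape in shapes)
            total += shape.Area(); // interface call, candidate for devirtualization
        return total;
    }

    static void Main()
    {
        var shapes = new IShape[] { new Circle { Radius = 1 }, new Circle { Radius = 2 } };
        Console.WriteLine(TotalArea(shapes)); // 5 * pi, about 15.708
    }
}
```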

Why Dynamic PGO Matters for .NET Developers

Dynamic PGO is particularly impactful for .NET applications, which often run in diverse environments and can benefit from optimizations that adapt to their current execution context. With Dynamic PGO, .NET developers can expect their applications to perform better without the overhead of managing the profiling and recompilation process manually.

It’s like having a performance co-pilot that’s always tuning the application to keep it running at its best. As .NET continues to evolve, features like Dynamic PGO keep it at the cutting edge of high-performance, enterprise-grade application development.


Tiered Compilation

When you run a .NET application, the CLR compiles the intermediate language (IL) code into native machine code. This process can be done either ahead of time (AOT) or just-in-time (JIT). AOT compilation happens before the application is run, while JIT compilation happens on the fly as the application is executed.

Tiered compilation combines these approaches to provide both fast startup times and efficient execution. It introduces the concept of “tiers” of compilation, each with different optimization levels:

  • Tier 0: This is a “quick JIT” tier that compiles the code very fast with minimal optimizations. It’s designed to get the application up and running as quickly as possible.
  • Tier 1: After the application is running, the CLR monitors which methods are frequently used. These “hot” methods are then recompiled in the background with more aggressive optimizations to improve their execution speed. This tier represents the “optimized JIT” compilation.
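
Both tiers can be toggled through runtime environment variables, which is useful when you want to measure their effect (`MyApp.dll` is a placeholder for your own application):

```shell
# Disable tiered compilation entirely: every method is fully optimized on
# first call, trading slower startup for no tier-0 warm-up phase.
DOTNET_TieredCompilation=0 dotnet MyApp.dll

# Keep tiering, but switch off dynamic PGO (enabled by default in .NET 8).
DOTNET_TieredPGO=0 dotnet MyApp.dll
```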

How Tiering Works in Practice

When an application starts, all methods are compiled at Tier 0 by default, which prioritizes compilation speed over code quality. As the application runs, the runtime gathers information about which methods are executed most often. Once a method is identified as “hot,” the runtime schedules it for recompilation at Tier 1. This recompiled code replaces the original Tier 0 code, and because it’s more highly optimized, it runs faster.

The beauty of tiered compilation lies in its adaptability:

  • Quick Response: For methods that are called only a few times, the Tier 0 compilation is sufficient and avoids the overhead of more time-consuming optimizations.
  • Adaptive Performance: For methods that are part of the application’s critical path, the Tier 1 compilation ensures that they are executed as efficiently as possible.
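
For the rare method where even a brief tier-0 phase is unacceptable (for example, a latency-critical path that must be fast on its very first call), a method can opt out of tiering entirely; a small sketch:

```csharp
using System.Runtime.CompilerServices;

static class LatencyCritical
{
    // Skip tier 0: compile this method with full optimizations immediately.
    // Note this also opts it out of dynamic PGO, so use it sparingly.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static long Sum(int[] values)
    {
        long total = 0;
        foreach (var v in values)
            total += v;
        return total;
    }
}
```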

Benefits of Tiered Compilation

  1. Improved Startup Performance: Since the initial compilation is faster, applications can start quicker, which is particularly important in cloud environments where scalability and responsiveness are key.
  2. Optimized Execution: Long-running applications benefit from optimized code paths, resulting in better performance over time.
  3. Resource Efficiency: By only optimizing “hot” methods, tiered compilation saves CPU time and memory that would otherwise be spent optimizing rarely executed code.

In summary, tiered compilation in .NET is an intelligent and dynamic way to balance the trade-off between the need for quick application startup and the desire for high-performance code execution. It provides a just-in-time approach that adapts to the application’s actual workload, optimizing the performance over the lifecycle of the application.


Improvements in .NET 8

  • Previously, .NET methods were compiled once upon their first invocation. .NET Core 3.0 introduced tiered compilation where a method could be compiled multiple times, initially quickly with minimal optimization, and then potentially again with full optimizations if it’s frequently used. This approach aimed to balance the trade-off between code optimization and compilation time, improving both startup time and throughput.
  • Dynamic PGO, which was introduced in .NET 6 and improved in .NET 7, is now enabled by default in .NET 8, reflecting its maturity and benefits.
  • Additional tiers have been added to improve the process further, especially for handling more complex scenarios.
  • Changes have also been made to improve tier 0 optimizations, allowing for faster compilation and better initial code quality. This includes constant folding and handling JIT intrinsics earlier in the compilation process.

Instrumentation and Efficiency

  • The instrumentation of code to support dynamic PGO in tier 0 faced challenges, especially regarding how to count branch usage in a thread-safe and efficient manner.
  • Initially, non-synchronized updates were used, which provided an approximate count. However, to improve accuracy without introducing bottlenecks, .NET 8 uses a scalable approximate counter with some randomness in how it synchronizes and updates counts.
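
The counting scheme in the second bullet can be sketched in a few lines. This is a simplified illustration of a probabilistic counter, not the runtime’s actual implementation: below a threshold it counts exactly with a cheap racy write; above it, it adds a larger delta with probability 1/delta, keeping the expected value correct while making contended writes rare.

```csharp
using System;
using System.Threading;

static class ApproximateCounter
{
    const int ExactThreshold = 8192;

    public static void Increment(ref int counter)
    {
        int current = counter;
        if (current < ExactThreshold)
        {
            // Unsynchronized increment: occasional lost updates are acceptable.
            counter = current + 1;
            return;
        }

        // Past the threshold, add 'delta' with probability 1/delta, so the
        // expected value of each logical increment remains 1.
        int delta = 1 + (current >> 13); // grows as the count grows
        if (Random.Shared.Next(delta) == 0)
            Interlocked.Add(ref counter, delta);
    }
}
```

In a single-threaded run below the threshold the count stays exact; under heavy concurrent use above it, the value is only approximate, which is fine because PGO needs relative hotness, not exact counts.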

Overall, these improvements in .NET 8 are focused on making the JIT compilation smarter and more efficient, with tiered compilation and dynamic PGO leading to better runtime performance for .NET applications.
