What are Virtual Threads and why do we need them?

What are Virtual threads?

Before we even look at the definition of Virtual threads, let's clarify some terminology that will help us understand Virtual threads better.



1. Platform Thread

The term 'Platform Thread' is used in two contexts:

1.1 Traditional Java Thread

- This is the thread we had/have before Virtual threads: the traditional OS (Operating System) backed Java thread, scheduled by the OS. A Platform thread runs Java code on its underlying OS thread and occupies that OS thread for its entire life cycle, which means the OS thread cannot be used by any other request/task. Because of this, the number of available Platform threads is limited to the number of available OS threads. Note that these traditional Platform threads are not Carrier threads, as their purpose is not to run Virtual threads.

They are instances of java.lang.Thread.

1.2 Carrier Thread

They are special Platform threads created and managed by the JVM to carry, or run, Virtual threads.

To do its work, a thread needs to be scheduled, i.e. assigned for execution, onto a CPU core. Traditional Java threads are scheduled onto the processor by the OS scheduler. In contrast, Virtual threads are not scheduled onto the processor by the OS but by the JVM, onto Carrier threads, which are Platform threads.

If there is a blocking call from within a Virtual thread, the Virtual thread is unmounted from the Carrier thread and the stack of the Virtual thread is copied to the heap, which frees the Carrier thread to run another Virtual thread.

Once the blocking call is finished, the Virtual thread's stack is brought back from the heap and it is mounted on a Carrier thread. This can be the same Carrier thread from which it was unmounted or a new Carrier thread, so a Virtual thread is not tied to a single Carrier thread for its whole lifecycle.
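This mount/unmount behavior can be observed directly, because a virtual thread's toString() includes the carrier it is currently mounted on (e.g. VirtualThread[#21]/runnable@ForkJoinPool-1-worker-1). A minimal sketch (the class name is ours, and which carrier the thread resumes on is entirely up to the JVM scheduler):

```java
public class UnmountDemo {
    // Records the virtual thread's toString() before and after a blocking call.
    public static String[] observe() throws InterruptedException {
        String[] views = new String[2];
        Thread vt = Thread.ofVirtual().start(() -> {
            views[0] = Thread.currentThread().toString();
            try {
                Thread.sleep(50); // blocking call: the virtual thread is unmounted
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // after the sleep it may be remounted on the same or a different carrier
            views[1] = Thread.currentThread().toString();
        });
        vt.join();
        return views;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] v = observe();
        System.out.println("Before blocking: " + v[0]);
        System.out.println("After blocking:  " + v[1]);
    }
}
```

Running this a few times under load often shows two different carrier names, illustrating that the virtual thread is not pinned to one carrier.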

Platform threads (traditional or Carrier) are relatively heavyweight compared to Virtual threads. The stack of a Platform thread takes around 1 MB of memory.

They are also instances of java.lang.Thread.

Now let us understand what Virtual threads are.

2. Virtual Thread

They are relatively lightweight threads with a stack of around 2-10 KB (which can grow dynamically), managed entirely by the JVM rather than the operating system. They are not directly mapped 1:1 to OS threads. Instead, the JVM schedules many Virtual threads onto a smaller pool of Platform threads called Carrier threads. As mentioned before, if there is a blocking call from within a Virtual thread, the Virtual thread is unmounted from the Carrier thread and its stack is copied to the heap, which frees the Platform/Carrier/OS thread to run another Virtual thread, hence allowing multiple Virtual threads to share a few Carrier/Platform/OS threads.


We will look into scalability shortly, in one of the next few sections.

Remember that all Carrier threads are Platform threads, but not all Platform threads are Carrier threads.

And they are also instances of java.lang.Thread, but the following new class hierarchy has been added in the JDK for VirtualThread:

BaseVirtualThread extends Thread 

VirtualThread extends BaseVirtualThread
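A quick way to see this hierarchy in action: both kinds of thread are java.lang.Thread instances, and Thread.isVirtual() tells them apart. A minimal sketch (the class name is ours):

```java
public class ThreadKindDemo {
    public static boolean[] kinds() {
        Thread platform = Thread.ofPlatform().unstarted(() -> {});
        Thread virtual  = Thread.ofVirtual().unstarted(() -> {});
        // Both are java.lang.Thread instances; only the second is a VirtualThread
        return new boolean[] { platform.isVirtual(), virtual.isVirtual() };
    }

    public static void main(String[] args) {
        boolean[] k = kinds();
        System.out.println("platform.isVirtual() = " + k[0]); // false
        System.out.println("virtual.isVirtual()  = " + k[1]); // true
    }
}
```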

How to create Platform Threads vs Virtual Threads vs Carrier Threads

Platform Threads:

Creating a thread with the java.lang.Thread class or using an ExecutorService gives us a Platform thread, each of which is backed 1:1 by an OS thread. The term 'Platform thread' was actually introduced in JDK 19, when Virtual threads were added as a preview feature, to distinguish these threads from the newly introduced Virtual threads. As mentioned before, these Platform threads are not Carrier threads.

So using the old Thread API, all of the following return a Platform thread that is mapped 1:1 to an OS thread:

1) Using the java.lang.Thread class to create a thread

1.1 ) 

        Thread t = new Thread(() -> {
            System.out.println("Running on: " + Thread.currentThread().getName());
        });
        t.start();

1.2 Extending the Thread class

    public class MyThread extends Thread {
        @Override
        public void run() {
            System.out.println("Thread t1");
        }

        public static void main(String[] args) {
            MyThread t1 = new MyThread();
            t1.start();
            System.out.println("Main thread finished");
        }
    }

2) Using ExecutorService

As creating Platform threads is expensive, we cannot just create thousands of them. To get around that problem, Java has thread pools, where we can have a defined pool of threads that can be reused for multiple tasks.

2.1 )

ExecutorService executorService = Executors.newFixedThreadPool(8);

executorService.submit(() -> System.out.println("Hi"));

Threads are created lazily by the thread pool, on demand: when a task is submitted, a thread is created to execute it. When the first task is submitted, the first thread is created. If the next task is submitted after the first task has completed, the same existing thread is reused. If a 2nd task is submitted before the first task has completed, a new thread is created to execute the 2nd task.

So if 8 tasks are submitted concurrently, 8 threads will be created.

If 9 tasks are submitted concurrently, still only 8 threads will be created to execute the first 8 tasks, as the core and max pool size defined via Executors.newFixedThreadPool(8) is 8, and the 9th task will be put in the queue (a blocking queue is used by the fixed thread pool). Once one of the threads executing the 8 tasks is freed, it will pick up the 9th task.

But what happens if multiple running threads become idle at the same time? Who picks up the 9th task from the queue?

And this is the reason the fixed thread pool uses a blocking queue. Without a thread-safe blocking queue, the threads would race: all idle threads might see the queue as non-empty and try to execute the same task, which could result in tasks being executed multiple times, exceptions, or corrupted state. With a blocking queue, each task is handed to exactly one thread, and idle threads simply block on the queue until a task arrives.
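The worker-loop idea behind a fixed pool can be sketched as a toy pool (a simplification for illustration, not the actual ThreadPoolExecutor code; unlike the real pool, it creates its workers eagerly). BlockingQueue.take() hands each task to exactly one worker and blocks idle workers until a task arrives:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MiniFixedPool {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    public MiniFixedPool(int nThreads) {
        for (int i = 0; i < nThreads; i++) {
            Thread worker = new Thread(() -> {
                while (true) {
                    try {
                        // take() blocks while the queue is empty and hands each
                        // task to exactly one worker: no race between idle threads
                        Runnable task = queue.take();
                        task.run();
                    } catch (InterruptedException e) {
                        return; // shut the worker down
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    public void submit(Runnable task) {
        queue.add(task); // extra tasks simply wait in the queue
    }
}
```

Submitting more tasks than workers just queues the extras; a worker picks up the next task as soon as it finishes its current one.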

Fixed thread pool is ideal to use for:

-  Long running or blocking tasks (I/O tasks like reading/writing to a database or files, or making HTTP calls, e.g. to a REST API).

-  Bounded concurrency, when we don't want to overwhelm the CPU or memory. At any time, at most n threads will be in the pool, so 8 in the above example.

2.2)

ExecutorService executorService = Executors.newSingleThreadExecutor();

executorService.submit(() -> System.out.println("Hi"));

Here too, the thread is created lazily by the thread pool, on demand, when a task is submitted. However, only one thread is ever created. When more than one task is submitted, the remaining tasks are put in the blocking queue. Once the thread has completed the currently executing task, it picks the next task from the front of the queue (FIFO), so the tasks are processed sequentially in the order they are submitted.

Single thread pool is ideal to use for:

- Sequential execution of tasks

- Decoupling submission from execution
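The sequential, FIFO behavior can be demonstrated with a small sketch (the class name is ours): no matter how fast the tasks are submitted, a single-thread executor runs them one at a time in submission order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadOrderDemo {
    public static List<Integer> run() throws InterruptedException {
        List<Integer> order = new CopyOnWriteArrayList<>();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (int i = 0; i < 5; i++) {
            int taskId = i;
            // tasks queue up behind the single worker thread
            executor.submit(() -> order.add(taskId));
        }
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        return order; // always [0, 1, 2, 3, 4]: FIFO, one at a time
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```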

2.3)

ExecutorService executorService = Executors.newCachedThreadPool();

executorService.submit(() -> System.out.println("Hi"));

Threads are created on demand as tasks are submitted. A cached thread pool uses a SynchronousQueue, a blocking queue with no capacity in which each insert operation must wait for a corresponding remove operation by another thread.

When the 1st task is submitted, a new thread is created to execute it. When the 2nd task is submitted, if the first thread has become idle after executing the first task, it executes the 2nd task; otherwise a new thread is created.

For n tasks submitted concurrently, n threads are created. As these are Platform threads mapped to operating system threads, and we have a limited number of OS threads, we cannot create an unbounded number of threads via the cached thread pool; therefore, threads that have been idle for more than 60 seconds are terminated.

No fixed core pool size is maintained, and the maximum pool size can theoretically be as large as Integer.MAX_VALUE.

Cached thread pool is ideal to use for:

- Short lived asynchronous tasks.

- Tasks coming intermittently or in bursts.
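A small sketch of the reuse behavior (the class name is ours): if the second task arrives after the first thread has gone idle, the same worker typically picks it up. Reuse is a scheduling detail rather than a hard guarantee, so a short pause is added to make it reliable in practice.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CachedPoolDemo {
    public static String[] workers() throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();
        // 1st task: record which pool thread ran it
        Future<String> first = executor.submit(() -> Thread.currentThread().getName());
        String firstWorker = first.get();
        Thread.sleep(100); // let the worker go idle (it waits on the SynchronousQueue)
        // 2nd task, submitted after the 1st completed: the idle worker is reused
        Future<String> second = executor.submit(() -> Thread.currentThread().getName());
        String secondWorker = second.get();
        executor.shutdown();
        return new String[] { firstWorker, secondWorker };
    }

    public static void main(String[] args) throws Exception {
        String[] w = workers();
        System.out.println("1st task ran on: " + w[0]);
        System.out.println("2nd task ran on: " + w[1]);
    }
}
```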

2.4)

ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(8);

ScheduledFuture<?> future = scheduledExecutorService.schedule(() -> System.out.println("Hi"), 1000, TimeUnit.MILLISECONDS);

Threads in a scheduled thread pool are created lazily on demand, similar to a fixed thread pool. When the first task is scheduled, it is added to the delay queue and the 1st thread is created to execute it. The task is executed after the given delay period. If the 2nd task is scheduled after the 1st task completes, the same existing thread is reused. If a 2nd task is scheduled while the 1st is still running, a new thread is created (up to the core pool size) to execute it. A core pool size of 8 threads is maintained in the pool, even if the threads are idle.
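A minimal sketch of the delay-queue behavior (the class name and delay are ours): the scheduled task does not run before the requested delay has elapsed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ScheduledDemo {
    public static long elapsedMillis() throws Exception {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        long start = System.nanoTime();
        // the task sits in the delay queue until ~200 ms have passed
        ScheduledFuture<Long> future =
                scheduler.schedule(() -> System.nanoTime(), 200, TimeUnit.MILLISECONDS);
        long elapsed = (future.get() - start) / 1_000_000;
        scheduler.shutdown();
        return elapsed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Task ran after ~" + elapsedMillis() + " ms");
    }
}
```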

3) Using CompletableFuture

CompletableFuture.supplyAsync(() -> dbCall(), executor);

4) Using new Thread.Builder API

4.1

 Thread.ofPlatform().start(() -> System.out.println("Platform Thread"));

4.2

ThreadFactory factory = Thread.ofPlatform().factory();

Thread t = factory.newThread(() -> System.out.println("task"));

t.start();


Virtual Threads:

And here is how we can create Virtual threads:

1. Using Thread.Builder API

1.1 

Thread.ofVirtual()
      .name("virtual thread-", 0)
      .start(() -> System.out.println("Task"));

1.2

Thread.ofVirtual()
      .factory()
      .newThread(() -> System.out.println("Task"))
      .start();

1.3

Thread.startVirtualThread(() ->
        System.out.println("Creating and starting Virtual thread"));

2. Using ExecutorService

2.1

ThreadFactory factory = Thread.ofVirtual().factory();

ExecutorService executor = Executors.newThreadPerTaskExecutor(factory);

2.2

try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
    executor.submit(() -> System.out.println("task"));
}

3. Using Structured Concurrency

try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Subtask<String> subtask1 = scope.fork(() -> callServiceA());
    Subtask<String> subtask2 = scope.fork(() -> callServiceB());
    scope.join().throwIfFailed(); // wait for both, propagating the first failure
    return subtask1.get() + subtask2.get();
}

How an async API like CompletableFuture helps with scaling, and the issues with it

CompletableFuture frees the request thread immediately and runs the task asynchronously. It takes a thread from the default ForkJoinPool.commonPool() (if we don't assign our own Executor; otherwise a thread from the assigned pool is used) and runs the task on it.

We can chain multiple tasks using the then...() methods, such as thenApply() and thenApplyAsync().

For Example:

Say we have the following three blocking operations to perform from our service method, with the time each takes: getUser() makes a DB call that takes around 300 ms, getOrders() calls another REST API and takes around 400 ms, and getRecommendations() also calls another REST API and takes around 500 ms.

getUser()             // 300 ms DB call   (blocking)
getOrders()           // 400 ms HTTP call (blocking)
getRecommendations()  // 500 ms HTTP call (blocking)


1. If we use the ...Async methods, the chained tasks are executed on separate threads from the thread pool

ExecutorService executorService = Executors.newFixedThreadPool(20);

CompletableFuture.supplyAsync(() -> getUser(userId), executorService)
                 .thenApplyAsync(user -> getOrders(user), executorService)
                 .thenApplyAsync(orders -> getRecommendations(orders), executorService);

Here, getOrders() and getRecommendations() will use threads separate from the one running getUser(), so these 3 operations may run on 3 separate threads.

thenApplyAsync() is executed only after we have the result from supplyAsync(), and so on.

As each of the operations performed in these methods is blocking, the respective threads are blocked for the times mentioned. The advantage of doing each operation on a separate thread is that once thread 1 has completed getUser(), it is released back to the pool, so while thread 2 runs getOrders(), thread 1 can be reused for other work.

2. If we don't use the ...Async methods, the chained tasks are executed on the same thread from the thread pool

ExecutorService executorService = Executors.newFixedThreadPool(20);

CompletableFuture.supplyAsync(() -> getUser(userId), executorService)
                 .thenApply(user -> getOrders(user))
                 .thenApply(orders -> getRecommendations(orders));

Here, getOrders() and getRecommendations() will typically run on the same thread that ran getUser(). Note that thenApply() does not take an Executor argument; the non-Async variants run on the thread that completed the previous stage.
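Putting the chain together as a runnable sketch, with hypothetical stub implementations of getUser(), getOrders(), and getRecommendations() standing in for the real blocking calls (the sleep durations and return values are ours, scaled down for illustration):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChainDemo {
    // stand-ins for the blocking calls described above (hypothetical stubs)
    static String getUser(int userId)          { sleep(30); return "user-" + userId; }
    static String getOrders(String user)       { sleep(40); return user + ":orders"; }
    static String getRecommendations(String o) { sleep(50); return o + ":recs"; }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static String run() {
        ExecutorService executorService = Executors.newFixedThreadPool(20);
        String result = CompletableFuture
                .supplyAsync(() -> getUser(42), executorService)
                .thenApplyAsync(user -> getOrders(user), executorService)      // runs after getUser
                .thenApplyAsync(orders -> getRecommendations(orders), executorService)
                .join(); // block the caller only to observe the final result
        executorService.shutdown();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```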

Why Virtual Threads? What problem do they solve?

Java server-side applications typically follow the thread-per-request model, which means that for each incoming HTTP request to the server, the server assigns a thread from its thread pool to handle the request. These were Platform threads with a 1:1 mapping to OS threads, meaning each Java thread directly occupied an underlying OS thread while processing the request.

However, this model has the following limitations:

- As each incoming HTTP request is executed by a Platform thread backed by an OS thread, when the request runs into blocking operation(s) like writing to a database, it blocks the Platform/OS thread. Since each request is handled by a single thread, this thread just sits idle waiting for the blocking operation(s) to finish, which means it cannot utilize the CPU; hence the CPU is used inefficiently.

Consider Tomcat: with its default thread pool size of 200, it can typically handle at most 200 concurrent requests, which means 200 Platform threads. If there are more than 200 concurrent requests, the rest are either queued or rejected. And if these requests involve lots of blocking operations, like database calls, writing to files, or making HTTP calls to another service, most of these threads will sit idle for quite some time waiting for the blocking operations to complete.

Also, from a memory perspective, for 200 requests:

Memory used: 200 * 1 MB = 200 MB

That is not much.

So we are not even hitting memory limits, but with this model the CPU is highly underutilized, and if we want to scale our application we need to add more servers, which we call horizontal scaling. But horizontal scaling comes at the cost of buying more hardware.

To solve this, asynchronous programming constructs like CompletableFuture were introduced in Java 8, which ensures the thread handling the incoming request is immediately freed up to handle other requests, while the blocking operations are performed on separate thread(s). But the problem with CompletableFuture was, and is, that developers need to learn a lot of its API, business logic gets buried in async jargon, and the code becomes difficult to read as well as to debug.

Hence the Java designers, with Project Loom, introduced Virtual threads.

Virtual threads make blocking cheap and they are also cheap to create.
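A sketch of how cheap blocking becomes (the class name, task count, and sleep time are ours): spawning thousands of virtual threads that all block is perfectly fine, because each sleep simply releases the carrier thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualScaleDemo {
    public static int run(int tasks) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        // one virtual thread per task: this many platform threads would be far
        // too heavy, but virtual threads are cheap to create
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(100); // blocking call: the carrier thread is released
                    completed.incrementAndGet();
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10_000) + " tasks completed");
    }
}
```

With platform threads, 10,000 concurrent sleeping tasks would need 10,000 OS threads; here they time-share a handful of carriers.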

Now, when configured to use a VirtualThreadPerTaskExecutor, Tomcat assigns a Virtual thread instead of a Platform thread for each incoming request. The JVM mounts this Virtual thread onto a Carrier thread.

When a blocking call happens, the JVM saves the stack of this Virtual thread to the heap and unmounts it from the Carrier thread. This frees the Carrier thread, so another Virtual thread can be mounted onto it and another request can be handled, which results in efficient use of the CPU and increased throughput, i.e. the number of requests handled per second.

So now with Virtual threads, if we want to scale, we can first scale vertically, as the same machine can handle more concurrent requests per second. Eventually we may still add server(s) horizontally as needed, for example to add more CPU cores, or for high availability, fault isolation, etc.