What is a race condition and how to fix it

Race condition:


In a multithreaded environment, a race condition occurs when two or more threads access shared data, at least one of them writes to it, and the result depends on the order in which the threads execute.

It is called a race because two or more threads are racing to perform an operation on the shared data.

There are two main patterns in which race conditions happen:

1. Check then update

2. Read modify write


1. Check then update

Consider the following Account class, whose 'balance' field is shared across multiple threads.

public class Account {
    private int balance;

    public Account(int balance) {
        this.balance = balance;
    }

    public void withdraw(int withdrawalAmount) {
        System.out.println(Thread.currentThread().getName() + " entered withdraw method");
        if (balance < withdrawalAmount) { // check the balance
            throw new IllegalArgumentException("balance is less than withdrawal amount");
        }

        balance = balance - withdrawalAmount; // update the balance
        System.out.println(Thread.currentThread().getName() + " exited withdraw method");
    }

    public int getBalance() {
        System.out.println(Thread.currentThread().getName() + " entered getBalance method");
        return balance;
    }
}

In the withdraw() method above, we first check the balance (the data shared between threads):

if (balance < withdrawalAmount)

If the balance is less than withdrawalAmount, we throw an exception. Otherwise, on the next line we update the balance by subtracting withdrawalAmount from it:

balance = balance - withdrawalAmount;

Using Single Thread:

If we test this program with a single thread, as below, it always works correctly: it gives a balance of 90 if the withdrawal amount is 10, and throws IllegalArgumentException if the withdrawal amount is more than 100.

public class TestAccountBalanceSingleThread {

    public static void main(String[] args) {
        Account account = new Account(100);
        account.withdraw(10);
        System.out.println(account.getBalance());
    }
}

Or

public class TestAccountBalanceSingleThread {

    public static void main(String[] args) {
        Account account = new Account(100);

        Thread t = new Thread(() -> {
            account.withdraw(10);
            System.out.println(account.getBalance());
        });

        t.start();
    }
}

Or

public class TestAccountBalanceSingleThread {

    public static void main(String[] args) {
        Account account = new Account(100);

        Thread t = new Thread(() -> {
            account.withdraw(101); // withdrawal amount more than 100, throws IllegalArgumentException
            System.out.println(account.getBalance());
        });

        t.start();
    }
}

Using multiple Threads:

Now let's run this same code with two threads:

public class TestAccountBalanceMultipleThread {

    public static void main(String[] args) {
        Account account = new Account(100);

        Runnable task = () -> {
            account.withdraw(60);
            System.out.println(Thread.currentThread().getName()
                    + " completed withdrawal. Balance: "
                    + account.getBalance());
        };

        Thread t1 = new Thread(task, "Thread-1");
        Thread t2 = new Thread(task, "Thread-2");

        t1.start();
        t2.start();
    }
}


When I ran the above code on my system, I got the output below. Notice that although we started Thread-1 before Thread-2, it is Thread-2 that finished first.
Thread-2 entered withdraw method
Thread-1 entered withdraw method
Thread-2 exited withdraw method
Thread-2 entered getBalance method
Exception in thread "Thread-1" java.lang.IllegalArgumentException: balance is less than withdrawal amount
	at com.threads.racecondition.Account.withdraw(Account.java:13)
	at com.threads.racecondition.TestAccountBalanceMultipleThread.lambda$main$0(TestAccountBalanceMultipleThread.java:8)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Thread-2 completed withdrawal. Balance: 40

What is basically happening here is:
Thread 2 entered the withdraw() method.
Thread 1 also entered the withdraw() method.
Thread 2 checked whether the balance is less than the withdrawal amount; because 100 < 60 is false, no exception is thrown.
Thread 2 updates the balance to 100 - 60 = 40.
Thread 2 enters getBalance().
Thread 1 checks whether the balance is less than the withdrawal amount. It reads the balance updated by Thread 2, which is 40, and checks 40 < 60, which is true, so it throws the exception.

And if I run it again, I get the following output. Notice that this time Thread-1 completes first.

Thread-1 entered withdraw method
Thread-2 entered withdraw method
Thread-1 exited withdraw method
Thread-1 entered getBalance method
Exception in thread "Thread-2" java.lang.IllegalArgumentException: balance is less than withdrawal amount
	at com.threads.racecondition.Account.withdraw(Account.java:13)
	at com.threads.racecondition.TestAccountBalanceMultipleThread.lambda$main$0(TestAccountBalanceMultipleThread.java:8)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Thread-1 completed withdrawal. Balance: 40

So although this program seems to work fine with multiple threads, it does not: the threads interleave their execution of withdraw() and getBalance() and produce different results in different runs. So far, at least, it appears to allow only one thread to withdraw successfully and to throw an exception for the other.

To make the effect of multiple threads operating on the shared variable 'balance' more visible, we can put a Thread.sleep() in the withdraw() method after the check, which gives the other thread time to pass the check as well.

Our updated withdraw() method now looks like this:

public void withdraw(int withdrawalAmount) {
    System.out.println(Thread.currentThread().getName() + " entered withdraw method");
    if (balance < withdrawalAmount) {
        throw new IllegalArgumentException("balance is less than withdrawal amount");
    }

    try {
        Thread.sleep(50); // give the other thread time to pass the check too
    } catch (InterruptedException interruptedException) {
        Thread.currentThread().interrupt(); // restore the interrupt flag
    }
    balance = balance - withdrawalAmount;
    System.out.println(Thread.currentThread().getName() + " exited withdraw method");
}

And if I execute TestAccountBalanceMultipleThread, I see the following output:

Thread-2 entered withdraw method
Thread-1 entered withdraw method
Thread-2 exited withdraw method
Thread-1 exited withdraw method
Thread-2 entered getBalance method
Thread-1 entered getBalance method
Thread-2 completed withdrawal. Balance: -20
Thread-1 completed withdrawal. Balance: -20  

So now both Thread-1 and Thread-2 report a balance of -20, which clearly shows that the account is overdrawn and hence wrong: both threads passed the check while the balance was still 100, and then each subtracted 60.

So how do we fix this?

We can fix it by making both the withdraw() and getBalance() methods synchronized, or alternatively by using an explicit lock such as ReentrantLock.

Using synchronized:

public class Account {
    private int balance;

    public Account(int balance) {
        this.balance = balance;
    }

    public synchronized void withdraw(int withdrawalAmount) {
        System.out.println(Thread.currentThread().getName() + " entered withdraw method");
        if (balance < withdrawalAmount) {
            throw new IllegalArgumentException("balance is less than withdrawal amount");
        }

        try {
            Thread.sleep(50);
        } catch (InterruptedException interruptedException) {
            Thread.currentThread().interrupt();
        }
        balance = balance - withdrawalAmount;
        System.out.println(Thread.currentThread().getName() + " exited withdraw method");
    }

    public synchronized int getBalance() {
        System.out.println(Thread.currentThread().getName() + " entered getBalance method");
        return balance;
    }
}

Now if I run TestAccountBalanceMultipleThread, output is as below:

Thread-2 entered withdraw method
Thread-2 exited withdraw method
Thread-2 entered getBalance method
Thread-1 entered withdraw method
Exception in thread "Thread-1" java.lang.IllegalArgumentException: balance is less than withdrawal amount
	at com.threads.racecondition.Account.withdraw(AccountFixedWithSynchronized.java:13)
	at com.threads.racecondition.TestAccountFixedWithSynchronizedBalanceMultipleThread.lambda$main$0(TestAccountFixedWithSynchronizedBalanceMultipleThread.java:8)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Thread-2 getBalance: 40


From the output, we can see that only one thread at a time (Thread-2 first, in this case) is allowed to acquire the lock and enter the withdraw() and getBalance() methods. Thread-2 reduced the balance from 100 to 40, and when Thread-1 then tried to withdraw, it got the exception because the balance was now less than withdrawalAmount.
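
The fix can also be checked programmatically rather than by reading console output. The following is a minimal sketch under stated assumptions: SynchronizedAccountCheck, its trimmed nested Account (without the println and sleep calls) and the runTwoWithdrawals() helper are hypothetical names introduced here, not part of the original program. It starts two threads that each withdraw 60 from a balance of 100, waits for both with join(), and reports the final balance and the number of rejected withdrawals.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SynchronizedAccountCheck {

    // Trimmed copy of the synchronized Account (hypothetical, for illustration only).
    static class Account {
        private int balance;

        Account(int balance) {
            this.balance = balance;
        }

        synchronized void withdraw(int withdrawalAmount) {
            if (balance < withdrawalAmount) {
                throw new IllegalArgumentException("balance is less than withdrawal amount");
            }
            balance = balance - withdrawalAmount;
        }

        synchronized int getBalance() {
            return balance;
        }
    }

    // Returns {final balance, number of rejected withdrawals}.
    static int[] runTwoWithdrawals() {
        Account account = new Account(100);
        AtomicInteger rejected = new AtomicInteger();

        Runnable task = () -> {
            try {
                account.withdraw(60);
            } catch (IllegalArgumentException e) {
                rejected.incrementAndGet(); // count the withdrawal that was refused
            }
        };

        Thread t1 = new Thread(task, "Thread-1");
        Thread t2 = new Thread(task, "Thread-2");
        t1.start();
        t2.start();
        try {
            t1.join(); // wait for both threads before reading the result
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {account.getBalance(), rejected.get()};
    }

    public static void main(String[] args) {
        int[] result = runTwoWithdrawals();
        System.out.println("Final balance: " + result[0] + ", rejected withdrawals: " + result[1]);
    }
}
```

Whichever thread gets the lock first, exactly one withdrawal succeeds, so the balance ends at 40 with one rejected withdrawal.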

Another important point is that the lock is acquired at the object (instance) level: before entering withdraw() or getBalance(), a thread acquires the lock on the account object on which the method is called. In the example above we had only one account, so both threads try to acquire the lock on that same account object; only one of them holds it at a time, and the other thread is blocked meanwhile.

So basically, the synchronized methods above are equivalent to:

public void withdraw(int withdrawalAmount) {
    synchronized (this) {
        System.out.println(Thread.currentThread().getName() + " entered withdraw method");
        if (balance < withdrawalAmount) {
            throw new IllegalArgumentException("balance is less than withdrawal amount");
        }

        try {
            Thread.sleep(50);
        } catch (InterruptedException interruptedException) {
            Thread.currentThread().interrupt();
        }
        balance = balance - withdrawalAmount;
        System.out.println(Thread.currentThread().getName() + " exited withdraw method");
    }
}

public int getBalance() {
    synchronized (this) {
        System.out.println(Thread.currentThread().getName() + " entered getBalance method");
        return balance;
    }
}

If we have two account objects, Thread-2 can enter withdraw() of account2 while Thread-1 is in withdraw() of account1:

Same account:
Thread 1  -> account1.withdraw()   -- acquires lock on account1
Thread 2  -> account1.withdraw()   -- blocked: same object, same lock

Different accounts:
Thread 1  -> account1.withdraw()   -- acquires lock on account1
Thread 2  -> account2.withdraw()   -- acquires lock on account2: not blocked
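
That the lock belongs to the individual object can be observed with Thread.holdsLock(). This is only a small sketch: LockPerInstanceDemo, its empty Account class and the helper methods are hypothetical names used for illustration. Inside a synchronized method of one account, the current thread holds that account's lock but not the lock of a second account.

```java
public class LockPerInstanceDemo {

    static class Account {
        // A synchronized instance method holds the monitor of 'this' while it runs.
        synchronized boolean holdsOwnLock() {
            return Thread.holdsLock(this);
        }

        // While inside a synchronized method of this account, check whether
        // the current thread also holds the lock of another account.
        synchronized boolean holdsLockOf(Account other) {
            return Thread.holdsLock(other);
        }
    }

    static boolean ownLockHeld() {
        return new Account().holdsOwnLock();
    }

    static boolean otherLockHeld() {
        return new Account().holdsLockOf(new Account());
    }

    public static void main(String[] args) {
        System.out.println("holds own lock: " + ownLockHeld());               // true
        System.out.println("holds lock of other account: " + otherLockHeld()); // false
    }
}
```

Because each account has its own lock, a thread working on account1 never blocks a thread working on account2.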

Using ReentrantLock:

import java.util.concurrent.locks.ReentrantLock;

public class Account {
    private int balance;

    private final ReentrantLock lock = new ReentrantLock();

    public Account(int balance) {
        this.balance = balance;
    }

    public void withdraw(int withdrawalAmount) {
        System.out.println(Thread.currentThread().getName() + " entered withdraw method");

        lock.lock();
        try {
            if (balance < withdrawalAmount) {
                throw new IllegalArgumentException("balance is less than withdrawal amount");
            }

            Thread.sleep(50);
            balance = balance - withdrawalAmount;
            System.out.println(Thread.currentThread().getName() + " exited withdraw method");
        } catch (InterruptedException interruptedException) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock(); // always release the lock, even if an exception is thrown
        }
    }

    public int getBalance() {
        System.out.println(Thread.currentThread().getName() + " entered getBalance method");
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }
}

And here is the output, which works the same way as with synchronized.

Thread-1 entered withdraw method
Thread-1 exited withdraw method
Thread-2 entered withdraw method
Thread-1 entered getBalance method
Exception in thread "Thread-2" java.lang.IllegalArgumentException: balance is less than withdrawal amount
	at com.threads.racecondition.Account.withdraw(AccountFixedWithReentrantLock.java:21)
	at com.threads.racecondition.TestAccountFixedWithReentrantLockBalanceMultipleThread.lambda$main$0(TestAccountFixedWithReentrantLockBalanceMultipleThread.java:8)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Thread-1 getBalance: 40

Although the above example also covers read-modify-write, we will see another example.
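
A minimal sketch of the read-modify-write pattern, under stated assumptions: ReadModifyWriteDemo and runDemo() are hypothetical names, and the fix shown uses AtomicInteger, which is a different technique from the synchronized and ReentrantLock fixes shown earlier. The statement counter++ reads the value, adds one and writes it back; when two threads interleave these three steps, increments can be lost.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReadModifyWriteDemo {

    private static int unsafeCounter = 0; // plain int: ++ is read, add, write
    private static final AtomicInteger safeCounter = new AtomicInteger();

    // Returns {unsafe result, safe result} after two threads increment both counters.
    static int[] runDemo(int incrementsPerThread) {
        unsafeCounter = 0;
        safeCounter.set(0);

        Runnable task = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                unsafeCounter++;               // not atomic: updates can be lost
                safeCounter.incrementAndGet(); // atomic read-modify-write
            }
        };

        Thread t1 = new Thread(task, "Thread-1");
        Thread t2 = new Thread(task, "Thread-2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {unsafeCounter, safeCounter.get()};
    }

    public static void main(String[] args) {
        int[] result = runDemo(100_000);
        System.out.println("Unsafe counter: " + result[0] + " (can be less than 200000)");
        System.out.println("Safe counter:   " + result[1]);
    }
}
```

The safe counter always ends at 200000. The unsafe counter varies from run to run and is often lower, because some increments overwrite each other.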


Transaction isolation levels

What is a Transaction ?

A database transaction is a group of operations (reads or writes) that is treated as a single unit of work: either all of them are done (committed) or none of them is (rolled back).

Transactions have ACID properties; ACID is an acronym for:

A - Atomicity

C - Consistency

I - Isolation

D - Durability

In this blog post, we will focus on the 'Isolation' property.

What is Transaction Isolation Level ? 

A transaction isolation level defines how much a transaction can see of, and be affected by, the changes made by other concurrent transactions.

What are different transaction isolation levels ?

There are four transaction isolation levels:

- Read Uncommitted

- Read Committed

- Repeatable Read

- Serializable
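
From Java code, these four levels correspond to constants on java.sql.Connection. The sketch below assumes the application talks to the database through JDBC, which this post does not otherwise cover; IsolationLevelConstants and levels() are hypothetical names. It only lists the constants and does not open a connection.

```java
import java.sql.Connection;
import java.util.LinkedHashMap;
import java.util.Map;

public class IsolationLevelConstants {

    // Maps each isolation level name to its java.sql.Connection constant.
    static Map<String, Integer> levels() {
        Map<String, Integer> levels = new LinkedHashMap<>();
        levels.put("READ UNCOMMITTED", Connection.TRANSACTION_READ_UNCOMMITTED);
        levels.put("READ COMMITTED", Connection.TRANSACTION_READ_COMMITTED);
        levels.put("REPEATABLE READ", Connection.TRANSACTION_REPEATABLE_READ);
        levels.put("SERIALIZABLE", Connection.TRANSACTION_SERIALIZABLE);
        return levels;
    }

    public static void main(String[] args) {
        levels().forEach((name, constant) -> System.out.println(name + " -> " + constant));
        // With an open connection (not created here), a level would be selected with:
        // connection.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    }
}
```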

Different databases can support different sets of isolation levels and have different defaults, so when working with a database we should know which isolation levels it supports and which one it uses by default.

Here are some of the most commonly used databases and their default isolation levels:

MySQL, MariaDB: Repeatable Read

Oracle, PostgreSQL, SQL Server: Read Committed

Let us look at each of these isolation levels, using MariaDB as the example:

Note: every isolation level example starts with salary = 80,000.

Read Uncommitted:

At this isolation level, suppose transaction T1 has read row R1 and updated it but has not yet committed the update. Transaction T2 can then read these uncommitted changes.

The problem is that if T1 later fails and its changes are rolled back, T2 has already read the updated data. If T2 now updates the database on the basis of that data, it leaves the database in an inconsistent state.

Let's see how it works with an example:

Session A and Transaction T1:










Then update the record in same transaction:

UPDATE employee SET salary = 81000 WHERE employee_id = 100;

Don't commit T1.






Session B and Transaction T2:

Now open another session, start a new transaction T2, and within it read the employee record updated in transaction T1. It shows the uncommitted data:






When one transaction can read the uncommitted data of another transaction, it is called a dirty read. To prevent dirty reads we can use the 'READ COMMITTED' isolation level, which we discuss next.

Read Committed:

With the Read Committed isolation level, transaction T2 can only read the committed changes of T1, so there are no dirty reads.

Session A, transaction T1:

Change transaction isolation level to READ COMMITTED. Start a new transaction T1 and update employee salary to 81000 without committing.






Session B, transaction T2:

Now open another session, Session B, start a new transaction T2, and read the employee salary by executing the select statement. It still shows the old salary, because uncommitted data can no longer be read.









Session A, transaction T1:

Now I go back to Session A and, within transaction T1, commit the changes by executing the following statement:

COMMIT;

Session B, transaction T2:

Now if I go back to Session B and, within transaction T2, read the salary again by executing the select statement, it shows the updated salary, which is what we expect.








However, READ COMMITTED has the following problems:

Non Repeatable Reads:

Multiple reads of the same row within one transaction can return different results if another transaction commits a change between those reads.

Phantom Reads:

Multiple runs of the same query within one transaction can return different sets of rows if another transaction inserts (and commits) new records that satisfy the query's criteria.

Repeatable Read:

Repeatable Read solves the problem of non repeatable reads and, at least with MariaDB, of phantom reads as well (this can differ between databases). With the 'Repeatable Read' isolation level, if the same row is read multiple times within transaction T1, it always returns the same result. In other words, changes that other transactions make to the rows read by T1 are not visible to T1 until T1 is finished.

So if row r1 is read within transaction T1, and then transaction T2 reads r1, updates one of its fields and even commits, T1 will still see the old row when it reads r1 again, exactly as it was before T2's update.

Let us see an example of how non repeatable reads do not happen with 'Repeatable Read'.

Session A, transaction T1:

Read the row with employee_id 100









Session B, transaction T2:

Update the row with employee_id 100









Session A, transaction T1:













Let us see an example of how phantom reads do not happen with 'Repeatable Read'.

A phantom read is when the same transaction re-runs the same query and gets a different set of rows.

Session A, transaction T1:

A query with criteria that returns 2 records

Now let's add an additional record in another transaction that fulfills the same criteria as the previous query.

Session B, transaction T2:










Now let's go back to transaction T1, re-run the same query, and see the result:

As we can see, it still returns 2 records, so there are no phantom reads. With a phantom read, we would have got the additional record that we inserted in transaction T2.


Next we will look at the Serializable isolation level.



What are Virtual Threads and why do we need them?

What are Virtual threads

Before we even look at the definition of Virtual threads, let's clarify some terminology that will help us understand them better.



1. Platform Thread

The term 'Platform Thread' is used in two contexts:

1.1 Traditional Java Thread

- A non-Virtual thread: the traditional Java thread we had before Virtual threads, backed by an OS (Operating System) thread and scheduled by the OS. A Platform thread runs Java code on its underlying OS thread and holds that OS thread for its entire life cycle, so the OS thread cannot be used by any other request or task. Because of this, the number of Platform threads is limited by the number of OS threads available. Note that these traditional Platform threads are not Carrier threads, as their purpose is not to run Virtual threads.

They are instances of java.lang.Thread.

1.2 Carrier Thread

They are special Platform threads created and managed by the JVM to carry, or run, Virtual threads.

To do its work, a thread needs to be scheduled onto a CPU core. Traditional Java threads are scheduled onto the processor by the OS scheduler. In contrast, Virtual threads are scheduled not by the OS onto the processor but by the JVM onto Carrier threads, which are Platform threads.

If a Virtual thread makes a blocking call, it is unmounted from its Carrier thread and its stack is copied to the heap, which frees the Carrier thread to run another Virtual thread.

Once the blocking call finishes, the Virtual thread's stack is brought back from the heap and the thread is mounted on a Carrier thread again. This can be the same Carrier thread it was unmounted from or a different one, so a Virtual thread is not tied to a single Carrier thread for its whole life cycle.

Platform threads (traditional or Carrier) are relatively heavyweight compared to Virtual threads. The stack of a Platform thread takes around 1 MB of memory.

They are also instances of java.lang.Thread.

Now let us understand what Virtual threads are.

2. Virtual Thread

They are relatively lightweight threads, with a size of around 2-10 KB (which can grow dynamically), and are managed entirely by the JVM rather than the operating system. They are not mapped 1:1 to OS threads. Instead, the JVM schedules many Virtual threads onto a smaller pool of Platform threads called Carrier threads. As mentioned before, if a Virtual thread makes a blocking call, it is unmounted from its Carrier thread and its stack is copied to the heap. This frees the Carrier thread (and its underlying OS thread) to run another Virtual thread, which allows many Virtual threads to share a few Carrier threads.

Once the blocking call finishes, the Virtual thread's stack is brought back from the heap and the thread is mounted on a Carrier thread again. This can be the same Carrier thread it was unmounted from or a different one, so a Virtual thread is not tied to a single Carrier thread for its whole life cycle.

We will look at scalability in one of the next few sections.

Remember that all Carrier threads are Platform threads, but not all Platform threads are Carrier threads.

Virtual threads are also instances of java.lang.Thread, but the following new class hierarchy has been added to the JDK for VirtualThread:

BaseVirtualThread extends Thread 

VirtualThread extends BaseVirtualThread
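
Whether a given Thread object is a Virtual or a Platform thread can be checked at runtime with Thread.isVirtual(). The following is a small sketch assuming JDK 21 or later; VirtualThreadKindDemo and its helper methods are hypothetical names used only for illustration.

```java
public class VirtualThreadKindDemo {

    // Creates an unstarted Virtual thread; nothing runs until start() is called.
    static Thread newVirtual() {
        return Thread.ofVirtual().unstarted(() -> { });
    }

    // Creates an unstarted Platform thread in the same way.
    static Thread newPlatform() {
        return Thread.ofPlatform().unstarted(() -> { });
    }

    public static void main(String[] args) {
        Thread virtual = newVirtual();
        Thread platform = newPlatform();
        System.out.println("virtual.isVirtual()  = " + virtual.isVirtual());   // true
        System.out.println("platform.isVirtual() = " + platform.isVirtual());  // false
        // Both variables have the static type java.lang.Thread.
    }
}
```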

How to create Platform Threads V/s Virtual Threads V/s Carrier Threads

Platform Threads:

Creating a thread with the java.lang.Thread class or with an ExecutorService gives us a Platform thread, each of which is backed 1:1 by an OS thread. The term 'Platform thread' was actually introduced in JDK 19, when Virtual threads were added as a preview feature, to distinguish these threads from the newly introduced Virtual threads. As mentioned before, these Platform threads are not Carrier threads.

So with the old Thread API, all of the following return a Platform thread, which is mapped 1:1 to an OS thread:

1) Using java.lang.Thread class to create Thread

1.1)

        Thread t = new Thread(() -> {
            System.out.println("Running on: " + Thread.currentThread().getName());
        });
        t.start();

1.2 Extending Thread class

     public class MyThread extends Thread {

         @Override
         public void run() {
             System.out.println("Thread t1");
         }

         public static void main(String[] args) {
             MyThread t1 = new MyThread();
             t1.start();
             System.out.println("Main thread finished");
         }
     }

2) Using ExecutorService

As creating Platform threads is expensive, we cannot just create thousands of them. To get around that problem, Java has thread pools, in which a defined pool of threads is reused for multiple tasks.

2.1 )

ExecutorService executorService = Executors.newFixedThreadPool(8);

executorService.submit(() -> System.out.println("Hi"));

Threads are created lazily by the thread pool, on demand: a thread is created when a task is submitted. When the first task is submitted, the first thread is created. As long as fewer than 8 threads exist, each newly submitted task gets a new thread, even if an existing thread is idle; once the pool has 8 threads, they are reused for all further tasks.

So if 8 tasks are submitted concurrently, 8 threads will be created.

If 9 tasks are submitted concurrently, still only 8 threads are created to execute 8 of them, because both the core and the maximum pool size set by Executors.newFixedThreadPool(8) are 8. The 9th task is put in the queue (a fixed thread pool uses a blocking queue). Once one of the 8 threads is free, it picks up the 9th task.

But what happens if several running threads become idle at the same time? Which of them picks up the 9th task from the queue?

This is why a fixed thread pool uses a blocking queue, which is thread-safe. With a queue that is not thread-safe, the threads would be in a race condition: all idle threads might see the queue as non-empty and try to take the same task, which could result in a task being executed multiple times, an exception, or corrupted state. With a blocking queue, each task is handed to exactly one thread, and the other idle threads block until another task arrives.
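
The 9-tasks-on-8-threads behaviour described above can be observed with a small sketch. FixedPoolDemo and distinctThreadsUsed() are hypothetical names; the latch is only there to keep the first 8 tasks running at the same time, so that the 9th really has to wait in the queue.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FixedPoolDemo {

    // Submits 9 tasks to a pool of 8 threads and returns how many distinct threads ran them.
    static int distinctThreadsUsed() {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        CountDownLatch firstEightRunning = new CountDownLatch(8);

        for (int i = 0; i < 9; i++) {
            pool.submit(() -> {
                threadNames.add(Thread.currentThread().getName());
                firstEightRunning.countDown();
                try {
                    // Hold the thread until 8 tasks are running, so the 9th waits in the queue.
                    firstEightRunning.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown(); // no new tasks; queued tasks still run
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        System.out.println("Distinct threads used for 9 tasks: " + distinctThreadsUsed()); // 8
    }
}
```

The 9th task runs on one of the existing 8 threads after it becomes free, so no 9th thread name ever appears.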

Fixed thread pool is ideal to use for:

-  Long running or blocking tasks (I/O tasks such as reading from or writing to a database or files, or HTTP calls, e.g. to a REST API).

-  Bounded concurrency, as we don't want to overwhelm the CPU or memory. At any time, at most n threads are in the pool, so 8 in the above example.

2.2)

ExecutorService executorService = Executors.newSingleThreadExecutor();

executorService.submit(() -> System.out.println("Hi"));

Here too, the thread is created lazily, on demand, when a task is submitted. However, only one thread is ever created. When more than one task is submitted, the remaining tasks are put in the blocking queue. Once the thread has completed the currently executing task, it picks the next task from the front of the queue (FIFO), so the tasks are processed sequentially in the order they are submitted.

Single thread pool is ideal to use for:

- Sequential execution of tasks

- Decoupling submission from execution
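
The sequential, in-order execution can be checked with a short sketch. SingleThreadOrderDemo and runInOrder() are hypothetical names; each task simply records its own number when it runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadOrderDemo {

    // Submits taskCount tasks and returns the order in which they actually ran.
    static List<Integer> runInOrder(int taskCount) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        List<Integer> executionOrder = Collections.synchronizedList(new ArrayList<>());

        for (int i = 0; i < taskCount; i++) {
            int taskId = i; // effectively final copy for the lambda
            executor.submit(() -> {
                executionOrder.add(taskId);
            });
        }

        executor.shutdown(); // queued tasks still run before the executor terminates
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executionOrder;
    }

    public static void main(String[] args) {
        System.out.println(runInOrder(5)); // [0, 1, 2, 3, 4]
    }
}
```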

2.3)

ExecutorService executorService = Executors.newCachedThreadPool();

executorService.submit(() -> System.out.println("Hi"));

Threads are created on demand as tasks are submitted. The pool uses a synchronous blocking queue (SynchronousQueue), in which each insert operation must wait for a corresponding remove operation by another thread.

When the 1st task is submitted, a new thread is created to execute it. When the 2nd task is submitted, if the first thread has become idle after executing the first task, it executes the 2nd task; otherwise a new thread is created.

For n tasks submitted concurrently, n threads are created. As these are Platform threads mapped to operating system threads, and operating system threads are limited, we cannot create an unlimited number of threads via a cached thread pool. To release resources, threads that have been idle for more than 60 seconds are terminated.

The core pool size is 0, so no minimum number of threads is maintained, and the maximum pool size can theoretically be as large as Integer.MAX_VALUE.

Cached thread pool is ideal to use for:

- Short lived asynchronous tasks.

- Tasks coming intermittently or in burst.

2.4)

ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(8);

ScheduledFuture<?> future = scheduledExecutorService.schedule(() -> System.out.println("Hi"), 1000, TimeUnit.MILLISECONDS);

Threads in a scheduled thread pool are created lazily, on demand, similar to a fixed thread pool. When the first task is scheduled, it is added to the delay queue and the 1st thread is created; the task is executed after the given delay period. Each further scheduled task starts another thread until the core pool size is reached, after which the existing threads are reused. The 8 core threads are kept in the pool even when they are idle.
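
The delay can be observed through the ScheduledFuture returned by schedule(). This is a minimal sketch; ScheduledDelayDemo and measuredDelayMillis() are hypothetical names, and a shorter delay of 100 ms is used so that the example finishes quickly.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ScheduledDelayDemo {

    // Schedules one task and returns how many milliseconds passed before it ran.
    static long measuredDelayMillis(long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(8);
        long start = System.nanoTime();
        try {
            // The Callable runs after the delay and reports the elapsed time.
            ScheduledFuture<Long> future = scheduler.schedule(
                    () -> System.nanoTime() - start, delayMillis, TimeUnit.MILLISECONDS);
            return TimeUnit.NANOSECONDS.toMillis(future.get()); // get() blocks until the task has run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Task ran after about " + measuredDelayMillis(100) + " ms");
    }
}
```

The measured value is at least the requested delay; how much more depends on the machine and its load.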

3) Using CompletableFuture

CompletableFuture.supplyAsync(() -> dbCall(), executor)

4) Using new Thread.Builder API

4.1

 Thread.ofPlatform().start(() -> System.out.println("Platform Thread"));

4.2

ThreadFactory factory = Thread.ofPlatform().factory();

Thread t = factory.newThread(() -> System.out.println("task"));

t.start();


Virtual Threads:

And here is how we can create Virtual threads :

1. Using Thread.Builder API

1.1

Thread.ofVirtual()
              .name("virtual thread-", 0)
              .start(() -> System.out.println("Task"));

1.2

Thread.ofVirtual()
              .factory()
              .newThread(() -> System.out.println("Task"))
              .start();

1.3

Thread.startVirtualThread(() ->
              System.out.println("Creating and starting Virtual thread"));

2. Using ExecutorService

2.1

ThreadFactory factory = Thread.ofVirtual().factory();

ExecutorService executor = Executors.newThreadPerTaskExecutor(factory);

2.2

try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            executor.submit(() -> System.out.println("task"));
 }

3. Using Structured Concurrency

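// StructuredTaskScope is a preview API in JDK 21; its shape differs in later releases.
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            Subtask<String> subtask1 = scope.fork(() -> callServiceA());
            Subtask<String> subtask2 = scope.fork(() -> callServiceB());

            scope.join().throwIfFailed(); // wait for both subtasks and propagate the first failure

            return subtask1.get() + subtask2.get();
 }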

How Async API like CompletableFuture helps with scaling and issues with it

CompletableFuture frees the request thread immediately and runs the task asynchronously. It runs the task on a thread from the default ForkJoinPool.commonPool(), unless we pass our own Executor, in which case a thread from that executor's pool is used.

We can chain multiple tasks using the then...() methods.

For Example:

Say we have the following three blocking operations to perform from our service method, with the time each takes. getUser() makes a DB call that takes around 300 ms, getOrders() calls another REST API and takes around 400 ms, and getRecommendations() also calls another REST API and takes around 500 ms.

getUser()              // 300ms DB call   (blocking)
getOrders()            // 400ms HTTP call (blocking)
getRecommendations()   // 500ms HTTP call (blocking)


1. If we use the Async methods, each chained task is submitted to the thread pool separately

ExecutorService executorService = Executors.newFixedThreadPool(20);

CompletableFuture.supplyAsync(() -> getUser(userId), executorService)
                               .thenApplyAsync(user -> getOrders(user), executorService)
                               .thenApplyAsync(orders -> getRecommendations(orders), executorService);

Here getOrders() and getRecommendations() are each submitted to the pool as a separate task, so they can run on different threads from getUser(); the three operations can use up to 3 separate threads.

thenApplyAsync() is executed only after we have the result from supplyAsync(), and so on.

As each of the operations performed in these methods is blocking, the respective threads are blocked for the times mentioned. The advantage of running each operation as a separate task is that when, say, thread 1 completes getUser(), it is released back to the pool, so while thread 2 is running getOrders(), thread 1 can be reused for other work.

2. If we don't use the Async methods, the chained tasks normally run on the same thread from the thread pool

ExecutorService executorService = Executors.newFixedThreadPool(20);

CompletableFuture.supplyAsync(() -> getUser(userId), executorService)
                               .thenApply(user -> getOrders(user))
                               .thenApply(orders -> getRecommendations(orders));

Here getOrders() and getRecommendations() normally use the same thread as getUser(). Note that thenApply() does not take an executor; if the previous stage has already completed when thenApply() is called, the function runs on the calling thread instead.
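
To make the chained calls above runnable, here is a minimal sketch under stated assumptions: ChainedCallsDemo, fetch() and the three stub methods are hypothetical stand-ins, and Thread.sleep() with scaled-down times (30, 40 and 50 ms) simulates the blocking DB and HTTP calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChainedCallsDemo {

    // Hypothetical stand-ins for the blocking calls; sleep() simulates the I/O time.
    static String getUser(int userId) {
        sleep(30);
        return "user-" + userId;
    }

    static String getOrders(String user) {
        sleep(40);
        return "orders-of-" + user;
    }

    static String getRecommendations(String orders) {
        sleep(50);
        return "recommendations-for-" + orders;
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs the three stages one after another, each as a separate task on the pool.
    static String fetch(int userId) {
        ExecutorService executorService = Executors.newFixedThreadPool(20);
        try {
            return CompletableFuture
                    .supplyAsync(() -> getUser(userId), executorService)
                    .thenApplyAsync(user -> getOrders(user), executorService)
                    .thenApplyAsync(orders -> getRecommendations(orders), executorService)
                    .join(); // block here only to print the result in this demo
        } finally {
            executorService.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetch(7)); // recommendations-for-orders-of-user-7
    }
}
```

In a real service the caller would usually return the CompletableFuture instead of calling join(), so that the request thread is not blocked.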

Why Virtual Threads ? What problem they solve ?

Java server side applications typically follow thread-per-request model which means for each incoming HTTP request to the server, server will assign a thread from its thread pool to handle the request. These were Platform threads with 1:1 mapping to OS threads, meaning each Java thread directly occupied an underlying OS thread while processing the request.

However, this model had following limitations :

- As the incoming HTTP request is executed by a Platform thread backed by OS thread, when this request comes across to the blocking operation(s) like writing to database, it blocks the Platform/OS thread and as each request is executed by a single thread, this thread just stays idle waiting for blocking operation(s) to finish, which means this thread can not utilize CPU, hence inefficient use of CPU.

If we consider example of Tomcat, typically it can handle maximum 200 concurrent requests with default thread pool size of 200 which means 200 Platform threads. If there are more than 200 concurrent requests those are either queued or rejected. And if these requests have lots o blocking operations like database call , writing to File or making HTTP call to another service, most of these threads will be sitting idle for quite some time for blocking operations to complete.

Also from memory perspective, for 200 requests

Memory used : 200 * 2 MB = 400 MB

It is not that much.

so which means we are not even hitting memory limits but with this model, CPU is highly under utilized and if want to scale our application, we need to add more severs which we also call horizontal scaling but horizontal scaling comes at a cost of buying more hardware.

To solve this, asynchronous programming constructs like CompletableFuture was introduced in Java 8, which makes sure that thread to handle incoming request is immediately freed up to handle other requests and the blocking operations are performed in separate thread(s) but the problem with CompletableFuture was/is that developers need to know lots of API related to it, business logic gets embedded in async code jargons and hence makes code difficult to read as well as difficult to debug.

Hence Java designers with Project Loom introduced Virtual threads.

Virtual threads make blocking cheap and they are also cheap to create.

Now, when Tomcat is configured to use Virtual threads, it uses a virtual-thread-per-task executor for incoming requests, so the server assigns a Virtual thread to each request instead of a Platform thread. The JVM mounts this Virtual thread onto a Carrier thread.

When a blocking call happens, the JVM saves the stack of the Virtual thread to the heap and unmounts the Virtual thread from its Carrier thread. The Carrier thread is now free, so another Virtual thread can be mounted onto it and another request can be handled. The result is more efficient use of the CPU and higher throughput, that is, more requests handled per second.
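The behaviour described above can be tried out with a small sketch. It assumes Java 21 or later, and the class name, task count and sleep time are illustrative choices, not values from any server. Each task blocks in Thread.sleep, which stands in for a database or HTTP call.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Requires Java 21 or later. Task count and sleep time are illustrative.
class VirtualThreadSketch {

    // Runs `tasks` blocking tasks, each on its own Virtual thread, and
    // returns how many of them ran on a Virtual thread and completed.
    static int runBlockingTasks(int tasks, Duration blockingTime) {
        List<Future<Boolean>> futures = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                futures.add(executor.submit(() -> {
                    // While sleeping, the Virtual thread is unmounted and its
                    // Carrier thread is free to run other Virtual threads.
                    Thread.sleep(blockingTime);
                    return Thread.currentThread().isVirtual();
                }));
            }
        } // close() waits for all submitted tasks to finish

        int completed = 0;
        for (Future<Boolean> future : futures) {
            try {
                if (future.get()) {
                    completed++;
                }
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int completed = runBlockingTasks(10_000, Duration.ofMillis(200));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(completed + " tasks completed in about " + elapsedMs + " ms");
    }
}
```

Because a sleeping Virtual thread does not hold on to its Carrier thread, the total time should stay far below the 10,000 * 200 ms that running the tasks one after another would take. The exact figure depends on the machine.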

So with Virtual threads, if we want to scale, we can first scale vertically, because the same machine can handle more concurrent requests. Eventually we still need to add servers horizontally as needed, for example to get more CPU cores or for high availability and fault isolation.

 

 




Difference between dependencyManagement and dependencies in Maven

Managing dependencies correctly is one of the most important aspects of working with Apache Maven. A frequent source of confusion for Java developers is understanding the difference between <dependencyManagement> and <dependencies> in a Maven pom.xml.

Although both deal with dependencies, they serve completely different purposes.

In this article, we will see:

  • What <dependencies> does
  • What <dependencyManagement> does
  • The key differences between them
  • How they work together
  • Best practices for real-world Maven projects

What is <dependencyManagement> in Maven?

The <dependencyManagement> section is used to define and control dependency versions without actually adding them to the project. It acts as a central version registry.

So, if you define dependencies under the <dependencyManagement> section and run the command mvn clean install, Maven will not download the dependency to your local repository and will not add it to the classpath of your project.

Then you might ask, why do we want to add <dependencyManagement>?

It is mainly used in multi-module projects, where in the parent POM, we declare all the dependencies (along with their versions) that are intended to be used by child modules. This ensures:

  • No version conflicts
  • All child modules use the same dependency versions
  • Easy upgrades by changing the version in one place

Example: <dependencyManagement>


<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.14.0</version>
        </dependency>
    </dependencies>
</dependencyManagement>

What is <dependencies> in Maven?

The <dependencies> section is where you declare the libraries or dependencies your project actually uses.

Any dependency listed here:

  • Is downloaded by Maven
  • Is added to the project classpath
  • Is available during compile, test, or runtime (based on scope)

Example: <dependencies>


<dependencies>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
        <version>3.14.0</version>
    </dependency>
</dependencies>

Key Points About <dependencies>

  • Directly adds dependencies to your project
  • Required for compilation or execution
  • Versions must be specified unless inherited. When versions are defined in <dependencyManagement>, there is no need to specify them again in child POMs.
  • Affects the build immediately

Key Differences Between <dependencyManagement> and <dependencies>

Feature                         | <dependencies> | <dependencyManagement>
Adds dependency to project      | Yes            | No
Controls version                | Yes            | Yes
Required for compilation        | Yes            | No
Common in parent POM            | Optional       | Very common
Helps avoid version conflicts   | Limited        | Yes

How <dependencyManagement> and <dependencies> Work Together

The most common and recommended usage is in multi-module Maven projects.

Parent POM (Version Control)


<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.14.0</version>
        </dependency>
    </dependencies>
</dependencyManagement>

Child Module (Actual Usage)


<dependencies>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
    </dependency>
</dependencies>

With this setup:

  • Version is inherited automatically
  • No need to repeat versions
  • Easy upgrades and consistency

Why <dependencyManagement> Is Important

1. Centralized Version Control

Define dependency versions in one place instead of repeating them across modules.

2. Prevents Dependency Conflicts

Ensures all modules use the same library versions.

3. Simplifies Maintenance

Upgrade a dependency version once instead of everywhere.

4. Enables Clean Child POMs

Child POMs stay simple and readable.


When Should You Use Each?

Use <dependencies> when:

  • Your project actually needs the dependency
  • You want the library on the classpath
  • You are working in a single module or child module

Use <dependencyManagement> when:

  • Managing versions across multiple modules
  • Creating a parent POM
  • Standardizing dependency versions
  • Avoiding transitive dependency conflicts

Common Mistakes Developers Make

  • Expecting <dependencyManagement> to add dependencies automatically
  • Forgetting to declare dependencies in <dependencies>
  • Duplicating versions across modules
  • Mixing incompatible library versions

Best Practices for Maven Dependency Management

  • Always use <dependencyManagement> in parent POMs
  • Declare dependencies without versions in child POMs
  • Prefer BOMs (Bill of Materials) when available
  • Regularly run mvn dependency:tree
  • Keep dependencies updated to avoid security issues

Summary

In simple terms:

  • <dependencies> → What your project uses
  • <dependencyManagement> → How versions are controlled

They are not alternatives — they are complementary.

Understanding this distinction is essential for building clean, scalable, and maintainable Maven projects.

Thanks for reading.

Java String interview Questions - 2

What is the output of the following code, and why?

public class TestString {

    public static void main(String[] args) {
        String str1 = "Hello";
        String str2 = new String("Hello");
        System.out.println(str1.hashCode() == str2.hashCode());
    }
}

Output: true

Explanation:

Line 1: String str1 = "Hello";

Here a String object for the literal is created in the String constant pool area of heap memory. The JVM sees that "Hello" is a string literal and that no literal with that value exists in the String constant pool yet, so it creates a new String with the value "Hello" in the pool. The JVM does this to use memory efficiently. The reference variable str1 refers to this String literal (read: it holds the address of the literal).

Line 2: String str2 = new String("Hello");

Here, because we create the String object using the new operator, the JVM creates a new String object with the value "Hello" in the heap, outside the String constant pool. str2 refers to this newly created object (read: it holds the address of the object in the heap).


Line 3: System.out.println(str1.hashCode() == str2.hashCode());

Here we call the hashCode() method on str1 and str2 and compare the hash codes of the two objects they refer to.

The String class overrides the hashCode() method it inherits from the Object class, and the overridden implementation calculates the hash code from the value of the String. Both String objects have the same value, so their hash codes are the same.

Here is the source code of the hashCode() method from the String class (JDK 17). You can see that the hash code is calculated from the value.

/**
 * Returns a hash code for this string. The hash code for a
 * {@code String} object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using {@code int} arithmetic, where {@code s[i]} is the
 * <i>i</i>th character of the string, {@code n} is the length of
 * the string, and {@code ^} indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return a hash code value for this object.
 */
public int hashCode() {
    // The hash or hashIsZero fields are subject to a benign data race,
    // making it crucial to ensure that any observable result of the
    // calculation in this method stays correct under any possible read of
    // these fields. Necessary restrictions to allow this to be correct
    // without explicit memory fences or similar concurrency primitives is
    // that we can ever only write to one of these two fields for a given
    // String instance, and that the computation is idempotent and derived
    // from immutable state
    int h = hash;
    if (h == 0 && !hashIsZero) {
        h = isLatin1() ? StringLatin1.hashCode(value)
                       : StringUTF16.hashCode(value);
        if (h == 0) {
            hashIsZero = true;
        } else {
            hash = h;
        }
    }
    return h;
}
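
The formula in the Javadoc above can be checked by hand. The following sketch recomputes the hash code of "Hello" with a simple loop and compares it with String.hashCode(). The class name StringHashSketch and the method manualHash are made up for this illustration and are not part of the JDK.

```java
// Recomputes String.hashCode() by hand using the documented formula.
// StringHashSketch and manualHash are illustrative names, not JDK APIs.
class StringHashSketch {

    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], using int arithmetic
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String str1 = "Hello";
        String str2 = new String("Hello");

        System.out.println(str1 == str2);                           // false: two different objects
        System.out.println(str1.equals(str2));                      // true: same value
        System.out.println(str1.hashCode() == str2.hashCode());     // true: hash is based on value
        System.out.println(manualHash("Hello") == str1.hashCode()); // true
        System.out.println(str1.hashCode());                        // 69609650
    }
}
```

The loop is just the formula written in Horner form, so two String objects with the same characters always produce the same hash code, no matter where they live in memory.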


Java String Interview Questions - 1

What is the output of the following Java code?

public class TestString {

    public static void main(String[] args) {
        String str1 = "Hello";
        String str2 = new String("Hello");
        System.out.print(str2 == "Hello");
    }
}

Output: false

Explanation:

Line 1: String str1 = "Hello";

Here a String object for the literal is created in the String constant pool area of heap memory. The JVM sees that "Hello" is a string literal and that no literal with that value exists in the String constant pool yet, so it creates a new String with the value "Hello" in the pool. The JVM does this to use memory efficiently. The reference variable str1 refers to this String literal (read: it holds the address of the literal).


Line 2: String str2 = new String("Hello");

Here, because we create the String object using the new operator, the JVM creates a new String object with the value "Hello" in the heap, outside the String constant pool. str2 refers to this newly created object (read: it holds the address of the object in the heap).


Line 3:  System.out.print(str2 == "Hello");

When we compare two object references using '==', we are checking whether both references point to the same object in memory. If they do (in other words, if both hold the address of the same object), the comparison returns true. If they hold the addresses of two different objects, it returns false.


So str1 == str2 would return false: str1 holds the address of the String object in the String constant pool, str2 holds the address of a different String object in the heap, and two different addresses never compare as equal.

In our case we compare str2 (a reference to the String object in the heap) with the String literal "Hello" using '=='. This compares the address of the String in the heap (say 4444) with the address of the String literal "Hello" in the String constant pool (say 5555). The two are not equal, so the answer is false.
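
The comparisons discussed above can be run side by side. This is a small sketch; the class name StringIdentitySketch and the helper sameObject are made up for illustration. It also shows equals(), which compares values, and intern(), which returns the pooled String for a given value.

```java
// Illustrative sketch: compares String references and String values.
class StringIdentitySketch {

    // True only when both references point to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String str1 = "Hello";             // refers to the pooled literal
        String str2 = new String("Hello"); // refers to a separate heap object

        System.out.println(sameObject(str2, "Hello"));          // false: different objects
        System.out.println(sameObject(str1, "Hello"));          // true: same pooled literal
        System.out.println(sameObject(str1, str2));             // false
        System.out.println(str2.equals("Hello"));               // true: same value
        System.out.println(sameObject(str2.intern(), "Hello")); // true: intern() returns the pooled object
    }
}
```

In application code, String values should be compared with equals(); '==' only answers whether two references point to the same object.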



Grep command and interesting things you can do with it

What is Grep ?

grep is a command-line utility for searching plain-text data for lines that match a regular expression. It was originally developed for the Unix operating system and later became available on all Unix-like operating systems.

grep stands for global regular expression print.




Which file types is grep compatible with ?

grep is compatible with a wide variety of file types, as long as the files contain text that can be searched for patterns using regular expressions.

Some common file types that can be searched with grep include:

Plain text files: These are simple text files with no special formatting or encoding.

Configuration files: These files are used to configure applications and services and often have specific syntax and formatting rules.

Code files: These files contain programming code written in various languages such as C, Java, Python, etc.

Log files: These files are used to store system logs and application logs and can be searched to find specific events or error messages.

HTML, XML, and other markup files: These files contain structured data and can be searched for specific tags, attributes, or values.

CSV and other delimited data files: These files contain tabular data separated by commas or other delimiters, and can be searched for specific values or patterns.

In general, any file that contains text can be searched with grep. However, grep may not work well with binary files or files that are encoded in a non-text format, such as images or audio files. In such cases, you may need to use specialized tools or libraries that are designed to work with those file types.

What is the syntax for grep command ?

grep [OPTIONS] PATTERN [FILE...]

OPTIONS are various command-line options that can be used to modify the behavior of grep. Some common options include:

-i: Ignore case distinctions during the search.

-v: Invert the match; display all lines that do not match the pattern.

-n: Display the line numbers of the matched lines.

-c: Display only the count of matched lines.

-r: Search recursively in directories and their subdirectories.

PATTERN is the regular expression that you want to search for in the specified files.

FILE... is the name of the file(s) to search. If no file is specified, grep searches for the pattern in standard input.

What are the examples of Grep command ?

1. Searching for a string in a file

     grep "search text" file.txt

It will search the whole file named file.txt for the string "search text" and return the lines that contain it.

2. Searching for a string in multiple files.

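    grep "search text" file1.txt file2.txt file3.txt

 It will search the files file1.txt, file2.txt and file3.txt for the text "search text" and return the lines with matching text. When more than one file is given, each returned line is prefixed with the name of the file it came from.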

3. Searching for a string in all files in a directory

    grep "search text" directory/*

  It will search the files directly under the directory named 'directory' and return the lines with matching text. It does not descend into subdirectories.

4. Searching for a string in all the files in a directory and its subdirectories.

  grep -r "search text" directory/

 It will search in files in a directory with name 'directory' and recursively under its subdirectories for text "search text" and return all matching lines.

5. Searching for a string and ignoring case.

   grep -i "search text" file1.txt

  It will search in file file1.txt for text "search text" ignoring case and return all matching lines. So, for example, if there is line which has "SEARCH TEXT" or "Search Text", those lines will also be returned.

6. Searching for a string and returning line numbers.

  grep -n "search text" file1.txt

  It will search the file file1.txt for the text "search text" and return all lines containing it, each prefixed with its line number.

7. Searching for a string and only returning matching string.

  grep -o "search text" file1.txt

It will search the file file1.txt for "search text" and, instead of printing each matching line in full, print only the part of the line that matched. Each match is printed on its own line.

8. Searching for a string and returning lines which do not contain the string

 grep -v "search text" file.txt

9. Searching for a string and returning only the count of matching lines

grep -c "search text" file.txt

10. Searching for a string and returning count of the number of occurrences of string.

grep -o "search text" file.txt | wc -l
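
The difference between examples 9 and 10 is easy to miss: -c counts lines that contain at least one match, while -o piped into wc -l counts every match. The following Java sketch mimics the two counts on a small in-memory list. It only illustrates what is being counted and says nothing about how grep is implemented; the class name, method names and sample lines are made up.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: mimics what the two grep commands count,
// not how grep is implemented.
class GrepCountSketch {

    // Like `grep -c PATTERN`: number of lines with at least one match.
    static long countMatchingLines(List<String> lines, Pattern pattern) {
        return lines.stream()
                .filter(line -> pattern.matcher(line).find())
                .count();
    }

    // Like `grep -o PATTERN | wc -l`: total number of matches across all lines.
    static long countOccurrences(List<String> lines, Pattern pattern) {
        long total = 0;
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            while (matcher.find()) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "search text appears once here",
                "search text and again search text",
                "nothing to see on this line");
        Pattern pattern = Pattern.compile(Pattern.quote("search text"));

        System.out.println(countMatchingLines(lines, pattern)); // 2
        System.out.println(countOccurrences(lines, pattern));   // 3
    }
}
```

The second sample line contains the text twice, so it adds one to the line count but two to the occurrence count.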

How to use output of another command as input to search for grep command ?

Example1:

Suppose you have a command that generates some output, like ls -l, and you want to search that output for files that end with ".txt". You can pipe the output of ls -l into 'grep' like this:

ls -l | grep "\.txt$"

In this case, ls -l generates output and pipes it into grep. The pattern \.txt$ is then applied to the input that grep receives from ls -l. The \. matches a literal dot, and the $ at the end of the pattern means "end of line", so this pattern matches any line that ends with ".txt".

So, if no file is specified, grep searches for the pattern in standard input, which means you can pipe the output of another command into grep to search for a pattern within that output.

Example2:

Suppose you want to find all the files with a .log extension under the current directory and its subdirectories, and then run grep on each of those files to search for the text 'search text'. You can do that with the following command:

find . -name "*.log" -exec grep -H 'search text' \{\} \;

It will print the name of the file along with each matching line that contains the 'search text'.

Here's how the command works:

find . -name "*.log" searches for all files under the current directory and its subdirectories with a .log extension.

-exec is a flag that tells find to execute a command on each file that matches the search criteria.

grep -H 'search text' \{\} is the command that gets executed for each file that matches the search criteria. grep is a command that searches for patterns in text files. The -H flag tells grep to print the filename along with each matching line.

The \{\} is a placeholder that find replaces with the name of each file that matches the search criteria. The backslashes only protect the braces from the shell; in most shells a plain {} works as well.

The \; at the end marks the end of the command passed to -exec. Without it, find reports an error because it cannot tell where the command ends.

So when you run this command, find searches for all files with a .log extension under the current directory and its subdirectories, and for each file that matches the search criteria, it executes the grep command to search for the pattern "search text". The output shows the filename along with each matching line that contains the pattern.

Example 3:

cat file.txt | grep "pattern"

cat can be used to display the contents of a file, and grep can be used to filter the output based on a pattern. 

Example 4:

cat file.txt | grep "pattern" | awk '{print $1}'

awk is a powerful text processing tool that can be used to filter and manipulate text data. grep can be used to filter the input, and awk can be used to perform additional processing. 

Example 5:

cat file.txt | grep "pattern" | sed 's/foo/bar/g'

sed is another text processing tool that can be used to manipulate text data. grep can be used to filter the input, and sed can be used to perform additional processing.

This command uses grep to filter the input based on a pattern, and then uses sed to replace all occurrences of "foo" with "bar".

Example 6:

ps -ef | grep "process name"

ps can be used to list all running processes, and grep can be used to filter the output based on process name.

Can we use grep to search text in compressed file, like file compressed using gzip ?

grep does not decompress files, so it cannot find text inside a file compressed with gzip. To search text in files compressed with gzip, use the zgrep command-line utility, which decompresses the file and then searches it.

Example:

zgrep "search text" file.gz

Thanks for reading.