Jun 23 2004

Of non-static ThreadLocals and memory leaks …

Tags: , , Rajiv @ 8:36 pm UTC

My recent experience has made me realize that the ThreadLocal class was never really designed to be used as a non-static field. However, the implications of making it non-static are not highlighted enough in JDK API and the many ThreadLocal tutorials you find on the net.

If you want to know more about ThreadLocals I would recommend reading Threading lightly by Brian Goetz. Brian talks about why and how to use ThreadLocals along with some examples. And if you are a performance buff, you will surely have an "Aha!" moment when you read the section on the performance bottleneck with the JDK 1.2 implementation and how it was resolved in JDK1.3.

Coming back to my "Non-static ThreadLocals considered harmful" [pun intended! ;) ], couple of months ago Prasad called:

  • Prasad: Hey I need to store some information per thread per
    object
    can I do that?
  • Me: What?!
  • Prasad: See when we used a ThreadLocal for Transaction id,
    it was a static singleton across the VM. In my case I need the value of my
    ThreadLocal to be per instance of a cached object.
  • Me: Yeah, so you can have a non static ThreadLocal field in your cached
    object… right?!
  • Prasad: Yeah .. that was my question. Would that work?

What he wanted was something like:

Initial reference diagram

So I went about stepping through the code/logic to prove how/why it would work if you make the ThreadLocal non static. Well he tried the whole thing and it did work. Only months later did we realize that
there was a memory leak in the application! Running through optimizeIt and looking for non GC’ed instances we realized that the number of Context objects was more than the cache size. Likely that there was some one referring the Context even after the corresponding CachedObject had been removed from the Cache. OptimizeIt’s reduced reference graph showed that the ThreadLocals were holding reference to these Context object instances. Aha! Now there were multiple places where the CachedObjects were being evicted from the Cache. We had to set the ThreadLocal to null at all these places. 

The simplest option seemed to be to set the thread local to null in the finalize of the cached object. *buzzer* Well, unfortunately that wouldn’t work. The finalize method would get called in the Finalizer thread of the VM. So it would try to set the value to null in the Finalizer thread … but we wanted it to be set to
null in the Thread that accessed it!

So we refactored the code a bit, made sure all evictions happen in one place and made sure to set(null) on the ThreadLocal of the CachedObject which was going to be evicted. Unfortunately, that wasn’t the end of the story. We still had OutOfMem issues … only that it took a little longer now. OptimizeIt’s reduced reference graph showed that the number of CachedObjects was same as the Cache size now. So our solution did work. The memory leak was due to the number of ThreadLocals. Logically the number of ThreadLocals has to be same as the number of CachedObjects. So even after the CachedObject has been GC’ed, some one was holding a reference to the ThreadLocal. Initially, while stepping through the logic we had stopped the moment we concluded any thread would find the right value. We never went further to analyze what happens to all the ThreadLocals we created. We just assumed that GC would take care of it. [Which I still think was fair enough!]

When we invoke ThreadLocal.set(), the ThreadLocal would add itself as the key and the Context as the value to the threadLocal map of the current Thread. [This is in JDK 1.3 and above, the implementation is slightly different in JDk1.2, but the end result is the same ... there is a memory leak!] By setting the value of the ThreadLocal to null we have made the value of the map entry null . This made the Context to be available for garbage collection. However, the threadLocal map of Thread stills holds a reference to the key, which in our case is the ThreadLocal. We actually need to remove the ThreadLocal and not just set its value to null. If you are having trouble following this consider opening ThreadLocal.java and Thread.java from <jdk_1.3++_home>/src.zip and stepping through ThreadLocal.set and ThreadLocal.get methods. The memory leak was aggravated in this scenario because the web container reuses threads across requests and the cache objects keep getting created and garbage collected. However, the ThreadLocals don’t get GC’ed.

Unfortunately, it appears that till JDK1.5 came out, ThreadLocal‘s did not have a proper lifecycle. They were expected to be created as static singletons or in limited numbers. For a proper life cycle I would have expected:

  1. The ThreadLocal should have taken care of cleaning itself up when no user
    code is referring to it. The way to achieve this is to make the threadLocals
    map of the Thread class a WeakHashMap. I wonder why this was not done …
    performance issues?! … or ThreadLocal was just not designed for non-static usage??
  2. An explicity destroy method should have been provided on ThreadLocal. In
    JDK 1.5, a new method remove
    has been added to ThreadLocal‘s API to complete its lifecycle management.
    Users can call the remove method and avoid such memory leaks.

But for most of us who can’t make JDK 1.5 as the minimum requirement for our application we have to find some other way out.

One simple option seemed to be to write a CustomThreadLocal based on the JDK 1.2 implementation of ThreadLocal. Basically, have a synchronized Map of Thread vs threadLocalValue and expose a remove method which will remove the Thread from the map. However, this has serious performance issues, which is why ThreadLocal was redesigned in JDK 1.3! So the only realistic option is to make the
ThreadLocal a static field.

Currently each CachedObject instance has its own ThreadLocal instance and each ThreadLocal instance holds a Context instance. If we make the ThreadLocal static, then all the instances of CachedObject will refer to the same ThreadLocal instance. So, in order to retain the original semantics, we will have to make the value of the ThreadLocal to be a map of CachedObject vs Context. Since multiple threads will never access this map [which is local to the thread], this map does not need to be synchronized. So no performance bottlenecks and since the remove method is exposed, no memory issues. The new reference graph looks like: New reference diagram

and the source for the ThreadLocalMap would look like:

1
2    import java.util.*;
3
4    public class ThreadLocalMap
5            implements Map
6    {
7        private ThreadLocal threadLocal = new ThreadLocal();
8
9        private Map getThreadLocalMap(){
10           Map map = (Map) threadLocal.get();
11           if(map==null){
12               map = new HashMap();
13               threadLocal.set(map);
14           }
15           return map;
16       }
17
18       public Object put(Object key, Object value)
19       {
20           return getThreadLocalMap().put(key, value);
21       }
22
23       public Object get(Object key)
24       {
25           return getThreadLocalMap().get(key);
26       }
27
28       public Object remove(Object key)
29       {
30           return getThreadLocalMap().remove(key);
31       }
32
33       //code snipped for brevity ...
34
77   }
Share:
  • email
  • del.icio.us
  • DZone
  • Technorati
  • Reddit
  • Ma.gnolia
  • Google Bookmarks
  • YahooMyWeb
  • SphereIt
  • StumbleUpon
  • Digg
  • Mixx
  • TwitThis
  • Furl
  • Simpy

No trackbacks

4 comments


  1. rajiv


    I have posted an update to this after receiving some feedback.

  2. Glenn Bech


    Great! Thanks for pointing out a potential application killer, and insight into the ThreadLocal lifecycle.


  3. Daniel Woo


    I found similar issue in a team code review long before, and I believe Sun should change the Thread class to hold the ThreadLocal instances with a WeakHashMap.


  4. Vladimir Ralev


    Absolutely. I just re-discovered it and googled around to see how many angry blogposts would be out there :)

Leave a Reply

Subscribe to comments on this post

Allowed tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>