Wednesday, July 21, 2010

Scale your cache to a terabyte with Terracotta Distributed Ehcache 2.2

Introduction :

In todays world of cloud computing and internet scale applications, data scale and availability are becoming an increasing burden to developers. With that problem in mind we are excited to release our latest snap in versions of Terracotta Ehcache. With this release and a few lines of config you are able to scale an application to terabytes of highly available data and 100s of nodes. We  accomplish this for our users with a new Terracotta based storage strategy called  DCV2  (for "Distributed Cache Version 2").  In this post, I would like to demystify DCV2 by explaining a little bit about the design, inner workings and limitations of DCV2 in its current form.

The Problem :

Terracotta Server Array already provides striping capabilities to the cluster, allowing the users scale to their demands. Before we built DCV2, distributed Ehcache using the "classic storage strategy" had the ability to flush and fault values from a cache at its discretion depending on access pattern and memory requirements, but not the keys. All the keys in the cache had to be resident in all the clients memory.

Keys are generally very small compared to the values. Having the keys in memory at the client nodes helped us achieve better performance for certain operations and also allowed us to do some nifty optimizations around locking and prefetching of values so we could be fast and still be coherent. It also allowed us to run more efficient eviction algorithms at the client nodes since we have more details about access patterns here. These optimizations gave us pretty good performance numbers for smaller to medium sized caches with even up to 5-10 million entries.

But this strategy posed a few issues for very large caches :
  • Since all the keys are present in the client heap, large caches (50-100+ million entries)  imposes bigger heap requirement at the client nodes even though keys are generally small.
  • Java takes a long time to do full GCs of large heaps, so increasing the heap was not a great solution.
  • There is a penalty we have to pay for each node in the cluster  since updates to the cache needs to be broadcasted. This is minimal with all the batching we do but it is still there and can be seen for large clusters with a lot of nodes.
DCV2 for large caches :

So to solve all these issues for very large caches, we built a new storage strategy from ground up and did a lot of optimizations to achieve good scale-out characteristics. Some of the key design aspects of DCV2 are listed below.
  • As you might have guessed by now, in DCV2, all the keys in the cache are not kept at the client nodes thus relieving the memory requirement from the client nodes. They can be thin clients with very little heap but still be connected to a huge cluster and accessing a large cache.
  • The state of the cache is maintained by the Terracotta Server Array as usual.
  • The updates to the caches are not broadcasted to any client nodes in the cluster.
  • To have the cache coherent, locks are given out to the client nodes for the entries they access and they can hold them greedily until someone else in the cluster needs them or they don't need them anymore.
  • We build an efficient local memory cache that DCV2 uses at the client side so that each node can hit the local cache for its hot set and not make multiple trips to the server. This cache grows or shrinks as memory demands. One can also hand tune it for whatever reason.
  • The local memory cache only caches entries that are protected by the locks that are owned by the current node. Hence we can afford to not broadcast any changes and still be coherent with low read latency.
  • The eviction algorithm runs at the Terracotta Server array where it is closest to the data, without needing to fault any data to the client, thus being faster.
Results  :

With DCV2 strategy we got the following benefits.
  • Since the client nodes can now fault and flush entries (as opposed to only values) as needed from the Terracotta Server Array, which can be scaled up as needed, it gives us the ability to scale the cache to over 100+ million entries with a terabyte of data in the cache !
  • Since we don't broadcast any updates to the cache, it performed better esp. with higher number of nodes. In some cases we got up to 5X improvement in write tps while still maintaining the same read tps.
  • Each node doesn't add any extra overhead to the cluster thus helping us scale to a large number of nodes. 
  • The node startup time is also almost instantaneous even when connecting to a large cluster with a large cache.
  • You still get all the benefits of a clustered coherent cache with every low latency and high throughput.
 Limitations :

There are still more features and improvements planned for DCV2. In its current form, it has the following limitations.
  • Some operations like getKeys() which returns a list of all keys are not implemented yet. (You shouldn't call getKeys() on a cache with 100 million entries anyways !) We have plans of supporting these kinds of operation with some kind of cursor in future.
  • Some features like eviction events in Ehcache are not supported with DCV2.
  • Incoherent mode as described in my previous post,  is  not optimized for DCV2 yet.
  • For certain kind of usage pattern with smaller to medium size caches, you might find the classic mode to be a better fit. In general we found DCV2 to be as fast as or faster in most cases esp. for larger caches.
Using DCV2 :

To try out DCV2, you can download our enterprise kit from here. You can enable the new storage strategy either in ehcache.xml or programmatically.

<cache name="MyLargeCache">
<terracotta clustered="true" storageStrategy="DCV2"/>

//programmatically ... 
CacheConfiguration cacheConfiguration = new CacheConfiguration("MyLargeCache");
TerracottaConfiguration terracottaConfig = new TerracottaConfiguration().clustered(true);
cache = new Cache(cacheConfiguration); ...