Wednesday, February 17, 2010

Terracotta Distributed Ehcache : New fast incoherent mode


Introduction :

One of the main advantage that Terracotta brings to Ehcache users is Cache coherency in a distributed environment. A coherent cache will always give consistent view of its data. Where as with an incoherent cache, you can read stale data, data that is not valid anymore.

Terracotta as a distributed platform provides scalability, availability and coherent view of data by default. So it was easy for us to implement a coherent cache plugin for ehcache using Terracotta as the distributed platform.

So whats the problem ?

So to keep the cache coherent there is a certain cost involved. For example, row level locking is implemented internally to prevent two threads from two nodes updating the same entry at the same time. Various optimizations like greedy locks, optimistic prefetching, transaction folding etc. have been implemented to make this super fast while keeping the cache coherent.

But we felt there are still some scenarios where users will be willing to compromise cache coherency to gain extra performance. Some scenarios where one might not care about coherent cache are
  • In Read-only cache (no updates - so no stale data)
  • When bulk loading entries into a cache (a cron job type of batch loading during non-peak hours)
  • When warming up the cache for the first time before use
  • In usecases where reading stale values once in a while is acceptable
Solution :

With the latest release of Terracotta (3.2.1) and Ehcache (2.0) , we are introducing a new incoherent mode to our distributed Ehcache. To expose such a mode to the users, we had to enhance the Ehcache Interface to include the following methods.


/**
* Returns true if the cache is in coherent mode cluster-wide. Returns false otherwise.
* <p />
* It applies to coherent clustering mechanisms only e.g. Terracotta
*
* @return true if the cache is in coherent mode cluster-wide, false otherwise
*/
public boolean isClusterCoherent();

/**
* Returns true if the cache is in coherent mode for the current node. Returns false otherwise.
* <p />
* It applies to coherent clustering mechanisms only e.g. Terracotta
*
* @return true if the cache is in coherent mode cluster-wide, false otherwise
*/
public boolean isNodeCoherent();

/**
* Sets the cache in coherent or incoherent mode depending on the parameter on this node.
* Calling {@code setNodeCoherent(true)} when the cache is already in coherent mode or
* calling {@code setNodeCoherent(false)} when already in incoherent mode will be a no-op.
* <p />
* It applies to coherent clustering mechanisms only e.g. Terracotta
*
* @param coherent
* true transitions to coherent mode, false to incoherent mode
* @throws UnsupportedOperationException if this cache does not support coherence, like RMI replication
*/
public void setNodeCoherent(boolean coherent) throws UnsupportedOperationException;

/**
* This method waits until the cache is in coherent mode in all the connected nodes. If the cache is already in coherent mode it returns
* immediately
* <p />
* It applies to coherent clustering mechanisms only e.g. Terracotta
* @throws UnsupportedOperationException if this cache does not support coherence, like RMI replication
*/
public void waitUntilClusterCoherent() throws UnsupportedOperationException;


All these methods are self-explanatory. One important detail to note is that when setting a cache to coherent or incoherent mode by calling setNodeCoherent(), the mode is changed for that cache only in that particular node and not cluster wide, as denoted by the method name. This is done so that if a node set a cache to incoherent mode and then crashes, the cache automatically moves to coherent mode as soon as that node disconnects. This also means every node that wants to use the cache in an incoherent fashion has to call this method.

There is also a new configuration attributed added to the terracotta element in ehcache.xml that can setup a cache in incoherent mode on startup. By default, a cache is always in coherent mode unless set in config.

<xs:element name="terracotta">
<xs:complexType>
<xs:attribute name="coherent" use="optional" type="xs:boolean" default="true"/>
</xs:complexType>
</xs:element>

The waitUntilClusterCoherent() method in the Ehcache interface is provided as a convenient mechanism for multiple cache nodes to synchronize among themselves and to wait until all of them are finished loading in incoherent mode, without using any kind of external distributed cyclic barrier. Anyone calling this method on the cache will wait until all the nodes have the cache back in coherent mode. If the cache is already in coherent mode in all nodes then this method just returns immediately.

Test :

To test the performance of incoherent mode compared to coherent mode, we wrote a simple test using Pet Clinic Data Model but we short circuited the database access just to remove the database bottleneck.

The test loads 300,000 owners with each owner having many pets of various types. It ends up loading about 3,000,000 entries across 7 different caches.

The tests were all run in both coherent mode and incoherent mode on similar 8GB, 16 core Linux boxes connected to same Gigabit ethernet backbone.

Test Results :

Number of Cache Nodes

Number of Threads / Cache Node

Cache Entries to load

Coherent Mode – TPS / Time to Load

Incoherent Mode – TPS / Time to Load

Improvement over coherent mode

1

1

3 Million

943 tps – 318.2 sec

3371 tps – 89.4 sec

3.6X Faster

1

8

3 Million

3093 tps – 97.6 sec

7895 tps – 38.6 sec

2.6X Faster

6

1

3 Million

1251 tps – 240 sec

14405 tps – 21.6 sec

11.5X Faster

6

8

3 Million

1416 tps – 211.9 sec

16245 tps – 18.5 sec

11.5X Faster


Conclusion :

As you can see from the above table, in incoherent mode the cache is faster by a factor of 3X to 12X. The benefits are increasingly huge as you increase the number of nodes in the cluster. In scenarios where one can compromise on cache coherency, users can now take advantage of this new incoherent mode to reap huge performance gains with the latest release of Terracotta 3.2.1 and Ehcache 2.0

4 comments:

  1. If I have say 10 elements in a cache each having a list of say 100 objects, will my total size of cache be 10 or the total of all the objects in cache(part of list).
    when will the maxElementsinMemory reach its threshold, when the number of keys (elements) reach that count or when the toal objects in all reach the max element count.

    ReplyDelete
  2. Size of the cache would be equal to the number of elements in the cache. So in your example, the size would be 10. maxElementsInMemory also works against the number of keys/elements in that tier. You can also size tiers using bytes. For more details check out http://ehcache.org/documentation/configuration/cache-size

    ReplyDelete
  3. Can we enable/disable bulk mode via ehCache.xml file rather than using api (or programmatically)?

    ReplyDelete
  4. Bulk mode is a transient state of the cache used only when loading data. Hence it has to be done programatically.

    ReplyDelete