Interesting Post on Handling Large Data Volumes

Over on the HighScalability blog, there is an interesting post on how Sify.com scales its web site to 3900 requests per second on just 30 VMs (across 4 physical machines). In the Future section of the article, the notion of using Drools for cache invalidation really grabbed my attention. Drools is a rules engine that implements the Rete algorithm to resolve rules. The Rete algorithm trades memory for speed: it caches partial matches so that rules can be evaluated quickly. Rules engines that support forward chaining and inference will normally implement Rete in some form. BizTalk (and, I would assume, Windows Workflow Foundation) also uses Rete.

The rules-engine angle is what hooked me. One of the problems with cache invalidation is that the easy stuff to cache is just that: easy. No thought is required to cache the front page of your web site. But if your website is “addictive” in any fashion (think Facebook, MySpace, Fidelity.com, Digg, etc.), the personalized data that each user gets is cacheable too. When you look at overall traffic patterns, that data is light on writes and heavy on reads. Individual pieces of data may appear on many pages in the application. When that data changes, you want to invalidate any cached values that use that information. Figuring out and maintaining a list of all the places that consume and cache something like a friend’s status is tough, especially if the goal is to do so in a centralized fashion. However, if I can add rules that state “I watch Scott’s status. If that changes, invalidate this cache location.” then I can make an interesting system.
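To make the “I watch Scott’s status” rule concrete, here is a minimal sketch in Python. It is not Drools and it is not Rete; it is just a lookup table from a watched fact to the cache locations that depend on it. The class name, the cache keys, and the `("status", "scott")` fact format are all made up for illustration.

```python
from collections import defaultdict


class InvalidationRules:
    """Hypothetical registry: watched fact -> cache keys that depend on it."""

    def __init__(self, cache):
        self.cache = cache                  # any dict-like cache
        self._watchers = defaultdict(set)   # fact -> set of cache keys

    def watch(self, fact, cache_key):
        # "I watch this fact. If it changes, invalidate this cache location."
        self._watchers[fact].add(cache_key)

    def fact_changed(self, fact):
        # Evict every cache location registered against the changed fact.
        evicted = []
        for key in sorted(self._watchers.get(fact, ())):
            if self.cache.pop(key, None) is not None:
                evicted.append(key)
        return evicted


# Illustrative cache contents; the keys and values are invented.
cache = {
    "page:home:alice": "<html>Scott is at lunch</html>",
    "page:friends:bob": "<html>Scott is at lunch</html>",
    "page:home:carol": "<html>No friends online</html>",
}

rules = InvalidationRules(cache)
rules.watch(("status", "scott"), "page:home:alice")
rules.watch(("status", "scott"), "page:friends:bob")

# Scott updates his status: only the two dependent pages are evicted.
evicted = rules.fact_changed(("status", "scott"))
```

The appeal of a real rules engine over a table like this is that the conditions can be richer than an exact match on one fact, and the engine works out which rules fire.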

I’ve been in a number of .NET shops that seem to stay away from Workflow Foundation. I wonder if products like Windows Server AppFabric and its cache server might finally get folks to look at using Windows Workflow for the rules engine. At the moment, this seems like an idea worth pursuing, just to see how it works out in the end. I also wonder if one could use the rules to do in-place updates to the cache, so that instead of an invalidation, we get a newly valid copy.
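The in-place update idea might look something like the following sketch, again in plain Python rather than Workflow Foundation or AppFabric. Each rule carries a rebuild function, so a change to a watched fact overwrites the cached value instead of evicting it. The names here (`RefreshRules`, the cache key, the status strings) are hypothetical.

```python
class RefreshRules:
    """Hypothetical registry: watched fact -> (cache key, rebuild function)."""

    def __init__(self, cache):
        self.cache = cache
        self._rules = {}  # fact -> list of (cache_key, rebuild)

    def watch(self, fact, cache_key, rebuild):
        # rebuild(new_value) must return the fresh value for cache_key.
        self._rules.setdefault(fact, []).append((cache_key, rebuild))

    def fact_changed(self, fact, new_value):
        # Overwrite each dependent entry with a newly valid copy.
        refreshed = []
        for cache_key, rebuild in self._rules.get(fact, []):
            self.cache[cache_key] = rebuild(new_value)
            refreshed.append(cache_key)
        return refreshed


cache = {"page:home:alice": "Scott: at lunch"}
rules = RefreshRules(cache)
rules.watch(
    ("status", "scott"),
    "page:home:alice",
    lambda status: f"Scott: {status}",
)

# The cached page is rewritten rather than dropped.
refreshed = rules.fact_changed(("status", "scott"), "back at my desk")
```

The trade-off is that the write path now pays the cost of rebuilding every dependent entry, whether or not anyone reads it again.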

As of now, this idea is up on my white board as something to dig into after I get some other work done. If you hit this idea sooner, please let me know your results (scott@scottseely.com)!

This entry was posted on May 14, 2010, 8:13 am.
