Interesting Post on Handling Large Data Volumes

Over on the HighScalability blog, there is an interesting post on how Sify.com handles scaling the web site to 3900 requests per second on just 30 VMs (across 4 physical machines). In the Future section of the article, the notion of using Drools for cache invalidation really grabbed my attention. Drools is a rules engine that implements the Rete algorithm to resolve rules. The Rete algorithm emphasizes speed of evaluation over memory consumption. Rules engines that support forward chaining and inference will normally implement Rete in some form. BizTalk (and I would assume Windows Workflow Foundation) also use Rete.

It was the notion of using a rules engine that really grabbed my attention. One of the problems with cache invalidation is that the easy stuff to cache is just that, easy. No thought is required to cache the front page of your web site. But, if your website is “addictive” in any fashion (think Facebook, MySpace, Fidelity.com, Digg, etc.), the personalized data that each user gets is cacheable too. When looking at overall traffic patterns, the data is light on writes and heavy on reads. Individual pieces of data may appear on many pages in the application. When that data changes, you want to invalidate any cached values that use that information. Figuring out and maintaining a list of all the places that consume and cache friend status is tough, especially if the goal is to do so in a centralized fashion. However, if I can add rules that state “I watch Scott’s status. If that changes, invalidate this cache location,” then I can build an interesting system.
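
To make the idea concrete, here is a rough sketch in plain C# (no Drools or Workflow involved) of what such a dependency rule might look like. The ICacheClient interface and the fact/key names are invented for illustration; a real system would sit in front of whatever distributed cache it uses.

using System.Collections.Generic;
using System.Linq;

// Hypothetical cache abstraction; stands in for whatever distributed
// cache (AppFabric, memcached, etc.) the site actually uses.
public interface ICacheClient
{
    void Invalidate(string key);
}

// A rule ties a watched fact (e.g. "user:scott:status") to the cache
// entries that depend on it (e.g. "page:home:chris").
public class InvalidationRule
{
    public string WatchedFact { get; set; }
    public List<string> DependentCacheKeys { get; set; }
}

public class RuleBasedInvalidator
{
    private readonly List<InvalidationRule> _rules = new List<InvalidationRule>();
    private readonly ICacheClient _cache;

    public RuleBasedInvalidator(ICacheClient cache)
    {
        _cache = cache;
    }

    public void AddRule(InvalidationRule rule)
    {
        _rules.Add(rule);
    }

    // Call this whenever a watched piece of data changes; every cache
    // entry whose rule watches that fact gets invalidated.
    public void OnFactChanged(string fact)
    {
        var keys = _rules
            .Where(r => r.WatchedFact == fact)
            .SelectMany(r => r.DependentCacheKeys);
        foreach (var key in keys)
        {
            _cache.Invalidate(key);
        }
    }
}

A rules engine earns its keep over this sort of hand-rolled lookup when the rules start to chain (a status change invalidates a page, which in turn invalidates an aggregate), which is exactly the kind of forward chaining and inference that Rete-based engines are built for.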

I’ve been in a number of .NET shops that seem to stay away from Workflow Foundation. I wonder if products like Windows Server AppFabric and the cache server might finally get folks to look at using Windows Workflow for the rules engine. At the moment, this seems like an idea worth pursuing, just to see how it works out in the end. I also wonder if one could use the rules to do in-place updates to the cache, so that instead of invalidation, we get a newly valid copy.

As of now, this idea is up on my white board as something to dig into after I get some other work done. If you hit this idea sooner, please let me know your results (scott@scottseely.com)!


Notes from Software Engineering Talk

I gave a talk at Milwaukee Area Technical College, where my friend, Chuck Andersen, teaches a software engineering class. I promised the students that I would put up some interview study resources. This is the set of things I do to prepare for more in-depth interviews so that I can clear the algorithm questions when folks do a technical screen. I really hate the idea of being passed over because I haven’t thought about some undergrad algorithms in a few years, so I make a point of getting these things back into recent memory.

My study resources are:

Programming Pearls by Jon Bentley: 256 pages of good review material

The Algorithm Design Manual by Steven Skiena: Amazon has the wrong page count on this one: 486 pages of great review material. Get the Kindle version; the print edition appears to be out of print and priced accordingly. I know I didn’t pay $200+ for this book.

Project Euler: Go through 1-2 of these per week, just to stay in shape.
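
If you have never looked at Project Euler, the early problems are small enough to knock out over a coffee break. For example, the first problem asks for the sum of the natural numbers below 1000 that are divisible by 3 or 5, which takes only a few lines of C#:

using System;
using System.Linq;

class Euler1
{
  static void Main()
  {
    // Project Euler problem 1: sum the multiples of 3 or 5 below 1000.
    var sum = Enumerable.Range(1, 999)
      .Where(n => n % 3 == 0 || n % 5 == 0)
      .Sum();
    Console.WriteLine(sum); // 233168
  }
}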


Friseton, LLC is Open for Business

My last day as someone’s employee was Friday, May 7. As of today, I have completely jumped into the world of the self-employed. My wife and I started a company named Friseton, LLC (yes, I married a developer!). What does Friseton, LLC (which is really just me and my wife) do? Well, I’m glad you asked.

We consult on distributed application architecture and development. I have personally worked on architecture for everything from small applications running on a few computers to systems with thousands of cooperating computers. I have worked on architecture for traditional enterprise applications as well as for one of the five most popular web sites on the planet (circa 2008/9).

We’ve also invested a lot of time into understanding and developing on Azure, Silverlight, and Windows Phone 7. As the firm grows beyond the first two founders, we expect to also invest time into releasing applications on Azure and Windows Phone.

If you are interested in discussing an opportunity, please feel free to contact me: scott.seely@friseton.com.


Custom ChannelFactory Creation

Just the other day, Derik Whitaker ran into some issues setting up his ChannelFactory to handle large object graphs being returned to his clients (post is here). After some back and forth through email, we came up with a solution. Instead of using the default ChannelFactory<T>, we created a new class that inherits from ChannelFactory<T> and sets the DataContractSerializerOperationBehavior to allow up to int.MaxValue objects in the graph.

The trick is to override the ChannelFactory<T>.OnOpening method. This method is called as the ChannelFactory is opened and allows a derived class to alter the behavior at the last minute. All OperationDescriptions have a DataContractSerializerOperationBehavior attached to them. What we want to do is pull out that behavior and set the MaxItemsInObjectGraph property to int.MaxValue so that all content can be serialized. Derik’s use case was valid: he owned both the client and the server and was willing to incur any penalty associated with reading ALL of the data. If you are in a similar situation and need to remove that safety net/throttle in your code, here is what you need. Note that the constructors aren’t interesting other than that they preserve the signatures made available through ChannelFactory<T> and make them visible in my DerikChannelFactory<T>.


public class DerikChannelFactory<T> : ChannelFactory<T>
{
    public DerikChannelFactory(Binding binding) :
        base(binding) { }

    public DerikChannelFactory(ServiceEndpoint endpoint) :
        base(endpoint) { }

    public DerikChannelFactory(string endpointConfigurationName) :
        base(endpointConfigurationName) { }

    public DerikChannelFactory(Binding binding, EndpointAddress remoteAddress) :
        base(binding, remoteAddress) { }

    public DerikChannelFactory(Binding binding, string remoteAddress) :
        base(binding, remoteAddress) { }

    public DerikChannelFactory(string endpointConfigurationName,
        EndpointAddress remoteAddress) :
        base(endpointConfigurationName, remoteAddress) { }

    protected override void OnOpening()
    {
        foreach (var operation in Endpoint.Contract.Operations)
        {
            var behavior =
                operation.Behaviors.
                    Find<DataContractSerializerOperationBehavior>();
            if (behavior != null)
            {
                behavior.MaxItemsInObjectGraph = int.MaxValue;
            }
        }
        base.OnOpening();
    }
}


The OnOpening override is also a good place to inject behaviors or other items if you want to make sure that all ChannelFactory instances have the same setup without resorting to configuration or code for each instance.
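
For completeness, here is a minimal usage sketch. The IMyService contract, its GetEverything operation, and the net.tcp address are placeholders; substitute whatever contract and endpoint your client actually talks to.

// IMyService and GetEverything are hypothetical stand-ins for the real contract.
var factory = new DerikChannelFactory<IMyService>(
    new NetTcpBinding(),
    new EndpointAddress("net.tcp://localhost:8000/MyService"));
var client = factory.CreateChannel();

// Channels created by this factory can deserialize object graphs containing
// up to int.MaxValue items because OnOpening adjusted the operation behaviors.
var everything = client.GetEverything();
factory.Close();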


Move to WCF 4.0 for Less Configuration/Code

People have lots of complaints about WCF. For the 3.x codebase, many don’t like the amount of configuration or code one has to write in order to get a service up and running. For example, let’s assume that we have a simple service contract, IEchoService.

[ServiceContract(Namespace="http://www.friseton.com/Echo")]
interface IEchoService
{
  [OperationContract]
  string Echo(string value);
}

The contract is implemented by EchoService:

class EchoService : IEchoService
{
  public string Echo(string value)
  {
    return value;
  }
}

In .NET 3.x, we would then have to set up some endpoints, each endpoint specific to the protocol we wanted it to speak. We had to remember things like “URLs that begin with net.tcp use the NetTcpBinding.” For intranet and local machine communication, this is a pain in the butt. In .NET 4.0, the common case of taking the defaults is much easier. If you plan on listening at the base URL(s) for the service, a console application can look like this:


var netTcp = new Uri(string.Format("net.tcp://{0}/EchoService",
  Environment.MachineName));
var netPipe = new Uri(string.Format("net.pipe://{0}/EchoService",
  Environment.MachineName));
using (var host = new ServiceHost(typeof(EchoService), netTcp, netPipe))
{
  host.Open();
  Console.WriteLine("Press [Enter] to exit.");
  Console.ReadLine();
}

You could also configure the base URIs if you wanted this all to be dynamic. This mechanism only works if you don’t explicitly add any endpoints. Choosing to add any endpoint (discovery, metadata, or a specific contract) WILL mean you have to specify everything yourself. The implicit behavior exposes an endpoint for every contract the service implements, so a service that implements two or more contracts will listen for all of them when you rely on the implicit listeners.
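
If you find yourself needing one explicit endpoint (say, metadata) but still wanting the convenience of the defaults, a rough sketch of one way to get both is to ask for the defaults back with ServiceHost.AddDefaultEndpoints(), new in .NET 4.0. This reuses the EchoService defined above:

var baseAddress = new Uri(string.Format("net.tcp://{0}/EchoService",
  Environment.MachineName));
using (var host = new ServiceHost(typeof(EchoService), baseAddress))
{
  // Adding an explicit metadata endpoint suppresses the implicit endpoints...
  host.Description.Behaviors.Add(new ServiceMetadataBehavior());
  host.AddServiceEndpoint(typeof(IMetadataExchange),
    MetadataExchangeBindings.CreateMexTcpBinding(), "mex");

  // ...so ask the host to put the default endpoints back.
  host.AddDefaultEndpoints();

  host.Open();
  Console.WriteLine("Press [Enter] to exit.");
  Console.ReadLine();
}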


More with Discovery, Day 4

Previously, we looked at configuring discovery on the server. What about the client? To discover from the client, we use a class named DiscoveryClient. DiscoveryClient implements the WS-Discovery protocol. Discovery is typically done over UDP because UDP allows an endpoint to broadcast a message.

The client uses a FindCriteria instance. In our case, we will ask discovery to give us the metadata exchange endpoints that have definitions for ITest. Upon finding one of these, or timing out, we resolve the metadata exchange endpoint and ask for information about the service. If at least one endpoint is found (which it should be, though it may disappear between the first request and this one), we extract the ITest information and create an ITest ChannelFactory using the discovered binding and address. Sample code looks like this:

// Create a client to find ITest instance. Return as soon as
// 1 is found.
var discoveryClient = new DiscoveryClient(new UdpDiscoveryEndpoint());
var criteria = FindCriteria.
  CreateMetadataExchangeEndpointCriteria(typeof (ITest));
criteria.MaxResults = 1;
var findResponse = discoveryClient.Find(criteria);
discoveryClient.Close();
if (findResponse.Endpoints.Count > 0)
{
  // Resolve the metadata for the first address.
  // Return the binding and address information.
  var endpoints = MetadataResolver.Resolve(typeof (ITest),
    findResponse.Endpoints[0].Address);
  if (endpoints.Count > 0)
  {
    // Create a factory based on the binding and address information
    // we received from the metadata endpoint.
    var factory = new ChannelFactory<ITest>(endpoints[0].Binding,
      endpoints[0].Address);
    var channel = factory.CreateChannel();

    // Call the add function
    Console.WriteLine(channel.Add(3, 4));
    factory.Close();
  }
}

The above code will fail if authentication credentials other than Windows or anonymous are required. But if you use standard Windows authentication on the service (or none at all), this works well. Discovery is well suited to intranet scenarios because things like Windows identities and authentication are already in use.


More with Discovery, Day 3

By now, you might be wondering where a person would actually use discovery. A common case would be allowing two processes on the same machine to find each other and allow for dynamic naming of all endpoints (such as using GUIDs in the URLs). This could be used by every Windows Service that has an application running in the system tray. In the enterprise, you would use discovery as one part of a publish and subscribe system. The subscribers would query for all endpoints that publish some kind of information (which would allow the subscribers to poll). Alternatively, a publisher could periodically ask for all entities on the network that were interested in a given topic and push to those endpoints. Likewise, a client could look for a service that implemented some other functionality and dynamically configure itself (instead of needing a priori knowledge about how infrastructure is deployed).

To make a service discoverable on the server, you add a ServiceDiscoveryBehavior to the service. This behavior, when combined with a UdpDiscoveryEndpoint, allows the service to be found over a broadcast message sent on the network. If you want clients to be able to automatically configure themselves, you need to add an IMetadataExchange endpoint as well. The IMetadataExchange endpoint allows the service to send information about the contracts in use, the bindings against those contracts, and address information on where the service is listening for messages.

The following code constructs a ServiceHost for some service, TestService, that implements a contract named ITest.

var baseUri = string.Format("net.tcp://{0}/Test", Environment.MachineName);
using (var host = new ServiceHost(typeof(TestService), new Uri(baseUri)))
{
  // Make the service discoverable and make sure it has an
  // endpoint to announce its presence.
  var discoveryBehavior = new ServiceDiscoveryBehavior();
  discoveryBehavior.AnnouncementEndpoints.Add(
    new UdpAnnouncementEndpoint());
  host.Description.Behaviors.Add(discoveryBehavior);

  // Make sure the service can respond to probes.
  host.AddServiceEndpoint(new UdpDiscoveryEndpoint());

  // Add the ability to handle Metadata requests (aka WSDL)
  host.Description.Behaviors.Add(new ServiceMetadataBehavior());
  host.AddServiceEndpoint(typeof(IMetadataExchange),
    MetadataExchangeBindings.CreateMexTcpBinding(), "/mex");

  // Tell the service to listen for ITest messages on net.tcp
  host.AddServiceEndpoint(typeof (ITest), new NetTcpBinding(),
    string.Empty);

  // Open the host and start listening.
  host.Open();

  // Display what we are listening for
  foreach (var endpoint in host.Description.Endpoints)
  {
    Console.WriteLine("{0}: {1}",
      endpoint.Contract.Name,
      endpoint.ListenUri);
  }

  // Wait until we are done.
  Console.WriteLine("Press [Enter] to exit");
  Console.ReadLine();
}


With this code, we have a service that is discoverable and callable without requiring a client to have any code or configuration that is specific to our service (though it may if the developer chooses).


Limiting Time for Discovery, Part II

Previously, I had thought I found a way to speed up discovery by limiting the search time. It turns out that there are better ways to achieve the same goal. When limiting the duration, you are trying to find all endpoints implementing a given contract within a specific period of time. However, you may have other criteria; for example, you may just want the first endpoint that implements the contract. As soon as you find that endpoint, you are happy. This is pretty simple to do as well: on your FindCriteria object, tell it to stop when it finds one instance OR times out. Instead of setting the duration, set the MaxResults property to 1:

FindCriteria criteria = new FindCriteria(typeof(ITest));
criteria.MaxResults = 1;

That’s all there is to it!


First Experiences with WS-Discovery

I was digging into WCF’s implementation of WS-Discovery today and was somewhat appalled by how long it took to discover a service from a client when both endpoints lived on the same machine. I set up tracing and message logging to dig into why things were taking so long. Inside the messages, I found this nugget in the WS-Discovery probe messages:

<s:Body>
  <Probe xmlns="http://docs.oasis-open.org/ws-dd/ns/discovery/2009/01">
    <d:Types 
      xmlns:d="http://docs.oasis-open.org/ws-dd/ns/discovery/2009/01" 
      xmlns:dp0="http://tempuri.org/">dp0:ITest</d:Types>
    <Duration 
       xmlns="http://schemas.microsoft.com/ws/2008/06/discovery">
       PT20S</Duration>
  </Probe>
</s:Body>


I thought to myself, “That looks like a TimeSpan. I wonder how I can set it.” If you haven’t used WS-Discovery on the client in WCF, let me walk you through the few basic lines of code that get things set up. You need a DiscoveryClient, which knows how to send the probe messages and extract endpoints from the results. When the DiscoveryClient sends out a request, it asks for services that implement a specific type. In our case, we are looking for a type in the http://tempuri.org/ XML namespace where the type is named ITest. You state what you are looking for using a FindCriteria object. The code looks like this:

var discoveryClient = new DiscoveryClient(new UdpDiscoveryEndpoint());
var criteria = new FindCriteria(typeof (ITest));
var findResponse = discoveryClient.Find(criteria);

Looking at the data which appeared in the Probe, I thought it looked an awful lot like FindCriteria, since FindCriteria was the only thing I had told about the type I wanted to talk to. I took a quick look at the FindCriteria object via IntelliSense and found a member called Duration. For grins, I set it to 100 milliseconds:

var discoveryClient = new DiscoveryClient(new UdpDiscoveryEndpoint());
var criteria = new FindCriteria(typeof (ITest));
criteria.Duration = TimeSpan.FromMilliseconds(100);
var findResponse = discoveryClient.Find(criteria);

Suddenly, discovery ran faster. I also saw that the probe Duration was now set to PT0.1S.

So, what did I learn? I learned that, by default, Discovery on WCF will allow for 20 seconds to find all endpoints. If you know the endpoint is close and the collection of implementations is small, you can ratchet the discovery time down to something reasonable for your situation.


How Office Automation Saved my Morning

I had procrastinated on mailing out information for the Chicago Give Camp. I wanted to make sure the email went out early in the week and in the morning so that people would be likely to read it. This morning, I was determined to get the mailing out. I started out brute forcing this thing, but the tedium hit me fast. Switching between apps, double-checking that I copied and pasted the right info, and verifying that I didn’t have any screw-ups got to me fast, after about 6 messages. I had another 84 to go.

The email I was sending out was a classic form letter: insert the recipient’s name in one spot, insert my info in a few others, and send. I wanted all the email history to show up in my Outlook ‘Sent Items’ and I wanted the message to look nice (aka HTML formatting). In about 20 minutes, I had the task completed and the email sent. Here is what I did:

1. I saved the form letter as HTML and made sure that the fields to replace were easily identified. I was going to use string.Replace(string, string) to fill in the form fields. I added the HTML file to the solution and told VS to copy the file to the output directory on build. The file isn’t a resource, just an asset that shows up in a well known location.

2. I identified where I needed to stop and start in the spreadsheet. I was on row 8 and needed to go through row 89. I didn’t need a general-purpose solution; I needed something that saved me from mind-numbing tedium, so I hard coded these values.

3. I identified which columns contained the information I needed and ran a quick test to get the values out of the cells from Excel.

4. I tested a couple of times by sending the email to myself instead of to the actual recipient. This was a low-bar unit test that was easy to remove once things appeared to work.

5. I changed the code to send to the actual recipient and, once all the messages went out, marveled at a job well done!

As software developers, we frequently write tools that are meant to be general purpose. Some days, it’s fun to just write a piece of throwaway code that doesn’t solve any grand problems, but does allow you to get a one time task done quickly. Today was one of those days.

Here is the code, in case you are curious. Cut and paste into your own applications at your own risk. This code is not production ready, plus all the other disclaimers that basically mean: run this code in a debugger.

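// Requires references to the Microsoft.Office.Interop.Excel and
// Microsoft.Office.Interop.Outlook interop assemblies, plus using
// directives for System and System.IO.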
static void Main(string[] args)
{
    var excelApp = new Microsoft.Office.Interop.Excel.Application();
    var outlookApp = new Microsoft.Office.Interop.Outlook.Application();
    var spreadsheet = excelApp.Workbooks.Open(
          @"C:\Users\Scott Seely\Downloads\Chicago Charities.xlsx");
    Microsoft.Office.Interop.Excel.Worksheet worksheet = spreadsheet.Worksheets[1];
    string originalEmail = File.ReadAllText("GiveCampLetter.htm")
          .Replace("[Insert your name]", "Scott Seely")
          .Replace("[insert your email]", "xxxx@xxxxx.xxx")
          .Replace("[insert your preferred contact number]", "847-xxx-xxxx");
    for (int i = 8; i < 90; ++i)
    {
        dynamic realnameCell = worksheet.Cells[i, "C"];
        var realname = realnameCell.FormulaLocal;
        dynamic emailCell = worksheet.Cells[i, "F"];
        var email = emailCell.FormulaLocal;
        if (string.IsNullOrEmpty(realname) || string.IsNullOrEmpty(email))
        {
            continue;
        }
        Microsoft.Office.Interop.Outlook.MailItem mail = outlookApp.CreateItem(
          Microsoft.Office.Interop.Outlook.OlItemType.olMailItem);
        mail.Subject = "Midwest Give Camp";
        mail.To = email;
        mail.HTMLBody = originalEmail.Replace("[insert contact name]", realname);
        mail.Send();
        Console.WriteLine("{0}: {1}", realname, email);
    }

    // Close the workbook and quit Excel so we don't leave a stray instance running.
    spreadsheet.Close(false);
    excelApp.Quit();
}
