Scott Seely


Homepage: https://scottseely.wordpress.com

Reading a WebResponse into a byte[]

This question came up on Twitter, so I’m posting the solution here for posterity. How do you read a non-seekable Stream into a byte[]? Specifically, the response stream of an HttpWebResponse? Assuming the server sends a Content-Length header, like this:

 

using System;
using System.Net;
using System.Text;

class Program{
  static void Main(string[] args)
  {
    var request = WebRequest.Create("http://www.scottseely.com/blog.aspx");
    var response = request.GetResponse() as HttpWebResponse;
    var stream = response.GetResponseStream();
    // Assumes the server sent a Content-Length header.
    var buffer = new byte[int.Parse(response.Headers["Content-Length"])];
    var bytesRead = 0;
    var totalBytesRead = bytesRead;
    while(totalBytesRead < buffer.Length)
    {
      // Read into the unfilled tail of the buffer, starting at the running total.
      bytesRead = stream.Read(buffer, totalBytesRead,
        buffer.Length - totalBytesRead);
      if (bytesRead == 0)
      {
        break; // The stream ended before Content-Length bytes arrived.
      }
      totalBytesRead += bytesRead;
    }
    Console.WriteLine(Encoding.UTF8.GetString(buffer, 0, totalBytesRead));
  }
}
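When the server omits Content-Length (a chunked response, for example), the buffer cannot be sized up front. A minimal sketch of one alternative, assuming .NET 4.0 or later for Stream.CopyTo. The StreamHelpers class and ReadFully method are hypothetical names used for illustration, not framework types:

```csharp
using System.IO;

public static class StreamHelpers
{
  // Hypothetical helper: drains any readable stream, seekable or not,
  // into a byte[] without knowing the length in advance.
  public static byte[] ReadFully(Stream input)
  {
    using (var memory = new MemoryStream())
    {
      // CopyTo keeps calling Read until the source returns 0 bytes.
      input.CopyTo(memory);
      return memory.ToArray();
    }
  }
}
```

With this in place, the call site would shrink to something like StreamHelpers.ReadFully(response.GetResponseStream()), at the cost of one extra copy of the data.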


XmlDictionary and Binary Serialization

One of the interesting things that came out of WCF is the improvement in Infoset serialization. In particular, WCF introduced a binary serialization format that reduces the space a serialized object takes up. One of the keys to saving space is the notion of an XmlDictionary. The WCF serialization folks asked these questions:

How much could we reduce the size of a message if we allowed the parties communicating to exchange metadata about the messages?

What if we could reduce the size of messages by exchanging aliases for the XML Infoset node names?

The result of this what-if experiment is the XmlDictionary and XmlBinaryWriterSession. The mechanism is astonishingly simple. Assume that both ends have a mechanism for exchanging information about what to call the two parts of a QName: the namespace and the name of the node. Instead of sending namespace:element qualified items, send aliases. This works well in WCF messaging and happens whenever you send messages over the binary serializer. You can also use this in your own code that uses a binary serializer. The only requirement is that the serializer and deserializer agree on the makeup of the XmlDictionary. Let’s start by looking at some code that does plain old binary serialization.

We start with an object:

[DataContract(Namespace = "http://www.friseton.com/Name/2010/06")]
public class Person
{
  [DataMember]
  public string FirstName { get; set; }

  [DataMember]
  public string LastName { get; set; }

  [DataMember]
  public DateTime Birthday { get; set; }
}

I then have a ‘driver’ program:

 

static void Main(string[] args)
{
  var person = new Person
               {
                 FirstName = "Scott",
                 LastName = "Seely",
                 Birthday = new DateTime(1900, 4, 5)
               };
  var serializer = new DataContractSerializer(typeof (Person));
  Console.WriteLine("Serialize Binary: {0} bytes",
    SerializeBinary(person, serializer).Length);
  Console.WriteLine("Serialize Binary with Dictionary: {0} bytes",
    SerializeBinaryWithDictionary(person, serializer).Length);
}

The application emits the size of the stream each time the object is written out. The first method, SerializeBinary, does not use a dictionary. As a result, it has no access to the aliases and must instead write out every node name and namespace in full.

private static Stream SerializeBinary(Person person,
  DataContractSerializer serializer)
{
  var stream = new MemoryStream();
  var writer = XmlDictionaryWriter.CreateBinaryWriter(stream);
  serializer.WriteObject(writer, person);
  writer.Flush();
  return stream;
}

In this case, we get a stream which contains 146 bytes. That’s pretty poor considering that we are interested in 10 characters (28 bytes: each string has a 4 byte length and then 2 bytes/character) and a simple DateTime representation (4 bytes). Can we make this smaller? How close can we get to 32 bytes? The answer: really close!

The version of SerializeBinaryWithDictionary that I wrote is verbose: it contains a number of lines that show what is going on internally. Your own code may be as long, but would emit those lines as debug output. Please note that you need to include a reference to the XMLSchema-instance namespace in your dictionary so that both the reader and writer agree on the value of this attribute.

private static Stream SerializeBinaryWithDictionary(Person person,
  DataContractSerializer serializer)
{
  var stream = new MemoryStream();
  var dictionary = new XmlDictionary();
  var session = new XmlBinaryWriterSession();
  var key = 0;
  session.TryAdd(dictionary.Add("FirstName"), out key);
  Console.WriteLine("Added FirstName with key: {0}", key);
  session.TryAdd(dictionary.Add("LastName"), out key);
  Console.WriteLine("Added LastName with key: {0}", key);
  session.TryAdd(dictionary.Add("Birthday"), out key);
  Console.WriteLine("Added Birthday with key: {0}", key);
  session.TryAdd(dictionary.Add("Person"), out key);
  Console.WriteLine("Added Person with key: {0}", key);
  session.TryAdd(dictionary.Add("http://www.friseton.com/Name/2010/06"),
    out key);
  Console.WriteLine("Added xmlns with key: {0}", key);
  session.TryAdd(dictionary.Add("http://www.w3.org/2001/XMLSchema-instance"),
    out key);
  Console.WriteLine("Added xmlns for xsi with key: {0}", key);

  var writer = XmlDictionaryWriter.CreateBinaryWriter(
    stream, dictionary, session);
  serializer.WriteObject(writer, person);
  writer.Flush();
  return stream;
}

The size difference is striking: we shave off 108 bytes by using the dictionary. We are getting close to the same size as the memory footprint of the object data! The cool bit: you can use this in your own code. The dictionary needs to be shared between the reader and writer sessions (there is a corresponding XmlBinaryReaderSession which can also be populated from the common dictionary via the deserialization process). For posterity, the output of the program is:

Serialize Binary: 146 bytes

Added FirstName with key: 0

Added LastName with key: 1

Added Birthday with key: 2

Added Person with key: 3

Added xmlns with key: 4

Added xmlns for xsi with key: 5

Serialize Binary with Dictionary: 38 bytes

A slightly different version that shows both reading and writing with a shared understanding of what the dictionary looks like follows:

private static Stream SerializeBinaryWithDictionary(Person person,
  DataContractSerializer serializer)
{
  var strings = new List<XmlDictionaryString>();
  var stream = new MemoryStream();
  var dictionary = new XmlDictionary();
  var session = new XmlBinaryWriterSession();
  var rdr = new XmlBinaryReaderSession();

  var key = 0;
  strings.Add(dictionary.Add("FirstName"));
  strings.Add(dictionary.Add("LastName"));
  strings.Add(dictionary.Add("Birthday"));
  strings.Add(dictionary.Add("Person"));
  strings.Add(dictionary.Add("http://www.friseton.com/Name/2010/06"));
  strings.Add(dictionary.Add("http://www.w3.org/2001/XMLSchema-instance"));

  var writer = XmlDictionaryWriter.CreateBinaryWriter(
    stream, dictionary, session);

  foreach (var val in strings)
  {
    if (session.TryAdd(val, out key))
    {
      rdr.Add(key, val.Value);
    }
  }
  serializer.WriteObject(writer, person);
  writer.Flush();
  stream.Position = 0;
  var reader = XmlDictionaryReader.CreateBinaryReader(stream, dictionary,
    XmlDictionaryReaderQuotas.Max, rdr);
  var per = serializer.ReadObject(reader) as Person;
  return stream;
}

 

Looking at the above, we can also account for the missing 6 bytes in our serialization: the extra 6 bytes are names of the nodes.
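In a real application the writer and the reader usually live in different methods, or different processes. A minimal sketch of that split, assuming both sides share one ordered list of names and that the writer session hands out keys 0, 1, 2, ... in insertion order, as the program output above shows. PersonWire, SharedNames, Write, and Read are hypothetical names, and Person is repeated only to keep the block self-contained:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

[DataContract(Namespace = "http://www.friseton.com/Name/2010/06")]
public class Person
{
  [DataMember] public string FirstName { get; set; }
  [DataMember] public string LastName { get; set; }
  [DataMember] public DateTime Birthday { get; set; }
}

// Hypothetical helper: each side builds its session from the same list.
public static class PersonWire
{
  private static readonly string[] SharedNames =
  {
    "FirstName", "LastName", "Birthday", "Person",
    "http://www.friseton.com/Name/2010/06",
    "http://www.w3.org/2001/XMLSchema-instance"
  };

  public static byte[] Write(Person person)
  {
    var dictionary = new XmlDictionary();
    var session = new XmlBinaryWriterSession();
    foreach (var name in SharedNames)
    {
      int key;
      session.TryAdd(dictionary.Add(name), out key);
    }
    var stream = new MemoryStream();
    var writer = XmlDictionaryWriter.CreateBinaryWriter(
      stream, dictionary, session);
    new DataContractSerializer(typeof(Person)).WriteObject(writer, person);
    writer.Flush();
    return stream.ToArray();
  }

  public static Person Read(byte[] bytes)
  {
    var dictionary = new XmlDictionary();
    var session = new XmlBinaryReaderSession();
    for (var key = 0; key < SharedNames.Length; key++)
    {
      dictionary.Add(SharedNames[key]);
      // Assumption: key i on the writer side maps to SharedNames[i].
      session.Add(key, SharedNames[key]);
    }
    var reader = XmlDictionaryReader.CreateBinaryReader(
      new MemoryStream(bytes), dictionary,
      XmlDictionaryReaderQuotas.Max, session);
    return (Person)new DataContractSerializer(typeof(Person))
      .ReadObject(reader);
  }
}
```

If the two sides ever disagree on the list or its order, the reader will resolve keys to the wrong names or fail outright, so the list is effectively part of the wire contract.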


Speaking at Chicago Architects Group May 18

I’ll be speaking at the Chicago Architects Group on May 18 over at the ITA (next to Union Station in Chicago, at the corner of Adams and Wacker). My topic is Azure for Architects. In this talk, I go over how to look at and use Azure from a software architecture point of view. Unlike most Azure talks, this one has no code in it, just concepts. This isn’t the type of talk I normally give, but given the crowd, architecture and slides will work better than whiz-bang demos.

The slides are here if you want them. I tend to use slides as guideposts when I present. Please don’t look at these slides as notes. 80% of the presentation is in what I say, not in what you can read. I’ll try to record the presentation as well and will put up the recording if the quality is good enough. There are still some seats open. Register at http://chicagoarchitectsgroup.eventbrite.com.


Interesting Post on Handling Large Data Volumes

Over on the HighScalability blog, there is an interesting post on how Sify.com handles scaling the web site to 3900 requests per second on just 30 VMs (across 4 physical machines). In the Future section of the article, the notion of using Drools for cache invalidation really grabbed my attention. Drools is a rules engine that implements the Rete algorithm to resolve rules. The Rete algorithm emphasizes speed of evaluation over memory consumption. Rules engines that support forward chaining and inference will normally implement Rete in some form. BizTalk (and I would assume Windows Workflow Foundation) also use Rete.

It was the notion of using a rules engine that really grabbed my attention. One of the problems with cache invalidation is that the easy stuff to cache is just that, easy. No thought is required to cache the front page of your web site. But, if your website is “addictive” in any fashion (think Facebook, MySpace, Fidelity.com, Digg, etc.), the personalized data that each user gets is cacheable too. When looking at overall traffic patterns, the data is light on writes and heavy on reads. Individual pieces of data may appear on many pages in the application. When that data changes, you want to invalidate any cached values that use that information. Figuring out and maintaining a list of all the places that consume and cache something like a friend’s status is tough, especially if the goal is to do so in a centralized fashion. However, if I can add rules that state “I watch Scott’s status. If that changes, invalidate this cache location.” then I can make an interesting system.
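To make the “I watch Scott’s status” rule concrete, here is a minimal sketch that uses a plain lookup table in place of a real rules engine. It does not implement Rete or use Drools or Workflow Foundation; WatchingCache and its members are hypothetical names for illustration only:

```csharp
using System.Collections.Generic;

// Hypothetical sketch: a tiny "watch" table standing in for a rules engine.
public class WatchingCache
{
  private readonly Dictionary<string, object> cache =
    new Dictionary<string, object>();
  private readonly Dictionary<string, HashSet<string>> watchers =
    new Dictionary<string, HashSet<string>>();

  // Cache a value and record which facts it was built from.
  public void Put(string cacheKey, object value, params string[] watchedFacts)
  {
    cache[cacheKey] = value;
    foreach (var fact in watchedFacts)
    {
      HashSet<string> keys;
      if (!watchers.TryGetValue(fact, out keys))
      {
        keys = new HashSet<string>();
        watchers[fact] = keys;
      }
      keys.Add(cacheKey);
    }
  }

  public bool TryGet(string cacheKey, out object value)
  {
    return cache.TryGetValue(cacheKey, out value);
  }

  // A fact changed: evict every cache entry that watches it and
  // return how many entries were actually removed.
  public int FactChanged(string fact)
  {
    HashSet<string> keys;
    if (!watchers.TryGetValue(fact, out keys))
    {
      return 0;
    }
    var evicted = 0;
    foreach (var key in keys)
    {
      if (cache.Remove(key))
      {
        evicted++;
      }
    }
    watchers.Remove(fact);
    return evicted;
  }
}
```

A page cached with Put("home:alice", page, "status:scott") disappears as soon as FactChanged("status:scott") runs. A rules engine would add what this table lacks: conditions richer than equality on a single fact name.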

I’ve been in a number of .NET shops that seem to stay away from Workflow Foundation. I wonder if products like Windows Server AppFabric and the cache server might finally get folks to look at using Windows Workflow for the rules engine. At the moment, this seems like an idea worth pursuing, just to see how it works out in the end. I also wonder if one could use the rules to do in place updates to the cache, so that instead of invalidation, we get a newly valid copy.

As of now, this idea is up on my white board as something to dig into after I get some other work done. If you hit this idea sooner, please let me know your results (scott@scottseely.com)!


Notes from Software Engineering Talk

I gave a talk at Milwaukee Area Technical College where my friend, Chuck Andersen, teaches a software engineering class. I promised the students I would put up some interview study resources. This is the set of things I do to prepare for more in-depth interviews so that I clear the algorithm questions when folks do a technical screen. I really hate the idea of being passed over because I haven’t thought about some undergrad algorithms in a few years, so I get these things back into the more recent memory parts of my brain.

My study resources are:

Programming Pearls by Jon Bentley: 256 pages of good review material

The Algorithm Design Manual by Steven Skiena: Amazon has the wrong page count on this one: 486 pages of great review material. Get the Kindle version; the print edition appears to be out of print and is expensive otherwise. I know I didn’t pay $200+ for this book.

Project Euler: Go through 1-2 of these per week, just to stay in shape.


Friseton, LLC is Open for Business

My last day as someone’s employee was Friday, May 7. As of today, I have completely jumped into the world of the self-employed. My wife and I started a company named Friseton, LLC (yes, I married a developer!). What does Friseton, LLC (which is really just me and my wife) do? Well, I’m glad you asked.

We consult on distributed application architecture and development. I personally have worked on architecture for small applications with only a few computers to systems with thousands of cooperating computers. I have worked on architecture in both traditional enterprise applications as well as for one of the five most popular web sites on the planet (circa 2008/9).

We’ve also invested a lot of time into understanding and developing on Azure, Silverlight, and Windows Phone 7. As the firm grows beyond the first two founders, we expect to also invest time into releasing applications on Azure and Windows Phone.

If you are interested in discussing an opportunity, please feel free to contact me: scott.seely@friseton.com.


Custom ChannelFactory Creation

Just the other day, Derik Whitaker ran into some issues setting up his ChannelFactory to handle large object graphs being returned to his clients (post is here). After some back and forth through email, we came up with a solution. Instead of using the default ChannelFactory<T>, we created a new class that inherits from ChannelFactory<T> and sets the DataContractSerializerOperationBehavior to allow up to int.MaxValue objects in the graph.

The trick is to override the ChannelFactory<T>.OnOpening method. This method is called as the ChannelFactory is opened and allows a derived class to alter the behavior at the last minute. All OperationDescriptions have a DataContractSerializerOperationBehavior attached to them. What we want to do is pull out that behavior and set the MaxItemsInObjectGraph property to int.MaxValue so that it allows all content to be serialized in. Derik’s use case was valid: he owned the client and server and was willing to incur any penalty associated with reading ALL data. If you are in a similar situation and need to remove that safety net/throttle in your code, here is what you need. Note that the constructors aren’t interesting other than that they preserve the signatures made available through ChannelFactory<T> and make them visible in my DerikChannelFactory<T>.

 

public class DerikChannelFactory<T> : ChannelFactory<T>
{
    public DerikChannelFactory(Binding binding) :
        base(binding) { }

    public DerikChannelFactory(ServiceEndpoint endpoint) :
        base(endpoint) { }

    public DerikChannelFactory(string endpointConfigurationName) :
        base(endpointConfigurationName) { }

    public DerikChannelFactory(Binding binding, EndpointAddress remoteAddress) :
        base(binding, remoteAddress) { }

    public DerikChannelFactory(Binding binding, string remoteAddress) :
        base(binding, remoteAddress) { }

    public DerikChannelFactory(string endpointConfigurationName,
        EndpointAddress remoteAddress) :
        base(endpointConfigurationName, remoteAddress) { }

    protected override void OnOpening()
    {
        foreach (var operation in Endpoint.Contract.Operations)
        {
            var behavior =
                operation.Behaviors.
                    Find<DataContractSerializerOperationBehavior>();
            if (behavior != null)
            {
                behavior.MaxItemsInObjectGraph = int.MaxValue;
            }
        }
        base.OnOpening();
    }
}

 

The OnOpening override is also a good place to inject behaviors or other items if you want to make sure that all ChannelFactory instances have the same setup without resorting to configuration or code for each instance.
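If subclassing is not an option, the same loop can be applied to an existing factory before it is opened. A minimal sketch under that assumption; ChannelFactoryTuning, AllowLargeGraphs, and ISampleContract are hypothetical names used only to make the block self-contained:

```csharp
using System.ServiceModel;
using System.ServiceModel.Description;

// Hypothetical contract, present only so the example compiles on its own.
[ServiceContract]
public interface ISampleContract
{
  [OperationContract]
  string Echo(string value);
}

public static class ChannelFactoryTuning
{
  // Hypothetical helper: call before factory.Open() or the first
  // CreateChannel(), while the description can still be changed.
  public static void AllowLargeGraphs(ChannelFactory factory)
  {
    foreach (var operation in factory.Endpoint.Contract.Operations)
    {
      var behavior =
        operation.Behaviors.
          Find<DataContractSerializerOperationBehavior>();
      if (behavior != null)
      {
        behavior.MaxItemsInObjectGraph = int.MaxValue;
      }
    }
  }
}
```

The trade-off is that every call site has to remember to invoke the helper, which is exactly what the derived class avoids.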


Move to WCF 4.0 for Less Configuration/Code

People have lots of complaints around WCF. For the 3.x codebase, many don’t like the amount of configuration or code one has to write in order to get a service up and running. For example, let’s assume that we have a simple service contract, IEchoService.

[ServiceContract(Namespace="http://www.friseton.com/Echo")]
interface IEchoService
{
  [OperationContract]
  string Echo(string value);
}

The contract is implemented by EchoService:

class EchoService : IEchoService
{
  public string Echo(string value)
  {
    return value;
  }
}

In .NET 3.x, we would then have to set up some endpoints, each endpoint specific to the protocol we wanted to understand. We had to remember things like “URLs that begin with net.tcp use the NetTcpBinding.” For intranet and local machine communication, this is a pain in the butt. In .NET 4.0, the common case of taking the defaults is much easier. If you plan on listening at the base URL(s) for the service, a console application can look like this:

(code only)

var netTcp = new Uri(string.Format("net.tcp://{0}/EchoService",
  Environment.MachineName));
var netPipe = new Uri(string.Format("net.pipe://{0}/EchoService",
  Environment.MachineName));
using (var host = new ServiceHost(typeof(EchoService), netTcp, netPipe))
{
  host.Open();
  Console.WriteLine("Press [Enter] to exit.");
  Console.ReadLine();
}

You could also configure the base URIs if you wanted this all to be dynamic. This mechanism only works if you don’t explicitly add any endpoints. Choosing to add any endpoint (discovery, metadata, or a specific contract) WILL mean you have to specify everything. The implicit behavior exposes every contract at each base address, so a service that implements 2 or more contracts will listen for all of them when you use implicit listeners.
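One way to keep the implicit endpoints while still adding an explicit one appears to be calling ServiceHost.AddDefaultEndpoints() yourself before adding anything else. A minimal sketch, assuming .NET 4.0; the EchoHost helper is a hypothetical name, and the contract and service are repeated only so the block stands alone:

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Description;

[ServiceContract(Namespace = "http://www.friseton.com/Echo")]
public interface IEchoService
{
  [OperationContract]
  string Echo(string value);
}

public class EchoService : IEchoService
{
  public string Echo(string value)
  {
    return value;
  }
}

public static class EchoHost
{
  // Hypothetical helper: builds (but does not open) a host that has the
  // default endpoints plus one explicitly added metadata endpoint.
  public static ServiceHost Create()
  {
    var netTcp = new Uri(string.Format("net.tcp://{0}/EchoService",
      Environment.MachineName));
    var host = new ServiceHost(typeof(EchoService), netTcp);

    // Ask for the implicit endpoints explicitly, before adding extras.
    host.AddDefaultEndpoints();

    // The metadata behavior must be present before the mex endpoint.
    host.Description.Behaviors.Add(new ServiceMetadataBehavior());
    host.AddServiceEndpoint(typeof(IMetadataExchange),
      MetadataExchangeBindings.CreateMexTcpBinding(), "mex");
    return host;
  }
}
```

The caller would still wrap the returned host in a using block and call Open(), as in the console application above.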


More with Discovery, Day 4

Previously, we looked at configuring discovery on the server. What about the client? To discover from the client, we use a class named DiscoveryClient. DiscoveryClient implements the WS-Discovery protocol. Discovery is typically done over UDP because UDP allows for endpoints to broadcast a message.

The client uses a FindCriteria instance. In our case, we will ask for discovery to give us the metadata exchange endpoints that have definitions for ITest. Upon finding 1 of these, or timing out, we will resolve the metadata exchange endpoint and ask for information about the endpoint. If at least one of those is found (which it should be but it may disappear in between the first request and this one), extract the ITest information and create an ITest ChannelFactory using the discovered binding and endpoint. Sample code looks exactly like this:

// Create a client to find ITest instance. Return as soon as
// 1 is found.
var discoveryClient = new DiscoveryClient(new UdpDiscoveryEndpoint());
var criteria = FindCriteria.
  CreateMetadataExchangeEndpointCriteria(typeof (ITest));
criteria.MaxResults = 1;
var findResponse = discoveryClient.Find(criteria);
discoveryClient.Close();
if (findResponse.Endpoints.Count > 0)
{
  // Resolve the metadata for the first address.
  // Return the binding and address information.
  var endpoints = MetadataResolver.Resolve(typeof (ITest),
    findResponse.Endpoints[0].Address);
  if (endpoints.Count > 0)
  {
    // Create a factory based on the binding and address information
    // we received from the metadata endpoint.
    var factory = new ChannelFactory<ITest>(endpoints[0].Binding,
      endpoints[0].Address);
    var channel = factory.CreateChannel();

    // Call the add function
    Console.WriteLine(channel.Add(3, 4));
    factory.Close();
  }
}

The above code will fail if authentication credentials other than Windows or anonymous are required. But if you use standard Windows authentication on the service (or nothing), this works well. Discovery is well suited to intranet scenarios, because things like Windows identities and authentication are already in use.
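When the client already knows which binding the service uses, the metadata round trip can be skipped by probing for the contract itself. A minimal sketch of that variation; DiscoveryProbe, BuildCriteria, and FindAddresses are hypothetical names, and the timeout is an assumption the caller has to choose:

```csharp
using System;
using System.Collections.Generic;
using System.ServiceModel;
using System.ServiceModel.Discovery;

public static class DiscoveryProbe
{
  // Probe for endpoints that implement the contract, giving up after
  // the supplied duration.
  public static FindCriteria BuildCriteria(Type contract, TimeSpan timeout)
  {
    return new FindCriteria(contract) { Duration = timeout };
  }

  // Hypothetical helper: returns whatever addresses answered the probe.
  public static IList<EndpointAddress> FindAddresses(Type contract,
    TimeSpan timeout)
  {
    var client = new DiscoveryClient(new UdpDiscoveryEndpoint());
    try
    {
      var addresses = new List<EndpointAddress>();
      var response = client.Find(BuildCriteria(contract, timeout));
      foreach (var endpoint in response.Endpoints)
      {
        addresses.Add(endpoint.Address);
      }
      return addresses;
    }
    finally
    {
      client.Close();
    }
  }
}
```

A caller could then pass the first address to new ChannelFactory<ITest>(new NetTcpBinding(), address), assuming the service really does listen with NetTcpBinding.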


More with Discovery, Day 3

By now, you might be wondering where a person would actually use discovery. A common case would be allowing two processes on the same machine to find each other and allow for dynamic naming of all endpoints (such as using GUIDs in the URLs). This could be used by every Windows Service that has an application running in the system tray. In the enterprise, you would use discovery as one part of a publish and subscribe system. The subscribers would query for all endpoints that publish some kind of information (which would allow the subscribers to poll). Alternatively, a publisher could periodically ask for all entities on the network that were interested in a given topic and push to those endpoints. Likewise, a client could look for a service that implemented some other functionality and dynamically configure itself (instead of needing a priori knowledge about how infrastructure is deployed).

To make a service discoverable on the server, you add a ServiceDiscoveryBehavior to the service. This behavior, when combined with a UdpDiscoveryEndpoint, allows the service to be found over a broadcast message sent on the network. If you want clients to be able to automatically configure themselves, you need to add an IMetadataExchange endpoint as well. The IMetadataExchange endpoint allows the service to send information about the contracts in use, the bindings against those contracts, and address information on where the service is listening for messages.

The following code constructs a ServiceHost for some service, TestService, that implements a contract named ITest.

var baseUri = string.Format("net.tcp://{0}/Test", Environment.MachineName);
using (var host = new ServiceHost(typeof(TestService), new Uri(baseUri)))
{
  // Make the service discoverable and make sure it has an
  // endpoint to announce its presence.
  var discoveryBehavior = new ServiceDiscoveryBehavior();
  discoveryBehavior.AnnouncementEndpoints.Add(
    new UdpAnnouncementEndpoint());
  host.Description.Behaviors.Add(discoveryBehavior);

  // Make sure the service can respond to probes.
  host.AddServiceEndpoint(new UdpDiscoveryEndpoint());

  // Add the ability to handle Metadata requests (aka WSDL)
  host.Description.Behaviors.Add(new ServiceMetadataBehavior());
  host.AddServiceEndpoint(typeof(IMetadataExchange),
    MetadataExchangeBindings.CreateMexTcpBinding(), "/mex");

  // Tell the service to listen for ITest messages on net.tcp
  host.AddServiceEndpoint(typeof (ITest), new NetTcpBinding(),
    string.Empty);

  // Open the host and start listening.
  host.Open();

  // Display what we are listening for
  foreach (var endpoint in host.Description.Endpoints)
  {
    Console.WriteLine("{0}: {1}",
      endpoint.Contract.Name,
      endpoint.ListenUri);
  }

  // Wait until we are done.
  Console.WriteLine("Press [Enter] to exit");
  Console.ReadLine();
}

 

With this code, we have a service that is discoverable and callable without requiring a client to have any code or configuration that is specific to our service (though it may if the developer chooses).
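The host above also announces itself through a UdpAnnouncementEndpoint. A minimal sketch of the receiving side, assuming the System.ServiceModel.Discovery AnnouncementService class; AnnouncementListener and its callback parameters are hypothetical names for illustration:

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Discovery;

public static class AnnouncementListener
{
  // Hypothetical helper: builds (but does not open) a host that receives
  // the online and offline announcements sent over UDP.
  public static ServiceHost Create(Action<EndpointAddress> online,
    Action<EndpointAddress> offline)
  {
    var service = new AnnouncementService();
    service.OnlineAnnouncementReceived += (sender, e) =>
      online(e.EndpointDiscoveryMetadata.Address);
    service.OfflineAnnouncementReceived += (sender, e) =>
      offline(e.EndpointDiscoveryMetadata.Address);

    // AnnouncementService is hosted as a singleton instance.
    var host = new ServiceHost(service);
    host.AddServiceEndpoint(new UdpAnnouncementEndpoint());
    return host;
  }
}
```

Opening the returned host before the service starts lets a client learn about the service without sending a probe at all.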
