XmlDictionary and Binary Serialization


One of the interesting things to come out of WCF is its improved Infoset serialization. In particular, WCF introduced a binary serialization format that reduces the size of serialized objects. One of the keys to saving space is the notion of an XmlDictionary. The WCF serialization folks asked two questions:

How much could we reduce the size of a message if we allowed the parties communicating to exchange metadata about the messages?

What if we could reduce the size of messages by exchanging aliases for the XML Infoset node names?

The result of this what-if experiment is the XmlDictionary and the XmlBinaryWriterSession. The mechanism is astonishingly simple. Assume that both ends have a way to exchange information about what to call the two parts of a QName: the namespace and the local name of the node. Instead of sending namespace:element qualified items, send aliases. This works well in WCF messaging and happens whenever you send messages over the binary serializer. You can also use it in your own code that uses a binary serializer. The only requirement is that the serializer and deserializer agree on the makeup of the XmlDictionary. Let’s start by looking at some code that does plain old binary serialization.

We start with an object:

[DataContract(Namespace = "http://www.friseton.com/Name/2010/06")]
public class Person
{
  [DataMember]
  public string FirstName { get; set; }

  [DataMember]
  public string LastName { get; set; }

  [DataMember]
  public DateTime Birthday { get; set; }
}

I then have a ‘driver’ program:

static void Main(string[] args)
{
  var person = new Person
               {
                 FirstName = "Scott",
                 LastName = "Seely",
                 Birthday = new DateTime(1900, 4, 5)
               };
  var serializer = new DataContractSerializer(typeof (Person));
  Console.WriteLine("Serialize Binary: {0} bytes",
    SerializeBinary(person, serializer).Length);
  Console.WriteLine("Serialize Binary with Dictionary: {0} bytes",
    SerializeBinaryWithDictionary(person, serializer).Length);
}

The application prints the size of each stream after the object is written out. The first method, SerializeBinary, does not use a dictionary. As a result, it has no aliases to draw on and must write out every name in full.

private static Stream SerializeBinary(Person person,
  DataContractSerializer serializer)
{
  var stream = new MemoryStream();
  var writer = XmlDictionaryWriter.CreateBinaryWriter(stream);
  serializer.WriteObject(writer, person);
  writer.Flush();
  return stream;
}

In this case, we get a stream that contains 146 bytes. That’s pretty poor considering that we are interested in 10 characters (28 bytes, if each string has a 4-byte length followed by 2 bytes per character) and a simple DateTime representation (4 bytes). Can we make this smaller? How close can we get to that back-of-the-envelope figure of 32 bytes? The answer: really close!

The version of SerializeBinaryWithDictionary that I wrote is verbose: it contains a number of lines that show what is going on internally. Your own code may be as long, but those lines would be debug output. Please note that you need to include the XMLSchema-instance namespace in your dictionary so that both the reader and writer agree on the value of the namespace declaration that the serializer emits.

private static Stream SerializeBinaryWithDictionary(Person person,
  DataContractSerializer serializer)
{
  var stream = new MemoryStream();
  var dictionary = new XmlDictionary();
  var session = new XmlBinaryWriterSession();
  var key = 0;
  session.TryAdd(dictionary.Add("FirstName"), out key);
  Console.WriteLine("Added FirstName with key: {0}", key);
  session.TryAdd(dictionary.Add("LastName"), out key);
  Console.WriteLine("Added LastName with key: {0}", key);
  session.TryAdd(dictionary.Add("Birthday"), out key);
  Console.WriteLine("Added Birthday with key: {0}", key);
  session.TryAdd(dictionary.Add("Person"), out key);
  Console.WriteLine("Added Person with key: {0}", key);
  session.TryAdd(dictionary.Add("http://www.friseton.com/Name/2010/06"),
    out key);
  Console.WriteLine("Added xmlns with key: {0}", key);
  session.TryAdd(dictionary.Add("http://www.w3.org/2001/XMLSchema-instance"),
    out key);
  Console.WriteLine("Added xmlns for xsi with key: {0}", key);

  var writer = XmlDictionaryWriter.CreateBinaryWriter(
    stream, dictionary, session);
  serializer.WriteObject(writer, person);
  writer.Flush();
  return stream;
}

The size difference is striking: we shave off 108 bytes by using the dictionary. We are getting close to the memory footprint of the object data! The cool bit: you can use this in your own code. The strings need to be shared between the writer and reader sessions (there is a corresponding XmlBinaryReaderSession, which the deserializing side populates with the same keys and values). For posterity, the output of the program is:

Serialize Binary: 146 bytes
Added FirstName with key: 0
Added LastName with key: 1
Added Birthday with key: 2
Added Person with key: 3
Added xmlns with key: 4
Added xmlns for xsi with key: 5
Serialize Binary with Dictionary: 38 bytes
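
To see the aliasing in isolation, here is a minimal sketch that skips DataContractSerializer and writes a single element directly, once with plain string names and once with XmlDictionaryString names. The AliasDemo class, the Measure helper, and the "x" payload are made up for illustration and are not part of the program above. The only point is that the dictionary-backed writer can emit a key where the plain writer has to emit the full name and namespace.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AliasDemo
{
    private const string Ns = "http://www.friseton.com/Name/2010/06";

    // Writes <Person xmlns="...">x</Person> in the binary format and
    // returns the number of bytes produced.
    public static long Measure(bool useDictionary)
    {
        var stream = new MemoryStream();
        if (useDictionary)
        {
            // Each Add returns an XmlDictionaryString that carries a key.
            var dictionary = new XmlDictionary();
            XmlDictionaryString name = dictionary.Add("Person");
            XmlDictionaryString ns = dictionary.Add(Ns);

            var writer = XmlDictionaryWriter.CreateBinaryWriter(stream, dictionary);
            writer.WriteStartElement(name, ns); // name and namespace go out as keys
            writer.WriteString("x");
            writer.WriteEndElement();
            writer.Flush();
        }
        else
        {
            var writer = XmlDictionaryWriter.CreateBinaryWriter(stream);
            writer.WriteStartElement("Person", Ns); // name and namespace go out as text
            writer.WriteString("x");
            writer.WriteEndElement();
            writer.Flush();
        }
        return stream.Length;
    }

    public static void Main()
    {
        Console.WriteLine("Plain names:      {0} bytes", Measure(false));
        Console.WriteLine("Dictionary names: {0} bytes", Measure(true));
    }
}
```

A reader for the dictionary version would need an XmlDictionary built with the same strings in the same order, which is the agreement the rest of this post relies on.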

A slightly different version that shows both reading and writing with a shared understanding of what the dictionary looks like follows:

// Requires: using System.Collections.Generic;
private static Stream SerializeBinaryWithDictionary(Person person,
  DataContractSerializer serializer)
{
  var strings = new List<XmlDictionaryString>();
  var stream = new MemoryStream();
  var dictionary = new XmlDictionary();
  var session = new XmlBinaryWriterSession();
  var rdr = new XmlBinaryReaderSession();

  strings.Add(dictionary.Add("FirstName"));
  strings.Add(dictionary.Add("LastName"));
  strings.Add(dictionary.Add("Birthday"));
  strings.Add(dictionary.Add("Person"));
  strings.Add(dictionary.Add("http://www.friseton.com/Name/2010/06"));
  strings.Add(dictionary.Add("http://www.w3.org/2001/XMLSchema-instance"));

  var writer = XmlDictionaryWriter.CreateBinaryWriter(
    stream, dictionary, session);

  // Register each string with the writer session and mirror the key it
  // was given into the reader session, so both sides agree on the aliases.
  var key = 0;
  foreach (var val in strings)
  {
    if (session.TryAdd(val, out key))
    {
      rdr.Add(key, val.Value);
    }
  }
  serializer.WriteObject(writer, person);
  writer.Flush();

  // Rewind and read the object back through the mirrored reader session.
  stream.Position = 0;
  var reader = XmlDictionaryReader.CreateBinaryReader(stream, dictionary,
    XmlDictionaryReaderQuotas.Max, rdr);
  var per = serializer.ReadObject(reader) as Person;
  return stream;
}

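Looking at the above, we can also account for the 6 bytes that separate the 38-byte stream from our 32-byte estimate: they are the aliases that stand in for the node names and namespaces, one for each of the six dictionary entries.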
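
In a real exchange the writer and reader live in different places, so each side has to build its own session from the agreed list of strings. The following is a rough sketch of that split, not a definitive implementation. The SharedDictionaryDemo class, the Names array, and the Write and Read helpers are names I made up for illustration. It assumes that both sides add the strings in the same order and that the writer session hands out keys sequentially from 0, as in the output above; the writer checks that assumption and throws if it does not hold.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

[DataContract(Namespace = "http://www.friseton.com/Name/2010/06")]
public class Person
{
    [DataMember] public string FirstName { get; set; }
    [DataMember] public string LastName { get; set; }
    [DataMember] public DateTime Birthday { get; set; }
}

public static class SharedDictionaryDemo
{
    // The agreed-upon strings (hypothetical shared list). Order matters:
    // both sides rely on position i mapping to key i.
    private static readonly string[] Names =
    {
        "FirstName", "LastName", "Birthday", "Person",
        "http://www.friseton.com/Name/2010/06",
        "http://www.w3.org/2001/XMLSchema-instance"
    };

    // Writer side: builds its own dictionary and writer session.
    public static byte[] Write(Person person)
    {
        var dictionary = new XmlDictionary();
        var session = new XmlBinaryWriterSession();
        for (var i = 0; i < Names.Length; i++)
        {
            int key;
            if (!session.TryAdd(dictionary.Add(Names[i]), out key) || key != i)
                throw new InvalidOperationException("Unexpected session key.");
        }

        var stream = new MemoryStream();
        var writer = XmlDictionaryWriter.CreateBinaryWriter(
            stream, dictionary, session);
        new DataContractSerializer(typeof(Person)).WriteObject(writer, person);
        writer.Flush();
        return stream.ToArray();
    }

    // Reader side: rebuilds the same keys without seeing the writer's objects.
    public static Person Read(byte[] bytes)
    {
        var dictionary = new XmlDictionary();
        var session = new XmlBinaryReaderSession();
        for (var key = 0; key < Names.Length; key++)
        {
            session.Add(key, dictionary.Add(Names[key]).Value);
        }

        var reader = XmlDictionaryReader.CreateBinaryReader(
            bytes, 0, bytes.Length, dictionary,
            XmlDictionaryReaderQuotas.Max, session);
        return (Person)new DataContractSerializer(typeof(Person)).ReadObject(reader);
    }

    public static void Main()
    {
        var original = new Person
        {
            FirstName = "Scott",
            LastName = "Seely",
            Birthday = new DateTime(1900, 4, 5)
        };
        var bytes = Write(original);
        var copy = Read(bytes);
        Console.WriteLine("{0} bytes, read back {1} {2}",
            bytes.Length, copy.FirstName, copy.LastName);
    }
}
```

Keeping the string list in one shared place is the design choice that matters here: if the two sides ever disagree on the order, the reader resolves a key to the wrong name and deserialization fails or produces garbage.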