
Bulletproof Interface Deserialization in Json.NET

What is Deserialization?

To understand deserialization, it is useful to first understand the opposite process, serialization. Serialization is the process of converting an in-memory instance of an object into a string representation of that object. This is typically done to transfer this object (as a string) across a network or to persist it somewhere, such as to a database or to the file system. Deserialization, as you may have guessed, is the process of converting a string into an in-memory instance of an object.
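As a minimal sketch of that round trip, the snippet below serializes a small object to a JSON string and then turns the string back into an object. The Point class and the RoundTripDemo helper are hypothetical names invented for this illustration; only the JsonConvert calls come from Json.NET.

```csharp
using System;
using Newtonsoft.Json;

// Hypothetical type used only for this illustration.
public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class RoundTripDemo
{
    // Serialization: in-memory object -> string.
    public static string ToJson(Point point)
    {
        return JsonConvert.SerializeObject(point);
    }

    // Deserialization: string -> in-memory object.
    public static Point FromJson(string json)
    {
        return JsonConvert.DeserializeObject<Point>(json);
    }
}
```

Calling RoundTripDemo.ToJson(new Point { X = 1, Y = 2 }) produces the string {"X":1,"Y":2}, and passing that string to RoundTripDemo.FromJson gives back an equivalent Point.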

Serialized objects and deserialized objects are a bit like eggs and chickens.

Similar to an egg, a serialized object (i.e., a string) is motionless but has the potential to hatch into a chicken. Unlike the proverbial chicken and egg, though, the order here is not in doubt: serialization typically comes first, followed later by deserialization. That is, an object (the "chicken") is created in memory somehow, then it is serialized so that it can be persisted as the "egg" (i.e., the string). Some time later, that egg gets hatched back into a "chicken" (i.e., the in-memory object). What, you weren't aware that the chicken came first? Now you know.

This article covers one part of the deserialization process that is particularly challenging: converting a string into an in-memory object when the target type is an interface rather than a concrete class.

Deserialization is Hard

Serializing a class that implements an interface is simple. Deserializing JSON to one of many possible classes that implement an interface can be tricky business. Json.NET (also known as Newtonsoft.Json) provides an easy option to handle this situation, but that option is not always as robust as necessary. Here's an example:

public class Person
{
    public IProfession Profession { get; set; }
}

public interface IProfession
{
    string JobTitle { get; }
}

public class Programming : IProfession
{
    public string JobTitle => "Software Developer";
    public string FavoriteLanguage { get; set; }
}

public class Writing : IProfession
{
    public string JobTitle => "Copywriter";
    public string FavoriteWord { get; set; }
}

public class Samples
{
    public static Person GetProgrammer()
    {
        return new Person()
        {
            Profession = new Programming()
            {
                FavoriteLanguage = "C#"
            }
        };
    }
}

If you attempt to serialize the sample programmer person, you'll get some JSON that looks like this:

{
    "Profession": {
        "JobTitle": "Software Developer",
        "FavoriteLanguage": "C#"
    }
}

Looking at that JSON, you as a person can easily tell that this serialized person is a programmer. However, deserializing it back into a Person would throw an exception (a JsonSerializationException), because the deserializer cannot instantiate an interface and wouldn't know which profession (programming or writing) to deserialize the person's profession to.
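The failure is easy to reproduce. The following is a small sketch that repeats trimmed-down versions of the types above so that it stands alone; the NaiveDeserializationDemo class and its method name are hypothetical and exist only for this illustration.

```csharp
using System;
using Newtonsoft.Json;

public interface IProfession
{
    string JobTitle { get; }
}

public class Programming : IProfession
{
    public string JobTitle => "Software Developer";
    public string FavoriteLanguage { get; set; }
}

public class Person
{
    public IProfession Profession { get; set; }
}

public static class NaiveDeserializationDemo
{
    // Returns true if deserializing without any type information fails.
    public static bool FailsWithoutTypeInfo()
    {
        var person = new Person
        {
            Profession = new Programming { FavoriteLanguage = "C#" }
        };

        // Serializing works fine: the runtime type of Profession is known.
        var json = JsonConvert.SerializeObject(person);

        try
        {
            // Deserializing does not: IProfession cannot be instantiated.
            JsonConvert.DeserializeObject<Person>(json);
            return false;
        }
        catch (JsonSerializationException)
        {
            return true;
        }
    }
}
```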

Simple Interface Deserialization

Luckily, Json.NET provides a simple setting to solve the problem of deserializing a property that is an interface with multiple implementing classes:

private Person SerializeAndDeserialize(Person person)
{
    var indented = Formatting.Indented;
    var settings = new JsonSerializerSettings()
    {
        TypeNameHandling = TypeNameHandling.All
    };
    var serialized = JsonConvert.SerializeObject(person, indented, settings);
    var deserialized = JsonConvert.DeserializeObject<Person>(serialized, settings);
    return deserialized;
}

By specifying that all type names should be included in the serialized JSON, you give the deserializer enough context to know what specific type to deserialize the profession to. In the above example, the serialized JSON would look like this:

{
    "$type": "SampleApp.Types.Person, SampleApp",
    "Profession": {
        "$type": "SampleApp.Types.Programming, SampleApp",
        "JobTitle": "Software Developer",
        "FavoriteLanguage": "C#"
    }
}

Note that each JSON object now includes the type name (e.g., "Programming"), namespace (e.g., "SampleApp.Types"), and assembly (e.g., "SampleApp"). Armed with that information, the deserializer knows exactly which type to instantiate when creating one of the IProfession implementations. However, this causes another problem.

Serializing Type Metadata is Brittle

Suppose you want to rename your app from "SampleApp" to "ExampleApp". Or maybe you want to rename the "Types" namespace to "Models". Or perhaps you want to change the "Programming" class to be the "Coding" class. You could do any of those and the serialization and deserialization would still work just fine going forward. However, any people that were serialized and persisted to, say, a file system, would later not deserialize with the refactored code. This is because the type information in the serialized version would no longer match the type information in the updated code base. There is a way to avoid this problem, though the solution is a little more involved than the prior Json.NET setting.
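To make the brittleness concrete, here is a hedged sketch of what happens after such a refactoring. It assumes the app was renamed to "ExampleApp", the namespace to "Models", and the class to "Coding", and it feeds in JSON that was persisted under the old names. The StaleTypeDemo class is hypothetical, and the sketch assumes no assembly named "SampleApp" can be loaded any longer. To my understanding, Json.NET reports the unresolvable type as a JsonSerializationException, so the sketch catches its base class, JsonException.

```csharp
using System;
using Newtonsoft.Json;

// The code base after the hypothetical rename described above.
namespace ExampleApp.Models
{
    public interface IProfession
    {
        string JobTitle { get; }
    }

    public class Coding : IProfession
    {
        public string JobTitle => "Software Developer";
        public string FavoriteLanguage { get; set; }
    }

    public class Person
    {
        public IProfession Profession { get; set; }
    }

    public static class StaleTypeDemo
    {
        // JSON persisted before the rename; it still names the old types.
        public const string OldJson =
            "{ \"$type\": \"SampleApp.Types.Person, SampleApp\", " +
            "\"Profession\": { " +
            "\"$type\": \"SampleApp.Types.Programming, SampleApp\", " +
            "\"JobTitle\": \"Software Developer\", " +
            "\"FavoriteLanguage\": \"C#\" } }";

        // Returns true only if the old JSON can still be deserialized.
        public static bool OldJsonStillLoads()
        {
            var settings = new JsonSerializerSettings
            {
                TypeNameHandling = TypeNameHandling.All
            };

            try
            {
                JsonConvert.DeserializeObject<Person>(OldJson, settings);
                return true;
            }
            catch (JsonException)
            {
                // The "$type" values no longer resolve to any known type.
                return false;
            }
        }
    }
}
```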

Bulletproof Deserialization

If we backtrack to our original implementation, we'd have some JSON that doesn't contain any type metadata:

{
    "Profession": {
        "JobTitle": "Software Developer",
        "FavoriteLanguage": "C#"
    }
}

For our bulletproof deserialization, we can use this original serialized version, as it contains enough information for the computer to know that the profession in this instance is programming. The trick is to use a custom JSON converter:

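using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public class ProfessionConverter : JsonConverter
{
    // Only reading is customized; writing falls back to default serialization.
    public override bool CanWrite => false;
    public override bool CanRead => true;

    public override bool CanConvert(Type objectType)
    {
        return objectType == typeof(IProfession);
    }

    public override void WriteJson(JsonWriter writer,
        object value, JsonSerializer serializer)
    {
        throw new InvalidOperationException("Use default serialization.");
    }

    public override object ReadJson(JsonReader reader,
        Type objectType, object existingValue,
        JsonSerializer serializer)
    {
        // A null profession has nothing to inspect.
        if (reader.TokenType == JsonToken.Null)
        {
            return null;
        }

        // Load the raw JSON so the job title can be inspected.
        var jsonObject = JObject.Load(reader);
        IProfession profession;
        switch (jsonObject["JobTitle"]?.Value<string>())
        {
            case "Software Developer":
                profession = new Programming();
                break;
            case "Copywriter":
                profession = new Writing();
                break;
            default:
                // Fail clearly instead of populating a null instance.
                throw new JsonSerializationException("Unknown job title.");
        }

        // Copy the remaining properties onto the concrete instance.
        serializer.Populate(jsonObject.CreateReader(), profession);
        return profession;
    }
}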

What's happening in this converter is that the job title is inspected to figure out which profession to deserialize to. First, the data is deserialized to a JObject, which makes it easy to read the properties of the object. Next, the job title is read from the deserialized object and an instance of a particular profession is created based on that job title. Finally, that profession is populated from the serialized data. To get all of this working, you will need to decorate the "Profession" property with a JsonConverter attribute so that Json.NET knows to use your custom converter:

public class Person
{
    [JsonConverter(typeof(ProfessionConverter))]
    public IProfession Profession { get; set; }
}
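Putting the pieces together, the following self-contained sketch round-trips a person through JSON that contains no type metadata. It repeats a condensed version of the converter so that it compiles on its own; the ConverterDemo class and its RoundTrip method are hypothetical names added for this illustration.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public interface IProfession
{
    string JobTitle { get; }
}

public class Programming : IProfession
{
    public string JobTitle => "Software Developer";
    public string FavoriteLanguage { get; set; }
}

public class Writing : IProfession
{
    public string JobTitle => "Copywriter";
    public string FavoriteWord { get; set; }
}

public class ProfessionConverter : JsonConverter
{
    public override bool CanWrite => false;

    public override bool CanConvert(Type objectType)
    {
        return objectType == typeof(IProfession);
    }

    public override void WriteJson(JsonWriter writer,
        object value, JsonSerializer serializer)
    {
        throw new InvalidOperationException("Use default serialization.");
    }

    public override object ReadJson(JsonReader reader,
        Type objectType, object existingValue,
        JsonSerializer serializer)
    {
        var jsonObject = JObject.Load(reader);
        IProfession profession;

        // Pick the concrete class from the job title stored in the JSON.
        switch (jsonObject["JobTitle"]?.Value<string>())
        {
            case "Software Developer":
                profession = new Programming();
                break;
            case "Copywriter":
                profession = new Writing();
                break;
            default:
                throw new JsonSerializationException("Unknown job title.");
        }

        serializer.Populate(jsonObject.CreateReader(), profession);
        return profession;
    }
}

public class Person
{
    [JsonConverter(typeof(ProfessionConverter))]
    public IProfession Profession { get; set; }
}

public static class ConverterDemo
{
    // Serializes without type metadata, then deserializes via the converter.
    public static Person RoundTrip(Person person)
    {
        var json = JsonConvert.SerializeObject(person);
        return JsonConvert.DeserializeObject<Person>(json);
    }
}
```

No serializer settings are needed here, because the attribute on the property is enough for Json.NET to find the converter. Because CanWrite is false, serialization uses the default behavior and the converter only takes part in reading.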

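The converter above refers to each implementing class by name, so it must be updated whenever a new profession is added. A variation on this technique, used in the practical example in the next section, instead stores a type ID (a GUID) in the JSON and uses it to infer the class to deserialize. What's notable about that implementation is that the types to deserialize to aren't referred to in the converter. Instead, the converter uses reflection to find all field types, then compares the type ID from the serialized string to the type ID of each field type to find the right one to instantiate.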

Formulate, A Practical Example

This technique was actually used in a major open source project, Formulate (an Umbraco form builder plugin I built for Rhythm Agency). Formulate stores much of its data as JSON, and it is currently in the early stages of development, which means it has the potential to undergo a large amount of refactoring. If the JSON files contained type information, renaming or moving a class could break existing data whenever users upgrade. This bulletproof deserialization technique guards against that.

Let's take a look at an example here: FieldsJsonConverter.ReadJson

/// <summary>
/// Deserializes JSON into an array of IFormField.
/// </summary>
/// <returns>
/// An array of IFormField.
/// </returns>
public override object ReadJson(JsonReader reader,
	Type objectType, object existingValue,
	JsonSerializer serializer)
{
	// Variables.
	var fields = new List<IFormField>();
	var jsonArray = JArray.Load(reader);

	// Deserialize each form field.
	foreach (var item in jsonArray)
	{
		// Create a form field instance by the field type ID.
		var jsonObject = item as JObject;
		var strTypeId = jsonObject["TypeId"].Value<string>();
		var typeId = Guid.Parse(strTypeId);
		var instance = InstantiateFieldByTypeId(typeId);

		// Populate the form field instance.
		serializer.Populate(jsonObject.CreateReader(), instance);
		fields.Add(instance);
	}

	// Return array of form fields.
	return fields.ToArray();
}

And here's a sample of the JSON this code deserializes (we are only interested in the "Fields" array):

{
  "Name": "Contact Us",
  "Fields": [
    {
      "Id": "fb638ef8-939f-4c0e-8772-8a9c7bc7716b",
      "Name": "Your Name",
      "TypeId": "17906580-86ea-440b-bc30-9e1b099f803b"
    },
    {
      "Id": "42f15080-d3b9-4247-9615-7a93a2d24697",
      "Name": "Message",
      "TypeId": "9da84359-4d0b-4944-9144-9f8ccae7a4da"
    }
  ]
}

This code is doing two passes of deserialization. First, it deserializes the collection of form fields (e.g., a textbox or a checkbox) into a JArray of general-purpose JObject items. Next, it enumerates that array to fetch the "TypeId" (a GUID) from each deserialized object. Once it has the GUID type identifier, it uses that information to construct an instance of the form field class identified by that GUID. It does this with a bit of reflection (not shown) that creates an instance of each form field type to ask it for its GUID type identifier, and it finds the instance that matches the deserialized GUID type identifier. Finally, it essentially runs a second pass of deserialization to populate the form field instance with the data from the first deserialized object. This second pass is necessary because it is converting a very general object (i.e., a JObject) to an instance of one of the form field classes (e.g., TextField or CheckboxField).
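The reflection step that is not shown could look something like the sketch below. This is not Formulate's actual implementation. The IFormField members, the two field classes, their GUIDs, and the FieldFactory class are all hypothetical stand-ins, and the sketch assumes every field type lives in the same assembly as the interface and has a public parameterless constructor.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the real field types.
public interface IFormField
{
    Guid TypeId { get; }
    string Name { get; set; }
}

public class TextField : IFormField
{
    // Made-up GUID for illustration only.
    public Guid TypeId => new Guid("11111111-1111-1111-1111-111111111111");
    public string Name { get; set; }
}

public class CheckboxField : IFormField
{
    // Made-up GUID for illustration only.
    public Guid TypeId => new Guid("22222222-2222-2222-2222-222222222222");
    public string Name { get; set; }
}

public static class FieldFactory
{
    // Creates one instance of each concrete field type and returns the one
    // whose TypeId matches, or null when no field type matches.
    public static IFormField InstantiateFieldByTypeId(Guid typeId)
    {
        return typeof(IFormField).Assembly.GetTypes()
            .Where(type => type.IsClass && !type.IsAbstract
                && typeof(IFormField).IsAssignableFrom(type)
                && type.GetConstructor(Type.EmptyTypes) != null)
            .Select(type => (IFormField)Activator.CreateInstance(type))
            .FirstOrDefault(field => field.TypeId == typeId);
    }
}
```

Because the lookup is driven by the GUID rather than by a class name, a field class can be renamed or moved to another namespace without invalidating stored JSON, as long as its TypeId stays the same. A real implementation would likely cache the discovered types rather than scanning the assembly on every call.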

Now, if I ever decide to move any of the form field classes, rename them, or refactor them in some other way, I can still deserialize old versions of those form fields. This ability to stay agile over time is exactly the reason to use bulletproof deserialization.

How Can You Use Bulletproof Deserialization?

If you've read this far, chances are you're eager to try out this technique yourself. If so, here are the basic steps you'll need to follow:

  • Interface. Write some classes that implement an interface. Ensure they have some property data to uniquely identify them.
  • Converter. Write a class that derives from JsonConverter (it is an abstract class, not an interface).
  • Attribute. Decorate the property you want to deserialize with the JsonConverter attribute (passing it the type of your converter).
  • Deserialize. Call the JsonConvert.DeserializeObject method.

And that's it! That's all you need to do to make sure your deserialization is bulletproof. With those few short steps, you'll make your serialized data super resilient. Happy coding!
