
Testing the Performance of Querying Umbraco

In Umbraco, it’s common knowledge that there are always AT LEAST three ways of doing the same thing, and it’s not always obvious to newbies which is the right one to use.

One of the biggest culprits for this is querying Umbraco for content. These are all of the ways of getting content that I know about: 

  • The old Node API (deprecated, you shouldn’t be using this)
  • The old Document API (also deprecated)
  • The Content Service (not for the front end, designed for in-code CRUD operations)
  • The Dynamic Content methods in the UmbracoHelper (I’m not a fan of these, I prefer the strongly typed ones)
  • The Strongly Typed methods in the UmbracoHelper
  • The TypedSearch method of the UmbracoHelper, that wraps Examine
  • Examine

That’s a LOT of different ways to do the same thing! Recently at uWestFest, Jason Prothero and Mark Bowser did a very interesting talk about Anti-Patterns that included some tests on querying Umbraco with the various methods. I thought it would be nice to expand on this a bit and try to help people decide which method is right for them.

 

Here Comes the Science Part (Sort of)

For this highly scientific and rigorous testing (ahem) I set up the Fanoe starter kit, and I added 5000 blog posts. I then created a bunch of tests based on the most commonly used ways of getting content, plus the Content Service. BUT YOU SHOULDN’T USE THE CONTENT SERVICE FOR QUERYING!!!! I hear you shout in horror, and you’re right. But I thought it would be good to include it (at least early on) to illustrate just how bad it is.

I settled on using the following methods to test:

  • Getting the direct children of a node
  • Using a nasty LINQ query on the TypedContent API
  • Using the Descendants method on the root of the site
  • Using TypedContentAtXPath with an inefficient query
  • Using TypedContentAtXPath with an efficient query
  • Using the TypedSearch method of the UmbracoHelper
  • Using raw Examine

I ignored the two deprecated methods and the dynamic API, as they’re either outdated or not recommended.

I ran each test multiple times and timed the code, and then took an average of the results. I also simulated heavy load on the site to see how the methods perform under load, by firing 1000 simultaneous async requests at each method and averaging the results to see how much of a difference that makes. I used the site with 5000 blog posts, as that provides a fairly extreme example, and it also magnifies the performance differences, which makes it easier to see how good/bad things are.
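To make the "time it, repeat it, average it" approach concrete, here is a minimal sketch of that kind of harness. It is not the code used to produce the numbers in this article: `QueryTimer` and its method names are hypothetical, and the load method only queues work on the thread pool inside one process, which is a much cruder approximation than firing real requests at a site.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not the harness used for the article's numbers.
public static class QueryTimer
{
    // Runs the action `runs` times in sequence and returns the mean
    // elapsed time per run in milliseconds.
    public static double AverageMilliseconds(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        double total = 0;
        for (int i = 0; i < runs; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            total += stopwatch.Elapsed.TotalMilliseconds;
        }
        return total / runs;
    }

    // Queues `concurrency` timed runs on the thread pool, waits for all of
    // them, and returns the mean elapsed time per run in milliseconds.
    public static async Task<double> AverageUnderLoadAsync(Action action, int concurrency)
    {
        if (concurrency <= 0) throw new ArgumentOutOfRangeException(nameof(concurrency));

        Task<double>[] tasks = Enumerable.Range(0, concurrency)
            .Select(_ => Task.Run(() =>
            {
                var stopwatch = Stopwatch.StartNew();
                action();
                stopwatch.Stop();
                return stopwatch.Elapsed.TotalMilliseconds;
            }))
            .ToArray();

        double[] timings = await Task.WhenAll(tasks);
        return timings.Average();
    }
}
```

You would pass the query under test as the action, for example `QueryTimer.AverageMilliseconds(() => RunMyQuery(), 10)`, where `RunMyQuery` is whatever method wraps the code you want to measure.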

I tested getting all of the blog posts without looping through the returned content, and getting all of the posts and looping through them. I then tested getting the most recent 10 records, with and without looping. Finally, I tested performing a search to get all records, and a search to get all records in an efficient paged manner. For the first two sets of tests, I also tested the Content Service, just to show how badly it ran.

I noticed a few interesting things while testing. Umbraco seems to do some per-request caching for the TypedContent methods of the UmbracoHelper. The first time you run the query it will take longer, but any subsequent identical queries on the same request are MUCH faster. There’s still an overhead for needlessly repeating the same code, but Umbraco mitigates it a bit. I also noticed that the TypedSearch method is really bad (more on that later). Finally, the inefficient XPath query isn’t always slower than the efficient one. On further investigation, that turned out to be because the site is mostly blog posts. Trying similar queries on a more complex site structure shows that the efficient query is noticeably faster than the inefficient one.

The machine I tested on was a 2.8GHz i7 with 8GB of RAM.

Getting ALL the Things

First up, getting all of the blog posts for the site (with and without looping through the records). I used the following code:

//bad linq
var test = Umbraco.TypedContentAtRoot().First().Descendants().Where(a => a.DocumentTypeAlias == "blogPost");

//all children
var test = Umbraco.TypedContent(1085).Children();

//get by descendant
var test = Umbraco.TypedContentAtRoot().DescendantsOrSelf("BlogPost");

//greedy XPath
var test = Umbraco.TypedContentAtXPath("//BlogPost [@isDoc]");

//efficient XPath
var test = Umbraco.TypedContentAtXPath("root/Home/BlogPostRepository/BlogPost [@isDoc]");

//typed Search
var criteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = criteria.Field("nodeTypeAlias", "blogPost");

var test = Umbraco.TypedSearch(query.Compile());

//pure Examine
var searchCriteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = searchCriteria.Field("nodeTypeAlias", "blogPost");

var test = ExamineManager.Instance.DefaultSearchProvider.Search(query.Compile());

//content service
var test = Services.ContentService.GetContentOfContentType(1068);

So, how did they perform? Let's have a look.

| Method | Avg Time (ms) | Avg Under Load (ms) | Difference (ms) | % Worse (rounded to nearest %) |
|---|---|---|---|---|
| Direct Children | 218.9102 | 288.3253 | 69.41438 | 32% |
| Bad Linq | 0.17853 | 0.235 | 0.05497 | 31% |
| Descendants From Root | 0.03694 | 0.0824 | 0.04546 | 123% |
| XPATH Greedy | 0.03571 | 0.04321 | 0.00739 | 21% |
| XPATH Efficient | 0.03521 | 0.0546 | 0.01939 | 55% |
| TypedSearch | 583.27079 | N/A | N/A | N/A |
| Examine | 3.59746 | 7.1039 | 3.50644 | 97% |
| Content Service | 1991.05889 | N/A | N/A | N/A |

Well, the Content Service is clearly the worst performer, as it hits the database quite heavily. It’s over 3 times slower than the next worst, which is TypedSearch. Unsurprisingly, the XPath queries are the fastest. The performance of most of the other methods is actually pretty acceptable.
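For anyone wanting to reproduce the last two columns of the table above, they appear to be derived from the first two. Difference is the under-load time minus the average time, and % Worse is that difference as a percentage of the average time. That formula is inferred from the rows rather than taken from the test code, and `LoadComparison` below is a hypothetical helper written only to show the arithmetic.

```csharp
using System;

// Hypothetical helper: the formulas are inferred from the table rows.
public static class LoadComparison
{
    // How many milliseconds slower the method was under load.
    public static double Difference(double averageMs, double underLoadMs)
    {
        return underLoadMs - averageMs;
    }

    // The slowdown under load as a whole-number percentage of the normal time.
    public static double PercentWorse(double averageMs, double underLoadMs)
    {
        if (averageMs <= 0) throw new ArgumentOutOfRangeException(nameof(averageMs));

        double ratio = Difference(averageMs, underLoadMs) / averageMs;
        return Math.Round(ratio * 100, MidpointRounding.AwayFromZero);
    }
}
```

Plugging in the Examine row, `LoadComparison.PercentWorse(3.59746, 7.1039)` returns 97, which matches the table.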

It’s interesting to note that both the Content Service and TypedSearch start to crash under load, returning out of memory exceptions fairly quickly. Direct children doesn’t come off that great, but I think that’s because it must pre-load all of the children.
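One thing worth keeping in mind when comparing timings taken without looping through the results: standard LINQ operators such as Where are lazy, so building a query costs almost nothing until something enumerates it. The sketch below shows only that general LINQ behaviour, using plain strings in place of content items. Whether each of the Umbraco methods above defers its work in the same way is an assumption I haven’t verified, and `CountingFilter` is a made-up class for the demo.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: plain strings stand in for content items.
public sealed class CountingFilter
{
    // How many items the Where predicate has actually inspected so far.
    public int Inspected { get; private set; }

    // Builds a lazy query. The predicate does not run until the
    // returned sequence is enumerated.
    public IEnumerable<string> OnlyBlogPosts(IEnumerable<string> aliases)
    {
        return aliases.Where(alias =>
        {
            Inspected++;
            return alias == "blogPost";
        });
    }

    // Returns the inspected count before and after looping over the query.
    public static int[] Run(int itemCount)
    {
        var filter = new CountingFilter();
        var query = filter.OnlyBlogPosts(Enumerable.Repeat("blogPost", itemCount));

        int beforeLoop = filter.Inspected;  // query built, nothing enumerated yet
        foreach (var alias in query) { }    // looping forces the filter to run
        int afterLoop = filter.Inspected;

        return new[] { beforeLoop, afterLoop };
    }
}
```

`CountingFilter.Run(5000)` returns 0 for the count before the loop and 5000 for the count after it, so the real cost of a lazy query only shows up once you loop.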

Next up, how do they fare when we loop through all 5000 returned things?

| Method | Avg Time (ms) | Avg Under Load (ms) | Difference (ms) | % Worse (rounded to nearest %) |
|---|---|---|---|---|
| Direct Children | 202.38049 | 287.6374 | 85.25691 | 42% |
| Bad Linq | 265.64 | 226.8708 | 66.8801 | 29% |
| Descendants From Root | 226.64352 | 281.7007 | 55.05718 | 24% |
| XPATH Greedy | 207.50742 | 260.0227 | 52.51528 | 25% |
| XPATH Efficient | 198.73297 | 276.3727 | 77.63973 | 39% |
| TypedSearch | 547.66416 | N/A | N/A | N/A |
| Examine | 352.2925 | 1694.7278 | 1342.4353 | 381% |
| Content Service | 1933.66256 | N/A | N/A | N/A |

So, looping through the children isn’t much different, presumably because they’re all loaded in advance, and the same goes for the Content Service (which is still the slowest). All the others are slower. It’s interesting to note that looping through 5000 records in Examine is pretty slow, and it performs REALLY badly under load.

That said, this is a pretty contrived test. In the real world, it’s unlikely you’ll need to get 5000 records and loop through them all at once, unless you’re a crazy person maybe (or generating a VERY comprehensive sitemap). For most simple content queries that don’t return insane amounts of data, you can happily use Linq, direct children and descendants (if used CAREFULLY). The XPath methods are the fastest though, especially if you can craft a reasonably efficient query.

Getting the Latest Things

Now we’ll try something more real world. Let’s say we want the latest 10 blog posts. That’s something you see a lot. This is a good example, as the more blog posts that you have, the slower the queries will become. For these tests, I binned the Content Service, as we’ve already proved that it’s crap for querying on.

For this one, I’m just looping through the records. For the curious, the code for each method is below:

//bad linq
var latest = Umbraco.TypedContentAtRoot().First().Descendants().Where(a => a.DocumentTypeAlias == "blogPost").OrderByDescending(a => a.CreateDate).Take(10);

//all children
var latest = Umbraco.TypedContent(1085).Children().OrderByDescending(a => a.CreateDate).Take(10);

//get by descendant
var test = Umbraco.TypedContentAtRoot().DescendantsOrSelf("BlogPost").OrderByDescending(a => a.CreateDate).Take(10);

//greedy XPath
var test = Umbraco.TypedContentAtXPath("//BlogPost [@isDoc]").OrderByDescending(a => a.CreateDate).Take(10);

//efficient XPath
var test = Umbraco.TypedContentAtXPath("root/Home [@isDoc]/BlogPostRepository [@isDoc]/BlogPost [@isDoc]").OrderByDescending(a => a.CreateDate).Take(10);

//typed search
var criteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = criteria.Field("nodeTypeAlias", "blogPost").And().OrderByDescending("createDate");

var test = Umbraco.TypedSearch(query.Compile()).Take(10);

//pure Examine
var searchCriteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = searchCriteria.Field("nodeTypeAlias", "blogPost").And().OrderByDescending("createDate");

var test = ExamineManager.Instance.DefaultSearchProvider.Search(query.Compile()).Take(10);

And the results were as follows:

| Method | Avg Time (ms) | Avg Under Load (ms) | Difference (ms) | % Worse (rounded to nearest %) |
|---|---|---|---|---|
| Direct Children | 208.32397 | 512.3234 | 303.99943 | 146% |
| Bad Linq | 239.97 | 571.615 | 331.644 | 138% |
| Descendants From Root | 235.54425 | 547.1448 | 311.60055 | 132% |
| XPATH Greedy | 211.44828 | 509.9365 | 298.48822 | 141% |
| XPATH Efficient | 206.1111 | 494.3311 | 288.22 | 140% |
| TypedSearch | 546.99595 | N/A | N/A | N/A |
| Examine | 4.32127 | 8.2108 | 3.88953 | 90% |

Notably, every single method apart from Examine is as bad as, or worse than, looping through all the records. This is because in order to do the Linq sorting, all of the records must first be loaded into memory and sorted. The performance under load is also MUCH worse. Pure Examine, however, is a BEAST. It’s 48 times faster than the next fastest method, and 126 times faster than the slowest method.
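The point about Linq sorting can be shown with a small sketch. A sort has to see every item before it can hand back the first one, so adding Take(10) after OrderByDescending doesn’t stop the whole source being read. `CountingSource` below is a hypothetical stand-in that yields fake create dates and counts how many it has handed out. It is plain LINQ to Objects and has nothing Umbraco-specific in it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: fake create dates stand in for blog posts.
public sealed class CountingSource
{
    // How many items have been pulled from the source so far.
    public int Produced { get; private set; }

    // Lazily yields `count` dates, counting each one as it is handed out.
    public IEnumerable<DateTime> CreateDates(int count)
    {
        var start = new DateTime(2016, 1, 1);
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return start.AddDays(i);
        }
    }

    // Sort first, then take: returns how many source items had to be read.
    public static int ProducedWhenSortedFirst(int total, int take)
    {
        var source = new CountingSource();
        var latest = source.CreateDates(total).OrderByDescending(d => d).Take(take).ToList();
        return source.Produced;
    }

    // Take without sorting: returns how many source items had to be read.
    public static int ProducedWithoutSorting(int total, int take)
    {
        var source = new CountingSource();
        var first = source.CreateDates(total).Take(take).ToList();
        return source.Produced;
    }
}
```

With 5000 items and a take of 10, the sorted version reads all 5000 items, while the unsorted version stops as soon as it has the items it needs. That is the gap a search index with its own sorting is able to avoid.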

I’m surprised by just how badly TypedSearch behaves, as it’s a wrapper round Examine. But the secret here is that Pure Examine has special methods for Skip and Take that make it super fast, as it doesn’t need to load everything, but the TypedSearch method does not expose these.

I added some code to the Examine test to use Umbraco.TypedContent() to get each of the 10 returned items, and it was still blisteringly quick. So the takeaway here is that if you need a subset of your content, and ordering is important, Examine is the way to go.

If you’re building navigation, for example, and listing children and sorting them, then as long as your section doesn’t have thousands of sub-pages, using the Children method with a sort (if needed) will be fine.

Searching All the Things

Finally, I tested a simple search. I searched for all blog posts that contain “testing” in the title (100 of the 5000 blog posts on my test site), ordered them by latest, and then took 10 records, to simulate a properly paged search.
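The tests here only ever take the first 10 records, which is effectively page one. As a rough sketch of how the same idea extends to later pages of an already ordered result set, a generic helper might look like the following. `Paging.Page` is a hypothetical name, it works on any in-memory sequence, and it is not how Examine pages internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for paging any already ordered sequence.
public static class Paging
{
    // Returns the items for a 1-based page number.
    public static IEnumerable<T> Page<T>(IEnumerable<T> orderedResults, int pageNumber, int pageSize)
    {
        if (orderedResults == null) throw new ArgumentNullException(nameof(orderedResults));
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        // Skip the earlier pages, then take one page worth of items.
        return orderedResults.Skip((pageNumber - 1) * pageSize).Take(pageSize);
    }
}
```

For 100 matching results and a page size of 10, `Paging.Page(results, 3, 10)` gives items 21 to 30. On an in-memory sequence the skipped items still have to be walked past, so this keeps the code tidy rather than making the query itself any cheaper.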

In this case, the obvious choice is probably going to be Examine, as that’s what it was built for, but for the sake of argument, I’ve done it using some other methods too.

Once again, here’s some code:

//search with linq
var test = Umbraco.TypedContentAtRoot().DescendantsOrSelf("BlogPost").Where(a => a.Name.Contains("testing")).OrderByDescending(a => a.CreateDate).Take(10);

foreach (var item in test)
{

}

//search with xpath
var test = Umbraco.TypedContentAtXPath("root/Home/BlogPostRepository/BlogPost [@isDoc and contains(@nodeName, 'testing')]").OrderByDescending(a => a.CreateDate).Take(10);

foreach (var item in test)
{

}
      
//search with typed search
var criteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = criteria.Field("nodeTypeAlias", "blogPost").And().Field("nodeName", "testing").And().OrderByDescending("createDate");

var test = Umbraco.TypedSearch(query.Compile()).Take(10);

foreach (var item in test)
{

}

//search with pure Examine
var searchCriteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = searchCriteria.Field("nodeTypeAlias", "blogPost").And().Field("nodeName", "testing").And().OrderByDescending("createDate");

var test = ExamineManager.Instance.DefaultSearchProvider.Search(query.Compile()).Take(10);

foreach (var item in test)
{

}

//search with pure Examine and fetch IPublishedContent
var searchCriteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = searchCriteria.Field("nodeTypeAlias", "blogPost").And().Field("nodeName", "testing").And().OrderByDescending("createDate");

var test = ExamineManager.Instance.DefaultSearchProvider.Search(query.Compile()).Take(10);

foreach (var item in test)
{
    var content = Umbraco.TypedContent(item.Id);
}

And the results:

| Method | Avg Time (ms) | Avg Under Load (ms) | Difference (ms) | % Worse (rounded to nearest %) |
|---|---|---|---|---|
| Linq | 260.28506 | 599.4626 | 339.17754 | 130% |
| XPath | 67.59894 | 127.5567 | 59.95776 | 87% |
| TypedSearch | 121.47725 | 412.2094 | 290.73215 | 239% |
| Pure Examine | 2.09768 | 4.4403 | 2.34262 | 112% |
| Pure Examine and Get the IPublishedContent | 2.75111 | 5.9849 | 3.23379 | 118% |

And the results are pretty much as expected. Pure Examine wins, but if you need to get the IPublishedContent to process, it’s not all that much slower in the grand scheme of things. Linq, as expected, is a bit slow due to having to load everything into memory, multiple times. XPath isn’t that bad to be honest, and TypedSearch is, again, a bit crap for something that wraps Examine.

And Finally….

So, in conclusion, for most small queries on your site, well-crafted Linq or efficient XPath is the way to go. For anything more complex, where you need to actually search/filter, or do efficient paging, Examine is DEFINITELY the way to go. In most real world situations, you won’t see quite as extreme times as I got here, but that was the point of the tests: to see how well the methods performed in the worst case scenario!

I was genuinely surprised by just how badly TypedSearch performed. I may have to have a look at the code for that and submit a PR to see if I can make it more efficient, as for most searches you might want to use it for, pure Examine is MUCH faster.

I hope this is useful and helps you to see the difference in performance between the different ways of querying Umbraco!

Tim Payne

Tim Payne is a (pending verification) Guinness World Record beating freelance developer, based in the north of England with over 20 years of experience who’s been working with Umbraco since v4. When he’s not slaving in the hot code mines, he can be found participating in crazy challenges because they seemed like a good idea at the time, or running around after his tiny daughter.
