Testing the Performance of Querying Umbraco by Tim Payne

In Umbraco, it’s common knowledge that there are always AT LEAST three ways of doing the same thing, and it’s not always obvious to newbies which is the right one to use.

One of the biggest culprits for this is querying Umbraco for content. These are all of the ways of getting content that I know about:

The old Node API (deprecated, you shouldn’t be using this)
The old Document API (also, deprecated)
The Content Service (not for the front end, designed for in code CRUD operations)
The Dynamic Content methods in the UmbracoHelper (I’m not a fan of these, I prefer the strongly typed ones)
The Strongly Typed methods in the UmbracoHelper
The TypedSearch method of the UmbracoHelper, that wraps Examine
Examine

That’s a LOT of different ways to do the same thing! Recently at uWestFest, Jason Prothero and Mark Bowser did a very interesting talk about Anti-Patterns that included some tests on querying Umbraco with the various methods. I thought it would be nice to expand on this a bit and try to help people decide which method is right for them.

Here Comes the Science Part (Sort of)

For this highly scientific and rigorous testing (ahem) I set up the Fanoe starter kit, and I added 5000 blog posts. I then created a bunch of tests based on the most commonly used ways of getting content, plus the Content Service. BUT YOU SHOULDN’T USE THE CONTENT SERVICE FOR QUERYING!!!! I hear you shout in horror, and you’re right. But I thought it would be good to include it (at least early on) to illustrate just how bad it is.

I settled on using the following methods to test:

Getting the direct children of a node
Using a nasty LINQ query on the TypedContent API
Using the Descendants method on the root of the site
Using TypedContentAtXPath with an inefficient query
Using TypedContentAtXPath with an efficient query
Using the TypedSearch method of the UmbracoHelper
Using raw Examine

I ignored the two deprecated methods and the dynamic API, as they’re either outdated or not recommended.

I ran each test multiple times, timed the code, and then took an average of the results. I also simulated heavy load on the site by firing 1000 simultaneous async requests at each method and averaging the results, to see how much of a difference load makes. I used the site with 5000 blog posts, as that provides a fairly extreme example, and it also magnifies the performance differences and makes it easier to see how good/bad things are.
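For anyone wanting to run similar measurements, here is a minimal sketch of that kind of harness. It is not the code used for these tests: the QueryTimer class and its method names are made up for illustration, and it approximates load with in-process tasks rather than real HTTP requests against the site.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of Umbraco or of the original test code.
public static class QueryTimer
{
    // Runs the action `runs` times in sequence and returns the mean
    // elapsed time per run, in milliseconds.
    public static double AverageMilliseconds(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        double total = 0;
        for (var i = 0; i < runs; i++)
        {
            var watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            total += watch.Elapsed.TotalMilliseconds;
        }

        return total / runs;
    }

    // Queues `requests` copies of the action on the thread pool, times each
    // one individually, and returns the mean elapsed time in milliseconds.
    // Assumption: this stands in for firing real simultaneous requests.
    public static async Task<double> AverageUnderLoadAsync(Action action, int requests)
    {
        if (requests <= 0) throw new ArgumentOutOfRangeException(nameof(requests));

        var tasks = Enumerable.Range(0, requests)
            .Select(_ => Task.Run(() =>
            {
                var watch = Stopwatch.StartNew();
                action();
                watch.Stop();
                return watch.Elapsed.TotalMilliseconds;
            }))
            .ToArray();

        var timings = await Task.WhenAll(tasks);
        return timings.Average();
    }
}
```

The action passed in would be whichever query is being measured. Against a real site you would want the load to come from outside the process, since the UmbracoHelper expects a request context.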

I ran four sets of tests. First, I tested just getting all of the blog posts (not looping through the returned content). Second, I tested getting all the records and looping through them all. Third, I tested getting the most recent 10 records, with and without looping. Finally, I tested performing a search to get all records, and a search to get records in an efficient paged manner. For the first two sets of tests, I also tested the Content Service, just to show how badly it ran.

I noticed a few interesting things while testing. Umbraco seems to do some per request caching for the TypedContent methods of the UmbracoHelper. The first time you run the query it will take longer, but any subsequent identical queries on the same request are MUCH faster. There’s still an overhead for needlessly repeating the same code, but Umbraco mitigates it a bit. I also noticed that the TypedSearch method is really bad (more on that later). Finally, the inefficient XPATH query isn’t always slower than the efficient one. On further investigation, that turned out to be because the site is mostly blog posts. Trying similar queries on a more complex site structure shows that the efficient query is noticeably faster than the inefficient one.

The machine I tested on was a 2.8GHz i7 with 8GB of RAM.

Getting ALL the Things

First up, testing getting all of the blog posts for the site (with and without looping through the records) used the following code:

//bad linq
var test = Umbraco.TypedContentAtRoot().First().Descendants().Where(a => a.DocumentTypeAlias == "blogPost");

//all children
var test = Umbraco.TypedContent(1085).Children();

//get by descendant
var test = Umbraco.TypedContentAtRoot().DescendantsOrSelf("BlogPost");

//greedy XPath
var test = Umbraco.TypedContentAtXPath("//BlogPost [@isDoc]");

//efficient XPath
var test = Umbraco.TypedContentAtXPath("root/Home/BlogPostRepository/BlogPost [@isDoc]");

//typed Search
var criteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = criteria.Field("nodeTypeAlias", "blogPost");

var test = Umbraco.TypedSearch(query.Compile());

//pure Examine
var searchCriteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = searchCriteria.Field("nodeTypeAlias", "blogPost");

var test = ExamineManager.Instance.DefaultSearchProvider.Search(query.Compile());

//content service
var test = Services.ContentService.GetContentOfContentType(1068);

So, how did they perform? Let's have a look.

Method	Avg Time (ms)	Avg Under Load (ms)	Difference (ms)	% Worse (rounded to nearest %)
Direct Children	218.9102	288.3253	69.41438	32%
Bad Linq	0.17853	0.235	0.05497	31%
Descendants From Root	0.03694	0.0824	0.04546	123%
XPATH Greedy	0.03571	0.04321	0.00739	21%
XPATH Efficient	0.03521	0.0546	0.01939	55%
TypedSearch	583.27079	N/A	N/A	N/A
Examine	3.59746	7.1039	3.50644	97%
Content Service	1991.05889	N/A	N/A	N/A

So how did we do? Well, the content service is clearly the worst performer, as it hits the database quite heavily. It’s over 3 times slower than the next worst, which is TypedSearch. Unsurprisingly, the XPath queries are the fastest. The performance on most of the other methods is actually pretty acceptable.

It’s interesting to note that both the Content Service and TypedSearch start to crash under load, returning out of memory exceptions fairly quickly. Direct children doesn’t come off that great, but I think that’s because it must pre-load all of the children.
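The last two columns in these tables are derived from the first two. As a sketch of the arithmetic (the LoadComparison class is just an illustrative name, not part of the test code):

```csharp
using System;

// Hypothetical helper showing how the derived table columns can be calculated.
public static class LoadComparison
{
    // "Difference (ms)": how much longer the query took under load.
    public static double Difference(double averageMs, double underLoadMs)
    {
        return underLoadMs - averageMs;
    }

    // "% Worse": the difference as a percentage of the unloaded average,
    // rounded to the nearest whole percent.
    public static double PercentWorse(double averageMs, double underLoadMs)
    {
        if (averageMs <= 0) throw new ArgumentOutOfRangeException(nameof(averageMs));

        var ratio = (underLoadMs - averageMs) / averageMs;
        return Math.Round(ratio * 100, MidpointRounding.AwayFromZero);
    }
}
```

Feeding in the Examine row (3.59746 ms normally, 7.1039 ms under load) gives a difference of roughly 3.506 ms and 97% worse, matching the table.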

Next up, how do they fare when we loop through all 5000 returned things?

Method	Avg Time (ms)	Avg Under Load (ms)	Difference (ms)	% Worse (rounded to nearest %)
Direct Children	202.38049	287.6374	85.25691	42%
Bad Linq	265.64	226.8708	66.8801	29%
Descendants From Root	226.64352	281.7007	55.05718	24%
XPATH Greedy	207.50742	260.0227	52.51528	25%
XPATH Efficient	198.73297	276.3727	77.63973	39%
TypedSearch	547.66416	N/A	N/A	N/A
Examine	352.2925	1694.7278	1342.4353	381%
Content Service	1933.66256	N/A	N/A	N/A

So, looping through the children isn’t much different, presumably because they’re all loaded in advance, and the same goes for the Content Service (which is still the slowest). All the others are slower. It’s interesting to note that looping through 5000 records in Examine is pretty slow, and it performs REALLY badly under load.

That said, this is a pretty contrived test. In the real world, it’s unlikely you’ll need to get 5000 records and loop through them all at once, unless you’re a crazy person maybe (or generating a VERY comprehensive sitemap). For most simple content queries that don’t return insane amounts of data, you can happily use Linq, direct children and descendants (if used CAREFULLY). The XPath methods are the fastest though, especially if you can craft a reasonably efficient query.

Getting the Latest Things

Now we’ll try something more real world. Let’s say we want the latest 10 blog posts. That’s something you see a lot. This is a good example, as the more blog posts that you have, the slower the queries will become. For these tests, I binned the Content Service, as we’ve already proved that it’s crap for querying.

For this one, I’m just looping through the records. For the curious, the code for each method is below:

//bad linq
var latest = Umbraco.TypedContentAtRoot().First().Descendants().Where(a => a.DocumentTypeAlias == "blogPost").OrderByDescending(a => a.CreateDate).Take(10);

//all children
var latest = Umbraco.TypedContent(1085).Children().OrderByDescending(a => a.CreateDate).Take(10);

//get by descendant
var test = Umbraco.TypedContentAtRoot().DescendantsOrSelf("BlogPost").OrderByDescending(a => a.CreateDate).Take(10);

//greedy XPath
var test = Umbraco.TypedContentAtXPath("//BlogPost [@isDoc]").OrderByDescending(a => a.CreateDate).Take(10);

//efficient XPath
var test = Umbraco.TypedContentAtXPath("root/Home [@isDoc]/BlogPostRepository [@isDoc]/BlogPost [@isDoc]").OrderByDescending(a => a.CreateDate).Take(10);

//typed search
var criteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = criteria.Field("nodeTypeAlias", "blogPost").And().OrderByDescending("createDate");

var test = Umbraco.TypedSearch(query.Compile()).Take(10);

//pure Examine
var searchCriteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = searchCriteria.Field("nodeTypeAlias", "blogPost").And().OrderByDescending("createDate");

var test = ExamineManager.Instance.DefaultSearchProvider.Search(query.Compile()).Take(10);

And the results were as follows:

Method	Avg Time (ms)	Avg Under Load (ms)	Difference (ms)	% Worse (rounded to nearest %)
Direct Children	208.32397	512.3234	303.99943	146%
Bad Linq	239.97	571.615	331.644	138%
Descendants From Root	235.54425	547.1448	311.60055	132%
XPATH Greedy	211.44828	509.9365	298.48822	141%
XPATH Efficient	206.1111	494.3311	288.22	140%
TypedSearch	546.99595	N/A	N/A	N/A
Examine	4.32127	8.2108	3.88953	90%

Notably, every single method apart from Examine is as bad as or worse than looping through all the records. This is because in order to do the Linq sorting, all of the records must first be loaded into memory and sorted. The performance under load is also MUCH worse. Pure Examine, however, is a BEAST. It’s 48 times faster than the next fastest method, and 126 times faster than the slowest method.
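To see why the sort hurts so much, here is a small stand-alone sketch using plain LINQ to Objects rather than Umbraco. The Post class and the counting callback are invented for the demo. The point is only that OrderByDescending has to read every item from the source before Take(10) can hand back its first result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo, unrelated to the Umbraco API itself.
public static class SortCostDemo
{
    // Stand-in for a blog post; only the creation date matters here.
    public sealed class Post
    {
        public int Id { get; set; }
        public DateTime CreateDate { get; set; }
    }

    // Lazily yields `count` posts and calls `onRead` each time the consumer pulls one.
    public static IEnumerable<Post> Posts(int count, Action onRead)
    {
        var start = new DateTime(2016, 1, 1);
        for (var i = 0; i < count; i++)
        {
            onRead();
            yield return new Post { Id = i, CreateDate = start.AddMinutes(i) };
        }
    }

    // Number of source items read to get the latest `take` posts (sorted).
    public static int ItemsReadForLatest(int total, int take)
    {
        var read = 0;
        var latest = Posts(total, () => read++)
            .OrderByDescending(p => p.CreateDate)
            .Take(take)
            .ToList();
        return read;
    }

    // Number of source items read to get the first `take` posts (no sort).
    public static int ItemsReadForFirst(int total, int take)
    {
        var read = 0;
        var first = Posts(total, () => read++)
            .Take(take)
            .ToList();
        return read;
    }
}
```

With 5000 posts, asking for the latest 10 reads all 5000 items, while taking 10 without a sort stops after 10. The in-memory query methods are in the same position as the sorted case, which is why their "latest 10" timings look like the "loop through everything" timings.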

I’m surprised by just how badly TypedSearch behaves, as it’s a wrapper round Examine. But the secret here is that Pure Examine has special methods for Skip and Take that make it super fast, as it doesn’t need to load everything, but the TypedSearch method does not expose these.

I added some code to the Examine test to use Umbraco.TypedContent() to get each of the 10 returned items, and it was still blisteringly quick. So the takeaway here is that if you need a subset of your content, and ordering is important, Examine is the way to go.

If you’re building navigation, for example, and listing children and sorting them, then as long as your section doesn’t have thousands of sub-pages, using the Children method with a sort (if needed) will be fine.

Searching All the Things

Finally, I tested a simple search. I searched for all blog posts that contain “testing” in the title (100 of the 5000 blog posts on my test site), ordered them by latest, and then took 10 records, to simulate a properly paged search.

In this case, the obvious choice is probably going to be Examine, as that’s what it was built for, but for the sake of argument, I’ve done it using some other methods too.

Once again, here’s some code:

//search with linq
var test = Umbraco.TypedContentAtRoot().DescendantsOrSelf("BlogPost").Where(a => a.Name.Contains("testing")).OrderByDescending(a => a.CreateDate).Take(10);

foreach (var item in test)
{

}

//search with xpath
var test = Umbraco.TypedContentAtXPath("root/Home/BlogPostRepository/BlogPost [@isDoc and contains(@nodeName, 'testing')]").OrderByDescending(a => a.CreateDate).Take(10);

foreach (var item in test)
{

}
      
//search with typed search
var criteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = criteria.Field("nodeTypeAlias", "blogPost").And().Field("nodeName", "testing").And().OrderByDescending("createDate");

var test = Umbraco.TypedSearch(query.Compile()).Take(10);

foreach (var item in test)
{

}

//search with pure examine   
var searchCriteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = searchCriteria.Field("nodeTypeAlias", "blogPost").And().Field("nodeName", "testing").And().OrderByDescending("createDate");

var test = ExamineManager.Instance.DefaultSearchProvider.Search(query.Compile()).Take(10);

foreach (var item in test)
{

}

//search with pure examine and fetch IPublishedContent
var searchCriteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = searchCriteria.Field("nodeTypeAlias", "blogPost").And().Field("nodeName", "testing").And().OrderByDescending("createDate");

var test = ExamineManager.Instance.DefaultSearchProvider.Search(query.Compile()).Take(10);

foreach (var item in test)
{
    var content = Umbraco.TypedContent(item.Id);
}

And the results:

Method	Avg Time (ms)	Avg Under Load (ms)	Difference (ms)	% Worse (rounded to nearest %)
Linq	260.28506	599.4626	339.17754	130%
XPath	67.59894	127.5567	59.95776	87%
TypedSearch	121.47725	412.2094	290.73215	239%
Pure Examine	2.09768	4.4403	2.34262	112%
Pure Examine and Get the IPublishedContent	2.75111	5.9849	3.23379	118%

And the results are pretty much as expected. Pure Examine wins, and if you need to get the IPublishedContent to process, it’s not all that much slower in the grand scheme of things. Linq, as expected, is a bit slow due to having to load everything into memory before filtering and sorting. XPath isn’t that bad to be honest, and TypedSearch is, again, a bit crap for something that wraps Examine.

And Finally….

So, in conclusion, for most small queries on your site, well-crafted Linq or efficient XPath is the way to go. For anything more complex, where you need to actually search/filter, or do efficient paging, Examine is DEFINITELY the way to go. In most real world situations, you won’t see quite as extreme times as I got here, but that was the point of the tests: to see how well the methods performed in the worst case scenario!

I was genuinely surprised by just how badly TypedSearch performed. I may have to have a look at the code for that and submit a PR to see if I can make it more efficient, as for most searches you might want to use it for, pure Examine is MUCH faster.

I hope this is useful and helps you to see the difference in performance between the different ways of querying Umbraco!
