Testing the Performance of Querying Umbraco

In Umbraco, it’s common knowledge that there are always AT LEAST three ways of doing the same thing, and it’s not always obvious to newbies which is the right one to use.

One of the biggest culprits for this is querying Umbraco for content. These are all of the ways of getting content that I know about: 

  • The old Node API (deprecated, you shouldn’t be using this)
  • The old Document API (also, deprecated)
  • The Content Service (not for the front end, designed for in code CRUD operations)
  • The Dynamic Content methods in the UmbracoHelper (I’m not a fan of these, I prefer the strongly typed ones)
  • The Strongly Typed methods in the UmbracoHelper
  • The TypedSearch method of the UmbracoHelper, that wraps Examine
  • Examine

That’s a LOT of different ways to do the same thing! Recently at uWestFest, Jason Prothero and Mark Bowser did a very interesting talk about Anti-Patterns that included some tests on querying Umbraco with the various methods. I thought it would be nice to expand on this a bit and try to help people decide which method is right for them.

 

Here Comes the Science Part (Sort of)

For this highly scientific and rigorous testing (ahem) I set up the Fanoe starter kit, and I added 5000 blog posts. I then created a bunch of tests based on the most commonly used ways of getting content, plus the Content Service. BUT YOU SHOULDN’T USE THE CONTENT SERVICE FOR QUERYING!!!! I hear you shout in horror, and you’re right. But I thought it would be good to include it (at least early on) to illustrate just how bad it is.

I settled on using the following methods to test:

  • Getting the direct children of a node
  • Using a nasty LINQ query on the TypedContent API
  • Using the Descendants method on the root of the site
  • Using TypedContentAtXPath with an inefficient query
  • Using TypedContentAtXPath with an efficient query
  • Using the TypedSearch method of the UmbracoHelper
  • Using raw Examine

I ignored the two deprecated methods and the dynamic API, as they’re either outdated or not recommended.

I ran each test multiple times and timed the code, and then took an average of the results. I also simulated heavy load on the site to see how the methods perform under load, by firing 1000 simultaneous async requests at each method and averaging the results to see how much of a difference that makes. I used the site with 5000 blog posts, as that provides a fairly extreme example, and it also magnifies the performance differences and makes it easier to see how good/bad things are.
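To make the measurement approach concrete, here is a minimal sketch of a timing harness along those lines. It is not the code used for these tests: the QueryTimer class and its method names are hypothetical, and the "request" is just a stand-in delegate for whatever call hits the page or method under test.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of Umbraco: times a piece of code and averages the results.
public static class QueryTimer
{
    // Runs the action `runs` times in sequence and returns the mean elapsed time in milliseconds.
    public static double AverageMs(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");

        double total = 0;
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            total += sw.Elapsed.TotalMilliseconds;
        }
        return total / runs;
    }

    // Starts `concurrent` timed requests at once and returns the mean time per request.
    public static async Task<double> AverageUnderLoadMs(Func<Task> request, int concurrent)
    {
        if (concurrent <= 0) throw new ArgumentOutOfRangeException("concurrent");

        // ToArray forces every task to start before any of them is awaited.
        Task<double>[] tasks = Enumerable.Range(0, concurrent).Select(async _ =>
        {
            var sw = Stopwatch.StartNew();
            await request();
            sw.Stop();
            return sw.Elapsed.TotalMilliseconds;
        }).ToArray();

        double[] times = await Task.WhenAll(tasks);
        return times.Average();
    }
}
```

In a setup like the one described, AverageMs would wrap the query being tested, and AverageUnderLoadMs would be called with 1000 and a delegate that makes an async HTTP request to a page running that query.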

I ran several sets of tests. First, I tested just getting all of the blog posts (not looping through the returned content), then getting all the records and looping through them all. Next, I tested getting the most recent 10 records, with and without looping. Finally, I tested performing a search to get all records, and a search to get all records in an efficient paged manner. For the first two sets of tests, I also tested the Content Service, just to show how badly it ran.

I noticed a few interesting things while testing. Umbraco seems to do some per request caching for the TypedContent methods of the UmbracoHelper. The first time you run the query it will take longer, but any subsequent identical queries on the same request are MUCH faster. There’s still an overhead for needlessly repeating the same code, but Umbraco mitigates it a bit. I also noticed that the TypedSearch method is really bad (more on that later). Finally, the inefficient XPATH query isn’t always slower than the efficient one. On further investigation, that turned out to be because the site is mostly blog posts. Trying similar queries on a more complex site structure shows that the efficient query is noticeably faster than the inefficient one.

The machine I tested on was a 2.8GHz i7 with 8GB of RAM.

Getting ALL the Things

First up, getting all of the blog posts for the site (with and without looping through the records). The tests used the following code:

//bad linq
var test = Umbraco.TypedContentAtRoot().First().Descendants().Where(a => a.DocumentTypeAlias == "BlogPost");

//all children
var test = Umbraco.TypedContent(1085).Children();

//get by descendant
var test = Umbraco.TypedContentAtRoot().DescendantsOrSelf("BlogPost");

//greedy XPath
var test = Umbraco.TypedContentAtXPath("//BlogPost [@isDoc]");

//efficient XPath
var test = Umbraco.TypedContentAtXPath("root/Home/BlogPostRepository/BlogPost [@isDoc]");

//typed Search
var criteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = criteria.Field("nodeTypeAlias", "blogPost");

var test = Umbraco.TypedSearch(query.Compile());

//pure Examine
var searchCriteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = searchCriteria.Field("nodeTypeAlias", "blogPost");

var test = ExamineManager.Instance.DefaultSearchProvider.Search(query.Compile());

//content service
var test = Services.ContentService.GetContentOfContentType(1068);

So, how did they perform? Let’s have a look.

Method                | Avg Time (ms) | Avg Under Load (ms) | Difference (ms) | % Worse (rounded to nearest %)
----------------------|---------------|---------------------|-----------------|-------------------------------
Direct Children       | 218.9102      | 288.3253            | 69.41438        | 32%
Bad Linq              | 0.17853       | 0.235               | 0.05497         | 31%
Descendants From Root | 0.03694       | 0.0824              | 0.04546         | 123%
XPATH Greedy          | 0.03571       | 0.04321             | 0.00739         | 21%
XPATH Efficient       | 0.03521       | 0.0546              | 0.01939         | 55%
TypedSearch           | 583.27079     | N/A                 | N/A             | N/A
Examine               | 3.59746       | 7.1039              | 3.50644         | 97%
Content Service       | 1991.05889    | N/A                 | N/A             | N/A

So how did we do? Well, the content service is clearly the worst performer, as it hits the database quite heavily. It’s over 3 times slower than the next worst, which is TypedSearch. Unsurprisingly, the XPath queries are the fastest. The performance on most of the other methods is actually pretty acceptable.

It’s interesting to note that both the Content Service and TypedSearch start to crash under load, returning out of memory exceptions fairly quickly. Direct children doesn’t come off that great, but I think that’s because it must pre-load all of the children.
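For anyone wanting to reproduce the last two columns of the results, they are derived from the two averages. The sketch below shows that arithmetic; the LoadComparison class and its method names are made up for illustration, and the rounding mode is an assumption.

```csharp
using System;

// Hypothetical helper showing how the "Difference" and "% Worse" columns can be derived.
public static class LoadComparison
{
    // Extra time per call when the site is under load, in milliseconds.
    public static double DifferenceMs(double avgMs, double underLoadMs)
    {
        return underLoadMs - avgMs;
    }

    // How much slower the call is under load, as a percentage of the normal average,
    // rounded to the nearest whole percent.
    public static int PercentWorse(double avgMs, double underLoadMs)
    {
        double ratio = (underLoadMs - avgMs) / avgMs;
        return (int)Math.Round(ratio * 100, MidpointRounding.AwayFromZero);
    }
}
```

Using the Examine figures from the first set of results, LoadComparison.DifferenceMs(3.59746, 7.1039) gives roughly 3.50644 and LoadComparison.PercentWorse(3.59746, 7.1039) gives 97.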

Next up, how do they fare when we loop through all 5000 returned things?

Method                | Avg Time (ms) | Avg Under Load (ms) | Difference (ms) | % Worse (rounded to nearest %)
----------------------|---------------|---------------------|-----------------|-------------------------------
Direct Children       | 202.38049     | 287.6374            | 85.25691        | 42%
Bad Linq              | 265.64        | 226.8708            | 66.8801         | 29%
Descendants From Root | 226.64352     | 281.7007            | 55.05718        | 24%
XPATH Greedy          | 207.50742     | 260.0227            | 52.51528        | 25%
XPATH Efficient       | 198.73297     | 276.3727            | 77.63973        | 39%
TypedSearch           | 547.66416     | N/A                 | N/A             | N/A
Examine               | 352.2925      | 1694.7278           | 1342.4353       | 381%
Content Service       | 1933.66256    | N/A                 | N/A             | N/A

So, looping through the children isn’t much different, presumably because they’re all loaded in advance, and the same goes for the Content Service (which is still the slowest). All the others are slower. It’s interesting to note that looping through 5000 records in Examine is pretty slow, and it performs REALLY badly under load.

That said, this is a pretty contrived test. In the real world, it’s unlikely you’ll need to get 5000 records and loop through them all at once, unless you’re a crazy person maybe (or generating a VERY comprehensive sitemap). For most simple content queries that don’t return insane amounts of data, you can happily use Linq, direct children and descendants (if used CAREFULLY). The XPath methods are the fastest though, especially if you can craft a reasonably efficient query.

Getting the Latest Things

Now we’ll try something more real world. Let’s say we want the latest 10 blog posts. That’s something you see a lot. This is a good example, as the more blog posts that you have, the slower the queries will become. For these tests, I binned the Content Service, as we’ve already proved that it’s crap for querying on.

For this one, I’m just looping through the records. For the curious, the code for each method is below:

//bad linq
var latest = Umbraco.TypedContentAtRoot().First().Descendants().Where(a => a.DocumentTypeAlias == "BlogPost").OrderByDescending(a => a.CreateDate).Take(10);

//all children
var latest = Umbraco.TypedContent(1085).Children().OrderByDescending(a => a.CreateDate).Take(10);

//get by descendant
var test = Umbraco.TypedContentAtRoot().DescendantsOrSelf("BlogPost").OrderByDescending(a => a.CreateDate).Take(10);

//greedy XPath
var test = Umbraco.TypedContentAtXPath("//BlogPost [@isDoc]").OrderByDescending(a => a.CreateDate).Take(10);

//efficient XPath
var test = Umbraco.TypedContentAtXPath("root/Home [@isDoc]/BlogPostRepository [@isDoc]/BlogPost [@isDoc]").OrderByDescending(a => a.CreateDate).Take(10);

//typed search
var criteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = criteria.Field("nodeTypeAlias", "blogPost").And().OrderByDescending("createDate");

var test = Umbraco.TypedSearch(query.Compile()).Take(10);

//pure Examine
var searchCriteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = searchCriteria.Field("nodeTypeAlias", "blogPost").And().OrderByDescending("createDate");

var test = ExamineManager.Instance.DefaultSearchProvider.Search(query.Compile()).Take(10);

And the results were as follows:

Method                | Avg Time (ms) | Avg Under Load (ms) | Difference (ms) | % Worse (rounded to nearest %)
----------------------|---------------|---------------------|-----------------|-------------------------------
Direct Children       | 208.32397     | 512.3234            | 303.99943       | 146%
Bad Linq              | 239.97        | 571.615             | 331.644         | 138%
Descendants From Root | 235.54425     | 547.1448            | 311.60055       | 132%
XPATH Greedy          | 211.44828     | 509.9365            | 298.48822       | 141%
XPATH Efficient       | 206.1111      | 494.3311            | 288.22          | 140%
TypedSearch           | 546.99595     | N/A                 | N/A             | N/A
Examine               | 4.32127       | 8.2108              | 3.88953         | 90%

Notably, every single method apart from Examine is as bad as or worse than looping through all the records. This is because in order to do the Linq sorting, all of the records must first be loaded into memory and sorted. The performance under load is also MUCH worse. Pure Examine, however, is a BEAST. It’s 48 times faster than the next fastest method, and 126 times faster than the slowest method.

I’m surprised by just how badly TypedSearch behaves, as it’s a wrapper around Examine. The secret here is that pure Examine has special methods for Skip and Take that make it super fast, as it doesn’t need to load everything, whereas the TypedSearch method does not expose these.
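The difference between "sort everything in Linq, then take 10" and "take 10 from results that arrive already ordered" can be shown with plain Linq and no Umbraco at all. The sketch below is only an analogy, not how Examine or TypedSearch work internally: the EnumerationDemo class is hypothetical, and Source simply stands in for a set of 5000 content items while counting how many of them get pulled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: counts how many items each approach pulls from the source.
public static class EnumerationDemo
{
    // Number of items handed out by Source since the last reset.
    public static int Pulled;

    // Stand-in for a content source: lazily yields ids 1..count, counting each one.
    public static IEnumerable<int> Source(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Sorting in Linq has to read every item before it can return the first one.
    public static int PulledBySortedTake(int count, int take)
    {
        Pulled = 0;
        Source(count).OrderByDescending(x => x).Take(take).ToList();
        return Pulled;
    }

    // If the source is already in the right order, Take stops after `take` items.
    public static int PulledByLazyTake(int count, int take)
    {
        Pulled = 0;
        Source(count).Take(take).ToList();
        return Pulled;
    }
}
```

With 5000 items and a take of 10, PulledBySortedTake reads all 5000 items, while PulledByLazyTake reads only 10. That is the same shape of saving you get when the ordering happens in the index rather than in Linq.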

I added some code to the Examine test to use Umbraco.TypedContent() to get each of the 10 returned items, and it was still blisteringly quick. So the takeaway here is that if you need a subset of your content, and ordering is important, Examine is the way to go.

If you’re building navigation, for example, and listing children and sorting them, then as long as your section doesn’t have thousands of sub-pages, using the Children method with a sort (if needed) will be fine.

Searching All the Things

Finally, I tested a simple search. I searched for all blog posts that contain “testing” in the title (100 of the 5000 blog posts on my test site), ordered them by latest, and then took 10 records, to simulate a properly paged search.

In this case, the obvious choice is probably going to be Examine, as that’s what it was built for, but for the sake of argument, I’ve done it using some other methods too.

Once again, here’s some code:

//search with linq
var test = Umbraco.TypedContentAtRoot().DescendantsOrSelf("BlogPost").Where(a => a.Name.Contains("testing")).OrderByDescending(a => a.CreateDate).Take(10);

foreach (var item in test)
{

}

//search with xpath
var test = Umbraco.TypedContentAtXPath("root/Home/BlogPostRepository/BlogPost [@isDoc and contains(@nodeName, 'testing')]").OrderByDescending(a => a.CreateDate).Take(10);

foreach (var item in test)
{

}
      
//search with typed search
var criteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = criteria.Field("nodeTypeAlias", "blogPost").And().Field("nodeName", "testing").And().OrderByDescending("createDate");

var test = Umbraco.TypedSearch(query.Compile()).Take(10);

foreach (var item in test)
{

}

//search with pure examine   
var searchCriteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = searchCriteria.Field("nodeTypeAlias", "blogPost").And().Field("nodeName", "testing").And().OrderByDescending("createDate");

var test = ExamineManager.Instance.DefaultSearchProvider.Search(query.Compile()).Take(10);

foreach (var item in test)
{

}

//search with pure examine and fetch ipublishedcontent
var searchCriteria = ExamineManager.Instance.DefaultSearchProvider.CreateSearchCriteria(IndexTypes.Content);

var query = searchCriteria.Field("nodeTypeAlias", "blogPost").And().Field("nodeName", "testing").And().OrderByDescending("createDate");

var test = ExamineManager.Instance.DefaultSearchProvider.Search(query.Compile()).Take(10);

foreach (var item in test)
{
    var content = Umbraco.TypedContent(item.Id);
}

And the results:

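Method                                     | Avg Time (ms) | Avg Under Load (ms) | Difference (ms) | % Worse (rounded to nearest %)
-------------------------------------------|---------------|---------------------|-----------------|-------------------------------
Linq                                       | 260.28506     | 559.4626            | 339.17754       | 130%
XPath                                      | 67.59894      | 127.5567            | 59.95776        | 87%
TypedSearch                                | 121.47725     | 412.2094            | 290.73215       | 239%
Pure Examine                               | 2.09768       | 4.4403              | 2.34262         | 112%
Pure Examine and Get the IPublishedContent | 2.75111       | 5.9849              | 3.23379         | 118%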

And the results are pretty much as expected. Pure Examine wins, but if you need to get the IPublishedContent to process, it’s not all that much slower in the grand scheme of things. Linq, as expected, is a bit slow due to having to load everything into memory, multiple times. XPath isn’t that bad to be honest, and TypedSearch is again, a bit crap for something that wraps Examine.

And Finally….

So, in conclusion, for most small queries on your site, well-crafted Linq or efficient XPath is the way to go. For anything more complex, where you need to actually search/filter, or do efficient paging, Examine is DEFINITELY the way to go. In most real world situations, you won’t see quite as extreme times as I got here, but that was the point of the tests: to see how well the methods performed in the worst case scenario!

I was genuinely surprised by just how badly TypedSearch performed. I may have to have a look at the code for that and submit a PR to see if I can make it more efficient, as for most searches you might want to use it for, pure Examine is MUCH faster.

I hope this is useful and helps you to see the difference in performance between the different ways of querying Umbraco!

About the Author

Tim Payne is a freelance developer, based in the north of England with over 15 years of experience who’s been working with Umbraco since v4. When he’s not slaving in the code mines, he can be found participating in crazy challenges because they seemed like a good idea at the time, or running around after his tiny daughter.
