Making Searches More Friendly with Hunspell Dictionaries

For a lot of the sites we build at Limbo, we have some kind of global search that visitors can use to find the content they are looking for.

That is at least the goal. However, when building these searches, we often see that visitors have misspelled one or more words, and then end up not finding what they were looking for.

Each word may also have a number of different variations or inflections - eg. if the visitor is searching for bikes (plural), the search results will not include pages only including bike (singular).

As some of our clients required us to handle those scenarios, we started looking at various solutions, and ended up using something called Hunspell dictionaries. To quote Wikipedia:

Hunspell is a spell checker and morphological analyser designed for languages with rich morphology and complex word compounding and character encoding, originally designed for the Hungarian language.

Hunspell and Hunspell dictionaries let us do spell checks, get suggestions for misspelled words, and use other useful grammatical features - eg. what is referred to as stemming and morphing (we'll get back to that later).

  • Spell Checks
    This lets us check whether a given word is spelled correctly or not. For instance, we could show a message to the user that the entered word is likely spelled incorrectly.

  • Suggestions
    If a given word is spelled incorrectly, we can ask the dictionary for words that are spelled in a similar way to the misspelled word. So instead of showing a message to the user that the entered word is spelled incorrectly, we can ask them if they meant the most likely suggestion instead.

  • Stemming
    Stemming is about finding the stem of a word - eg. going from running to run, or from bikes to bike. On its own, we can't really use this for much, but it's an intermediary step needed for morphing.

  • Morphing
    Morphing (or perhaps more typically referred to as inflection) is the concept of finding the "variations" or forms of a word based on its stem. Eg. if the user enters running, we can use the dictionary to determine that the stem is run, and by morphing the stem, we can then search for run, ran, runs and running instead of only searching for running, as the user originally entered.

    Unfortunately morphing is not widely supported, so this article will show examples using a Danish dictionary file. More about that later.

Third Party Frameworks

Hunspell is originally a C++ executable and library. Applications like Open Office, Chrome, Firefox and others all use Hunspell dictionaries to provide spell checking as you type.

Since Umbraco is part of the .NET ecosystem, we couldn't use the Hunspell C++ library directly, so we had to look at alternatives for .NET - and luckily there are a few .NET ports. We ended up looking at the following two third party libraries:

  • NHunspell
    NHunspell is a .NET wrapper for the original C++ Hunspell libraries, and as such it targets the older .NET Framework. The latest release of NHunspell was back in 2015, but to this day, it still seems to be the .NET port with the widest feature set.

    At the time we were looking for a port we could use with Umbraco 7, so limiting ourselves to the .NET Framework wasn't really an issue, so we ended up choosing NHunspell.

  • WeCantSpell.Hunspell
    WeCantSpell.Hunspell is a standalone port for .NET Standard, which means it can be used with both the .NET Framework and .NET 5. Like with NHunspell, WeCantSpell.Hunspell no longer seems to be actively maintained, with the latest release back in 2018.

    According to their own performance benchmarks, WeCantSpell.Hunspell is quite a bit slower than NHunspell. The benchmarks look at how many dictionaries you can load or how many words you can check in a single second, which isn't really a real use case, so the readme for WeCantSpell.Hunspell says the performance of the library is "Good enough I guess". With the way we're using the package, I agree that the difference in performance, if any, will only have a rather small effect.

It's a big downside that both packages are no longer maintained. But as NHunspell still managed to do what we needed it for, and it worked better than WeCantSpell.Hunspell, we ended up choosing NHunspell for Umbraco 7.

However when picking this up again for Umbraco 9, which runs on .NET 5, we could no longer use NHunspell (as it targets the .NET Framework), so it kind of forced our hand in choosing WeCantSpell.Hunspell instead.

Hunspell Concept

Regardless of which Hunspell package you're using, you need a dictionary file (.dic) as well as a corresponding affix file (.aff) for the language you wish to use it for (eg. one set of files for English, and another set of files for Danish).

The dictionary file contains a long list of all the words in the given language. Each line in the file describes a single word along with some rules for how the word can be transformed. For instance in the English (British).dic file, the line for bike is bike/RMSGD, where RMSGD refers to the rules for how bike can be transformed/inflected.

The affix file then lists all the different rules. As the languages differ, the English (British).aff file is very different from the Danish.aff file.
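
To make the format a bit more concrete, here is a minimal sketch of how a single line from a .dic file could be split into the word and its flags. The DictionaryEntry class and its Parse method are hypothetical names I'm using for illustration - they are not part of Hunspell or either of the .NET packages. The sketch assumes single-character flags (the Hunspell default), so it may not fit dictionaries that declare another flag format in their affix file, and it ignores escaped slashes and any extra fields on the line:

```csharp
using System;

// Hypothetical helper for illustration only; not part of any Hunspell package
public class DictionaryEntry {

    // The word itself (eg. "bike")
    public string Word { get; }

    // The flags referring to rules in the affix file (eg. R, M, S, G and D)
    public char[] Flags { get; }

    public DictionaryEntry(string word, char[] flags) {
        Word = word;
        Flags = flags;
    }

    // Splits a line like "bike/RMSGD" into the word and its flags
    // (assumes single-character flags and ignores escaped slashes)
    public static DictionaryEntry Parse(string line) {
        int pos = line.IndexOf('/');
        if (pos < 0) return new DictionaryEntry(line.Trim(), new char[0]);
        string word = line.Substring(0, pos).Trim();
        char[] flags = line.Substring(pos + 1).Trim().ToCharArray();
        return new DictionaryEntry(word, flags);
    }

}
```

Calling DictionaryEntry.Parse with bike/RMSGD gives an entry where Word is bike and Flags holds the five individual flags. Each flag would then have to be looked up in the affix file to see what it actually does.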

The dictionary file and affix file may also be accompanied by other files for doing hyphenation and finding synonyms, but this is out of the scope of this article.

Later in this article I will describe our Skybrud.TextAnalysis package. You can see the package's documentation on how to find a dictionary for your language.

Hunspell Operations

Loading a Dictionary File

Loading a dictionary file works a bit differently in the two packages. For instance, since NHunspell is a wrapper for an unmanaged C++ library, we need to wrap it in a using block like this:

@using System.Web.Hosting
@using NHunspell

@{
   
    // Map the path to the dictionary and affix files
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/English (British).dic");
    string aff = HostingEnvironment.MapPath("~/App_Data/Hunspell/English (British).aff");

    // Load the affix and dictionary files
    using (Hunspell hunspell = new Hunspell(aff, dic)) {
        
        // Do something useful with the dictionary

    }

}

Using the NHunspell package, the entry point for working with the dictionary files is the Hunspell class. In the example above, we're loading the dictionary files directly from the disk, via the paths to the two files respectively. The Hunspell class also has other ways to load the dictionary files - eg. from a stream.

The WeCantSpell.Hunspell package instead revolves around the WordList class. We can get a new instance by calling the static WordList.CreateFromFiles method with the path to the .dic file, and it will load both the .dic file as well as the .aff file assuming it exists in the same directory with a similar name.

@using System.Web.Hosting
@using WeCantSpell.Hunspell

@{

    // Map the path to the dictionary
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/English (British).dic");

    // Load the dictionary file (it will automatically find the affix file)
    WordList dictionary = WordList.CreateFromFiles(dic);

}

Given the way the WordList is implemented, there is no need to wrap it in a using block or dispose it once you're done using it.

Performing Spell Checks

Starting out simple, both packages let you check if a given word is spelled correctly - eg. here with NHunspell, we can check whether the word Recommendation is spelled correctly:

@using System.Web.Hosting
@using NHunspell

@{
   
    // Map the path to the dictionary and affix files
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/English (British).dic");
    string aff = HostingEnvironment.MapPath("~/App_Data/Hunspell/English (British).aff");

    // Load the affix and dictionary files
    using (Hunspell hunspell = new Hunspell(aff, dic)) {
        
        // Check whether the word is spelled correctly
        bool correct = hunspell.Spell("Recommendation");
        
        // Write the result to the page
        <p>Recommendation is spelled @(correct ? "correctly" : "incorrectly")</p>

    }

}

And the same with WeCantSpell.Hunspell:

@using System.Web.Hosting
@using WeCantSpell.Hunspell

@{

    // Map the path to the dictionary
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/English (British).dic");

    // Load the dictionary file (it will automatically find the affix file)
    WordList dictionary = WordList.CreateFromFiles(dic);

    // Check whether the word is spelled correctly
    bool correct = dictionary.Check("Recommendation");
        
    // Write the result to the page
    <p>Recommendation is spelled @(correct ? "correctly" : "incorrectly")</p>

}

Finding suggestions for misspelled words

If a given word is spelled incorrectly, the Hunspell dictionaries can also be used to find suggestions for similar words. Here with NHunspell:

@using System.Web.Hosting
@using NHunspell

@{

    // Declare a variable with the word we wish to check
    string word = "programmr";
   
    // Map the path to the dictionary and affix files
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/English (British).dic");
    string aff = HostingEnvironment.MapPath("~/App_Data/Hunspell/English (British).aff");

    // Load the affix and dictionary files
    using (Hunspell hunspell = new Hunspell(aff, dic)) {

        // Check whether the word is spelled correctly
        bool correct = hunspell.Spell(word);

        if (correct) {

            // Write the result to the page
            <p><strong>@word</strong> is spelled correctly.</p>

        } else {

            // Write the result to the page
            <p><strong>@word</strong> is not spelled correctly.</p>

            // Get suggestions for a misspelled word
            List<string> suggestions = hunspell.Suggest(word);

            if (suggestions.Any()) {
                <h4>Suggestions</h4>
                <ul>
                    @foreach (string suggestion in suggestions) {
                        <li>@suggestion</li>
                    }
                </ul>
            } else {
                <p><em>No suggestions found.</em></p>
            }

        }

    }

}

And the same with WeCantSpell.Hunspell:

@using System.Web.Hosting
@using WeCantSpell.Hunspell

@{

    // Declare a variable with the word we wish to check
    string word = "programmr";

    // Map the path to the dictionary
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/English (British).dic");

    // Load the dictionary file (it will automatically find the affix file)
    WordList dictionary = WordList.CreateFromFiles(dic);

    // Check whether the word is spelled correctly
    bool correct = dictionary.Check(word);

    if (correct) {

        // Write the result to the page
        <p><strong>@word</strong> is spelled correctly.</p>

    } else {

        // Write the result to the page
        <p><strong>@word</strong> is not spelled correctly.</p>

        // Get suggestions for a misspelled word
        IEnumerable<string> suggestions = dictionary.Suggest(word);

        if (suggestions.Any()) {
            <h4>Suggestions</h4>
            <ul>
                @foreach (string suggestion in suggestions) {
                    <li>@suggestion</li>
                }
            </ul>
        } else {
            <p><em>No suggestions found.</em></p>
        }

    }

}

Finding the stem of a word

With the NHunspell package, we can first use the Spell method to check whether the word running is spelled correctly, and then the Stem method to find the stem.

In theory a word can have multiple meanings, and therefore the Stem method may also return multiple stem words.

@using System.Web.Hosting
@using NHunspell

@{

    // Declare a variable with the word we wish to check
    string word = "running";
   
    // Map the path to the dictionary and affix files
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/English (British).dic");
    string aff = HostingEnvironment.MapPath("~/App_Data/Hunspell/English (British).aff");

    // Load the affix and dictionary files
    using (Hunspell hunspell = new Hunspell(aff, dic)) {

        // Check whether the word is spelled correctly
        bool correct = hunspell.Spell(word);

        if (correct) {

            // Write the result to the page
            <p><strong>@word</strong> is spelled correctly.</p>

            // Find the stem(s) of the word
            List<string> stems = hunspell.Stem(word);
            
            <h4>Stem</h4>
            if (stems.Any()) {
                <ul>
                    @foreach (string stem in stems) {
                        <li>@stem</li>
                    }
                </ul>
            } else {
                <p><em>No stems found.</em></p>
            }

        } else {

            // Write the result to the page
            <p><strong>@word</strong> is not spelled correctly.</p>

            // Get suggestions for a misspelled word
            List<string> suggestions = hunspell.Suggest(word);
            
            <h4>Suggestions</h4>
            if (suggestions.Any()) {
                <ul>
                    @foreach (string suggestion in suggestions) {
                        <li>@suggestion</li>
                    }
                </ul>
            } else {
                <p><em>No suggestions found.</em></p>
            }

        }

    }

}

The approach for WeCantSpell.Hunspell changes a bit, as it doesn't have a similar Stem method. Instead we have to use the CheckDetails method (as opposed to the Check method we used earlier).

The CheckDetails method returns a bit more information. If the word is spelled correctly, we can get the stem via spell.Root:

@using System.Web.Hosting
@using WeCantSpell.Hunspell

@{

    // Declare a variable with the word we wish to check
    string word = "running";

    // Map the path to the dictionary
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/en-GB.dic");

    // Load the dictionary file (it will automatically find the affix file)
    WordList dictionary = WordList.CreateFromFiles(dic);

    // Check whether the word is spelled correctly
    SpellCheckResult spell = dictionary.CheckDetails(word);

    if (spell.Correct) {

        // Write the result to the page
        <p><strong>@word</strong> is spelled correctly.</p>

        // Get the stem from the spell result
        string stem = spell.Root;

        <h4>Stem</h4>
        <ul>
            <li>@stem</li>
        </ul>

    } else {

        // Write the result to the page
        <p><strong>@word</strong> is not spelled correctly.</p>

        // Get suggestions for a misspelled word
        IEnumerable<string> suggestions = dictionary.Suggest(word);

        if (suggestions.Any()) {
            <h4>Suggestions</h4>
            <ul>
                @foreach (string suggestion in suggestions) {
                    <li>@suggestion</li>
                }
            </ul>
        } else {
            <p><em>No suggestions found.</em></p>
        }

    }

}

Morphing

This is where things get a bit tricky. Like I mentioned earlier, morphing is not really widely supported. By using the native C++ executable, it is possible to call hunspell unfurl running to get the variations of running.

The NHunspell package features an Analyze method. When I initially started researching Hunspell dictionaries a few years back, I think I had this working with an English dictionary file. But honestly I can't remember for sure. When trying to do that again now, I can't get it to work. Maybe I used a different dictionary file back then, or maybe it didn't ever really work? 🤷‍♂️

I haven't been able to find similar functionality in the WeCantSpell.Hunspell package. 😢

Skybrud.TextAnalysis

To wrap the NHunspell package, add our own morphing implementation, and work around a few issues, I ended up creating my own Skybrud.TextAnalysis package, which you can download from NuGet:

Install-Package Skybrud.TextAnalysis

Version 1.x of the package wraps the NHunspell package, and may be used in Umbraco 7 and 8, but not Umbraco 9, since it's targeting the .NET Framework. Version 2.x of the package instead wraps the WeCantSpell.Hunspell package, which we can use for Umbraco 9. And since it's targeting .NET Standard, it may also be used with earlier versions of Umbraco.

For the remainder of this article, when I'm describing our Skybrud.TextAnalysis package, it's the 2.x version wrapping the WeCantSpell.Hunspell package.

You can find a bit of documentation for the package here, although the documentation is quite similar to the bits and pieces described throughout this article.

Loading the dictionary

The package revolves around the HunspellTextAnalyzer class, which serves as a wrapper for the underlying WeCantSpell.Hunspell package, and adds our own implementation on top of it.

You can load an instance of the HunspellTextAnalyzer from the .dic and .aff files - eg. for en-US in the example below:

@using System.Web.Hosting
@using Skybrud.TextAnalysis.Hunspell

@{
   
    // Map the path to the dictionary and affix files
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/en-US.dic");
    string aff = HostingEnvironment.MapPath("~/App_Data/Hunspell/en-US.aff");

    // Load a new text analyzer (Hunspell wrapper)
    HunspellTextAnalyzer analyzer = HunspellTextAnalyzer.CreateFromFiles(dic, aff);

}

Stemming and compound words

Compounding is the concept of putting two or more words together - eg. summer house is put together from summer and house. Some languages (like English) separate the words with a space, while other languages (like Danish) don't use a separator. For instance, summer house in Danish is sommerhus, not sommer hus.

In the Danish dictionary, some words like sommerhus and avisredaktør (newspaper editor) are explicitly defined in the dictionary, while other words are only partly defined. For instance the word webredaktør (web editor) isn't defined in the dictionary, but redaktør (editor) is defined in the dictionary along with a rule that it may have an optional prefix - eg. web so it becomes webredaktør.

As a result of this, stemming has some mixed results in Danish. Getting the stem of webredaktører (web editors) results in redaktør, and not webredaktør as one would expect.

To work around this issue, our own HunspellTextAnalyzer.Stem method finds the stem via the logic in the WeCantSpell.Hunspell package. But for each stem found, we check whether the stem is at the start of the input word. If it isn't, we prepend the starting value to the stem - eg. like this:

public HunspellStemResult[] Stem(string word) {

    List<HunspellStemResult> temp = new List<HunspellStemResult>();

    foreach (string stem in Hunspell.Stem(word)) {
        int pos = word.IndexOf(stem, StringComparison.InvariantCultureIgnoreCase);
        temp.Add(new HunspellStemResult(stem, pos > 0 ? word.Substring(0, pos) : null));
    }

    return temp.ToArray();

}

Our version of the Stem method returns an array of HunspellStemResult, which then has the Prefix (eg. web) and Stem (eg. redaktør) properties, as well as a Value property which is a mix of Prefix and Stem (eg. webredaktør). Then later when morphing, we only morph the stem, and then prepend the prefix to each of the variants returned by the morph operation.
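
To illustrate how the prefix ties stemming and morphing together, here is a small self-contained sketch. The StemResultSketch class is a simplified stand-in that I made up for this example - it is not the actual HunspellStemResult class from the package, and the real implementation may differ. It splits the prefix off in the same way as the Stem method above, and its ApplyPrefix method (also a hypothetical name) puts the prefix back in front of each variant:

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for illustration; not the package's HunspellStemResult
public class StemResultSketch {

    // Whatever comes before the stem in the input word (eg. "web"), or null
    public string Prefix { get; }

    // The stem as found by the dictionary (eg. "redaktør")
    public string Stem { get; }

    // The combined value of prefix and stem (eg. "webredaktør")
    public string Value => Prefix + Stem;

    public StemResultSketch(string word, string stem) {
        int pos = word.IndexOf(stem, StringComparison.InvariantCultureIgnoreCase);
        Prefix = pos > 0 ? word.Substring(0, pos) : null;
        Stem = stem;
    }

    // Prepends the prefix (if any) to each variant of the stem
    public List<string> ApplyPrefix(IEnumerable<string> variants) {
        List<string> temp = new List<string>();
        foreach (string variant in variants) temp.Add(Prefix + variant);
        return temp;
    }

}
```

With the input word webredaktører and the stem redaktør, Prefix becomes web and Value becomes webredaktør. If morphing the stem then gives us redaktør and redaktører, ApplyPrefix turns them into webredaktør and webredaktører.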

Morphing with a Danish dictionary

Like I wrote earlier, I think I once had morphing working with NHunspell and an English dictionary file, but I can't get it working now. And I absolutely couldn't get it working with the Danish dictionary, so since we needed this for our clients, I ended up implementing my own Morph method, which reads and parses the rules of the affix file.

The logic necessary for parsing the rules of the affix file is rather complex, so I won't go into detail with this here. But the rules of the British affix file use a different format compared to the Danish affix file, so my parsing logic unfortunately doesn't work with the English affix file. I haven't tested it against other languages. So as a result of this, the example below only works with the Danish affix file:

@using System.Web.Hosting
@using Skybrud.TextAnalysis.Hunspell
@using Skybrud.TextAnalysis.Hunspell.Stem
@{
   
    // Map the path to the dictionary and affix files
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/da-DK.dic");
    string aff = HostingEnvironment.MapPath("~/App_Data/Hunspell/da-DK.aff");

    // Load a new text analyzer (Hunspell wrapper)
    HunspellTextAnalyzer analyzer = HunspellTextAnalyzer.CreateFromFiles(dic, aff);

    // Get the stems from "webredaktør"
    HunspellStemResult[] stems = analyzer.Stem("webredaktør");

    // Iterate through the stems (there is only one for this word)
    foreach (HunspellStemResult stem in stems) {

        // Print the combined value of prefix and stem
        <h3>@stem.Value</h3>
            
        // Get the variants of "webredaktør" through the Morph method
        string[] variants = analyzer.Morph(stem);

        // Iterate through the variants
        <ul>
            @foreach (string variant in variants) {
                <li>@variant</li>
            }
        </ul>

    }

}

This results in the following variants/inflections:

  • webredaktør
  • webredaktørs
  • webredaktører
  • webredaktørerne
  • webredaktørernes
  • webredaktørers
  • webredaktøren
  • webredaktørens

So getting back to searching, we can do an OR-based search for all of these variants rather than just webredaktør.
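
As a rough idea of what such an OR-based search could look like, here is a minimal sketch that builds a raw Lucene query string from a list of variants, where each variant is searched for both as an exact term and as a prefix. The VariantQuerySketch class and ToOrQuery method are hypothetical names made up for this example, and not how the package builds its queries. The sketch removes duplicate variants, which is my own choice, and it doesn't escape special Lucene characters, so treat it as an illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Skybrud.TextAnalysis
public static class VariantQuerySketch {

    // Builds an OR group like "((nodeName:(a a*)) OR (nodeName:(b b*)))"
    // Note: special Lucene characters in the variants are not escaped
    public static string ToOrQuery(string field, IEnumerable<string> variants) {
        IEnumerable<string> parts = variants
            .Distinct()
            .Select(x => "(" + field + ":(" + x + " " + x + "*))");
        return "(" + string.Join(" OR ", parts) + ")";
    }

}
```

For the field nodeName and the variants sommer and sommers, the method returns ((nodeName:(sommer sommer*)) OR (nodeName:(sommers sommers*))).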

Expanding search queries

Putting all this together, I invented something called expanding. So if the user searches for sommer, we expand the word, and search for all variants of sommer. But as searches are likely to involve more than one word, we expand each of the words.

Like I described earlier, compound words are spelled without any separator in Danish. But it's a very common mistake to still use a space as separator, which our search should ideally account for. So if the user searches for sommer hus, which should be sommerhus, we search for variations of both sommer and hus, but we also use the spell checker to check whether the combination of these two words (sommerhus) forms a correctly spelled word. If it does, we search for variations of that word as well.
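
The check for wrongly separated compound words can be sketched roughly like this. CompoundSketch and FindCompounds are hypothetical names used for illustration, and the isCorrect delegate is a stand-in for a real dictionary lookup (eg. the Check method of a WordList). The sketch only looks at pairs of adjacent words, which is an assumption on my part to keep the example short:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Skybrud.TextAnalysis
public static class CompoundSketch {

    // Returns the joined forms of adjacent word pairs that pass the spell check
    // (isCorrect is a stand-in for a real dictionary lookup)
    public static List<string> FindCompounds(string text, Func<string, bool> isCorrect) {

        // Split the search text into individual words
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        List<string> compounds = new List<string>();

        // Join each word with the word following it, and keep the ones that are spelled correctly
        for (int i = 0; i < words.Length - 1; i++) {
            string joined = words[i] + words[i + 1];
            if (isCorrect(joined)) compounds.Add(joined);
        }

        return compounds;

    }

}
```

If the lookup knows the word sommerhus, calling FindCompounds with the text sommer hus returns a list containing sommerhus, which can then be expanded in the same way as the individual words.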

If the user searches for somerhus, we detect that it is spelled incorrectly, and then search for somerhus, but also variants of the correctly spelled sommerhus. We still search for the "misspelled" word on the off chance that the input word is actually spelled correctly, but unknown to the dictionary.

To expand a search text in code, you can use the HunspellTextAnalyzer.Expand method like this:

@using System.Web.Hosting
@using Skybrud.TextAnalysis.Hunspell
@using Skybrud.TextAnalysis.Hunspell.Expand
@{
   
    // Map the path to the dictionary and affix files
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/da-DK.dic");
    string aff = HostingEnvironment.MapPath("~/App_Data/Hunspell/da-DK.aff");

    // Load a new text analyzer (Hunspell wrapper)
    HunspellTextAnalyzer analyzer = HunspellTextAnalyzer.CreateFromFiles(dic, aff);
        
    // Expand the search text
    HunspellExpandResult extend = analyzer.Expand(new HunspellExpandOptions {
        Text = "sommer hus",
        CaseInsentive = true,
        MaxDistance = 3
    });

    // Print the raw Examine query
    <pre>@extend.Query.ToRawQuery(new []{ "nodeName" })</pre>

}

The Expand method takes an instance of HunspellExpandOptions, where the Text property should be set to the search phrase.

As the name of the CaseInsentive property indicates, it can be used to specify whether the expand operation should be case-insensitive (default is true). For instance if a user searches for my name, but in lowercase (anders), it's technically spelled incorrectly, and the Suggest method then returns the following suggested words:

  • enders (1)
  • angers (1)
  • aners (1)
  • andres (2)
  • ardens (2)
  • Anders (1)
  • andets (1)
  • tanders (1)
  • kanders (1)
  • anoders (1)
  • banders (1)
  • panders (1)
  • Sanders (1)
  • Randers (1)
  • Zanders (1)

That would also mean that we would search for variants of all these words, which would result in our search returning a lot more, but less related results. If case insensitivity is enabled, my package will look for a suggestion that is spelled exactly the same way, but with a different casing, and if so, ignore the other suggestions. So if the user searches for anders, we'll just search for Anders, and ignore the other suggestions.
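
The idea behind the case-insensitive handling can be sketched like this. SuggestionSketch and PreferCaseMatch are hypothetical names for this example, and the actual logic in the package may be implemented differently:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Skybrud.TextAnalysis
public static class SuggestionSketch {

    // If a suggestion only differs from the input by casing, keep just that one;
    // otherwise return all the suggestions unchanged
    public static List<string> PreferCaseMatch(string input, IEnumerable<string> suggestions) {
        List<string> all = suggestions.ToList();
        string match = all.FirstOrDefault(x => string.Equals(x, input, StringComparison.InvariantCultureIgnoreCase));
        return match == null ? all : new List<string> { match };
    }

}
```

With the input anders and the suggestions listed above, only Anders is returned. For an input without such a match, all suggestions are passed through as they are.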

The MaxDistance property indicates the maximum allowed Levenshtein distance between the input word and the suggested words, so for instance if set to 1, and searching for anders, the search will ignore the suggestions andres and ardens (as the Levenshtein distance is 2 for each of those words, whereas all the others only have a distance of 1). Setting this property to a positive number can therefore help avoid more unlikely suggestions. Setting the property to 0 will disable this check.
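
If you're not familiar with the Levenshtein distance, it's the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. Below is a minimal sketch of the classic dynamic programming implementation along with a filter similar in spirit to MaxDistance. LevenshteinSketch, Distance and Filter are hypothetical names for this example and not the package's own implementation. The comparison is case-sensitive, and treating negative values like 0 is an assumption on my part:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Skybrud.TextAnalysis
public static class LevenshteinSketch {

    // Classic dynamic programming implementation (case-sensitive)
    public static int Distance(string a, string b) {

        int[,] d = new int[a.Length + 1, b.Length + 1];

        // Turning a prefix into an empty string takes one deletion per character
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++) {
            for (int j = 1; j <= b.Length; j++) {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int deletion = d[i - 1, j] + 1;
                int insertion = d[i, j - 1] + 1;
                int substitution = d[i - 1, j - 1] + cost;
                d[i, j] = Math.Min(Math.Min(deletion, insertion), substitution);
            }
        }

        return d[a.Length, b.Length];

    }

    // Keeps the suggestions within maxDistance of the input
    // (0 disables the check; negative values are treated the same way)
    public static List<string> Filter(string input, IEnumerable<string> suggestions, int maxDistance) {
        if (maxDistance <= 0) return suggestions.ToList();
        return suggestions.Where(x => Distance(input, x) <= maxDistance).ToList();
    }

}
```

For the input anders, the distance to both andres and ardens is 2, while the distance to enders, aners and Anders is 1, so a maximum distance of 1 filters out the first two.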

The Expand method returns an instance of HunspellExpandResult, where the Query property can be used for generating a raw Examine query with the expanded words. We usually have a few different fields that we are searching - eg. nodeName, teaser and blockContent. So we can pass these field aliases to the ToRawQuery method.

With just nodeName, it can look like this:

(((((nodeName:(sommer sommer*)) OR (nodeName:(sommers sommers*))) AND ((nodeName:(hus hus*)) OR (nodeName:(huset huset*)) OR (nodeName:(husets husets*)) OR (nodeName:(hus' hus'*)) OR (nodeName:(husene husene*)) OR (nodeName:(huse huse*)) OR (nodeName:(huses huses*)) OR (nodeName:(husenes husenes*)) OR (nodeName:(huse huse*)) OR (nodeName:(husede husede*)) OR (nodeName:(huset huset*)) OR (nodeName:(huser huser*)) OR (nodeName:(husende husende*)) OR (nodeName:(huses huses*)) OR (nodeName:(hu hu*)) OR (nodeName:(huen huen*)) OR (nodeName:(hus hus*)))) OR ((nodeName:(sommerhus sommerhus*)) OR (nodeName:(sommerhuset sommerhuset*)) OR (nodeName:(sommerhusets sommerhusets*)) OR (nodeName:(sommerhus' sommerhus'*)) OR (nodeName:(sommerhusene sommerhusene*)) OR (nodeName:(sommerhuse sommerhuse*)) OR (nodeName:(sommerhuses sommerhuses*)) OR (nodeName:(sommerhusenes sommerhusenes*)))))

And with nodeName, teaser and blockContent, it can look like this:

(((((nodeName:(sommer sommer*) OR teaser:(sommer sommer*) OR blockContent:(sommer sommer*)) OR (nodeName:(sommers sommers*) OR teaser:(sommers sommers*) OR blockContent:(sommers sommers*))) AND ((nodeName:(hus hus*) OR teaser:(hus hus*) OR blockContent:(hus hus*)) OR (nodeName:(huset huset*) OR teaser:(huset huset*) OR blockContent:(huset huset*)) OR (nodeName:(husets husets*) OR teaser:(husets husets*) OR blockContent:(husets husets*)) OR (nodeName:(hus' hus'*) OR teaser:(hus' hus'*) OR blockContent:(hus' hus'*)) OR (nodeName:(husene husene*) OR teaser:(husene husene*) OR blockContent:(husene husene*)) OR (nodeName:(huse huse*) OR teaser:(huse huse*) OR blockContent:(huse huse*)) OR (nodeName:(huses huses*) OR teaser:(huses huses*) OR blockContent:(huses huses*)) OR (nodeName:(husenes husenes*) OR teaser:(husenes husenes*) OR blockContent:(husenes husenes*)) OR (nodeName:(huse huse*) OR teaser:(huse huse*) OR blockContent:(huse huse*)) OR (nodeName:(husede husede*) OR teaser:(husede husede*) OR blockContent:(husede husede*)) OR (nodeName:(huset huset*) OR teaser:(huset huset*) OR blockContent:(huset huset*)) OR (nodeName:(huser huser*) OR teaser:(huser huser*) OR blockContent:(huser huser*)) OR (nodeName:(husende husende*) OR teaser:(husende husende*) OR blockContent:(husende husende*)) OR (nodeName:(huses huses*) OR teaser:(huses huses*) OR blockContent:(huses huses*)) OR (nodeName:(hu hu*) OR teaser:(hu hu*) OR blockContent:(hu hu*)) OR (nodeName:(huen huen*) OR teaser:(huen huen*) OR blockContent:(huen huen*)) OR (nodeName:(hus hus*) OR teaser:(hus hus*) OR blockContent:(hus hus*)))) OR ((nodeName:(sommerhus sommerhus*) OR teaser:(sommerhus sommerhus*) OR blockContent:(sommerhus sommerhus*)) OR (nodeName:(sommerhuset sommerhuset*) OR teaser:(sommerhuset sommerhuset*) OR blockContent:(sommerhuset sommerhuset*)) OR (nodeName:(sommerhusets sommerhusets*) OR teaser:(sommerhusets sommerhusets*) OR blockContent:(sommerhusets sommerhusets*)) OR (nodeName:(sommerhus' 
sommerhus'*) OR teaser:(sommerhus' sommerhus'*) OR blockContent:(sommerhus' sommerhus'*)) OR (nodeName:(sommerhusene sommerhusene*) OR teaser:(sommerhusene sommerhusene*) OR blockContent:(sommerhusene sommerhusene*)) OR (nodeName:(sommerhuse sommerhuse*) OR teaser:(sommerhuse sommerhuse*) OR blockContent:(sommerhuse sommerhuse*)) OR (nodeName:(sommerhuses sommerhuses*) OR teaser:(sommerhuses sommerhuses*) OR blockContent:(sommerhuses sommerhuses*)) OR (nodeName:(sommerhusenes sommerhusenes*) OR teaser:(sommerhusenes sommerhusenes*) OR blockContent:(sommerhusenes sommerhusenes*)))))

As Examine supports boosting specific fields, we can also include this in the generated query:

// Print the raw Examine query
<pre>@extend.Query.ToRawQuery(new List<Field> {
     new Field("nodeName", 50),
     new Field("teaser", 40),
     new Field("blockContent", 5)
})</pre>

which results in this monster of a query:

(((((nodeName:(sommer sommer*)^50 OR nodeName:(sommer sommer*) OR teaser:(sommer sommer*)^40 OR teaser:(sommer sommer*) OR blockContent:(sommer sommer*)^5 OR blockContent:(sommer sommer*)) OR (nodeName:(sommers sommers*)^50 OR nodeName:(sommers sommers*) OR teaser:(sommers sommers*)^40 OR teaser:(sommers sommers*) OR blockContent:(sommers sommers*)^5 OR blockContent:(sommers sommers*))) AND ((nodeName:(hus hus*)^50 OR nodeName:(hus hus*) OR teaser:(hus hus*)^40 OR teaser:(hus hus*) OR blockContent:(hus hus*)^5 OR blockContent:(hus hus*)) OR (nodeName:(huset huset*)^50 OR nodeName:(huset huset*) OR teaser:(huset huset*)^40 OR teaser:(huset huset*) OR blockContent:(huset huset*)^5 OR blockContent:(huset huset*)) OR (nodeName:(husets husets*)^50 OR nodeName:(husets husets*) OR teaser:(husets husets*)^40 OR teaser:(husets husets*) OR blockContent:(husets husets*)^5 OR blockContent:(husets husets*)) OR (nodeName:(hus' hus'*)^50 OR nodeName:(hus' hus'*) OR teaser:(hus' hus'*)^40 OR teaser:(hus' hus'*) OR blockContent:(hus' hus'*)^5 OR blockContent:(hus' hus'*)) OR (nodeName:(husene husene*)^50 OR nodeName:(husene husene*) OR teaser:(husene husene*)^40 OR teaser:(husene husene*) OR blockContent:(husene husene*)^5 OR blockContent:(husene husene*)) OR (nodeName:(huse huse*)^50 OR nodeName:(huse huse*) OR teaser:(huse huse*)^40 OR teaser:(huse huse*) OR blockContent:(huse huse*)^5 OR blockContent:(huse huse*)) OR (nodeName:(huses huses*)^50 OR nodeName:(huses huses*) OR teaser:(huses huses*)^40 OR teaser:(huses huses*) OR blockContent:(huses huses*)^5 OR blockContent:(huses huses*)) OR (nodeName:(husenes husenes*)^50 OR nodeName:(husenes husenes*) OR teaser:(husenes husenes*)^40 OR teaser:(husenes husenes*) OR blockContent:(husenes husenes*)^5 OR blockContent:(husenes husenes*)) OR (nodeName:(huse huse*)^50 OR nodeName:(huse huse*) OR teaser:(huse huse*)^40 OR teaser:(huse huse*) OR blockContent:(huse huse*)^5 OR blockContent:(huse huse*)) OR (nodeName:(husede husede*)^50 OR 
nodeName:(husede husede*) OR teaser:(husede husede*)^40 OR teaser:(husede husede*) OR blockContent:(husede husede*)^5 OR blockContent:(husede husede*)) OR (nodeName:(huset huset*)^50 OR nodeName:(huset huset*) OR teaser:(huset huset*)^40 OR teaser:(huset huset*) OR blockContent:(huset huset*)^5 OR blockContent:(huset huset*)) OR (nodeName:(huser huser*)^50 OR nodeName:(huser huser*) OR teaser:(huser huser*)^40 OR teaser:(huser huser*) OR blockContent:(huser huser*)^5 OR blockContent:(huser huser*)) OR (nodeName:(husende husende*)^50 OR nodeName:(husende husende*) OR teaser:(husende husende*)^40 OR teaser:(husende husende*) OR blockContent:(husende husende*)^5 OR blockContent:(husende husende*)) OR (nodeName:(huses huses*)^50 OR nodeName:(huses huses*) OR teaser:(huses huses*)^40 OR teaser:(huses huses*) OR blockContent:(huses huses*)^5 OR blockContent:(huses huses*)) OR (nodeName:(hu hu*)^50 OR nodeName:(hu hu*) OR teaser:(hu hu*)^40 OR teaser:(hu hu*) OR blockContent:(hu hu*)^5 OR blockContent:(hu hu*)) OR (nodeName:(huen huen*)^50 OR nodeName:(huen huen*) OR teaser:(huen huen*)^40 OR teaser:(huen huen*) OR blockContent:(huen huen*)^5 OR blockContent:(huen huen*)) OR (nodeName:(hus hus*)^50 OR nodeName:(hus hus*) OR teaser:(hus hus*)^40 OR teaser:(hus hus*) OR blockContent:(hus hus*)^5 OR blockContent:(hus hus*)))) OR ((nodeName:(sommerhus sommerhus*)^50 OR nodeName:(sommerhus sommerhus*) OR teaser:(sommerhus sommerhus*)^40 OR teaser:(sommerhus sommerhus*) OR blockContent:(sommerhus sommerhus*)^5 OR blockContent:(sommerhus sommerhus*)) OR (nodeName:(sommerhuset sommerhuset*)^50 OR nodeName:(sommerhuset sommerhuset*) OR teaser:(sommerhuset sommerhuset*)^40 OR teaser:(sommerhuset sommerhuset*) OR blockContent:(sommerhuset sommerhuset*)^5 OR blockContent:(sommerhuset sommerhuset*)) OR (nodeName:(sommerhusets sommerhusets*)^50 OR nodeName:(sommerhusets sommerhusets*) OR teaser:(sommerhusets sommerhusets*)^40 OR teaser:(sommerhusets sommerhusets*) OR 
blockContent:(sommerhusets sommerhusets*)^5 OR blockContent:(sommerhusets sommerhusets*)) OR (nodeName:(sommerhus' sommerhus'*)^50 OR nodeName:(sommerhus' sommerhus'*) OR teaser:(sommerhus' sommerhus'*)^40 OR teaser:(sommerhus' sommerhus'*) OR blockContent:(sommerhus' sommerhus'*)^5 OR blockContent:(sommerhus' sommerhus'*)) OR (nodeName:(sommerhusene sommerhusene*)^50 OR nodeName:(sommerhusene sommerhusene*) OR teaser:(sommerhusene sommerhusene*)^40 OR teaser:(sommerhusene sommerhusene*) OR blockContent:(sommerhusene sommerhusene*)^5 OR blockContent:(sommerhusene sommerhusene*)) OR (nodeName:(sommerhuse sommerhuse*)^50 OR nodeName:(sommerhuse sommerhuse*) OR teaser:(sommerhuse sommerhuse*)^40 OR teaser:(sommerhuse sommerhuse*) OR blockContent:(sommerhuse sommerhuse*)^5 OR blockContent:(sommerhuse sommerhuse*)) OR (nodeName:(sommerhuses sommerhuses*)^50 OR nodeName:(sommerhuses sommerhuses*) OR teaser:(sommerhuses sommerhuses*)^40 OR teaser:(sommerhuses sommerhuses*) OR blockContent:(sommerhuses sommerhuses*)^5 OR blockContent:(sommerhuses sommerhuses*)) OR (nodeName:(sommerhusenes sommerhusenes*)^50 OR nodeName:(sommerhusenes sommerhusenes*) OR teaser:(sommerhusenes sommerhusenes*)^40 OR teaser:(sommerhusenes sommerhusenes*) OR blockContent:(sommerhusenes sommerhusenes*)^5 OR blockContent:(sommerhusenes sommerhusenes*)))))

Since the query is meant to be used in an Examine search, we can pass it to Examine as shown in the example below:

@using System.Globalization
@using System.Web.Hosting
@using Examine
@using Examine.Search
@using Skybrud.TextAnalysis.Hunspell
@using Skybrud.TextAnalysis.Hunspell.Expand

@{
   
    // Map the path to the dictionary and affix files
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/da-DK.dic");
    string aff = HostingEnvironment.MapPath("~/App_Data/Hunspell/da-DK.aff");

    // Load a new text analyzer (Hunspell wrapper)
    HunspellTextAnalyzer analyzer = HunspellTextAnalyzer.CreateFromFiles(dic, aff);

    // Expand the search phrase "sommerhus"
    HunspellExpandResult expandResult = analyzer.Expand("sommerhus");

    // Get a reference to the "ExternalIndex" index
    if (!ExamineManager.Instance.TryGetIndex("ExternalIndex", out IIndex index)) {
        <pre>Index not found!</pre>
        return;
    }

    // Get the searcher from the index
    ISearcher searcher = index.GetSearcher();
    if (searcher == null) {
        <pre>Searcher not found!</pre>
        return;
    }

    // Get the boolean operation representing the search to be made
    IBooleanOperation query = searcher
        .CreateQuery()
        .NativeQuery(expandResult.Query.ToRawQuery(new[] { "nodeName" }));

    // Execute the search query and get the results
    ISearchResults results = query.Execute();

}

This example illustrates how to set up an Examine-based search using the Danish Hunspell dictionary. The Expand method takes care of the various Hunspell operations that I've explained in this article, so you don't have to handle those operations in your own code.
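To get a feel for what ends up in the raw query, it can help to build one of its clauses by hand. The sketch below is not the package's actual implementation. It is a minimal illustration, assuming the field names and boost values seen in the raw query output (nodeName at 50, teaser at 40 and blockContent at 5). The RawQuerySketch class and its method names are hypothetical, and the sketch does no escaping of special Lucene characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Skybrud.TextAnalysis
public static class RawQuerySketch {

    // Assumed field/boost pairs, mirroring the values in the raw query output
    private static readonly (string Field, int Boost)[] Fields = {
        ("nodeName", 50),
        ("teaser", 40),
        ("blockContent", 5)
    };

    // Builds the clause for a single word form: each field is searched for
    // the exact word and as a prefix, once with a boost and once without
    public static string ForWord(string word) {
        IEnumerable<string> parts = Fields.SelectMany(f => new[] {
            $"{f.Field}:({word} {word}*)^{f.Boost}",
            $"{f.Field}:({word} {word}*)"
        });
        return "(" + string.Join(" OR ", parts) + ")";
    }

    // ORs together the clauses for all forms of the same word
    public static string ForVariations(IEnumerable<string> forms) {
        return "(" + string.Join(" OR ", forms.Select(ForWord)) + ")";
    }

}
```

Calling RawQuerySketch.ForVariations(new[] { "sommer", "sommers" }) gives the same kind of group as the first part of the raw query. In the full query, the groups for sommer and hus are then combined with AND, and the result is combined with the group for sommerhus using OR.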

As mentioned earlier, our implementation doesn't support morphing for other dictionaries, but the package will still do spell checking and find suggested words.

For the sake of this example, I'm using the singleton accessor of the ExamineManager class. Ideally, you should find a way to inject an IExamineManager instance instead, and maybe not put all the logic in a Razor view.
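As a rough idea of what that could look like, the sketch below moves the logic from the Razor view into a small service class that receives an IExamineManager through its constructor. The HunspellSearchService class and its Search method are hypothetical names made up for this illustration. The sketch also assumes that your dependency injection container can supply both the IExamineManager and an already loaded HunspellTextAnalyzer; how you register them depends on your setup and is left out here.

```csharp
using System;
using Examine;
using Examine.Search;
using Skybrud.TextAnalysis.Hunspell;
using Skybrud.TextAnalysis.Hunspell.Expand;

// Hypothetical service class; the name and shape are illustrative only
public class HunspellSearchService {

    private readonly IExamineManager _examineManager;
    private readonly HunspellTextAnalyzer _analyzer;

    // Both dependencies are assumed to be supplied by the DI container
    public HunspellSearchService(IExamineManager examineManager, HunspellTextAnalyzer analyzer) {
        _examineManager = examineManager;
        _analyzer = analyzer;
    }

    public ISearchResults Search(string text) {

        // Get a reference to the "ExternalIndex" index
        if (!_examineManager.TryGetIndex("ExternalIndex", out IIndex index)) {
            throw new InvalidOperationException("Index not found.");
        }

        // Get the searcher from the index
        ISearcher searcher = index.GetSearcher();
        if (searcher == null) {
            throw new InvalidOperationException("Searcher not found.");
        }

        // Expand the search text using the Hunspell dictionary
        HunspellExpandResult expandResult = _analyzer.Expand(text);

        // Build the query from the expanded result
        IBooleanOperation query = searcher
            .CreateQuery()
            .NativeQuery(expandResult.Query.ToRawQuery(new[] { "nodeName" }));

        // Execute the search query and return the results
        return query.Execute();

    }

}
```

A controller or view model can then ask for the service and call Search("sommerhus"), which keeps the view itself free of search logic. Loading the analyzer once and reusing it also avoids reading the dictionary files on every request.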

Anders Bjerner

Anders Bjerner is an Umbraco MVP and System Developer at Limbo (formerly Skybrud.dk), with offices in Vejle and Copenhagen, Denmark. He has a background in Computer Science, and has been working with Umbraco since 2011. His work at Limbo typically consists of implementing Umbraco solutions for various government and private clients as well as developing custom packages (where a number of them can be found on the Umbraco Marketplace and NuGet). When not working and playing around with Umbraco, he can be found on his bike exploring the hills around his hometown (some of the steepest/highest in an otherwise flat Denmark).
