lunr.de demo with unexpected result for umlauts #41

derplakatankleber · 2017-09-14T13:43:22Z

I tried your demo site "demo-browser-require.html", but I don't understand the results.

tests:
console.log('Search for günstige: ', idx.search('günstige')); // expected result size: 1, actual: 1
console.log('Search for günstig*: ', idx.search('günstig*')); // expected result size: 1, actual: 0
console.log('Search for g*nstig*: ', idx.search('g*nstig*')); // expected result size: 1, actual: 1

source: https://rawgit.com/MihaiValentin/lunr-languages/master/demos/demo-browser-require.html

Did I misunderstand how to search for words with umlauts, or is it not possible to use wildcards when searching for words with umlauts?


khawkins98 · 2021-05-26T10:04:41Z

I also noticed this.

In #66, the approach of overriding the word characters with:

lunr.de.wordCharacters = "A-Za-züÜÄäÖöß0-9";

fixes wildcard support.

jonex2 · 2021-10-15T10:16:08Z

The workaround with

lunr.de.wordCharacters = "A-Za-züÜÄäÖöß0-9";

did not work for me.
I opened a new issue.

khawkins98 · 2021-10-15T10:21:25Z

I also wound up changing approaches. I can dig up my code, but I believe what I did was:

Convert the umlaut characters in the indexed text to their ae, ue versions
Do the same for the search string that is passed in

khawkins98 · 2021-10-15T10:26:04Z

Here it is: I basically create a mirror search index without international characters, so a search succeeds whether the user types ü or u.

// Receive a search index (an object keyed by item) and add a
// diacritic-free copy of each name to every entry.
// It's a poor man's multilingual search.
function normalizeText(searchIndex) {
  function replaceCharacters(input) {
    var string = input || "";
    // Treat some common international characters as fuzzy English
    string = string.replace(/\u00c4/g, "A"); // Ä
    string = string.replace(/\u00dc/g, "U"); // Ü
    string = string.replace(/\u00d6/g, "O"); // Ö
    string = string.replace(/\u00fc/g, "u"); // ü
    string = string.replace(/\u00e4/g, "a"); // ä
    string = string.replace(/\u00f6/g, "o"); // ö
    string = string.replace(/\u00df/g, "s"); // ß
    // Fold the two-letter spellings to the same result
    string = string.replace(/ae/g, "a");
    string = string.replace(/ue/g, "u");
    string = string.replace(/oe/g, "o");
    string = string.replace(/ss/g, "s");
    string = string.replace(/\u00e1/g, "a"); // á

    return string;
  }
  for (const item in searchIndex) {
    if (Object.prototype.hasOwnProperty.call(searchIndex, item)) {
      searchIndex[item].multilingualAlternate = replaceCharacters(searchIndex[item].lastName);
      searchIndex[item].multilingualAlternate += " " + replaceCharacters(searchIndex[item].firstName);
    }
  }
  return searchIndex;
}

I'm sure it's terrible for performance, but for our use case the dataset was small enough that it didn't matter.
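
The snippet above only covers the index side. As a rough sketch of the second step (applying the same replacement to the search string), something like the following might work. The names `foldUmlauts`, `findByPrefix` and the sample `people` records are hypothetical and not from this thread, and the prefix match is a plain-JavaScript stand-in for a real lunr query.

```javascript
// Hypothetical helper (not from the thread): folds German umlauts and their
// ae/ue/oe/ss spellings to plain ASCII, mirroring replaceCharacters above.
function foldUmlauts(text) {
  return (text || "")
    .replace(/\u00c4/g, "A") // Ä
    .replace(/\u00dc/g, "U") // Ü
    .replace(/\u00d6/g, "O") // Ö
    .replace(/\u00e4/g, "a") // ä
    .replace(/\u00fc/g, "u") // ü
    .replace(/\u00f6/g, "o") // ö
    .replace(/\u00df/g, "s") // ß
    .replace(/ae/g, "a")
    .replace(/ue/g, "u")
    .replace(/oe/g, "o")
    .replace(/ss/g, "s");
}

// Made-up sample data using the same firstName/lastName fields as above.
const people = [
  { firstName: "J\u00fcrgen", lastName: "M\u00fcller" },
  { firstName: "Anna", lastName: "Schmidt" },
];

// Stand-in for a wildcard search: folds both the record and the query,
// then checks whether any word of the record starts with the query.
function findByPrefix(records, query) {
  const needle = foldUmlauts(query).toLowerCase();
  return records.filter(function (record) {
    const folded = foldUmlauts(record.lastName + " " + record.firstName).toLowerCase();
    return folded.split(" ").some(function (word) {
      return word.startsWith(needle);
    });
  });
}

console.log(findByPrefix(people, "M\u00fc").length);
console.log(findByPrefix(people, "Mue").length);
console.log(findByPrefix(people, "Mu").length);
```

With lunr itself, the same idea would presumably be to index the folded field and pass the folded user input to `idx.search`, so that both sides of the comparison go through the identical replacement.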

jonex2 · 2021-10-15T10:35:53Z

@khawkins98
Thank you very much for the quick answer and your new workaround!

derplakatankleber closed this as completed Aug 10, 2021
