lunr.de demo with unexpected result for umlauts #41

Closed
derplakatankleber opened this issue Sep 14, 2017 · 5 comments

@derplakatankleber

I tried your demo page "demo-browser-require.html", but I don't understand the results.

Tests:
console.log('Search for günstige: ', idx.search('günstige')); // expected result size: 1, actual: 1
console.log('Search for günstig*: ', idx.search('günstig*')); // expected result size: 1, actual: 0
console.log('Search for g*nstig*: ', idx.search('g*nstig*')); // expected result size: 1, actual: 1

source: https://rawgit.com/MihaiValentin/lunr-languages/master/demos/demo-browser-require.html

Did I misunderstand how to search for words with umlauts, or is it not possible to use wildcards when searching for words with umlauts?

@khawkins98

I also noticed this.

In #66, the approach of overriding the word characters with:

lunr.de.wordCharacters = "A-Za-züÜÄäÖöß0-9";

fixes wildcard support.

@jonex2

jonex2 commented Oct 15, 2021

The workaround with

lunr.de.wordCharacters = "A-Za-züÜÄäÖöß0-9";

did not work.
I have opened a new issue.

@khawkins98

I also wound up changing approaches. I can dig up my code, but I believe what I did was:

  1. Convert the umlaut characters in the indexed text to their ae, ue versions
  2. Do the same for the passed search string
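
The two steps above can be sketched roughly as follows. This is only an illustration, not code from this thread or from lunr-languages: `transliterateUmlauts`, `UMLAUT_MAP`, and the sample document are hypothetical names and data. The point is that the same function runs over the text before indexing and over the query string before searching, so both sides agree on the spelling.

```javascript
// Hypothetical mapping of German umlauts and ß to ASCII digraphs (an assumption,
// not part of lunr or lunr-languages).
const UMLAUT_MAP = {
  "ä": "ae", "ö": "oe", "ü": "ue",
  "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
  "ß": "ss",
};

// Replace every umlaut or ß with its digraph; other characters pass through unchanged.
function transliterateUmlauts(text) {
  return String(text || "").replace(/[äöüÄÖÜß]/g, (ch) => UMLAUT_MAP[ch]);
}

// Step 1: transliterate the text that will be indexed.
const docs = [{ id: 1, body: "günstige Angebote" }];
const indexable = docs.map((d) => ({ ...d, body: transliterateUmlauts(d.body) }));

// Step 2: transliterate the query string the same way before searching.
const query = transliterateUmlauts("günstig*");
```

The `indexable` records would then be added to the index in place of the originals, and `query` passed to `idx.search` instead of the raw user input.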

@khawkins98

khawkins98 commented Oct 15, 2021

Here it is: I basically create a mirror search index without international characters, so the user gets a match whether they type ü or u.

// Receive a set of records and add a diacritic-free copy of the name fields.
// It's a poor man's multilingual search.
function normalizeText(searchIndex) {
  function replaceCharacters(string) {
    string = string || "";
    // Fold some common international characters into plain ASCII letters
    string = string.replace(/\u00c4/g, "A");
    string = string.replace(/\u00dc/g, "U");
    string = string.replace(/\u00d6/g, "O");
    string = string.replace(/\u00fc/g, "u");
    string = string.replace(/\u00e4/g, "a");
    string = string.replace(/\u00f6/g, "o");
    string = string.replace(/\u00df/g, "s");
    string = string.replace(/ae/g, "a");
    string = string.replace(/ue/g, "u");
    string = string.replace(/oe/g, "o");
    string = string.replace(/ss/g, "s");
    string = string.replace(/á/g, "a");

    return string;
  }
  for (const item in searchIndex) {
    if (Object.hasOwnProperty.call(searchIndex, item)) {
      searchIndex[item].multilingualAlternate = replaceCharacters(searchIndex[item].lastName);
      searchIndex[item].multilingualAlternate += " " + replaceCharacters(searchIndex[item].firstName);
    }
  }
  return searchIndex;
}

I'm sure it's terrible for performance, but for our use case the dataset was small enough that it didn't matter.
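
As a possible alternative to one `replace` call per character, diacritics can be stripped through Unicode decomposition with the standard `String.prototype.normalize`. The sketch below is a swapped-in technique, not the code from this comment: `foldDiacritics` is a hypothetical name, and unlike the function above it does not collapse the digraphs ae, ue, oe, or ss.

```javascript
// Hypothetical helper (not part of lunr): fold accented letters to their base letters.
function foldDiacritics(text) {
  return String(text || "")
    .normalize("NFD")                 // e.g. "ü" becomes "u" followed by a combining diaeresis
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks left by the decomposition
    .replace(/\u00df/g, "s");         // ß has no decomposition, so map it to "s" explicitly
}
```

Whether this is noticeably faster would need measuring; its main benefit is covering accented letters that were not listed by hand.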

@jonex2

jonex2 commented Oct 15, 2021

@khawkins98
Thank you very much for the quick answer and your new workaround!
