Merge pull request #6 from alexharri/correctness
Document Correctness
alexharri authored Nov 14, 2022
2 parents 3a61b45 + 509ddc2 commit b59e4f1
Showing 3 changed files with 88 additions and 3 deletions.
39 changes: 39 additions & 0 deletions README.md
@@ -37,6 +37,9 @@ applyCase("þgf", "Helga Fríða Smáradóttir");
- [Usage](#Usage)
- [Cases](#Cases)
- [Whitespace](#Whitespace)
- [Correctness](#Correctness)
  - [Passing a name in the wrong case](#Passing_a_name_in_the_wrong_case)
  - [What happens if beygla does not find a pattern?](#What_happens_if_beygla_does_not_find_a_pattern)

---

@@ -168,3 +171,39 @@ If the name includes superfluous whitespace, `applyCase` removes it.
applyCase("þgf", " \n Helga Dís\tSmáradóttir \n\n");
//=> "Helgu Dís Smáradóttur"
```

<h2 id="Correctness">
Correctness
</h2>

Beygla applies the desired case correctly to most input names.

Most Icelandic names (81%), especially common ones, are present on [bin.arnastofnun.is](https://bin.arnastofnun.is/gogn/). Beygla is guaranteed to produce a correct result for those names.

This does not mean that Beygla produces incorrect results for the other 19% of names. Beygla derives patterns from name endings in the data on [bin.arnastofnun.is](https://bin.arnastofnun.is/gogn/) and applies those patterns to any input name, so it produces a correct result for most names, even ones that are not in the dataset.
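
To illustrate the idea of matching name endings, here is a minimal sketch under simplifying assumptions: a small, hypothetical table maps known endings directly to declension strings (written as they appear in `beygla.spec.ts`), and the longest matching ending wins. `endingPatterns` and `findDeclension` are illustrative names, not beygla's API; beygla derives its real patterns from the bin.arnastofnun.is data and looks them up in a trie.

```typescript
// Hypothetical toy table mapping name endings to declension strings,
// written as they appear in beygla.spec.ts. Not beygla's real data.
const endingPatterns = new Map<string, string>([
  ["ur", "2;ur,,i,s"], // e.g. Kórekur -> Kóreki (þgf)
  ["i", "1;i,a,a,a"], // e.g. Sotti -> Sotta (þgf)
  ["ía", "1;a,u,u,u"], // e.g. Sófía -> Sófíu (þgf)
  ["ús", "0;,,i,ar"], // e.g. Sófús -> Sófúsi (þgf)
]);

// Return the declension for the longest ending of `name` found in the
// table, or null if no ending matches.
function findDeclension(name: string): string | null {
  // Start with the whole name and drop one leading character per step,
  // so longer (more specific) endings are tried before shorter ones.
  for (let start = 0; start < name.length; start++) {
    const declension = endingPatterns.get(name.slice(start));
    if (declension !== undefined) {
      return declension;
    }
  }
  return null;
}

console.log(findDeclension("Kórekur")); // "2;ur,,i,s"
console.log(findDeclension("Evan")); // null: no known ending
```

Trying the longest ending first means a more specific pattern (such as `ía`) takes precedence over a shorter one that might also match.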

I randomly sampled 20 names from the list of legal Icelandic names that are not present in [bin.arnastofnun.is](https://bin.arnastofnun.is/gogn/):

* 14 names matched a pattern with the correct result
* 6 names matched no pattern
* 0 names matched a pattern with an incorrect result

Even though none of the sampled names produced an incorrect result, 20 names is a very small sample. I'm absolutely certain that a handful of names will produce incorrect results.

See [beygla.spec.ts](https://github.com/alexharri/beygla/blob/master/lib/beygla.spec.ts) for the tests, including tests for names that are not in the dataset.


<h3 id="Passing_a_name_in_the_wrong_case">
Passing a name in the wrong case
</h3>

Beygla assumes that the names provided to it are in the nominative case (nefnifall). If a name is provided in any other case, an incorrect result is extremely likely.


<h3 id="What_happens_if_beygla_does_not_find_a_pattern">
What happens if beygla does not find a pattern?
</h3>

If beygla does not recognize the ending of a name, it does not apply the case to that name and leaves it unchanged.

Note that beygla applies the case to each name in a full name (first, middle, and last names) individually. This means that some names in a full name may have the case applied while others are left unchanged.
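
A minimal sketch of this per-name behavior, assuming a hypothetical `declineOne` function that returns the declined form of a single name, or `null` when no pattern matches. `declineFullName` and the toy dative table are illustrations only, not beygla's API; beygla's own `applyCase` similarly splits the input on whitespace and handles each name separately.

```typescript
// Hypothetical: declines one name, or returns null when no pattern matches.
type DeclineOne = (name: string) => string | null;

// Decline each whitespace-separated name on its own. Names without a known
// pattern are kept as-is, so a full name may be only partially declined.
function declineFullName(fullName: string, declineOne: DeclineOne): string {
  return fullName
    .split(/\s+/)
    .filter(Boolean) // drop empty strings from leading/trailing whitespace
    .map((name) => declineOne(name) ?? name)
    .join(" ");
}

// Toy dative (þgf) lookup standing in for real pattern matching.
const toyDative = new Map<string, string>([
  ["Helga", "Helgu"],
  ["Smáradóttir", "Smáradóttur"],
]);

const result = declineFullName(
  " Helga  Artemis Smáradóttir ",
  (name) => toyDative.get(name) ?? null,
);
// result: "Helgu Artemis Smáradóttur" ("Artemis" has no entry, so it is kept)
```
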
46 changes: 43 additions & 3 deletions lib/beygla.spec.ts
@@ -1,8 +1,8 @@
-import { applyCase as _applyCase } from "./beygla";
+import * as _beygla from "./beygla";
 import serializedInput from "./read/serializedInput";
 import groupedNames from "../out/grouped-names.json";
 
-let applyCase = _applyCase;
+let beygla = _beygla;
 
 const testingBuild = process.env.TEST_BUILD === "true";
 if (testingBuild) {
@@ -11,9 +11,11 @@ if (testingBuild) {
   // on the build output.
   console.log("Testing built module.");
 
-  applyCase = require("../dist/beygla.esm.js").applyCase;
+  beygla = require("../dist/beygla.esm.js");
 }
 
+const { applyCase, getDeclensionForName } = beygla;
+
 jest.mock("./read/serializedInput", () => {
   const fs = require("fs");
   const path = require("path");
@@ -116,4 +118,42 @@ describe("applyCase", () => {
     expect(son).toEqual("syni");
     expect(dottir).toEqual("dóttur");
   });
+
+  it("finds correct declension for some unknown names", () => {
+    const tests: Array<[name: string, declension: string]> = [
+      ["Sotti", "1;i,a,a,a"],
+      ["Sófía", "1;a,u,u,u"],
+      ["Kórekur", "2;ur,,i,s"],
+      ["Olivia", "1;a,u,u,u"],
+      ["Caritas", "0;,,,ar"],
+      ["Hávarr", "1;r,,i,s"],
+      ["Ermenga", "1;a,u,u,u"],
+      ["Fannþór", "0;,,i,s"],
+      ["Ísbrá", "0;,,,r"],
+      ["Sófús", "0;,,i,ar"],
+      ["Kristólín", "0;,,,ar"],
+      ["Jasper", "0;,,,s"],
+      ["Rúnel", "0;,,i,s"],
+      ["Agok", "0;,,i,s"],
+    ];
+
+    for (const [name, declension] of tests) {
+      expect(getDeclensionForName(name)).toEqual(declension);
+    }
+  });
+
+  it("does not find a declension for some unknown names", () => {
+    const tests: string[] = [
+      "Emanuel",
+      "Frederik",
+      "Evan",
+      "Lennon",
+      "Artemis",
+      "Kaín",
+    ];
+
+    for (const name of tests) {
+      expect(getDeclensionForName(name)).toEqual(null);
+    }
+  });
 });
6 changes: 6 additions & 0 deletions lib/beygla.ts
@@ -105,3 +105,9 @@ export function applyCase(caseStr: Case, name: string): string {
   const names = name.split(/\s+/).filter(Boolean);
   return names.map((name) => applyCaseToName(caseStr, name)).join(" ");
 }
+
+export function getDeclensionForName(name: string): string | null {
+  if (name.split(/\s+/).length > 1)
+    throw new Error("Name must not include whitespace");
+  return extractDeclension(trie, name);
+}
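
Judging from the test cases in `beygla.spec.ts`, a declension string returned by `getDeclensionForName` appears to have the form `<n>;<nf>,<þf>,<þgf>,<ef>`: remove the last `n` characters of the name, then append one ending per case. For example, `Kórekur` with `2;ur,,i,s` gives `Kóreki` in the dative. The sketch below decodes such a string under that assumption; `declineWith` and `CaseName` are hypothetical names, not part of beygla's exported API.

```typescript
// Hypothetical names; the case order (nf, þf, þgf, ef) is inferred from
// the test data, not taken from beygla's source.
type CaseName = "nf" | "þf" | "þgf" | "ef";

// Decode a declension string such as "2;ur,,i,s": remove the last `n`
// characters of the name to get a base, then append one ending per case.
function declineWith(name: string, declension: string): Record<CaseName, string> {
  const parts = declension.split(";");
  if (parts.length !== 2) {
    throw new Error(`Malformed declension: ${declension}`);
  }
  const strip = Number(parts[0]);
  const endings = parts[1].split(",");
  if (
    !Number.isInteger(strip) ||
    strip < 0 ||
    strip > name.length ||
    endings.length !== 4
  ) {
    throw new Error(`Malformed declension: ${declension}`);
  }
  const base = name.slice(0, name.length - strip);
  return {
    "nf": base + endings[0],
    "þf": base + endings[1],
    "þgf": base + endings[2],
    "ef": base + endings[3],
  };
}

const forms = declineWith("Kórekur", "2;ur,,i,s");
// forms: { nf: "Kórekur", þf: "Kórek", þgf: "Kóreki", ef: "Kóreks" }
```

An empty ending (as in the accusative of `2;ur,,i,s`) simply leaves the base unchanged.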
