Correct misspelled entities #12804
-
Hi! That sounds like a tough challenge. In the spaCy universe, I only see something for spell checking, and I'm not convinced that it will solve your problem (you could have a look, though). As a very naive approach, if you have an NER model that works, perhaps you can use something simple like edit distance to identify likely candidates that were transcribed wrong? For example, if you have "Swissquote" 5 times in the text and "Swisscode" once, you could hypothesize that they all should have been "Swissquote". But it'll be tricky, because genuinely different named entities can have a small edit distance between them, especially when the names are short, as with abbreviations.
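To make the edit-distance idea concrete, here is a minimal sketch in plain Python. It assumes you have already collected the entity strings from your NER model into a list. The function names `edit_distance` and `suggest_corrections`, and the `max_ratio` and `min_len` thresholds, are hypothetical choices for illustration and not part of spaCy; you would need to tune them on your own data.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute (or match)
        prev = curr
    return prev[-1]


def suggest_corrections(mentions, max_ratio=0.4, min_len=6):
    """Map rare entity spellings to a more frequent, similar spelling.

    max_ratio: assumed cutoff on edit distance divided by the longer length.
    min_len:   assumed minimum length; shorter names (e.g. abbreviations)
               are skipped because small distances are unreliable for them.
    """
    counts = Counter(mentions)
    suggestions = {}
    for variant, n_variant in counts.items():
        if len(variant) < min_len:
            continue
        best = None
        for candidate, n_candidate in counts.items():
            # Only consider strictly more frequent spellings as corrections.
            if candidate == variant or n_candidate <= n_variant:
                continue
            dist = edit_distance(variant.lower(), candidate.lower())
            ratio = dist / max(len(variant), len(candidate))
            if ratio <= max_ratio and (best is None or n_candidate > best[1]):
                best = (candidate, n_candidate)
        if best is not None:
            suggestions[variant] = best[0]
    return suggestions


# Toy data mirroring the example above: 5 correct mentions, 1 misspelling,
# plus two short abbreviations that should be left alone.
mentions = ["Swissquote"] * 5 + ["Swisscode"] + ["UBS"] * 3 + ["UBP"]
suggestions = suggest_corrections(mentions)
print(suggestions)  # {'Swisscode': 'Swissquote'}
```

Treat the output as hypotheses to review rather than automatic replacements. The length cutoff is there precisely because of the abbreviation problem: "UBS" and "UBP" differ by one character but may well be two different entities.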
-
Hello,
I am working with transcribed text. The general quality of the transcription is excellent but it contains a large number of misspelled entity names that are crucial for me.
For instance, "Swisscode" or "Swiss Gold" instead of the correct "Swissquote", or "Alex Hormital" instead of the correct "ArcelorMittal".
Before reinventing the wheel, I was wondering if anyone is aware of an existing NER tool that could correct such mistakes?
Otherwise, any tips on the best approach for implementing such a solution would be greatly appreciated.
Best,
Ed