String Comparison for C#.NET
StringComparison is a library developed for reconciling naming conventions between different models of the electric grid.
I have stripped out the power-system-specific code and packaged the rest as a set of string extensions for determining approximate equality between two strings.
All of the algorithms used here were taken from online resources, translated into C#, and compiled into this library.
I found several similar open-source implementations, but none for .NET/C#. Adding the *.dll to your project gives you access to the IsSimilar() extension and to the individual algorithm extensions it uses under the hood:
- Hamming Distance
- Jaccard Distance
- Jaro Distance
- Jaro-Winkler Distance
- Levenshtein Distance
- Longest Common Subsequence
- Longest Common Substring
- Overlap Coefficient
- Ratcliff-Obershelp Similarity
- Sorensen-Dice Distance
- Tanimoto Coefficient
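To give a feel for what one of these measures computes, here is a minimal sketch of the textbook Levenshtein distance and a length-normalized similarity derived from it. The class and method names (LevenshteinSketch, Distance, Similarity) are hypothetical and chosen for illustration only; this is not the library's own implementation, which may differ in details such as normalization.

```csharp
using System;

// Illustrative sketch only: names are hypothetical, not the library's API.
public static class LevenshteinSketch
{
    // Minimum number of single-character insertions, deletions,
    // and substitutions needed to turn source into target.
    public static int Distance(string source, string target)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (target == null) throw new ArgumentNullException(nameof(target));

        // previous[j]: distance between the first (i - 1) chars of source
        // and the first j chars of target; current[j] is the same for i chars.
        var previous = new int[target.Length + 1];
        var current = new int[target.Length + 1];
        for (int j = 0; j <= target.Length; j++) previous[j] = j;

        for (int i = 1; i <= source.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= target.Length; j++)
            {
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.Min(Math.Min(insertion, deletion), substitution);
            }
            // Swap the two rows so only two arrays are ever allocated.
            var tmp = previous;
            previous = current;
            current = tmp;
        }
        return previous[target.Length];
    }

    // Assumed normalization: 1 - distance / length of the longer string.
    public static double Similarity(string source, string target)
    {
        int longest = Math.Max(source.Length, target.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)Distance(source, target) / longest;
    }
}
```

For "behzad" and "behsad", Distance returns 1 (one substitution), so under this normalization the similarity is 1 - 1/6.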
All of the algorithms are exposed and return their raw results, but they can also be combined selectively to judge the approximate equality of two strings.
This is done through the IsSimilar extension, by setting the desired StringComparisonOptions and StringComparisonTolerance.
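One way to picture how several measures can be combined into a single yes/no answer is sketched below. It is an assumption made for illustration, not the library's documented behavior: the scores of the selected measures are averaged and compared against a numeric threshold standing in for the tolerance. All names here (SimilarityVoteSketch, IsCloseEnough, CharOverlap, CharJaccard) and the threshold values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch only: the averaging rule and all names are assumptions.
public static class SimilarityVoteSketch
{
    // Averages the scores of the chosen measures (each expected in [0, 1])
    // and reports whether the mean reaches the threshold.
    public static bool IsCloseEnough(
        string source,
        string target,
        IEnumerable<Func<string, string, double>> measures,
        double threshold)
    {
        var scores = measures.Select(m => m(source, target)).ToList();
        if (scores.Count == 0)
            throw new ArgumentException("At least one measure is required.", nameof(measures));
        return scores.Average() >= threshold;
    }

    // Overlap coefficient over character sets: |A ∩ B| / min(|A|, |B|).
    public static double CharOverlap(string a, string b)
    {
        var setA = new HashSet<char>(a);
        var setB = new HashSet<char>(b);
        int smaller = Math.Min(setA.Count, setB.Count);
        if (smaller == 0) return setA.Count == setB.Count ? 1.0 : 0.0;
        return (double)setA.Intersect(setB).Count() / smaller;
    }

    // Jaccard index over character sets: |A ∩ B| / |A ∪ B|.
    public static double CharJaccard(string a, string b)
    {
        var setA = new HashSet<char>(a);
        var setB = new HashSet<char>(b);
        int union = setA.Union(setB).Count();
        if (union == 0) return 1.0;
        return (double)setA.Intersect(setB).Count() / union;
    }
}
```

With "behzad" and "behsad" the two character sets share 5 of 6 characters each, so CharOverlap gives 5/6 and CharJaccard gives 5/7. Their mean is roughly 0.77, which passes a loose threshold such as 0.7 and fails a strict one such as 0.9.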
Download the latest release from https://github.com/Behzadkhosravifar/StringComparison/releases
or install it from NuGet at https://www.nuget.org/packages/StringComparison. To install from NuGet, run the following command in the Package Manager Console:
Install-Package StringComparison
To compare two strings approximately and get a boolean result, use the extension as follows:
using System.Collections.Generic; // needed for List<T>
string source = "behzad";
string target = "behsad";
var options = new List<StringComparisonOptions>();
// Choose which algorithms should weigh in for the comparison
options.Add(StringComparisonOptions.UseOverlapCoefficient);
options.Add(StringComparisonOptions.UseLongestCommonSubsequence);
options.Add(StringComparisonOptions.UseLongestCommonSubstring);
// Choose the relative strength of the comparison - is it almost exactly equal? or is it just close?
var tolerance = StringComparisonTolerance.Strong;
// Get a boolean determination of approximate equality
bool result = source.IsSimilar(target, options, tolerance);
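// Get a combined similarity score from the selected algorithms
double howManySimilar = source.SimilarityPercent(target, options);
// Individual algorithms can also be called directly; here each distance is converted to a similarity
double simLevenshtein = 1 - source.LevenshteinDistancePercentage(target);
double simJaro = 1 - source.JaroDistance(target);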
- Fork it!
- Create your feature branch: git checkout -b my-new-feature
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin my-new-feature
- Submit a pull request :)