sashalemon/Toxic

 
 


Toxic (1.1.0)

Tools for Syndromic Surveillance using Social Media

Semantic distance

The semantic distance quantifies the difference in meaning between two pieces of text. It ranges between 0 and 1. Prior approaches counted the number of words the two texts have in common, which does not account for sentences that use different words but mean the same thing.

Consider two strings, A and B, each of which has n words. The semantic distance between A and B is the average semantic distance over all pairs of words, one taken from A and one from B. The semantic distance between two words is 0 if the words are the same. If the words are different, it is computed from their path similarity on WordNet. Path similarity is based on the shortest path that connects the senses of the two words through WordNet's hypernym and hyponym links.

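Averaging over all word pairs accounts for length: a longer text does not receive a larger distance simply because it contains more words.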
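The pairwise averaging described above can be sketched as follows. This is a minimal illustration, not the code in this repository. The function names, the whitespace tokenization, and the toy similarity table are all assumptions made for the example. In particular, the sketch assumes that a similarity score in [0, 1] (higher means closer, as with WordNet path similarity) is turned into a distance as `1 - similarity`, and that word pairs with no known similarity count as maximally distant. A real setup would replace `toy_similarity` with a WordNet lookup.

```python
from itertools import product

def word_distance(w1, w2, similarity):
    """Distance between two words: 0 if identical, else 1 - similarity.

    `similarity` is any callable returning a score in [0, 1] or None.
    Assumption: a pair with no known similarity is maximally distant.
    """
    if w1 == w2:
        return 0.0
    score = similarity(w1, w2)
    return 1.0 if score is None else 1.0 - score

def semantic_distance(a, b, similarity):
    """Average word distance over all pairs (one word from a, one from b)."""
    words_a = a.lower().split()  # assumption: simple whitespace tokenization
    words_b = b.lower().split()
    pairs = list(product(words_a, words_b))
    if not pairs:
        raise ValueError("both texts must contain at least one word")
    total = sum(word_distance(x, y, similarity) for x, y in pairs)
    return total / len(pairs)

# Hypothetical stand-in for WordNet path similarity, for illustration only.
TOY_SIMILARITY = {
    frozenset(("dog", "cat")): 0.2,
    frozenset(("dog", "puppy")): 0.5,
}

def toy_similarity(w1, w2):
    return TOY_SIMILARITY.get(frozenset((w1, w2)))

if __name__ == "__main__":
    print(semantic_distance("dog", "dog", toy_similarity))
    print(semantic_distance("dog cat", "dog", toy_similarity))
```

Because the sum is divided by the number of pairs, the result stays between 0 and 1 regardless of how many words each text contains.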

Pitfalls


Some words are not in WordNet. Some meanings of words are not in WordNet. These omissions bias semantic distance towards more established meanings of words, which suggests that the semantic distance is unreliable for texts full of slang or highly metaphorical language.

Despite these limitations, the semantic distance provides a useful and simple way to quantify meaning. These limitations, moreover, can be overcome by extending WordNet to account for new words or new meanings of words.
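One way to spot texts affected by the missing-word pitfall is to measure how many of their words the lexicon knows at all. The sketch below is an assumption made for illustration and is not part of this repository: `coverage` and `TOY_VOCABULARY` are hypothetical names, and the small set stands in for the collection of words that WordNet contains.

```python
def coverage(text, vocabulary):
    """Fraction of words in `text` that appear in `vocabulary`.

    A low value suggests the semantic distance for this text rests on
    only a few known words and should be treated with caution.
    """
    words = text.lower().split()  # assumption: simple whitespace tokenization
    if not words:
        return 0.0
    known = sum(1 for w in words if w in vocabulary)
    return known / len(words)

# Hypothetical stand-in for the set of words present in WordNet.
TOY_VOCABULARY = {"i", "feel", "sick", "after", "the", "party"}

if __name__ == "__main__":
    print(coverage("I feel sick", TOY_VOCABULARY))
    print(coverage("feel sick yolo lit", TOY_VOCABULARY))
```

Texts with low coverage, such as posts dominated by slang, could be flagged for manual review or excluded before distances are compared.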

About

Tools for Toxicovigilance
