Jinho D. Choi edited this page Jan 26, 2017 · 9 revisions

Distributional Semantics

Your task is to apply a minimum spanning tree algorithm to cluster similar words together in vector space.

  • Download the word vectors.
  • Create a complete graph by connecting every pair of word vectors, where the distance between each word pair is measured by the cosine distance (1 - cosine similarity).
  • Find a minimum spanning tree from the complete graph by running Prim's algorithm.
  • Install Graphviz.
  • Save the minimum spanning tree as a dot file: sample.dot.
  • Find and implement a way of breaking the minimum spanning tree into meaningful clusters. Describe your approach and show the results in your report.
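
The steps above can be sketched in Python roughly as follows. This is a minimal, dependency-free sketch built on several assumptions. Vectors are already loaded into a dict that maps each word to a list of floats, and no vector is all zeros. The complete graph is held as a dense distance matrix. The function names (`cosine_distance`, `complete_graph`, `prim_mst`, `to_dot`, `cut_into_clusters`) are hypothetical. The clustering step shows only one possible approach and is not necessarily the one the assignment expects. It removes the k-1 heaviest MST edges, which gives the same k clusters as single-linkage clustering, up to ties.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def complete_graph(words, vectors):
    """Dense matrix: dist[i][j] is the edge weight between words[i] and words[j]."""
    return [[cosine_distance(vectors[a], vectors[b]) for b in words] for a in words]

def prim_mst(words, dist):
    """Array-based O(n^2) Prim's algorithm, suited to a dense complete graph.
    Returns the MST as a list of (word, word, weight) edges."""
    n = len(words)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each vertex into the tree
    parent = [-1] * n      # tree endpoint of that cheapest edge
    best[0] = 0.0          # grow the tree from the first word
    edges = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((words[parent[u]], words[u], best[u]))
        # Update each remaining vertex's cheapest connection through u.
        for w in range(n):
            if not in_tree[w] and dist[u][w] < best[w]:
                best[w] = dist[u][w]
                parent[w] = u
    return edges

def to_dot(edges, name="mst"):
    """Render MST edges as an undirected Graphviz DOT graph with weight labels."""
    def quote(s):
        return '"' + s.replace('"', '\\"') + '"'
    lines = [f"graph {name} {{"]
    for u, v, w in edges:
        lines.append(f'  {quote(u)} -- {quote(v)} [label="{w:.3f}"];')
    lines.append("}")
    return "\n".join(lines) + "\n"

def cut_into_clusters(words, edges, k):
    """One possible clustering: drop the k-1 heaviest MST edges and
    return the connected components that remain (via union-find)."""
    kept = sorted(edges, key=lambda e: e[2])[: max(len(edges) - (k - 1), 0)]
    root = {w: w for w in words}
    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x
    for u, v, _ in kept:
        root[find(u)] = find(v)
    clusters = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    return list(clusters.values())

if __name__ == "__main__":
    # Toy vectors for illustration only; real input comes from the word-vector file.
    vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
    words = list(vectors)
    edges = prim_mst(words, complete_graph(words, vectors))
    print(to_dot(edges), end="")
    print(cut_into_clusters(words, edges, k=2))
```

To produce sample.dot, you could write the string returned by `to_dot` to that file. Graphviz can then render it, for example with `dot -Tpng sample.dot -o sample.png`. The dense matrix mirrors the "complete graph" step directly, but its memory grows quadratically with the vocabulary. For a large word list, a reasonable alternative is to compute `cosine_distance` on the fly inside the relaxation loop.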

Input Format

Each line in the input file consists of a word followed by its vector representation. For instance, the first line consists of the word "New" (0th column) followed by its 50-dimensional vector representation (1st through 50th columns).
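
A minimal sketch of reading this format in Python follows. It assumes that columns are separated by whitespace, since the delimiter is not specified above, and that the file is UTF-8. `parse_vectors` and `load_vectors` are hypothetical names. The dimension check is an added safeguard, not part of the stated format.

```python
def parse_vectors(lines):
    """Map each word (column 0) to its vector (columns 1..n) as a list of floats.
    Assumes whitespace-separated columns."""
    vectors = {}
    dim = None
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        # Guard against malformed lines: every vector should have the same length.
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(f"{word!r} has {len(values)} dimensions, expected {dim}")
        vectors[word] = values
    return vectors

def load_vectors(path):
    """Read a word-vector file from disk (assumed UTF-8, one word per line)."""
    with open(path, encoding="utf-8") as f:
        return parse_vectors(f)
```

For the file described above, each value in the resulting dict would be a list of 50 floats, keyed by words such as "New".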

Submission

CS571: Natural Language Processing

Instructor


Emory University
