Homework 2
Jinho D. Choi edited this page Jan 26, 2017
Your task is to apply a minimum spanning tree algorithm to cluster similar words together in vector space.
- Download the word vectors.
- Create a complete graph by connecting all word vectors, where the distance between each pair of words is measured by the cosine distance (1 - cosine similarity).
- Find a minimum spanning tree from the complete graph by running Prim's algorithm.
- Install Graphviz.
- Save the minimum spanning tree as a dot file: sample.dot.
- Find and implement a way of breaking the minimum spanning tree into meaningful clusters. Describe your approach and show the results in your report.
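The graph steps above can be sketched roughly as follows. This is only an illustration in Python, not a reference solution: the function names (`prim_mst`, `to_dot`, `clusters_by_cutting`) are hypothetical, the input is assumed to be a precomputed symmetric distance matrix, and cutting the heaviest edges is just one possible clustering strategy among many.

```python
import math

def prim_mst(dist):
    """Prim's algorithm on a dense, symmetric distance matrix.

    Returns the tree as a list of (parent, child, weight) edges.
    Runs in O(n^2) time, which suits a complete graph.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0           # start growing the tree from vertex 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax the edges from u to every vertex still outside the tree.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    return edges

def to_dot(words, edges):
    """Render the tree as an undirected Graphviz graph in DOT syntax."""
    def quote(word):
        return '"' + word.replace('"', '\\"') + '"'
    lines = ["graph mst {"]
    for i, j, w in edges:
        lines.append(f'  {quote(words[i])} -- {quote(words[j])} [label="{w:.3f}"];')
    lines.append("}")
    return "\n".join(lines)

def clusters_by_cutting(n, edges, k):
    """Assumed strategy: drop the k-1 heaviest tree edges, return the components."""
    kept = sorted(edges, key=lambda e: e[2])[: max(len(edges) - (k - 1), 0)]
    root = list(range(n))
    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]   # path halving
            x = root[x]
        return x
    for i, j, _ in kept:
        root[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The string returned by `to_dot` can be written to a `.dot` file and rendered with Graphviz, for example `dot -Tpng mst.dot -o mst.png` (the file names here are placeholders). Whether cutting the heaviest edges yields meaningful clusters depends on the data, so the choice of `k` and of the cutting rule still needs to be justified in the report.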
Each line in the input file consists of a word and its vector representation. For instance, the first line consists of the word "New" (0th column) and its vector representation (1st - 50th columns).
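A minimal sketch of reading this format and computing cosine distances might look like the following. It assumes the columns are separated by whitespace and that words contain no spaces, neither of which is stated above; `parse_line`, `load_vectors`, `cosine_distance`, and `distance_matrix` are hypothetical helper names.

```python
import math

def parse_line(line):
    """Split one line into (word, vector); column 0 is the word."""
    fields = line.split()   # assumption: whitespace-separated columns
    return fields[0], [float(x) for x in fields[1:]]

def load_vectors(path):
    """Read all non-empty lines of the file at `path`."""
    words, vectors = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                word, vector = parse_line(line)
                words.append(word)
                vectors.append(vector)
    return words, vectors

def cosine_distance(u, v):
    """1 - cosine similarity; undefined (division by zero) for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def distance_matrix(vectors):
    """Pairwise cosine distances, i.e. the edge weights of the complete graph."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = cosine_distance(vectors[i], vectors[j])
    return dist
```

The full matrix takes memory quadratic in the number of words, so for a large vocabulary it may be preferable to compute distances on demand instead.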
- Compress your code and report into hw2.zip and submit it to: https://canvas.emory.edu/courses/29596/assignments/31890
Copyright © 2015-2019 Emory University - All Rights Reserved.