
File	Description
groupby_parse_tokenize.ipynb	Groups the data by document, parses the HTML text, and tokenizes it. Creates the file tokenized_data.
stemming.ipynb	Removes stopwords from the tokenized file and stems the words. Creates the file stemmed_data.
evaluate_query_expansion.ipynb	Computes the Normalized Discounted Cumulative Gain (nDCG) for the non-stemmed corpus (tokenized_data) and the stemmed corpus (stemmed_data).
query_expansion.ipynb	Creates the expanded queries.
results_q_expansion.ipynb	Averages the nDCG scores and plots them.
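
For readers unfamiliar with the metric named in the table, here is a minimal sketch of nDCG, not the code from the notebooks. It assumes linear gain (the relevance grade itself, rather than the 2^rel - 1 variant) and builds the ideal ranking by re-sorting the same retrieved list. The notebooks may differ on both points, and the function names `dcg` and `ndcg` are illustrative only.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of graded relevance values."""
    ranked = relevances[:k] if k is not None else relevances
    # 0-based position i is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked))


def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering.

    Assumption: the ideal ordering is derived from the retrieved list itself,
    not from the full set of judged documents for the query.
    """
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def mean_ndcg(per_query_relevances, k=None):
    """Average nDCG over queries; returns 0.0 when there are no queries."""
    scores = [ndcg(rels, k) for rels in per_query_relevances]
    return sum(scores) / len(scores) if scores else 0.0
```

Each inner list passed to `mean_ndcg` holds the relevance grades of one query's results in ranked order, so the same helper can score the runs on tokenized_data and on stemmed_data for comparison.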
