Applied the PageRank algorithm in a full-text search engine that lets users search a collection of up to 100,000 pages for a list of words and ranks results by how relevant each document is to those words.
• Built a crawler to collect documents and follow their links to other pages. (Beautiful Soup, urllib)
• Set up a database for building the full-text index. The index maps each distinct word to the documents that contain it and to its locations within those documents. (SQLite, sqlite3)
• Returned a ranked list of documents for each query, implementing both content-based ranking and the PageRank algorithm.
• Created a neural network that reorders results by learning to associate searches with results based on people’s clicking habits.
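
For illustration, the PageRank step can be sketched as follows. This is a minimal sketch and not the project's actual code: the function name `pagerank`, the in-memory `links` dictionary, the damping factor of 0.85, and the fixed iteration count are all assumptions made for the example. The project itself stored its link data in SQLite.

```python
def pagerank(links, damping=0.85, iterations=20):
    """Iteratively estimate PageRank scores.

    links: dict mapping each page to the list of pages it links to
    (a hypothetical in-memory stand-in for the crawler's link table).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        new_ranks = {
            p: (1 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes an equal share of its rank to every page it links to.
        for source, targets in links.items():
            if targets:
                share = damping * ranks[source] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


if __name__ == "__main__":
    example = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(example).items()):
        print(page, round(score, 3))
```

In a search engine of this kind, a score like this would typically be normalized and combined with the content-based scores before the final ordering is produced.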