GitHub - mandymchu/search-engine: A full-text search engine with content-based ranking and the PageRank algorithm.

Search Engine and Page Rank

This project applies the PageRank algorithm in a full-text search engine that lets people search a collection of up to 100,000 pages for a list of words and ranks the results by how relevant the documents are to those words.

• Built a crawler that collects documents and follows their links to other documents. (Beautiful Soup, urllib used)

• Set up a database for building the full-text index. The index lists every distinct word along with the documents it appears in and its locations within each document. (SQLite, sqlite3 used)

• Returned a ranked list of documents for a query, using content-based ranking and the PageRank algorithm.

• Created a neural network that reorders results by learning to associate searches with results based on people’s clicking habits.
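
The PageRank step mentioned in the list above can be illustrated with a short sketch. This is not taken from `searchengine.py`; it is a minimal in-memory version, assuming a link graph stored as a dictionary, a damping factor of 0.85, and a fixed number of iterations. The function name `pagerank` and its parameters are hypothetical, and the repository itself may compute the scores differently (for example, directly against the SQLite link tables).

```python
def pagerank(links, damping=0.85, iterations=20):
    """Return a dict of page -> score for a small link graph.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are assumed defaults,
    not values taken from the repository.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the same baseline share each round.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped score among its outgoing links.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


# Hypothetical three-page graph used only for illustration.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
example_ranks = pagerank(example_links)
for page, score in sorted(example_ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In a search engine like the one described, a link-based score of this kind would typically be normalized and combined with the content-based scores (such as word frequency or word location) to produce the final ordering. The weighting between the two is a design choice that this sketch does not cover.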

Files (8 commits):

.gitattributes
README.md
experiment_results.txt
neuralnetwork.py
run_all.py
searchengine.py
