Copyright 2004, 2013 Vishwas Bhat, Apurva Pangam, Tarun Makhija, Vineet Jalali
Multipurpose Internet crawler
-----------------------------
Purpose:
-------
Create a knowledge base for a particular domain, such as mathematics.
Using a set of keywords for the required domain, crawl the Internet for sites containing the keywords and store the relevant pages locally. An MD5 hash of each page is used to avoid storing the same page twice.
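
The keyword check and the MD5 duplicate check described above can be sketched roughly as follows. This is an illustration only and is not taken from agentxml.py: the names is_relevant and PageStore are hypothetical, and the sketch keeps pages in memory, whereas the crawler stores them locally.

```python
import hashlib


def is_relevant(text, keywords):
    """Return True if any keyword occurs in the page text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


class PageStore:
    """Hypothetical in-memory store that skips pages with identical content."""

    def __init__(self):
        self.pages = {}  # MD5 hex digest -> (url, body)

    def add(self, url, body):
        """Store body (bytes) unless an identical page was stored before.

        Returns True if the page was stored, False if it was a duplicate.
        """
        digest = hashlib.md5(body).hexdigest()
        if digest in self.pages:
            return False
        self.pages[digest] = (url, body)
        return True
```

Because the hash is taken over the page content and not the URL, the same page reached through two different addresses is stored only once. Pages that differ by even one byte (for example, an embedded timestamp) hash differently and are both kept.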
Execute the program:
-------------------
./agentxml.py
Limitations of the crawler:
--------------------------
1. Does not handle websites that require authentication
2. Works only for http sites
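
One way a crawler with the second limitation might skip unsupported links is to check the URL scheme before fetching. The sketch below is an assumption about how such a filter could look and is not taken from agentxml.py; the function name is_crawlable is hypothetical.

```python
from urllib.parse import urlparse


def is_crawlable(url):
    """Return True only for plain http URLs; https, ftp, mailto etc. are skipped."""
    return urlparse(url).scheme == "http"
```

For example, is_crawlable("http://example.org/maths") is True, while is_crawlable("https://example.org/maths") is False.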
Required improvements:
---------------------
1. Reorganisation of code
2. Improve storage of captured sites
Credits:
-------
The crawler was created as part of a student project at the Homi Bhabha Centre for Science Education under the guidance of Dr. Nagarjuna. We would like to thank him for his support and direction in creating this project.