An NLP-based approach to automatically categorize your bookmarks!
To understand this project in-depth, refer to my technical paper: Bookmark Classification using Multinomial Naive Bayes Model
- Enter your bookmarks in the ./links.json file
- To run the code, run categorize.py
scrape_filter_link.py contains the classes used to scrape information from each URL.
It can categorize a variety of bookmarks. Currently it supports all the categories present in the ./corpus/ directory.
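For readers who want a feel for the underlying idea, here is a minimal sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is not the code used in categorize.py: the function names, the whitespace tokenizer, and the toy corpus are all assumptions made for illustration, and the technical paper remains the reference for the actual method.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would likely do more.
    return text.lower().split()


def train(corpus_by_category):
    """Count word frequencies per category from {category: [text, ...]}."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocabulary = set()
    for category, texts in corpus_by_category.items():
        for text in texts:
            tokens = tokenize(text)
            word_counts[category].update(tokens)
            vocabulary.update(tokens)
            doc_counts[category] += 1
    return word_counts, doc_counts, vocabulary


def classify(text, word_counts, doc_counts, vocabulary):
    """Return the category with the highest log posterior."""
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        # Log prior: share of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # Words never seen in training carry no evidence.
            count = word_counts[category][token]
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Toy corpus, purely illustrative.
corpus = {
    "programming": ["python code compiler function", "code library function bug"],
    "cooking": ["recipe oven bake flour", "recipe sauce simmer garlic"],
}
model = train(corpus)
print(classify("bake a recipe with flour", *model))  # prints "cooking"
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once the scraped page text gets long.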
To a certain extent! For example, Firefox allows users to back up their bookmarks in JSON format. You can extract the uri values from that JSON file and feed them into ./links.json.
To back up your bookmarks in Firefox, press Ctrl+Shift+O, go to Import and Backup, and then to Backup.
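Once you have the backup file, pulling out the links could look roughly like the sketch below. It relies on the Firefox backup being a nested tree whose folders carry a "children" list and whose bookmarks carry a "uri" key. The helper names are hypothetical, and the output layout (a flat JSON list of URLs) is an assumption: check the sample ./links.json in the repository and adjust if it expects something else.

```python
import json


def collect_uris(node):
    """Recursively gather every "uri" value from a Firefox bookmark tree."""
    uris = []
    if "uri" in node:
        uris.append(node["uri"])
    for child in node.get("children", []):
        uris.extend(collect_uris(child))
    return uris


def export_links(backup_path, links_path="links.json"):
    """Read a Firefox backup and write its web links to links_path."""
    with open(backup_path, encoding="utf-8") as f:
        root = json.load(f)
    # Keep web links only; Firefox also stores internal "place:" URIs.
    links = [u for u in collect_uris(root) if u.startswith(("http://", "https://"))]
    # Assumption: links.json holds a flat list of URL strings.
    with open(links_path, "w", encoding="utf-8") as f:
        json.dump(links, f, indent=2)
    return links
```

Calling export_links with the path of your backup file (the file name depends on where you saved it) writes the filtered links next to categorize.py.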
Chrome users can check this post on superuser.
No, the mapping of each URL to its category is stored in a JSON file, result.json, in a dict format.
The keys are your bookmarks and the values are their categories.
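Since result.json maps each bookmark to a single category, it can be handy to invert it and list bookmarks per category. A small sketch, assuming the file is a flat JSON object as described above (the function name is hypothetical):

```python
import json
from collections import defaultdict


def group_by_category(result_path="result.json"):
    """Invert the {bookmark: category} mapping into {category: [bookmarks]}."""
    with open(result_path, encoding="utf-8") as f:
        result = json.load(f)
    grouped = defaultdict(list)
    for url, category in result.items():
        grouped[category].append(url)
    return dict(grouped)
```

After running categorize.py, calling group_by_category() from the project directory returns one list of URLs per category.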
Sure, here's one (the highlighted part is what gets stored in result.json):
Yes, you can! The code is fairly scalable.
To add your own corpuses:
- Create a directory with a unique category name in ./corpus/
- Inside ./corpus/your-category-dir, add your corpus text in a JSON file with the format: {"text": "_your_corpus_text_here_"}
(NOTE: You can add multiple JSON files in a category directory)
When you run the code, categorize.py will take the new or modified corpuses into consideration.
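The steps above can also be scripted. The sketch below creates the category directory if needed and writes one corpus file in the {"text": ...} format. The helper name and the numeric file naming are assumptions; the README only requires JSON files inside the category directory.

```python
import json
from pathlib import Path


def add_corpus_text(category, text, corpus_root="corpus"):
    """Write text into a new JSON file inside corpus_root/category."""
    category_dir = Path(corpus_root) / category
    category_dir.mkdir(parents=True, exist_ok=True)
    # Pick the next unused file name; the naming scheme is arbitrary here.
    index = len(list(category_dir.glob("*.json"))) + 1
    target = category_dir / f"{index}.json"
    while target.exists():
        index += 1
        target = category_dir / f"{index}.json"
    with open(target, "w", encoding="utf-8") as f:
        json.dump({"text": text}, f, ensure_ascii=False)
    return target
```

For example, add_corpus_text("gardening", "soil compost seedlings pruning") run from the project root would add a file under ./corpus/gardening/.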
The code is released under the MIT License.