Write a script for scraping Mumbai University cutoffs #1
Comments
Assign me.
Will be assigning you this one first, as I need a sequence for tracking the progression.
I had a doubt. For admissions in Maharashtra, there is no web page to scrape cutoffs from; CET Cell provides PDF documents for them. So maybe using NLP would be a better option, I guess.
You'll have to use pytesseract or LLM parsers for scraping through the PDFs. @gaurav-rm11
@gaurav-rm11 any progress here?
I've used pdfplumber to extract data from the PDF, but it works on a downloaded PDF and generates a CSV file. Will that do? Then I'll raise a PR.
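For anyone following along, here is a rough sketch of what the pdfplumber-to-CSV step described above could look like. It is not the script from the eventual PR; the function names `clean_rows` and `pdf_tables_to_csv` and the file paths are made up for illustration, and it assumes the cutoff PDF has real text tables (a scanned PDF would need OCR instead).

```python
import csv


def clean_rows(table):
    """Normalize one extracted table: fill None cells, flatten newlines, drop blank rows."""
    cleaned = []
    for row in table:
        cells = [(cell or "").replace("\n", " ").strip() for cell in row]
        if any(cells):  # skip rows where every cell is empty
            cleaned.append(cells)
    return cleaned


def pdf_tables_to_csv(pdf_path, csv_path):
    """Write every table row found in pdf_path to csv_path and return the row count."""
    import pdfplumber  # third-party dependency: pip install pdfplumber

    count = 0
    with pdfplumber.open(pdf_path) as pdf, \
            open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for page in pdf.pages:
            # extract_tables() returns a list of tables; each table is a list of rows
            for table in page.extract_tables():
                rows = clean_rows(table)
                writer.writerows(rows)
                count += len(rows)
    return count
```

Usage would be something like `pdf_tables_to_csv("cutoffs.pdf", "cutoffs.csv")` on an already downloaded file (both names are placeholders). Repeated header rows on each page are not deduplicated here, so that would still need handling before loading into the database.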
Sounds good to me. I just wanted the data from the huge round PDF to be in the database. Raise a PR after that; I'll check for any issues and inform you about them. Thanks for the update tho!
@gaurav-rm11 you may use external web sources like Shiksha too for the same. Example for the IITmain.csv file
@gaurav-rm11 any updates?
Requirements are -