Write a script for scraping Mumbai University cutoffs #1

Open
Sohammhatre10 opened this issue Oct 11, 2024 · 9 comments

@Sohammhatre10
Owner

The requirements are:

  1. A Selenium bot that scrapes Mumbai University's 2024 cutoff data, with automated logging
  2. Ease of use and clean code; add a docstring for everything
@gaurav-rm11

Please assign this to me.

@Sohammhatre10
Owner Author

I'll assign this one to you first, since I need a sequence for tracking progress.

@gaurav-rm11

I had a doubt. For admissions in Maharashtra, there is no web page to scrape cutoffs from; the CET Cell provides PDF documents instead. So using NLP might be a better option, I guess.

@Sohammhatre10
Owner Author

Sohammhatre10 commented Oct 13, 2024

You'll have to use pytesseract or LLM parsers for scraping the PDFs. @gaurav-rm11

@Sohammhatre10
Owner Author

@gaurav-rm11 any progress here?

@gaurav-rm11

I've used pdfplumber to extract the data from the PDF, but it works on a downloaded PDF and generates a CSV file. Will that do? If so, I'll raise a PR.
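
For readers following along, here is a minimal sketch of what a pdfplumber-to-CSV step like the one described above could look like. It is not the script from this thread: the function names (`clean_rows`, `extract_rows`, `write_csv`) are hypothetical, and it assumes the cutoff PDF contains tables that pdfplumber's `extract_tables()` can detect.

```python
import csv
from typing import Iterable, Iterator, List, Optional

Row = List[Optional[str]]


def clean_rows(rows: Iterable[Row]) -> Iterator[List[str]]:
    """Normalize raw table rows: collapse whitespace in cells and drop empty rows."""
    for row in rows:
        # pdfplumber returns None for empty cells and keeps line breaks inside cells.
        cells = [" ".join((cell or "").split()) for cell in row]
        if any(cells):
            yield cells


def extract_rows(pdf_path: str) -> Iterator[List[str]]:
    """Yield cleaned rows from every table on every page of a downloaded PDF."""
    # Third-party dependency; imported here so clean_rows/write_csv work without it.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                yield from clean_rows(table)


def write_csv(rows: Iterable[List[str]], out_file) -> int:
    """Write rows to an open text file object and return how many rows were written."""
    writer = csv.writer(out_file)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

Usage would be along the lines of opening an output file with `open("cutoffs_raw.csv", "w", newline="", encoding="utf-8")` and calling `write_csv(extract_rows("cutoffs.pdf"), f)`; both file names are placeholders. Mapping the raw rows onto the final column layout would still be a separate step.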

@Sohammhatre10
Owner Author

Sounds good to me. I just wanted the data from the huge round PDF to be in the database. Raise a PR, and after that I'll check for any issues and let you know about them. Thanks for the update, though!

@Sohammhatre10
Owner Author

@gaurav-rm11 you may also use external web sources such as Shiksha for this.
The CSV files must have the columns:
College, Branch, Quota, Category, Gender, OpenRank, CloseRank

Example row for the IITmain.csv file:
Indian Institute of Technology Bhubaneswar,"Civil Engineering (4 Years, Bachelor of Technology)",AI,OPEN,Gender-Neutral,9106,14782
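
A quick way to check a generated file against this column layout could look like the sketch below. It uses only the standard library. The function name `parse_cutoffs` is hypothetical, and two things are assumptions rather than stated requirements: that each CSV file starts with a header row, and that OpenRank and CloseRank are always integers.

```python
import csv
import io
from typing import Dict, List

REQUIRED_COLUMNS = [
    "College", "Branch", "Quota", "Category", "Gender", "OpenRank", "CloseRank",
]


def parse_cutoffs(text: str) -> List[Dict[str, object]]:
    """Parse cutoff CSV text, checking the header and converting ranks to int."""
    # skipinitialspace tolerates a space after a comma, even before a quoted field.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [name.strip() for name in next(reader, [])]
    if header != REQUIRED_COLUMNS:
        raise ValueError(f"expected columns {REQUIRED_COLUMNS}, got {header}")

    records: List[Dict[str, object]] = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(REQUIRED_COLUMNS):
            raise ValueError(f"row {line_no}: expected 7 fields, got {len(row)}")
        record: Dict[str, object] = dict(
            zip(REQUIRED_COLUMNS, (cell.strip() for cell in row))
        )
        # Assumption: ranks are whole numbers; int() raises ValueError otherwise.
        record["OpenRank"] = int(str(record["OpenRank"]))
        record["CloseRank"] = int(str(record["CloseRank"]))
        records.append(record)
    return records
```

The branch name contains a comma, so it has to stay quoted in the file; otherwise the row would split into eight fields and fail the length check.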

@Sohammhatre10
Owner Author

@gaurav-rm11 any updates?
