Hassan-Qureshi/arbisoft-tech-assignment

Project Setup Instructions

  • Install virtualenv, then create and activate a virtual environment

    $ pip install virtualenv

    $ virtualenv <ENV-NAME>

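    $ ./<ENV-NAME>/Scripts/activate

    The path above is the Windows layout. On Linux or macOS the activation script lives under bin instead:

    $ source <ENV-NAME>/bin/activate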

  • Install the dependencies listed in requirements.txt

    $ pip install -r requirements.txt

  • Run the project with src as the working directory.

  • The project is fully configurable: directories, file names, URLs, etc. are stored in a separate file that can be referenced from anywhere inside the project.
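
As an illustration of the last point, a central settings module might look like the sketch below. This is not the project's actual file: every constant name, the URL, and the file name are hypothetical placeholders, and only the directory names are taken from the paths mentioned in this README.

```python
from pathlib import Path

# Hypothetical settings module: all names and values are illustrative assumptions.
OUTPUT_DIR = Path("output")  # relative to the src working directory
SCRAPED_DATA_DIR = OUTPUT_DIR / "scrapped-data"
ANALYTICS_DIR = OUTPUT_DIR / "analytical-data" / "raw-curated-data"
GRAPH_DIR = OUTPUT_DIR / "visutal-graph"

UNIVERSITY_LIST_URL = "https://example.com/universities"  # placeholder URL
SCRAPED_FILE_NAME = "universities.csv"  # placeholder file name


def scraped_file_path() -> Path:
    """Build the full path of the scraped-data file from the settings above."""
    return SCRAPED_DATA_DIR / SCRAPED_FILE_NAME
```

Other modules would then import these constants instead of hard-coding paths or URLs, so a change in one place applies everywhere.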


Task Explanation

The task is divided into two sub-modules:

  1. Scraping Module
  2. Data Analytics Module

Scraping Module

My approach was to scrape the main page, which lists all the universities, and then traverse that list one by one: take the href of each university, follow it, and read the University Domain from the second page, which was the bonus part.
The module is located at src.scrapper.Scrapper.py.
The scraped data is written to output/scrapped-data/.
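
The two-step traversal described above can be sketched roughly as follows. This is not the code in Scrapper.py: it uses only the standard library, the helper names (`LinkCollector`, `extract_links`, `scrape`) are hypothetical, and it assumes that every link on the list page points to a university and that the first link on a detail page is the university's domain. The real pages will almost certainly need more specific selectors.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return all (text, href) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def scrape(list_url, fetch):
    """Walk the list page, then visit each university's detail page.

    `fetch` is any callable that maps a URL to its HTML text.
    """
    rows = []
    for name, href in extract_links(fetch(list_url)):
        detail_url = urljoin(list_url, href)
        detail_links = extract_links(fetch(detail_url))
        # Assumption: the first link on the detail page is the university domain.
        domain = detail_links[0][1] if detail_links else None
        rows.append({"name": name, "detail_url": detail_url, "domain": domain})
    return rows
```

Passing `fetch` as a parameter keeps the traversal logic independent of the HTTP client, so it can be exercised against saved HTML without touching the network.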

Data Analytics Module

I applied two transformations to present the data more clearly:

  1. Total count of universities for each country
  2. Country-wise average ranking of universities

The output of both transformations is always saved in CSV format under src/output/analytical-data/raw-curated-data/.
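
A minimal sketch of the two transformations and the CSV export is shown below. It uses only the standard library rather than whatever the project actually uses. The record fields `country` and `rank` and the function names are assumptions made for illustration.

```python
import csv
from collections import defaultdict
from pathlib import Path
from statistics import mean


def summarize_by_country(records):
    """Return (count per country, average rank per country).

    Each record is assumed to be a dict with hypothetical keys
    "country" and "rank".
    """
    ranks = defaultdict(list)
    for record in records:
        ranks[record["country"]].append(record["rank"])
    counts = {country: len(values) for country, values in ranks.items()}
    averages = {country: mean(values) for country, values in ranks.items()}
    return counts, averages


def write_csv(path, header, mapping):
    """Write a {country: value} mapping as a two-column CSV, sorted by country."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(sorted(mapping.items()))
```

Each transformation then becomes one call to `write_csv`, for example with the headers `("country", "count")` and `("country", "average_rank")`.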
I also presented the data graphically using the matplotlib library, which was the bonus task. The resulting graphs can be viewed in src/output/visutal-graph/.
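
For illustration, a per-country bar chart could be produced along these lines. This is a sketch, not the project's plotting code: the function name, figure size, and labels are assumptions, and it expects a `{country: value}` mapping such as the counts or average rankings described above.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt


def save_bar_chart(values, title, ylabel, path):
    """Save a bar chart of a {country: value} mapping and return the file path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    countries = sorted(values)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(countries, [values[country] for country in countries])
    ax.set_title(title)
    ax.set_xlabel("Country")
    ax.set_ylabel(ylabel)
    ax.tick_params(axis="x", labelrotation=45)  # keep long country names readable
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)  # release the figure's memory once it is written
    return path
```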
