This repository contains HireSense, a candidate selection model that parses resumes and job descriptions (JDs) into JSON format using fine-tuned NER models and calculates the similarity between them using embeddings generated by a T5 model. 🌐
The project includes the following Python notebooks:
- 📄 jdnertrainnotebooknew
  - Contains the training code for the Job Description Parser Model.
- 📄 resumenertrainnotebooknew
  - Contains the training code for the Resume Parser Model.
- 📄 finalbyopnew
  - Loads the trained models and demonstrates how resumes and JDs are parsed into JSON format.
  - Generates embeddings using the T5 model.
  - Calculates the similarity of an Electrical Engineer JD with the resumes of an Electrical Engineer, an Animator, and a Bartender.
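The comparison in finalbyopnew comes down to scoring one JD embedding against several resume embeddings and ordering the results. A minimal sketch of that ranking step follows, assuming cosine similarity as the metric (the README does not name the metric). The vectors are small hand-made stand-ins for real T5 embeddings, and `rank_resumes` is a hypothetical helper, not a function from the notebooks.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_resumes(
    jd_vec: list[float], resume_vecs: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Hypothetical helper: return (resume name, score) pairs, best match first."""
    scores = {name: cosine_similarity(jd_vec, vec) for name, vec in resume_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 3-dimensional vectors for illustration only; real T5 embeddings are far longer
# and these numbers are not outputs of the HireSense model.
jd_vec = [0.9, 0.1, 0.0]
resumes = {
    "electrical_engineer": [0.8, 0.2, 0.1],
    "animator": [0.1, 0.9, 0.2],
    "bartender": [0.0, 0.2, 0.9],
}
for name, score in rank_resumes(jd_vec, resumes):
    print(f"{name}: {score:.3f}")
```

Sorting by score keeps the output readable when more than three resumes are compared against the same JD.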
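To process resumes, use the following format:

```python
import torch

# text_preprocess, pdf_load, EnsembleNERResume and group_entities_unique_resume
# come from the project notebooks; run those cells first.
text = text_preprocess(pdf_load("path of the resume"))
ensemble = EnsembleNERResume(
    pretrained_model_path="/kaggle/input/vvvvvvvvvvvvv/resume_ner_model_pickle.pkl",
    device="cuda" if torch.cuda.is_available() else "cpu",  # use the GPU if present
)
predictions = ensemble.predict(text)  # NER predictions for the resume text
parsed_resume = group_entities_unique_resume(predictions)  # grouped JSON output
```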
Here, `parsed_resume` contains the parsed resume in JSON format. 🔐
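The README does not show what `group_entities_unique_resume` does internally; its name suggests it collects predicted entities by label and drops duplicates. The sketch below illustrates that idea under an assumed prediction format: a list of dicts with `entity_group` and `word` keys, as returned by the Hugging Face token-classification pipeline with aggregation enabled. The function name `group_entities_unique_sketch` and the label names are hypothetical.

```python
import json


def group_entities_unique_sketch(predictions: list[dict]) -> dict[str, list[str]]:
    """Hypothetical: group entity strings by label, keep first-seen order, drop repeats."""
    grouped: dict[str, list[str]] = {}
    for pred in predictions:
        label = pred["entity_group"]
        text = pred["word"].strip()
        if not text:
            continue  # skip empty spans
        values = grouped.setdefault(label, [])
        # Case-insensitive duplicate check so "Python" and "python" count once.
        if text.lower() not in (v.lower() for v in values):
            values.append(text)
    return grouped


# Made-up predictions in the assumed format, for illustration only.
predictions = [
    {"entity_group": "NAME", "word": "Jane Doe"},
    {"entity_group": "SKILL", "word": "Python"},
    {"entity_group": "SKILL", "word": "python"},
    {"entity_group": "SKILL", "word": "PLC programming"},
]
parsed = group_entities_unique_sketch(predictions)
print(json.dumps(parsed, indent=2))
```

A dict of label to list of strings serializes directly with `json.dumps`, which is one simple way to get the JSON output described above.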
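To process JDs, use the following format:

```python
import torch

# text_preprocess, EnsembleNERJd and group_entities_unique_jd
# come from the project notebooks; run those cells first.
jd = text_preprocess("type your jd here")
ensemble = EnsembleNERJd(
    pretrained_model_path="/kaggle/input/vvvvvvvvvvvvv/jd_ner_model_pickle.pkl",
    device="cuda" if torch.cuda.is_available() else "cpu",  # use the GPU if present
)
predictionsjd = ensemble.predict(jd)  # NER predictions for the JD text
parsed_jd = group_entities_unique_jd(predictionsjd)  # grouped JSON output
```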
Here, `parsed_jd` contains the parsed JD in JSON format. 🔐
Pass `parsed_resume` and `parsed_jd` to the `eval()` function in the third notebook to calculate their similarity score. 🔎
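The similarity step needs each parsed JSON object turned into text before it can be embedded. The sketch below is not the notebook's `eval()`: it assumes a flattening scheme of one "label: values" line per field, and it takes the embedder as a parameter (`embed`) so it runs without downloading a model. `parsed_to_text`, `similarity_score` and `toy_embed` are hypothetical names; `toy_embed` is a word-count stand-in for T5.

```python
import math
from collections import Counter
from typing import Callable


def parsed_to_text(parsed: dict[str, list[str]]) -> str:
    """Flatten a parsed resume/JD dict into one "label: value, value" line per field."""
    return "\n".join(
        f"{label}: {', '.join(values)}" for label, values in sorted(parsed.items())
    )


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_score(
    parsed_resume: dict[str, list[str]],
    parsed_jd: dict[str, list[str]],
    embed: Callable[[str], list[float]],
) -> float:
    """Embed both flattened documents with `embed` and return their cosine similarity."""
    return cosine_similarity(
        embed(parsed_to_text(parsed_resume)), embed(parsed_to_text(parsed_jd))
    )


# Toy stand-in for a T5 embedder: word counts over a fixed vocabulary.
VOCAB = ["python", "circuit", "design", "animation", "maya"]


def toy_embed(text: str) -> list[float]:
    counts = Counter(text.lower().replace(",", " ").split())
    return [float(counts[word]) for word in VOCAB]


toy_jd = {"SKILL": ["circuit design", "Python"]}
engineer = {"SKILL": ["Python", "circuit design"]}
animator = {"SKILL": ["Maya", "animation"]}
print(similarity_score(engineer, toy_jd, toy_embed))
print(similarity_score(animator, toy_jd, toy_embed))
```

Injecting `embed` keeps the scoring logic independent of the model, so a real T5 embedding function can be swapped in without other changes.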
- The repository includes training data for testing the model.
- Resumes are provided in PDF format. 📄
- JDs are provided in TXT format. 📄
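Because resumes and JDs arrive in different formats, a small loader that separates them can be handy before running the parsers. This is a sketch under assumptions: the data sits under a single directory (its name is up to you), resumes end in `.pdf`, JDs end in `.txt` and are UTF-8 encoded. `collect_inputs` is a hypothetical helper; the PDFs still need the notebooks' `pdf_load` and `text_preprocess` afterwards.

```python
from pathlib import Path


def collect_inputs(data_dir: str) -> tuple[list[Path], dict[str, str]]:
    """Hypothetical: return resume PDF paths and a {file name: text} map of JD .txt files."""
    root = Path(data_dir)
    # rglob also searches subfolders; sorting gives a stable processing order.
    resume_pdfs = sorted(root.rglob("*.pdf"))
    jd_texts = {p.name: p.read_text(encoding="utf-8") for p in sorted(root.rglob("*.txt"))}
    return resume_pdfs, jd_texts
```

Each returned JD string can go straight into `text_preprocess`, while each PDF path goes through `pdf_load` first, matching the formats shown earlier.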
- Clone this repository: 🔧
  ```bash
  git clone <repository_url>
  cd <repository_folder>
  ```
- Open the desired notebook in Jupyter Notebook, JupyterLab, or any compatible IDE. 🌐
- Follow the instructions provided in each notebook to:
  - Load and parse resumes and JDs. 🔑
  - Calculate similarity scores. 🔍
- Ensure that the required dependencies are installed:
  ```bash
  pip install -r requirements.txt
  ```
- For the third notebook, upload the resumes and JDs whose embeddings you want to calculate, then process them using the formats shown above. 🔄
- The embeddings are generated using a T5 model. ✨
- Ensure that the pretrained model paths for `resume_ner_model_pickle.pkl` and `jd_ner_model_pickle.pkl` are correctly specified.
- Use the provided training data to test the model's functionality. 📚
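The README states only that the embeddings come from a T5 model. One common way to turn T5 into a text embedder is to run its encoder and average the token vectors while skipping padding. The sketch below assumes that approach and the public `t5-small` checkpoint; the notebook may use a different checkpoint or pooling strategy. The `torch` and `transformers` imports are deferred into the functions that need them, so `mean_pool` can be used and tested without either library installed.

```python
from functools import lru_cache


def mean_pool(token_vectors: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose mask value is 1 (i.e. skip padding tokens)."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


@lru_cache(maxsize=None)
def _load_t5(model_name: str):
    """Load and cache the tokenizer and encoder so repeated calls reuse them."""
    from transformers import AutoTokenizer, T5EncoderModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = T5EncoderModel.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def t5_embed(text: str, model_name: str = "t5-small") -> list[float]:
    """Embed `text` by mean-pooling the T5 encoder's last hidden state (assumed approach)."""
    import torch

    tokenizer, model = _load_t5(model_name)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, d_model)
    return mean_pool(hidden[0].tolist(), inputs["attention_mask"][0].tolist())
```

Only the encoder is loaded because embedding does not need T5's decoder, which roughly halves the memory required.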
This project simplifies candidate selection by automating the parsing of resumes and job descriptions and scoring how closely each resume matches a JD. 🚀
Enjoy exploring and enhancing the model! 😊