Resume Analyzer is a tool that helps recruiters select candidates based on their resumes. It also provides an overall summary of each resume, so recruiters can get to know an individual better in less time.
The application currently has two tools:
- Resume Score Generator
- Resume Summarizer
Resume Score Generator: This is an NLP classification use case. Multiple resumes are collected and each is assigned a score between 1 and 10, and a classification model is trained to predict that score for any resume. There are 300+ data points in total; the file is `resume_data2_(used in training).csv`, located in the `data` folder.
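The scoring pipeline described above can be sketched as TF-IDF features feeding a classifier. This is a minimal illustration on toy data; the column names and model choice here are assumptions, not the repo's exact code (the actual training lives in the notebook mentioned below).

```python
# Sketch: TF-IDF features + a random forest that predicts a resume score.
# Toy data only; "scores" are illustrative labels on the 1-10 scale.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [
    "python machine learning pandas sklearn",
    "java spring backend microservices",
    "excel reporting communication",
]
scores = [9, 7, 3]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, scores)

# Score a new, unseen resume text.
new = vectorizer.transform(["python pandas data analysis"])
print(model.predict(new)[0])
```

The same fitted vectorizer must be used at prediction time, which is why the repo ships both `rf_score_model.pkl` and `tfidf_vectorizer.pkl`.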
Resume Summarizer: Custom NER (named entity recognition) is used to summarize a resume, implemented with spaCy. The data for this is provided in the `data` folder as `train_data.pkl`; there are 150+ custom-tagged examples in total.
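Summarization by NER means extracting labeled entities from the resume text. The sketch below uses a rule-based `EntityRuler` as a stand-in for the repo's trained custom-NER model; the entity labels (`SKILL`) are illustrative assumptions.

```python
# Minimal sketch of entity-based summarization with spaCy.
# An EntityRuler stands in for the trained custom-NER model.
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "SKILL", "pattern": "python"},
    {"label": "SKILL", "pattern": "machine learning"},
])

doc = nlp("Skilled in python and machine learning.")
# The "summary" is the set of recognized entities and their labels.
summary = {(ent.text, ent.label_) for ent in doc.ents}
print(summary)
```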
- I scraped the sample resumes from overlife.com using the `pdf scraper.py` file and parsed the text from each resume. The resumes from this source are mostly from the engineering and programming fields, and the data quality is not great.
- I also took some data from here. Most of the resumes from this source are for software development and data analyst roles.
- There were 300+ resumes in total (122 from scraping and ~200 from the above repo). I did not get a chance to label all the data, so I randomly assigned scores from 1 to 10.
- I created a CSV file combining all the data sources.
- The data from the above repo is already tagged for NER, so I did not re-tag it.
- For classification I tried RNNs, but since the dataset was too small, deep learning performed poorly. I tried different ML models: random forest, a naive Bayes classifier, and random forest with `RandomizedSearchCV`. Since `accuracy_score` was the evaluation metric, I went with random forest, as it gave the highest accuracy. (The dataset is not well balanced; upsampling or a class-weighted approach could be applied, and a different evaluation metric such as `recall` or `f1` could be used, but due to time constraints I was unable to do that.) The notebook is provided in the `Notebooks` folder; the file name is `classification model training notebook.ipynb`.
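The data-assembly step above (combining both sources into one CSV, with random scores where labels were missing) can be sketched as follows. The file and column names here are assumptions for illustration, not necessarily those in the repo.

```python
# Sketch: combine resume texts from both sources into one dataframe
# and assign a random 1-10 score where no label exists.
import random

import pandas as pd

scraped = ["resume text one ...", "resume text two ..."]
repo = ["resume text three ..."]

random.seed(0)
df = pd.DataFrame({"resume_text": scraped + repo})
df["score"] = [random.randint(1, 10) for _ in range(len(df))]
df.to_csv("combined_resumes.csv", index=False)
print(len(df))
```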
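The class-weighted approach and `f1` evaluation mentioned above could look like the sketch below, on toy imbalanced data (the real features would come from the TF-IDF vectorizer in the training notebook).

```python
# Sketch: class weighting + macro f1 for an imbalanced classification task.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Toy imbalanced data: labels 0/1 with a 9:1 skew.
X = [[i, i % 4] for i in range(100)]
y = [0] * 90 + [1] * 10

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# class_weight="balanced" reweights classes inversely to their frequency.
clf = RandomForestClassifier(class_weight="balanced", random_state=42)
clf.fit(X_tr, y_tr)

# Macro f1 treats every class equally, unlike plain accuracy,
# so a model that ignores the minority class scores poorly.
score = f1_score(y_te, clf.predict(X_te), average="macro")
print(score)
```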
- For custom NER I used spaCy. According to the spaCy docs, it uses convolutional layers with residual connections, layer normalization, and maxout non-linearity, which gives much better efficiency than the standard BiLSTM solution. Source
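Training a custom spaCy NER model usually looks like the sketch below. The exact contents of `train_data.pkl` are an assumption here; the `(text, {"entities": [(start, end, label)]})` shape is just the common spaCy offset format.

```python
# Minimal sketch of a spaCy v3 custom-NER training loop on one example.
import spacy
from spacy.training import Example

# Assumed data format: (text, {"entities": [(start, end, label)]}).
train_data = [
    ("Skilled in python", {"entities": [(11, 17, "SKILL")]}),
]

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for _, ann in train_data:
    for *_, label in ann["entities"]:
        ner.add_label(label)

optimizer = nlp.initialize()
for _ in range(20):
    for text, ann in train_data:
        example = Example.from_dict(nlp.make_doc(text), ann)
        nlp.update([example], sgd=optimizer)

print([(e.text, e.label_) for e in nlp("Skilled in python").ents])
```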
- For the UI I went with Flask first, but since the UI was not good, I finally switched to Streamlit. The Python files for both the Flask and Streamlit apps are present in the repo.
- Since it is a Streamlit app, and I had just gotten approval from the Streamlit team itself to use their deployment platform, I decided to use that. You can see the deployed app here.
- I have also containerized the whole app using Docker, so you can use that as well to run the app locally.
This README provides the details of the project. Below, I have also provided a detailed explanation of the file structure and how you can run the application locally.
- Making the models more robust. They are not robust right now for a few reasons:
  1.1. The data is not labeled correctly.
  1.2. The dataset is imbalanced:
```
df5['score'].value_counts()
1    47
7    39
9    35
3    33
5    32
0    32
4    31
8    28
2    22
6    20
Name: score, dtype: int64
```
  1.3. Adding more data to the dataset for both tasks.
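One way to address the imbalance shown above is the upsampling approach mentioned earlier: resample each minority class up to the majority count. This is a sketch on toy data, not the repo's actual dataframe.

```python
# Sketch: upsample minority classes to the majority-class count.
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({
    "resume_text": ["a"] * 47 + ["b"] * 20,
    "score": [1] * 47 + [6] * 20,
})

majority_n = df["score"].value_counts().max()
balanced = pd.concat([
    resample(group, replace=True, n_samples=majority_n, random_state=42)
    for _, group in df.groupby("score")
])
print(balanced["score"].value_counts().to_dict())
```

Note that upsampling should be applied only to the training split, never to the test set, or the evaluation becomes optimistic.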
- Adding a QnA-based model for an easy query-search option. It would let the user pose a query in the form of a question and extract the answer from the model output, helping people find specific things in a resume.
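As a toy sketch of that proposed feature, the baseline below returns the resume sentence that best overlaps the question's keywords. A real version would use an extractive QA model; this function is purely illustrative and not part of the repo.

```python
# Toy QnA baseline: pick the sentence with the most keyword overlap.
def answer(question: str, resume: str) -> str:
    q_words = set(question.lower().split())
    sentences = [s.strip() for s in resume.split(".") if s.strip()]
    # Rank sentences by how many question words they share.
    return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))

resume = "Worked at Acme as a data analyst. Skilled in python and SQL. Holds a masters degree"
print(answer("is the candidate skilled in python", resume))
# → "Skilled in python and SQL"
```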
- Migrating the web app from Streamlit to Flask and adding a better UI.
NOTE: If you can implement any of the above-mentioned features, please feel free to make a PR. Also, if you have any problem understanding the features mentioned above, feel free to create an issue.
File/Folder Name | Usage of that file/folder |
---|---|
Notebooks | Data collection and model training are all done in the notebooks; the file names are self-explanatory, so their usage should be clear |
data | All the CSV files and the tagged data are provided here |
data/resume_data2_(used in training).csv | Used for classification |
data/train_data.pkl | Used for NER |
rf_score_model.pkl / tfidf_vectorizer.pkl | The trained classification model and TF-IDF vectorizer |
Resume_analyzer_app.py | The Streamlit app |
resume_app_main_flask.py | The Flask app |
pdf scraper.py | Scrapes PDFs |
Dockerfile | The Dockerfile |
Note: if you don't understand the file structure, feel free to create an issue.
- Run Locally:
  1.1 `git clone <repo link>`
  1.2 `cd Resume-analyzer`
  1.3 `pip install -r requirements.txt`
  1.4 `streamlit run <file_name>`