Coder Social home page Coder Social logo

alan-ai-learner / resume-analyzer Goto Github PK

View Code? Open in Web Editor NEW

This project forked from soumya997/resume-analyzer

0.0 0.0 0.0 14.85 MB

Resume Analyzer is a tool for recruiters which can help them to select candidates based on their resume and it also helps by providing a overall summary of the resume using which recruiters can know that individual in a more better way in less time.

Home Page: https://share.streamlit.io/soumya997/resume-analyzer/main/Resume_analyzer_app.py

License: MIT License

Jupyter Notebook 92.86% HTML 1.10% Python 6.01% Dockerfile 0.03%

resume-analyzer's Introduction

Open in Streamlit


Contributions Welcome
GitHub

Resume Analyzer is a tool for recruiters which can help them to select candidates based on their resume and it also helps by providing a overall summary of the resume using which recruiters can know that individual in a more better way in less time.

About the Project🌟:

The whole application is having two tools right now,

  1. Resume Score Generator
  2. Resume Summarizer

Resume Score Generator: Its a NLP classification problem usecase, where multiple resumes are taken and a certaing score(between 1 to 10) is assigned. And a classification model is trained to classify any resume between 1 to 10. and this score is the score what they get for their resume.Total data points was 300+.file name is resume_data2_(used in training).csv and its under data folder. Resume Summarizer: Custom NER is used to summarize any resume. Its done using spacy. Data for this provided in the data folder,file name is train_data.pkl. Total custom tagged data is 150+.

User Interface📱:

   

Project workflow🧾:

Data Preparation:

  1. I scraped app the sample resumes from overlife.com using the pdf scraper.py file.And parsed all the text from each resume. The resumes from this source is mostly for Engineering and programming field.And data quality is not so good.
  2. I took some data from here also. Most of the resume of this source is for software development and data analyst role.
  3. There was almost 300+ resumes(122 by scrapping and ~200 from the above repo), I did not get chance to label all the data to I randomly assigned some score from 1 to 10.
  4. Created a csv file combining all the data source.
  5. Data from above repo is already tagged for NER so i did not do that.

Model Building:

  1. For classificating I tried RNNs but as the dataset size was too less deep learning was working poor. I tried different ML models like random forest ,naive bayes classifier, random forest with RandomizedSearchCV. As we were having accurecy_score as the evaluation method so i went with random forest as it was giving more accuracy.(The dataset is not so balanced and upsampling, weighted baised approach can be applied and some different method of evaluation could be applied like recall or f1 but because of some time constraints I was unable to do that). Notebook is provided under Notebooks folder,file name is classification model training notebook.ipynb.

  2. For Custom NER I used Spacy to do that. As per the Spacy docs they used Convolutional layers with residual connections, layer normalization and maxout non-linearity are used,which giving much better efficiency than the standard BiLSTM solution.Source

Model Deployment:

  1. For that I went with flask 1st, but as the UI was not good so, finally i switched to streamlit. The python file for flask and streamlit bot are present in the repo.

  2. As it was a streamlit app, and as I just got the approval to use their deployment plateform from the streamlit team itself. So, I decided to use that. You can see the deployed app here.

  3. I have also Containerized the whole app using Docker. So you can that also to get the app locally.

Documentation:

In the form of readme I am providing the details of the project. Below, I also have provided that ditails explanation for the file structure and how you can run the application locally.

Future Improvements✊"

  1. Making the models more robust.As its not right now, because of some reason 1.1. Data is not labeled correctly. 1.2. Dataset is imbalanced,
df5['score'].value_counts()
1    47
7    39
9    35
3    33
5    32
0    32
4    31
8    28
2    22
6    20
Name: score, dtype: int64

1.3. Adding More data in the dataset for both the task.

  1. Adding a QnA based model for easy query search option. As it will provide the user to make some query in the form of a question and extract answer in the form of model output. It will help people to search specific things from the resume.

  2. Migrate the webapp from sreamlit to flask.Add some good UI.


NOTE: If you can implement any of the above mentioned feature, please feel free to make a PR. Except, that if you have any problem understanding the above mentioned features feel free to creat an issue.

File Structure📂:

File/Folder Name Usage of that file/folder
Notebooks Data collection,Model training every thing is done in the ipynbs,file names are self explanatory so, you will be understand their usage
data All the CSV and the tagged data is provided here
data/resume_data2_(used in training).csv is used for classification
data/train_data.pkl Used for NER
rf_score_model.pkl/tfidf_vectorizer.pkl Used for classification model training
Resume_analyzer_app.py is the streamlit app
resume_app_main_flask.py flask app
pdf scraper.py for scrap pdfs
Dockerfile is the Dockerfile

Note: if you dont get the file structure currect feel free to make an issue.

Run Locally💻:

  1. Run Locally:

1.1 git clone <repo link>

1.2 cd Resume-analyzer

1.3 pip install -r requirements.txt

1.4 streamlit run <file_name>

Connect with me If you need any help🤝:


Twitter Follow xtenzq GitHub followers Gmail


resume-analyzer's People

Contributors

soumya997 avatar imgbotapp avatar streamlit-badge-bot avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.