Coder Social home page Coder Social logo

automation-of-ranking-resumes's Introduction

Automation-of-Ranking-Resumes

Problem Statement

  • Many institutions and companies use resumes as a way of short-listing candidates in preliminary rounds.
  • Going through every resume and short-listing is an arduous process and in many instances, it is prone to human errors.
  • This project aims to automate this short-listing process by retrieval of relevant resumes to a given job description(JD).

Dataset

  • Resumes and Job descriptions are collected from indeed.com and linkedIn.
  • Also job reviews are scraped from ambitionbox.com. Also data related to the selected candidates are available which is used for further analysis

Tools

  • Django framework is used to build the user interface.
  • BeautifulSoup in python is used for scrapping documents from websites.
  • NLTk and Spacy libraries are use to process documents.
  • PDFMiner.six is used to convert pdf to text.

Implementation Details:

The project is developed in 3 phases.

Phase 1

  • Data collected from resumes and job sections of Naukri.com and linkedIn.
  • Data Cleaning
  • Dictionary and Posting list are created.
  • Then each resume is vectorized.
  • Building search engine.
  • Query is Job description given by the company
  • Similarity score is computed between job description and resumes.
  • Finally, search engine will rank and display the suitable candidates.
  • Scope for improvement: incorporating NLP techniques like NER into the above model and improving on ranking methods.

Phase 2

  • Vectorizing:

    • With the help of indexing we build document vectors using tf-idf approach(both tf and idf normalized).
    • We get a 570*3500 dimenstional matrix where 570 is the no of documents and 3500 is the dimension of each vector.
  • Querying:

    • Here query is of two types. We can either query documents based on keywords(eg: skillset,college,etc) or based on job description document.
    • For the latter approach we have to upload JD do all the pre-processing (as in phase1) and then vectorize the query.
    • We integrated it in django framework.
  • Ranking:

    • For ranking initially cosine similarity is used to retrieve top N documents.
    • List of selected candidates is available for cross verification. Also Binomial model was used to rank the documents.
    • Both the ranking methods were evaluated using precision@R metric.
  • User interface:

    • UI for querying is built with the help of django framework.
    • Database for resume uploading is also created.

Phase 3

In phase 1 and phase 2 we build a basic search engine to retrieve the resumes which matches the JD. In Phase 3 we improved the model by working on 3 major aspects: Vectorization, Information extraction and an ensemble ranking approach.

  • Vectorization:

    • Words are vectorized using GloVe vectorization.
    • GloVe model is built by taking both global statistics and concurent word similarities.
    • So it is a better fit than the tf-idf approach.
  • Information Extraction:

    • For the information extraction part we used Named Entity Recognition to extract skillset info from resume.
    • For this we built a custom corpus and trained our own model so as to suit the context of our dataset.
    • This will help us to capture better features.
  • Ranking:

    • Finally for the ranking we used a combined strategy of classification and ranking.
    • Classification is sone using Naive Bayes, SVM and RNN models and for ranking BM25 and cosine similarity measures were taken.
    • For a ranking, documents are chosen based on max voting of the classifiers and then the selected documents are ranking based on the average of the 2 ranking scores.
  • Evaluation:

    • Models are evaluated using precision @R
    • For the base model we got precision@R value of 32%
    • After adapting research paper and implementing them, we improved precision@R to 71%.

Setting up the Model:

  • Create a virtual environment for installing dependencies. creating venv
  • Install dependencies from requirements.txt
  • Run the djando framework. run django

automation-of-ranking-resumes's People

Contributors

nehalmuthu avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.