
A simple resume parser used for extracting information from resumes

License: GNU General Public License v3.0

Python 100.00%
resume-parser resume python python3 nlp parser machine-learning natural-language-processing resumes parsers

pyresparser's Introduction

pyresparser

A simple resume parser used for extracting information from resumes

Built with ❤︎ and ☕ by Omkar Pathak


[Badges: GitHub stars · PyPI downloads · license · Python version · Say Thanks! · build status · codecov]

Features

  • Extract name
  • Extract email
  • Extract mobile numbers
  • Extract skills
  • Extract total experience
  • Extract college name
  • Extract degree
  • Extract designation
  • Extract company names

Installation

  • You can install this package using
pip install pyresparser
  • For NLP operations, pyresparser uses spaCy and NLTK. Download their data using the commands below (a quick sanity check follows):
# spaCy
python -m spacy download en_core_web_sm

# nltk
python -m nltk.downloader words
python -m nltk.downloader stopwords
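
To confirm the language data installed correctly, here is a quick sanity check (a minimal sketch, assuming the commands above completed without errors):

import spacy
import nltk

nlp = spacy.load('en_core_web_sm')   # raises OSError [E050] if the spaCy model is missing
nltk.data.find('corpora/words')      # raises LookupError if the NLTK corpora are missing
nltk.data.find('corpora/stopwords')
print('spaCy model and NLTK corpora are available')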

Documentation

Official documentation is available at: https://www.omkarpathak.in/pyresparser/

Supported File Formats

  • PDF and DOCx files are supported on all Operating Systems
  • If you want to extract DOC files, install textract for your OS (Linux, macOS)
  • Note: you only need to install textract (nothing else); DOC files will then be parsed as well

Usage

  • Import it in your Python project
from pyresparser import ResumeParser
data = ResumeParser('/path/to/resume/file').get_extracted_data()
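
As a follow-up, the extracted fields can be inspected or saved; a minimal sketch (assuming a single file yields one dictionary with the keys shown in the Result section below; 'resume_data.json' is just an illustrative output path):

import json
from pyresparser import ResumeParser

data = ResumeParser('/path/to/resume/file').get_extracted_data()
print(data['name'], data['email'])         # individual fields, as listed under Result
with open('resume_data.json', 'w') as fh:  # persist everything as JSON
    json.dump(data, fh, indent=2)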

CLI

You can also run the resume extractor using the provided CLI:

usage: pyresparser [-h] [-f FILE] [-d DIRECTORY] [-r REMOTEFILE]
                   [-re CUSTOM_REGEX] [-sf SKILLSFILE] [-e EXPORT_FORMAT]

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  resume file to be extracted
  -d DIRECTORY, --directory DIRECTORY
                        directory containing all the resumes to be extracted
  -r REMOTEFILE, --remotefile REMOTEFILE
                        remote path for resume file to be extracted
  -re CUSTOM_REGEX, --custom-regex CUSTOM_REGEX
                        custom regex for parsing mobile numbers
  -sf SKILLSFILE, --skillsfile SKILLSFILE
                        custom skills CSV file against which skills are
                        searched for
  -e EXPORT_FORMAT, --export-format EXPORT_FORMAT
                        the information export format (json)
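
For example (a sketch based on the options above; where '-e json' writes its output is not documented here):

pyresparser -f /path/to/resume.pdf
pyresparser -d /path/to/resumes/ -e json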

Notes:

  • If you are running the app on Windows, you can only extract .docx and .pdf files

Result

The module returns a list of dictionaries with results like the following:

[
  {
    'college_name': ['Marathwada Mitra Mandal’s College of Engineering'],
    'company_names': None,
    'degree': ['B.E. IN COMPUTER ENGINEERING'],
    'designation': ['Manager',
                    'TECHNICAL CONTENT WRITER',
                    'DATA ENGINEER'],
    'email': '[email protected]',
    'mobile_number': '8087996634',
    'name': 'Omkar Pathak',
    'no_of_pages': 3,
    'skills': ['Operating systems',
              'Linux',
              'Github',
              'Testing',
              'Content',
              'Automation',
              'Python',
              'Css',
              'Website',
              'Django',
              'Opencv',
              'Programming',
              'C',
              ...],
    'total_experience': 1.83
  }
]


Donation

If you have found my software to be of any use to you, do consider helping me pay my internet bills. It would encourage me to create more such software 😄

  • PayPal: Donate via PayPal!
  • ₹ (INR): Donate via Instamojo

Stargazers over time

[Chart: stargazers over time]

pyresparser's People

Contributors

dependabot[bot], elliott-king, omkarpathak


pyresparser's Issues

Custom NER Trained set

Hi,
I need to know which dataset was used to train the custom NLP model.
I understand it was annotated using the Dataturks annotation tool, but which dataset is being used here?
Is it the same dataset of roughly 200 Indeed resumes that is used on the Dataturks website?
Please give me some clarity on that.

Error with loading en_core_web_sm with Spacy

Dear Mr. Omkar,

I am installing pyresparser on a different server, with Python 3.6.3.

I did the installation using the following commands,

============

aj@ubuntu:~$ pip3 install pyresparser

aj@ubuntu:~$ pip install -U spacy

aj@ubuntu:~$ python -m spacy download en_core_web_sm

aj@ubuntu:~$ pip install --user -U nltk

aj@ubuntu:~$ python -m nltk.downloader words

aj@ubuntu:~$ python

import nltk
nltk.download('stopwords')

=================

Error Message:

aj@ubuntu:~/webapps/app-quitzon/uploaded-documents$ pyresparser -f ffc6f69b791e2aecbd859e0932a5ea97ccdfeccaef67e64f8c93f7c684b5c99b.pdf
Extracting data from: ffc6f69b791e2aecbd859e0932a5ea97ccdfeccaef67e64f8c93f7c684b5c99b.pdf
Traceback (most recent call last):
File "/home/aj/.local/bin/pyresparser", line 11, in
sys.exit(main())
File "/home/aj/.local/lib/python3.6/site-packages/pyresparser/command_line.py", line 77, in main
pprint(cli_obj.extract_resume_data())
File "/home/aj/.local/lib/python3.6/site-packages/pyresparser/command_line.py", line 28, in extract_resume_data
return self.__extract_from_file(args.file)
File "/home/aj/.local/lib/python3.6/site-packages/pyresparser/command_line.py", line 37, in __extract_from_file
resume_parser = ResumeParser(file)
File "/home/aj/.local/lib/python3.6/site-packages/pyresparser/resume_parser.py", line 13, in init
nlp = spacy.load('en_core_web_sm')
File "/home/aj/.local/lib/python3.6/site-packages/spacy/init.py", line 27, in load
return util.load_model(name, **overrides)
File "/home/aj/.local/lib/python3.6/site-packages/spacy/util.py", line 139, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

Please suggest if I need to do anything else. Whenever I install it, all steps complete correctly, but when the file is being extracted I now get this error.

thank you

With Best Regards
Raghu Veer
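
A common cause of [E050] in mixed pip/pip3 setups like the one above is that the model gets downloaded under a different interpreter than the one that later runs pyresparser. A quick diagnostic sketch (an assumption about this report, not a confirmed fix):

import sys, spacy
print(sys.executable)           # which interpreter is actually running
spacy.load('en_core_web_sm')    # the model must be installed under this same interpreter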

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte

When I run the code below, I get the error below.
Please help me.

from pyresparser import ResumeParser
data = ResumeParser('abc.pdf').get_extracted_data()

File "C:\Users\user.conda\envs\env_bank\lib\site-packages\spacy\language.py", line 934, in
p, exclude=["vocab"]
File "tokenizer.pyx", line 528, in spacy.tokenizer.Tokenizer.from_disk
File "tokenizer.pyx", line 569, in spacy.tokenizer.Tokenizer.from_bytes
File "C:\Users\user.conda\envs\env_bank\lib\site-packages\spacy\util.py", line 630, in from_bytes
msg = srsly.msgpack_loads(bytes_data)
File "C:\Users\user.conda\envs\env_bank\lib\site-packages\srsly_msgpack_api.py", line 26, in msgpack_loads
msg = msgpack.loads(data, raw=False, use_list=use_list)
File "C:\Users\user.conda\envs\env_bank\lib\site-packages\srsly\msgpack_init_.py", line 64, in unpackb
return _unpackb(packed, **kwargs)
File "srsly\msgpack_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte

I would appreciate your suggestions. Thank you in advance.

No module name pyreparser

I am getting the following error while running the file:
(base) C:\Users\tarun\Desktop\ResumeParser-master\resume_parser>python manage.py runserver
Watching for file changes with StatReloader
Performing system checks...

Exception in thread django-main-thread:
Traceback (most recent call last):
File "C:\Users\tarun\Anaconda3\lib\threading.py", line 917, in _bootstrap_inner
self.run()
File "C:\Users\tarun\Anaconda3\lib\threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\utils\autoreload.py", line 53, in wrapper
fn(*args, **kwargs)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\core\management\commands\runserver.py", line 117, in inner_run
self.check(display_num_errors=True)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\core\management\base.py", line 395, in check
include_deployment_checks=include_deployment_checks,
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\core\management\base.py", line 382, in run_checks
return checks.run_checks(**kwargs)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\core\checks\registry.py", line 72, in run_checks
new_errors = check(app_configs=app_configs)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\core\checks\urls.py", line 13, in check_url_config
return check_resolver(resolver)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\core\checks\urls.py", line 23, in check_resolver
return check_method()
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\urls\resolvers.py", line 407, in check
for pattern in self.url_patterns:
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\utils\functional.py", line 48, in get
res = instance.dict[self.name] = self.func(instance)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\urls\resolvers.py", line 588, in url_patterns
patterns = getattr(self.urlconf_module, "urlpatterns", self.urlconf_module)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\utils\functional.py", line 48, in get
res = instance.dict[self.name] = self.func(instance)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\urls\resolvers.py", line 581, in urlconf_module
return import_module(self.urlconf_name)
File "C:\Users\tarun\Anaconda3\lib\importlib_init
.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 967, in _find_and_load_unlocked
File "", line 677, in _load_unlocked
File "", line 728, in exec_module
File "", line 219, in call_with_frames_removed
File "C:\Users\tarun\Desktop\ResumeParser-master\resume_parser\resume_parser\urls.py", line 21, in
path('', include('parser_app.urls'))
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\urls\conf.py", line 34, in include
urlconf_module = import_module(urlconf_module)
File "C:\Users\tarun\Anaconda3\lib\importlib_init
.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 967, in _find_and_load_unlocked
File "", line 677, in _load_unlocked
File "", line 728, in exec_module
File "", line 219, in _call_with_frames_removed
File "C:\Users\tarun\Desktop\ResumeParser-master\resume_parser\parser_app\urls.py", line 17, in
from . import views
File "C:\Users\tarun\Desktop\ResumeParser-master\resume_parser\parser_app\views.py", line 2, in
from pyreparser import ResumeParser
ModuleNotFoundError: No module named 'pyreparser'

Support for different languages / NER

Hi,
Please, add support for different languages.
May be, depending on document language (that we should detect at first) we should use different NER

Found great project that can be useful: russian language support - https://github.com/natasha
natasha - very well match person name

Significant struggles with name identification

Thank you very much for the work you've done on this.

While the results are currently fairly good, I've noticed names are a big struggle. I even ran your resume as a sample through the system and it returned "www.omkarpathak.in" for that field.

Do you think adding negative patterns for it to check against is the smartest short term solution for this problem? Otherwise do you think more training is required on the part of the NLP model regarding names?

If you need access to more data I have access to a large amount of CVs which I'd be happy to share.

Thanks again for your continued work on this project.

Error while installing pip3 install pyresparser

Hi Omkar,

I am trying to install pyresparser and try out your resume parser. This is the error I have been getting.

ERROR: Command errored out with exit status 1:
command: 'c:\python39\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Roopa\AppData\Local\Temp\pip-install-fx3izk7z\srsly\setup.py'"'"'; __file__='"'"'C:\Users\Roopa\AppData\Local\Temp\pip-install-fx3izk7z\srsly\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\Roopa\AppData\Local\Temp\pip-pip-egg-info-3k_1ms37'
cwd: C:\Users\Roopa\AppData\Local\Temp\pip-install-fx3izk7z\srsly
Complete output (5 lines):
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\Roopa\AppData\Local\Temp\pip-install-fx3izk7z\srsly\setup.py", line 7, in
from Cython.Build import cythonize
ModuleNotFoundError: No module named 'Cython'
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I am an absolute beginner, any help would be greatly appreciated.

Thank you
Baptista Albert

skills are not extracting properly

Hi,
I am trying to extract skills and I have found an issue: technical skills are merging with non-technical skills. Could anyone guide me?

Improve codecov

Currently the code coverage is around 70%. Need help to write tests so as to obtain a code coverage above 90%

no parsing done for tables in the resume pdf/doc

Hi,
I have been trying to run the parser on resumes containing data in tabular format. When skills or experience are listed in a table, that information is skipped and is not parsed by the parser.

Can you help in correcting the issue?

Format data for learning

Hi,
I think it would be good to have a tool that is able to read input data:

  1. text,pdf,doc,docx
  2. name,
  3. age,
  4. skills,
  5. ... (Designation, worked at)

Find all of these in the text, extract the "start" and "end" offsets of every feature, and append them to the train-data JSON.
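
A minimal sketch of that idea (hypothetical helper, assuming plain text and exact string matches; not part of pyresparser):

def annotate(text, value, label):
    # Locate the first occurrence of a field value and return a (start, end, label) span.
    start = text.find(value)
    if start == -1:
        return None
    return (start, start + len(value), label)

text = 'John Doe, Data Engineer, skilled in Python and Django.'
entities = [span for span in (annotate(text, 'John Doe', 'Name'),
                              annotate(text, 'Data Engineer', 'Designation')) if span]
train_record = (text, {'entities': entities})   # spaCy-style training example
print(train_record)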

json decode issue when done in php command line applications

Recently, when attempting to parse the response, I always get NULL. When checked,

we have text like:

"Extracting Data from "

and then, even after removing all extra whitespace, the JSON is still not being decoded in PHP.

a) when checked, I see some commas missing in some parts of the JSON object (example: Designation)

b) I did notice None, without quotes, for some key-value pairs (it happened with Degree & College Name for a particular resume)

c) double quotes worked better than single quotes when I tried validating the modified version of the received JSON using https://jsonlint.com/

All the above errors happened when trying to use the pyresparser response in PHP (in PHP-based command-line applications/cron jobs).

I do appreciate inputs on this,

thank you

pyresparser-extracted-text.txt
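
For reference, the traceback in an earlier issue shows the CLI pretty-printing its result with pprint, so the default output is a Python literal (single quotes, None) rather than JSON, which matches the symptoms above. A sketch of producing strict JSON on the Python side before handing it to PHP (the path is illustrative):

import json
from pyresparser import ResumeParser

data = ResumeParser('/path/to/resume.pdf').get_extracted_data()
print(json.dumps(data))   # double quotes and null instead of None, so PHP's json_decode can read it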

getting command not found error for pyresparser

Dear Omkar

I did install Pyresparser on my ubuntu server using the following command,

pip install pyresparser

While the script is correctly installed with all dependencies, I am getting a 'command not found' error when trying to parse a resume by typing the command through the PuTTY SSH client.

aj@ubuntu:~/webapps/app-aj/files$ pyresparser -f resume.pdf
pyresparser: command not found

Could you please share if I need to do anything else?

thank you

With Best Regards
Raghu veer

better experience parsing

When I sent the attached file through your parser, the returned experience data is

[ 'Bank of America',
'Dec 2013 - June 2014',
'Sales and Service Specialist',
'Mill Valley, CA',
'· Promoted due to proven ability to resolve complex service issues and process transactions accurately',
'and efficiently to guarantee customer satisfaction and build customer confidence and trust. Responsible',
'for establishing, retaining and deepening relationships with customers to achieve team sales goals as',
'well as providing proactive sales activities of basic products while referring more complex requests such',
'as mortgages and investment products.',
'Bank of America',
'Aug 2012 - Dec 2013',
'Teller',
'Mill Valley, CA',
'· Gained proficiency in retail banking operations, including computing figures, processing transactions',
'with speed and accuracy and building customer loyalty through exceptional customer service. Learned',
'to control large amounts of cash flow, work within established policies, procedures and guidelines and',
'acquired the ability to advise customers on products and services the bank has to offer. Earned a',
'promotion to the position of Sales and Service Specialist.' ]

The parser seems to return five different kinds of data: the company, the duration, the job title, the job location, and a job description. Could you make it so that these different kinds of experience data are better organized? On a related note, the job description is returned one line at a time. Could you make that data one string?

Thanks for building this and hope you'll be able to find solutions to my questions!
resume_Meyer.pdf

Exporting Output

How can you export the output using the CLI and Python? I tried using the '-e' export format with no luck. Could you provide an example?

Build dependency error

Installing build dependencies ... error
ERROR: Command errored out with exit status 1:
command: 'c:\python 38\python.exe' 'c:\python 38\lib\site-packages\pip' install --ignore-installed --no-user --prefix 'C:\Users\XXX\AppData\Local\Temp\pip-build-env-tbkbkos8\overlay' --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- setuptools wheel 'cython>=0.25' 'cymem>=2.0.2,<2.1.0' 'preshed>=3.0.2,<3.1.0' 'murmurhash>=0.28.0,<1.1.0' thinc==7.4.1
cwd: None

and so on............
Help me out, I'm using Python 3.8.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte

This is absolutely great.

Using the pyresparser package I am able to extract the fields from a resume. To check the implementation, I downloaded the code and did the setup as mentioned. When executed with the same resume it ended with an error; details are below. The resume used for this doesn't contain any images, and it works with pyresparser.

Command: python resume_parser.py

Traceback (most recent call last):
File "resume_parser.py", line 133, in
data = ResumeParser('OmkarResume.pdf').get_extracted_data()
File "resume_parser.py", line 20, in init
custom_nlp = spacy.load(os.path.dirname(os.path.abspath(file)))
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy_init_.py", line 27, in load
return util.load_model(name, **overrides)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 133, in load_model
return load_model_from_path(Path(name), **overrides)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 173, in load_model_from_path
return nlp.from_disk(model_path)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\language.py", line 791, in from_disk
util.from_disk(path, deserializers, exclude)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 630, in from_disk
reader(path / key)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\language.py", line 781, in
deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(p, exclude=["vocab"])
File "tokenizer.pyx", line 391, in spacy.tokenizer.Tokenizer.from_disk
File "tokenizer.pyx", line 432, in spacy.tokenizer.Tokenizer.from_bytes
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 606, in from_bytes
msg = srsly.msgpack_loads(bytes_data)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\srsly_msgpack_api.py", line 29, in msgpack_loads
msg = msgpack.loads(data, raw=False, use_list=use_list)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\srsly\msgpack_init_.py", line 60, in unpackb
return _unpackb(packed, **kwargs)
File "_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte

Unable to understand why it is failing. Need your help in resolving this.

Thanks,
Praneeth

OSError: [E053] Could not read meta.json from E:\majorProject\meta.json

Hii!!
I am trying to extract information from a resume, but I am getting this error. Could anyone help me with this?

!pip install nltk
!pip install spacy==2.3.5
!pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
!pip install pyresparser

from pyresparser import ResumeParser
data = ResumeParser('resumes\\Resume.pdf').get_extracted_data()

Error:

OSError                                   Traceback (most recent call last)
<ipython-input-16-514bc438f146> in <module>
      1 from pyresparser import ResumeParser
----> 2 data = ResumeParser('resumes//Resume.pdf').get_extracted_data()

E:\majorProject\pyresparser.py in __init__(self, resume, skills_file, custom_regex)
     18     ):
     19         nlp = spacy.load('en_core_web_sm')
---> 20         custom_nlp = spacy.load(os.path.dirname(os.path.abspath(__file__)))
     21         self.__skills_file = skills_file
     22         self.__custom_regex = custom_regex

e:\pyresparser\lib\site-packages\spacy\__init__.py in load(name, **overrides)
     28     if depr_path not in (True, False, None):
     29         warnings.warn(Warnings.W001.format(path=depr_path), DeprecationWarning)
---> 30     return util.load_model(name, **overrides)
     31 
     32 

e:\pyresparser\lib\site-packages\spacy\util.py in load_model(name, **overrides)
    170             return load_model_from_package(name, **overrides)
    171         if Path(name).exists():  # path to model data directory
--> 172             return load_model_from_path(Path(name), **overrides)
    173     elif hasattr(name, "exists"):  # Path or Path-like to model data
    174         return load_model_from_path(name, **overrides)

e:\pyresparser\lib\site-packages\spacy\util.py in load_model_from_path(model_path, meta, **overrides)
    196     pipeline from meta.json and then calls from_disk() with path."""
    197     if not meta:
--> 198         meta = get_model_meta(model_path)
    199     # Support language factories registered via entry points (e.g. custom
    200     # language subclass) while keeping top-level language identifier "lang"

e:\pyresparser\lib\site-packages\spacy\util.py in get_model_meta(path)
    251     meta_path = model_path / "meta.json"
    252     if not meta_path.is_file():
--> 253         raise IOError(Errors.E053.format(path=meta_path))
    254     meta = srsly.read_json(meta_path)
    255     for setting in ["lang", "name", "version"]:

OSError: [E053] Could not read meta.json from E:\majorProject\meta.json

Thank You

__init__() got an unexpected keyword argument 'codec'

I recently installed pyresparser and ran the code like so:

from pyresparser import ResumeParser
data = ResumeParser('r.pdf').get_extracted_data()

I keep getting the following error. I think it's a pdfminer error, but I'm not sure how to fix it inside pyresparser.
[error screenshot attached]

OSError: [E053] Could not read config.cfg from .....\venv\lib\site-packages\pyresparser\config.cfg

I have installed all the packages, but when I run:

from pyresparser import ResumeParser
data = ResumeParser('C:/Users/Asus/Desktop/ResumeParserExample/test.pdf').get_extracted_data()

I get this error:

C:\Users\Asus\Desktop\ResumeParserExample\venv\lib\site-packages\spacy\util.py:715: UserWarning: [W094] Model 'en_training' (0.0.0) specifies an under-constrained spaCy version requirement: >=2.1.4. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.0.1,<3.1.0
warnings.warn(warn_msg)

OSError: [E053] Could not read config.cfg from C:\Users\Asus\Desktop\ResumeParserExample\venv\lib\site-packages\pyresparser\config.cfg

Does anyone know what the problem is?

Edit: looks like config.cfg doesn't get created at all

os.path.splitext(self.__resume)[1].split

Hi,
I am getting the following error. Please check if I am missing something.

Error:

  File "/home/****/.local/lib/python3.6/site-packages/pyresparser/resume_parser.py", line 40, in __init__
    ext = os.path.splitext(self.__resume)[1].split('.')[1]

IndexError: list index out of range

Thanks.
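
For context, that line fails whenever the supplied path has no file extension, for example when a directory is passed instead of a file; a small illustration (not from the project):

import os
print(os.path.splitext('/path/to/resume.pdf')[1].split('.'))   # ['', 'pdf'] -> index 1 is 'pdf'
print(os.path.splitext('/path/to/resumes/')[1].split('.'))     # ['']        -> index 1 would raise IndexError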

Extraction of Names is not always correct

I have run this module through a bunch of 1000+ resumes and the name extraction itself is not always correct. Extraction was successful for hardly 200 resumes. Have you validated this code against a variety of resume samples?

Multiple Documents Parser

Can you provide an example with output of multiple document parsing?

I can parse one PDF with no problem, but when I point to a directory and try to parse multiple PDFs I get an error. It may be user error. I would appreciate any assistance. Thanks.

Below is the error I'm receiving:

ResumeParser('/Users/jthomas/Documents/resumes/').get_extracted_data()
Traceback (most recent call last):
File "<pyshell#86>", line 1, in
ResumeParser('/Users/jthomas/Documents/resumes/').get_extracted_data()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyresparser/resume_parser.py", line 40, in init
ext = os.path.splitext(self.__resume)[1].split('.')[1]
IndexError: list index out of range
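
For reference, the IndexError above comes from passing a directory path (which has no file extension) to ResumeParser, which expects a single file. A sketch of iterating over the folder instead (assuming it contains only resume files):

import os
from pyresparser import ResumeParser

resume_dir = '/Users/jthomas/Documents/resumes/'
results = []
for name in os.listdir(resume_dir):
    if name.lower().endswith(('.pdf', '.docx')):
        # parse each supported file individually
        results.append(ResumeParser(os.path.join(resume_dir, name)).get_extracted_data())
print(len(results), 'resumes parsed')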

UserWarning: [W031] Model 'en_training' (0.0.0) requires spaCy v2.1 and is incompatible with the current spaCy version (2.3.0)

Hi Omkar,

Question, maybe you can help me out. So I've installed all the packages needed for pyresparser. Eventually I ended up with the newer versions of spaCy (2.3.0) and en-core-web-sm (2.3.0), because the older versions gave me some errors while installing (I also tried to install the requirements.txt in a different virtual environment, but that didn't work either).

As soon as I try the python command from your "usage" header, I get the following message:
C:\Users\Arthur\Documents\Python.venv\lib\site-packages\spacy\util.py:271: UserWarning: [W031] Model 'en_training' (0.0.0) requires spaCy v2.1 and is incompatible with the current spaCy version (2.3.0). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)

Any idea on how I can train the Model 'en_training' on version 2.3.0 of Spacy?

Cheers,

Arthur

Add fields to extract.

Would be great to extract fields like:

  1. intended position,
  2. age (birthday),
  3. known foreign languages,

train model shows entity overlap

@OmkarPathak Can you please help me train a custom model? Help me to train without overlapping entities. Is there a function/methodology to avoid overlapping?

ValueError: [E103] Trying to set conflicting doc.ents: '(4774, 4778, 'Location')' and '(4744, 4789, 'College Name')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
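
One common workaround (a sketch of a generic technique, not a pyresparser utility) is to drop annotations that overlap a span already kept, for example keeping the longer span:

def drop_overlaps(entities):
    # Keep longer spans first; discard any span that overlaps one already kept.
    kept = []
    for start, end, label in sorted(entities, key=lambda e: e[1] - e[0], reverse=True):
        if all(end <= s or start >= e for s, e, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)

print(drop_overlaps([(4774, 4778, 'Location'), (4744, 4789, 'College Name')]))
# -> [(4744, 4789, 'College Name')]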

spacy.gold Not found

On my Windows AMD64 machine, I get the error below:

from spacy.gold import GoldParse
ModuleNotFoundError: No module named 'spacy.gold'
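
For context, spacy.gold exists only in spaCy 2.x; in spaCy 3 the training helpers moved to spacy.training. A hedged sketch for importing under either version (whether pyresparser's training code works unchanged on spaCy 3 is a separate question):

try:
    from spacy.gold import GoldParse       # spaCy 2.x
except ModuleNotFoundError:
    from spacy.training import Example     # spaCy 3.x replacement for GoldParse-style training data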

Errors running the program.

I'm a newbie at this and I'm doing a university project that includes extracting information from a resume, but when I try to run your parser I experience a lot of errors, such as:

data = ResumeParser("C:/Users/yriva/Desktop/NLP/resume.pdf").get_extracted_data()
C:\Users\yriva\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy\util.py:717: UserWarning: [W094] Model 'en_training' (0.0.0) specifies an under-constrained spaCy version requirement: >=2.1.4. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.0.6,<3.1.0
warnings.warn(warn_msg)
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\yriva\Desktop\pyresparser-master\pyresparser\resume_parser.py", line 21, in init
custom_nlp = spacy.load(os.path.dirname(os.path.abspath(file)))
File "C:\Users\yriva\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy_init_.py", line 50, in load
return util.load_model(
File "C:\Users\yriva\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy\util.py", line 326, in load_model
return load_model_from_path(Path(name), **kwargs)
File "C:\Users\yriva\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy\util.py", line 390, in load_model_from_path
config = load_config(config_path, overrides=dict_to_dot(config))
File "C:\Users\yriva\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy\util.py", line 547, in load_config
raise IOError(Errors.E053.format(path=config_path, name="config.cfg"))
OSError: [E053] Could not read config.cfg from C:\Users\yriva\Desktop\pyresparser-master\pyresparser\config.cfg

Please help. Thanks
