
A simple resume parser used for extracting information from resumes

License: GNU General Public License v3.0

Python 100.00%
resume-parser resume python python3 nlp parser machine-learning natural-language-processing resumes parsers

pyresparser's Introduction

pyresparser

A simple resume parser used for extracting information from resumes

Built with ❤︎ and ☕ by Omkar Pathak


[Badges: GitHub stars · PyPI downloads · license · Python version · Say Thanks! · build status · codecov]

Features

  • Extract name
  • Extract email
  • Extract mobile numbers
  • Extract skills
  • Extract total experience
  • Extract college name
  • Extract degree
  • Extract designation
  • Extract company names

Installation

  • You can install this package using
pip install pyresparser
  • For NLP operations, pyresparser uses spaCy and NLTK. Download their data using the commands below (a quick sanity check follows):
# spaCy
python -m spacy download en_core_web_sm

# nltk
python -m nltk.downloader words
python -m nltk.downloader stopwords
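
To confirm the language data installed correctly, here is a quick sanity check (a minimal sketch, assuming the commands above completed without errors):

import spacy
import nltk

nlp = spacy.load('en_core_web_sm')   # raises OSError [E050] if the spaCy model is missing
nltk.data.find('corpora/words')      # raises LookupError if the NLTK corpora are missing
nltk.data.find('corpora/stopwords')
print('spaCy model and NLTK corpora are available')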

Documentation

Official documentation is available at: https://www.omkarpathak.in/pyresparser/

Supported File Formats

  • PDF and DOCx files are supported on all Operating Systems
  • If you want to extract DOC files, install textract for your OS (Linux, macOS)
  • Note: you only need to install textract (nothing else); DOC files will then be parsed as well

Usage

  • Import it in your Python project
from pyresparser import ResumeParser
data = ResumeParser('/path/to/resume/file').get_extracted_data()
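
As a follow-up, the extracted fields can be inspected or saved; a minimal sketch (assuming a single file yields one dictionary with the keys shown in the Result section below; 'resume_data.json' is just an illustrative output path):

import json
from pyresparser import ResumeParser

data = ResumeParser('/path/to/resume/file').get_extracted_data()
print(data['name'], data['email'])         # individual fields, as listed under Result
with open('resume_data.json', 'w') as fh:  # persist everything as JSON
    json.dump(data, fh, indent=2)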

CLI

You can also run the resume extractor using the provided CLI:

usage: pyresparser [-h] [-f FILE] [-d DIRECTORY] [-r REMOTEFILE]
                   [-re CUSTOM_REGEX] [-sf SKILLSFILE] [-e EXPORT_FORMAT]

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  resume file to be extracted
  -d DIRECTORY, --directory DIRECTORY
                        directory containing all the resumes to be extracted
  -r REMOTEFILE, --remotefile REMOTEFILE
                        remote path for resume file to be extracted
  -re CUSTOM_REGEX, --custom-regex CUSTOM_REGEX
                        custom regex for parsing mobile numbers
  -sf SKILLSFILE, --skillsfile SKILLSFILE
                        custom skills CSV file against which skills are
                        searched for
  -e EXPORT_FORMAT, --export-format EXPORT_FORMAT
                        the information export format (json)
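
For example (a sketch based on the options above; where '-e json' writes its output is not documented here):

pyresparser -f /path/to/resume.pdf
pyresparser -d /path/to/resumes/ -e json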

Notes:

  • If you are running the app on Windows, you can only extract .docx and .pdf files

Result

The module returns a list of dictionaries with results like the following:

[
  {
    'college_name': ['Marathwada Mitra Mandal’s College of Engineering'],
    'company_names': None,
    'degree': ['B.E. IN COMPUTER ENGINEERING'],
    'designation': ['Manager',
                    'TECHNICAL CONTENT WRITER',
                    'DATA ENGINEER'],
    'email': '[email protected]',
    'mobile_number': '8087996634',
    'name': 'Omkar Pathak',
    'no_of_pages': 3,
    'skills': ['Operating systems',
              'Linux',
              'Github',
              'Testing',
              'Content',
              'Automation',
              'Python',
              'Css',
              'Website',
              'Django',
              'Opencv',
              'Programming',
              'C',
              ...],
    'total_experience': 1.83
  }
]


Donation

If you have found my software to be of any use to you, do consider helping me pay my internet bills. It would encourage me to create more such software 😄

  • PayPal: Donate via PayPal!
  • ₹ (INR): Donate via Instamojo

Stargazers over time

[Chart: stargazers over time]

pyresparser's People

Contributors

dependabot[bot], elliott-king, omkarpathak


pyresparser's Issues

Custom NER Trained set

Hi,
I need to know which dataset was used to train the custom NLP model.
I understand it was annotated using the Dataturks annotation tool, but which dataset is being used here?
Is it the same dataset of roughly 200 Indeed resumes that is used on the Dataturks website?
Please give me some clarity on that.

Error with loading en_core_web_sm with Spacy

Dear Mr. Omkar,

I am installing pyresparser on a different server, with Python 3.6.3.

I did the installation using the following commands,

============

aj@ubuntu:~$ pip3 install pyresparser

aj@ubuntu:~$ pip install -U spacy

aj@ubuntu:~$ python -m spacy download en_core_web_sm

aj@ubuntu:~$ pip install --user -U nltk

aj@ubuntu:~$ python -m nltk.downloader words

aj@ubuntu:~$ python

import nltk
nltk.download('stopwords')

=================

Error Message:

aj@ubuntu:~/webapps/app-quitzon/uploaded-documents$ pyresparser -f ffc6f69b791e2aecbd859e0932a5ea97ccdfeccaef67e64f8c93f7c684b5c99b.pdf
Extracting data from: ffc6f69b791e2aecbd859e0932a5ea97ccdfeccaef67e64f8c93f7c684b5c99b.pdf
Traceback (most recent call last):
File "/home/aj/.local/bin/pyresparser", line 11, in
sys.exit(main())
File "/home/aj/.local/lib/python3.6/site-packages/pyresparser/command_line.py", line 77, in main
pprint(cli_obj.extract_resume_data())
File "/home/aj/.local/lib/python3.6/site-packages/pyresparser/command_line.py", line 28, in extract_resume_data
return self.__extract_from_file(args.file)
File "/home/aj/.local/lib/python3.6/site-packages/pyresparser/command_line.py", line 37, in __extract_from_file
resume_parser = ResumeParser(file)
File "/home/aj/.local/lib/python3.6/site-packages/pyresparser/resume_parser.py", line 13, in init
nlp = spacy.load('en_core_web_sm')
File "/home/aj/.local/lib/python3.6/site-packages/spacy/init.py", line 27, in load
return util.load_model(name, **overrides)
File "/home/aj/.local/lib/python3.6/site-packages/spacy/util.py", line 139, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

Please suggest if I need to do anything else. Whenever I install it, all steps complete correctly, but when the file is being extracted I now get this error.

thank you

With Best Regards
Raghu Veer
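
A common cause of [E050] in mixed pip/pip3 setups like the one above is that the model gets downloaded under a different interpreter than the one that later runs pyresparser. A quick diagnostic sketch (an assumption about this report, not a confirmed fix):

import sys, spacy
print(sys.executable)           # which interpreter is actually running
spacy.load('en_core_web_sm')    # the model must be installed under this same interpreter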

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte

When I run the code below, I get the error below.
Please help me.

from pyresparser import ResumeParser
data = ResumeParser('abc.pdf').get_extracted_data()

File "C:\Users\user.conda\envs\env_bank\lib\site-packages\spacy\language.py", line 934, in
p, exclude=["vocab"]
File "tokenizer.pyx", line 528, in spacy.tokenizer.Tokenizer.from_disk
File "tokenizer.pyx", line 569, in spacy.tokenizer.Tokenizer.from_bytes
File "C:\Users\user.conda\envs\env_bank\lib\site-packages\spacy\util.py", line 630, in from_bytes
msg = srsly.msgpack_loads(bytes_data)
File "C:\Users\user.conda\envs\env_bank\lib\site-packages\srsly_msgpack_api.py", line 26, in msgpack_loads
msg = msgpack.loads(data, raw=False, use_list=use_list)
File "C:\Users\user.conda\envs\env_bank\lib\site-packages\srsly\msgpack_init_.py", line 64, in unpackb
return _unpackb(packed, **kwargs)
File "srsly\msgpack_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte

I would appreciate your suggestions. Thank you in advance.

No module name pyreparser

I am getting the following error while running the file:
(base) C:\Users\tarun\Desktop\ResumeParser-master\resume_parser>python manage.py runserver
Watching for file changes with StatReloader
Performing system checks...

Exception in thread django-main-thread:
Traceback (most recent call last):
File "C:\Users\tarun\Anaconda3\lib\threading.py", line 917, in _bootstrap_inner
self.run()
File "C:\Users\tarun\Anaconda3\lib\threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\utils\autoreload.py", line 53, in wrapper
fn(*args, **kwargs)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\core\management\commands\runserver.py", line 117, in inner_run
self.check(display_num_errors=True)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\core\management\base.py", line 395, in check
include_deployment_checks=include_deployment_checks,
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\core\management\base.py", line 382, in run_checks
return checks.run_checks(**kwargs)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\core\checks\registry.py", line 72, in run_checks
new_errors = check(app_configs=app_configs)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\core\checks\urls.py", line 13, in check_url_config
return check_resolver(resolver)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\core\checks\urls.py", line 23, in check_resolver
return check_method()
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\urls\resolvers.py", line 407, in check
for pattern in self.url_patterns:
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\utils\functional.py", line 48, in get
res = instance.dict[self.name] = self.func(instance)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\urls\resolvers.py", line 588, in url_patterns
patterns = getattr(self.urlconf_module, "urlpatterns", self.urlconf_module)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\utils\functional.py", line 48, in get
res = instance.dict[self.name] = self.func(instance)
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\urls\resolvers.py", line 581, in urlconf_module
return import_module(self.urlconf_name)
File "C:\Users\tarun\Anaconda3\lib\importlib_init
.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 967, in _find_and_load_unlocked
File "", line 677, in _load_unlocked
File "", line 728, in exec_module
File "", line 219, in call_with_frames_removed
File "C:\Users\tarun\Desktop\ResumeParser-master\resume_parser\resume_parser\urls.py", line 21, in
path('', include('parser_app.urls'))
File "C:\Users\tarun\Anaconda3\lib\site-packages\django\urls\conf.py", line 34, in include
urlconf_module = import_module(urlconf_module)
File "C:\Users\tarun\Anaconda3\lib\importlib_init
.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 967, in _find_and_load_unlocked
File "", line 677, in _load_unlocked
File "", line 728, in exec_module
File "", line 219, in _call_with_frames_removed
File "C:\Users\tarun\Desktop\ResumeParser-master\resume_parser\parser_app\urls.py", line 17, in
from . import views
File "C:\Users\tarun\Desktop\ResumeParser-master\resume_parser\parser_app\views.py", line 2, in
from pyreparser import ResumeParser
ModuleNotFoundError: No module named 'pyreparser'

Support for different languages / NER

Hi,
Please, add support for different languages.
May be, depending on document language (that we should detect at first) we should use different NER

Found great project that can be useful: russian language support - https://github.com/natasha
natasha - very well match person name

Significant struggles with name identification

Thank you very much for the work you've done on this.

While the results are currently fairly good, I've noticed names are a big struggle. I even ran your resume as a sample through the system and it returned "www.omkarpathak.in" for that field.

Do you think adding negative patterns for it to check against is the smartest short term solution for this problem? Otherwise do you think more training is required on the part of the NLP model regarding names?

If you need access to more data I have access to a large amount of CVs which I'd be happy to share.

Thanks again for your continued work on this project.

Error while installing pip3 install pyresparser

Hi Omkar,

I am trying to install pyresparser and try out your resume parser. This is the error I have been getting.

ERROR: Command errored out with exit status 1:
command: 'c:\python39\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Roopa\AppData\Local\Temp\pip-install-fx3izk7z\srsly\setup.py'"'"'; __file__='"'"'C:\Users\Roopa\AppData\Local\Temp\pip-install-fx3izk7z\srsly\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\Roopa\AppData\Local\Temp\pip-pip-egg-info-3k_1ms37'
cwd: C:\Users\Roopa\AppData\Local\Temp\pip-install-fx3izk7z\srsly
Complete output (5 lines):
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\Roopa\AppData\Local\Temp\pip-install-fx3izk7z\srsly\setup.py", line 7, in
from Cython.Build import cythonize
ModuleNotFoundError: No module named 'Cython'
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I am an absolute beginner, any help would be greatly appreciated.

Thank you
Baptista Albert

skills are not extracting properly

Hi,
I am trying to extract skills and I have found an issue: technical skills are merging with non-technical skills. Could anyone guide me?

Improve codecov

Currently the code coverage is around 70%. Need help to write tests so as to obtain a code coverage above 90%

no parsing done for tables in the resume pdf/doc

Hi,
I have been trying to run the parser on resumes containing data in tabular format. When skills or experience are listed in a table, that information is skipped and is not parsed by the parser.

Can you help in correcting the issue?

Format data for learning

Hi,
I think it would be good to have a tool that is able to read input data:

  1. text,pdf,doc,docx
  2. name,
  3. age,
  4. skills,
  5. ... (Designation, worked at)

Find all of these in the text, extract the "start" and "end" offsets of every feature, and append them to the train-data JSON.
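
A minimal sketch of that idea (hypothetical helper, assuming plain text and exact string matches; not part of pyresparser):

def annotate(text, value, label):
    # Locate the first occurrence of a field value and return a (start, end, label) span.
    start = text.find(value)
    if start == -1:
        return None
    return (start, start + len(value), label)

text = 'John Doe, Data Engineer, skilled in Python and Django.'
entities = [span for span in (annotate(text, 'John Doe', 'Name'),
                              annotate(text, 'Data Engineer', 'Designation')) if span]
train_record = (text, {'entities': entities})   # spaCy-style training example
print(train_record)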

json decode issue when done in php command line applications

Recently, when attempting to parse the response, I always get NULL. When checked,

we have text like:

"Extracting Data from "

and then, even after removing all extra whitespace, the JSON is still not being decoded in PHP.

a) when checked, I see some commas missing in some parts of the JSON object (example: Designation)

b) I did notice None, without quotes, for some key-value pairs (it happened with Degree & College Name for a particular resume)

c) double quotes worked better than single quotes when I tried validating the modified version of the received JSON using https://jsonlint.com/

All the above errors happened when trying to use the pyresparser response in PHP (in PHP-based command-line applications/cron jobs).

I do appreciate inputs on this,

thank you

pyresparser-extracted-text.txt
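
For reference, the traceback in an earlier issue shows the CLI pretty-printing its result with pprint, so the default output is a Python literal (single quotes, None) rather than JSON, which matches the symptoms above. A sketch of producing strict JSON on the Python side before handing it to PHP (the path is illustrative):

import json
from pyresparser import ResumeParser

data = ResumeParser('/path/to/resume.pdf').get_extracted_data()
print(json.dumps(data))   # double quotes and null instead of None, so PHP's json_decode can read it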

getting command not found error for pyresparser

Dear Omkar

I did install Pyresparser on my ubuntu server using the following command,

pip install pyresparser

While the script is correctly installed with all dependencies, I am getting a 'command not found' error when trying to parse a resume by typing the command through the PuTTY SSH client.

aj@ubuntu:~/webapps/app-aj/files$ pyresparser -f resume.pdf
pyresparser: command not found

Could you please share if I need to do anything else?

thank you

With Best Regards
Raghu veer

better experience parsing

When I sent the attached file through your parser, the returned experience data is

[ 'Bank of America',
'Dec 2013 - June 2014',
'Sales and Service Specialist',
'Mill Valley, CA',
'· Promoted due to proven ability to resolve complex service issues and process transactions accurately',
'and efficiently to guarantee customer satisfaction and build customer confidence and trust. Responsible',
'for establishing, retaining and deepening relationships with customers to achieve team sales goals as',
'well as providing proactive sales activities of basic products while referring more complex requests such',
'as mortgages and investment products.',
'Bank of America',
'Aug 2012 - Dec 2013',
'Teller',
'Mill Valley, CA',
'· Gained proficiency in retail banking operations, including computing figures, processing transactions',
'with speed and accuracy and building customer loyalty through exceptional customer service. Learned',
'to control large amounts of cash flow, work within established policies, procedures and guidelines and',
'acquired the ability to advise customers on products and services the bank has to offer. Earned a',
'promotion to the position of Sales and Service Specialist.' ]

The parser seems to return five different kinds of data: the company, the duration, the job title, the job location, and a job description. Could you make it so that these different kinds of experience data are better organized? On a related note, the job description is returned one line at a time. Could you make that data one string?

Thanks for building this and hope you'll be able to find solutions to my questions!
resume_Meyer.pdf

Exporting Output

How can you export the output using the CLI and Python? I tried using the '-e' export format with no luck. Could you provide an example?

Build dependency error

Installing build dependencies ... error
ERROR: Command errored out with exit status 1:
command: 'c:\python 38\python.exe' 'c:\python 38\lib\site-packages\pip' install --ignore-installed --no-user --prefix 'C:\Users\XXX\AppData\Local\Temp\pip-build-env-tbkbkos8\overlay' --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- setuptools wheel 'cython>=0.25' 'cymem>=2.0.2,<2.1.0' 'preshed>=3.0.2,<3.1.0' 'murmurhash>=0.28.0,<1.1.0' thinc==7.4.1
cwd: None

and so on............
Help me out, I'm using Python 3.8.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte

This is absolutely great.

Using the pyresparser package I am able to extract the fields from a resume. To check the implementation, I downloaded the code and did the setup as mentioned. When executed with the same resume it ended with an error; details are below. The resume used for this doesn't contain any images, and it works with pyresparser.

Command: python resume_parser.py

Traceback (most recent call last):
File "resume_parser.py", line 133, in
data = ResumeParser('OmkarResume.pdf').get_extracted_data()
File "resume_parser.py", line 20, in init
custom_nlp = spacy.load(os.path.dirname(os.path.abspath(file)))
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy_init_.py", line 27, in load
return util.load_model(name, **overrides)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 133, in load_model
return load_model_from_path(Path(name), **overrides)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 173, in load_model_from_path
return nlp.from_disk(model_path)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\language.py", line 791, in from_disk
util.from_disk(path, deserializers, exclude)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 630, in from_disk
reader(path / key)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\language.py", line 781, in
deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(p, exclude=["vocab"])
File "tokenizer.pyx", line 391, in spacy.tokenizer.Tokenizer.from_disk
File "tokenizer.pyx", line 432, in spacy.tokenizer.Tokenizer.from_bytes
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 606, in from_bytes
msg = srsly.msgpack_loads(bytes_data)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\srsly_msgpack_api.py", line 29, in msgpack_loads
msg = msgpack.loads(data, raw=False, use_list=use_list)
File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\srsly\msgpack_init_.py", line 60, in unpackb
return _unpackb(packed, **kwargs)
File "_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte

Unable to understand why it is failing. Need your help in resolving this.

Thanks,
Praneeth

OSError: [E053] Could not read meta.json from E:\majorProject\meta.json

Hii!!
I am trying to extract information from a resume, but I am getting this error. Could anyone help me with this?

!pip install nltk
!pip install spacy==2.3.5
!pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
!pip install pyresparser

from pyresparser import ResumeParser
data = ResumeParser('resumes\\Resume.pdf').get_extracted_data()

Error:

OSError                                   Traceback (most recent call last)
<ipython-input-16-514bc438f146> in <module>
      1 from pyresparser import ResumeParser
----> 2 data = ResumeParser('resumes//Resume.pdf').get_extracted_data()

E:\majorProject\pyresparser.py in __init__(self, resume, skills_file, custom_regex)
     18     ):
     19         nlp = spacy.load('en_core_web_sm')
---> 20         custom_nlp = spacy.load(os.path.dirname(os.path.abspath(__file__)))
     21         self.__skills_file = skills_file
     22         self.__custom_regex = custom_regex

e:\pyresparser\lib\site-packages\spacy\__init__.py in load(name, **overrides)
     28     if depr_path not in (True, False, None):
     29         warnings.warn(Warnings.W001.format(path=depr_path), DeprecationWarning)
---> 30     return util.load_model(name, **overrides)
     31 
     32 

e:\pyresparser\lib\site-packages\spacy\util.py in load_model(name, **overrides)
    170             return load_model_from_package(name, **overrides)
    171         if Path(name).exists():  # path to model data directory
--> 172             return load_model_from_path(Path(name), **overrides)
    173     elif hasattr(name, "exists"):  # Path or Path-like to model data
    174         return load_model_from_path(name, **overrides)

e:\pyresparser\lib\site-packages\spacy\util.py in load_model_from_path(model_path, meta, **overrides)
    196     pipeline from meta.json and then calls from_disk() with path."""
    197     if not meta:
--> 198         meta = get_model_meta(model_path)
    199     # Support language factories registered via entry points (e.g. custom
    200     # language subclass) while keeping top-level language identifier "lang"

e:\pyresparser\lib\site-packages\spacy\util.py in get_model_meta(path)
    251     meta_path = model_path / "meta.json"
    252     if not meta_path.is_file():
--> 253         raise IOError(Errors.E053.format(path=meta_path))
    254     meta = srsly.read_json(meta_path)
    255     for setting in ["lang", "name", "version"]:

OSError: [E053] Could not read meta.json from E:\majorProject\meta.json

Thank You

__init__() got an unexpected keyword argument 'codec'

I recently installed pyresparser and ran the code like so:

from pyresparser import ResumeParser
data = ResumeParser('r.pdf').get_extracted_data()

I keep getting the following error. I think it's a pdfminer error, but I'm not sure how to fix it inside pyresparser.
[error screenshot attached]

OSError: [E053] Could not read config.cfg from .....\venv\lib\site-packages\pyresparser\config.cfg

I have installed all the packages, but when I run:

from pyresparser import ResumeParser
data = ResumeParser('C:/Users/Asus/Desktop/ResumeParserExample/test.pdf').get_extracted_data()

I get this error:

C:\Users\Asus\Desktop\ResumeParserExample\venv\lib\site-packages\spacy\util.py:715: UserWarning: [W094] Model 'en_training' (0.0.0) specifies an under-constrained spaCy version requirement: >=2.1.4. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.0.1,<3.1.0
warnings.warn(warn_msg)

OSError: [E053] Could not read config.cfg from C:\Users\Asus\Desktop\ResumeParserExample\venv\lib\site-packages\pyresparser\config.cfg

Does anyone know what the problem is?

Edit: looks like config.cfg doesn't get created at all

os.path.splitext(self.__resume)[1].split

Hi,
I am getting the following error. Please check if I am missing something.

Error:

  File "/home/****/.local/lib/python3.6/site-packages/pyresparser/resume_parser.py", line 40, in __init__
    ext = os.path.splitext(self.__resume)[1].split('.')[1]

IndexError: list index out of range

Thanks.
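
For context, that line fails whenever the supplied path has no file extension, for example when a directory is passed instead of a file; a small illustration (not from the project):

import os
print(os.path.splitext('/path/to/resume.pdf')[1].split('.'))   # ['', 'pdf'] -> index 1 is 'pdf'
print(os.path.splitext('/path/to/resumes/')[1].split('.'))     # ['']        -> index 1 would raise IndexError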

Extraction of Names is not always correct

I have run this module through a bunch of 1000+ resumes and the name extraction itself is not always correct. Extraction was successful for hardly 200 resumes. Have you validated this code against a variety of resume samples?

Multiple Documents Parser

Can you provide an example with output of multiple document parsing?

I can parse one PDF with no problem, but when I point to a directory and try to parse multiple PDFs I get an error. It may be user error. I would appreciate any assistance. Thanks.

Below is the error I'm receiving:

ResumeParser('/Users/jthomas/Documents/resumes/').get_extracted_data()
Traceback (most recent call last):
File "<pyshell#86>", line 1, in
ResumeParser('/Users/jthomas/Documents/resumes/').get_extracted_data()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyresparser/resume_parser.py", line 40, in init
ext = os.path.splitext(self.__resume)[1].split('.')[1]
IndexError: list index out of range
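
For reference, the IndexError above comes from passing a directory path (which has no file extension) to ResumeParser, which expects a single file. A sketch of iterating over the folder instead (assuming it contains only resume files):

import os
from pyresparser import ResumeParser

resume_dir = '/Users/jthomas/Documents/resumes/'
results = []
for name in os.listdir(resume_dir):
    if name.lower().endswith(('.pdf', '.docx')):
        # parse each supported file individually
        results.append(ResumeParser(os.path.join(resume_dir, name)).get_extracted_data())
print(len(results), 'resumes parsed')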

UserWarning: [W031] Model 'en_training' (0.0.0) requires spaCy v2.1 and is incompatible with the current spaCy version (2.3.0)

Hi Omkar,

Question, maybe you can help me out. So I've installed all the packages needed for pyresparser. Eventually I ended up with the newer versions of spaCy (2.3.0) and en-core-web-sm (2.3.0), because the older versions gave me some errors while installing (I also tried to install the requirements.txt in a different virtual environment, but that didn't work either).

As soon as I try the python command from your "usage" header, I get the following message:
C:\Users\Arthur\Documents\Python.venv\lib\site-packages\spacy\util.py:271: UserWarning: [W031] Model 'en_training' (0.0.0) requires spaCy v2.1 and is incompatible with the current spaCy version (2.3.0). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)

Any idea on how I can train the Model 'en_training' on version 2.3.0 of Spacy?

Cheers,

Arthur

Add fields to extract.

Would be great to extract fields like:

  1. intended position,
  2. age (birthday),
  3. known foreign languages,

train model shows entity overlap

@OmkarPathak Can you please help me train a custom model? Help me to train without overlapping entities. Is there a function/methodology to avoid overlapping?

ValueError: [E103] Trying to set conflicting doc.ents: '(4774, 4778, 'Location')' and '(4744, 4789, 'College Name')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
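
One common workaround (a sketch of a generic technique, not a pyresparser utility) is to drop annotations that overlap a span already kept, for example keeping the longer span:

def drop_overlaps(entities):
    # Keep longer spans first; discard any span that overlaps one already kept.
    kept = []
    for start, end, label in sorted(entities, key=lambda e: e[1] - e[0], reverse=True):
        if all(end <= s or start >= e for s, e, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)

print(drop_overlaps([(4774, 4778, 'Location'), (4744, 4789, 'College Name')]))
# -> [(4744, 4789, 'College Name')]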

spacy.gold Not found

On my Windows AMD64 machine, I get the error below:

from spacy.gold import GoldParse
ModuleNotFoundError: No module named 'spacy.gold'
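
For context, spacy.gold exists only in spaCy 2.x; in spaCy 3 the training helpers moved to spacy.training. A hedged sketch for importing under either version (whether pyresparser's training code works unchanged on spaCy 3 is a separate question):

try:
    from spacy.gold import GoldParse       # spaCy 2.x
except ModuleNotFoundError:
    from spacy.training import Example     # spaCy 3.x replacement for GoldParse-style training data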

Errors running the program.

I'm a newbie at this and I'm doing a university project that includes extracting information from a resume, but when I try to run your parser I experience a lot of errors, such as:

data = ResumeParser("C:/Users/yriva/Desktop/NLP/resume.pdf").get_extracted_data()
C:\Users\yriva\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy\util.py:717: UserWarning: [W094] Model 'en_training' (0.0.0) specifies an under-constrained spaCy version requirement: >=2.1.4. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.0.6,<3.1.0
warnings.warn(warn_msg)
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\yriva\Desktop\pyresparser-master\pyresparser\resume_parser.py", line 21, in init
custom_nlp = spacy.load(os.path.dirname(os.path.abspath(file)))
File "C:\Users\yriva\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy_init_.py", line 50, in load
return util.load_model(
File "C:\Users\yriva\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy\util.py", line 326, in load_model
return load_model_from_path(Path(name), **kwargs)
File "C:\Users\yriva\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy\util.py", line 390, in load_model_from_path
config = load_config(config_path, overrides=dict_to_dot(config))
File "C:\Users\yriva\AppData\Local\Programs\Python\Python39\lib\site-packages\spacy\util.py", line 547, in load_config
raise IOError(Errors.E053.format(path=config_path, name="config.cfg"))
OSError: [E053] Could not read config.cfg from C:\Users\yriva\Desktop\pyresparser-master\pyresparser\config.cfg

Please help. Thanks
