A Python library that scrapes essential information from PDFs of LinkedIn profiles.
The parser converts a LinkedIn profile PDF to a list of strings, then uses LinkedIn's section headers to build a dictionary that maps each header to the most relevant text from that section of the candidate's profile.
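The header-driven approach described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `group_by_header`, the `SECTION_HEADERS` set (in particular the "Contact" and "Summary" entries), and the sample lines are all assumptions made for illustration.

```python
# Hypothetical sketch of header-based grouping; not ez-parse's actual code.
# The header names are assumed for illustration.
SECTION_HEADERS = {
    "Contact",
    "Top Skills",
    "Certifications",
    "Honors-Awards",
    "Summary",
    "Languages",
}


def group_by_header(lines):
    """Map each recognized header to the non-empty lines that follow it."""
    sections = {}
    current = None  # header of the section currently being filled
    for line in lines:
        text = line.strip()
        if text in SECTION_HEADERS:
            # A new header starts a new section.
            current = text
            sections[current] = []
        elif current is not None and text:
            # Any other non-empty line belongs to the most recent header.
            sections[current].append(text)
    return sections


sample = [
    "Contact", "999-999-9999",
    "Top Skills", "Python", "Java",
    "Languages", "English",
]
sections = group_by_header(sample)
print(sections)
```

Lines that appear before the first recognized header are dropped in this sketch; a real parser may need to keep them (for example, the candidate's name).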
Install the library and its dependencies with pip:
pip install ez-parse
Visit the LinkedIn profile that you would like to parse. Under the individual's basic profile information, there is a button labeled "More". Click on this button, and then click on "Save to PDF".
In your code, begin by importing the package:
from ez_parse import parser
You can extract the text data from the PDF like so:
data = parser.extract_pdf(<path_to_linkedin_pdf>)
The extracted text can then be organized into a dictionary keyed by section header:
res = parser.get_many(data)
Below are minimal examples of the helper function for each header. Note that each helper stops collecting entries as soon as it encounters the header of an unrelated section:
- get_contact
>>> li = ["999-999-9999", "Email", "URL", "Top Skills"]
>>> print(get_contact(li, -1)[0])
['999-999-9999', 'Email', 'URL']
- get_skills
>>> li = ["Python", "Java", "C++", "Certifications"]
>>> print(get_skills(li, -1)[0])
['Python', 'Java', 'C++']
- get_certifications
>>> li = ["QuickBooks", "CPR", "Bartending", "Honors-Awards"]
>>> print(get_certifications(li, -1)[0])
['QuickBooks', 'CPR', 'Bartending']
- get_honors
>>> li = ["USACO Gold", "USAMO Silver", "USACO Bronze"]
>>> print(get_honors(li, -1)[0])
['USACO Gold', 'USAMO Silver', 'USACO Bronze']
- get_summary
>>> li = ["A", "mysterious", "person.", "Languages"]
>>> print(get_summary(li, -1)[0])
['A', 'mysterious', 'person.']
- get_languages
>>> li = ["English", "Spanish", "Latin"]
>>> print(get_languages(li, -1)[0])
['English', 'Spanish', 'Latin']
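The stop-at-next-header behavior in the examples above could be implemented along the following lines. This is only a sketch under stated assumptions: `collect_section`, `STOP_HEADERS`, and the meaning of the second argument (the index of the section's own header, so that `-1` means "start at the first element") are guesses for illustration, not the library's confirmed API.

```python
# Hypothetical helper; the name, header set, and return shape are assumptions.
STOP_HEADERS = {"Top Skills", "Certifications", "Honors-Awards", "Languages"}


def collect_section(lines, header_index):
    """Collect entries after header_index until a known header or end of list.

    Returns (items, stop_index), where stop_index is the position of the
    header that ended the section, or len(lines) if no header was found.
    """
    items = []
    i = header_index + 1  # first entry sits right after the section's header
    while i < len(lines) and lines[i] not in STOP_HEADERS:
        items.append(lines[i])
        i += 1
    return items, i


li = ["Python", "Java", "C++", "Certifications"]
items, stop = collect_section(li, -1)
print(items)  # ['Python', 'Java', 'C++']
print(stop)   # 3
```

Returning the stop index alongside the collected items would let a caller resume parsing at the next header, which may be why the examples above index the result with `[0]`.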
For a more in-depth example that extracts text from the PDF and relies on all of these helper functions, please see the documentation.