
j4nn0 / linkedin-web-scraper


Python Web Scraper for LinkedIn to collect and store company data (e.g. name, description, industry, etc.) into .xls file

Home Page: https://youtu.be/2pSjPOuMDhk

License: GNU General Public License v3.0

Python 100.00%
webscraping webscraper webscraper-website webscraper-api webscraping-search selenium selenium-webdriver selenium-python scrapy scrapy-crawler

linkedin-web-scraper's Introduction

LinkedIn Web Scraper

This is a LinkedIn Python Web Scraper for companies. The script fully simulates human activity (using the Selenium library) in order to get data from LinkedIn web pages. Its purpose is to store data about companies in a given area, such as:

  • Name
  • Overview/Description
  • Size
  • Website link
  • Industry
  • etc.

Once the above information has been collected, it is stored in an .xls file.

Disclaimer

Any actions and/or activities related to the material contained within this repo are solely your responsibility. The misuse of the information in this repo can result in criminal charges brought against the company in question. The author will not be held responsible in the event that any criminal charges are brought against any individuals misusing the information in this repo to break the law.

As written in the LinkedIn User Agreement: you agree you will not use [...] any bots or other automated methods to access the Services, add or download contacts, send or redirect messages.

Terms And Conditions

  • I do not promote, encourage, support or incite any illegal activity or hacking without written permission. The repo and its author are in no way responsible for any misuse of the information.
  • "linkedin-web-scraper" is just a term that represents the name of the repo; it is not a repo that provides any illegal information.
  • This repo is meant solely for providing information on Computer Software, Computer Programming and other related topics.
  • The software and scripts provided by the repo should only be used for educational purposes. Neither the repo nor the author can be held responsible for their misuse by users.
  • I am not responsible for any direct or indirect damage caused by the use of the code provided in this repo. All the information provided in this repo is for educational purposes only.

Demo

Watch the video

Usage

  1. Clone project

    git clone https://github.com/J4NN0/linkedin-web-scraper.git
    cd linkedin-web-scraper
    
  2. Install requirements

    pip install -r requirements.txt
    
  3. Download the web driver you prefer and put it inside the project folder.

  4. Set the missing configs in config.ini:

    • LinkedIn credentials, i.e. EMAIL and PASSWORD.
    • The WEBDRIVER (downloaded in step 3).
    • The CITY from which companies are to be fetched.

    Note that other kinds of parameters can also be set.

  5. Run script

    python3 main.py
    

    Data will be stored in the companies.xlsx file.
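
Before running the script, it can help to check that the settings from step 4 are actually filled in. The following is a minimal sketch using Python's standard configparser module. Only the key names EMAIL, PASSWORD, WEBDRIVER and CITY come from this README; the section name, the sample values and the helper `missing_settings` are assumptions made for illustration, and the real config.ini may be laid out differently.

```python
import configparser

# Hypothetical config.ini content. The [DEFAULT] section name and all values
# are assumptions; only the four key names are taken from the README.
SAMPLE_CONFIG = """
[DEFAULT]
EMAIL = you@example.com
PASSWORD = change-me
WEBDRIVER = chromedriver
CITY = Turin
"""

REQUIRED_KEYS = ("EMAIL", "PASSWORD", "WEBDRIVER", "CITY")


def missing_settings(config_text):
    """Return the required keys that are absent or empty in the config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["DEFAULT"]
    # configparser treats option names case-insensitively by default.
    return [key for key in REQUIRED_KEYS if not section.get(key, "").strip()]


if __name__ == "__main__":
    # An empty list means every required setting has a value.
    print(missing_settings(SAMPLE_CONFIG))
```

To check a real file instead of the sample string, the same parser can load it with `parser.read("config.ini")`, adjusting the section name to whatever the project actually uses.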

Troubleshooting

  • After the login phase, LinkedIn may ask you to perform some actions (e.g. "I'm not a robot") instead of redirecting you to the feed page (https://www.linkedin.com/feed/).

    In this case:

    1. Stop the script.
    2. Log in to your account with a browser.
    3. Skip the required actions.
    4. Re-run the code.
  • If you face issues using Python 3.9 (e.g. when installing dependencies), please try Python 3.7 or below (but not earlier than Python 3.0).
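
The first case above can be detected by comparing the URL the browser lands on after login with the feed URL. This is a minimal sketch, not the project's own code: the helper `landed_on_feed` is hypothetical, and the checkpoint URL used in the usage example is a made-up placeholder for whatever page LinkedIn actually shows.

```python
from urllib.parse import urlparse

# Feed page named in the Troubleshooting section of this README.
FEED_URL = "https://www.linkedin.com/feed/"


def landed_on_feed(current_url):
    """Return True if the given URL points at the LinkedIn feed page."""
    current = urlparse(current_url)
    feed = urlparse(FEED_URL)
    # Compare host and path only, so query strings and a trailing slash are ignored.
    return (current.netloc == feed.netloc
            and current.path.rstrip("/") == feed.path.rstrip("/"))


if __name__ == "__main__":
    print(landed_on_feed("https://www.linkedin.com/feed/"))
    # Hypothetical URL standing in for a verification page.
    print(landed_on_feed("https://www.linkedin.com/checkpoint/example"))
```

With Selenium, the value to pass in would be `driver.current_url`. If the function returns False, stop the script and follow the manual steps listed above.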


linkedin-web-scraper's Issues

Unable to login to LinkedIn

Hello, when I run the script, the email and password fields are filled in automatically on the LinkedIn login page, but the login button is not clicked. Because the login does not happen, I get this error in the console:

File "C:\Users\oktay\Desktop\linkedin-web-scraper\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="app__container"]/main/div/form/div[3]/button"}
(Session info: chrome=111.0.5563.65)

Process finished with exit code 1

Can you help me?
Thanks

Error on Browsernavigator.py

Hello, I ran the code, but I get an error at this line in Browsernavigator.py:

raise NoSuchElementException("after ", self.sleep_time, " attempts the element is still not found.")

Could you please help me?

Thanks
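
For context, the error quoted in this issue is raised once a lookup has been retried a fixed number of times without success. The sketch below shows that general retry pattern under stated assumptions. It is not the code in Browsernavigator.py: the names `find_with_retries` and `ElementNotFound` are hypothetical, and `ElementNotFound` stands in for Selenium's NoSuchElementException so that the example runs without Selenium installed.

```python
import time


class ElementNotFound(Exception):
    """Stand-in for selenium's NoSuchElementException (assumption for this sketch)."""


def find_with_retries(find, attempts=3, delay=0.0):
    """Call find() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return find()
        except ElementNotFound:
            # Wait before the next try, but not after the last one.
            if attempt < attempts:
                time.sleep(delay)
    # Build a single message string instead of passing several arguments.
    raise ElementNotFound(
        f"after {attempts} attempts the element is still not found."
    )
```

If the lookup keeps failing even with more attempts, the locator itself probably no longer matches the page, which is what the XPath error in the previous issue suggests.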
