
ayushjainrksh / conactivity

Stars: 35, Watchers: 2, Forks: 16, Size: 135 KB

A tool built with Puppeteer that parses the LinkedIn profiles of a company's employees and returns the list of active employees.

License: MIT License

JavaScript 100.00%
webscraping networking javascript hacktoberfest hacktoberfest2021 nodejs hacktoberfest2022

conactivity's Introduction

Hi 👋, this project is being actively developed and maintained. If you would like to receive updates about its progress, you can follow @ayushjn_ on Twitter 👤.

ConActivity

"Connect with active LinkedIn users"


Do you find it difficult to network with people on LinkedIn?

If most of your connection requests to recruiters or employees are not being accepted then there's a possibility that:

  • You didn't attach an invite note to the connection request (you can fix this the next time you send requests).
  • The employee might not be interested in accepting connections, or in your profile in particular (this rarely happens).
  • Most often, a connection request goes unaccepted because the employee is not active on LinkedIn or does not have time to check their account.

To avoid wasting time and effort sending connection requests to inactive LinkedIn members, use ConActivity.

ConActivity is a tool that scrapes LinkedIn data and returns the profile links of a company's employees active on LinkedIn. Using ConActivity, you can target active LinkedIn users and send connection requests.

Getting started

Try it out!

Prerequisites

  • Node.js and npm
  • A LinkedIn account

Usage

  • Clone the repo.

    git clone https://github.com/ayushjainrksh/conactivity.git

  • Navigate to the cloned repo.

    cd conactivity

  • In the root directory, install dependencies.

    npm install

  • Add your credentials
    • Create a .env file in the root directory.

      touch .env

    • Add your LinkedIn account credentials and the company's LinkedIn handle. Your .env file should look like this:

      EMAIL=<LinkedIn email ID>
      PASSWORD=<LinkedIn password>
      COMPANY=google

  • Now you're all set. Run the script.

    npm start

Wait for the script to finish parsing. The links will appear in the terminal. You can then visit the active users' profiles and connect with them, attaching an invite note. To repeat the process for another company, update the .env file.

How does it work?

  1. The user enters the company's LinkedIn handle and runs ConActivity.
  2. The script launches an automated browser tab.
  3. The user is logged in with their account credentials automatically.
  4. The script redirects to the company's profile page and, from there, opens the page listing all employees.
  5. The script scrapes all the links to user profiles and visits their activity pages one by one.
  6. It parses the last 5 activities (likes, comments, posts, etc.) of each employee.
  7. The script returns the URLs of employees who have been active on LinkedIn within the past week.
  8. You can use these URLs to visit the profiles and send connection requests.
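Steps 6 and 7 boil down to a date check. The sketch below assumes the scraper has already turned a profile's recent activity into an array of `Date` objects, newest first. The helpers `isActiveWithinWeek` and `activeProfileUrls` are hypothetical and do not come from scrape.js:

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

/**
 * Hypothetical helper: true if any of the `limit` most recent activities
 * happened within the last week. `activityDates` is assumed to be newest first.
 * @param {Date[]} activityDates
 * @param {Date} [now]
 * @param {number} [limit] how many recent activities to inspect (the README mentions 5)
 * @returns {boolean}
 */
function isActiveWithinWeek(activityDates, now = new Date(), limit = 5) {
  return activityDates
    .slice(0, limit)
    .some((date) => now.getTime() - date.getTime() <= WEEK_MS);
}

/**
 * Keeps only the URLs of profiles whose owners were active within the last week.
 * @param {{url: string, activityDates: Date[]}[]} profiles
 * @param {Date} [now]
 * @returns {string[]}
 */
function activeProfileUrls(profiles, now = new Date()) {
  return profiles
    .filter((profile) => isActiveWithinWeek(profile.activityDates, now))
    .map((profile) => profile.url);
}
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to test.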

Features

  • Get direct LinkedIn handles of active employees of a company in a few minutes.
  • LinkedIn users won't be notified when you use the script as it doesn't visit their profiles.

Caveats

  • May fail on slow internet connections (check your connection and try again).
  • There's a limit to the number of LinkedIn logins at a given time (if you see a security check on login, please wait for some time before using the script again).

LICENCE

ConActivity is licenced under the MIT Licence.

Contributing ❤️

Follow contributing.md to start contributing.

Code of Conduct

Read our code_of_conduct.md

If you get stuck somewhere, feel free to open an issue for discussion or shoot a DM on my socials.

Terms of service

Please read LinkedIn's User Agreement before using this script.

This script is intended for educational purposes only. Users are discouraged from scraping large amounts of data at a time, because this can lead to the termination of their LinkedIn account. Neither the author nor any contributor holds any responsibility whatsoever in such a case. To avoid the risk of losing your LinkedIn account, use a secondary account if you plan to run the script over a longer period of time.

Contributors ✨

Thanks goes to these wonderful people (emoji key):


  • Ayush Jain: 💻 📖
  • Nancy Chauhan: 💻 🐛
  • Rachitt Shah: 📖
  • Swapnil Sengupta: 📖
  • Rajkumar S: 💻 🐛
  • Cedric Wille: 💻
  • Aman Desai: 💻
  • Stanislav Petrosyan: 📖
  • Tushar Singh: 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

conactivity's People

Contributors

allcontributors[bot], amandesai01, ayushjainrksh, cwille97, nancy-chauhan, rachittshah, rajkumaar23, stanipetrosyan, swapnil-2001, tushar1210, utkarshsingh99

conactivity's Issues

Cannot bypass LinkedIn security checks

Describe the bug
If the user runs the script to log in to their LinkedIn account several times within a very short interval, LinkedIn starts asking for security checks after login. This causes the script to fail at that point, because it cannot reach the profiles.

To Reproduce
Steps to reproduce the behavior:

  1. Run the script 20-30 times in a very short interval using the same LinkedIn account.
  2. The security check pops up in the automated Chrome tab.

Expected behavior
At the very minimum, the script should be able to report to the user that they cannot use the script for some time and should try again later because of the security checks. The best solution would be a way to bypass the security checks as well (which I don't think is possible but maybe there's a workaround).

Desktop (please complete the following information):

  • OS: Linux
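At a minimum, the script could detect the security check and exit with a clear message instead of failing later. The sketch below assumes that LinkedIn's security check pages use a URL path containing `/checkpoint/`. That is an assumption about LinkedIn's current behavior, so verify it before relying on it. In the script, the URL would come from Puppeteer's `page.url()` after the login step. Both helper names are hypothetical:

```javascript
/**
 * Heuristic check for LinkedIn's post-login security challenge.
 * ASSUMPTION: challenge pages have a path containing "/checkpoint/".
 * @param {string} currentUrl e.g. the value of page.url() after logging in
 * @returns {boolean}
 */
function isSecurityCheckpoint(currentUrl) {
  try {
    return new URL(currentUrl).pathname.includes("/checkpoint/");
  } catch {
    // Not a valid absolute URL, so we cannot tell; treat it as "no checkpoint".
    return false;
  }
}

/**
 * Throws a user-friendly error so the user knows to wait before retrying.
 * @param {string} currentUrl
 */
function assertNoSecurityCheckpoint(currentUrl) {
  if (isSecurityCheckpoint(currentUrl)) {
    throw new Error(
      "LinkedIn is asking for a security check. Please wait a while before running the script again."
    );
  }
}
```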

Some Useful Changes

Below are some small problems I ran into while using the script. Fixing them would improve the user experience a great deal.

The script crashes in the following situations:

  1. When you don't provide a .env file.
  2. When you enter wrong credentials.

The script doesn't handle the following cases:

  1. When the credentials in the .env file change, because the previous session is reloaded.
  2. Sometimes LinkedIn doesn't provide profile links and names, showing only 'LinkedIn Member' instead. This happens mostly with new accounts, since LinkedIn doesn't want random people to exploit its data. New accounts are what most of our users will use, because they don't want to risk their original accounts.
  3. Since most users will create new accounts, LinkedIn often suggests adding a phone number after login, or shows some other unexpected page. This makes the script crash.

Proposed Solutions:

For the crashes:

  1. A missing .env file can be handled with a simple validation.
  2. We can find a specific tag that indicates a wrong password was entered, then create a LoginValidator module responsible for this check.

For the unhandled cases:

  1. We can map cookies to EMAIL+PASSWORD. That way we can also maintain multiple sessions and reload the appropriate one. Since cookies.json is never pushed, there is no harm in storing them this way:

     {
       "EMAIL+PASSWORD": <Cookie Object>,
       "EMAIL+PASSWORD": <Cookie Object>,
       ...
     }

  2. We can inspect the main company page, find a class or tag that is specific to that page, and use it as an identifier to validate that we are on that page.
  3. Point no. 2 will take care of this issue as well.
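The cookie map from the proposed solutions could look like the sketch below, with one deliberate change. It does not use the raw `EMAIL+PASSWORD` string as the key. Instead it stores a SHA-256 hash of that string, so the password never appears in cookies.json in clear text. The helper names are hypothetical:

```javascript
const crypto = require("crypto");
const fs = require("fs");

/** Derives a map key from the credentials without writing the password itself to disk. */
function sessionKey(email, password) {
  return crypto.createHash("sha256").update(`${email}+${password}`).digest("hex");
}

/** Reads the whole { key: cookies[] } map, or an empty map if the file does not exist yet. */
function readSessionMap(file) {
  if (!fs.existsSync(file)) return {};
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

/** Returns the saved cookies for this account, or null if it has no session yet. */
function loadAccountSession(file, email, password) {
  const map = readSessionMap(file);
  return map[sessionKey(email, password)] || null;
}

/** Stores the cookies for this account without touching the sessions of other accounts. */
function saveAccountSession(file, email, password, cookies) {
  const map = readSessionMap(file);
  map[sessionKey(email, password)] = cookies;
  fs.writeFileSync(file, JSON.stringify(map, null, 2));
}
```

A plain SHA-256 digest is not a real password-hashing scheme. It only keeps the password out of the file in clear text, so cookies.json should still stay out of version control.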

Update the readme file.

Describe the bug
The README describes the .env file with " : " as the separator, which does not work. I replaced " : " with " = ".

To Reproduce
Steps to reproduce the behavior:

  1. Go to the .env file
  2. Edit the login details as follows:

EMAIL=<LinkedIn email ID>
PASSWORD=<LinkedIn password>
COMPANY=google

Expected behavior
Ideally, ":" should work as documented, but only "=" works without errors.

Desktop (please complete the following information):

  • OS: Windows 10

Additional context
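The sketch below shows what the `KEY=VALUE` format means in practice. It also adds the "simple validation" of required keys proposed earlier, which would stop the script from crashing on a missing or malformed .env file. The hand-rolled parser and its helper names are hypothetical, for illustration only. They are not how the project currently loads .env:

```javascript
const REQUIRED_KEYS = ["EMAIL", "PASSWORD", "COMPANY"];

/**
 * Parses KEY=VALUE lines. Blank lines and lines starting with "#" are ignored.
 * Lines without "=" (for example "EMAIL : me@example.com") are reported as errors.
 * @param {string} text contents of a .env file
 * @returns {{values: Object<string, string>, errors: string[]}}
 */
function parseEnv(text) {
  const values = {};
  const errors = [];
  text.split(/\r?\n/).forEach((rawLine, index) => {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) return;
    const eq = line.indexOf("=");
    if (eq === -1) {
      errors.push(`Line ${index + 1}: expected KEY=VALUE, got "${line}"`);
      return;
    }
    // Split on the first "=" only, so values such as passwords may contain "=".
    values[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  });
  return { values, errors };
}

/** Returns the required keys that are missing or empty. */
function missingKeys(values) {
  return REQUIRED_KEYS.filter((key) => !values[key]);
}
```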

Refactor code into smaller components.

Describe the bug
The codebase is growing without any restrictions on contribution style (to welcome contributions from people new to open source). It has become difficult to read and needs refactoring.

Expected behavior

  • Refactor code into modular bits that perform specific operations and are easy to understand.
  • Some parts of code can be improved (in terms of the coding style and choices).

Additional context
Might as well create separate files for large reproducible operations.

Save links to a file for future use

Is your feature request related to a problem? Please describe.
The results of the scraping are displayed in the terminal. A nice-to-have feature would be to save the scraped profile links to a file, in case the user wants to look back at them or connect with someone else later.

Describe the solution you'd like

  • The output can be appended to a file with the name of the company inside a folder.
  • Add the folder to .gitignore.

Describe alternatives you've considered
Printing to the terminal is all we have for now.
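A minimal sketch of the proposed behavior follows. It assumes a `results` folder, a hypothetical name that would be added to .gitignore, holding one text file per company. `saveLinks` is also a hypothetical name:

```javascript
const fs = require("fs");
const path = require("path");

/**
 * Appends the scraped profile links to <outputDir>/<company>.txt, one URL per line.
 * @param {string} company the COMPANY value from .env, e.g. "google"
 * @param {string[]} links profile URLs returned by the scraper
 * @param {string} [outputDir] hypothetical folder name; add it to .gitignore
 * @returns {string} path of the file that was written
 */
function saveLinks(company, links, outputDir = "results") {
  fs.mkdirSync(outputDir, { recursive: true });
  // Keep the file name safe even if the company handle contains unusual characters.
  const fileName = `${company.replace(/[^\w.-]/g, "_")}.txt`;
  const filePath = path.join(outputDir, fileName);
  if (links.length > 0) {
    fs.appendFileSync(filePath, links.join("\n") + "\n");
  }
  return filePath;
}
```

The function appends rather than overwrites, so running the script again for the same company adds the new links to the existing file.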

Persist login session using cookies

Is your feature request related to a problem? Please describe.
Currently, the script logs in every time the user runs it. This can trigger security checks that stop the script from executing, and repeated logins can look suspicious. A great improvement would be to log in once, persist the session, and reuse it the next time the user runs the script.

Describe the solution you'd like
This can be solved by storing the session details in a local cookie.json file the first time the user runs the script. On later runs, the script would read the session from this file and skip the login step.
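A minimal sketch of this idea uses Puppeteer's `page.cookies()` and `page.setCookie()` methods. The helper names and the default file name are assumptions. Newer Puppeteer releases may deprecate these page-level cookie methods in favor of browser-level ones, so check the version in use:

```javascript
const fs = require("fs");

/**
 * Saves the cookies of the current (logged-in) Puppeteer page to a JSON file.
 * @param {import("puppeteer").Page} page
 * @param {string} [file]
 */
async function saveSessionCookies(page, file = "cookies.json") {
  const cookies = await page.cookies();
  fs.writeFileSync(file, JSON.stringify(cookies, null, 2));
}

/**
 * Restores previously saved cookies, if any. Returns true when cookies were
 * restored, so the caller can try to skip the login form.
 * @param {import("puppeteer").Page} page
 * @param {string} [file]
 * @returns {Promise<boolean>}
 */
async function restoreSessionCookies(page, file = "cookies.json") {
  if (!fs.existsSync(file)) return false;
  const cookies = JSON.parse(fs.readFileSync(file, "utf8"));
  if (!Array.isArray(cookies) || cookies.length === 0) return false;
  await page.setCookie(...cookies);
  return true;
}
```

A typical flow calls `restoreSessionCookies` before opening LinkedIn and `saveSessionCookies` right after a successful login. The caller should still check that the restored session is valid, for example that LinkedIn did not redirect to the login page, and fall back to a normal login if it is not.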

Let's recognize every contribution

Is your feature request related to a problem? Please describe.
Add all-contributor to the project to recognize every contribution towards making this project a success.

Convert code to Typescript

Is your feature request related to a problem? Please describe.
Plain JavaScript is difficult to write and maintain, and it makes debugging painful.

Describe the solution you'd like

  • Convert code to typescript.
  • Write type definitions.

Sort the returned profiles based on activity

Is your feature request related to a problem? Please describe.
Currently, the script returns the URLs of active profiles in the order in which they are parsed. A nice-to-have feature would be to find the N most active employees within the first M pages of employees and return their URLs sorted by activity.

Describe the solution you'd like
The user should be able to configure their search by modifying the .env file. Additional arguments can be passed to the scraping function.
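The issue does not define "most active," so the sketch below assumes it means "most activities within the last week." It also assumes the scraper collects activity dates for each profile. `N` would come from .env as proposed, and `topActiveProfiles` is a hypothetical helper:

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

/**
 * Ranks profiles by how many of their activities fall within the last week
 * and returns the URLs of the top `n`. Profiles with no recent activity are dropped.
 * @param {{url: string, activityDates: Date[]}[]} profiles
 * @param {number} n
 * @param {Date} [now]
 * @returns {string[]}
 */
function topActiveProfiles(profiles, n, now = new Date()) {
  return profiles
    .map((profile) => ({
      url: profile.url,
      score: profile.activityDates.filter((d) => now - d <= WEEK_MS).length,
    }))
    .filter((entry) => entry.score > 0)
    // Array.prototype.sort is stable (ES2019+), so ties keep their parse order.
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map((entry) => entry.url);
}
```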

Iterate over the list of profileListSelectors

Is your feature request related to a problem? Please describe.
Generalize this section of code to use a loop instead of hardcoding array indexes. This would be useful if we want to add any more profileListSelectors in the future. Moreover, generalizing it would help in code clarity (Add comments to explain the approach in the code).

conactivity/scrape.js

Lines 110 to 114 in 83c3c40

const profileListNodes =
  (document.querySelectorAll(profileListSelectors[0]).length &&
    document.querySelectorAll(profileListSelectors[0])) ||
  (document.querySelectorAll(profileListSelectors[1]).length &&
    document.querySelectorAll(profileListSelectors[1]));

Additional context
To understand what this piece of code does, go through #30
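A generalized version of the snippet from scrape.js could loop over the selectors and return the result of the first one that matches anything. In the sketch below, `findProfileNodes` is a hypothetical name, and `document` is passed in as a parameter only so the function can be tested outside the browser. Inside a `page.evaluate` callback, the loop has to be defined within the callback itself, because Puppeteer serializes the callback and does not carry outer-scope helpers along:

```javascript
/**
 * Returns the nodes matched by the first selector in `selectors` that matches
 * at least one element, or null if none of them match. Selectors are tried in
 * order, so the most likely page layout should come first.
 * Unlike the original expression, each selector is queried only once.
 * @param {{querySelectorAll(selector: string): {length: number}}} doc usually `document`
 * @param {string[]} selectors e.g. profileListSelectors
 */
function findProfileNodes(doc, selectors) {
  for (const selector of selectors) {
    const nodes = doc.querySelectorAll(selector);
    if (nodes.length > 0) {
      return nodes;
    }
  }
  return null;
}
```

One behavioral difference: when nothing matches, this version returns `null`, while the original expression evaluates to `0`. Both are falsy, but a caller that checks for `0` explicitly would need adjusting.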

Open discussion

Hi 👋 This is an open discussion about absolutely anything related to this project or to Puppeteer in general. You can use it as a public chat channel. Please be mindful when addressing others and read our code of conduct.

Topics of discussion can include:

  • How to improve this project?
  • What are some bugs that you encounter?
  • Help related to puppeteer.
  • How does this project help?
  • What are its use cases?

You get the idea.

Add number of pages option to env

Is your feature request related to a problem? Please describe.
The number of pages that the script visits to fetch profiles is hardcoded to 2. It would be good to make this an environment variable that can be declared in the .env file and defaults to 2 if not provided.

Describe the solution you'd like
Pass an argument numberOfPages (an environment variable) to the scrapeLinkedIn function and other places where it is required.

P.S. This would require a documentation update.
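Reading the option could look like the sketch below. The variable name `NUMBER_OF_PAGES` is an assumption, since the issue only names the `numberOfPages` argument. Missing or invalid values fall back to the current default of 2:

```javascript
const DEFAULT_NUMBER_OF_PAGES = 2;

/**
 * Reads the number of employee list pages to visit from the environment.
 * NUMBER_OF_PAGES is a hypothetical variable name.
 * @param {Object<string, string|undefined>} [env]
 * @returns {number}
 */
function getNumberOfPages(env = process.env) {
  const parsed = Number.parseInt(env.NUMBER_OF_PAGES, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_NUMBER_OF_PAGES;
}
```

The result would then be passed to scrapeLinkedIn and anywhere else the page count is needed, as the issue suggests.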

Filter employees based on their jobs

Is your feature request related to a problem? Please describe.
Currently, the script fetches and parses all the employees of a company. It would be really useful to have an option to filter out employees based on their job titles such as HR, Software Developer, or industry such as Computer Software and Marketing.

Describe the solution you'd like
This can be achieved by applying LinkedIn filters or modifying the search parameters.
Screenshot from 2020-09-09 13-32-05

Additional context
It would also help people trying to connect with only HRs for job search.
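The issue proposes applying LinkedIn's own search filters. A simpler alternative, which is a different technique from the proposed one, is to filter the results the scraper already has by each employee's headline text. The sketch below assumes each scraped entry carries a `headline` string, which the current script may not collect. `filterByHeadline` is a hypothetical helper that matches whole words, case-insensitively:

```javascript
/** Escapes characters that have a special meaning in regular expressions. */
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/**
 * Keeps employees whose headline contains at least one keyword as a whole word
 * (case-insensitive), so "HR" does not match "Three". An empty list keeps everyone.
 * @param {{url: string, headline: string}[]} employees
 * @param {string[]} keywords e.g. ["HR", "Software Developer"]
 * @returns {{url: string, headline: string}[]}
 */
function filterByHeadline(employees, keywords) {
  const patterns = keywords
    .map((keyword) => keyword.trim())
    .filter(Boolean)
    .map((keyword) => new RegExp(`\\b${escapeRegExp(keyword)}\\b`, "i"));
  if (patterns.length === 0) return employees;
  return employees.filter((employee) =>
    patterns.some((pattern) => pattern.test(employee.headline || ""))
  );
}
```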

Doesn't work well for accounts with few connections

Is your feature request related to a problem? Please describe.
The scraper can scrape data only when you have a decent number of connections (500+). This is because LinkedIn has a concept of "people in your network" and hides members who are outside your network.
Screenshot from 2020-09-19 22-43-31

Describe the solution you'd like
The idea is either to bypass this restriction somehow, or to skip the user's own login and use helper accounts created for this purpose. Both would solve the problem, but the preferred solution is to bypass the "in your network" restriction.

Add function descriptions in jsdocs format

Describe the bug
The newly added functions don't have JSDoc comments explaining their functionality.

Expected behavior
Add a JSDoc comment at the top of each function, like the one on the scrapeLinkedIn function.

Unable to fetch users

Describe the bug
The script logs me in using a new browser session and opens the company's LinkedIn page, but it eventually fails with this error from the catch block:

Error: No node found for selector: a.ember-view.link-without-visited-state.inline-block

I believe it's somehow unable to execute line 239 of scrape.js.

To Reproduce
Steps to reproduce the behavior:

  1. I simply followed the instructions in the README, but was unable to get the script working.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
I made a small change to scrape.js after it didn't work originally, so the error displayed is simply the error e from the catch block, logged along with the console.error() message.
2021-02-02-113515_935x456_scrot

Desktop (please complete the following information):

  • OS: Ubuntu
  • Window Manager: i3
  • Default browser that opens while executing the script: Chromium

Scraping error due to query selectors

Describe the bug

Active users on page 0:  []
Active users on page 1:  []
Oops! An error occured.
Error: No node found for selector: .artdeco-pagination__button.artdeco-pagination__button--next
    at Object.exports.assert (/home/rajkumar/Documents/projects/linkedin-scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/assert.js:26:15)
    at DOMWorld.click (/home/rajkumar/Documents/projects/linkedin-scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/DOMWorld.js:273:21)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async scrapeLinkedIn (/home/rajkumar/Documents/projects/linkedin-scraper/scrape.js:149:7)

To Reproduce
Steps to reproduce the behavior:

  1. Set COMPANY=google in .env
  2. Run npm start
  3. Expect the error above

Expected behavior
It should work as expected.

Screenshots
NA

Desktop (please complete the following information):

  • OS: elementaryOS
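The stack trace shows that `page.click()` throws when the "next" pagination button is not on the page, for example when there is only one page of results or when the markup has changed. The defensive sketch below uses Puppeteer's `page.$()`, which resolves to `null` instead of throwing when nothing matches. `goToNextPage` is a hypothetical helper, and the selector is the one from the error message. The sketch also assumes LinkedIn marks the button with a `disabled` attribute on the last page:

```javascript
const NEXT_BUTTON_SELECTOR =
  ".artdeco-pagination__button.artdeco-pagination__button--next";

/**
 * Clicks the "next page" button if it exists and is enabled.
 * Returns false when there is no next page, so the caller can stop paginating
 * instead of crashing.
 * @param {import("puppeteer").Page} page
 * @returns {Promise<boolean>}
 */
async function goToNextPage(page) {
  const button = await page.$(NEXT_BUTTON_SELECTOR);
  if (!button) return false;
  // ASSUMPTION: a disabled button means we are already on the last page.
  const disabled = await button.evaluate((el) => el.hasAttribute("disabled"));
  if (disabled) return false;
  await button.click();
  return true;
}
```

After a successful click, the caller still needs to wait for the next page of results to load before scraping it.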

Deploy to cloud for running scraper in the background

Is your feature request related to a problem? Please describe.
Currently, the script can only be run locally on the user's system which is a time-consuming task. The goal is to be able to run the script in the background on a server and be able to check the results when the execution finishes.

Describe the solution you'd like
This can be achieved by wrapping up the script into an API and deploying it on the cloud such as AWS.

Says No Active Users Found

Describe the bug
When I run the script, it says no active users were found on page 0, although users are listed on that page. I set COMPANY to Google.

Console Logs

DevTools listening on ws://127.0.0.1:57518/devtools/browser/bf2b62bf-8ce0-41ff-aac9-419958394897
Previous session loaded successfully!
[38436:775:1002/184946.651985:ERROR:device_event_log_impl.cc(208)] [18:49:46.651] FIDO: touch_id_context.mm:125 Touch ID authenticator unavailable because keychain-access-group entitlement is missing or incorrect
[38453:50951:1002/184959.974540:ERROR:batching_media_log.cc(38)] MediaEvent: {"error":"FFmpegDemuxer: no supported streams"}
[38453:775:1002/184959.986721:ERROR:batching_media_log.cc(35)] MediaEvent: {"pipeline_error":14}
Active users on page 0:  []
2020-10-02 18:50:45.398 Chromium Helper (Renderer)[38453:12184431] CoreText note: Client requested name ".PingFangSC-Medium", it will get Times-Roman rather than the intended font. All system UI font access should be through proper APIs such as CTFontCreateUIFontForLanguage() or +[NSFont systemFontOfSize:].
2020-10-02 18:50:45.398 Chromium Helper (Renderer)[38453:12184431] CoreText note: Set a breakpoint on CTFontLogSystemFontNameRequest to debug.
2020-10-02 18:50:45.409 Chromium Helper (Renderer)[38453:12184431] CoreText note: Client requested name ".PingFangSC-Medium", it will get Times-Roman rather than the intended font. All system UI font access should be through proper APIs such as CTFontCreateUIFontForLanguage() or +[NSFont systemFontOfSize:].
2020-10-02 18:50:45.410 Chromium Helper (Renderer)[38453:12184431] CoreText note: Client requested name ".PingFangSC-Medium", it will get Times-Roman rather than the intended font. All system UI font access should be through proper APIs such as CTFontCreateUIFontForLanguage() or +[NSFont systemFontOfSize:].
2020-10-02 18:50:45.411 Chromium Helper (Renderer)[38453:12184431] CoreText note: Client requested name ".PingFangSC-Medium", it will get Times-Roman rather than the intended font. All system UI font access should be through proper APIs such as CTFontCreateUIFontForLanguage() or +[NSFont systemFontOfSize:].
2020-10-02 18:50:45.461 Chromium Helper (Renderer)[38453:12184431] CoreText note: Client requested name ".AppleSDGothicNeoI-SemiBold", it will get Times-Roman rather than the intended font. All system UI font access should be through proper APIs such as CTFontCreateUIFontForLanguage() or +[NSFont systemFontOfSize:].
2020-10-02 18:50:45.468 Chromium Helper (Renderer)[38453:12184431] CoreText note: Client requested name ".PingFangSC-Medium", it will get Times-Roman rather than the intended font. All system UI font access should be through proper APIs such as CTFontCreateUIFontForLanguage() or +[NSFont systemFontOfSize:].
2020-10-02 18:50:45.496 Chromium Helper (Renderer)[38453:12184431] CoreText note: Client requested name ".PingFangSC-Medium", it will get Times-Roman rather than the intended font. All system UI font access should be through proper APIs such as CTFontCreateUIFontForLanguage() or +[NSFont systemFontOfSize:].
2020-10-02 18:50:45.497 Chromium Helper (Renderer)[38453:12184431] CoreText note: Client requested name ".PingFangSC-Medium", it will get Times-Roman rather than the intended font. All system UI font access should be through proper APIs such as CTFontCreateUIFontForLanguage() or +[NSFont systemFontOfSize:].
2020-10-02 18:50:45.501 Chromium Helper (Renderer)[38453:12184431] CoreText note: Client requested name ".PingFangSC-Medium", it will get Times-Roman rather than the intended font. All system UI font access should be through proper APIs such as CTFontCreateUIFontForLanguage() or +[NSFont systemFontOfSize:].
2020-10-02 18:50:45.510 Chromium Helper (Renderer)[38453:12184431] CoreText note: Client requested name ".PingFangSC-Medium", it will get Times-Roman rather than the intended font. All system UI font access should be through proper APIs such as CTFontCreateUIFontForLanguage() or +[NSFont systemFontOfSize:].
[38453:50951:1002/185051.643056:ERROR:batching_media_log.cc(38)] MediaEvent: {"error":"FFmpegDemuxer: no supported streams"}
[38453:775:1002/185051.714533:ERROR:batching_media_log.cc(35)] MediaEvent: {"pipeline_error":14}
[38453:50951:1002/185051.719896:ERROR:batching_media_log.cc(38)] MediaEvent: {"error":"FFmpegDemuxer: no supported streams"}
[38453:775:1002/185051.775038:ERROR:batching_media_log.cc(35)] MediaEvent: {"pipeline_error":14}
[38453:50951:1002/185051.887375:ERROR:batching_media_log.cc(38)] MediaEvent: {"error":"FFmpegDemuxer: no supported streams"}
[38453:775:1002/185051.888243:ERROR:batching_media_log.cc(35)] MediaEvent: {"pipeline_error":14}
^Cnpm ERR! code ELIFECYCLE
npm ERR! errno 130
npm ERR! [email protected] start: `node scrape.js`
npm ERR! Exit status 130
npm ERR! 
npm ERR! Failed at the [email protected] start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /Users/aman/.npm/_logs/2020-10-02T13_20_59_693Z-debug.log

To Reproduce
Steps to reproduce the behavior:
Set COMPANY=Google in the .env file and run the script as directed in the README.

Expected behavior
A clear and concise description of what you expected to happen.

Desktop (please complete the following information):

  • OS: macOS

Improve algorithm

Is your feature request related to a problem? Please describe.
Yes, it is a waste of bandwidth and time revisiting the profiles list page (because of .back()) after browsing the recent activity of every user.

Describe the solution you'd like
Instead, the profile links of all users on the first 2 pages (or however many pages the script traverses) can be cached in an in-memory array beforehand and then used for the traversal. This removes the unnecessary calls to .back(): the script simply visits one profile after another from the array.

Describe alternatives you've considered
NA

Additional context
NA
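The two-phase approach could be sketched as follows. `collectLinksOnCurrentPage`, `goToNextPage`, and `checkProfile` are hypothetical callbacks that stand in for the existing scraping and activity-parsing logic. Only `page.goto()` is a real Puppeteer call here. The sketch also assumes the collected URLs are the pages whose activity is parsed:

```javascript
/**
 * Phase 1: walk the employee list pages once and cache every profile link.
 * Phase 2: visit each cached link directly with page.goto(), so the list page
 * never has to be reloaded by navigating back.
 * @param {import("puppeteer").Page} page
 * @param {object} options
 * @param {number} options.numberOfPages how many list pages to read
 * @param {(page: object) => Promise<string[]>} options.collectLinksOnCurrentPage
 * @param {(page: object) => Promise<boolean>} options.goToNextPage false when there is no next page
 * @param {(page: object, url: string) => Promise<boolean>} options.checkProfile true if active
 * @returns {Promise<string[]>} URLs of active profiles
 */
async function scrapeTwoPhase(page, options) {
  const { numberOfPages, collectLinksOnCurrentPage, goToNextPage, checkProfile } = options;

  const links = [];
  for (let i = 0; i < numberOfPages; i++) {
    links.push(...(await collectLinksOnCurrentPage(page)));
    // Stop early if the list runs out before numberOfPages is reached.
    if (i < numberOfPages - 1 && !(await goToNextPage(page))) break;
  }

  const active = [];
  // A Set removes duplicate links that appear on more than one list page.
  for (const url of new Set(links)) {
    await page.goto(url);
    if (await checkProfile(page, url)) active.push(url);
  }
  return active;
}
```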

Doesn't work on accounts with 2FA

Describe the bug
Even with the correct credentials, the program fails to log in when 2FA is enabled on my LinkedIn account.

To Reproduce
Steps to reproduce the behavior:

  1. Enable 2FA in Linkedin
  2. Use the same account to run this program.
  3. It fails

Expected behavior
It should prompt the user to enter the TOTP code.

Screenshots
image

Desktop (please complete the following information):

  • OS: elementaryOS

Additional context
NA
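A minimal sketch of the expected behavior: when LinkedIn asks for a verification code, the script could read it from the terminal with Node's built-in `readline` module. This sketch does not cover detecting the 2FA page or finding the code input, since both depend on LinkedIn's markup. `promptForCode` is a hypothetical helper:

```javascript
const readline = require("readline");

/**
 * Asks the user for a 6-digit TOTP code on the terminal, repeating the
 * question until the input looks valid.
 * @param {NodeJS.ReadableStream} [input]
 * @param {NodeJS.WritableStream} [output]
 * @returns {Promise<string>}
 */
function promptForCode(input = process.stdin, output = process.stdout) {
  const rl = readline.createInterface({ input, output });
  return new Promise((resolve) => {
    const ask = () => {
      rl.question("Enter the 6-digit code from your authenticator app: ", (answer) => {
        const code = answer.trim();
        if (/^\d{6}$/.test(code)) {
          rl.close();
          resolve(code);
        } else {
          output.write("That does not look like a 6-digit code, please try again.\n");
          ask();
        }
      });
    };
    ask();
  });
}
```

The returned code could then be typed into the verification field with Puppeteer's `page.type(selector, code)`. The `selector` would have to be found by inspecting LinkedIn's 2FA page.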
