
linkedin-scraper's People

Contributors

abhisheksingh23, alexbelov, anisharamnani, aupond, cernyjakub, damianmarti, devadigayatish, francesc, hilben, jaym3s, jgrevich, kimlima, linouk23, msjonker, omertu, prabhpreet, prathamesh-sonpatki, sagarjunnarkar, stifitman, vpoola88, yatish27, yatish27coupa


linkedin-scraper's Issues

languages && certifications empty response

Hi!

I just tested the library and it is working great, with a few minor glitches: the JSON response returns empty values for languages and certifications (I tested on my own account, where I have both filled in). In your code I saw this:

def languages
  @languages ||= @page.search(".background-languages #languages ol li").map do |item|
    language = item.at("h4").text rescue nil
    proficiency = item.at("div.languages-proficiency").text.gsub(/\s+|\n/, " ").strip rescue nil
    { :language => language, :proficiency => proficiency }
  end
end

def certifications
  @certifications ||= @page.search("background-certifications").map do |item|
    name       = item.at("h4").text.gsub(/\s+|\n/, " ").strip rescue nil
    authority  = item.at("h5").text.gsub(/\s+|\n/, " ").strip rescue nil
    license    = item.at(".specifics/.licence-number").text.gsub(/\s+|\n/, " ").strip rescue nil
    start_date = item.at(".certification-date").text.gsub(/\s+|\n/, " ").strip rescue nil

    { :name => name, :authority => authority, :license => license, :start_date => start_date }
  end
end

On the public profile there are no .background-languages and background-certifications classes. I use the following code in PHP with the simpledom library and it is working:

$education = $html->find('#education > li.school');
foreach ($education as $school) {
    $school_name = $school->find('.item-title', 0)->innertext;
    $title = $school->find('.item-subtitle', 0)->innertext;
    $start_date = !empty($school->find('.date-range > time', 0)) ? date('Y', strtotime($school->find('.date-range > time', 0)->innertext)) : '';
    $end_date = !empty($school->find('.date-range > time', 1)) ? date('Y', strtotime($school->find('.date-range > time', 1)->innertext)) : '';

    if (!empty($school_name) && !empty($title)) {
        $candidate_education[] =  $start_date . ' - ' . $end_date . ' ' . $school_name . ' - ' . $title . ' <br />';
    }
}

$certifications = $html->find('#certifications > li.certification');

foreach ($certifications as $certification) {
    $name = $certification->find('h4.item-title > a', 0);

    if (!empty($name)) {
        $candidate_certifications[] = [
            'name' => $name->innertext,
            'url' => $name->href
        ];
    }
}

Maybe this helps you.

Lib paths?

Hello there,
let me first thank you very much for a great piece of software!

I was having a little trouble installing this on both Mac OS X and Ubuntu, the error being pretty much the same.

kaisar@kaisar-MS-7641:~$ linkedin-scraper http://www.linkedin.com/in/jeffweiner08

/home/kaisar/.rvm/rubies/ruby-2.0.0-p247/lib/ruby/site_ruby/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- ./lib/linkedin-scraper (LoadError)
    from /home/kaisar/.rvm/rubies/ruby-2.0.0-p247/lib/ruby/site_ruby/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require'
    from /home/kaisar/.rvm/gems/ruby-2.0.0-p247/gems/linkedin-scraper-0.0.11/bin/linkedin-scraper:3:in `<top (required)>'
    from /home/kaisar/.rvm/gems/ruby-2.0.0-p247/bin/linkedin-scraper:23:in `load'
    from /home/kaisar/.rvm/gems/ruby-2.0.0-p247/bin/linkedin-scraper:23:in `<main>'
    from /home/kaisar/.rvm/gems/ruby-2.0.0-p247/bin/ruby_executable_hooks:15:in `eval'
    from /home/kaisar/.rvm/gems/ruby-2.0.0-p247/bin/ruby_executable_hooks:15:in `<main>'

Could you give me a pointer as to how I may solve this?

Thank you very much

Usage example does not appear to be working.

When walking through the usage instructions contained within the README, nearly all of the suggested methods were returning nil or an empty array.

I was having issues scraping LinkedIn using Nokogiri and Ruby's OpenURI module before trying out this gem. Perhaps LinkedIn is doing something to interfere with scraping attempts?

Tested using the following:

linkedin-scraper (0.1.2)
ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-darwin12.0]

Uninitialized Constant LinkedIn

When running the following in irb:

require 'linkedin-scraper'
profile = LinkedIn::Profile.get_profile("http://www.linkedin.com/in/robertwdempsey")

I get the following error:

NameError: uninitialized constant LinkedIn
from (irb):2
from /Users/roger/.rvm/rubies/ruby-1.9.3-p327/bin/irb:18:in `<main>'

I'm running ruby version 1.9.3p327 and have all the other required gems installed.

Any ideas?

Thanks!

  • Robert

NameError: uninitialized constant Linkedin in Rails

Hi!

Today I tried to use this gem with Rails 4.2.3 and got NameError: uninitialized constant Linkedin.
The same error was described previously in issue #63.
I then changed linkedin_scraper back to linkedin-scraper everywhere in the project and it started working. I'm not entirely sure, but it seems that when the gem name (from the gemspec) differs from the main file name in the lib directory, the gem breaks when used with Rails.

You can try my PR #84
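
For context, a rough sketch of why the two names might matter. As far as I understand Bundler's auto-require, `Bundler.require` first tries the gem name and, if that fails and the name contains a dash, retries with the dashes turned into slashes. The helper `require_candidates` below is hypothetical and only mirrors that assumed lookup order:

```ruby
# Hypothetical helper mirroring the order in which Bundler is assumed
# to look for a gem's main file when no `require:` option is given.
def require_candidates(gem_name)
  candidates = [gem_name]
  candidates << gem_name.tr("-", "/") if gem_name.include?("-")
  candidates
end

candidates = require_candidates("linkedin-scraper")
puts candidates.inspect
puts "linkedin_scraper is tried: #{candidates.include?('linkedin_scraper')}"
```

If that is what happens here, declaring the gem as `gem 'linkedin-scraper', require: 'linkedin_scraper'` in the Gemfile may be an alternative workaround.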

Unable to scrape locally hosted profileExample.html file

Hello All,

I had a project working a couple of months ago, returned to it this weekend and ran into an issue. Hopefully someone can point me in the right direction; I'm at a loss. I did a fresh install of the latest version of linkedin-scraper.

In the past, I was able to save the source code from a profile, host it locally, then run "linkedin-scraper http://localhost:9999/jeffweiner08_local.html". This worked perfectly.

Now when I do this, it comes up with empty arrays (see below). When I point it back to the actual public profile (http://www.linkedin.com/in/jeffweiner08), everything works as expected.

Any ideas what I'm doing wrong? I'm currently on Mac OS X; in the past I was running RHEL 7.

Example Result when using a local file:

########:~ user$ linkedin-scraper http://localhost:9999/jeffweiner08_local.html
{
  "name": "Jeff Weiner",
  "first_name": "Jeff",
  "last_name": "Weiner",
  "title": "CEO at LinkedIn",
  "location": "San Francisco Bay Area",
  "number_of_connections": "3",
  "country": "San Francisco Bay Area",
  "industry": null,
  "summary": null,
  "picture": "https://media.licdn.com/mpr/mpr/shrinknp_400_400/p/6/005/07c/31e/153cdd3.jpg",
  "projects": [

  ],
  "linkedin_url": "http://localhost:9999/Jeff_Weiner_local.html",
  "education": [

  ],
  "groups": [

  ],
  "websites": [

  ],
  "languages": [

  ],
  "skills": [

  ],
  "certifications": [

  ],
  "organizations": [

  ],
  "past_companies": [

  ],
  "current_companies": [

  ],
  "recommended_visitors": [

  ]
}

Legal

How do you cope with linkedin user agreement part 8.2?

8.2. Don'ts. You agree that you will not:
Scrape or copy profiles and information of others through any means (including crawlers, browser plugins and add-ons, and any other technology or manual work);

Having trouble calling skills

Great gem. Thank you.

I'm having difficulty calling :skills and I was wondering if I was doing something wrong. From the console, the skills are returned as Mechanize page links, but when I return a JSON object the skills are nowhere to be found.

[Screenshot: Screen Shot 2013-03-12 at 10 39 49 AM]

Here's the rabl view:

object @Profile => :profile
attributes :first_name, :last_name, :title, :location, :country, :industry, :current_companies, :past_companies, :websites, :groups, :skills, :education

And here's the app that returns JSON:

require 'sinatra'
require 'rabl'
require 'linkedin-scraper'
require 'active_support/core_ext'
require 'active_support/inflector'
require 'builder'

Rabl.register!

get '/' do
  "Hello Index"
end

get '/profile' do
  @Profile = Linkedin::Profile.get_profile(params[:url])
  render :rabl, :profile, format: 'json'
end

Still unable to use proxy IP

Is there any way we can get a proxy variable added to this script, please? I know how to do it in PHP, where it is easy with cURL, so it must be simple in Ruby too!

need to type in login info somewhere

The error message I got is here:

C:/Ruby193/lib/ruby/gems/1.9.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent/ssl_reuse.rb:70:in `connect': SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (OpenSSL::SSL::SSLError)
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent/ssl_reuse.rb:70:in `block in connect'
    from C:/Ruby193/lib/ruby/1.9.1/timeout.rb:55:in `timeout'
    from C:/Ruby193/lib/ruby/1.9.1/timeout.rb:100:in `timeout'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent/ssl_reuse.rb:70:in `connect'
    from C:/Ruby193/lib/ruby/1.9.1/net/http.rb:756:in `do_start'
    from C:/Ruby193/lib/ruby/1.9.1/net/http.rb:751:in `start'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:700:in `start'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:631:in `connection_for'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:994:in `request'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:259:in `fetch'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:976:in `response_redirect'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:300:in `fetch'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize.rb:440:in `get'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/linkedin-scraper-0.1.3/lib/linkedin-scraper/profile.rb:20:in `initialize'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/linkedin-scraper-0.1.3/bin/linkedin-scraper:4:in `new'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/linkedin-scraper-0.1.3/bin/linkedin-scraper:4:in `<top (required)>'
    from C:/Ruby193/bin/linkedin-scraper:23:in `load'
    from C:/Ruby193/bin/linkedin-scraper:23:in `<main>'

Is there any configuration step I have missed?

Return Status code from page

As scraping linkedin may fail for various reasons, you should probably return the page status. Something like this, but you may have a better idea.

@status = http_client.head( url ).code.to_i
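
A minimal sketch of how a caller might interpret such a status, assuming the 999 code reported in other issues here means LinkedIn refused the request. The method name `classify_status` and the returned symbols are hypothetical and not part of the gem:

```ruby
# Hypothetical helper: maps an HTTP status code to a coarse outcome.
# Assumption: 999 is the non-standard code LinkedIn returns when it
# refuses a request, as reported in other issues here.
def classify_status(code)
  case code
  when 200..299 then :ok
  when 404      then :not_found
  when 999      then :blocked
  else               :error
  end
end

[200, 404, 999, 500].each do |code|
  puts "#{code} => #{classify_status(code)}"
end
```

With something like this, a caller could tell "profile does not exist" apart from "request was blocked" instead of receiving nil in both cases.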

999 => for -- https://www.linkedin.com/in/some-profile

Hey, this issue is still not resolved: it is impossible to make even the first request. I get the 999 error straight away, whereas the following

curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -I --url

works perfectly well.

LoadError

I installed linkedin-scraper and tried to include it in my script. I get the following load error.

LoadError: cannot load such file -- linkedin-scraper
from /home/abhi/.rbenv/versions/2.1.5/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from /home/abhi/.rbenv/versions/2.1.5/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from (irb):3
from /home/abhi/.rbenv/versions/2.1.5/bin/irb:11:in `<main>'

Groups gives an empty set back

Hi,

it seems like the groups section doesn't work anymore; it always returns an empty set.
I tried correcting the code to:

  @groups ||= @page.search('#groups .groups-name').map do |item|
    name = item.text.gsub(/\s+|\n/, ' ').strip if item.at('strong')

    { :name => name }
  end

Not working so far ..
thanks

Static tests

Need to test using fixtures for the profile as well as the company pages.

Bad lib path

I installed linkedin-scraper on Lubuntu 13.10 with "gem install linkedin-scraper".
Running "linkedin-scraper http://www.linkedin.com/in/jeffweiner08" crashes with this error:

$ linkedin-scraper http://www.linkedin.com/in/jeffweiner08
/usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- ./lib/linkedin-scraper (LoadError)
    from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /var/lib/gems/1.9.1/gems/linkedin-scraper-0.0.11/bin/linkedin-scraper:3:in `<top (required)>'
    from /usr/local/bin/linkedin-scraper:23:in `load'
    from /usr/local/bin/linkedin-scraper:23:in `<main>'

The error is in the file "/var/lib/gems/1.9.1/gems/linkedin-scraper-0.0.11/bin/linkedin-scraper". Diff (one-line fix):

--- /var/lib/gems/1.9.1/gems/linkedin-scraper-0.0.11/bin/linkedin-scraper_OLD   2013-10-20 19:31:04.675030346 +0200
+++ /var/lib/gems/1.9.1/gems/linkedin-scraper-0.0.11/bin/linkedin-scraper   2013-10-20 19:31:25.787013101 +0200
@@ -1,5 +1,5 @@
 #!/usr/bin/env ruby

-require './lib/linkedin-scraper'
+require '/var/lib/gems/1.9.1/gems/linkedin-scraper-0.0.11/lib/linkedin-scraper'
 profile = Linkedin::Profile.new(ARGV[0])
 puts profile.to_json

After that, the output is OK:

pokus@pokus:$ linkedin-scraper http://www.linkedin.com/in/jeffweiner08
{"name":"Jeff Weiner","first_name":"Jeff","last_name":"Weiner","title":"CEO at LinkedIn","location":"Mounta:

...

Scraper cannot scrape names sometimes

Sometimes when I scrape my own profile with url="https://www.linkedin.com/in/johnwu93", I am able to get my name by doing Linkedin::Profile.get_profile(url).name

However, sometimes this does not work and Linkedin::Profile.get_profile(url) actually returns nil. I looked at the other issues, and it seems that the console normally outputs a connection error before the function returns nil. In my case, however, the console does not output anything. Is there a way to fix this?
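
As a stopgap for intermittent nil results, a retry wrapper along these lines might help. This is only a sketch: `with_retries` is a hypothetical helper, not something the gem provides, and it papers over the failure rather than explaining it:

```ruby
# Hypothetical sketch: call the block up to `attempts` times and return
# the first non-nil result, optionally pausing between attempts.
def with_retries(attempts: 3, wait: 0)
  attempts.times do |i|
    result = yield
    return result unless result.nil?
    sleep(wait) if wait > 0 && i < attempts - 1
  end
  nil
end

# Stand-in for a fetch that returns nil twice before succeeding.
calls = 0
profile = with_retries(attempts: 5) do
  calls += 1
  calls < 3 ? nil : { name: "example" }
end
puts "succeeded after #{calls} calls: #{profile.inspect}"
```

With the gem loaded, the block would presumably contain `Linkedin::Profile.get_profile(url)`.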

A newbie has an Issue - Error: connection refused: www.linkedin.com:443

Hello everyone,
thanks for sharing this great gem!

I'm in the following situation:
Soon I'll start writing my master's thesis. To build the necessary database I want to use the information from a few thousand public LinkedIn profiles.
I have never programmed at all and am currently trying to find a way to create the database.
In this context I found this gem. While using it, I ran into the issue described in the title.
As soon as I use the following, I get the error below:
profile = Linkedin::Profile.get_profile("https://www.linkedin.com/in/xxx", {:proxy_ip=>'127.0.0.1',:proxy_port=>'3128', :username=>"xxx", :password=>'xxx'})

output: connection refused: www.linkedin.com:443
=> nil

Can you help me on that? What does it mean? How can I solve the issue?

As I wrote before, I am a total newbie and glad of any help I can get.
Do you have any general recommendation for my situation? Is this gem suitable for my purpose?

Thanks a lot

some public profiles have different format and do not parse

Some profiles are not able to be parsed, as they have slightly different CSS. For example, see http://www.linkedin.com/pub/carl-reichenbach/0/717/271 (random person). ".full-name" has no nested given-name or family-name elements, just a single text value:

<span class="n fn"><span class="full-name">Carl Reichenbach</span><span></span></span>

The profile image does not have id "#profile-picture", but rather, class ".profile-picture".

<div class="profile-picture"><a href="https://www.linkedin.com/reg/join-pprofile?_ed=0_01QnG4t0pWUEtxKrTXmsjRWJlD5Jt4ZMf87SacJnJbVIQMerUlnPlhlEJudlwN_Ln2SYzqPPSVnMDbi0r_G5A1&amp;trk=pprof-0-ts-view_full-0"> <img src="http://m.c.lnkd.licdn.com/media/p/4/000/16f/3c0/08a6c0e.jpg" alt="Carl Reichenbach"></a><span></span></div>

I don't know if this is a new format they're transitioning to or from, but if you could please update the code to look for these patterns in addition to the existing ones (and any other differences found on this page, which I trust is representative), then we could successfully scrape them.
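
One way to support both layouts is to try several selectors in order. The sketch below assumes only that the parsed page responds to `at(selector)`, which is how a Nokogiri document behaves; `first_match` and `FakeDoc` are hypothetical names used for illustration:

```ruby
# Hypothetical sketch: try several CSS selectors in order and return the
# first node found, or nil when none of them match.
def first_match(doc, selectors)
  selectors.each do |selector|
    node = doc.at(selector)
    return node if node
  end
  nil
end

# Tiny stand-in for a parsed page so the sketch runs without Nokogiri.
class FakeDoc
  def initialize(nodes)
    @nodes = nodes
  end

  def at(selector)
    @nodes[selector]
  end
end

page = FakeDoc.new(".profile-picture img" => "picture node")
puts first_match(page, ["#profile-picture img", ".profile-picture img"]).inspect
```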

NameError: uninitialized constant Linkedin

Hello Yatish,

Thanks a lot for creating this great gem. And even more for taking the time to maintain it...

I used it in a previous app, and it worked fine. In this new one (Rails), I'm facing a strange issue that I don't understand. When trying to get a profile's data (rails c / s), I get the following error:

2.2.0 :001 > u = "https://www.linkedin.com/in/nicolassarkozy"
 => "https://www.linkedin.com/in/nicolassarkozy" 
2.2.0 :002 > profile = Linkedin::Profile.get_profile u

NameError: uninitialized constant Linkedin
    from (irb):2
    from /Users/me/.rvm/gems/ruby-2.2.0/gems/railties-4.2.5/lib/rails/commands/console.rb:110:in `start'
    from /Users/me/.rvm/gems/ruby-2.2.0/gems/railties-4.2.5/lib/rails/commands/console.rb:9:in `start'
    from /Users/me/.rvm/gems/ruby-2.2.0/gems/railties-4.2.5/lib/rails/commands/commands_tasks.rb:68:in `console'
    from /Users/me/.rvm/gems/ruby-2.2.0/gems/railties-4.2.5/lib/rails/commands/commands_tasks.rb:39:in `run_command!'
    from /Users/me/.rvm/gems/ruby-2.2.0/gems/railties-4.2.5/lib/rails/commands.rb:17:in `<top (required)>'
    from /Users/me/code/lnkdn-xtrct/bin/rails:8:in `<top (required)>'
    from /Users/me/.rvm/rubies/ruby-2.2.0/lib/ruby/site_ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in `require'
    from /Users/me/.rvm/rubies/ruby-2.2.0/lib/ruby/site_ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in `require'
    from -e:1:in `<main>'

Would you have any idea how to solve this?

Thanks a lot,
Brice

Proxy on command line tool

Some comments in other issues indicate that a second command line arg can be used to allow usage of a proxy with the command line tool. That doesn't seem to be the case - looking at linkedin-scraper's source, it doesn't seem to call the gem with {:proxy_ip=>'127.0.0.1',:proxy_port=>'3128'} as the second arg.

Is this a feature that is planned? I'm not a ruby dev but this seems like it's easy enough to add, any interest in a PR?
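
For illustration, a sketch of what such a change could look like. It assumes a second argument of the form "host:port", which the current executable does not accept; `proxy_options` is a hypothetical helper that builds the options hash quoted above:

```ruby
# Hypothetical sketch: turn an optional "host:port" command line
# argument into the options hash shown above. The argument format is an
# assumption, not something the current executable supports.
def proxy_options(arg)
  return {} if arg.nil? || arg.empty?

  host, port = arg.split(":", 2)
  if host.to_s.empty? || port.to_s.empty?
    raise ArgumentError, "expected host:port, got #{arg.inspect}"
  end

  { proxy_ip: host, proxy_port: port }
end

# Sample arguments standing in for ARGV.
argv = ["http://www.linkedin.com/in/jeffweiner08", "127.0.0.1:3128"]
url, proxy = argv
options = proxy_options(proxy)
puts "would fetch #{url} with options #{options.inspect}"
# With the gem loaded, the call would presumably be:
#   Linkedin::Profile.get_profile(url, options)
```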

Install hangs at Building Native Extensions in OSX Mavericks

Hey guys:

Forgive me, I'm somewhat new to Ruby and creating gems. When I attempt to install scraper on OSX Mavericks, it seems to hang indefinitely at this stage. I've tried Verbose mode, but that only adds the following information:

Installing gem unf_ext-0.0.6
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/.document
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/.gitignore
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/Gemfile
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/LICENSE.txt
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/README.md
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/Rakefile
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/ext/unf_ext/extconf.rb
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/ext/unf_ext/unf.cc
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/ext/unf_ext/unf/normalizer.hh
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/ext/unf_ext/unf/table.hh
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/ext/unf_ext/unf/trie/char_stream.hh
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/ext/unf_ext/unf/trie/node.hh
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/ext/unf_ext/unf/trie/searcher.hh
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/ext/unf_ext/unf/util.hh
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/lib/unf_ext.rb
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/lib/unf_ext/version.rb
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/test/helper.rb
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/test/normalization-test.txt
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/test/test_unf_ext.rb
/Library/Ruby/Gems/2.0.0/gems/unf_ext-0.0.6/unf_ext.gemspec
Building native extensions. This could take a while...
/System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/bin/ruby extconf.rb
checking for main() in -lstdc++...

I've updated the command line tools (based on this recommendation) http://stackoverflow.com/questions/23429145/error-failed-to-build-gem-native-extension-ruby-extconf-rb-mac-osx.

That doesn't seem to help. I can't seem to find the exact reason for the hang in Stack Overflow (or anywhere else). Any suggestions on next steps or how to fix this monster would be greatly appreciated.

SocketError: getaddrinfo: Name or service not known

Hi, when I use the functions current_positions and past_positions I always get this error: SocketError: getaddrinfo: Name or service not known
All other functions on that profile work fine.
I realize that it is making another call to get the page and that call fails.
How do I fix that?

This is the trace for the error:

from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/socksify-1.5.0/lib/socksify.rb:172:in `initialize'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/socksify-1.5.0/lib/socksify.rb:172:in `initialize'
from /home/chinmay/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:879:in `open'
from /home/chinmay/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:879:in `block in connect'
from /home/chinmay/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/timeout.rb:76:in `timeout'
from /home/chinmay/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:878:in `connect'
from /home/chinmay/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
from /home/chinmay/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:858:in `start'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:700:in `start'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:965:in `reset'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:628:in `connection_for'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:994:in `request'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:257:in `fetch'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/mechanize-2.7.2/lib/mechanize.rb:432:in `get'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/linkedin-scraper-0.1.1/lib/linkedin-scraper/profile.rb:180:in `get_company_details'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/linkedin-scraper-0.1.1/lib/linkedin-scraper/profile.rb:166:in `block in get_companies'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/nokogiri-1.6.1/lib/nokogiri/xml/node_set.rb:237:in `block in each'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/nokogiri-1.6.1/lib/nokogiri/xml/node_set.rb:236:in `upto'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/nokogiri-1.6.1/lib/nokogiri/xml/node_set.rb:236:in `each'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/linkedin-scraper-0.1.1/lib/linkedin-scraper/profile.rb:151:in `get_companies'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/linkedin-scraper-0.1.1/lib/linkedin-scraper/profile.rb:69:in `current_companies'
from (irb):7
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/railties-4.0.4/lib/rails/commands/console.rb:90:in `start'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/railties-4.0.4/lib/rails/commands/console.rb:9:in `start'
from /home/chinmay/.rvm/gems/ruby-2.1.1/gems/railties-4.0.4/lib/rails/commands.rb:62:in `<top (required)>'
from bin/rails:4:in `require'

Thanks

Does linked scraper follow links?

I'm using linkedin-scraper in a web configuration. After 4 'scrapes' I ran into a 999 (blocked by LinkedIn). I made a workaround: fetch the profile with wget first, store it, and serve it from my own server. After this I scrape http://localhost/profile.html with linkedin-scraper. This works like a charm... however, after a number of 'scrapes' I run into a 999 block again. I'm not a Ruby programmer and have a hard time following the code.
My question is simple: is linkedin-scraper following links in the profile, and therefore still accessing linkedin.com? And if so, does it really need to? I noticed that opening the stored profile on my server as a web page loads the LinkedIn page again (there is a redirect in the file).

Hope you can shed some light on this.

Getting it to Work for a Company Page

This has been working really well for me so far for individual public profiles, thank you. I'm really interested, though, in scraping a company page. Here's an example of one I'm interested in:

https://www.linkedin.com/company/nestle-s-a-?trk=affco

These pages have a lot of unique data and we have a lot of potential accounts that we'd like to know the industry for, and/or number of employees. I understand that searching by individual yields company data. But we have a lot of company names for which we don't yet have contacts.

I'm interested in creating a version that could scrape company pages. I would be happy to create a pull and build it myself, and I will attempt to do so. Unfortunately I'm fairly new to programming and I'm very new to Ruby. So if you would be willing to give me a head start by telling me what files I should be modifying, or just help me build it, or (if you really feel like it) build it yourself, I would really appreciate it. Thanks so much!

Scraper not working

The scraper fails on profile picture:

/usr/local/share/gems/gems/linkedin-scraper-0.1.7/lib/linkedin_scraper/profile.rb:75:in `picture': undefined method `value' for nil:NilClass (NoMethodError)
    from /usr/local/share/gems/gems/linkedin-scraper-0.1.7/lib/linkedin_scraper/profile.rb:177:in `block in to_json'
    from /usr/local/share/gems/gems/linkedin-scraper-0.1.7/lib/linkedin_scraper/profile.rb:177:in `each'
    from /usr/local/share/gems/gems/linkedin-scraper-0.1.7/lib/linkedin_scraper/profile.rb:177:in `reduce'
    from /usr/local/share/gems/gems/linkedin-scraper-0.1.7/lib/linkedin_scraper/profile.rb:177:in `to_json'
    from /usr/local/share/gems/gems/linkedin-scraper-0.1.7/bin/linkedin-scraper:5:in `<top (required)>'
    from /usr/local/bin/linkedin-scraper:23:in `load'
    from /usr/local/bin/linkedin-scraper:23:in `<main>'

Maybe it is better to catch the exception and return a partial result
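
A sketch of that idea, under the assumption that each attribute is read through its own method: rescue per attribute and record nil rather than letting one failure abort the whole serialization. `partial_attributes` and `FakeProfile` are hypothetical names, not the gem's code:

```ruby
# Hypothetical sketch: collect attributes one by one and record nil for
# any reader that raises, instead of aborting the whole serialization.
def partial_attributes(object, names)
  names.each_with_object({}) do |name, result|
    result[name] = begin
      object.public_send(name)
    rescue StandardError
      nil # this attribute could not be scraped; keep going
    end
  end
end

# Stand-in for a profile whose picture element is missing.
class FakeProfile
  def name
    "Jeff Weiner"
  end

  def picture
    nil.value # raises NoMethodError, like the trace above
  end
end

p partial_attributes(FakeProfile.new, [:name, :picture])
```

The trade-off is that a nil value no longer distinguishes "not present on the page" from "scraping this field failed", so logging the rescued error may be worthwhile.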

999 response error while running from Rackspace

Hi, when I use the scraper from my Rackspace machine I receive a 999 response.
Details of the error are below:
linkedin-scraper http://www.linkedin.com/in/jeffweiner08
/usr/local/rvm/gems/ruby-2.0.0-p247/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:933:in `response_read': 999 => for -- http://www.linkedin.com/in/jeffweiner08 (Mechanize::ResponseCodeError)
    from /usr/local/rvm/gems/ruby-2.0.0-p247/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:262:in `block in fetch'
    from /usr/local/rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/net/http.rb:1413:in `block (2 levels) in transport_request'
    from /usr/local/rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/net/http/response.rb:162:in `reading_body'
    from /usr/local/rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/net/http.rb:1412:in `block in transport_request'
    from /usr/local/rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/net/http.rb:1403:in `catch'
    from /usr/local/rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/net/http.rb:1403:in `transport_request'
    from /usr/local/rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/net/http.rb:1376:in `request'
    from /usr/local/rvm/gems/ruby-2.0.0-p247/gems/net-http-persistent-2.9/lib/net/http/persistent.rb:986:in `request'
    from /usr/local/rvm/gems/ruby-2.0.0-p247/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:259:in `fetch'
    from /usr/local/rvm/gems/ruby-2.0.0-p247/gems/mechanize-2.7.3/lib/mechanize.rb:440:in `get'
    from /usr/local/rvm/gems/ruby-2.0.0-p247/gems/linkedin-scraper-0.1.3/lib/linkedin-scraper/profile.rb:20:in `initialize'
    from /usr/local/rvm/gems/ruby-2.0.0-p247/gems/linkedin-scraper-0.1.3/bin/linkedin-scraper:4:in `new'
    from /usr/local/rvm/gems/ruby-2.0.0-p247/gems/linkedin-scraper-0.1.3/bin/linkedin-scraper:4:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-2.0.0-p247/bin/linkedin-scraper:23:in `load'
    from /usr/local/rvm/gems/ruby-2.0.0-p247/bin/linkedin-scraper:23:in `<main>'

Gem::Ext::BuildError: ERROR: Failed to build gem native extension.

Great gem, exactly what I have been looking for.
I get the following error when trying to install the gem, whether through the command
gem install linkedin-scraper -v 1.0.4, gem install linkedin-scraper, or in my Gemfile with gem 'linkedin-scraper'.
It seems to have a problem with the unf_ext dependency:

Installing unf_ext 0.0.7.2 with native extensions

Gem::Ext::BuildError: ERROR: Failed to build gem native extension.

/home/jan/.rbenv/versions/2.2.3/bin/ruby -r ./siteconf20160324-3908-iq7116.rb extconf.rb
checking for main() in -lstdc++... no
creating Makefile

make "DESTDIR=" clean

make "DESTDIR="
compiling unf.cc
make: g++: Command not found
make: *** [unf.o] Error 127

make failed, exit code 2

Gem files will remain installed in /home/jan/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/unf_ext-0.0.7.2 for inspection.
Results logged to /home/jan/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/extensions/x86-linux/2.2.0-static/unf_ext-0.0.7.2/gem_make.out
An error occurred while installing unf_ext (0.0.7.2), and Bundler cannot continue.
Make sure that gem install unf_ext -v '0.0.7.2' succeeds before bundling.

Any idea what the problem could be?
Thank you

[RFC] Need for connections information

It would be great if this scraper could also return information about the connections that a person has. This would help build a better picture of a scraped profile. It would require logging into LinkedIn directly from the code, or using a cookie that is provided to the scraper.

Class name clash with 'browser' gem

The new dependency on the random_user_agent gem has added a nasty class name clash with the widely used browser gem. Both gems define a Browser class without any namespacing.

I am afraid I am not the only one who was hit by this - it seems to me the browser gem is widely used.
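
To illustrate the usual remedy: if each library wraps its class in its own module, the two constants no longer collide. The module names below are made up for the sketch and are not the real gems' namespaces:

```ruby
# Sketch of namespacing: two unrelated Browser classes coexist because
# each lives inside its own module. Module names are illustrative only.
module RandomAgentLib
  class Browser
    def describe
      "picks a random user agent string"
    end
  end
end

module DetectionLib
  class Browser
    def describe
      "parses a user agent string"
    end
  end
end

puts RandomAgentLib::Browser.new.describe
puts DetectionLib::Browser.new.describe
```

Without the modules, the second `class Browser` would reopen the first and silently override its methods, which is the kind of breakage described here.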

Scraper is conflating start and end date at companies

Hi, I love the gem, but I am having some issues parsing the dates in Linkedin::Profile#past_companies

It looks like the hashes returned in #current_companies give start_date and end_date keys, while #past_companies only have start_date. Example output:
[Screenshot: screen shot 2016-02-14 at 12 13 51 pm]
Actual LinkedIn:
[Screenshot: screen shot 2016-02-14 at 12 21 50 pm]

Thanks
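
For reference, a sketch of splitting a date range into two keys. It assumes the page shows the period as text like "January 2010 – March 2012" or "June 2013 – Present"; that format is a guess and is not taken from the gem's source, and `parse_date_range` is a hypothetical helper:

```ruby
# Hypothetical sketch: split a date-range string into start and end
# dates. The "start <dash> end" text format is an assumption about what
# the profile page shows. \u2013 is an en dash; a plain hyphen also works.
def parse_date_range(text)
  start_text, end_text = text.split(/\s+[\u2013\-]\s+/, 2).map(&:strip)
  end_text = nil if end_text.nil? || end_text.casecmp?("Present")
  { start_date: start_text, end_date: end_text }
end

p parse_date_range("January 2010 \u2013 March 2012")
p parse_date_range("June 2013 \u2013 Present")
```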

[RFC] direct database store

It would be great if the information could be sent directly to a table in a database. I think this would help adoption of the code. People would be able to use the scraped data directly from the database as soon as it is committed.

API not working any more

hi,

I had the code installed on my own laptop a month ago, and everything worked fine.
Now I have just created a new instance on EC2 and installed/pulled the code, but I am getting:

ubuntu@ip-172-31-42-56:~$ linkedin-scraper http://www.linkedin.com/in/jeffweiner08
/home/ubuntu/.rvm/gems/ruby-2.2.1/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:933:in `response_read': 999 => for -- http://www.linkedin.com/in/jeffweiner08 (Mechanize::ResponseCodeError)
    from /home/ubuntu/.rvm/gems/ruby-2.2.1/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:262:in `block in fetch'

This is the first call I have ever made from this machine, so it cannot already be blocked by LinkedIn! Do you know what could be happening?
Thanks!
Matt

doesn't scrape company details anymore

The company details are no longer returned for me. Is it the same for you?

profile = Linkedin::Profile.get_profile("http://www.linkedin.com/in/jeffweiner08")
website = profile.current_companies.first[:website]
# nil
