Comments (17)
Hum... That's unfortunate, but Hacker News just updated its response headers to include a Content-Security-Policy,
thus forbidding arbitrary script execution. You'll have to use a browser extension that bypasses those headers, and I should probably find another site to use as an example in my docs :)
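For anyone curious about what such a header restricts, here is a rough sketch that splits a policy string into its directives and picks out the sources that apply to scripts. The header value below is made up for illustration (it is not Hacker News's actual policy), and `parseCsp` and `scriptSources` are hypothetical helper names, not part of artoo.

```javascript
// Parse a Content-Security-Policy header value into { directive: [sources] }.
function parseCsp(headerValue) {
  const policy = Object.create(null);
  for (const part of headerValue.split(';')) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    const name = tokens[0].toLowerCase();
    // A repeated directive is ignored; only the first occurrence counts.
    if (!(name in policy)) policy[name] = tokens.slice(1);
  }
  return policy;
}

// Sources that apply to scripts: script-src, falling back to default-src.
// Returns null when the policy does not restrict scripts at all.
function scriptSources(policy) {
  return policy['script-src'] || policy['default-src'] || null;
}

// Made-up header value for illustration only.
const example = "default-src 'self'; script-src 'self' https://cdn.example.com";
const sources = scriptSources(parseCsp(example));
console.log(sources); // [ "'self'", 'https://cdn.example.com' ]
```

A script loaded from an origin that is not in that list (which is how a bookmarklet typically injects its code) would be refused by the browser.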
from artoo.
No worries. I figured as much. Thanks for the response. Where can I ask for help with using artoo that's unrelated to this issue?
from artoo.
Well here seems to be a good place to do so :)
from artoo.
To select items by tag + class, here is what you need to write in CSS:
tagname.class
So, using artoo, you'd probably do something of the kind:
artoo.scrape('tagname.class', ...);
from artoo.
OK, so on this page, I'm running
artoo.scrape('li.card-btn square ', { text: {sel: 'span', method: 'text'}, url: {sel: 'a', attr: 'href'} });
and I'm getting an empty array. I isolated the element that's on the page and pasted it on this pastebin service.
https://dpaste.de/Oz5n
What I want to pull out from the page are results that look like this:
{
  name: 'Yelena M Stepanenko',
  address: 'Spc 157'
}
What am I doing wrong? I also realize that the selector is wrong; I haven't gotten to that part yet, as I have no CSS background. I'm more of a desktop programmer, so it's a little slower for me to figure this out. Thanks for your patience.
from artoo.
The selector should be li.card-btn.square
since you are trying to match two classes on the same element.
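The reason is that a space inside a CSS selector means "descendant of", so 'li.card-btn square' looks for a <square> element nested inside li.card-btn and finds nothing. A minimal sketch of turning a tag name plus the contents of a class attribute into a selector, assuming the element's class attribute reads "card-btn square" (`tagWithClasses` is a hypothetical helper, not an artoo function):

```javascript
// Hypothetical helper: joins a tag name and the space-separated class names
// found in an HTML class attribute into a single CSS selector.
function tagWithClasses(tag, classAttr) {
  const classes = classAttr.trim().split(/\s+/).filter(Boolean);
  return [tag, ...classes].join('.');
}

const selector = tagWithClasses('li', 'card-btn square ');
console.log(selector); // li.card-btn.square
```

The resulting string is what you would pass as the first argument to artoo.scrape.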
from artoo.
artoo.scrape('li.card-btn.square', { text: {sel: 'span', method: 'text'}, url: {sel: 'a', attr: 'href'} });
from artoo.
Yes. You have several classes listed in your example. You should probably do a quick HTML/CSS tutorial before scraping. It will definitely help you achieve your goals. Scraping is basically HTML/CSS reverse engineering.
from artoo.
Just going to close this. Researching jQuery and CSS taught me a lot about selectors!
from artoo.
I should probably find another site as example in my docs
Please do, @Yomguithereal -- I need a working example as a springboard to jump further. thx.
from artoo.
How about echojs.com?
from artoo.
Yeah, super.
While you are at it changing the scraping code, please throw in some comments as well, like the ones you helped me with before:
artoo.ajaxSpider(

  // This function is an iterator.
  // Its aim is to return the next url to fetch, or false if you want to stop.
  //-- 'i' is the index in the iteration of urls
  //-- '$data' is the jQuery-parsed data of the last fetched url
  function(i, $data) {

    // nextUrl is a function that takes a jQuery selector and returns
    // the next url to fetch.
    // If !i, we are only starting the spider, meaning that the next url
    // is available on the current page rather than the last fetched one.
    return nextUrl(!i ? artoo.$(document) : $data);
  },

  // Spider's settings
  {

    // We want to fetch a maximum of two pages
    limit: 2,

    // We are going to scrape the pages using the scrape definition written above in the doc example
    scrape: scraper,

    // We want to concatenate results so we have [title1, title2, title3, title4]
    // rather than [[title1, title2], [title3, title4]]
    concat: true,

    // Final callback fired when the spider has retrieved everything
    //-- 'data' is the scraped data
    done: function(data) {
      artoo.log.debug('Finished retrieving data. Downloading...');
      artoo.savePrettyJson(
        frontpage.concat(data),
        {filename: 'hacker_news.json'}
      );
    }
  }
);
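To make the roles of the iterator, limit and concat easier to see without a browser, here is a simplified, synchronous simulation of the loop those comments describe. It is only a sketch and not artoo's actual implementation: `miniSpider`, the `pages` object and its urls are all hypothetical stand-ins for real fetched pages.

```javascript
// Hypothetical stand-in for fetched pages: url -> { titles, next }.
const pages = {
  '/page2': { titles: ['title1', 'title2'], next: '/page3' },
  '/page3': { titles: ['title3', 'title4'], next: '/page4' },
  '/page4': { titles: ['title5'], next: null }
};

// Simplified, synchronous spider loop (not artoo's real code).
function miniSpider(iterator, settings) {
  const results = [];
  let lastData = null;

  for (let i = 0; i < settings.limit; i++) {
    // Ask the iterator for the next url; a falsy value stops the loop.
    const url = iterator(i, lastData);
    if (!url) break;

    lastData = pages[url];
    results.push(settings.scrape(lastData));
  }

  // concat: true flattens one level, [[a, b], [c, d]] -> [a, b, c, d].
  return settings.concat ? [].concat(...results) : results;
}

// On the first call (i === 0) nothing has been fetched yet, so start from a
// known url; afterwards follow the link found in the last fetched page.
const iterator = (i, data) => (!i ? '/page2' : data.next);
const scrape = (data) => data.titles;

const nested = miniSpider(iterator, { limit: 2, scrape, concat: false });
const flat = miniSpider(iterator, { limit: 2, scrape, concat: true });

console.log(nested); // [ [ 'title1', 'title2' ], [ 'title3', 'title4' ] ]
console.log(flat);   // [ 'title1', 'title2', 'title3', 'title4' ]
```

With a larger limit the loop would stop on its own after '/page4', because the iterator returns null there.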
thx
from artoo.