Scrape.js is an easy-to-use web scraping library for Node.js.
const data = await scrape("https://example.com");
// { url, html }
Features
- Fast
- Scrape nearly any website
- Headless JavaScript scraping
- Auto proxy rotation
- ...it just works
- MIT License
Install Scrape.js from npm:
npm install @themaximalist/scrape.js
Scrape.js uses Zen Rows for proxy rotation. To use it, acquire a Zen Rows API key and set the environment variable.
ZENROWS_API_KEY=abcxyz123
Scrape.js can be used without proxies, but it is less effective.
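One way to handle a missing key is to check for it at startup and turn proxying off when it is absent. The sketch below is a hypothetical helper (`proxyOptionsFromEnv` is not part of Scrape.js); it assumes only the `proxy` option and the `ZENROWS_API_KEY` variable described in this README.

```javascript
// Hypothetical helper, not part of Scrape.js: derive the `proxy` option
// from the environment so scraping still runs without a Zen Rows key.
function proxyOptionsFromEnv(env = process.env) {
  const key = env.ZENROWS_API_KEY;
  const hasKey = typeof key === "string" && key.trim() !== "";
  return { proxy: hasKey };
}

// Usage (assumes the package is installed):
// const scrape = require("@themaximalist/scrape.js");
// const data = await scrape("https://example.com", proxyOptionsFromEnv());
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.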
Using Scrape.js is as simple as calling a function with a website URL.
const scrape = require("@themaximalist/scrape.js");
await scrape("http://example.com"); // { url, html }
You can pass additional options to scrape() for more control:
const data = await scrape("https://example.com", {
headless: true,
proxy: true
});
// { url, html }
The Scrape.js API is a single function that takes a URL and an optional config object.
await scrape(
url, // URL to scrape
{
headless: true, // Use JavaScript headless scraping
proxy: true, // Use proxy rotation
method: "GET", // HTTP Request method
timeout: 3000, // Scrape timeout in ms
userAgent: "Mozilla/5.0...", // User Agent
}
);
URL (required)
- url <string>: URL to scrape.
Options
- headless <bool>: Enable JavaScript. Default is true.
- proxy <bool>: Use proxy with request. Default is true.
- method <string>: HTTP request method, usually GET or POST. Default is GET.
- timeout <int>: Max request time in ms. Default is 3500.
- userAgent <string>: User agent for request. Default is Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36.
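To make the defaults concrete, here is a minimal sketch of how overrides could be merged over the documented values. It is an illustration only, not the library's internal code: `DOCUMENTED_DEFAULTS` and `resolveOptions` are hypothetical names, and the values are copied from the option list in this README.

```javascript
// Defaults copied from the README's option list. This is an illustration,
// not the library's internal code.
const DOCUMENTED_DEFAULTS = Object.freeze({
  headless: true,
  proxy: true,
  method: "GET",
  timeout: 3500,
  userAgent:
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
});

// Hypothetical helper: merge caller overrides over the documented defaults
// and reject option names the README does not list.
function resolveOptions(overrides = {}) {
  for (const name of Object.keys(overrides)) {
    if (!(name in DOCUMENTED_DEFAULTS)) {
      throw new TypeError(`Unknown option: ${name}`);
    }
  }
  return { ...DOCUMENTED_DEFAULTS, ...overrides };
}

// Usage (assumes the package is installed):
// const options = resolveOptions({ timeout: 10000, headless: false });
// const data = await scrape("https://example.com", options);
```

Rejecting unknown names catches typos such as `useragent` before a request is made; whether Scrape.js itself validates options this way is not stated here.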
Response
Scrape.js returns an object containing the final url and html content.
const { url, html } = await scrape("https://example.com");
console.log(url); // https://example.com/
console.log(html); // <html...
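As a quick illustration of working with the returned html string, the sketch below pulls out the page title. `extractTitle` is a hypothetical helper, not part of Scrape.js, and a regular expression is only suitable for a rough check like this one.

```javascript
// Hypothetical helper, not part of Scrape.js: pull the <title> text out of
// the `html` string returned by scrape(). A regex is enough for a quick
// check; use a real HTML parser for anything more involved.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Usage (assumes the package is installed):
// const { html } = await scrape("https://example.com");
// console.log(extractTitle(html));
```

The helper returns null when no title element is found, so callers can tell a missing title apart from an empty one.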
The Scrape.js API is a simple and reliable way to scrape the HTML from any website.
Scrape.js uses the debug npm module with the scrape.js namespace.
View debug logs by setting the DEBUG environment variable.
> DEBUG=scrape.js* node src/get_website_html.js
# debug logs
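The same namespace can also be switched on from inside a script. The sketch below is a hypothetical helper (`enableScrapeDebug` is not part of Scrape.js); it relies on the debug module reading the DEBUG variable when it is first loaded, so it would need to run before Scrape.js is required.

```javascript
// Hypothetical helper: add the scrape.js namespace to DEBUG without
// discarding namespaces that are already set.
function enableScrapeDebug(env = process.env) {
  const namespace = "scrape.js*";
  // debug accepts namespaces separated by commas or whitespace.
  const current = (env.DEBUG || "").split(/[\s,]+/).filter(Boolean);
  if (!current.includes(namespace)) current.push(namespace);
  env.DEBUG = current.join(",");
  return env.DEBUG;
}

// Call this before the first require() of Scrape.js:
// enableScrapeDebug();
// const scrape = require("@themaximalist/scrape.js");
```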
See the tests for examples of how to use Scrape.js.
Scrape.js is currently used in the following projects:
- News Score: score the news, rewrite the headlines
MIT
Created by The Maximalist, see our open-source projects.