june-node

A simple file scraper built with Node.js.

Getting Started

These instructions will get you a copy of the project up and running on your local machine.

Prerequisites

  • Node.js (tested with v12.18.0)

Installing

Install the necessary packages in your local directory:

npm i

Usage

Set the webpage URL in main.js to the page containing the files you want to download:

const WEBPAGE = 'https://...'

Then modify the file types array to contain only the file types you want to download:

const FILE_TYPES = ["mp4", "gif", ... ]
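As a rough illustration of how a file types array like this can be applied, the sketch below keeps only the URLs whose path ends in a listed extension. The helper name hasWantedType and the example URLs are hypothetical, and the README does not show how main.js performs this check, so treat it as one possible approach:

```javascript
// Hypothetical helper; not taken from main.js.
const FILE_TYPES = ["mp4", "gif"]; // example values only

// Return true when the URL's path ends in one of the wanted extensions.
function hasWantedType(fileUrl, fileTypes) {
  let pathname;
  try {
    pathname = new URL(fileUrl).pathname; // excludes query string and fragment
  } catch (err) {
    return false; // not an absolute URL, so skip it
  }
  const dot = pathname.lastIndexOf(".");
  if (dot === -1) return false; // no extension at all
  const ext = pathname.slice(dot + 1).toLowerCase();
  return fileTypes.includes(ext);
}

// Example URLs (made up for illustration).
const found = [
  "https://example.com/media/clip.mp4?download=1",
  "https://example.com/media/photo.JPG",
  "https://example.com/media/loop.gif",
];
const wanted = found.filter((u) => hasWantedType(u, FILE_TYPES));
console.log(wanted);
```

Comparing against the parsed path rather than the raw string means a query string such as ?download=1 does not hide the extension.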

To start the script, run the following command:

npm start

If the webpage to be scraped requires login credentials, specify the login credentials in main.js:

// Set the username and password to automate login if needed
const USERNAME = "";
const PASSWORD = "";

and specify the CSS selectors for the username input box, the password input box, and the login/submit button:

// Set the HTML elements to automate login if needed
const USERNAME_ELEMENT = ""; // CSS selector for the username input box on the webpage
const PASSWORD_ELEMENT = ""; // CSS selector for the password input box on the webpage
const LOGIN_ELEMENT = ""; // CSS selector for the login button on the webpage
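The sketch below shows one way these five settings could drive an automated login. It assumes the login is performed with nightmare (listed under Built With) through its goto, type, and click actions. The README does not show the actual code in main.js, and the function name queueLogin and the shape of the login object are hypothetical:

```javascript
// Sketch only: the README does not show how main.js performs the login.
// `browser` is expected to be a Nightmare instance, e.g. require("nightmare")().
function queueLogin(browser, webpage, login) {
  const chain = browser.goto(webpage);
  // Assumption: empty credentials mean the page needs no login.
  if (!login.username || !login.password) return chain;
  return chain
    .type(login.usernameElement, login.username) // fill the username box
    .type(login.passwordElement, login.password) // fill the password box
    .click(login.loginElement); // press the login/submit button
}

// Possible usage (requires the nightmare package to be installed):
//   const Nightmare = require("nightmare");
//   queueLogin(Nightmare(), WEBPAGE, {
//     username: USERNAME,
//     password: PASSWORD,
//     usernameElement: USERNAME_ELEMENT,
//     passwordElement: PASSWORD_ELEMENT,
//     loginElement: LOGIN_ELEMENT,
//   }).end().then(() => console.log("login steps finished"));
```

Passing the browser in as an argument keeps the login steps separate from browser setup, so they can be checked against a stub without opening a real window.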

All files found in an anchor, image, or link tag on the given webpage will be downloaded in parallel and written to the current folder with their original file names. If other tags are needed or different attributes should be scraped, add the desired tag to TAG_TYPES and the desired tag/attribute pair to TAG_ATTR_MAP:

// Tag and attribute types
const TAG_TYPES = ["a", "img", "link"]; // html tags to scrape
const TAG_ATTR_MAP = { // attributes to scrape for each tag
    "a": "href",
    "img": "src",
    "link": "href"
};
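To make the role of the two settings concrete, here is a minimal sketch of how a tag list and a tag/attribute map can be used to collect candidate URLs from a parsed page. The function collectUrls is hypothetical and is not the code in main.js. It accepts any DOM Document, such as the one jsdom exposes as new JSDOM(html).window.document:

```javascript
const TAG_TYPES = ["a", "img", "link"]; // html tags to scrape
const TAG_ATTR_MAP = { a: "href", img: "src", link: "href" };

// Hypothetical helper: gather the mapped attribute of every listed tag.
function collectUrls(document, baseUrl) {
  const urls = new Set(); // de-duplicate repeated links
  for (const tag of TAG_TYPES) {
    const attr = TAG_ATTR_MAP[tag];
    if (!attr) continue; // tag listed without an attribute mapping
    for (const el of document.querySelectorAll(tag)) {
      const raw = el.getAttribute(attr);
      if (!raw) continue; // element lacks the attribute
      try {
        urls.add(new URL(raw, baseUrl).href); // resolve relative paths
      } catch (err) {
        // ignore values that cannot be parsed as URLs
      }
    }
  }
  return [...urls];
}

// Possible usage with jsdom (requires the jsdom package to be installed):
//   const { JSDOM } = require("jsdom");
//   const doc = new JSDOM(html).window.document;
//   console.log(collectUrls(doc, WEBPAGE));
```

In this sketch, supporting another tag would mean adding it in both places, for example "video" to TAG_TYPES and "video": "src" to TAG_ATTR_MAP. A tag that appears in only one of the two is skipped.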

Debugging

If you encounter an error during downloading, check your connection first. If you suspect throttling, lower the number of concurrent requests:

// Number of files to download in parallel
const ASYNC_LIMIT = 2; // decrease to 1 if throttling suspected, increase if download is too slow

If downloads are taking too long, increase the limit.
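The effect of the limit can be sketched with plain promises: at most ASYNC_LIMIT downloads are in flight at once, and a new one starts only when an earlier one finishes. The project lists the async library, which provides functions such as eachLimit and mapLimit for this purpose, but the README does not say which one main.js uses. The runLimited function and the fake download below are stand-ins for illustration only:

```javascript
const ASYNC_LIMIT = 2; // number of files to download in parallel

// Run `worker` over `items`, keeping at most `limit` calls in flight.
async function runLimited(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++; // claim the next index before awaiting
      results[i] = await worker(items[i], i);
    }
  }
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, lane));
  return results; // same order as `items`
}

// Demo with a fake download that only waits; no network access.
let inFlight = 0;
let peak = 0;
async function fakeDownload(name) {
  inFlight++;
  peak = Math.max(peak, inFlight); // record the highest concurrency seen
  await new Promise((resolve) => setTimeout(resolve, 10));
  inFlight--;
  return name;
}

runLimited(["a.mp4", "b.gif", "c.gif", "d.mp4"], ASYNC_LIMIT, fakeDownload)
  .then((done) => console.log(done, "peak concurrency:", peak));
```

With a limit of 1 the files are fetched strictly one after another, which is the gentlest setting for a server that throttles. Larger values trade that caution for speed.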

Built With

  • nightmare - Browser automation library
  • jsdom - JavaScript implementation of the DOM and HTML standards
  • async - Asynchronous processing library
  • request - HTTP library

Contributors

  • rickycordero
