beenotung / scan-link

Scan given website recursively and report 404 links

Home Page: https://www.npmjs.com/package/scan-link

License: BSD 2-Clause "Simplified" License


scan-link

Scan given website recursively and report 404 links

Features

  • Start scanning from a specified entry URL
  • Follow links within specified origins
  • Report links that lead to 404 pages
  • Export 404 error report as a CSV file
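One way to picture the "follow links within specified origins" feature is an origin comparison using the standard URL class. This is only a sketch: isWithinOrigins is a hypothetical helper, not a function exported by scan-link, and the tool's actual filtering logic may differ.

```typescript
// Hypothetical helper (not part of scan-link's API): decide whether a link
// should be followed, given the list of allowed origins.
// Assumes every entry in `origins` is a valid absolute URL.
function isWithinOrigins(link: string, origins: string[]): boolean {
  let parsed: URL
  try {
    parsed = new URL(link)
  } catch {
    // Relative or malformed links cannot be matched against an origin here.
    return false
  }
  // URL.origin is scheme + host + port, so subdomains count as different origins.
  return origins.some(origin => new URL(origin).origin === parsed.origin)
}
```

Under this sketch, https://sub.example.com/page is not followed when only https://example.com is listed, which is why the ORIGINS example further down lists both.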

Remark

The links and page status codes are stored in a db.sqlite3 file in the current working directory. To avoid creating this file in your home directory, create a dedicated directory with mkdir and cd into it before running scan-link.

Installation (optional)

You can install scan-link to pin its version, or run it via npx without installing it.

To install scan-link, use npm:

npm install scan-link

You may install it as a dev dependency or as a global dependency, depending on your preference.

Usage

You can use scan-link from the command line via npx. The configuration can be provided via environment variables or interactively during execution.

Usage with dev/global installation:

npx scan-link [entryUrl]

Usage without installation:

npx -y scan-link [entryUrl]

The entryUrl can be passed as an argument, loaded from an environment variable, or entered at the interactive prompt.

Environment Variables

  • SITE_URL: The entry URL for the scan
  • ORIGINS: A comma-separated list of origins to limit the scan
  • REPORT_404_CSV_FILE: Path of the CSV file where the 404 error report will be saved

Example content of .env file:

SITE_URL=https://example.com
ORIGINS=https://example.com,https://sub.example.com
REPORT_404_CSV_FILE=report.csv
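To illustrate how these three variables fit together, here is a minimal sketch that reads them from a plain object such as process.env. The ScanConfig type and readConfig function are hypothetical names introduced for this example; scan-link may parse its environment differently.

```typescript
// Hypothetical config shape for illustration only.
type ScanConfig = {
  siteUrl?: string
  origins: string[]
  csvFile?: string
}

// Read the documented variables from an env-like object.
function readConfig(env: Record<string, string | undefined>): ScanConfig {
  // ORIGINS is documented as a comma-separated list; trim and drop empty items.
  const origins = (env.ORIGINS || '')
    .split(',')
    .map(item => item.trim())
    .filter(item => item.length > 0)
  return {
    siteUrl: env.SITE_URL,
    origins,
    csvFile: env.REPORT_404_CSV_FILE,
  }
}
```

In a Node.js script this could be called as readConfig(process.env) after the .env file has been loaded.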

Interactive Usage

If environment variables are not set, scan-link will prompt you for the necessary information.

npx scan-link

You will be prompted to set up the variables above.

Example Interactive Session

$ npx -y scan-link
entryUrl: http://localhost:8200/

Please specified the origins of links to follow.
Multiple origins can be delimited by comma (",").
origins (default: "http://localhost:8200"):
origins: [ 'http://localhost:8200' ]

path of CSV file to be saved (default "404.csv"): report.csv
scanned: 12 | pending: 85 | scanning: http://localhost:8200/about
...
scanned: 119 pages
{
  '404 link count': 1447,
  'total link count': 5036,
  'page count with 404 link': 11,
  'total page count': 119
}
exported 404 pages to file: report.csv
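The summary object printed at the end of the session can be turned into ratios for quick reading. The sketch below assumes the four keys shown in the session output; Report404 and brokenRatios are hypothetical names, not part of scan-link.

```typescript
// Shape of the summary object, copied from the session output above.
type Report404 = {
  '404 link count': number
  'total link count': number
  'page count with 404 link': number
  'total page count': number
}

// Hypothetical helper: share of broken links and of pages containing them.
function brokenRatios(report: Report404): { linkRatio: number; pageRatio: number } {
  // Guard against division by zero when nothing was scanned.
  const ratio = (part: number, total: number) => (total === 0 ? 0 : part / total)
  return {
    linkRatio: ratio(report['404 link count'], report['total link count']),
    pageRatio: ratio(report['page count with 404 link'], report['total page count']),
  }
}
```

For the session above, this gives roughly 29% of links pointing to 404 pages and roughly 9% of pages containing at least one such link.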

API

For advanced usage, you can import scanAndFollow() and the related functions below and call them programmatically.

export function scanAndFollow(options: {
  /** @example 'http://localhost:8200/' */
  entryUrl: string

  /** @default same as entryUrl */
  origins?: string[]

  /** @description report stats on 404 pages and links */
  report_404_stats?: boolean

  /** @description specified filename to report 404 links. Skip reporting if not specified. */
  export_404_csv_file?: string

  /**
   * @description auto close browser after all scanning
   * @default true
   */
  close_browser?: boolean
}): Promise<void>

/** @description called by `scanAndFollow()` if `options.report_404_stats` is true */
export function get404Report(options: { origin: string }): {
  '404 link count': number
  'total link count': number
  'page count with 404 link': number
  'total page count': number
}

/** @description called by `scanAndFollow()` if `options.export_404_csv_file` is specified */
export function export404Pages(options: {
  csv_file: string
  origin: string
}): void

/** @description close the lazy loaded browser instance if it's launched */
export function closeBrowser(): Promise<void>
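As a rough illustration of how the options above might be assembled, the following sketch builds an options object for scanAndFollow(). ScanOptions mirrors the documented parameter shape, while buildScanOptions is a hypothetical helper. The import path in the trailing comment assumes scanAndFollow is exported from the package root, which this README does not state explicitly.

```typescript
// Option shape copied from the scanAndFollow() signature above.
type ScanOptions = {
  entryUrl: string
  origins?: string[]
  report_404_stats?: boolean
  export_404_csv_file?: string
  close_browser?: boolean
}

// Hypothetical helper: build options that limit the scan to the entry URL's origin.
function buildScanOptions(entryUrl: string, csvFile?: string): ScanOptions {
  return {
    entryUrl,
    // Same default the interactive prompt suggests: the origin of the entry URL.
    origins: [new URL(entryUrl).origin],
    report_404_stats: true,
    // Leaving this undefined skips the CSV export, per the option's description.
    export_404_csv_file: csvFile,
    close_browser: true,
  }
}

// Possible usage (assumed import path, not confirmed by this README):
//   import { scanAndFollow } from 'scan-link'
//   await scanAndFollow(buildScanOptions('http://localhost:8200/', 'report.csv'))
```

Passing close_browser: false would leave the browser open after the scan, in which case closeBrowser() can be called once all scans have finished.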

License

This project is licensed under the BSD-2-Clause license.

This is free, libre, and open-source software. It comes down to four essential freedoms [ref]:

  • The freedom to run the program as you wish, for any purpose
  • The freedom to study how the program works, and change it so it does your computing as you wish
  • The freedom to redistribute copies so you can help others
  • The freedom to distribute copies of your modified versions to others
