grunt-link-checker's Introduction

grunt-link-checker

Run node-simplecrawler to discover broken links on your website.

Getting Started

If you haven't used grunt before, be sure to check out the Getting Started guide, as it explains how to create a gruntfile as well as install and use grunt plugins. Once you're familiar with that process, install this plugin with this command:

npm install grunt-link-checker --save-dev

Then add this line to your project's Gruntfile.js:

grunt.loadNpmTasks('grunt-link-checker');

Documentation

By default, grunt-link-checker finds any broken internal links on the given site. It also finds broken fragment identifiers, using cheerio to ensure that an element with the given identifier exists. You can configure further options, all of which are available via node-simplecrawler.

Minimal Usage

Minimal usage of grunt-link-checker requires only a site; options.initialPort is optional:

linkChecker: {
  dev: {
    site: 'localhost',
    options: {
      initialPort: 9001
    }
  }
}

Recommended Usage

In addition to the above config, which tests a local version of your site before deployment, you can add a second target to run post-deployment. This verifies that your assets were deployed correctly and still resolve after any revisioning or path modifications during deployment:

linkChecker: {
  // Use high concurrency to speed up the check
  options: {
    maxConcurrency: 20
  },
  dev: {
    site: 'localhost',
    options: {
      initialPort: 9001
    }
  },
  postDeploy: {
    site: 'mysite.com'
  }
}

Custom options

checkRedirect

  • Type: Boolean
  • Default: false

Set this to true to check for redirects.

noFragment

  • Type: Boolean
  • Default: false

Set this to true to speed up your test by not verifying fragment identifiers.
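
As an illustration, the two boolean options above could be set per target as follows. This is a sketch that assumes they belong under `options`, like `callback` in the example further down; the object shown is what you would place under the `linkChecker` key of `grunt.initConfig`, and the variable name is only there to keep the snippet self-contained.

```javascript
// Value for the `linkChecker` key in grunt.initConfig (illustrative sketch).
var linkCheckerConfig = {
  dev: {
    site: 'localhost',
    options: {
      initialPort: 9001,
      checkRedirect: true, // also check redirects (default: false)
      noFragment: true     // skip fragment identifier checks for speed (default: false)
    }
  }
};
```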

callback

  • Type: Function

Function that receives the instantiated crawler object so that you can add event listeners or other configuration to the crawler.

Here is an example config using the callback option to ignore localhost links which have different ports:

linkChecker: {
  dev: {
    site: 'localhost',
    options: {
      initialPort: 9001,
      callback: function (crawler) {
        crawler.addFetchCondition(function (url) {
          return url.port === '9001';
        });
      }
    }
  }
}
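
The same hook can also attach event listeners, for example to log progress during a long crawl. The sketch below assumes the crawler emits simplecrawler's `fetchcomplete` and `complete` events and that each queue item exposes a `url` property; check both against the simplecrawler version you have installed. `logProgress` is a hypothetical helper, not part of the plugin.

```javascript
// Hypothetical helper: counts fetched resources and reports them via `log`.
function logProgress(crawler, log) {
  var fetched = 0;

  // Assumed simplecrawler event: fired after each resource is downloaded.
  crawler.on('fetchcomplete', function (queueItem) {
    fetched += 1;
    log('Checked ' + fetched + ': ' + queueItem.url);
  });

  // Assumed simplecrawler event: fired once the queue is exhausted.
  crawler.on('complete', function () {
    log('Crawl finished, ' + fetched + ' resources fetched.');
  });
}

// Value for the `linkChecker` key in grunt.initConfig (illustrative sketch).
var linkCheckerConfig = {
  dev: {
    site: 'localhost',
    options: {
      initialPort: 9001,
      callback: function (crawler) {
        logProgress(crawler, console.log);
      }
    }
  }
};
```

Keeping the listener logic in a separate function makes it possible to exercise it with any event emitter, without starting a crawl.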

simplecrawler options

Every option documented by node-simplecrawler is available:

https://github.com/cgiffard/node-simplecrawler#configuring-the-crawler
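
For instance, crawler settings such as `interval`, `timeout` and `userAgent` could be passed alongside `maxConcurrency`. This is a sketch that relies on the statement above that simplecrawler options are passed through; the values and the user agent string are illustrative assumptions, not recommendations.

```javascript
// Value for the `linkChecker` key in grunt.initConfig (illustrative sketch).
var linkCheckerConfig = {
  options: {
    maxConcurrency: 20,
    interval: 100,   // milliseconds between request spawns (illustrative value)
    timeout: 30000,  // milliseconds to wait for a response (illustrative value)
    userAgent: 'my-link-check' // hypothetical user agent string
  },
  postDeploy: {
    site: 'mysite.com'
  }
};
```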

Changelog

  • 0.2.0 - Updated dependencies.
  • 0.1.0 - Updated dependencies, changed task name to linkChecker.
  • 0.0.6 - Added logging for initially fetched URL and logged status codes for failed fetches.
  • 0.0.5 - Added error reporting if initial site URL fails.
  • 0.0.4 - Added callback option.
  • 0.0.3 - Fixed repo link in package.json and fixed error reporting for a failed initial URL.
  • 0.0.2 - Added noFragment flag.
  • 0.0.1 - Check to make sure # URLs resolve to content with a corresponding ID.
  • 0.0.0 - Initial release.

grunt-link-checker's People

Contributors

adam-lynch, chriswren, greggman, nschonni, xhmikosr

grunt-link-checker's Issues

Anchors throwing 404s

I've got a few anchors on my page, and when links to those anchors are being followed, they're getting marked as 404s. A typical example:

<a href="#fast-service" id="fast-service-link" title="Fast service times">Fast service times</a>

As a link to:

<div class="hiccup-right light-grey how-we-are-here-panel" id="fast-service"></div>

And the error:

Resource not found linked from https://[mydomain]/about-us to https://[mydomain]/about-us#fast-service
Status code: 404

Is this expected behaviour?

grunt-link-checker could benefit from a progress indication

grunt-link-checker is really powerful, but it can be slow as it steps through all the pages in a site. It would be nice if there was some kind of progress indication (pages checked/remaining) or similar to give an idea of progress. Not essential, just nice to have.

Decouple from Grunt

Hey, this looks great! It would be good if there was a plain Node module though. That would make it a lot more accessible. Let's say there would be a new "link-checker" (core) module.

Then this project would have the new "link-checker" module as a dependency, so this would just wrap it for grunt.

Then I could also make a gulp module if that made sense (maybe a Gulp plugin isn't needed; maybe using the plain Node module with gulp on its own would make most sense, I don't know).

Would you be open to that? I had a quick look over the source and it doesn't seem like it would be hard. I'd be up for helping anyway.

CONTRIBUTING.md needs work

It looks like the CONTRIBUTING.md is copied from other projects. I started trying to fix the links but I guess you'd probably just want to re-write the whole thing and you'd know better than me what should go there.

readme.md

When registering the Grunt task, it took me a while to figure out that you need to enter 'linkChecker'. Perhaps it would be good to include that in the readme? Thanks :-)

Rename task to `linkChecker`

This should make things simpler when working with grunt templates.

    connect: {
      server: {
        options: {
          port: '<%= linkChecker.options.initialPort %>',
          base: 'test/fixtures'
        }
      }
    }

A major version bump should be made though.

Default config throwing error

Hi. When attempting to run a basic setup on a local server, I'm getting this error:

$ grunt checklinks
Running "link-checker:dev" (link-checker) task
Fatal error: Cannot read property 'cyan' of undefined

'link-checker': {
  dev: {
    site: '0.0.0.0:8080'
  }
}

My setup:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

Any ideas? I guess maybe it's something to do with the colors dependency? This is a fresh install of the package today.

Fatal error: "name" and "value" are required for setHeader().

I'm getting the above error after attempting to set up this plugin. These are what my files look like...

Gruntfile.js

module.exports = function(grunt) {

    grunt.initConfig({
        pkg: grunt.file.readJSON('package.json'),

        'link-checker': { 
            dev: {
                site: 'http://www.website.com',
                options: {
                    maxConcurrency: 20,
                }
            }
        }
    });

    grunt.loadNpmTasks('grunt-link-checker');

    grunt.registerTask('default', ['link-checker']);

};

package.json

{
  "name": "Website-Link-Crawler",
  "version": "0.0.1",
  "devDependencies": {
    "grunt": "^0.4.5",
    "grunt-link-checker": "0.0.6"
  }
}

Fatal error: Request path contains unescaped characters.

Hi,
The crawler goes through a couple of pages without any problems, but then throws this error. Any ideas as to why that might be the case, and how I could fix it? My configuration looks as follows:

linkChecker: {
  options: {
    maxConcurrency: 10
  },
  postDeploy: {
    site: 'www.radiologen-konstanz.de'
  }
}

Kind regards,
Max

getting 404 for css files

Hi
I'm getting a very strange behavior with the link checker. It tries to get the CSS from the link tag and fails with 404 by trying to make the URL relative.

Resource not found linked from http://myIP:myPORT/products-services/healthcare-credit-card.html to http://myIP:myPORT/products-services/%27/styles/main.css
Status code: 404

I removed the IP since this is client work.
Have you seen anything like this? The CSS resolves fine in the page and all styles work. I was just wondering why this error would appear. It doesn't appear on all pages either, just 3 of them.
Thanks
Joe
