Coder Social home page Coder Social logo

jakubbuszynski / sitemapper Goto Github PK

View Code? Open in Web Editor NEW

This project forked from seantomburke/sitemapper

0.0 0.0 0.0 962 KB

parses sitemaps for Node.JS

Home Page: https://www.npmjs.com/package/sitemapper

License: MIT License

JavaScript 83.59% TypeScript 16.41%

sitemapper's Introduction

Sitemap-parser

Code Scanning NPM Publish Version Bump Test Build Status Codecov CodeFactor GitHub license GitHub release date Inline docs LGTM Alerts LGTM Grade Libraries.io dependency status for latest release license Monthly Downloads npm version release scrutinizer

Parse through a sitemaps xml to get all the urls for your crawler.

Version 2

Installation

npm install sitemapper --save

Simple Example

const Sitemapper = require('sitemapper');

const sitemap = new Sitemapper();

sitemap.fetch('https://wp.seantburke.com/sitemap.xml').then(function(sites) {
  console.log(sites);
});

Examples in ES6

import Sitemapper from 'sitemapper';

(async () => {
  const Google = new Sitemapper({
    url: 'https://www.google.com/work/sitemap.xml',
    timeout: 15000, // 15 seconds
  });

  try {
    const { sites } = await Google.fetch();
    console.log(sites);
  } catch (error) {
    console.log(error);
  }
})();

// or

const sitemapper = new Sitemapper();
sitemapper.timeout = 5000;

sitemapper.fetch('https://wp.seantburke.com/sitemap.xml')
  .then(({ url, sites }) => console.log(`url:${url}`, 'sites:', sites))
  .catch(error => console.log(error));

Options

You can add options on the initial Sitemapper object when instantiating it.

  • requestHeaders: (Object) - Additional Request Headers (e.g. User-Agent)
  • timeout: (Number) - Maximum timeout in ms for a single URL. Default: 15000 (15 seconds)
  • url: (String) - Sitemap URL to crawl
  • debug: (Boolean) - Enables/Disables debug console logging. Default: False
  • concurrency: (Number) - Sets the maximum number of concurrent sitemap crawling threads. Default: 10
  • retries: (Number) - Sets the maximum number of retries to attempt in case of an error response (e.g. 404 or Timeout). Default: 0
  • rejectUnauthorized: (Boolean) - If true, it will throw on invalid certificates, such as expired or self-signed ones. Default: True
  • lastmod: (Number) - Timestamp of the minimum lastmod value allowed for returned urls
  • field : (Object) - An object of fields to be returned from the sitemap. For Example: { loc: true, lastmod: true, changefreq: true, priority: true }. Leaving a field out has the same effect as field: false. If not specified sitemapper defaults to returning the 'classic' array of urls.
const sitemapper = new Sitemapper({
  url: 'https://art-works.community/sitemap.xml',
  rejectUnauthorized: true,
  timeout: 15000,
  requestHeaders: {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'
  }
});

An example using all available options:

const sitemapper = new Sitemapper({
  url: 'https://art-works.community/sitemap.xml',
  timeout: 15000,
  requestHeaders: {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'
  },
  debug: true,
  concurrency: 2,
  retries: 1,
});

Examples in ES5

var Sitemapper = require('sitemapper');

var Google = new Sitemapper({
  url: 'https://www.google.com/work/sitemap.xml',
  timeout: 15000 //15 seconds
});

Google.fetch()
  .then(function (data) {
    console.log(data);
  })
  .catch(function (error) {
    console.log(error);
  });


// or


var sitemapper = new Sitemapper();

sitemapper.timeout = 5000;
sitemapper.fetch('https://wp.seantburke.com/sitemap.xml')
  .then(function (data) {
    console.log(data);
  })
  .catch(function (error) {
    console.log(error);
  });

Version 1

npm install [email protected] --save

Simple Example

var Sitemapper = require('sitemapper');

var sitemapper = new Sitemapper();

sitemapper.getSites('https://wp.seantburke.com/sitemap.xml', function(err, sites) {
    if (!err) {
     console.log(sites);
    }
});

sitemapper's People

Contributors

seantomburke avatar actions-user avatar dependabot[bot] avatar esamattis avatar alexm92 avatar oudmane avatar zijua avatar james-mckinnon avatar jasonaibrahim avatar phanect avatar mmeinzer avatar bsq-panagiotis avatar saschanowak avatar terry-au avatar tijevlam avatar achimkoellner avatar jakubbuszynski avatar y16ra avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.