sromexs / get-sitemap-links

This package fetches and crawls sitemap pages recursively and collects all links found between <loc> tags.

TypeScript 100.00%
crawl-sitemap fetch-sitemap get-sitemap sitemap sitemap-crawler sitemap-links sitemap-xml sitemapper typescript

get-sitemap-links's Introduction

I am Sadra 👋

Available for a remote job. Contact via Gmail, Telegram, or LinkedIn.

My main skills:

TypeScript
NodeJS
JavaScript
MongoDB
React
Material UI
Apollo-GraphQL
Nginx
Ubuntu
Next JS
Express.js
GraphQL
Git
CSS3
HTML5

Intermediate skills

PHP
MySQL

Low skills

C#
Python
Laravel

get-sitemap-links's People

Contributors

sromexs

get-sitemap-links's Issues

Links wrapped in XML CDATA sections should be parsed when reading sitemaps

First of all, thanks for your work here :)

I am using this library for an internal tool and realized that it fails to extract URLs correctly when the loc data is wrapped in a CDATA section, as below:

<sitemap>
  <loc><![CDATA[https://example.com/post-sitemap.xml]]></loc>
  <lastmod><![CDATA[2020-11-16T18:13:33+00:00]]></lastmod>
</sitemap>

In this case, the expected return value is https://example.com/post-sitemap.xml, but instead we get <![CDATA[https://example.com/post-sitemap.xml]]>

We may need to add a regex somewhere to extract the data inside the CDATA section.
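
One way the suggested fix could look is sketched below. This is only an illustration, not the library's code: unwrapCdata is a hypothetical helper name, and it assumes the raw text between <loc> and </loc> is already available as a string and holds at most one CDATA wrapper.

```typescript
// Hypothetical helper (not part of get-sitemap-links): strips a single
// CDATA wrapper from the raw text of a <loc> element, if one is present.
function unwrapCdata(raw: string): string {
  // Capture everything between "<![CDATA[" and "]]>", allowing surrounding whitespace.
  const match = /^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/.exec(raw);
  // Fall back to the original text when there is no CDATA wrapper.
  return (match ? match[1] : raw).trim();
}

const wrapped = "<![CDATA[https://example.com/post-sitemap.xml]]>";
console.log(unwrapCdata(wrapped)); // https://example.com/post-sitemap.xml
```

Values without a CDATA wrapper pass through unchanged apart from whitespace trimming, so the helper could be applied to every extracted loc value.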

Something wrong with the regex pattern

The regex may sometimes mismatch. For example, the code below may return the following as a single URL:
https://www.orangemarketing.com</loc><lastmod>2023-01-30</lastmod></url><url><loc>https://www.orangemarketing.com/555

The result contains part of the sitemap XML. This looks like an issue with URLs that are missing a trailing slash.

const GetSitemapLinks = require("get-sitemap-links").default;

(async () => {
  const array = await GetSitemapLinks(
    "https://www.orangemarketing.com/sitemap.xml"
  );
  
  console.table(array);
})();
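
The pattern the library actually uses is not shown in this report, so the cause cannot be confirmed here. As a rough sketch of one way to avoid this kind of over-matching, the hypothetical extractLocs function below uses a capture group that cannot cross a tag boundary, so each match stops at the nearest </loc>. The sample XML is made up for illustration, and this sketch does not handle CDATA-wrapped values.

```typescript
// Hypothetical extractor (not the library's actual code).
function extractLocs(xml: string): string[] {
  // [^<]* cannot contain "<", so a match can never run past the closing </loc>.
  const pattern = /<loc>\s*([^<]*?)\s*<\/loc>/g;
  const links: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Made-up sample that mirrors the shape of the reported output.
const sample =
  "<url><loc>https://www.orangemarketing.com</loc><lastmod>2023-01-30</lastmod></url>" +
  "<url><loc>https://www.orangemarketing.com/555</loc></url>";

console.log(extractLocs(sample));
// [ 'https://www.orangemarketing.com', 'https://www.orangemarketing.com/555' ]
```

For anything beyond simple sitemaps, a real XML parser is likely a safer choice than a regex.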
