ekalinin / sitemap.js
Sitemap-generating framework for node.js
License: MIT License
According to http://www.sitemaps.org/protocol.html#xmlTagDefinitions:
These tags are optional:
Actually, it seems to be impossible to get rid of them in the generated xml ...
Even if I can understand the '0.5' default value for priority, I see no point in adding it: it increases the page weight for nothing.
I could not find in the specification that the default value for changefreq is 'weekly', but the same goes for this tag.
An option to add a style sheet would be greatly appreciated!
In lib/sitemap.js
you have this:
Sitemap.prototype.toXML = function (callback) {
  if (typeof callback === 'undefined') {
    return this.toString();
  }
  var self = this;
  process.nextTick( function () {
    if (callback.length === 1) {
      callback( self.toString() );
    } else {
      callback( null, self.toString() );
    }
  });
}
However, Sitemap.prototype.toString() may throw an error, since it calls new SitemapItem(), which may throw as well (when there is no URL, no protocol, etc). The async-style API never passes the error to the callback, so the error becomes an uncaught exception, which kills the Node process. How about changing .toXML() to:
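Sitemap.prototype.toXML = function (callback) {
  if (typeof callback === 'undefined') {
    return this.toString();
  }
  var self = this;
  process.nextTick( function () {
    var xml;
    try {
      xml = self.toString();
    } catch (err) {
      // Pass generation errors to the callback instead of throwing.
      return callback(err);
    }
    // Called outside the try block, so an error thrown by the callback
    // itself is not caught and handed back to it a second time.
    callback(null, xml);
  });
}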
I know this will break backward compatibility but I think this is the proper way to do it. If agreed, I can work on this.
With 1,300,000 URLs I get a segmentation fault.
It fails in the utils.chunckArray() function.
I rewrote this function using the underscore module and it works, but that adds a new dependency.
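For comparison, chunking can also be done without an extra dependency. The following is a minimal sketch, not the module's actual utils code: the helper name chunkArray and its signature are assumptions made for illustration. It slices in a plain loop, so the call stack does not grow with the number of URLs.

```javascript
// Hypothetical helper (not part of sitemap.js): split `items` into
// consecutive chunks of at most `size` elements each.
function chunkArray(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  var chunks = [];
  for (var i = 0; i < items.length; i += size) {
    // slice() copies one window of the array; no recursion is involved.
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Build a large list of made-up paths, similar in size to the report above.
var urls = [];
for (var n = 0; n < 1300000; n++) {
  urls.push('/page/' + n);
}

var chunks = chunkArray(urls, 50000);
console.log(chunks.length); // 26
```

The chunk size of 50,000 matches the per-file URL limit mentioned elsewhere in these issues; any positive integer works.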
Is there any chance to specify http and https together?
Thanks, G.
This would be useful to add new data to existing sitemaps
First of all, nice module! I stumbled upon one little problem though,
Enabling cache on this module starts an interval timer. This makes my program unable to exit. This is not a problem in production since it shouldn't exit there but it blocks my unit tests from finishing gracefully.
https://github.com/ekalinin/sitemap.js/blob/master/lib/sitemap.js#L150-L152
The current mechanism is also wrong in the way that if a sitemap is generated right before the interval fires, the cache is cleared immediately and the sitemap is not cached for the desired time.
I think it would be better to introduce a caching mechanism where, on generating a sitemap, the current time is stored with the cache, and on retrieval this time is checked against the desired caching time. This eliminates the need for an interval timer.
I resolved the issue for me by disabling cache and rolling my own. (I will stick with this even if you fix this issue since it's a better separation of concerns anyway)
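A minimal sketch of the timestamp-based approach described above, assuming a small standalone wrapper rather than the module's internals. The name TimedCache and the injectable clock are hypothetical and exist only for illustration. No timer is started, so nothing keeps the process alive, and every entry lives for the full cache time.

```javascript
// Hypothetical cache wrapper (not part of sitemap.js): stores a value
// together with the time it was stored and checks the age on retrieval.
function TimedCache(cacheTimeMs, now) {
  this.cacheTimeMs = cacheTimeMs;
  this.now = now || Date.now; // injectable clock, handy in unit tests
  this.value = null;
  this.createdAt = 0;
}

TimedCache.prototype.set = function (value) {
  this.value = value;
  this.createdAt = this.now();
};

// Returns the cached value, or null when nothing is cached or the entry
// is older than cacheTimeMs. Reading does not touch createdAt, so the
// expiration is fixed rather than sliding.
TimedCache.prototype.get = function () {
  if (this.value === null) {
    return null;
  }
  if (this.now() - this.createdAt >= this.cacheTimeMs) {
    this.value = null;
    return null;
  }
  return this.value;
};

// Usage with a fake clock so the example does not need to wait.
var fakeTime = 0;
var cache = new TimedCache(600000, function () { return fakeTime; });
cache.set('<urlset/>');
fakeTime = 599999;
console.log(cache.get()); // '<urlset/>'
fakeTime = 600000;
console.log(cache.get()); // null
```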
What do you think about using a dependency (e.g. https://github.com/davidcalhoun/jstoxml) to create a valid xml output instead of doing string concatenations? In my opinion this would definitely make the code easier to understand and maintain.
When I install it, I see no "tests" folder. Did you forget to add it to the npm package, or am I missing some extra command?
Thanks!
When using URLs like this:
url: '/#/home'
the slash gets stripped:
https://hostname.com#/home
If sitemap.js is pulled into a browserify build, the javascript has frontend errors, because the use of fs.readFileSync to get the version number is done in a way that brfs can't understand:
/**
 * Framework version.
 */
var fs = require('fs')
  , path = require('path')
  , pack_file = path.join(__dirname, 'package.json');
if ( !module.exports.version ) {
  module.exports.version = JSON.parse(
    fs.readFileSync(pack_file, 'utf8')).version;
}
If this was changed to be something like:
var fs = require('fs');
if ( !module.exports.version ) {
  module.exports.version = JSON.parse(
    fs.readFileSync(__dirname + "/package.json", 'utf8')).version;
}
It should work with brfs
If I add img: ['img1','img2'], it simply converts this back to img: img1,img2.
This limits the number of images per page to 1. Multiple images should be allowed.
Please support sitemaps for videos. Is there any way I can create a sitemap for video pages?
Thanks
Nice module!
I was wondering if there was a way to pass in an express object, or express router object to sitemap.js and have it spit out the sitemap.xml file.
So:
hostname: 'http://website.com'
with url: 'page.html'
=> http://website.compage.html
hostname: 'http://website.com/'
with url: '/page.html'
=> http://website.com//page.html
I'd expect this module to remove and add slashes where necessary.
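The expected behaviour can be sketched as a small join helper. This is an illustration under assumptions, not how the module or its url-join dependency works; joinUrl is a hypothetical name. It only handles a hostname plus a relative path, not a path that is already an absolute URL.

```javascript
// Hypothetical helper (not part of sitemap.js): join a hostname and a
// path with exactly one slash between them.
function joinUrl(hostname, path) {
  var base = hostname.replace(/\/+$/, ''); // drop trailing slashes
  var rest = path.replace(/^\/+/, '');     // drop leading slashes
  return base + '/' + rest;
}

console.log(joinUrl('http://website.com', 'page.html'));   // http://website.com/page.html
console.log(joinUrl('http://website.com/', '/page.html')); // http://website.com/page.html
console.log(joinUrl('https://hostname.com', '/#/home'));   // https://hostname.com/#/home
```

The last call shows that a hash route keeps the slash in front of the '#', which is the case reported for '/#/home' above.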
Sure, adding pages while the app is running makes total sense.
But how do I replace URLs, since they are passed as an array at creation time?
module.js:338
throw err;
^
Error: Cannot find module 'underscore'
at Function.Module._resolveFilename (module.js:336:15)
at Function.Module._load (module.js:278:25)
at Module.require (module.js:365:17)
at require (module.js:384:17)
at Object.<anonymous> (/Users/joaoribeiro/Documents/Projects/cloudtasks.io/node_modules/sitemap/lib/utils.js:7:9)
at Module._compile (module.js:460:26)
at Object.Module._extensions..js (module.js:478:10)
at Module.load (module.js:355:32)
at Function.Module._load (module.js:310:12)
at Module.require (module.js:365:17)
at require (module.js:384:17)
I want to generate sitemap index that gets updated every month with new sitemaps. I tried the following:
var opts = {
  cacheTime: 600000,
  hostname: 'https://xxx.com/sitemaps',
  sitemapName: 'sitemap',
  sitemapSize: 1,
  targetFolder: path.join(__dirname, '../public/sitemaps')
};
var arr = [];
for (var x in res) {
  console.log(res[x].url);
  arr.push(res[x].url);
}
opts['urls'] = arr;
var sitemapIndex = sm.createSitemapIndex(opts);
But this generates the following files:
sitemap-index.xml
sitemap-0.xml
sitemap-1.xml
sitemap-2.xml
where the sitemap-index file contains:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<sitemap>
<loc>https://xxx/sitemaps/sitemap-0.xml</loc>
</sitemap>
<sitemap>
<loc>https://xxx/sitemaps/sitemap-1.xml</loc>
</sitemap>
<sitemap>
<loc>https://xxx/sitemaps/sitemap-2.xml</loc>
</sitemap>
</sitemapindex>
And each contains:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url> <loc>https://actual_site_url</loc> </url>
</urlset>
I actually want the sitemap-index to be:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<sitemap>
<loc>https://path_to_xml1_specified_by_me.xml</loc>
</sitemap>
<sitemap>
<loc>https://path_to_xml2_specified_by_me.xml</loc>
</sitemap>
<sitemap>
<loc>https://path_to_xml3_specified_by_me.xml</loc>
</sitemap>
</sitemapindex>
How can I pass an array of paths to sitemap XML files when calling .createSitemapIndex, instead of having it create sitemap-1, sitemap-2 on its own?
I am using S3 to store xml files, so I want to give the path to xmls on my Amazon S3 server inside sitemapindex.
Is it possible to generate sitemap.xml from local files in a directory? How?
Or do I need a local server?
var sm = require('sitemap')
var staticUrls = ['/', '/terms', '/login']
var sitemap = sm.createSitemap({
  hostname: process.env.HOST_URL || 'http://babeleo.com',
  urls: staticUrls
})
sitemap.add({
  url: '/details/' + 'url1'
})
console.log('sitemap1 urls', sitemap.urls)
var sitemap2 = sm.createSitemap({
  hostname: process.env.HOST_URL || 'http://babeleo.com',
  urls: staticUrls
})
console.log('sitemap2 urls', sitemap2.urls)
And the output is:
sitemap1 urls [ '/', '/terms', '/login', { url: '/details/url1' } ]
sitemap2 urls [ '/', '/terms', '/login', { url: '/details/url1' } ]
Expected output:
sitemap1 urls [ '/', '/terms', '/login', { url: '/details/url1' } ]
sitemap2 urls [ '/', '/terms', '/login' ]
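The output above is consistent with both sitemaps holding a reference to the same staticUrls array. A minimal sketch of the usual fix, a defensive copy on construction, is shown below. UrlList is a hypothetical stand-in used only to illustrate the copy; it is not the module's Sitemap class.

```javascript
// Hypothetical stand-in for the sitemap object (not part of sitemap.js).
function UrlList(urls) {
  // slice() makes a shallow copy, so add() no longer mutates the
  // array that the caller passed in.
  this.urls = Array.isArray(urls) ? urls.slice() : [];
}

UrlList.prototype.add = function (url) {
  this.urls.push(url);
};

var staticUrls = ['/', '/terms', '/login'];

var first = new UrlList(staticUrls);
first.add({ url: '/details/url1' });

var second = new UrlList(staticUrls);

console.log(first.urls.length);  // 4
console.log(second.urls.length); // 3
console.log(staticUrls.length);  // 3
```

Assuming the shared reference is indeed the cause, passing staticUrls.slice() to each createSitemap call should work around the problem from the caller's side.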
According to readme...
make env
make: *** No rule to make target 'env'. Stop.
Please don't close the issue this time. Thanks!
Hi,
I need to create news sitemap for our company blog.
It needs to support the following:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<url>
<loc>https://www.website.com/news/article/[ARTICLE_SLUG]</loc>
<news:news>
<news:publication>
<news:name>[Company Name]</news:name>
<news:language>en</news:language>
</news:publication>
<news:genres>Blog</news:genres>
<news:publication_date>[PUBLISH_DATE]</news:publication_date>
<news:title>[ARTICLE_TITLE]</news:title>
<news:keywords>[META_KEYWORDS]</news:keywords>
</news:news>
</url>
</urlset>
Does this package support this?
I get this error:
ERROR: (gcloud.preview.app.deploy) Error Response: [400] Invalid character in filename: node_modules/sitemap/env/lib/python2.7/site-packages/setuptools/script (dev).tmpl
{ url: '', lastmod: moment().toISOString(), changefreq: 'daily', priority: 0.3 },
{ url: '/culture', lastmod: moment().toISOString(), changefreq: 'monthly', priority: 0.9 },
{ url: '/culture/putting-dent-into-the-universe', lastmod: moment().toISOString(), changefreq: 'monthly', priority: 0.89 },
{ url: '/culture/workmanship-is-the-key', lastmod: moment().toISOString(), changefreq: 'monthly', priority: 0.88 }
We always get NaN-NaN-NaN for the lastmod field in the generated sitemap.xml file.
Any ideas?
Hi,
I'm interested in adding the expires tag to my sitemap for use with Google Custom Search as described here: https://developers.google.com/custom-search/docs/indexing#on-demand-removal
It doesn't look like there's currently a way to do this with sitemap.js. Is this something you'd feel okay with being in the project? I realize it's not currently a standard.
Hi, whenever I add an image with a caption, I always end up with a wrong URL such as:
<url>
<loc> https://URL </loc>
<image:image>
<image:loc>https://URL/[object Object]</image:loc>
</image:image>
</url>
The image:loc is wrong and the caption doesn't seem to appear.
The problem seems to be at line 383 in your sitemap.js file.
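The "[object Object]" in the output suggests that an image object is being concatenated as if it were a string. A minimal sketch of how image entries could be normalized is shown below. It assumes that an image is given either as a URL string or as an object with url and caption properties; imageTags and escapeXml are hypothetical helpers, not functions of sitemap.js.

```javascript
// Hypothetical helper: escape the five XML special characters.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: accept one image or an array of images, where each
// image is either a URL string or an object like { url, caption }.
function imageTags(img) {
  var list = Array.isArray(img) ? img : [img];
  return list.map(function (entry) {
    var image = typeof entry === 'string' ? { url: entry } : entry;
    var xml = '<image:image><image:loc>' + escapeXml(image.url) + '</image:loc>';
    if (image.caption) {
      xml += '<image:caption>' + escapeXml(image.caption) + '</image:caption>';
    }
    return xml + '</image:image>';
  }).join('');
}

console.log(imageTags({ url: 'https://URL/photo.jpg', caption: 'Cats & dogs' }));
console.log(imageTags(['https://URL/a.jpg', 'https://URL/b.jpg']));
```

Accepting an array also covers the case of several images per page. The URLs are emitted as given; resolving relative image paths against the hostname is left out of this sketch.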
Sitemap protocol supports image tags. Do you have any plans to add it to your module?
I want to run a cron job every night that builds all my URLs and writes them out to ./public/sitemap.xml
Is this possible?
The version published to npm contains an env folder, which is huge and doesn't look like it is used anywhere in the code. It contains the sources of nodejs /node-v5.1.0-linux-x64.
It would be great if individual pages could be removed or added.
section: https://github.com/ekalinin/sitemap.js#example-of-sitemap-index-as-string
current:
var sm = require('sitemap')
  , smi = new sm.buildSitemapIndex({
      urls: ['https://example.com/sitemap1.xml', 'https://example.com/sitemap2.xml'],
      xslUrl: 'https://example.com/style.xsl' // optional
    });
should be:
var sm = require('sitemap')
  , smi = sm.buildSitemapIndex({
      urls: ['https://example.com/sitemap1.xml', 'https://example.com/sitemap2.xml'],
      xslUrl: 'https://example.com/style.xsl' // optional
    });
because sm.buildSitemapIndex
is a static method.
self is not defined in buildSitemapIndex (line 469 in sitemap.js)
The code says:
// The date of last modification (YYYY-MM-DD)
Is it possible to specify the hour/minute/second somehow, or does this need a PR?
Surely createSitemapIndex()
should work like createSitemap()
? Instead I find that I'm having to write a file and then read it! Thus I'm not using this module to generate my sitemap index any more.
If I get a chance, would you mind if I did a PR to implement functionality so that I can use createSitemapIndex(conf).toString()
?
The following code from the example will not work:
sitemap.toXML( function(xml){ console.log(xml) });
The example has to be changed to handle the error argument:
sitemap.toXML( function(err, xml){ if (!err){console.log(xml)} });
As indicated in this link: https://developers.google.com/search/docs/guides/create-URLs
We have to add entries for both canonical and AMP URLs.
It's a little similar to androidLink support.
Just make those changes to sitemap.js:
...
+ this.ampLink = conf['ampLink'] || null;
...
- var xml = '<url> {loc} {img} {lastmod} {changefreq} {priority} {links} {androidLink} {mobile} {news}</url>'
- , props = ['loc', 'img', 'lastmod', 'changefreq', 'priority', 'links', 'androidLink', 'mobile', 'news']
+ var xml = '<url> {loc} {img} {lastmod} {changefreq} {priority} {links} {androidLink} {ampLink} {mobile} {news}</url>'
+ , props = ['loc', 'img', 'lastmod', 'changefreq', 'priority', 'links', 'androidLink', 'ampLink', 'mobile', 'news']
...
+ } else if (this[p] && p == 'ampLink') {
+ xml = xml.replace('{' + p + '}', '<xhtml:link rel="amphtml" href="' + this[p] + '" />');
...
Thank you.
The tests currently fail on my machine, probably due to the timezone offset (I'm currently in New Zealand, UTC+12). The test fails because
(calculated by sitemap.js with input lastmod
2011-06-27 and timezone offset) 2011-06-28 ≠ 2011-06-27 (set in comparison string)
Seems like we should have a look at the date calculation.
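One way to make the date calculation independent of the machine's timezone is to format with the UTC getters only. The following is a minimal sketch of that idea, not the module's actual date code; toUtcDateString and pad2 are hypothetical helpers. It relies on date-only ISO strings such as '2011-06-27' being parsed as UTC by the Date constructor.

```javascript
// Hypothetical helper: left-pad a number to two digits.
function pad2(n) {
  return n < 10 ? '0' + n : String(n);
}

// Hypothetical helper: format a date as YYYY-MM-DD using UTC fields only,
// so the result does not depend on the local timezone offset.
function toUtcDateString(input) {
  var d = new Date(input);
  if (isNaN(d.getTime())) {
    throw new TypeError('invalid date: ' + input);
  }
  return d.getUTCFullYear() + '-' +
    pad2(d.getUTCMonth() + 1) + '-' + // getUTCMonth() is zero-based
    pad2(d.getUTCDate());
}

console.log(toUtcDateString('2011-06-27')); // 2011-06-27
```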
Given a cacheDuration of say 5000, if requests come in every 4 seconds then the cache will never expire and therefore never allow new dynamic items to be added when using isCacheValid.
This line effectively resets the cacheSetTimestamp anytime it's called. Therefore creating a sliding expiration.
Maybe I'm using the cache in an unintended way but it seems to me a typical use case to build a sitemap, and rebuild it after a set amount of time to allow new dynamic items to appear.
Since the cache behavior is not documented I'm not sure if this could be considered a bug or not.
I tried to generate a sitemap and to validate it with:
https://validator.w3.org/
And it does not validate.
This is not necessarily a problem, but it would probably be better to fix it.
(Thank you for this library).
Does this module support multiple sitemaps for particularly large sites?
E.g. chunking based on the sitemap max size?
I have an AngularJs powered website which uses hashbang (#!) in the URLs.
I tried setting my hostname and map url to every possible way but couldn't make the hashbang url work.
For example my url is: "www.example.com/#!/home". No matter what I configure I always get "www.example.com#!/home". Note the missing slash after dot com.
It might be a bug in the "url-join" component. Please investigate.
Please add support for <image:caption>
as it is seen here: https://support.google.com/webmasters/answer/183668
Thank you!!
Is it possible to add a lastmod
property for entries in a sitemap index?
I currently have code like this to create the sitemap index XML string:
// create the xml to be used for the sitemap-index by giving it an array of urls
const sitemapIndexXML = pd.xml(sitemap.buildSitemapIndex({
urls: fileNameArray.map((fileName) => `${baseUrl}/${fileName}`)
}));
Is there any way to get the generated sitemap index to have a <lastmod>
element, similar to how the module allows you to pass a lastmodISO
property when creating a sitemap?
Ideally it would produce XML like this:
<sitemap>
<loc>http://example.com/sitemap-us.xml</loc>
<lastmod>2016-11-22T18:06:38.207Z</lastmod>
</sitemap>
<sitemap>
<loc>http://example.com/sitemap-ca.xml</loc>
<lastmod>2016-11-22T18:06:38.207Z</lastmod>
</sitemap>
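Until the module supports this, the desired index could be assembled by hand. The following is a minimal sketch under assumptions: buildIndexXml and escapeXml are hypothetical helpers, and the entry shape { url, lastmod } is invented for illustration; none of this is the sitemap.js API.

```javascript
// Hypothetical helper: escape the five XML special characters.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: build a sitemap index from entries of the form
// { url: string, lastmod?: string }. lastmod is emitted only when given.
function buildIndexXml(entries) {
  var lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ];
  entries.forEach(function (entry) {
    lines.push('<sitemap>');
    lines.push('<loc>' + escapeXml(entry.url) + '</loc>');
    if (entry.lastmod) {
      lines.push('<lastmod>' + escapeXml(entry.lastmod) + '</lastmod>');
    }
    lines.push('</sitemap>');
  });
  lines.push('</sitemapindex>');
  return lines.join('\n');
}

var indexXml = buildIndexXml([
  { url: 'http://example.com/sitemap-us.xml', lastmod: '2016-11-22T18:06:38.207Z' },
  { url: 'http://example.com/sitemap-ca.xml', lastmod: '2016-11-22T18:06:38.207Z' }
]);
console.log(indexXml);
```

Because the URLs are taken as given, the same helper also works when the sitemap files live on another host.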
Sitemap files shouldn't exceed 10MB or 50,000 URLs per file. This means you need a sitemap index file. I couldn't see this in the API; can it be done already?
For more information, see: http://www.sitemaps.org/faq.html#faq_sitemap_size
Hey,
I don't see mobile support in this module.
Described here: http://www.google.com/schemas/sitemap-mobile/1.0/
I can create a pull request with this feature, but when could I expect it to be merged to master?
How can I do this:
https://support.google.com/webmasters/answer/2620865?hl=en
Priority 0 gets removed from the sitemap. Should it not show <priority>0.0</priority> instead?
Also, setting priority to 1 shows as 1 in the sitemap XML. Would it be wise to always print at least one decimal place in the XML, so that 1 always becomes <priority>1.0</priority>?
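Both points can be sketched in a few lines. This is an illustration under assumptions, not the module's code; priorityTag is a hypothetical helper. A plain truthiness check would drop 0, so the sketch compares against undefined and null explicitly, and it adds a decimal place only to whole numbers so that a value like 0.89 is kept as written.

```javascript
// Hypothetical helper: render a <priority> element, or an empty string
// when no priority was configured.
function priorityTag(priority) {
  // 0 is a valid priority, so do not test with `if (!priority)`.
  if (priority === undefined || priority === null) {
    return '';
  }
  var value = Number(priority);
  if (isNaN(value) || value < 0 || value > 1) {
    throw new RangeError('priority must be between 0.0 and 1.0');
  }
  // Whole numbers get one decimal place; other values are kept as-is.
  var text = Number.isInteger(value) ? value.toFixed(1) : String(value);
  return '<priority>' + text + '</priority>';
}

console.log(priorityTag(0));    // <priority>0.0</priority>
console.log(priorityTag(1));    // <priority>1.0</priority>
console.log(priorityTag(0.89)); // <priority>0.89</priority>
```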
<urlset> Required Encloses all information about the set of URLs included in the Sitemap.
Hi @ekalinin
do you provide an additional parameter for deep linking?
example:
<xhtml:link rel="alternate" href="android-app://com.example.android/example/gizmos" />