php-sitemap-generator's Introduction

Sitemap generator


An object-oriented PHP script that generates an XML sitemap from the given config options. I wrote it to automate sitemap creation for Google indexing, and because few open source sitemap generators were available.

Sitemap format: http://www.sitemaps.org/protocol.html

Features

Feel free to help implement any missing features or to add new ones.

  • Generate a sitemap for your website
  • Multiple options for generating sitemaps
  • Set max time limit
  • Saves last result in a temp file
  • Option to only look through certain filetypes
  • Load client-side JavaScript content when crawling
  • Parse all relative link types (// , # , ?) and more
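
As a rough illustration of what parsing the relative link types above involves, here is a minimal sketch of resolving an href against the page it was found on. The function name `resolveLink` and its behavior are assumptions made for this example, not the script's actual implementation; it ignores `../` segments and `<base>` tags.

```php
<?php
// Hypothetical helper, not taken from the script: resolves an href found on a
// page against the URL of that page. Returns null for links that are not pages.
function resolveLink(string $href, string $pageUrl): ?string
{
    $base = parse_url($pageUrl);
    if ($base === false || !isset($base['scheme'], $base['host'])) {
        return null; // the page URL itself must be absolute
    }
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');
    $path = $base['path'] ?? '/';

    if ($href === '') {
        return null;
    }
    // Already absolute: http://... or https://...
    if (preg_match('#^https?://#i', $href) === 1) {
        return $href;
    }
    // Other schemes (mailto:, tel:, javascript:) are not crawlable pages.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href) === 1) {
        return null;
    }
    // Protocol-relative: //example.com/page reuses the current scheme.
    if (strncmp($href, '//', 2) === 0) {
        return $base['scheme'] . ':' . $href;
    }
    // Root-relative: /info
    if ($href[0] === '/') {
        return $origin . $href;
    }
    // Query-only: ?page=2 keeps the current path.
    if ($href[0] === '?') {
        return $origin . $path . $href;
    }
    // Fragment-only: #section keeps the current path and query.
    if ($href[0] === '#') {
        $query = isset($base['query']) ? '?' . $base['query'] : '';
        return $origin . $path . $query . $href;
    }
    // Path-relative: resolved against the directory of the current path.
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Example call with a placeholder domain.
echo resolveLink('?page=2', 'https://www.example.com/blog/post'), "\n";
```

A crawler would typically run every extracted href through a step like this before deciding whether the result is internal, external, or should be skipped.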

Installation

To install the script, download both sitemap-config.php and sitemap-generator.php and place them in the same directory of your project.

Usage

After installing the script, include it in your own code:

include "/path/to/sitemap-generator.php";

Then initialize the class by calling the constructor:

// Create an object of the generator class passing the config file
$smg = new SitemapGenerator(include("sitemap-config.php"));
// Run the generator
$smg->GenerateSitemap();

Config

You can change how the generator behaves by editing the values in the config file.

// Site to crawl and create a sitemap for.
// <Syntax> https://www.your-domain-name.com/ or http://www.your-domain-name.com/
"SITE_URL" => "https://www.fun4m3.de/",


// Boolean for crawling external links.
// <Example> *Domain = https://www.fun4m3.de* , *Link = https://www.google.com* <When false google will not be crawled>
"ALLOW_EXTERNAL_LINKS" => false,

// Boolean for crawling element id links.
// <Example> <a href="#section"></a> will not be crawled when this option is set to false
"ALLOW_ELEMENT_LINKS" => false,

// If set, the crawler will only index anchor tags with the given id.
// To crawl all links, set the value to ""
// <Example> <a id="internal-link" href="/info"></a> When CRAWL_ANCHORS_WITH_ID is set to "internal-link" this link will be crawled
// but <a id="external-link" href="https://www.google.com"></a> will not be crawled.
"CRAWL_ANCHORS_WITH_ID" => "",

// Array of absolute links or keywords for pages to skip when crawling the given SITE_URL.
// <Example> https://www.fun4m3.de/search/label/Funny skips that page; fun4m3.de/search/label/ skips everything in that directory
// Be as specific as you can so you don't accidentally skip hundreds of pages
"KEYWORDS_TO_SKIP" => array(),

// Location + filename where the sitemap will be saved.
"SAVE_LOC" => dirname(__FILE__) . "/sitemap.xml",
    
// Location + filename where the temp-sitemap will be saved.
"SAVE_TEMP" => dirname(__FILE__) . "/temp-sitemap.xml",

// Static priority value for sitemap
"PRIORITY" => 1,

// Static update frequency
"CHANGE_FREQUENCY" => "daily",

// Date changed (today's date)
"LAST_UPDATED" => date('Y-m-d'),

// Max time limit in seconds. If the value is -1, there is no time limit.
"TIME_LIMIT" => 60*3,

// Timeout of curl
"CURLOPT_TIMEOUT" => 10,
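
Because the constructor is handed the result of `include("sitemap-config.php")`, the settings above presumably reach it as a plain PHP array. Under that assumption, the sketch below collects the documented keys in one array and applies per-run overrides on top. The helper `buildSitemapConfig` is hypothetical and not part of the script.

```php
<?php
// Hypothetical helper, not part of the script: returns the settings listed
// above, with any per-run overrides applied on top.
function buildSitemapConfig(array $overrides = array()): array
{
    $defaults = array(
        "SITE_URL" => "https://www.your-domain-name.com/",
        "ALLOW_EXTERNAL_LINKS" => false,
        "ALLOW_ELEMENT_LINKS" => false,
        "CRAWL_ANCHORS_WITH_ID" => "",
        "KEYWORDS_TO_SKIP" => array(),
        "SAVE_LOC" => dirname(__FILE__) . "/sitemap.xml",
        "SAVE_TEMP" => dirname(__FILE__) . "/temp-sitemap.xml",
        "PRIORITY" => 1,
        "CHANGE_FREQUENCY" => "daily",
        "LAST_UPDATED" => date('Y-m-d'),
        "TIME_LIMIT" => 60 * 3,
        "CURLOPT_TIMEOUT" => 10,
    );
    // For string keys, array_merge lets later values replace earlier ones.
    return array_merge($defaults, $overrides);
}

$config = buildSitemapConfig(array(
    "SITE_URL" => "https://www.fun4m3.de/",
    "KEYWORDS_TO_SKIP" => array("fun4m3.de/search/label/"),
    "TIME_LIMIT" => -1, // -1 removes the time limit
));

// Assumes sitemap-generator.php has been included as described under Usage
// and that the constructor accepts a plain array:
// $smg = new SitemapGenerator($config);
// $smg->GenerateSitemap();
```

Keeping the defaults in one place makes it easier to run the generator for several sites or with different time limits without editing the config file each time.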

Output

Example output when generating a sitemap with this script:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- 3 total links-->
  <!-- PHP-sitemap-generator by https://github.com/tristangoossens -->
  <url>
    <loc>https://student-laptop.nl/</loc>
    <lastmod>2021-03-10</lastmod>
    <changefreq>daily</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>https://student-laptop.nl/underConstruction</loc>
    <lastmod>2021-03-10</lastmod>
    <changefreq>daily</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>https://student-laptop.nl/article?article_id=1</loc>
    <lastmod>2021-03-10</lastmod>
    <changefreq>daily</changefreq>
    <priority>1</priority>
  </url>
</urlset>
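
To sanity-check a generated file, it can help to read it back and list the URLs it contains. The sketch below does this with PHP's built-in DOM extension; the helper name `listSitemapUrls` and the shortened sample document are assumptions for illustration, not something the script provides.

```php
<?php
// Hypothetical helper: returns every <loc> value found in a sitemap document,
// or an empty array when the XML cannot be parsed.
function listSitemapUrls(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $doc = new DOMDocument();
    $loaded = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return array();
    }

    $ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';
    $urls = array();
    foreach ($doc->getElementsByTagNameNS($ns, 'loc') as $loc) {
        $urls[] = trim($loc->textContent);
    }
    return $urls;
}

// Shortened copy of the example output above.
$sample = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://student-laptop.nl/</loc>
    <lastmod>2021-03-10</lastmod>
  </url>
  <url>
    <loc>https://student-laptop.nl/underConstruction</loc>
    <lastmod>2021-03-10</lastmod>
  </url>
</urlset>
XML;

echo implode("\n", listSitemapUrls($sample)), "\n";
```

For a real run, the string could come from `file_get_contents()` on the path configured as SAVE_LOC.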

Contributors

ichbinchrist, tristangoossens, oblab
