rpaul80 / big_sitemap

This project forked from alexrabarts/big_sitemap

A Sitemap library suitable for large sites. Compatible with most frameworks, including Rails and Merb.

Home Page: http://github.com/alexrabarts/big_sitemap

License: MIT License

BigSitemap

BigSitemap is a Sitemap generator suitable for applications with more than 50,000 URLs. It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries to minimize memory usage, and supports incremental updates. It can be set up with just a few lines of code and is compatible with just about any framework.

BigSitemap is best run periodically through a Rake/Thor task.

require 'big_sitemap'

sitemap = BigSitemap.new(
  :url_options   => {:host => 'example.com'},
  :document_root => "#{APP_ROOT}/public"
)

# Add a model
sitemap.add Product

# Add another model with some options
sitemap.add(Post,
  :conditions       => {:published => true},
  :path             => 'articles',
  :change_frequency => 'daily',
  :priority         => 0.5
)

# Add a static resource
sitemap.add_static('http://example.com/about', Time.now, 'monthly', 0.1)

# Generate the files
sitemap.generate

The code above will create a minimum of four files:

  1. public/sitemaps/sitemap_index.xml.gz

  2. public/sitemaps/sitemap_products.xml.gz

  3. public/sitemaps/sitemap_posts.xml.gz

  4. public/sitemaps/sitemap_static.xml.gz

If your sitemaps grow beyond 50,000 URLs (this limit can be overridden with the :max_per_sitemap option), the sitemap files will be partitioned into multiple files (sitemap_products_1.xml.gz, sitemap_products_2.xml.gz, …).
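
As a rough illustration of the partitioning arithmetic, the sketch below works out how many files a given record count needs and what they might be called. The helper partition_names is hypothetical and not part of BigSitemap; the file names simply follow the pattern quoted above.

```ruby
# Hypothetical helper, not part of BigSitemap: lists the file names a model's
# sitemap would be split into, following the naming pattern quoted above.
def partition_names(name, count, max_per_sitemap = 50_000)
  # A single file is enough while the count fits within the limit.
  return ["sitemap_#{name}.xml.gz"] if count <= max_per_sitemap

  # Otherwise round up so the final, partially filled file is included.
  files = (count + max_per_sitemap - 1) / max_per_sitemap
  (1..files).map { |i| "sitemap_#{name}_#{i}.xml.gz" }
end

puts partition_names('products', 120_000)
# sitemap_products_1.xml.gz
# sitemap_products_2.xml.gz
# sitemap_products_3.xml.gz
```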

Framework-specific Classes

Use the framework-specific classes to take advantage of built-in shortcuts.

Rails

BigSitemapRails includes UrlWriter (useful for making use of your Rails routes; see the Location URLs section) and deals with setting the :document_root and :url_options initialization options.

Merb

BigSitemapMerb deals with setting the :document_root initialization option.

Install

Via gem:

sudo gem install big_sitemap

Advanced

Initialization Options

  • :url_options – hash with :host, optionally :port and :protocol

  • :base_url – string alternative to :url_options, e.g. 'https://example.com:8080/'

  • :document_root – string

  • :path – string, defaults to 'sitemaps', which places sitemap files under the /sitemaps directory

  • :max_per_sitemap – 50000, which is the limit dictated by Google but can be less

  • :batch_size – 1001 (not 1000 due to a bug in DataMapper)

  • :gzip – true

  • :ping_google – true

  • :ping_yahoo – false, needs :yahoo_app_id

  • :ping_bing – false

  • :ping_ask – false

  • :partial_update – false

Chaining

You can chain methods together:

BigSitemap.new(:url_options => {:host => 'example.com'}).add(Post).generate

With the Rails-specific class, you could even get away with as little code as:

BigSitemapRails.new.add(Post).generate

Pinging Search Engines

To ping search engines, call ping_search_engines after you generate the sitemap:

sitemap.generate.ping_search_engines

Location URLs

By default, URLs for the “loc” values are generated in the form:

:base_url/:path|<table_name>/<to_param>|<id>
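
One way to read this pattern: the first segment is :path if given, otherwise the table name, and the second is to_param if the record provides it, otherwise the id. The sketch below spells out that reading. The default_location helper and the Post struct are hypothetical, and BigSitemap's own implementation may differ in detail.

```ruby
# Hypothetical sketch of the default "loc" pattern; BigSitemap's own
# implementation may differ in detail.
Post = Struct.new(:id, :slug) do
  # Mimics the to_param convention: prefer a slug when one is present.
  def to_param
    slug || id.to_s
  end
end

def default_location(base_url, record, table_name:, path: nil)
  segment = path || table_name # :path wins over the table name
  key = record.respond_to?(:to_param) ? record.to_param : record.id
  "#{base_url.chomp('/')}/#{segment}/#{key}"
end

puts default_location('http://example.com', Post.new(7, 'hello-world'),
                      table_name: 'posts', path: 'articles')
# http://example.com/articles/hello-world
```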

Alternatively, you can pass a lambda. For example, to make use of your Rails route helper:

sitemap.add(Post,
  :location => lambda { |post| post_url(post) }
)

Change Frequency, Priority and Last Modified

You can control “changefreq”, “priority” and “lastmod” values for each record individually by passing lambdas instead of fixed values:

sitemap.add(Post,
  :change_frequency => lambda { |post| ... },
  :priority         => lambda { |post| ... },
  :last_modified    => lambda { |post| ... }
)
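
The lambda bodies are elided above. As a minimal sketch of what they might contain, the example below uses a stand-in Article struct with assumed updated_at and featured attributes; the thresholds and priority values are arbitrary.

```ruby
# Stand-in record for illustration; a real model would come from your ORM.
Article = Struct.new(:updated_at, :featured)

DAY = 24 * 60 * 60 # seconds

# Recently edited articles are assumed to change more often.
change_frequency = lambda { |article| (Time.now - article.updated_at) < 7 * DAY ? 'daily' : 'monthly' }
# Featured articles get a higher (arbitrary) priority.
priority = lambda { |article| article.featured ? 0.8 : 0.4 }
# Use the record's own timestamp as "lastmod".
last_modified = lambda { |article| article.updated_at }

fresh = Article.new(Time.now - DAY, true)
puts change_frequency.call(fresh) # daily
puts priority.call(fresh)         # 0.8
```

Lambdas like these would then be passed as the :change_frequency, :priority and :last_modified options.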

Find Methods

Your models must provide either a find_for_sitemap or all class method that returns the instances that are to be included in the sitemap.

Additionally, your models must provide a count_for_sitemap or count class method that returns a count of the instances to be included.

If you’re using ActiveRecord (Rails) or DataMapper, then all and count are already provided and you can make use of any supported parameter (:conditions, :limit, :joins, :select, :order, :include, :group):

sitemap.add(Track,
  :select     => "id, permalink, user_id, updated_at",
  :include    => :user,
  :conditions => "public = 1 AND state = 'finished' AND user_id IS NOT NULL",
  :order      => "id ASC"
)

If you provide your own find_for_sitemap or all method then it should be able to handle the :offset and :limit options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
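
For a model that is not backed by ActiveRecord or DataMapper, the two class methods might look like the sketch below. The Page class and its in-memory RECORDS array are hypothetical. Whether BigSitemap passes options to count_for_sitemap is not stated here, so that argument is optional in this sketch.

```ruby
# Hypothetical non-ORM model backed by an in-memory array.
class Page
  attr_reader :id

  RECORDS = []

  def initialize(id)
    @id = id
  end

  # Returns one batch; :offset and :limit are honored the way an ORM would.
  def self.find_for_sitemap(options = {})
    offset = options.fetch(:offset, 0)
    limit  = options.fetch(:limit, RECORDS.size)
    RECORDS[offset, limit] || []
  end

  # Total number of records to be included in the sitemap.
  def self.count_for_sitemap(_options = {})
    RECORDS.size
  end
end

(1..5).each { |i| Page::RECORDS << Page.new(i) }
p Page.find_for_sitemap(:offset => 2, :limit => 2).map(&:id) # [3, 4]
```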

Partial Update

If you enable :partial_update, each filename will include an ID smaller than the ID of its first entry. This makes it possible to update just the last file with new entries, without regenerating the files that already exist.

Lock Generation Process

To prevent another process from overwriting the generated files, use the with_lock method:

sitemap.with_lock do
  sitemap.generate
end
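
To illustrate the general idea of guarding a generation run, here is a generic lock-file helper built on Ruby's File#flock. It is not how with_lock is implemented; the helper name with_file_lock and the lock-file path are assumptions made for this sketch.

```ruby
require 'tmpdir'

# Illustrative only: a generic lock-file helper built on File#flock. It shows
# the idea behind guarding generation, not how with_lock is implemented.
def with_file_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |file|
    # LOCK_NB makes flock return false instead of blocking when the lock is taken.
    unless file.flock(File::LOCK_EX | File::LOCK_NB)
      raise 'sitemap generation already in progress'
    end

    begin
      yield
    ensure
      file.flock(File::LOCK_UN)
    end
  end
end

lock_path = File.join(Dir.tmpdir, 'sitemap_generation.lock')
with_file_lock(lock_path) { puts 'generating sitemaps...' }
```

A second process calling with_file_lock on the same path while the first still holds the lock would get the error instead of starting a competing run.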

Cleaning the Sitemaps Directory

Calling the clean method will remove all files from the sitemaps directory.

Limitations

If your database is likely to shrink during the time it takes to create the sitemap, you might run into problems: the final batched SQL select will overrun, because its limit is calculated from the count, which is queried at the very beginning. In that case, if your database uses incremental primary IDs, you might want to use the :partial_update option, which looks at the last ID instead of paginating.
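
The difference can be seen in a small simulation of ID-based batching. This is a generic illustration, not BigSitemap's code: the rows array stands in for a table with ascending IDs, and one row is deleted after each batch to mimic a shrinking database.

```ruby
# Simulated table: an array of ids in ascending order (an assumption for the demo).
rows = (1..10).to_a
batch_size = 4

# ID-based batching: each batch starts after the last id seen, so rows deleted
# mid-run cannot shift later batches the way a precomputed offset/limit can.
last_id = 0
seen = []
loop do
  batch = rows.select { |id| id > last_id }.first(batch_size)
  break if batch.empty?

  seen.concat(batch)
  last_id = batch.last
  rows.delete(2) # simulate a row disappearing while the sitemap is being built
end

p seen # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```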

TODO

Tests for framework-specific components.

Credits

Thanks to Alastair Brunton and Harry Love, whose work provided a starting point for this library.

Thanks also to those who have contributed patches:

  • Mislav Marohnić

  • Jeff Schoolcraft

  • Dalibor Nasevic

  • Tobias Bielohlawek (www.rngtng.com)

Copyright © 2010 Stateless Systems (statelesssystems.com). See LICENSE for details.
