Middleman::Search

LunrJS-based search for Middleman.

Installation

Add this line to your application's Gemfile:

gem 'middleman-search'

And then execute:

$ bundle

Or install it yourself as:

$ gem install middleman-search

Usage

You need to activate the module in your config.rb, telling the extension how to index your resources:

activate :search do |search|

  search.resources = ['blog/', 'index.html', 'contactus/index.html']

  search.index_path = 'search/lunr-index.json' # defaults to `search.json`
  
  search.lunr_dirs = ['source/vendor/lunr-custom/'] # optional alternate paths where to look for lunr js files

  search.language = 'es' # defaults to 'en'

  search.fields = {
    title:   {boost: 100, store: true, required: true},
    content: {boost: 50},
    url:     {index: false, store: true},
    author:  {boost: 30}
  }
end

Here, resources is a list of URL prefixes: a resource is indexed when its URL starts with one of them (tested with String#start_with?). index_path is the relative path of the generated index file in your site. fields is a hash with one entry per field to be indexed, each mapped to a hash of options:

  • boost: Specifies the lunr relevance boost when searching this field
  • store: Whether to store this field in the document map (see below), defaults to false
  • index: Whether to index this field, defaults to true
  • required: The resource will not be indexed if a field marked as required has an empty or null value
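
The selection rules above can be illustrated with a minimal Ruby sketch. It is not the gem's implementation: indexable? is a hypothetical helper, and the sample paths and field values are made up. It only mirrors the two documented rules, prefix matching via String#start_with? and skipping resources whose required fields are empty or nil.

```ruby
# Hypothetical helper, not part of middleman-search: it mirrors the documented rules.
RESOURCES = ['blog/', 'index.html', 'contactus/index.html'].freeze
FIELDS = {
  title:   { boost: 100, store: true, required: true },
  content: { boost: 50 },
  url:     { index: false, store: true },
  author:  { boost: 30 }
}.freeze

def indexable?(path, values)
  # Rule 1: the path must start with one of the configured prefixes.
  return false unless RESOURCES.any? { |prefix| path.start_with?(prefix) }

  # Rule 2: every field marked `required` must have a non-empty value.
  FIELDS.none? do |name, opts|
    opts[:required] && values[name].to_s.strip.empty?
  end
end

indexable?('blog/2015/hello.html', { title: 'Hello' })  # => true
indexable?('blog/2015/untitled.html', { title: '' })    # => false (required title is empty)
indexable?('about/index.html', { title: 'About' })      # => false (no matching prefix)
```

Because the match is a plain prefix test, 'blog/' covers every resource under that directory, while 'index.html' matches only the top-level page.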

Note that a special field id is included automatically, with an autogenerated identifier to be used as the ref for the document.

All field values are retrieved from the resource data (i.e. its frontmatter), or from the options in resource.metadata (i.e. any options specified in a proxy page), except for:

  • url which is the actual resource url
  • content the text extracted from the rendered resource, without including its layout
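
As a rough illustration of that lookup, here is a small Ruby sketch. FakeResource and field_value are hypothetical names, and the precedence shown (frontmatter first, then proxy options) is an assumption: the text above does not say which source wins when both define a field.

```ruby
# Hypothetical stand-in for a Middleman resource, for illustration only.
FakeResource = Struct.new(:url, :data, :metadata)

# Sketch of the documented lookup. The order (frontmatter, then proxy
# options) is an assumption, not something stated by the gem's docs.
def field_value(resource, field, rendered_text)
  case field
  when :url     then resource.url     # the actual resource URL
  when :content then rendered_text    # text rendered without the layout
  else
    resource.data[field] || resource.metadata.fetch(:options, {})[field]
  end
end

page = FakeResource.new(
  '/blog/hello.html',
  { title: 'Hello' },               # frontmatter
  { options: { author: 'jane' } }   # options passed to a proxy page
)

field_value(page, :title, 'Body text')    # => "Hello"
field_value(page, :author, 'Body text')   # => "jane"
field_value(page, :url, 'Body text')      # => "/blog/hello.html"
field_value(page, :content, 'Body text')  # => "Body text"
```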

You can then query the index from JavaScript via the lunrIndex object (see Index file for more info):

var max_search_entries = 50;

// `request.term` is the user's query string. Keep the top matches and
// look up the stored attributes of each one by its document ref.
var result = lunrIndex.search(request.term)
  .slice(0, max_search_entries)
  .map(function (item) {
    return lunrData.docs[item.ref];
  });

(Thanks @Jeepler for adapting the lodash v3 code we used to use at Manas)

i18n

This gem includes assets for alternate languages as provided by MihaiValentin/lunr-languages. Please refer to that repository for a list of the languages available.

If you want to work with a language that is not included, set up a lunr.yourlang.js file in a folder in your project, and add that folder to lunr_dirs so the gem knows where to look for it.

Manual index manipulation

You can fully customise the content to be indexed and stored per resource by defining a before_index callback:

activate :search do |search|
  search.before_index = Proc.new do |to_index, to_store, resource|
    if author = resource.data.author
      to_index[:author] = data.authors[author].name
    end
  end
end

The callback is executed once per resource and receives three arguments: the document to be indexed, the map to be stored (these end up in the index and docs objects of the output respectively, see below), and the resource being processed. You can modify either of the first two, or throw(:skip) to skip the resource altogether.
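
To show how throw(:skip) behaves, here is a self-contained Ruby sketch. The callback has the same three-argument signature as before_index, but build_docs is a hypothetical driver standing in for the extension's indexing loop, and the resources are plain hashes with a made-up draft flag rather than real Middleman resources.

```ruby
# Callback with the same signature as `before_index`.
before_index = proc do |to_index, to_store, resource|
  throw(:skip) if resource[:draft]                        # leave drafts out
  to_store[:section] = resource[:path].split('/').first   # add a stored attribute
end

# Hypothetical driver, standing in for the extension's indexing loop.
def build_docs(resources, callback)
  docs = {}
  resources.each_with_index do |resource, id|
    catch(:skip) do
      to_index = { title: resource[:title] }
      to_store = { title: resource[:title] }
      callback.call(to_index, to_store, resource)
      docs[id] = to_store   # only reached when the callback did not skip
    end
  end
  docs
end

pages = [
  { path: 'blog/a.html', title: 'A', draft: false },
  { path: 'blog/b.html', title: 'B', draft: true }
]

docs = build_docs(pages, before_index)
docs.keys           # => [0]
docs[0][:section]   # => "blog"
```

Since throw unwinds straight to the enclosing catch, nothing after the callback runs for a skipped resource, so the draft never reaches the stored documents.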

Lunr pipeline configuration

In some cases, you may want to add a new function to the lunr pipeline, both when building the index and when searching. You can do this by providing a pipeline hash that maps function names to their JavaScript bodies, for example:

activate :search do |search|
  search.pipeline = {
    tildes: <<-JS
      function(token, tokenIndex, tokens) {
        // Global regexes replace every occurrence; a string pattern only replaces the first.
        return token
          .replace(/á/g, 'a')
          .replace(/é/g, 'e')
          .replace(/í/g, 'i')
          .replace(/ó/g, 'o')
          .replace(/ú/g, 'u');
      }
    JS
  }
end

This will register the tildes function in the lunr pipeline and add it when building the index. From the Lunr documentation:

Functions in the pipeline are called with three arguments: the current token being processed; the index of that token in the array of tokens, and the whole list of tokens part of the document being processed. This enables simple unigram processing of tokens as well as more sophisticated n-gram processing.

The function should return the processed version of the text, which will in turn be passed to the next function in the pipeline. Returning undefined will prevent any further processing of the token, and that token will not make it to the index.

Note that if you add a function to the pipeline, it will also be loaded when de-serialising the index, and lunr will fail with a "Cannot load un-registered function: tildes" error if it has not been re-registered. You can either register the functions manually, or simply include the following in a .js.erb file to be executed before loading the index:

<%= search_lunr_js_pipeline %>

Index file

The generated index file contains a JSON object with two properties:

  • index contains the serialised lunr.js index, which you can load via lunr.Index.load(lunrData.index)
  • docs is a map from the autogenerated document ids to an object that contains the attributes configured for storage

You will typically load the index into a lunr index instance, and then use the docs map to look up the returned value and present it to the user.
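
If you want to sanity-check the generated file after a build, a short Ruby script along these lines may help. It is only a sketch: summarize_index is a hypothetical helper, and the sample JSON is made up to match the two documented properties (the index value here is a placeholder, not a real serialised lunr index). For a real site you would read the file at your configured index_path instead.

```ruby
require 'json'

# Made-up sample with the documented shape (index + docs).
sample = <<~JSON
  {
    "index": { "placeholder": true },
    "docs": {
      "0": { "title": "Hello", "url": "/blog/hello.html" },
      "1": { "title": "Contact", "url": "/contactus/" }
    }
  }
JSON

# Hypothetical helper: checks the top-level keys and summarises the docs map.
def summarize_index(json)
  data = JSON.parse(json)
  missing = %w[index docs] - data.keys
  raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?

  {
    documents: data['docs'].size,
    stored_fields: data['docs'].values.flat_map(&:keys).uniq.sort
  }
end

summary = summarize_index(sample)
summary[:documents]      # => 2
summary[:stored_fields]  # => ["title", "url"]
```

The stored fields listed should correspond to the fields configured with store: true.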

You should also require the lunr.min.js file in your main sprockets javascript file (if using the asset pipeline) to be able to actually load the index:

//= require lunr.min

If you're using lunr's i18n capabilities, you should also load the Stemmer support and language files (in that order) here:

//= require lunr.min
//= require lunr.stemmer.support
//= require lunr.es

Asset pipeline

The Middleman pipeline (if enabled) does not include json files by default, but you can easily modify this by adding .json to the exts option of the corresponding extensions, such as gzip and asset_hash:

activate :asset_hash do |asset_hash|
  asset_hash.exts << '.json'
end

Note that if you run the index json file through the asset hash extension, you will need to retrieve the actual destination URL when loading the file in the browser for searching, using the search_index_path view helper:

var lunrIndex = null;
var lunrData  = null;

// Download index data
$.ajax({
  url: "<%= search_index_path %>",
  cache: true,
  method: 'GET',
  success: function(data) {
    lunrData = data;
    lunrIndex = lunr.Index.load(lunrData.index);
  }
});

Acknowledgments

A big thank you to:
