isnan909 / minisearch

This project forked from lucaong/minisearch


Tiny but powerful fulltext search engine for browser and Node

Home Page: https://lucaong.github.io/minisearch/

License: MIT License


MiniSearch

MiniSearch is a tiny but powerful in-memory fulltext search engine for JavaScript. It is respectful of resources, and it can comfortably run both in Node and in the browser.

Try out the demo application.

Use case

MiniSearch addresses use cases where full-text search features are needed (e.g. prefix search, fuzzy search, boosting of fields), but the data to be indexed can fit locally in the process memory. While you may not index the whole Wikipedia with it, there are surprisingly many use cases that are served well by MiniSearch. By storing the index in local memory, MiniSearch can work offline, and can process queries quickly, without network latency.

A prominent use case is search-as-you-type features in web and mobile applications, where keeping the index on the client side enables a fast and reactive UI, removing the need to make requests to a search server.

Features

  • Memory-efficient index, designed to support memory-constrained use cases like mobile browsers.

  • Exact, prefix, and fuzzy search

  • Auto-suggestion engine, for auto-completion of search queries

  • Documents can be added to and removed from the index at any time

  • Simple API that provides building blocks for specific solutions

  • Zero external dependencies, small and well tested code-base

Installation

With npm:

npm install --save minisearch

With yarn:

yarn add minisearch

Then require or import it in your project.

Usage

Basic usage

import MiniSearch from 'minisearch'

// A collection of documents for our examples
const documents = [
  { id: 1, title: 'Moby Dick', text: 'Call me Ishmael. Some years ago...' },
  { id: 2, title: 'Zen and the Art of Motorcycle Maintenance', text: 'I can see by my watch...' },
  { id: 3, title: 'Neuromancer', text: 'The sky above the port was...' },
  { id: 4, title: 'Zen and the Art of Archery', text: 'At first sight it must seem...' },
  // ...and more
]

let miniSearch = new MiniSearch({ fields: ['title', 'text'] })

// Index all documents
miniSearch.addAll(documents)

// Search with default options
let results = miniSearch.search('zen art motorcycle')
// => [ { id: 2, score: 2.77258, match: { ... } }, { id: 4, score: 1.38629, match: { ... } } ]
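To make the ranking above less of a black box, here is a deliberately simplified sketch of the general idea: an inverted index that maps each term to the documents containing it, with documents ranked by how many distinct query terms they match. This is an illustration under our own assumptions, not MiniSearch's implementation; its actual index structure and scoring differ, and `buildIndex`, `naiveSearch` and `tokenize` are hypothetical helpers.

```javascript
// Hypothetical sketch: a toy inverted index, not MiniSearch's internals.
const documents = [
  { id: 1, title: 'Moby Dick', text: 'Call me Ishmael. Some years ago...' },
  { id: 2, title: 'Zen and the Art of Motorcycle Maintenance', text: 'I can see by my watch...' },
  { id: 3, title: 'Neuromancer', text: 'The sky above the port was...' },
  { id: 4, title: 'Zen and the Art of Archery', text: 'At first sight it must seem...' }
]

// Lowercase, split on anything that is not a letter, combining mark or digit,
// and drop single-character words.
const tokenize = (text) =>
  text.toLowerCase().split(/[^\p{L}\p{M}\p{N}]+/u).filter((t) => t.length > 1)

// Map each term to the set of document ids that contain it.
function buildIndex (docs, fields) {
  const index = new Map()
  for (const doc of docs) {
    for (const field of fields) {
      for (const term of tokenize(doc[field] || '')) {
        if (!index.has(term)) index.set(term, new Set())
        index.get(term).add(doc.id)
      }
    }
  }
  return index
}

// Score each document by the number of distinct query terms it contains.
function naiveSearch (index, query) {
  const scores = new Map()
  for (const term of new Set(tokenize(query))) {
    for (const id of index.get(term) || []) {
      scores.set(id, (scores.get(id) || 0) + 1)
    }
  }
  return [...scores]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
}

const index = buildIndex(documents, ['title', 'text'])
const results = naiveSearch(index, 'zen art motorcycle')
console.log(results) // [ { id: 2, score: 3 }, { id: 4, score: 2 } ]
```

As in the real example, document 2 ranks above document 4 because it matches more of the query terms.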

Search options

MiniSearch supports several options for more advanced search behavior:

// Search only specific fields
miniSearch.search('zen', { fields: ['title'] })

// Boost some fields (here "title")
miniSearch.search('zen', { boost: { title: 2 } })

// Prefix search (so that 'moto' will match 'motorcycle')
miniSearch.search('moto', { prefix: true })

// Fuzzy search, in this example, with a max edit distance of 0.2 * term length,
// rounded to the nearest integer. The misspelled 'ismael' will match 'ishmael'.
miniSearch.search('ismael', { fuzzy: 0.2 })
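The prefix and fuzzy options can be understood through two small predicates. The sketch below is an illustration, not MiniSearch's code: `matchesPrefix`, `maxEditDistance`, `levenshtein` and `fuzzyMatches` are hypothetical helpers, and measuring the allowed distance against the query term's length is our assumption based on the comment above.

```javascript
// Hypothetical helpers illustrating prefix and fuzzy matching; not MiniSearch's API.

// Prefix search: an indexed term matches if it starts with the query term.
const matchesPrefix = (indexedTerm, queryTerm) => indexedTerm.startsWith(queryTerm)

// Fractional fuzziness scaled by the term length, rounded to the nearest integer.
const maxEditDistance = (term, fuzzy) => Math.round(term.length * fuzzy)

// Dynamic-programming Levenshtein distance (insertions, deletions, substitutions),
// keeping only the previous row of the table in memory.
function levenshtein (a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
    }
    prev = curr
  }
  return prev[b.length]
}

const fuzzyMatches = (indexedTerm, queryTerm, fuzzy) =>
  levenshtein(indexedTerm, queryTerm) <= maxEditDistance(queryTerm, fuzzy)

console.log(matchesPrefix('motorcycle', 'moto'))    // true
console.log(maxEditDistance('ismael', 0.2))         // 1 (6 * 0.2 = 1.2, rounded)
console.log(levenshtein('ismael', 'ishmael'))       // 1 (one inserted "h")
console.log(fuzzyMatches('ishmael', 'ismael', 0.2)) // true
```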

// You can set the default search options upon initialization
miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  searchOptions: {
    boost: { title: 2 },
    fuzzy: 0.2
  }
})
miniSearch.addAll(documents)

// By default, it will now perform fuzzy search and boost "title":
miniSearch.search('zen and motorcycles')
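Boosting and default options can be pictured as follows. This is a hypothetical sketch, assuming that a field boost multiplies that field's contribution to the score and that options passed to a single call override the defaults given in `searchOptions`; `mergeOptions` and `boostedScore` are illustrative names, not MiniSearch methods.

```javascript
// Hypothetical sketch of field boosting and option merging; not MiniSearch's internals.

// Defaults as they might be stored from the constructor's searchOptions.
const defaults = { boost: { title: 2 }, fuzzy: 0.2 }

// Per-call options win over the defaults (a shallow merge, which is an assumption).
const mergeOptions = (defaultOptions, callOptions = {}) => ({ ...defaultOptions, ...callOptions })

// Combine per-field scores, multiplying each one by its boost (1 when not boosted).
function boostedScore (fieldScores, boost = {}) {
  let total = 0
  for (const [field, score] of Object.entries(fieldScores)) {
    total += score * (boost[field] ?? 1)
  }
  return total
}

const options = mergeOptions(defaults, { fuzzy: 0.1 })
console.log(options) // { boost: { title: 2 }, fuzzy: 0.1 }

// A match worth 1.5 in "title" and 0.5 in "text" scores 1.5 * 2 + 0.5 * 1:
console.log(boostedScore({ title: 1.5, text: 0.5 }, options.boost)) // 3.5
```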

Auto suggestions

MiniSearch can suggest search queries given an incomplete query:

miniSearch.autoSuggest('zen ar')
// => [ { suggestion: 'zen archery art', terms: [ 'zen', 'archery', 'art' ], score: 1.73332 },
//      { suggestion: 'zen art', terms: [ 'zen', 'art' ], score: 1.21313 } ]

The autoSuggest method takes the same options as the search method, so you can get suggestions for misspelled words using fuzzy search:

miniSearch.autoSuggest('neromancer', { fuzzy: 0.2 })
// => [ { suggestion: 'neuromancer', terms: [ 'neuromancer' ], score: 1.03998 } ]
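One plausible way to turn search results into query suggestions is to group results by the terms they matched and aggregate their scores. The sketch below works under that assumption only: `suggestFromResults` is a hypothetical helper, the `{ terms, score }` input shape is assumed, the scores are made up, and MiniSearch's own autoSuggest may combine scores differently.

```javascript
// Hypothetical sketch: derive suggestions from results that carry their matched terms.
function suggestFromResults (results) {
  const groups = new Map()
  for (const { terms, score } of results) {
    const phrase = terms.join(' ')
    const group = groups.get(phrase) || { suggestion: phrase, terms, score: 0, count: 0 }
    group.score += score
    group.count += 1
    groups.set(phrase, group)
  }
  // Average the score of each phrase, then rank the best phrases first.
  return [...groups.values()]
    .map(({ suggestion, terms, score, count }) => ({ suggestion, terms, score: score / count }))
    .sort((a, b) => b.score - a.score)
}

// Made-up results for the query 'zen ar' (scores are illustrative only).
const results = [
  { terms: ['zen', 'archery', 'art'], score: 2 },
  { terms: ['zen', 'art'], score: 1.5 },
  { terms: ['zen', 'art'], score: 0.5 }
]
console.log(suggestFromResults(results).map((s) => s.suggestion))
// [ 'zen archery art', 'zen art' ]
```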

Tokenization

By default, documents and queries are tokenized by splitting on non-alphanumeric characters (accented characters and other diacritics count as alphanumeric). No stop-word list is applied, but single-character words are excluded. The tokenization logic can be changed by passing a custom tokenizer function as the tokenize option:

let stopWords = new Set(['and', 'or', 'to', 'in', 'a', 'the', /* ...and more */ ])

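// Tokenize by splitting on whitespace and apply a case-insensitive stop-word list
// (tokens are not downcased yet at this stage, so compare them in lowercase)
let miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  tokenize: (string) => string.split(/\s+/).filter(word => !stopWords.has(word.toLowerCase()))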
})
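For comparison with the custom tokenizer above, the default behavior described in this section (split on non-alphanumeric characters, keep accented letters, drop single-character words) can be approximated with a Unicode-aware regular expression. This is our own approximation, not MiniSearch's actual default tokenizer, and `defaultLikeTokenize` is a hypothetical name. It leaves case untouched, since downcasing belongs to term processing.

```javascript
// Hypothetical approximation of the default tokenization described above.
// \p{L} matches any letter, \p{M} combining marks (diacritics), \p{N} digits; needs the `u` flag.
const defaultLikeTokenize = (string) =>
  string
    .split(/[^\p{L}\p{M}\p{N}]+/u)
    .filter((token) => token.length > 1) // drops single-character words and empty strings

console.log(defaultLikeTokenize('I can see by my watch...'))
// [ 'can', 'see', 'by', 'my', 'watch' ]
console.log(defaultLikeTokenize("L'étranger, café crème"))
// [ 'étranger', 'café', 'crème' ]
```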

Term processing

Terms are downcased by default. No stemming is performed. To customize how the terms are processed upon indexing or searching, for example to normalize them or to apply stemming, the processTerm option can be used:

const removeAccents = (term) =>
  // The g flag replaces every occurrence, not just the first one
  term.replace(/[àá]/g, 'a')
      .replace(/[èé]/g, 'e')
      .replace(/[ìí]/g, 'i')
      .replace(/[òó]/g, 'o')
      .replace(/[ùú]/g, 'u')

// Perform custom term processing (here removing accents)
let miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  processTerm: (term) => removeAccents(term.toLowerCase())
})
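The hand-written `removeAccents` above only covers a few lowercase vowels. A broader alternative, sketched here as a swapped-in technique rather than anything MiniSearch provides, decomposes characters with `String.prototype.normalize('NFD')` and strips the combining diacritical marks in the range U+0300 to U+036F. `stripDiacritics` is an illustrative name.

```javascript
// Alternative accent folding via Unicode normalization (not part of MiniSearch).
// NFD splits "é" into "e" + U+0301; the regex then removes the combining mark.
const stripDiacritics = (term) => term.normalize('NFD').replace(/[\u0300-\u036f]/g, '')

// Same shape as the processTerm option above: downcase, then fold accents.
const processTerm = (term) => stripDiacritics(term.toLowerCase())

console.log(processTerm('Città'))  // citta
console.log(processTerm('Perché')) // perche
console.log(processTerm('Ñandú'))  // nandu
```

It could be passed as `processTerm` in the configuration above in place of the regex-based version.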

Refer to the API documentation for details about configuration options and methods.

Browser compatibility

MiniSearch natively supports all modern browsers implementing JavaScript standards, but requires a polyfill when used in Internet Explorer, as it makes use of functions like Object.entries, Array.prototype.includes, and Array.from, which are standard but not available in older browsers. @babel/polyfill is one polyfill that can provide those functions.
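Before loading MiniSearch in an older browser, you could check whether the functions listed above are present and load a polyfill only when needed. A minimal sketch, assuming you handle the actual polyfill loading yourself; `missingFeatures` is a hypothetical helper, written in ES5 so that the check itself can run in Internet Explorer.

```javascript
// Hypothetical feature check, written in ES5 so it can itself run in Internet Explorer.
function missingFeatures () {
  var checks = {
    'Object.entries': typeof Object.entries === 'function',
    'Array.prototype.includes': typeof Array.prototype.includes === 'function',
    'Array.from': typeof Array.from === 'function'
  }
  return Object.keys(checks).filter(function (name) { return !checks[name] })
}

var missing = missingFeatures()
if (missing.length > 0) {
  // Load a polyfill (for example @babel/polyfill) before loading MiniSearch.
  console.warn('Polyfill needed for: ' + missing.join(', '))
}
```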

Contributors: lucaong
