adamlutka / php-multisearch

PHP 7 extension for efficient searching of multiple strings at once

License: MIT License

PHP-multisearch

PHP-multisearch is a PHP 7 extension that searches a text for many strings (needles) at once. It uses the Aho-Corasick algorithm, so the running time is linear in the total length of the needles plus the length of the searched text plus the number of matches reported.

<?php
// Each needle has a key (the string searched for) and an optional value.
$needlesBundle = new MultiSearch\NeedlesBundle();
$needlesBundle->insert('key', 'value');
$needlesBundle->insert('key2', 'value2');
$needlesBundle->insert('key3');      // value omitted

// Search the haystack for all needles in a single pass.
$hits = $needlesBundle->searchIn('Haystack contains key3.');

// "key" is a prefix of "key3", so both are reported at the same position.
var_dump($hits[0]->getKey());        // string(3) "key"
var_dump($hits[0]->getValue());      // string(5) "value"
var_dump($hits[0]->getPosition());   // int(18)

var_dump($hits[1]->getKey());        // string(4) "key3"
var_dump($hits[1]->getValue());      // string(0) ""
var_dump($hits[1]->getPosition());   // int(18)
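
To make the linear-time claim more concrete, here is a minimal pure-PHP sketch of the Aho-Corasick idea: a trie of needles with failure links, scanned once over the haystack. It is only an illustration and not the extension's C++ implementation; the function names acBuild and acSearch and the array-based hit format are hypothetical and are not part of the MultiSearch API.

```php
<?php
// Illustrative pure-PHP Aho-Corasick sketch; NOT the extension's implementation.
// acBuild() and acSearch() are hypothetical names used only for this example.

/** Build the automaton: returns [$goto, $fail, $out], indexed by node id (0 = root). */
function acBuild(array $needles): array
{
    $goto = [[]];  // $goto[node][char] => child node id
    $fail = [0];   // $fail[node] => node of the longest proper suffix present in the trie
    $out  = [[]];  // $out[node]  => needles that end at this node

    // Phase 1: insert every needle into the trie.
    foreach ($needles as $needle) {
        $node = 0;
        for ($i = 0, $len = strlen($needle); $i < $len; $i++) {
            $ch = $needle[$i];
            if (!isset($goto[$node][$ch])) {
                $goto[] = [];
                $fail[] = 0;
                $out[]  = [];
                $goto[$node][$ch] = count($goto) - 1;
            }
            $node = $goto[$node][$ch];
        }
        $out[$node][] = $needle;
    }

    // Phase 2: breadth-first pass that sets the failure links.
    $queue = array_values($goto[0]);  // depth-1 nodes keep failure link 0 (root)
    for ($head = 0; $head < count($queue); $head++) {
        $node = $queue[$head];
        foreach ($goto[$node] as $ch => $child) {
            $f = $fail[$node];
            while ($f !== 0 && !isset($goto[$f][$ch])) {
                $f = $fail[$f];
            }
            $fail[$child] = $goto[$f][$ch] ?? 0;
            // A node also reports every needle that ends at its failure target.
            $out[$child] = array_merge($out[$child], $out[$fail[$child]]);
            $queue[] = $child;
        }
    }
    return [$goto, $fail, $out];
}

/** Scan the haystack once and return hits as ['key' => ..., 'position' => ...]. */
function acSearch(array $automaton, string $haystack): array
{
    [$goto, $fail, $out] = $automaton;
    $hits = [];
    $node = 0;
    for ($i = 0, $len = strlen($haystack); $i < $len; $i++) {
        $ch = $haystack[$i];
        // Follow failure links until some node accepts the current byte.
        while ($node !== 0 && !isset($goto[$node][$ch])) {
            $node = $fail[$node];
        }
        $node = $goto[$node][$ch] ?? 0;
        foreach ($out[$node] as $needle) {
            $hits[] = ['key' => $needle, 'position' => $i - strlen($needle) + 1];
        }
    }
    return $hits;
}

$automaton = acBuild(['key', 'key2', 'key3']);
foreach (acSearch($automaton, 'Haystack contains key3.') as $hit) {
    echo $hit['key'], ' @ ', $hit['position'], PHP_EOL;
}
```

The haystack is read one byte at a time and never rewound, which is where the linear bound comes from; the automaton is built once and can be reused for any number of haystacks.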

Consider the following use case. You have a file with a relatively static set of terms that you search for frequently, for example a blacklist of words for user statuses on your social network. If you use php-fpm, most of the work is done only once, during the first request; all following requests during the worker's lifetime reuse the data structure held in memory.

<?php
$filepath = '/etc/passwd';
$storage = MultiSearch\MemoryPersistentStorage::getInstance();

// Load and build the bundle only if this worker has not stored it yet.
if (!$storage->hasNeedlesBundle($filepath)) {
	$loader = new MultiSearch\NeedlesBundleLoader();
	$needlesBundle = $loader->loadFromFile($filepath);
	$storage->setNeedlesBundle($filepath, $needlesBundle);
} else {
	// Later requests handled by the same worker reuse the bundle from memory.
	$needlesBundle = $storage->getNeedlesBundle($filepath);
}

foreach ($needlesBundle->getNeedles() as $needle) {
	var_dump($needle->getKey());
}

More examples can be found in the tests and in the API reference.

PHP-multisearch isn't thread-safe.

Getting Started

Prerequisites

The build environment is prepared in Docker. The extension itself is not tied to Docker, so you can build it manually, but the prepared setup is easier to use.

Installing

Run make to build and start a Docker container that builds the extension and runs the tests. The built extension is placed in the build/output directory.

make debian.stretch PHP_VERSION=7.1

Run make to install the built extension. You must be root or use sudo, and PHP of the specified version must be installed.

make install PHP_VERSION=7.1

Check that the extension is loaded in the CLI.

php --ri multisearch
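
The CLI and php-fpm often read different INI files, so it can also be useful to check from inside a running script. The sketch below uses PHP's built-in extension_loaded(); the helper name multisearchLoaded is hypothetical, and the extension name 'multisearch' is assumed to be the same one that php --ri reports.

```php
<?php
// Hypothetical helper: reports whether the extension is available to this PHP process.
function multisearchLoaded(): bool
{
    // extension_loaded() is a PHP built-in; the name 'multisearch' is assumed
    // to match the name used with `php --ri`.
    return extension_loaded('multisearch');
}

echo multisearchLoaded() ? 'multisearch is loaded' : 'multisearch is NOT loaded', PHP_EOL;
```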

Configuration

The extension is configured through an INI file.

Needles bundle file format

Needles can be stored in a file like the following one:

key1	value1
k\ne\ty\n2	va\nlu\te2
key3
key4	value4

Each line represents one needle. Everything from the beginning of the line to the first tab character is the key (the string that is searched for in a haystack). The rest of the line is the value, which can be omitted. Use the escape sequences \t and \n if a key or value contains a tab or newline character.
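
The format rules above can be sketched in plain PHP as follows. This is not the extension's loader (use MultiSearch\NeedlesBundleLoader for real files); the function names unescapeField and parseNeedleLine are hypothetical, and the sketch assumes that \t and \n are the only escape sequences and that any other backslash is kept literally.

```php
<?php
// Illustrative parser for the needles file format; NOT the extension's loader.
// unescapeField() and parseNeedleLine() are hypothetical names for this example.

/** Replace the escape sequences \t and \n with a real tab and newline. */
function unescapeField(string $field): string
{
    // Assumption: only \t and \n are special; other backslashes stay as they are.
    return strtr($field, ['\\t' => "\t", '\\n' => "\n"]);
}

/** Parse one line into [key, value]; the value is '' when it is omitted. */
function parseNeedleLine(string $line): array
{
    $line  = rtrim($line, "\r\n");      // drop the line terminator
    $parts = explode("\t", $line, 2);   // split at the first tab only
    return [unescapeField($parts[0]), unescapeField($parts[1] ?? '')];
}

// Single-quoted strings keep \n and \t as literal backslash sequences,
// exactly as they would appear in the file.
$lines = ["key1\tvalue1", 'k\ne\ty\n2' . "\t" . 'va\nlu\te2', 'key3'];
foreach ($lines as $line) {
    [$key, $value] = parseNeedleLine($line);
    echo json_encode(['key' => $key, 'value' => $value]), PHP_EOL;
}
```

Splitting before unescaping matters: a tab produced by \t must not be mistaken for the separator between key and value.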
