farouqzaib / fast-search

Vector Database implemented in Golang with support for full-text and vector search as well as fault tolerance via Raft.

Go 100.00%
full-text-search hnsw vector-database embeddings-similarity nearest-neighbor-search search-engine

fast-search's Introduction

xdb: a lightweight distributed vector database built from *scratch.

What it does:

  • full-text search using proximity ranking
  • semantic search via HNSW + cosine distance
  • integrated basic text embedding service (Python HTTP API around a sentence transformer)
  • Reciprocal Rank Fusion for merging full-text + semantic search results
  • in-memory serving + disk persistence
  • fault-tolerance with segment replication using Raft
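
To make the Reciprocal Rank Fusion step concrete, here is a minimal sketch of the general technique, not the project's actual code. The function name fuseRRF, the integer document IDs, and the constant k = 60 (a commonly used default) are assumptions for illustration. Each document scores the sum of 1 / (k + rank) over every result list it appears in, with ranks starting at 1.

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRRF merges ranked lists of document IDs with Reciprocal Rank Fusion.
// A document's score is the sum of 1/(k + rank) over the lists that contain it,
// where rank is 1-based. Hypothetical helper, not taken from the project.
func fuseRRF(k float64, rankings ...[]int) []int {
	scores := make(map[int]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}

	ids := make([]int, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}

	// Highest fused score first; ties are broken by ID so the order is deterministic.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	fullText := []int{3, 1, 2} // IDs ranked by the full-text search
	semantic := []int{1, 4, 3} // IDs ranked by the semantic search
	fmt.Println(fuseRRF(60, fullText, semantic)) // prints [1 3 4 2]
}
```

Documents 1 and 3 appear in both lists, so they outrank documents 2 and 4, which each appear in only one.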

What it's not:

  • production-ready (code is pretty sus right now)
  • perfect (because enemy of good)

Architecture

Getting started

Install the dependencies for the basic text embedding service, located in the third_party folder, using pip.

pip install -r requirements.txt

Then start the service like so:

uvicorn main:app

Set an environment variable, EmbeddingHost, that points to the address of the embedding service:

export EmbeddingHost="http://127.0.0.1:8000/embeddings"
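
As a rough sketch of how a Go process could read and sanity-check this variable (the helper name embeddingEndpoint and the validation rules are assumptions, not the project's code):

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
)

// embeddingEndpoint reads EmbeddingHost through the supplied lookup function
// and checks that it looks like an absolute URL. Hypothetical helper.
func embeddingEndpoint(getenv func(string) string) (string, error) {
	raw := getenv("EmbeddingHost")
	if raw == "" {
		return "", errors.New("EmbeddingHost is not set")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("EmbeddingHost is not a valid URL: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("EmbeddingHost %q needs a scheme and a host", raw)
	}
	return u.String(), nil
}

func main() {
	endpoint, err := embeddingEndpoint(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("embedding service at", endpoint)
}
```

Passing the lookup function as a parameter keeps the check testable without touching the real environment.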

Then start one or more instances of the vector database.

Flags:

  • httpAddr: address of the HTTP API service
  • joinAddr: HTTP API service address of the primary node to join
  • nodeId: unique identifier for the node
  • raftAddr: Raft address for the node

Run single-node

go run cmd/server/main.go -httpAddr 127.0.0.1:8111 -nodeId 0 -raftAddr 127.0.0.1:9000
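
For illustration, the four flags above could be parsed with Go's standard flag package roughly as follows. This is a sketch under assumptions: the config struct, the parseFlags helper, and the empty-string defaults are hypothetical, and the real cmd/server/main.go may be organised differently.

```go
package main

import (
	"flag"
	"fmt"
)

// config holds the values of the four documented flags (hypothetical type).
type config struct {
	HTTPAddr string
	JoinAddr string
	NodeID   string
	RaftAddr string
}

// parseFlags parses the documented flags from args. Defaults are assumed to be empty.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	fs.StringVar(&c.HTTPAddr, "httpAddr", "", "address of the HTTP API service")
	fs.StringVar(&c.JoinAddr, "joinAddr", "", "HTTP API service address of the primary node to join")
	fs.StringVar(&c.NodeID, "nodeId", "", "unique identifier for the node")
	fs.StringVar(&c.RaftAddr, "raftAddr", "", "Raft address for the node")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	return c, nil
}

func main() {
	// Same arguments as the single-node command above.
	args := []string{"-httpAddr", "127.0.0.1:8111", "-nodeId", "0", "-raftAddr", "127.0.0.1:9000"}
	c, err := parseFlags(args)
	if err != nil {
		panic(err)
	}
	// An empty JoinAddr means the node starts on its own rather than joining a primary.
	fmt.Printf("%+v\n", c)
}
```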

API

POST /index

Index a document.

curl --location '127.0.0.1:8111/index' --header 'Content-Type: application/json' --data '{"text": "some text"}'

GET /search

Run a search. Note that the query is sent as a JSON body on a GET request.

curl --location --request GET '127.0.0.1:8111/search' \
--header 'Content-Type: application/json' \
--data '{"query": "some text"}'
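
The same two calls can be built from Go. The sketch below mirrors the curl commands (paths, JSON field names, and the Content-Type header come from them); the helper names are hypothetical, and the response format is not documented here, so the example only constructs the requests.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newJSONRequest builds a request whose body is the JSON encoding of payload.
func newJSONRequest(method, url string, payload map[string]string) (*http.Request, error) {
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(method, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// newIndexRequest mirrors the POST /index curl command.
func newIndexRequest(base, text string) (*http.Request, error) {
	return newJSONRequest(http.MethodPost, base+"/index", map[string]string{"text": text})
}

// newSearchRequest mirrors the GET /search curl command, including the JSON body.
func newSearchRequest(base, query string) (*http.Request, error) {
	return newJSONRequest(http.MethodGet, base+"/search", map[string]string{"query": query})
}

func main() {
	base := "http://127.0.0.1:8111"

	indexReq, err := newIndexRequest(base, "some text")
	if err != nil {
		panic(err)
	}
	searchReq, err := newSearchRequest(base, "some text")
	if err != nil {
		panic(err)
	}

	fmt.Println(indexReq.Method, indexReq.URL.Path)   // prints POST /index
	fmt.Println(searchReq.Method, searchReq.URL.Path) // prints GET /search
}
```

To actually send a request against a running node, pass it to http.DefaultClient.Do and read the response body.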

Run 3-node cluster

Run the commands below on different machines, or at least as separate instances of the project on one machine to simulate a cluster.

go run cmd/server/main.go -httpAddr 127.0.0.1:8111 -nodeId 0 -raftAddr 127.0.0.1:9000

Replicas join the primary by passing its HTTP API address (127.0.0.1:8111) as -joinAddr; the primary's Raft address is 127.0.0.1:9000.

go run cmd/server/main.go -httpAddr 127.0.0.1:8112 -nodeId 1 -raftAddr 127.0.0.1:9001 -joinAddr 127.0.0.1:8111
go run cmd/server/main.go -httpAddr 127.0.0.1:8113 -nodeId 2 -raftAddr 127.0.0.1:9002 -joinAddr 127.0.0.1:8111

TODO

  • Indexing
    • Concurrent indexing using goroutines to process terms
  • Retrieval
    • Boolean queries
    • Concurrent memtable search
  • Ranking
  • API
    • Bulk index
    • Document deletion
  • Storage
    • Segment compaction
  • Replication
    • Snapshot working?
  • Deployment
    • Containerisation
  • Code quality
    • Penance for all the atrocities I committed.

*Would not have been possible without these resources:
