esteininger / vector-search

The definitive guide to using Vector Search to solve your semantic search production workload needs.

Home Page: http://vectorsearch.dev

Jupyter Notebook 96.52% Python 3.48%
lucene nlp search-engine vector-search

vector-search's Introduction

Vector Search

Vector search engines let developers store vectors in an index built for nearest-neighbor algorithms (e.g. KNN), and provide an engine that compares vectors with a similarity measure (such as cosine similarity) to determine which vectors are related.
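As a minimal sketch of that idea, the snippet below runs an exact (brute-force) KNN search ranked by cosine similarity. The `index` contents, the 3-dimensional vectors, and the function names are made up for illustration; production engines typically use approximate indexes rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def knn_search(index, query, k=2):
    """Exact k-nearest-neighbor search over a dict of id -> vector."""
    scored = [(doc_id, cosine_similarity(vec, query)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


# Hypothetical embeddings, invented for this example.
index = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
results = knn_search(index, query=[1.0, 0.0, 0.0], k=2)
print(results)
```

The scan is O(n) per query, which is fine for small collections and useful as a correctness baseline when evaluating an approximate index.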

This repository provides a comprehensive overview of the vector search landscape, including tutorials, guides, best practices, and extended learning. Please review the Education section to learn more.

Here is how you may use a Vector Search engine within your application search architecture:

Topics

🧑‍🏫 Foundation - Learn the core concepts of vector-based information retrieval.

🎬 Use Cases - Understand where it makes sense to use vector search.

💵 Architecture - Guides on how to use vector search in your architecture.

Foundations

# | Label | Description
1 | Keyword vs Vector Search | The difference between standard (TF-IDF) text search and vector search, and when to use each.
2 | Sparse Vector Tutorial | A walkthrough of building your own sparse vector feature extraction engine.
3 | Dense Vector Tutorial | A walkthrough of building your own dense vector feature extraction engine.
4 | Atlas Vector Search Engine | Guides that showcase MongoDB Atlas' vector search implementation.
5 | Vector Search Comparisons | A comparison of the most popular vector search engines.
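
To make the TF-IDF and sparse vector entries above concrete, here is a small sketch of sparse feature extraction. It uses one common TF-IDF variant (term frequency normalized by document length, `idf = log(N / df)`), which is an assumption and not necessarily the formula used in the tutorials; the three documents are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {}
        for term, count in counts.items():
            weight = (count / len(tokens)) * math.log(n_docs / df[term])
            if weight > 0.0:  # sparse: only non-zero weights are stored
                vec[term] = weight
        vectors.append(vec)
    return vectors


# Hypothetical corpus, invented for this example.
docs = ["the cat sat", "the dog sat", "the cat ran"]
vectors = tfidf_vectors(docs)
print(vectors[0])
```

A term that appears in every document ("the" here) gets a weight of zero and drops out of the vector, which is why keyword-style vectors stay sparse while dense embeddings assign a value to every dimension.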

Use Cases

# | Label | Description
1 | Sentence Similarity | Determination of how similar two texts are.
2 | Token Classification | Classification of text into pre-defined categories.
3 | Question and Answering | Building systems that automatically answer questions.
4 | Personalization | Using client data to personalize query results.
5 | Automated Synonym Creation | Enriching synonym collections automatically.
6 | Summarization | Reconstruction of a corpus into fewer words.
7 | Conversational | Dialogue response generation.
8 | File Search | Search the contents of files across multiple modalities.

Architecture

# | Source | Description
1 | Reference Architecture | Common best practices for deploying vector search architecture in production.
2 | Model Hosting | Suggestions on how to host your vector models.
3 | Model Versioning | Common best practices for versioning your models as they evolve.
4 | Feedback Loops | Query re-ranking, learning to rank, and more.
5 | Selecting Models | Which model best supports your domain-specific tasks?

Education

Although vector search is a challenging topic to grasp, there is a myriad of educational resources at your disposal.

Information Retrieval

The overarching field of study.

Transformer Architecture

The latest breakthrough in converting human content (text, images, etc.) into vector representations. Transformers are deep learning models that use "self-attention" to differentially weigh the significance of each part of the input data.
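
The weighting step can be sketched as scaled dot-product self-attention. This is a deliberately simplified illustration: it uses identity projections (queries, keys, and values are all the raw input), whereas real transformers learn separate projection matrices and use multiple heads. The `tokens` values are invented.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (a simplification)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Score this token against every token, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return outputs, weights


# Three hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and shows how much one token attends to every other token; the first token, for example, weighs the orthogonal second token less than itself.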

Similarity Search

In order to determine what is deemed relevant, computers need to measure the distance between points, in this case vectors.
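
As a rough illustration, the sketch below compares three common measures on the same vectors: dot product, Euclidean distance, and cosine distance. The vectors are invented, and the functions assume non-zero inputs of equal length.

```python
import math


def dot(a, b):
    """Dot product: larger means more aligned, and is sensitive to magnitude."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on direction (assumes non-zero vectors)."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 1.0]  # orthogonal to a

for name, other in (("b", b), ("c", c)):
    print(name, dot(a, other), euclidean_distance(a, other), cosine_distance(a, other))
```

Here `a` and `b` point the same way, so their cosine distance is 0 even though their Euclidean distance is 1. Which measure is appropriate usually depends on whether the embedding model encodes meaning in vector length or only in direction.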

Gratitude

This repository wouldn't be possible without several key individuals:

Watch for Changes

This is a living repository and will evolve as I learn and the landscape changes. Please subscribe to changes accordingly via:

vector-search's Issues

Pricing for Atlas

Do you have any information on vector search pricing for Atlas? Most databases store vectors in RAM, which can be quite expensive. Does MongoDB Atlas also require this, and therefore higher-cost clusters for large datasets, or are the vectors stored on disk?

Any info would be super useful, couldn't find anything.
Thanks

MongoDB Support for LangChain Vectorstore

https://github.com/hwchase17/langchain is a very popular and powerful framework for building applications on top of LLMs, and Vector Search is a key aspect of this.

There are quite a few Vectorstores supported by LangChain, but not MongoDB:
https://python.langchain.com/en/latest/modules/indexes/vectorstores.html

I just opened a ticket in that project to discuss this topic, but I also wanted to bring it up here to see if anyone else in the MongoDB community is using LangChain:
langchain-ai/langchain#2274

cc: @sam-lippert and @AggressivelyMeows
