zilliztech / feder

Visualize HNSW, Faiss, and other ANNS indexes

Feder

What is feder

Feder is a JavaScript tool designed to aid in the comprehension of embedding vectors. It visualizes index files from Faiss, HNSWlib, and other ANN libraries to provide insight into how these libraries function and the concept of high-dimensional vector embeddings. Currently, Feder is primarily focused on the IVF_FLAT index file type from Faiss and the HNSW index file type from HNSWlib, though additional index types will be added in the future.

Feder is written in JavaScript. We also provide a Python library, federpy, which is built on federjs.

NOTE:

In an IPython environment, federpy can render the corresponding visualization directly.
In other environments, it can output the visualization as an HTML file, which can be opened in a browser with a web service enabled.

Online demos

How feder works

Wiki

Usage

HNSW visualization screenshots

IVF_Flat visualization screenshots

Quick Start

Installation

Use npm or yarn.

npm install @zilliz/feder
# or
yarn add @zilliz/feder

Material Preparation

Make sure you have built an index with Faiss or HNSWlib and dumped it to an index file.

Init Feder

Specify the DOM container in which the visualizations should be rendered.

import { Feder } from '@zilliz/feder';

const feder = new Feder({
  filePath: 'faiss_file', // file path
  source: 'faiss', // faiss | hnswlib
  domSelector: '#container', // attach dom to render
  viewParams: {}, // optional
});

Visualize the index structure.

HNSW - Feder shows the top three levels of the HNSW graph.
IVF_Flat - Feder shows all the clusters.

feder.overview();
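
To make the layered HNSW structure concrete, here is a minimal sketch of a greedy descent through a small layered graph. It is an illustration only, not Feder's or HNSWlib's code: the `layers` structure, the `hnswGreedySearch` and `greedyStep` helpers, and the toy data are all assumptions made for this example. A real HNSW search keeps `ef` candidates on the bottom level, whereas this sketch keeps only one.

```javascript
// Illustrative only: greedy descent through a layered graph (not Feder's or HNSWlib's code).
// `layers` is a hypothetical structure ordered from the top level down to level 0;
// each layer maps a node id to the ids of its neighbors on that level.

// Squared Euclidean (L2) distance between two equal-length vectors.
function l2(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Move to a closer neighbor until no neighbor improves on the current node.
function greedyStep(layer, vectors, query, start) {
  let current = start;
  let best = l2(vectors[current], query);
  let improved = true;
  while (improved) {
    improved = false;
    for (const neighbor of layer.get(current) || []) {
      const dist = l2(vectors[neighbor], query);
      if (dist < best) {
        best = dist;
        current = neighbor;
        improved = true;
      }
    }
  }
  return current;
}

// Run the greedy step on each level, using the result as the next level's entry point.
function hnswGreedySearch(layers, vectors, query, entryPoint) {
  let current = entryPoint;
  for (const layer of layers) {
    current = greedyStep(layer, vectors, query, current);
  }
  return current;
}

const vectors = [[0, 0], [5, 5], [9, 9], [2, 1], [8, 7]];
const layers = [
  new Map([[0, [2]], [2, [0]]]), // top level: few nodes, long-range links
  new Map([[0, [1, 2]], [1, [0, 2]], [2, [0, 1]]]), // middle level
  new Map([[0, [3, 1]], [1, [0, 3, 4]], [2, [4]], [3, [0, 1]], [4, [1, 2]]]), // level 0: all nodes
];

const nearest = hnswGreedySearch(layers, vectors, [8, 8], 0);
console.log(nearest);
```

The upper levels contain only a few nodes joined by long-range links, so the search crosses the space quickly before refining its position on the denser lower levels.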

Explore the search process.

Set the search parameters (optional) and specify the query vector.

feder
  .setSearchParams({
    k: 8, // hnsw, ivf_flat
    ef: 100, // hnsw (ef_search)
    nprobe: 8, // ivf_flat
  })
  .search(target_vector);
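
As a rough illustration of what `k` and `nprobe` control in an IVF_Flat search, the sketch below models the two stages with brute force: a coarse search over cluster centroids, then a fine search inside the probed clusters. It is not Feder's or Faiss's implementation. The `clusters` structure, the `ivfFlatSearch` helper, and the sample data are assumptions made for this example.

```javascript
// Illustrative only: a brute-force model of IVF_Flat search (not Feder's or Faiss's code).
// `clusters` is a hypothetical structure: [{ centroid: number[], items: [{ id, vector }] }].

// Squared Euclidean (L2) distance between two equal-length vectors.
function l2(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

function ivfFlatSearch(clusters, query, { k, nprobe }) {
  // Coarse search: keep the nprobe clusters whose centroids are nearest to the query.
  const probed = clusters
    .map((cluster) => ({ cluster, dist: l2(cluster.centroid, query) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, nprobe);

  // Fine search: scan every vector in the probed clusters and keep the k nearest.
  return probed
    .flatMap(({ cluster }) => cluster.items)
    .map((item) => ({ id: item.id, dist: l2(item.vector, query) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);
}

const clusters = [
  { centroid: [0, 0], items: [{ id: 0, vector: [0, 1] }, { id: 1, vector: [1, 0] }] },
  { centroid: [10, 10], items: [{ id: 2, vector: [9, 10] }, { id: 3, vector: [10, 11] }] },
  { centroid: [0, 10], items: [{ id: 4, vector: [1, 9] }] },
];

const hits = ivfFlatSearch(clusters, [9, 9], { k: 2, nprobe: 1 });
console.log(hits.map((hit) => hit.id));
```

A larger `nprobe` scans more clusters, which costs more distance computations but lowers the chance of missing a true neighbor that sits in a cluster whose centroid is not the closest.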

Examples

We provide a simple example: visualizations of HNSW and IVF_Flat indexes built from 17,000+ vectors embedded from VOC 2012.

git clone git@github.com:zilliztech/feder.git
cd feder
yarn install
yarn dev

Then open http://localhost:12355/

It will show 4 visualizations:

hnsw overview
hnsw search view
ivf_flat overview
ivf_flat search view

Feder for Large Index

Feder consists of three components:

FederIndex - parse the index file. It requires a lot of memory.
FederLayout - layout calculations. It consumes a lot of computational resources.
FederView - render and interaction.

When the index is too large to process in the browser, the computation can be split off and run on a Node.js server. Two setups are supported:

oneServer
- federServer (with FederIndex and FederLayout).
twoServer
- indexServer (with FederIndex)
- layoutServer (with FederLayout)

See case/oneServer and case/twoServer.

Example with One Server

launch the server

yarn test_one_server_backend

launch the front web service

yarn test_one_server_front

open http://localhost:8000

Example with Two Servers

launch the FederIndex server

yarn test_two_server_feder_index

launch the FederLayout server

yarn test_two_server_feder_layout

launch the front web service

yarn test_two_server_front

open http://localhost:8000

Pipeline - explore a new dataset with feder

Step 1. Dataset preparation

Put all images in test/data/images/. (Example dataset: VOC 2012.)

You can also generate random vectors without embedding for index building and skip to step 3.

Step 2. Generate embedding vectors

We recommend towhee, which can generate embedding vectors with one line of code.

We have the encoded vectors ready for you.

Step 3. Build an index and dump it.

You can use faiss or hnswlib to build the index.

(For detailed procedures, please refer to their tutorials.)

See test/data/gen_hnswlib_index_*.py or test/data/gen_faiss_index_*.py.

Alternatively, we have the index file ready for you.

Step 4. Init Feder.

import { Feder } from '@zilliz/feder';
import * as d3 from 'd3';

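const domSelector = '#container';
const filePath = [index_file_path]; // placeholder: path or URL of your index file
const source = "hnswlib"; // "hnswlib" or "faiss"

// Placeholder: return the media (image) URL for the given row ID.
const mediaCallback = (rowId) => mediaUrl;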

const feder = new Feder({
  filePath,
  source,
  domSelector,
  viewParams: {
    mediaType: 'img',
    mediaCallback,
  },
});
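
The `mediaCallback` above is left as a placeholder. One possible way to fill it in, assuming row IDs correspond to the order in which vectors were added to the index, is a simple lookup table. The `baseUrl` and `imageNames` values are hypothetical, and returning `null` for an unknown row ID is a choice made for this sketch, not documented Feder behavior.

```javascript
// Illustrative only: `baseUrl` and `imageNames` are hypothetical values.
// The only requirement taken from the example above is a function from row ID to media URL.
const baseUrl = 'https://example.com/images/';
const imageNames = ['cat_001.jpg', 'dog_042.jpg', 'bus_007.jpg'];

// Assumes row IDs are the positions of the vectors when the index was built.
const mediaCallback = (rowId) => {
  const known = Number.isInteger(rowId) && rowId >= 0 && rowId < imageNames.length;
  // Returning null for unknown IDs is an assumption of this sketch.
  return known ? baseUrl + encodeURIComponent(imageNames[rowId]) : null;
};

console.log(mediaCallback(1));
```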

If you use random data, there is no need to specify the mediaType.

import { Feder } from '@zilliz/feder';
import * as d3 from 'd3';

const domSelector = '#container';
const filePath = [index_file_path];

const feder = new Feder({
  filePath,
  source: 'hnswlib',
  domSelector,
});

Step 5. Explore the index!

Visualize the overview

feder.overview();

or visualize the search process.

feder.search(target_vector[, targetMediaUrl]);

or randomly select a vector as the target to visualize the search process.

feder.searchRandTestVec();

For more cases, see test/test.js.

Blogs

Roadmap

We're still in the early stages. We plan to support more types of ANNS indexes and more viewers for unstructured data. Stay tuned.

Acknowledgments

feder's Issues

visualize pinecone with feder

Is there a way to visualize a Pinecone vector database with feder?

How to generate .index file ?

Hello feder team !

I'm trying to use the project, but I don't know how to generate the .index file, can you help me on this, please ?

I'm hoping for supporting faiss(hnsw)!

Can’t run test code with local index

Hello Feder team!

I'm trying to run your test example, but without using a remote index source location like `https://assets.zilliz.com/hnswlib_hnsw_voc_17k_1f1dfd63a9.index`. I downloaded the index and put it in my directory.
With the remote index, all the visualizations were loaded, but with the local index, the visualizations didn't load and became stuck in an infinite loop.
Can you explain why this is happening, please?

Here is the code with both sources:

from federpy.federpy import FederPy

if __name__ == '__main__':
    hnswSource = 'hnswlib'
    hnswIndexFile = 'https://assets.zilliz.com/hnswlib_hnsw_voc_17k_1f1dfd63a9.index'
    # hnswIndexFile = 'hnswlib_hnsw_voc_17k_1f1dfd63a9.index'

    # Lite version, only input indexFile, no viewParams, no images.
    federPy_hnsw_lite = FederPy(hnswIndexFile, hnswSource)

    # federPy_hnsw_lite.overview()
    federPy_hnsw_lite.searchRandTestVec()

Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: No "exports" main defined in /node_modules/@zilliz/feder/package.json

I installed this package using npm but when I try to use it like this:

const { Feder } = require( '@zilliz/feder');

I get:

Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: No "exports" main defined in /node_modules/@zilliz/feder/package.json
    at new NodeError (node:internal/errors:405:5)
    at exportsNotFound (node:internal/modules/esm/resolve:259:10)
    at packageExportsResolve (node:internal/modules/esm/resolve:533:13)
    at resolveExports (node:internal/modules/cjs/loader:571:36)
    at Module._findPath (node:internal/modules/cjs/loader:645:31)
    at Module._resolveFilename (node:internal/modules/cjs/loader:1058:27)
    at Module._load (node:internal/modules/cjs/loader:925:27)
    at Module.require (node:internal/modules/cjs/loader:1139:19)
    at require (node:internal/modules/helpers:121:18)
    at Object.<anonymous> (/index.js:1:19) {
  code: 'ERR_PACKAGE_PATH_NOT_EXPORTED'
}

new version

Are there any updates on a new version of feder? You mentioned it in my last issue, which was about supporting text documents.

Why does my overview() load forever?

In colab, visualizing both hnsw and faiss using my own index with overview or searchRandTestVec shows the cell has finished running but the blue spinning circle goes on forever on a white output.

The same notebook can show the examples from https://colab.research.google.com/drive/12L_oJPR-yFDlORpPondsqGNTPVsSsUwi#scrollTo=N3qqBAYxYcbt though.

Is the problem with the index used?
I tried reducing the dimensionality to 5, even lower than the 10 in the example but it still loads forever.

How to investigate how the indexes at

hnswSource = 'hnswlib'
hnswIndexFile = 'https://assets.zilliz.com/hnswlib_hnsw_voc_17k_1f1dfd63a9.index'

ivfflatSource = 'faiss'
ivfflatIndexFile = 'https://assets.zilliz.com/faiss_ivf_flat_voc_17k_ab112eec72.index'

were built?

I tried to find the data using

import hnswlib
import requests

hnswIndexFile = 'https://assets.zilliz.com/hnswlib_hnsw_voc_17k_1f1dfd63a9.index'
localFilename = 'hnswlib_hnsw_voc_17k_1f1dfd63a9.index'

response = requests.get(hnswIndexFile)
with open(localFilename, 'wb') as f:
    f.write(response.content)

p = hnswlib.Index(space='l2',dim=10)
p.load_index(localFilename)
p.get_items()

but got an empty list.
I was expecting get_items() to show the data added to the index.

Inquiry about displaying images using JavaScript with URL-based including token approach in federpy

I'm curious about whether it's possible to display images using a JavaScript-based URL approach in federpy when an access token is available (like GithubAPI). Does it require solely utilizing a self-hosted web server without utilizing a token?

[BUG] Lose a cluster when displayed

When I search with {"k": 5, "nprobe": 6}, it only highlights 5 clusters.

feder for text documents

I wanted to visualize my text documents, and I did. However, I didn't find any way to get the doc IDs in a specific cluster. For example, when I search for a random vector with feder, the library gives me a bunch of clusters. Is it possible to find out which vectors are stored in a specified cluster?

What I actually wanted to ask is this: in your example, you can see images in the fine search (distance) view (first attached file). I want to show short texts instead of images (second attached file).