Feder

What is feder

Feder is a JavaScript tool designed to aid in the comprehension of embedding vectors. It visualizes index files from Faiss, HNSWlib, and other ANN libraries to provide insight into how these libraries function and the concept of high-dimensional vector embeddings. Currently, Feder is primarily focused on the IVF_FLAT index file type from Faiss and the HNSW index file type from HNSWlib, though additional index types will be added in the future.

Feder is written in JavaScript. We also provide a Python library, federpy, which is based on federjs.

NOTE:

  • In an IPython environment, federpy renders the corresponding visualization directly.
  • In other environments, federpy outputs the visualization as an HTML file, which can be opened in a browser with a web service enabled.

Online demos

How feder works

Wiki

HNSW visualization screenshots

image

IVF_Flat visualization screenshots

image image image

Quick Start

Installation

Use npm or yarn.

npm install @zilliz/feder
# or
yarn add @zilliz/feder

Material Preparation

Make sure that you have built an index and dumped the index file with Faiss or HNSWlib.

Init Feder

Specify the DOM container in which you want to show the visualizations.

import { Feder } from '@zilliz/feder';

const feder = new Feder({
  filePath: 'faiss_file', // file path
  source: 'faiss', // faiss | hnswlib
  domSelector: '#container', // attach dom to render
  viewParams: {}, // optional
});

Visualize the index structure.

  • HNSW - Feder will show the top 3 levels of the HNSW graph.
  • IVF_Flat - Feder will show all the clusters.

feder.overview();

Explore the search process.

Set the search parameters (optional) and specify the query vector.

feder
  .setSearchParams({
    k: 8, // hnsw, ivf_flat
    ef: 100, // hnsw (ef_search)
    nprobe: 8, // ivf_flat
  })
  .search(target_vector);
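
To make the roles of `k` and `nprobe` more concrete, here is a minimal sketch of an IVF_FLAT-style search in plain JavaScript. It is not Feder's or Faiss's implementation: the function names (`squaredL2`, `ivfFlatSearch`) and the toy index layout are assumptions made purely for illustration, and `ef` (the HNSW parameter) is not covered.

```javascript
// Illustrative only: a toy IVF_FLAT-style search, not Feder's or Faiss's code.

// Squared Euclidean distance between two equal-length vectors.
function squaredL2(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Hypothetical index layout:
//   centroids: one vector per cluster
//   lists:     for each cluster, the { id, vector } entries assigned to it
function ivfFlatSearch(index, query, { k, nprobe }) {
  // Coarse step: rank clusters by centroid distance, keep the nprobe nearest.
  const probed = index.centroids
    .map((centroid, clusterId) => ({ clusterId, dist: squaredL2(centroid, query) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, nprobe);

  // Fine step: exhaustively scan every vector stored in the probed clusters.
  const candidates = [];
  for (const { clusterId } of probed) {
    for (const { id, vector } of index.lists[clusterId]) {
      candidates.push({ id, dist: squaredL2(vector, query) });
    }
  }

  // Keep the k nearest candidates.
  return candidates.sort((a, b) => a.dist - b.dist).slice(0, k);
}

const toyIndex = {
  centroids: [[0, 0], [10, 10], [20, 0]],
  lists: [
    [{ id: 0, vector: [1, 1] }, { id: 1, vector: [-1, 0] }],
    [{ id: 2, vector: [9, 9] }, { id: 3, vector: [6, 6] }],
    [{ id: 4, vector: [19, 1] }],
  ],
};

// Probing one cluster misses vector 3, which sits in the second-nearest cluster.
const narrow = ivfFlatSearch(toyIndex, [4, 4], { k: 2, nprobe: 1 });
// Probing two clusters finds it, at the cost of scanning more vectors.
const wide = ivfFlatSearch(toyIndex, [4, 4], { k: 2, nprobe: 2 });

console.log(narrow.map((r) => r.id), wide.map((r) => r.id));
```

In this toy setup a larger `nprobe` trades extra scanning for better recall, which is the trade-off the search view is meant to help you see.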

Examples

We have prepared a simple example: visualizations of HNSW and IVF_Flat indexes built from 17,000+ vectors embedded from VOC 2012.

git clone git@github.com:zilliztech/feder.git
cd feder
yarn install
yarn dev

Then open http://localhost:12355/

It will show 4 visualizations:

  • hnsw overview
  • hnsw search view
  • ivf_flat overview
  • ivf_flat search view

Feder for Large Index

Feder consists of three components:

  • FederIndex - parse the index file. It requires a lot of memory.
  • FederLayout - layout calculations. It consumes a lot of computational resources.
  • FederView - render and interaction.

For very large datasets, the computation can be separated out and run on a Node server. Two setups are supported:

  • oneServer
    • federServer (with FederIndex and FederLayout).
  • twoServer
    • indexServer (with FederIndex)
    • layoutServer (with FederLayout)

See case/oneServer and case/twoServer.

Example with One Server

  1. Launch the server:
yarn test_one_server_backend
  2. Launch the front-end web service:
yarn test_one_server_front
  3. Open http://localhost:8000

Example with Two Servers

  1. Launch the FederIndex server:
yarn test_two_server_feder_index
  2. Launch the FederLayout server:
yarn test_two_server_feder_layout
  3. Launch the front-end web service:
yarn test_two_server_front
  4. Open http://localhost:8000

Pipeline - explore a new dataset with feder

Step 1. Dataset preparation

Put all images in test/data/images/ (example dataset: VOC 2012).

You can also generate random vectors without embedding for index building and skip to step 3.

Step 2. Generate embedding vectors

We recommend towhee, which can generate embedding vectors with one line of code.

We have the encoded vectors ready for you.

Step 3. Build an index and dump it.

You can use faiss or hnswlib to build the index.

(For detailed procedures, refer to their tutorials.)

See test/data/gen_hnswlib_index_*.py or test/data/gen_faiss_index_*.py.

Alternatively, use the index file we have prepared for you.

Step 4. Init Feder.

import { Feder } from '@zilliz/feder';
import * as d3 from 'd3';

const domSelector = '#container';
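const filePath = 'index_file_path'; // placeholder: path or URL of your index file
const source = "hnswlib"; // "hnswlib" or "faiss"

// placeholder: return the media URL that belongs to the given row id
const mediaCallback = (rowId) => mediaUrl;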

const feder = new Feder({
  filePath,
  source,
  domSelector,
  viewParams: {
    mediaType: 'img',
    mediaCallback,
  },
});
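
The `mediaCallback` in the snippet above is a placeholder. As a rough sketch of what such a callback could look like, the example below builds one from a base URL and a list of file names ordered by row id. The helper name `makeMediaCallback`, the example.com URL, and the file names are all hypothetical, and how Feder treats a `null` return value is not specified in this README.

```javascript
// Hypothetical helper, not part of Feder: builds a callback that maps a row id
// to a media URL, given a base URL and file names ordered by row id.
function makeMediaCallback(baseUrl, fileNames) {
  // Normalize the base URL so it always ends with a single slash.
  const base = baseUrl.endsWith('/') ? baseUrl : baseUrl + '/';
  return (rowId) => {
    const name = fileNames[rowId];
    // Return null for unknown ids so the caller can decide on a fallback.
    return name === undefined ? null : base + encodeURIComponent(name);
  };
}

// Made-up file names; in practice these would come from your own dataset.
const exampleMediaCallback = makeMediaCallback('https://example.com/images', [
  'img_0.jpg',
  'img 1.jpg',
]);

console.log(exampleMediaCallback(0)); // URL for row 0
console.log(exampleMediaCallback(5)); // null, because row 5 has no file name
```

A callback built this way could then be passed as `viewParams.mediaCallback` together with `mediaType: 'img'`.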

If you use random data, there is no need to specify the mediaType.

import { Feder } from '@zilliz/feder';
import * as d3 from 'd3';

const domSelector = '#container';
const filePath = 'index_file_path'; // placeholder: path or URL of your index file

const feder = new Feder({
  filePath,
  source: 'hnswlib',
  domSelector,
});

Step 5. Explore the index!

Visualize the overview

feder.overview();

or visualize the search process.

feder.search(target_vector[, targetMediaUrl]);

or randomly select a vector as the target to visualize the search process.

feder.searchRandTestVec();

For more cases, see test/test.js.

Blogs

Roadmap

We're still in the early stages. We will support more types of ANN indexes and more viewers for unstructured data. Stay tuned.

Acknowledgments

feder's People

Contributors

alwayslove2013, nameczz, shanghaikid


feder's Issues

How to generate .index file ?

Hello feder team !

I'm trying to use the project, but I don't know how to generate the .index file, can you help me on this, please ?

Can’t run test code with local index

Hello Feder team!

I'm trying to run your test example, but without using a remote index source location like `https://assets.zilliz.com/hnswlib_hnsw_voc_17k_1f1dfd63a9.index`. I downloaded the index and put it in my directory.
With the remote index, all the visualizations were loaded, but with the local index, the visualizations didn't load and became stuck in an infinite loop.
Can you explain why this is happening, please?

Here is the code with both sources:

from federpy.federpy import FederPy

if __name__ == '__main__':
    hnswSource = 'hnswlib'
    hnswIndexFile = 'https://assets.zilliz.com/hnswlib_hnsw_voc_17k_1f1dfd63a9.index'
    # hnswIndexFile = 'hnswlib_hnsw_voc_17k_1f1dfd63a9.index'

    # Lite version, only input indexFile, no viewParams, no images.
    federPy_hnsw_lite = FederPy(hnswIndexFile, hnswSource)

    # federPy_hnsw_lite.overview()
    federPy_hnsw_lite.searchRandTestVec()

Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: No "exports" main defined in /node_modules/@zilliz/feder/package.json

I installed this package using npm but when I try to use it like this:

const { Feder } = require( '@zilliz/feder');

I get:

Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: No "exports" main defined in /node_modules/@zilliz/feder/package.json
    at new NodeError (node:internal/errors:405:5)
    at exportsNotFound (node:internal/modules/esm/resolve:259:10)
    at packageExportsResolve (node:internal/modules/esm/resolve:533:13)
    at resolveExports (node:internal/modules/cjs/loader:571:36)
    at Module._findPath (node:internal/modules/cjs/loader:645:31)
    at Module._resolveFilename (node:internal/modules/cjs/loader:1058:27)
    at Module._load (node:internal/modules/cjs/loader:925:27)
    at Module.require (node:internal/modules/cjs/loader:1139:19)
    at require (node:internal/modules/helpers:121:18)
    at Object.<anonymous> (/index.js:1:19) {
  code: 'ERR_PACKAGE_PATH_NOT_EXPORTED'
}

new version

Are there any updates on a new version of feder? You mentioned it in my last issue, which was about supporting text documents.

Why does my overview() load forever?

In colab, visualizing both hnsw and faiss using my own index with overview or searchRandTestVec shows the cell has finished running but the blue spinning circle goes on forever on a white output.

The same notebook can show the examples from https://colab.research.google.com/drive/12L_oJPR-yFDlORpPondsqGNTPVsSsUwi#scrollTo=N3qqBAYxYcbt though.

Is the problem with the index used?
I tried reducing the dimensionality to 5, even lower than the 10 in the example but it still loads forever.

How to investigate how the indexes at

hnswSource = 'hnswlib'
hnswIndexFile = 'https://assets.zilliz.com/hnswlib_hnsw_voc_17k_1f1dfd63a9.index'

ivfflatSource = 'faiss'
ivfflatIndexFile = 'https://assets.zilliz.com/faiss_ivf_flat_voc_17k_ab112eec72.index'

were built?

I tried to find the data using

import hnswlib
import requests

hnswIndexFile = 'https://assets.zilliz.com/hnswlib_hnsw_voc_17k_1f1dfd63a9.index'
localFilename = 'hnswlib_hnsw_voc_17k_1f1dfd63a9.index'

response = requests.get(hnswIndexFile)
with open(localFilename, 'wb') as f:
    f.write(response.content)

p = hnswlib.Index(space='l2',dim=10)
p.load_index(localFilename)
p.get_items()

but got an empty list.
I was expecting get_items() to show the data added to the index.

feder for text documents

I wanted to visualize my text documents, and I did. However, I didn't find any way to get the doc ids in a specific cluster. For example, when I search over a random vector with feder, the library gives me a bunch of clusters. Is it possible to obtain which vectors are stored in a specified cluster?

What I actually wanted to ask is this: in your example, you can see images in the fine search (distance) view (first attached file). I want to show small texts instead of images (second attached file).
[attached image 1] [attached image 2]

[Feature] Same cluster board with one index file.

When I define FederPy with the same index file and search different data, it shows different base cluster boards.

I want to display the same cluster board with an index file so that I can search faiss_feder.searchById(40) and faiss_feder.searchById(50) on the same board.

Add license to repo

Hi feder team,

This is a fantastic project! Do you consider adding a license to the feder repo? We want to further contribute to this project.
