lancedb / yoloexplorer

YOLOExplorer: Iterate on your YOLO / CV datasets using SQL, vector semantic search, and more within seconds


yoloexplorer's Introduction

Developer-friendly database for multimodal AI

LanceDB is an open-source database for vector search built with persistent storage, which greatly simplifies retrieval, filtering, and management of embeddings.

The key features of LanceDB include:

  • Production-scale vector search with no servers to manage.

  • Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).

  • Support for vector similarity search, full-text search and SQL.

  • Native Python and JavaScript/TypeScript support.

  • Zero-copy, automatic versioning: manage versions of your data without needing extra infrastructure.

  • GPU support in building vector index(*).

  • Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache-Arrow, Pandas, Polars, DuckDB and more on the way.

LanceDB's core is written in Rust 🦀 and is built using Lance, an open-source columnar format designed for performant ML workloads.

Quick Start

JavaScript

npm install vectordb

const lancedb = require('vectordb');

// `await` is not available at the top level of a CommonJS module,
// so the calls are wrapped in an async function.
async function main() {
  const db = await lancedb.connect('data/sample-lancedb');

  const table = await db.createTable({
    name: 'vectors',
    data: [
      { id: 1, vector: [0.1, 0.2], item: "foo", price: 10 },
      { id: 2, vector: [1.1, 1.2], item: "bar", price: 50 }
    ]
  });

  // Vector search: return the 2 rows closest to the query vector.
  const query = table.search([0.1, 0.3]).limit(2);
  const results = await query.execute();

  // You can also search for rows by specific criteria without involving a vector search.
  const rowsByCriteria = await table.search(undefined).where("price >= 10").execute();
}

main();

Python

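pip install lancedb

import lancedb

# Connect to (or create) a local database directory.
uri = "data/sample-lancedb"
db = lancedb.connect(uri)

# Create a table from a list of rows; each row carries its embedding in "vector".
table = db.create_table("my_table",
                        data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
                              {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])

# Return the 2 rows nearest to the query vector as a pandas DataFrame.
result = table.search([100, 100]).limit(2).to_pandas()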
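
To make the search call above more concrete, here is a minimal, dependency-free sketch of what a nearest-neighbour query computes conceptually. It assumes Euclidean (L2) distance and scans every row; it is not how LanceDB is implemented, and the helper names `l2_distance` and `brute_force_search` are made up for this illustration.

```python
import math

# Same rows as the quick-start table above.
rows = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(rows, query, limit):
    """Return the `limit` rows closest to `query`, nearest first (hypothetical helper)."""
    ranked = sorted(rows, key=lambda row: l2_distance(row["vector"], query))
    return ranked[:limit]


nearest = brute_force_search(rows, [100, 100], limit=2)
print([row["item"] for row in nearest])  # ['bar', 'foo']
```

A real vector database avoids this full scan by building an index, but the ranking idea is the same: "bar" comes first because its vector lies closer to the query than "foo" does.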

yoloexplorer's Issues

How to use own dataset?

Hi there! Thanks for the great work!

If we want to use our own dataset, what should be included in the dataset? I mean, the directory format, labels, etc. Considering we have a dataset of images for a park, should we also have the labels for these images?

Hope this is not a stupid question :)

Best regards,
Zijia

Roadmap, Scope and API design

Scope

Roadmap to V1:

  • Dataset management - Simple ways to double down on specific parts of the dataset

Searching

  • Using SQL

  • Using Semantic search

  • Support multiple datasets, i.e., allow semantic search across datasets (via a staging area)

  • Using natural language / AI search. Example: "Show me all images where a person is standing next to a car and the sample was recorded before 2015"

  • Allow enriching datasets by adding/removing samples from various sources, including other datasets (secondary datasets of data lakes)

  • Allow various dataset formats (users can load datasets from any format, including third-party sources like RF, DagsHub, etc.)

  • Allow loading tables directly, local or cloud-based (S3, etc.)

  • Allow various tasks:
    • Detection
    • Segmentation
    • Classification
    • ImgFolder structure used for generative AI or diffusion model samples

API Design:

  • GUI/ Dashboard - TODO
  • Pythonic/ Notebook - TODO

Contribution

Hello @AyushExel :wave:, I want to contribute to this repo. Are there any contribution guidelines or a list of todos available?

Thank you.

Use recordbatch iterator when creating tables

Currently the table is built from the entire dataset in one go, which will probably fail for massive datasets due to memory constraints. Switch to an iterator for initializing LanceDB tables.
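
The batching idea in this issue can be sketched with a plain generator. This is a standard-library illustration only: `iter_batches` and `fake_records` are hypothetical names, and converting each batch into an Arrow record batch and handing it to LanceDB is deliberately left out.

```python
from itertools import islice


def iter_batches(records, batch_size):
    """Yield lists of at most `batch_size` records without materializing the whole input."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_records(n):
    # Hypothetical stand-in for rows produced while scanning a dataset.
    for i in range(n):
        yield {"id": i, "vector": [float(i), float(i) + 1.0]}


batch_sizes = [len(batch) for batch in iter_batches(fake_records(10), batch_size=4)]
print(batch_sizes)  # [4, 4, 2]
```

Because both the source and the batcher are generators, only one batch is held in memory at a time, which is the property the issue asks for.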

Create Dashboard

Allow users to plot images, analytics, and embeddings in interactive browser plots. The current solution (matplotlib) isn't scalable.
The best candidate that I found was Plotly Dash (https://dash.plotly.com/), but there might be more, so I'm open to suggestions.
The idea is to avoid building the UI from scratch using JS/Node, as it would make installation complex and development slow.
