pollen-robotics / pollen-vision

Home Page: https://www.pollen-robotics.com

License: Apache License 2.0

Topics: computer-vision, grasping, object-detection, object-segmentation, robotics

pollen-vision's Introduction

Pollen vision library

Simple and unified interface to zero-shot computer vision models curated for robotics use cases.


Check out our HuggingFace space for an online demo or try pollen-vision in a Colab notebook!

Get started in very few lines of code!

Perform zero-shot object detection and segmentation on a live video stream from your webcam with the following code:

import cv2

from pollen_vision.vision_models.object_detection import OwlVitWrapper
from pollen_vision.vision_models.object_segmentation import MobileSamWrapper
from pollen_vision.perception.utils import Annotator, get_bboxes


owl = OwlVitWrapper()
sam = MobileSamWrapper()
annotator = Annotator()

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:  # camera disconnected or stream ended
        break
    predictions = owl.infer(
        frame, ["paper cups"]
    )  # zero-shot object detection | put your classes here
    bboxes = get_bboxes(predictions)

    masks = sam.infer(frame, bboxes=bboxes)  # zero-shot object segmentation
    annotated_frame = annotator.annotate(frame, predictions, masks=masks)

    cv2.imshow("frame", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

Supported models

We continue to work on adding new models that could be useful for robotics perception applications.

We chose to focus on zero-shot models because they are easier to use and deploy: they can recognize or segment objects based on text queries, without needing to be fine-tuned on annotated datasets.

Right now, we support:

Object detection

  • Yolo-World for zero-shot object detection and localization
  • Owl-Vit for zero-shot object detection and localization
  • Recognize-Anything for zero-shot object detection (without localization)

Object segmentation

  • Mobile-SAM for (fast) zero-shot object segmentation

Monocular depth estimation

  • Depth Anything for (non-metric) monocular depth estimation
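Since Depth Anything produces relative rather than metric depth, its output is typically rescaled before display. A minimal sketch of this common post-processing step (not part of the library's API; the flat list stands in for a flattened depth map):

```python
def normalize_depth(depth, eps=1e-8):
    """Rescale a relative depth map to [0, 1] for visualization.

    Relative depth has no physical unit, so only the ordering of values
    is meaningful; normalization makes maps comparable across frames.
    """
    lo, hi = min(depth), max(depth)
    span = max(hi - lo, eps)  # avoid division by zero on constant maps
    return [(d - lo) / span for d in depth]

print(normalize_depth([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```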

Below is an example combining Owl-Vit and Mobile-Sam to detect and segment objects in a point cloud, live. (Note: there is no temporal or spatial filtering of any kind in this example; we display the raw outputs of the models, computed independently on each frame.)

(Video: pc_segmentation_doc3-2024-02-26_17.07.20.mp4)

We also provide wrappers for the Luxonis cameras that we use internally. They give easy access to the main features relevant to our robotics applications (RGB-D, onboard h264 encoding, and onboard stereo rectification).

Installation


Note: this package has been tested on Ubuntu 22.04 and macOS (M1 Pro), with Python 3.10.

Git LFS

This repository uses Git LFS to store large files. You need to install it before cloning the repository.

Ubuntu

sudo apt-get install git-lfs

macOS

brew install git-lfs

One line installation

You can install the package directly from the repository, without cloning it first, with:

pip install "pollen-vision[vision] @ git+https://github.com/pollen-robotics/pollen-vision.git@main"

Note: here we install the package with the vision extra, which includes the vision models. You can also install the depthai_wrapper extra to use the Luxonis depthai wrappers.

Install from source

Clone this repository and then install the package either in "production" mode or "dev" mode.

👉 We recommend using a virtual environment to avoid conflicts with other packages.

After cloning the repository, you can either install everything with:

pip install .[all]

or install only the modules you want:

pip install .[depthai_wrapper]
pip install .[vision]

To add "dev" mode dependencies (CI/CD, testing, etc):

pip install -e .[dev]

Luxonis depthai specific information

If this is the first time you are using Luxonis cameras on this computer, you need to set up the udev rules:

echo 'SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", MODE="0666"' | sudo tee /etc/udev/rules.d/80-movidius.rules
sudo udevadm control --reload-rules && sudo udevadm trigger

Gradio demo

Test the demo online

A Gradio demo is available on Pollen Robotics' Hugging Face space. It lets you test the models on your own images without installing anything.

Run the demo locally

If you want to run the demo locally, you can install the dependencies with the following command:

pip install pollen_vision[gradio]

You can then run the demo locally on your machine with:

python pollen-vision/gradio/app.py

Examples

Vision models wrappers

Check our example notebooks!

Luxonis depthai wrappers

Check our example scripts!


pollen-vision's People

Contributors: apirrone, fabiendanieau, simheo, stevenguyen

pollen-vision's Issues

Make SDKWrapper

This single wrapper will replace CvWrapper and DepthWrapper

It will output:

  • a depth map aligned to the left or right OAK-D-SR camera
  • the left and right RGB OAK-D-SR camera streams

Try giving 1440x1080 images directly to the pipeline

  • Evaluate the increase in latency
  • We noticed odd VideoEncoder behaviour when feeding it 1440x1080 images: it crops the image to a 16:9 aspect ratio. This did not occur with 960x720, even though both resolutions have the same 4:3 ratio
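A quick sanity check (a standalone sketch, not part of the library) confirms that both resolutions reduce to the same ratio, so the crop cannot be explained by the input aspect ratio alone:

```python
from fractions import Fraction

def aspect_ratio(width: int, height: int) -> Fraction:
    """Return the exact aspect ratio of a resolution as a reduced fraction."""
    return Fraction(width, height)

# Both resolutions from the issue reduce to 4:3.
print(aspect_ratio(1440, 1080))  # 4/3
print(aspect_ratio(960, 720))    # 4/3
```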

Write unit tests

Some functions can be checked against pre-recorded data, for instance:

  • compute_undistort_maps
  • get_inv_R_T
  • etc.
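One way to implement such regression tests is to compare a function's output against a reference saved from a known-good run. A sketch of the comparison helper (the test function and the reference values are hypothetical; in practice the expected data would be loaded from a file committed alongside the test, and `computed` would come from the function under test, e.g. compute_undistort_maps):

```python
import math

def assert_close(computed, expected, tol=1e-6):
    """Compare two flat sequences of floats element-wise within a tolerance."""
    assert len(computed) == len(expected), "length mismatch"
    for c, e in zip(computed, expected):
        assert math.isclose(c, e, abs_tol=tol), f"{c} != {e}"

def test_undistort_maps_regression():
    # Hypothetical pre-recorded reference output (would be loaded from disk).
    expected = [0.0, 0.5, 1.0]
    # Would be compute_undistort_maps(...) in the real test.
    computed = [0.0, 0.5, 1.0]
    assert_close(computed, expected)

test_undistort_maps_regression()
```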

[Refactor] Improve understandability and usability of imports

For example, instead of

from vision_models.mobile_sam.mobile_sam_wrapper import MobileSamWrapper

do

from vision_models.object_segmentation.mobile_sam import MobileSamWrapper

Or ideally

from pollen_vision.object_segmentation import MobileSam
from pollen_vision.object_detection import OwlVit

Support OAK-D Pro

Add support for the legacy OAK-D Pro. Could be useful for debugging.

{
    "socket_to_name": {
        "CAM_B": "right",
        "CAM_C": "left"
    },
    "inverted": false,
    "fisheye": false,
    "mono": true
}

Besides adding the config file, compatibility with the teleop needs to be fixed (wrapper.py, line 34):

    # Assuming both cameras are the same
    width = connected_cameras_features[0].width
    height = connected_cameras_features[0].height
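The hard-coded same-camera assumption could be replaced by something like the following sketch (the camera feature objects are assumed to expose `width`/`height` attributes, as in the snippet above; the dataclass and helper name are hypothetical, not part of the library):

```python
from dataclasses import dataclass

@dataclass
class CameraFeatures:
    # Minimal stand-in for depthai's camera feature objects.
    width: int
    height: int

def common_resolution(cameras):
    """Return the largest resolution supported by every connected camera,
    instead of assuming all cameras are identical."""
    if not cameras:
        raise ValueError("no connected cameras")
    width = min(c.width for c in cameras)
    height = min(c.height for c in cameras)
    return width, height

print(common_resolution([CameraFeatures(1280, 800), CameraFeatures(640, 480)]))  # (640, 480)
```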

Add logger instead of print

It is better for a library to log than to print, so that users can tell where a message comes from when the library is used from other code.
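A minimal sketch of the standard-library pattern for this (the load_model function is hypothetical):

```python
import logging

# Module-level logger named after the module: messages then carry the
# module's name, so users can tell where they come from and filter them.
logger = logging.getLogger(__name__)

# A library should not configure handlers itself; a NullHandler avoids
# "no handler found" warnings while leaving configuration to the application.
logger.addHandler(logging.NullHandler())

def load_model(name: str) -> None:
    # Hypothetical function: demonstrates logging instead of print().
    logger.info("Loading model %s", name)
```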

Write Readme

  • How to use the wrappers
  • Calibration / flashing procedure

Add demo notebooks.

Add demo notebooks for:

  • RAM (Recognize Anything)
  • OWL-ViT
  • SAM (Segment Anything)
