pollen-robotics / pollen-vision

Home Page: https://www.pollen-robotics.com

License: Apache License 2.0

Topics: computer-vision, grasping, object-detection, object-segmentation, robotics

pollen-vision's Introduction

Pollen vision library

Simple and unified interface to zero-shot computer vision models curated for robotics use cases.


Check out our HuggingFace space for an online demo or try pollen-vision in a Colab notebook!

Get started in very few lines of code!

Perform zero-shot object detection and segmentation on a live video stream from your webcam with the following code:

import cv2

from pollen_vision.vision_models.object_detection import OwlVitWrapper
from pollen_vision.vision_models.object_segmentation import MobileSamWrapper
from pollen_vision.perception.utils import Annotator, get_bboxes


owl = OwlVitWrapper()
sam = MobileSamWrapper()
annotator = Annotator()

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:  # camera disconnected or stream ended
        break
    predictions = owl.infer(
        frame, ["paper cups"]
    )  # zero-shot object detection | put your classes here
    bboxes = get_bboxes(predictions)

    masks = sam.infer(frame, bboxes=bboxes)  # zero-shot object segmentation
    annotated_frame = annotator.annotate(frame, predictions, masks=masks)

    cv2.imshow("frame", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

Supported models

We continue to work on adding new models that could be useful for robotics perception applications.

We chose to focus on zero-shot models because they are easier to use and deploy: they can recognize or segment objects based on text queries, without needing to be fine-tuned on annotated datasets.

Right now, we support:

Object detection

  • Yolo-World for zero-shot object detection and localization
  • Owl-Vit for zero-shot object detection and localization
  • Recognize-Anything for zero-shot object detection (without localization)

Object segmentation

  • Mobile-SAM for (fast) zero-shot object segmentation

Monocular depth estimation

  • Depth Anything for (non-metric) monocular depth estimation
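Since Depth Anything produces relative rather than metric depth, its output is typically rescaled before display. A minimal sketch of this common post-processing step (not part of the library's API; the flat list stands in for a flattened depth map):

```python
def normalize_depth(depth, eps=1e-8):
    """Rescale a relative depth map to [0, 1] for visualization.

    Relative depth has no physical unit, so only the ordering of values
    is meaningful; normalization makes maps comparable across frames.
    """
    lo, hi = min(depth), max(depth)
    span = max(hi - lo, eps)  # avoid division by zero on constant maps
    return [(d - lo) / span for d in depth]

print(normalize_depth([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```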

Below is an example combining Owl-Vit and Mobile-Sam to detect and segment objects in a point cloud, live. (Note: there is no temporal or spatial filtering of any kind in this example; we display the raw outputs of the models, computed independently on each frame.)

(Video: pc_segmentation_doc3-2024-02-26_17.07.20.mp4)

We also provide wrappers for the Luxonis cameras that we use internally. They give easy access to the main features relevant to our robotics applications (RGB-D, onboard h264 encoding, and onboard stereo rectification).

Installation


Note: this package has been tested on Ubuntu 22.04 and macOS (M1 Pro), with Python 3.10.

Git LFS

This repository uses Git LFS to store large files. You need to install it before cloning the repository.

Ubuntu

sudo apt-get install git-lfs

macOS

brew install git-lfs

One line installation

You can install the package directly from the repository, without cloning it first, with:

pip install "pollen-vision[vision] @ git+https://github.com/pollen-robotics/pollen-vision.git@main"

Note: here we install the package with the vision extra, which includes the vision models. You can also install the depthai_wrapper extra to use the Luxonis depthai wrappers.

Install from source

Clone this repository and then install the package either in "production" mode or "dev" mode.

👉 We recommend using a virtual environment to avoid conflicts with other packages.

After cloning the repository, you can either install everything with:

pip install .[all]

or install only the modules you want:

pip install .[depthai_wrapper]
pip install .[vision]

To add "dev" mode dependencies (CI/CD, testing, etc):

pip install -e .[dev]

Luxonis depthai specific information

If this is the first time you are using Luxonis cameras on this computer, you need to set up the udev rules:

echo 'SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", MODE="0666"' | sudo tee /etc/udev/rules.d/80-movidius.rules
sudo udevadm control --reload-rules && sudo udevadm trigger

Gradio demo

Test the demo online

A Gradio demo is available on Pollen Robotics' Hugging Face space. It lets you test the models on your own images without installing anything.

Run the demo locally

If you want to run the demo locally, you can install the dependencies with the following command:

pip install pollen_vision[gradio]

You can then run the demo locally on your machine with:

python pollen-vision/gradio/app.py

Examples

Vision models wrappers

Check our example notebooks!

Luxonis depthai wrappers

Check our example scripts!


pollen-vision's People

Contributors: apirrone, fabiendanieau, simheo, stevenguyen

pollen-vision's Issues

Make SDKWrapper

This single wrapper will replace CvWrapper and DepthWrapper

It will output:

  • a depth map aligned to the left or right OAK-D-SR camera
  • the left and right RGB OAK-D-SR camera streams

Try giving 1440x1080 images directly to the pipeline

  • Evaluate the increase in latency
  • We noticed odd VideoEncoder behaviour when feeding it 1440x1080 images: it crops the image to a 16:9 aspect ratio. This did not occur with 960x720, even though both resolutions have the same 4:3 ratio
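A quick sanity check (a standalone sketch, not part of the library) confirms that both resolutions reduce to the same ratio, so the crop cannot be explained by the input aspect ratio alone:

```python
from fractions import Fraction

def aspect_ratio(width: int, height: int) -> Fraction:
    """Return the exact aspect ratio of a resolution as a reduced fraction."""
    return Fraction(width, height)

# Both resolutions from the issue reduce to 4:3.
print(aspect_ratio(1440, 1080))  # 4/3
print(aspect_ratio(960, 720))    # 4/3
```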

Write unit tests

Some functions can be checked against pre-recorded data, for instance:

  • compute_undistort_maps
  • get_inv_R_T
  • etc.
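One way to implement such regression tests is to compare a function's output against a reference saved from a known-good run. A sketch of the comparison helper (the test function and the reference values are hypothetical; in practice the expected data would be loaded from a file committed alongside the test, and `computed` would come from the function under test, e.g. compute_undistort_maps):

```python
import math

def assert_close(computed, expected, tol=1e-6):
    """Compare two flat sequences of floats element-wise within a tolerance."""
    assert len(computed) == len(expected), "length mismatch"
    for c, e in zip(computed, expected):
        assert math.isclose(c, e, abs_tol=tol), f"{c} != {e}"

def test_undistort_maps_regression():
    # Hypothetical pre-recorded reference output (would be loaded from disk).
    expected = [0.0, 0.5, 1.0]
    # Would be compute_undistort_maps(...) in the real test.
    computed = [0.0, 0.5, 1.0]
    assert_close(computed, expected)

test_undistort_maps_regression()
```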

[Refactor] Improve understandability and usability of imports

For example, instead of

from vision_models.mobile_sam.mobile_sam_wrapper import MobileSamWrapper

do

from vision_models.object_segmentation.mobile_sam import MobileSamWrapper

Or ideally

from pollen_vision.object_segmentation import MobileSam
from pollen_vision.object_detection import OwlVit

Support OAK-D Pro

Add support for the legacy OAK-D Pro. Could be useful for debugging.

{
    "socket_to_name": {
        "CAM_B": "right",
        "CAM_C": "left"
    },
    "inverted": false,
    "fisheye": false,
    "mono": true
}

Besides adding the config file, compatibility with the teleop needs to be fixed (wrapper.py, line 34):

    # Assuming both cameras are the same
    width = connected_cameras_features[0].width
    height = connected_cameras_features[0].height
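The hard-coded same-camera assumption could be replaced by something like the following sketch (the camera feature objects are assumed to expose `width`/`height` attributes, as in the snippet above; the dataclass and helper name are hypothetical, not part of the library):

```python
from dataclasses import dataclass

@dataclass
class CameraFeatures:
    # Minimal stand-in for depthai's camera feature objects.
    width: int
    height: int

def common_resolution(cameras):
    """Return the largest resolution supported by every connected camera,
    instead of assuming all cameras are identical."""
    if not cameras:
        raise ValueError("no connected cameras")
    width = min(c.width for c in cameras)
    height = min(c.height for c in cameras)
    return width, height

print(common_resolution([CameraFeatures(1280, 800), CameraFeatures(640, 480)]))  # (640, 480)
```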

Add logger instead of print

It is better for a library to log than to print, so that users can tell where a message comes from when the library is used from other code.
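A minimal sketch of the standard-library pattern for this (the load_model function is hypothetical):

```python
import logging

# Module-level logger named after the module: messages then carry the
# module's name, so users can tell where they come from and filter them.
logger = logging.getLogger(__name__)

# A library should not configure handlers itself; a NullHandler avoids
# "no handler found" warnings while leaving configuration to the application.
logger.addHandler(logging.NullHandler())

def load_model(name: str) -> None:
    # Hypothetical function: demonstrates logging instead of print().
    logger.info("Loading model %s", name)
```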

Write Readme

  • How to use the wrappers
  • Calibration / flashing procedure

Add demo notebooks.

Add demo notebooks for:

  • RAM (Recognize Anything)
  • OWL-ViT
  • SAM (Segment Anything)
