mjlbach / clip-fields

This project forked from notmahi/clip-fields

Teaching robots to respond to open-vocab queries with CLIP and NeRF-like neural fields

Home Page: https://mahis.life/clip-fields

C++ 0.29% Python 78.26% C 1.02% Cuda 20.43%

clip-fields's Introduction

CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

Teaching robots in the real world to respond to natural language queries with zero human labels, using pretrained large language models (LLMs), visual language models (VLMs), and neural fields.

[Paper] [Website] [Code] [Data] [Video]

Authors: Mahi Shafiullah, Chris Paxton, Lerrel Pinto, Soumith Chintala, Arthur Szlam.

Tl;dr: CLIP-Field is a novel weakly supervised approach for learning a semantic robot memory that can respond to natural language queries using only raw RGB-D and odometry data, with no extra human labelling. It combines the image and language understanding capabilities of vision-language models (VLMs) like CLIP, large language models like Sentence-BERT, and open-label object detection models like Detic with the spatial understanding capabilities of neural radiance field (NeRF) style architectures to build a spatial database that holds semantic information.
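To make the "spatial database" idea concrete, here is a toy sketch of answering a query against stored points. It is not the CLIP-Fields implementation: the function names, the hand-written 3-dimensional vectors, and the use of plain cosine similarity are all assumptions for illustration, standing in for real model embeddings and the trained neural field.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def query_field(points, embeddings, query_embedding):
    """Return the stored 3D point whose embedding best matches the query."""
    scores = [cosine_similarity(e, query_embedding) for e in embeddings]
    best = max(range(len(points)), key=lambda i: scores[i])
    return points[best], scores[best]


# Hypothetical toy data: three locations, each with a made-up embedding.
points = [(0.0, 0.0, 0.0), (1.5, 0.2, 0.9), (3.0, 1.0, 0.5)]
embeddings = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# Hypothetical embedding of a text query; a real system would get this
# from a language model rather than writing it by hand.
query = (0.1, 0.9, 0.2)

location, score = query_field(points, embeddings, query)
print(location, round(score, 3))
```

In this sketch the query lands on the second point because its embedding is the closest in direction to the query vector.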

Installation

To properly install this repo and all the dependencies, follow these instructions.

# Clone this repo.
git clone --recursive https://github.com/notmahi/clip-fields
cd clip-fields

# Create conda environment and install the dependencies.
conda create -n cf python=3.8
conda activate cf
conda install -y pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c nvidia
pip install -r requirements.txt

# Install the hashgrid encoder with the relevant cuda module.
cd gridencoder
# For this part, you may need to find your nvcc path and point CUDA_HOME at it.
# For me, `which nvcc` gives /public/apps/cuda/11.1/bin/nvcc, so I used the following:
# export CUDA_HOME=/public/apps/cuda/11.1
python setup.py install
cd ..
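
If you are unsure what to use for CUDA_HOME, the following sketch derives it from the nvcc location, mirroring the comment in the steps above. The helper name cuda_home_from_nvcc is hypothetical, and the sketch assumes nvcc sits at <CUDA root>/bin/nvcc, which may not hold if nvcc on your PATH is a symlink or wrapper.

```python
import shutil
from pathlib import PurePosixPath


def cuda_home_from_nvcc(nvcc_path):
    """Strip the trailing bin/nvcc from an nvcc path to get the CUDA root."""
    return str(PurePosixPath(nvcc_path).parent.parent)


nvcc = shutil.which("nvcc")  # None when nvcc is not on PATH
if nvcc is not None:
    # Print a line that can be pasted into the shell before building.
    print("export CUDA_HOME=" + cuda_home_from_nvcc(nvcc))
else:
    print("nvcc not found on PATH; set CUDA_HOME manually")
```

Run it before `python setup.py install` in gridencoder and check that the printed directory really contains your CUDA toolkit.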

Interactive Tutorial and Evaluation

We have an interactive tutorial and an evaluation notebook that you can use to explore the model and evaluate it on your own data. Both are in the demo/ directory, and you can run them after installing the dependencies.

Training a CLIP-Field directly

Once you have the dependencies installed, you can run the training script train.py with any .r3d files that you have! If you just want to try out a sample, download the sample data nyu.r3d and run the following command.

python train.py dataset_path=nyu.r3d

If you want to use LSeg as an additional source of open-label annotations, download the LSeg demo model and place it at path_to_LSeg/checkpoints/demo_e200.ckpt. Then run the following command.

python train.py dataset_path=nyu.r3d use_lseg=true

See config/train.yaml for the list of configuration options. In particular, to train with a specific set of labels, list them in the custom_labels field of config/train.yaml.
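
If you have several recordings, a small helper can assemble the training commands shown above. This is only a sketch: build_train_command is a hypothetical name, my_scan.r3d is a placeholder file, and only the two overrides that appear in this README (dataset_path and use_lseg) are used.

```python
import shlex


def build_train_command(dataset_path, use_lseg=False):
    """Build the argv list for train.py with the overrides from this README."""
    argv = ["python", "train.py", f"dataset_path={dataset_path}"]
    if use_lseg:
        argv.append("use_lseg=true")
    return argv


# nyu.r3d is the sample from this README; my_scan.r3d is a placeholder.
for path in ["nyu.r3d", "my_scan.r3d"]:
    # Print the command; pass the list to subprocess.run to execute it.
    print(shlex.join(build_train_command(path, use_lseg=True)))
```

Printing the commands first lets you check them before starting a long training run.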

Acknowledgements

We would like to thank the following projects for making their code and models available, which we relied upon heavily in this work.
