[ICML 2024] The official implementation of "DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning"

Home Page: https://2toinf.github.io/DecisionNCE/

License: MIT License


decisionnce's Introduction

DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

[Project Page] [Paper]

🔥 DecisionNCE has been accepted at ICML 2024 and selected as an outstanding paper at the MFM-EAI workshop @ ICML 2024.

Introduction

DecisionNCE mirrors an InfoNCE-style objective but is tailored for decision-making tasks. It provides an embodied representation learning framework that extracts both local and global task progression features, enforces temporal consistency through implicit time contrastive learning, and ensures trajectory-level instruction grounding via multimodal joint encoding. Evaluation on both simulated and real robots demonstrates that DecisionNCE effectively facilitates diverse downstream policy learning tasks, offering a versatile solution for unified representation and reward learning.
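For readers unfamiliar with InfoNCE, the following is a minimal sketch of the generic InfoNCE loss over a square similarity matrix, where the matching pair for row i sits in column i. It is not the DecisionNCE objective itself, which is defined in the paper; the function name `info_nce_loss`, the `temperature` parameter, and the toy matrices are illustrative assumptions.

```python
import math


def info_nce_loss(similarity, temperature=1.0):
    """Generic InfoNCE: mean cross-entropy with the diagonal as positives.

    similarity[i][j] is the score between item i of one modality
    (e.g. a visual embedding) and item j of the other (e.g. a text embedding).
    This is an illustrative sketch, not the DecisionNCE loss.
    """
    total = 0.0
    for i, row in enumerate(similarity):
        logits = [s / temperature for s in row]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the matching pair (i, i).
        total += log_denominator - logits[i]
    return total / len(similarity)


uninformative = [[0.0, 0.0], [0.0, 0.0]]  # every pair scores the same
aligned = [[5.0, 0.0], [0.0, 5.0]]        # matching pairs score highest

loss_uninformative = info_nce_loss(uninformative)
loss_aligned = info_nce_loss(aligned)
```

With identical scores the loss equals log(2) for two candidates, and it falls as the matching pairs score higher than the mismatched ones.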

Quick Start

Install

  1. Clone this repository and navigate to the DecisionNCE folder
git clone https://github.com/2toinf/DecisionNCE.git
cd DecisionNCE
  2. Install the package
conda create -n decisionnce python=3.8 -y
conda activate decisionnce
pip install -e .

Usage

import DecisionNCE
import torch
from PIL import Image
# Load your DecisionNCE model

device = "cuda" if torch.cuda.is_available() else "cpu"
model = DecisionNCE.load("DecisionNCE-P", device=device)

image = Image.open("Your Image Path Here")
text = "Your Instruction Here"

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    reward = model.get_reward(image, text)  # the number of images and texts must be the same

API

decisionnce.load(name, device)

Returns the DecisionNCE model specified by name; valid names are those returned by decisionnce.available_models(). The model is downloaded if necessary. The name argument should be DecisionNCE-P or DecisionNCE-T.

The device to run the model can be optionally specified, and the default is to use the first CUDA device if there is any, otherwise the CPU.


The model returned by decisionnce.load() supports the following methods:

model.encode_image(image: Tensor)

Given a batch of images, returns the image features encoded by the vision portion of the DecisionNCE model.

model.encode_text(text: Tensor)

Given a batch of text tokens, returns the text features encoded by the language portion of the DecisionNCE model.

Train

Pretrain

We pretrain the vision and language encoders jointly with DecisionNCE-P/T on the EpicKitchen-100 dataset. Training code and scripts are provided in this repo. Please follow the instructions below to start training.

  1. Data preparation

Please follow the official instructions and download the EpicKitchen-100 RGB images here. We also provide our training annotations, reorganized from the official version.

  2. Start training

We use Slurm for multi-node distributed finetuning.

sh ./script/slurm_train.sh

Please fill in your image and annotation paths at the specified locations in the script.
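Before submitting a Slurm job, it can help to confirm that the two paths you filled in actually exist. The sketch below is a generic pre-flight check and is not part of this repo; the helper name `check_training_paths` and both example paths are hypothetical placeholders.

```python
from pathlib import Path


def check_training_paths(image_root, annotation_file):
    """Return a list of problems; an empty list means both paths look usable."""
    problems = []
    image_root = Path(image_root)
    annotation_file = Path(annotation_file)
    if not image_root.is_dir():
        problems.append(f"image directory not found: {image_root}")
    if not annotation_file.is_file():
        problems.append(f"annotation file not found: {annotation_file}")
    return problems


# Hypothetical locations; replace them with the paths you put in the script.
for problem in check_training_paths("/path/to/epic_kitchens_rgb", "/path/to/annotations"):
    print(problem)
```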

Model Zoo

Models    | Pretraining Method | Params (M) | Iters | Pretrain ckpt
RN50-CLIP | DecisionNCE-P      | 386        | 20k   | link
RN50-CLIP | DecisionNCE-T      | 386        | 20k   | link

Evaluation

Result

  1. Simulation

  2. Real robot

Visualization

We provide a Jupyter notebook to visualize the reward curves. Please install Jupyter Notebook first.

conda install jupyter notebook

To be updated.

Citation

If you find our code and paper helpful, please cite our paper as:

@inproceedings{lidecisionnce,
  title={DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning},
  author={Li, Jianxiong and Zheng, Jinliang and Zheng, Yinan and Mao, Liyuan and Hu, Xiao and Cheng, Sijie and Niu, Haoyi and Liu, Jihao and Liu, Yu and Liu, Jingjing and others},
  booktitle={Forty-first International Conference on Machine Learning}
}

decisionnce's People

Contributors

2toinf, facebear-ljx, zhengyinan-air

decisionnce's Issues

Question about reward learning

Great job!
I have a question about reward learning: is the reward calculated directly as S(φ(o_n), ψ(l)), or as S(φ(o_{n+1}), ψ(l)) − S(φ(o_n), ψ(l))? I noticed that the model's get_reward function computes S(φ(o_n), ψ(l)) directly, while the appendix of the paper states that the reward is S(φ(o_{n+1}), ψ(l)) − S(φ(o_n), ψ(l)). If the difference form is used, how should the reward for the last step be calculated?
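To make the two candidate definitions in this question concrete, here is a small sketch that computes both from a list of per-frame similarity scores S(φ(o_n), ψ(l)). It does not settle which form the authors intend; the function names and the toy scores are illustrative assumptions, and no model call is involved.

```python
def direct_rewards(similarities):
    """r_n = S(phi(o_n), psi(l)): one reward per frame."""
    return list(similarities)


def difference_rewards(similarities):
    """r_n = S(phi(o_{n+1}), psi(l)) - S(phi(o_n), psi(l)): one reward per transition."""
    return [nxt - cur for cur, nxt in zip(similarities, similarities[1:])]


# Toy per-frame scores for a four-frame trajectory (made-up values).
scores = [0.0, 0.25, 0.5, 1.0]

per_frame = direct_rewards(scores)           # four rewards, one per frame
per_transition = difference_rewards(scores)  # three rewards, one per transition
```

A trajectory of N frames yields N direct rewards but only N − 1 difference rewards, because the last frame has no successor. That gap is exactly what the question about the final step refers to, and how it should be filled is not specified here.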
