[ICLR 2023] PyTorch implementation of PPGeo, a fully self-supervised driving policy pre-training framework that learns from unlabeled driving videos.

License: Apache License 2.0

Python 100.00%
end-to-end-autonomous-driving policy-learning self-supervised-learning

ppgeo's Introduction

PPGeo: Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling

This repository contains the PyTorch implementation of PPGeo from the paper Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling. PPGeo is a fully self-supervised driving policy pre-training framework that learns from unlabeled driving videos.

Pre-trained Models

Model                       Google Drive Link   BaiduYun Link
Visual Encoder (ResNet-34)  ckpt                ckpt (code: itqi)
DepthNet                    ckpt                ckpt (code: xvof)
PoseNet                     ckpt                ckpt (code: fp2n)

Get Started

  • Clone the repo and build the environment.
git clone https://github.com/OpenDriveLab/PPGeo.git
cd PPGeo
conda env create -f environment.yml --name PPGeo
conda activate PPGeo
  • Download the driving video dataset based on the instructions in ACO.

  • Make a symlink to the dataset root.

ln -s DATA_ROOT data
  • Preprocess the data.
python ytb_data_preprocess.py

Training

  • First stage training.
python train.py --id ppgeo_stage1_log --stage 1 --epochs 30
  • Second stage training.
python train.py --id ppgeo_stage2_log --stage 2 --epochs 20 --ckpt PATH_TO_STAGE1_CKPT
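
The two stages can also be chained from a small driver script. The following is a minimal sketch, not part of the repository: `build_train_cmd` and `run_both_stages` are hypothetical helpers, and only the flags shown in the commands above (`--id`, `--stage`, `--epochs`, `--ckpt`) are used. The stage 1 checkpoint path is left as an argument because where stage 1 writes its checkpoint is not stated here.

```python
import subprocess


def build_train_cmd(run_id, stage, epochs, ckpt=None):
    """Build the argument list for one pre-training stage (hypothetical helper)."""
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    if stage == 2 and ckpt is None:
        raise ValueError("stage 2 needs the stage 1 checkpoint via --ckpt")
    cmd = ["python", "train.py", "--id", run_id,
           "--stage", str(stage), "--epochs", str(epochs)]
    if ckpt is not None:
        cmd += ["--ckpt", ckpt]
    return cmd


def run_both_stages(stage1_ckpt):
    """Run stage 1, then stage 2 initialized from the given stage 1 checkpoint."""
    # check=True stops the sequence if stage 1 exits with an error.
    subprocess.run(build_train_cmd("ppgeo_stage1_log", 1, 30), check=True)
    subprocess.run(
        build_train_cmd("ppgeo_stage2_log", 2, 20, ckpt=stage1_ckpt), check=True)
```

Calling `run_both_stages("PATH_TO_STAGE1_CKPT")` from the repository root would issue the same two commands as above, with the placeholder replaced by the actual checkpoint path.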

Downstream Tasks

nuScenes Planning

  • Please download the nuScenes dataset first.
  • Make a symlink to the nuScenes dataset root.
cd nuscenes_planning
cd data
ln -s nuScenes_data_root nuscenes
cd ..
  • Train the planning model.
python train_planning.py --pretrained_ckpt PATH_TO_STAGE2_CKPT

Navigation & Navigation Dynamic & Reinforcement Learning

We use the DI-drive engine for IL data collection, IL training, IL evaluation, and PPO training, following ACO, with CARLA version 0.9.9.4. Some additional details can be found here.

Leaderboard Town05-long

We use the TCP codebase for training and evaluation with the default settings.

Citation

If you find our repo or our paper useful, please use the following citation:

  @inproceedings{wu2023PPGeo,
    title={Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling},
    author={Penghao Wu and Li Chen and Hongyang Li and Xiaosong Jia and Junchi Yan and Yu Qiao},
    booktitle={International Conference on Learning Representations},
    year={2023}
  }

License

All code within this repository is under Apache License 2.0.

Acknowledgement

Our code is based on monodepth2.

ppgeo's Issues

Why is only the first head used in pose computation?

Hi PPGeo Team,

The output of the Pose Decoder contains two heads for the axisangle and translation i.e. the shapes of the output are like [.., 2, ...]. But during the calculation of cam_T_cam, I see that only the first is ever used.

PPGeo/model.py

Lines 122 to 125 in bb37f52

outputs[("cam_T_cam", 0, -1)] = transformation_from_parameters(
    axisangle1[:, 0], translation1[:, 0], invert=True)
outputs[("cam_T_cam", 0, 1)] = transformation_from_parameters(
    axisangle2[:, 0], translation2[:, 0], invert=False)

Can you please clarify why the network predicts two heads when only one of them is used? Does the second head serve any particular purpose? According to the code, only the first head is ever used.

Make a symlink to the dataset root

Hi PPGeo team, this is an amazing project. I have a question: I downloaded the ACO data to my computer and tried to make a symlink to the dataset root, but I could not create it.
Can you tell me how to do this? (I am using an external hard drive.)

Thank you so much.
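
For reference, the `ln -s DATA_ROOT data` step from Get Started can also be done from Python. The sketch below is a hypothetical helper (`link_dataset` is not part of the repository) and uses temporary directories in place of the real drive and repository. One assumption worth checking with an external drive: the link itself is stored on the filesystem that holds the repository, so that filesystem has to support symbolic links. Drives formatted as FAT or exFAT typically do not, which matters only if the repository itself sits on such a drive.

```python
import os
import tempfile


def link_dataset(data_root, link_path):
    """Create link_path as a symlink to data_root, like `ln -s data_root link_path`."""
    data_root = os.path.abspath(data_root)
    if not os.path.isdir(data_root):
        raise FileNotFoundError(f"dataset root not found: {data_root}")
    if os.path.lexists(link_path):
        raise FileExistsError(f"{link_path} already exists")
    os.symlink(data_root, link_path, target_is_directory=True)
    return link_path


# Demo: temporary directories stand in for the external drive and the repo.
with tempfile.TemporaryDirectory() as tmp:
    fake_drive = os.path.join(tmp, "external_drive", "aco_data")
    fake_repo = os.path.join(tmp, "PPGeo")
    os.makedirs(fake_drive)
    os.makedirs(fake_repo)
    link = link_dataset(fake_drive, os.path.join(fake_repo, "data"))
    # The link exists and resolves to a directory.
    link_ok = os.path.islink(link) and os.path.isdir(link)
    print(link_ok)
```

Using an absolute path for the target avoids the common mistake of a relative target being resolved against the link's directory rather than the current working directory.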

About Eigen-Cam in Fig4

Hi! I am very interested in your work. Which model (after the two-stage pre-training, or the fine-tuned model?) and which layer did you use for the visualization? Could you please provide more details?

Thanks!

Length of `video_idx` not matching with the length of the other lists

Hi, I was going through the data processing script and had a question regarding the lengths of the lists being created.

In these lines, we see that the length of the lists being created is equal to len(tmp_path) - 4

prev_path += tmp_path[:-4]
cur_path += tmp_path[2:-2]
next_path += tmp_path[4:]

But here len(video_idx) is equal to len(tmp_path) - 1

video_idx += [video_i] * len(tmp_path[:-1])

Wouldn't this create a mismatch between some of the frames and their corresponding video indexes here?

ytb_meta = {"cur_path":cur_path, "next_path":next_path, "prev_path":prev_path, "video_idx":video_idx}
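
The mismatch can be reproduced without the dataset. The sketch below mirrors the slicing quoted above on made-up frame lists (the file names and the helper `build_meta` are hypothetical) and compares the resulting lengths. The `fix` flag shows one possible alignment, repeating the video index once per `cur_path` entry; whether that is the intended behavior is for the maintainers to confirm.

```python
def build_meta(videos, fix=False):
    """Rebuild the four lists from per-video frame lists (hypothetical helper)."""
    prev_path, cur_path, next_path, video_idx = [], [], [], []
    for video_i, tmp_path in enumerate(videos):
        # Same slicing as the lines quoted from the preprocessing script.
        prev_path += tmp_path[:-4]
        cur_path += tmp_path[2:-2]
        next_path += tmp_path[4:]
        if fix:
            # One index per (prev, cur, next) triplet.
            video_idx += [video_i] * len(tmp_path[2:-2])
        else:
            # Behavior as quoted: one index per frame except the last.
            video_idx += [video_i] * len(tmp_path[:-1])
    return {"cur_path": cur_path, "next_path": next_path,
            "prev_path": prev_path, "video_idx": video_idx}


# Two made-up videos with 10 and 8 frames.
videos = [[f"video{v}/frame{i:04d}.jpg" for i in range(n)]
          for v, n in enumerate([10, 8])]

original = {k: len(v) for k, v in build_meta(videos).items()}
aligned = {k: len(v) for k, v in build_meta(videos, fix=True).items()}
print(original)
print(aligned)
```

With the quoted logic the path lists and `video_idx` differ in length, so from the second video onward a position in `cur_path` no longer lines up with the same position in `video_idx`.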

Pose decoder weights not included in checkpoint

Hi,

I tried to load DepthNet and PoseNet and found that the pose decoder weights are not included in the checkpoint files.

I printed the keys of each checkpoint (ppgeo_depth.ckpt and ppgeo_pose.ckpt).

How should I load the pretrained weights?

Thanks

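Question about the encoders

In the experiments, the depth and pose encoders used in the first stage have a ResNet structure, and the motionet encoder in the second stage is also a ResNet. Are these two choices related? My understanding is that the visual encoder in the second stage can be chosen according to one's own needs, is that right? Thanks.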

Collision Rate and L2 for nuScenes Planning

Hi, I had two questions regarding the nuScenes Planning task:

  1. For the results reported in the paper (Table 2), how many epochs was the model trained for?
  2. The table also shows a Collision Rate (%) metric which seems to be missing from the code in the repository. Will it be possible for you to share how the collision rate was calculated?
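
For readers unfamiliar with the L2 metric, a common definition is the Euclidean distance between predicted and ground-truth waypoints, averaged over the planning horizon. The sketch below follows that general definition only: it is not taken from the repository, the function name and the sample waypoints are made up, and it does not attempt to reproduce the collision-rate computation asked about above.

```python
import math


def mean_l2_error(pred, gt):
    """Average Euclidean distance between paired (x, y) waypoints."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    total = sum(math.hypot(px - gx, py - gy)
                for (px, py), (gx, gy) in zip(pred, gt))
    return total / len(pred)


# Made-up three-step trajectory in metres.
pred = [(0.0, 1.0), (0.0, 2.0), (3.0, 7.0)]
gt = [(0.0, 1.0), (0.0, 2.5), (0.0, 3.0)]
l2 = mean_l2_error(pred, gt)
print(l2)
```

Published numbers may average over different horizons or report the error at fixed time steps, so values from this sketch are not directly comparable to Table 2.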

About Training Time and GPU

Could you please tell me how long pre-training took for stage 1 and stage 2 respectively, and which GPUs and how many of them you used? I need to pre-train on my own datasets, so I have to estimate the time required.

Dataset sampling for Town05-long (TCP) downstream task

Hi,
I had a couple of questions about the training data and simulation evaluation for the TCP downstream task.

  1. In the TCP paper, the dataset size is reported as 189k for the ablations and 420k for the leaderboard submission. For the PPGeo Town05-long downstream evaluation, however, 40k samples are used for training. What is the strategy for down-sampling from 189k to 40k? It would also be great if you could release the scripts for this.

  2. For the evaluation, are there any scripts, e.g. in the TCP repo, that can be used to obtain the driving score, infraction score, route completion, pedestrian collisions, etc.?

Training of the fc layer in resnet34 visual encoder

I was wondering whether the fully connected layer in the visual encoder is trained or frozen, because the fully connected layer weights in the provided checkpoint exactly match those of the ImageNet-pretrained ResNet-34 encoder.

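import torch
from torchvision.models import resnet34

# ImageNet-pretrained reference model
resnet_imagenet = resnet34(pretrained=True)
# Load on CPU so both tensors are on the same device
ppgeo_ckpt = torch.load('resnet34.ckpt', map_location='cpu')['state_dict']

# Element-wise comparison of the fc weights
torch.all(ppgeo_ckpt['fc.weight'] == resnet_imagenet.fc.weight)  # returns True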

Thanks!
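
The same check can be extended to every shared parameter to see which ones are still identical to the ImageNet initialization. The following is a minimal sketch: `unchanged_keys` is a hypothetical helper that takes two mappings from parameter names to values plus an equality function. The demo uses plain lists with made-up values so it runs without PyTorch.

```python
def unchanged_keys(ckpt_a, ckpt_b, equal=lambda x, y: x == y):
    """Return the sorted parameter names whose values are equal in both mappings."""
    shared = sorted(set(ckpt_a) & set(ckpt_b))
    return [name for name in shared if equal(ckpt_a[name], ckpt_b[name])]


# Made-up stand-ins for two state dicts.
imagenet_like = {"conv1.weight": [0.1, 0.2],
                 "fc.weight": [0.5, 0.6],
                 "fc.bias": [0.0]}
pretrained_like = {"conv1.weight": [0.3, 0.1],
                   "fc.weight": [0.5, 0.6],
                   "fc.bias": [0.0]}

same = unchanged_keys(imagenet_like, pretrained_like)
print(same)
```

With real checkpoints, one would call `unchanged_keys(ppgeo_ckpt, resnet_imagenet.state_dict(), equal=torch.equal)`. If only the `fc.*` entries come back, that would be consistent with the layer receiving no gradient during pre-training, though this is a guess that the maintainers would need to confirm.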
