
poet's Introduction

PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation

Introduction


This repository is the official implementation of the paper PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation.

PoET is a transformer-based framework that takes a single RGB image as input and simultaneously estimates the 6D pose, i.e. translation and rotation, for every object present in the image. It takes the detections and feature maps of an object detector backbone and feeds this additional information into an attention-based transformer. Our framework can be trained on top of any object detector. Additional information that is not contained in the raw RGB image, e.g. depth maps or 3D models, is not required. We achieve state-of-the-art results on challenging 6D object pose estimation datasets. Moreover, PoET can be utilized as a pose sensor in 6D localization tasks.

[Figure: PoET network architecture]

Abstract: Accurate 6D object pose estimation is an important task for a variety of robotic applications such as grasping or localization. It is a challenging task due to object symmetries, clutter, occlusion and different scenes, but it becomes even more challenging when additional information, such as depth and 3D models, is not provided. We present a transformer-based approach that takes an RGB image as input and predicts a 6D pose for each object in the image. Besides the image, our network does not require any additional information such as depth maps or 3D object models. First, the image is passed through an object detector to generate feature maps and to detect objects. Second, these feature maps are fed into a transformer while the detected bounding boxes are provided as additional information. Afterwards, the output object queries are processed by a separate translation and rotation head. We achieve state-of-the-art results for RGB-only approaches on the challenging YCB-V dataset. We illustrate the suitability of the resulting model as pose sensor for a 6-DoF state estimation task.

License

This software is made available to the public as source-available code, licensed under the terms of the BSD-2-Clause license with no commercial use allowed; the full terms are available in the LICENSE file. No license in patents is granted.

Citing PoET

If you use PoET for academic research, please cite the corresponding paper and consult the LICENSE file for a detailed explanation.

@inproceedings{jantos2023poet,
  title={PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation},
  author={Jantos, Thomas Georg and Hamdad, Mohamed Amin and Granig, Wolfgang and Weiss, Stephan and Steinbrener, Jan},
  booktitle={Conference on Robot Learning},
  pages={1060--1070},
  year={2023},
  organization={PMLR}
}

Getting Started

Requirements

PoET was tested with the following setup:

  • Ubuntu 20.04
  • CUDA 11.4
  • Python 3.8.8
  • PyTorch 1.9
  • other standard packages: numpy, scipy, cv2, cython
  • other non-standard packages: mish-cuda, deformable_attention

Docker

We recommend running PoET inside a Docker container. We provide a prebuilt and tested Docker image with all the required packages. The Docker image can be pulled with the following command:

docker pull aaucns/poet:latest

PoET can then be run inside the Docker container together with the desired command-line arguments. An example is:

docker run --entrypoint= -v /path/to/code/poet:/opt/project -v /path/to/data:/data -v /path/to/output:/output --rm --gpus all aaucns/poet:latest python -u ../opt/project/main.py --epochs 50 --batch_size 16 --enc_layers 5 --dec_layers 5 --n_heads 16

Evaluation & Inference

The code also allows you to evaluate a pre-trained PoET model on a pose dataset containing ground-truth information, or to perform inference on a custom dataset for which only images are available.

In evaluation mode, PoET processes all images, predicts the 6D pose for all detected objects and calculates the evaluation metrics described in the paper based on the provided ground truth. The --eval parameter evaluates the ADD, ADD-S & ADD(-S) metrics and calculates the translation and rotation errors. Alternatively, --eval_bop stores the results of PoET in BOP format so that the BOP toolbox can be used to evaluate the metrics of the BOP Challenge. To run PoET in evaluation mode in the Docker container:

docker run --entrypoint= -v /path/to/code/poet:/opt/project -v /path/to/data:/data -v /path/to/output:/output --rm --gpus all aaucns/poet:latest python -u ../opt/project/main.py --eval_batch_size 16 --enc_layers 5 --dec_layers 5 --n_heads 16 --resume /path/to/model/checkpoint0049.pth --eval

Please remember to set the --eval_set parameter correctly.
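
For reference, the ADD and ADD(-S) metrics reported by --eval can be sketched as follows. This is a minimal NumPy/SciPy illustration of the metric definitions, not the repository's own evaluation code; model_points, R_gt, t_gt, R_est and t_est are assumed inputs:

import numpy as np
from scipy.spatial import cKDTree

def add_metric(model_points, R_gt, t_gt, R_est, t_est):
    # Average distance between the model points transformed by the
    # ground-truth pose and by the estimated pose.
    pts_gt = model_points @ R_gt.T + t_gt
    pts_est = model_points @ R_est.T + t_est
    return np.linalg.norm(pts_gt - pts_est, axis=1).mean()

def adds_metric(model_points, R_gt, t_gt, R_est, t_est):
    # Symmetric variant: distance of each ground-truth point to the
    # closest point of the estimated model.
    pts_gt = model_points @ R_gt.T + t_gt
    pts_est = model_points @ R_est.T + t_est
    dists, _ = cKDTree(pts_est).query(pts_gt, k=1)
    return dists.mean()

A pose is then typically counted as correct if the (symmetric) average distance falls below a threshold, e.g. 10% of the object diameter.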

In many cases we want to perform inference with PoET on data that has no ground-truth annotations. For this we provide our inference_tools. Currently they contain a simple script that loads a custom dataset, processes every image and stores the 6D pose predictions in a JSON file. Inference mode is activated with the --inference flag and the corresponding parameters. To run PoET in inference mode in the Docker container:

docker run --entrypoint= -v /path/to/code/poet:/opt/project -v /path/to/data:/data -v /path/to/output:/output --rm --gpus all aaucns/poet:latest python -u ../opt/project/main.py --enc_layers 5 --dec_layers 5 --n_heads 16 --resume /path/to/model/checkpoint0049.pth --inference --inference_path /path/to/inference/data --inference_output /path/to/output/dir
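
The stored predictions can then be used downstream, e.g. as a pose sensor. As a minimal sketch, independent of the exact JSON schema written by the inference script, a predicted 3x3 rotation matrix and translation vector can be combined into a homogeneous 4x4 transform:

import numpy as np

def pose_to_matrix(rotation, translation):
    # Build a 4x4 camera-from-object transform from R (3x3) and t (3,).
    T = np.eye(4)
    T[:3, :3] = np.asarray(rotation, dtype=np.float64)
    T[:3, 3] = np.asarray(translation, dtype=np.float64)
    return T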

Distributed Training

If you have multiple GPUs, PoET can be trained in a distributed fashion using the launch_distributed.py script. To launch distributed training, run

python launch_distributed.py --train_arg_1 --train_arg_2

For example, if you run single-GPU training using

python main.py --epochs 100 --resume output/checkpoint.pth --num_workers 6

then the distributed equivalent would be

python launch_distributed.py --epochs 100 --resume output/checkpoint.pth --num_workers 6

Please check the runtime arguments in the launch_distributed.py and main.py scripts and adapt them to your setup (e.g. the number of GPUs). Distributed training also works in the provided Docker container; however, it requires the additional --ipc=host runtime argument:

docker run --entrypoint= -v /path/to/code/poet:/opt/project -v /path/to/data:/data -v /path/to/output:/output --rm --ipc=host --gpus all aaucns/poet:latest python -u ../opt/project/launch_distributed.py --train_arg_1 --train_arg_2

Scaled-YOLOv4 Backbone

This repository allows the user to run PoET with a Mask R-CNN object detector backbone. However, if you would like to reproduce the state-of-the-art results presented in our paper, you can download our wrapper for the Scaled-YOLOv4 object detector backbone from our GitHub (license: GPL-3.0). The backbone can be integrated into PoET by simply placing the code into the models directory.
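
The exact interface expected by PoET is defined in models/backbone.py. Conceptually, a backbone wrapper has to return the detector's feature maps together with the per-image detections (bounding boxes and classes) that are fed into the transformer. A hypothetical skeleton, purely illustrative and not the repository's actual API, might look like:

import torch
import torch.nn as nn

class BackboneWrapper(nn.Module):
    # Illustrative only: wraps an object detector so that PoET receives
    # feature maps plus per-image detections.
    def __init__(self, detector):
        super().__init__()
        self.detector = detector  # e.g. Scaled-YOLOv4 or Mask R-CNN

    @torch.no_grad()
    def forward(self, images):
        # The wrapped detector is assumed to expose its intermediate feature
        # maps and final detections; adapt this to the detector's real API.
        features, detections = self.detector(images)
        return features, detections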

Model Zoo

Pretrained models and corresponding hyperparameter configurations can be downloaded from our website.

BOP Datasets

For both the YCB-V and LM-O datasets, we use the data as provided on the BOP Challenge webpage. However, we take the annotations provided by the BOP Challenge and transform them into a format inspired by the COCO annotation format. The corresponding conversion script can be found in data_utils. The general format of the annotation.json file is:

  • images : list of images in the dataset with the following information

    • file_name : path to file
    • id : unique image ID
    • width : width of the image
    • height : height of the image
    • intrinsics : array containing the camera intrinsics (3x3 matrix)
    • type : indicator whether the image is real ("real"), synthetically generated by projecting the 3D model ("synth") or generated by photo-realistic simulation ("pbr").
  • categories : list of categories where each entry contains the name and the ID of the class. Note: class 0 is the background class.

  • annotations : list of all annotated objects across all images

    • id : unique annotation ID
    • image_id : refers to the image this annotation belongs to
    • relative_pose
      • position : relative translation of the object with respect to the camera (x, y, z)
      • rotation : relative rotation of the object with respect to the camera (3x3 rotation matrix)
    • bbox : upper left corner (x1, y1) and the width and height of the bounding box in absolute pixel values
    • category_id : ID of the category the object belongs to.

We use a dataloader that requires this specific data structure, so it might be necessary to adapt the dataloader for differently structured datasets.
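
As a concrete illustration of this structure, the following sketch writes a minimal annotations.json with one image and one object. The field values (paths, intrinsics, class name) are dummies, and the exact nesting of the intrinsics may differ from the converter's output:

import json

annotations = {
    'images': [{
        'file_name': 'train_real/000001/rgb/000001.png',  # dummy path
        'id': 0,
        'width': 640,
        'height': 480,
        'intrinsics': [[1066.8, 0.0, 313.0],
                       [0.0, 1067.5, 241.3],
                       [0.0, 0.0, 1.0]],
        'type': 'real',
    }],
    'categories': [{'id': 0, 'name': 'background'},
                   {'id': 1, 'name': '002_master_chef_can'}],
    'annotations': [{
        'id': 0,
        'image_id': 0,
        'relative_pose': {
            'position': [0.05, -0.02, 0.90],
            'rotation': [[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0]],
        },
        'bbox': [220, 150, 80, 120],  # x1, y1, width, height in pixels
        'category_id': 1,
    }],
}

with open('annotations.json', 'w') as f:
    json.dump(annotations, f, indent=2)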

poet's People

Contributors

3bsamad, cptcaptain, tgjantos


poet's Issues

fail to pull docker images

Hello,
I get the following error while pulling the provided Docker image:

    Error response from daemon: Get https://gitlab.aau.at:5050/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Could you please provide another Docker image link?
Thanks a lot!

Distributed Training in Docker

First of all, thanks for the great work!

I am using the provided docker image, and currently I am trying to run distributed training, since training on only one GPU is slow. I have 3 GTX 1080s (IDs 0,1,2).
I added the following args to get_args_parser() in main.py since I couldn't find args.distributed:

# * Distributed training parameters
parser.add_argument('--distributed', action='store_true', default=False, help='Use multi-processing distributed training to launch ')
parser.add_argument('--world_size', default=3, type=int, help='number of distributed processes/ GPUs to use')
parser.add_argument('--dist_url', default='env://', help='url used to set up distributed training')
parser.add_argument('--dist_backend', default='nccl', type=str, help='distributed backend') 
parser.add_argument('--local_rank', default=0, type=int, help='rank of the process')     
parser.add_argument('--gpu', default=0, type=int, help='rank of the process') 

Then, in util/misc.py, in init_distributed_mode(args) I added the following:

if 'LOCAL_RANK' not in os.environ:
    os.environ['LOCAL_RANK'] = str(args.local_rank)
if 'RANK' not in os.environ:
    os.environ['RANK'] = str(args.gpu)
if 'WORLD_SIZE' not in os.environ:
    os.environ['WORLD_SIZE'] = str(args.world_size)
if 'MASTER_ADDR' not in os.environ:
    os.environ['MASTER_ADDR'] = '192.168.179.13'
if 'MASTER_PORT' not in os.environ:
    os.environ['MASTER_PORT'] = '8888'

Everything works fine when I start training up until this point:

torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
                                         world_size=args.world_size, rank=args.gpu)

This is where it gets stuck, showing the following in the terminal until I kill the process:

| distributed init (rank 0): env://

I have tried distributed training in docker before, using this simple example script:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
 
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc = nn.Linear(784, 5000)
        self.fc2 = nn.Linear(5000, 10)
    
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        x = self.fc2(x)
        return x
 
def train(rank, num_gpus):
    # "nccl", "gloo"
    dist.init_process_group(
        backend="nccl", init_method="env://", world_size=num_gpus, rank=rank
    )
    torch.cuda.set_device(rank)
    
    model = SimpleNet().to(rank)
    ddp_model = DistributedDataParallel(model, device_ids=[rank])
    print("Rank ", rank, ", Model Created")
    transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))]
    )
    train_set = datasets.MNIST("./data", download=True, train=True, transform=transform)
    train_sampler = DistributedSampler(
    dataset=train_set, num_replicas=num_gpus, rank=rank
    )
    train_loader = DataLoader(
    dataset=train_set,
    batch_size=4,
    shuffle=False,
    num_workers=0,
    pin_memory=False,
    sampler=train_sampler,
    )
    
    criterion = nn.CrossEntropyLoss().to(rank)
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.01)
    
    for epoch in range(1000):
        running_loss = 0.0
        for inputs, labels in train_loader:
            inputs = inputs.to(rank)
            labels = labels.to(rank)
            optimizer.zero_grad()
            outputs = ddp_model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        
        print("Rank ", rank, ", Epoch ", epoch, ", Loss: ", running_loss)
        
def main():
    num_gpus = 3
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "16855"
    mp.spawn(train, args=(num_gpus,), nprocs=num_gpus, join=True)

if __name__ == "__main__":
    main()

I am a bit new to implementing distributed training, and was wondering what might be wrong/missing here. Any help would be appreciated!

YCB-Video dataset link

Hi, I was trying to train the YOLO model (https://github.com/aau-cns/yolov4) on the YCB-V dataset (in order to run some tests with the PoET model). Could you link the YCB-V version you used for training that YOLO model? I noticed that in datasets.py you look for label and image files that are not present in the standard YCB dataset. Since my final goal is to train both YOLO and PoET on a custom dataset, I'm interested in understanding how to train both, since, as I read in other issues, the backbone is currently not trainable directly in PoET.

Moreover, is there a way to use the pre-trained .pth model you provided? I see in yolov4/train.py that I can load model weights, but the expected file extension is .pt and not .pth (sorry if this sounds like a silly question, but I'm quite new to deep learning).

Best regards, Daniele

Question for the ycbv2poet.py

The ycbv2poet.py file seems so weird.

base_path = '/data'
data_paths = ['test/', 'train_real/', 'train_synt/']
img_types = ['real', 'synt', 'test']


output_base_path = '/data/ycbv/annotations/'
annotation_paths = ['train_real.json', 'train_synt.json']
...
for data_path, ann_path, img_type in zip(data_paths, annotation_paths, img_types):

It looks like you only create two files, train_real.json and train_synt.json, with test/ written into train_real.json and train_real/ written into train_synt.json.

Besides, it seems like the test/ data was also written into train_synt.json, since you didn't reinitialize "annotations".

This is my modification of the file, but I'm not sure if the output is the same as yours.

Could you please update the file you used for model training? Thanks a lot!

...
base_path = 'path/to/ycbv'
data_paths = ['test/', 'train_real/', 'train_synt/']
img_types = ['test','real', 'synt']


output_base_path = 'annotations/'
annotation_paths = ['test.json','train_real.json', 'train_synt.json']

...

annotations = {'images': [],
               'categories': categories,
               'annotations': []}
image_id = 0
annotation_id = 0
annotations_removed = 0
for data_path, ann_path, img_type in zip(data_paths, annotation_paths, img_types):
    annotations = {'images': [],  # reinitialize annotations
                   'categories': categories,
                   'annotations': []}
    image_id = 0
    annotation_id = 0
    annotations_removed = 0
    print(data_path,ann_path, img_type)
    print("Annotating: {}".format(data_path))
   ...
end of file

Provide Dockerfile

I wanted to ask if it would be possible to add the Dockerfile to this repository to better track how it is built and make adjustments if necessary.

Train using custom dataset

Thank you for providing this excellent project! I would like to train on my custom dataset. In the backbone.py file, I noticed that setting self[0].train_backbone to True results in a NotImplementedError. Does this mean that the backbone is currently not trainable? If I want to use Mask R-CNN as the backbone for training on my custom dataset, what steps should I follow?

How to run PoET in inference mode.

I get the following error while running inference in the Docker container:

Traceback (most recent call last):
  File "../opt/project/main.py", line 372, in <module>
    inference(args)
  File "/opt/project/inference_tools/inference_engine.py", line 31, in inference
    model, criterion, matcher = build_model(args)
  File "/opt/project/models/__init__.py", line 11, in build_model
    return build(args)
  File "/opt/project/models/pose_estimation_transformer.py", line 594, in build
    backbone = build_backbone(args)
  File "/opt/project/models/backbone.py", line 61, in build_backbone
    raise NotImplementedError
NotImplementedError

Loss is NaN in training

Hi, I was trying to train PoET on my custom dataset. I converted it using your script ycbv2poet.py, since my custom dataset is basically built to emulate YCB-V but with other objects (same number of classes). I also trained the backbone and it works correctly. However, when I try to run the training for PoET (using GT bboxes), it stops immediately because the loss is NaN. I tried to understand what the problem is, and I am not sure whether either of the two points I list below could be the cause:

  1. Before stopping, the script prints the following warning (I used the Docker image, so I didn't touch the dependencies):
/opt/conda/lib/python3.8/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2156.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/opt/project/models/position_encoding.py:53: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)
  2. I noticed that for some images the targets returned by the PoseDataset class (in __getitem__) are empty:
{'boxes': tensor([], size=(0, 4)), 'labels': tensor([], dtype=torch.int64), 'image_id': tensor([202]), 'relative_position': tensor([], size=(0, 3)), 'relative_quaternions': tensor([], size=(0, 4)), 'relative_rotation': tensor([], size=(0, 3, 3)), 'intrinsics': tensor([], size=(0, 9)), 'area': tensor([]), 'iscrowd': tensor([], dtype=torch.int64), 'orig_size': tensor([480, 640]), 'size': tensor([480, 640])}

Actually, I think this is the most probable cause of the loss being NaN, since I added a print and it always crashes when the targets given as input to the forward function look like that. I double-checked the labels in the original dataset and all this information is present in the JSON, so something appears to be broken in the ycbv2poet.py script, as this information is replaced with empty lists.

Best, Daniele

Background Data Missing

Thank you for providing the code! As shown below, I encountered an issue when training the model with LM-O data.

Traceback (most recent call last):
  File "/home/zhongry/code/poet/main.py", line 394, in <module>
    main(args)
  File "/home/zhongry/code/poet/main.py", line 338, in main
    train_stats = train_one_epoch(
  File "/home/zhongry/code/poet/engine.py", line 49, in train_one_epoch
    prefetcher = data_prefetcher(data_loader, device, prefetch=True)
  File "/home/zhongry/code/poet/data_utils/data_prefetcher.py", line 29, in __init__
    self.preload()
  File "/home/zhongry/code/poet/data_utils/data_prefetcher.py", line 33, in preload
    self.next_samples, self.next_targets = next(self.loader)
  File "/home/zhongry/anaconda3/envs/ffbpy38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/zhongry/anaconda3/envs/ffbpy38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/zhongry/anaconda3/envs/ffbpy38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/zhongry/anaconda3/envs/ffbpy38/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/zhongry/anaconda3/envs/ffbpy38/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zhongry/anaconda3/envs/ffbpy38/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhongry/anaconda3/envs/ffbpy38/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhongry/code/poet/data_utils/pose_dataset.py", line 62, in __getitem__
    img, target = super(PoseDataset, self).__getitem__(idx)
  File "/home/zhongry/code/poet/data_utils/torchvision_datasets/coco.py", line 136, in __getitem__
    background_img = self.get_background(img.size)
  File "/home/zhongry/code/poet/data_utils/torchvision_datasets/coco.py", line 84, in get_background
    n_background_images = len(self.synthetic_background)
TypeError: object of type 'NoneType' has no len()

I believe the issue arises from the background data not being loaded properly, since the default value of --synt_background in main.py is None:
parser.add_argument('--synt_background', default=None, type=str, help="Directory containing the background images from which to sample")
I noticed in the documentation that class 0 is referred to as the background class, but I'm unsure whether this refers to the background image data itself. How should the path for the background data be configured? Or could you please clarify whether the background data needs to be downloaded separately?

Looking forward to your response. Thank you very much.

Question about two missing parameter files in main.py

main.py


line 138: '--class_info', 'annotations/classes.json'

line 142: '--model_symmetry', '/annotations/symmetries.json'

I have tried to find these two missing .json files but failed. Could you please provide them, or the method to generate them? Thanks a lot!

Evaluation results very bad

Hi, I'm trying to train PoET on my custom dataset. For the sake of clarity: the dataset is composed of 30 videos in which there is only one object, which is always approximately in the center of the image and never occluded. As a first step I trained YOLO and it works very well. However, the PoET transformer is not learning at all during training, whether I use the backbone or the ground-truth labels. If I compare the evaluation metrics between the 5th and the 50th epoch, the results are almost the same (in some cases even worse). Looking at the losses, they do decrease over the epochs, yet the results are still very bad. Have you ever experienced something like this?

PS: As hyperparameters I'm using the ones you provided for the YCB-Video dataset, as my dataset is built with custom objects but a similar configuration.

These are the losses:

{"train_lr": 0.0001999999999999853, "train_grad_norm": 71.56180365573567, "train_position_loss": 0.0004068389578922774, "train_rotation_loss": 0.7290082707034702, "train_loss": 3.76163222109715, "train_loss_trans": 0.0008136779157845548, "train_loss_rot": 0.7290082707034702, "train_loss_trans_0": 0.0006544853708065885, "train_loss_rot_0": 0.7750758395832922, "train_loss_trans_1": 0.0008348826280783751, "train_loss_rot_1": 0.7659609740001402, "train_loss_trans_2": 0.0009145029361920533, "train_loss_rot_2": 0.74827361520032, "train_loss_trans_3": 0.000919925991882703, "train_loss_rot_3": 0.7391760488487178, "train_loss_trans_unscaled": 0.0004068389578922774, "train_loss_rot_unscaled": 0.7290082707034702, "train_loss_trans_0_unscaled": 0.00032724268540329423, "train_loss_rot_0_unscaled": 0.7750758395832922, "train_loss_trans_1_unscaled": 0.00041744131403918756, "train_loss_rot_1_unscaled": 0.7659609740001402, "train_loss_trans_2_unscaled": 0.00045725146809602667, "train_loss_rot_2_unscaled": 0.74827361520032, "train_loss_trans_3_unscaled": 0.0004599629959413515, "train_loss_rot_3_unscaled": 0.7391760488487178, "epoch": 10, "n_parameters": 14047113}

{"train_lr": 0.0001999999999999853, "train_grad_norm": 49.230578642773246, "train_position_loss": 0.0002832899802765488, "train_rotation_loss": 0.3247749454875207, "train_loss": 1.772507746909583, "train_loss_trans": 0.0005665799605530976, "train_loss_rot": 0.3247749454875207, "train_loss_trans_0": 0.00048024131988974757, "train_loss_rot_0": 0.3864199985161464, "train_loss_trans_1": 0.0006219867309041784, "train_loss_rot_1": 0.3682782535760578, "train_loss_trans_2": 0.000699654193110162, "train_loss_rot_2": 0.3492575710207973, "train_loss_trans_3": 0.0006888218545745218, "train_loss_rot_3": 0.340719692684924, "train_loss_trans_unscaled": 0.0002832899802765488, "train_loss_rot_unscaled": 0.3247749454875207, "train_loss_trans_0_unscaled": 0.00024012065994487378, "train_loss_rot_0_unscaled": 0.3864199985161464, "train_loss_trans_1_unscaled": 0.0003109933654520892, "train_loss_rot_1_unscaled": 0.3682782535760578, "train_loss_trans_2_unscaled": 0.000349827096555081, "train_loss_rot_2_unscaled": 0.3492575710207973, "train_loss_trans_3_unscaled": 0.0003444109272872609, "train_loss_rot_3_unscaled": 0.340719692684924, "epoch": 40, "n_parameters": 14047113}

and these are the corresponding evaluations:
Epoch 10

* ---------------------------------------------------------------- *
Metric ADD(-S)
* ---------------------------------------------------------------- *
** Unnamed-DUMMY#1-1 ** threshold=[0.0, 0.10], area: 6.23
threshold=0.02, correct poses: 1.0, all poses: 3734.0, accuracy: 0.03
threshold=0.05, correct poses: 146.0, all poses: 3734.0, accuracy: 3.91
threshold=0.10, correct poses: 1271.0, all poses: 3734.0, accuracy: 34.04

* ---------------------------------------------------------------- *
Metric Average Rotation Error in Degrees
* ---------------------------------------------------------------- *
Class: Unnamed-DUMMY#1-1 134.05327252676838

Epoch 40

* ---------------------------------------------------------------- *
Metric ADD(-S)
* ---------------------------------------------------------------- *
** Unnamed-DUMMY#1-1 ** threshold=[0.0, 0.10], area: 5.83
threshold=0.02, correct poses: 16.0, all poses: 3734.0, accuracy: 0.43
threshold=0.05, correct poses: 135.0, all poses: 3734.0, accuracy: 3.62
threshold=0.10, correct poses: 1219.0, all poses: 3734.0, accuracy: 32.65

* ---------------------------------------------------------------- *
Metric Average Rotation Error in Degrees
* ---------------------------------------------------------------- *
Class: Unnamed-DUMMY#1-1 136.038801572181

Demo script for inference mode

Dear @tgjantos,

Congratulations on your work and thanks for releasing the code. I am trying to understand how I could use your work for my project, and from the documentation it is not clear to me how to use PoET for inference with the pre-trained models on images from my custom dataset.

For example, I have a set of images like the ones below (image omitted), and for each one of them I would be interested in extracting the camera pose using PoET. Could you please elaborate a bit on how this could be achieved? As I understand it, the commands you provide here are for training, if I am not wrong.

Thanks.

Questions about visualization or demo script

Dear authors,

Thank you for sharing this nice work and releasing the code as open source. I just wonder whether there are scripts for visualizing 6D poses with 3D bounding boxes or prediction masks, as shown in the paper (figure omitted).

Best regards

Details about training ycbv with yolo

Thanks for the nice work. Could you provide some details about training on YCB-V with real and synthetic data?

Did you first train on synthetic and then on real data, or on both together? And if so, did you adapt the annotation files and merge them?

About FasterRcnn detector

Thank you for your awesome work! I am building on it and noticed that you have updated your code for a Faster R-CNN detector. Could you please provide the checkpoint of your trained Faster R-CNN detector for the LM-O dataset, since I got poor results while testing the backbone mode with it? That would help me a lot!
Best wishes.

How to import the LM-O dataset

How do I import the LM-O dataset into the project? I downloaded the PBR dataset and models for LM-O from the BOP website, as well as the BOP19-23 test image set. How should I import them, and what should the project structure look like? Can you provide a sample for me to refer to?
