
Comments (34)

MartinSmeyer commented on June 2, 2024

Hi @AminSeffo,

Sounds good! :)

  1. The training config will be automatically retrieved via the exp_group/my_autoencoder that you define in the m3_template.cfg, in particular here, where you define the mapping from class to the group and name of your AAE model:

    class_2_encoder = {1:'bop_tudl/obj_000001'}

    It's likely you can keep the default parameters; just make sure your input images are in BGR format. The mask_rcnn parameters are irrelevant because you are using your own detector.

  2. You can simply convert them to a binary mask and apply the mask to the image before cropping and prediction, as done here:

    img_masked = img * inst_mask[..., None].astype(np.uint8)

    I once wrote a converter from RLE to binary masks (both steps are combined in the sketch below):
    https://github.com/thodan/bop_toolkit/blob/af380d7a028b5c44903913e39d652c83a4bc2bdd/bop_toolkit_lib/pycoco_utils.py#L202
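
Putting the two steps together, a minimal sketch (assuming pycocotools is installed and rle is a COCO-style RLE dict with 'size' and 'counts'; the names are illustrative):

import numpy as np
from pycocotools import mask as mask_utils

def apply_rle_mask(img, rle):
    inst_mask = mask_utils.decode(rle)  # HxW binary mask (uint8, values 0/1)
    # zero out everything outside the instance before cropping and prediction
    return img * inst_mask[..., None].astype(np.uint8)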

AminSeffo commented on June 2, 2024

Hi @MartinSmeyer,

Thanks again for your reply.
I changed the class_2_encoder parameter (see below) and defined it as class 1 in the Python script, but I still get this error: 1 not contained in config class_names dict_keys([1])

By the way, I trained the model following the AAE pipeline instructions, changing only the paths of the .ply model and the VOC background images.

Additional information

[methods]
object_detector = mask_rcnn
object_pose_estimator = auto_pose

[auto_pose]
gpu_memory_fraction = 0.5
color_format = bgr
color_data_type = np.float32
depth_data_type = np.float32
class_2_encoder = {1:"exp_group/my_autoencoder"}
camPose = False
upright = False
topk = 1
pose_visualization = False

[mask_rcnn]
path_to_masks =
inference_time = 0.15
# from test_m3.py
# gt boxes and classes (replace with your favorite detector)

classes =  [1]
bboxes = [[860, 511, 929, 667]]

MartinSmeyer commented on June 2, 2024

@AminSeffo So you did not rename your experiment group / name when training the AAE? Normally you would put some descriptive names there, but it shouldn't matter.

The problem might be that here the class key is transformed into a string:

bbs.append(BoundingBox(xmin=b[0]/w, xmax=(b[0]+b[2])/w , ymin=b[1]/h, ymax=(b[1]+b[3])/h, classes={str(c):1.0}))

Can you try to remove the str()?
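
That is, the same call with the cast removed:

bbs.append(BoundingBox(xmin=b[0]/w, xmax=(b[0]+b[2])/w, ymin=b[1]/h, ymax=(b[1]+b[3])/h, classes={c:1.0}))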

AminSeffo commented on June 2, 2024

Hello, could you tell me how to train the 2D detector? Thank you so much!

I used Detectron2 for that. Here is a Colab notebook where you can define a dataset and start the training.

AminSeffo commented on June 2, 2024

Hello, could you tell me how to train the 2D detector? Thank you so much!

I used Detectron2 for that. Here is a Colab notebook where you can define a dataset and start the training.

Hello, I want to know how to train the 2D detector with the T-LESS training dataset. I would appreciate a reply!

Hey, maybe I can help you with that, but could you please open a new issue for it?

AminSeffo commented on June 2, 2024

Hey @MartinSmeyer,
thank you again.
I removed the str() and it works, but I am not able to visualize the result with the -vis flag because of this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 0: invalid start byte
I saw issue #88; however, in my case the training already worked.

AminSeffo commented on June 2, 2024

@MartinSmeyer

Maybe the detailed traceback helps; this happens when running with -vis:

This message will be only logged once.
INFO - 2022-08-01 16:25:09,531 - acceleratesupport - OpenGL_accelerate module loaded
INFO - 2022-08-01 16:25:09,536 - arraydatatype - Using accelerated ArrayDatatype
using egl
('renderer', 'Model paths: ', ['/home/amin/autoencoder_ws/cad_model/nuss_model.ply'])
[0]
Traceback (most recent call last):
  File "/home/amin/6d_pose_estimation/test_m3.py", line 60, in <module>
    pose_visualizer.render_poses(img, camK, pose_ests, bbs)
  File "/home/amin/6d_pose_estimation/visualization/render_pose.py", line 31, in render_poses
    bgr, depth,_ = self.renderer.render_many(obj_ids = [self.classes.index(pose_est.name) for pose_est in pose_ests],
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/auto_pose/ae/utils.py", line 15, in decorator
    setattr(self, attribute, function(self))
  File "/home/amin/6d_pose_estimation/visualization/render_pose.py", line 25, in renderer
    vertex_scale=float(self.vertex_scale[0])) #1000 for models in meters
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/auto_pose/meshrenderer/meshrenderer.py", line 37, in __init__
    vert_norms = gu.geo.load_meshes(models_cad_files, vertex_tmp_store_folder, recalculate_normals=True)
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/auto_pose/meshrenderer/gl_utils/geometry.py", line 54, in load_meshes
    scene = pyassimp.load(model_path, pyassimp.postprocess.aiProcess_Triangulate)
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/pyassimp/core.py", line 315, in load
    scene = _init(model.contents)
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/pyassimp/core.py", line 211, in _init
    call_init(obj, target)
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/pyassimp/core.py", line 76, in call_init
    _init(obj.contents, obj, caller)
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/pyassimp/core.py", line 122, in _init
    target.name = str(obj.data.decode("utf-8"))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte

MartinSmeyer commented on June 2, 2024

There seem to be some special characters in your 3D model file. Please try to debug this yourself.
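
If it helps, a minimal sketch for locating the offending byte (assuming the model path from your traceback; a binary PLY only needs a UTF-8-clean header, so only the part before end_header is checked):

with open('/home/amin/autoencoder_ws/cad_model/nuss_model.ply', 'rb') as f:
    header = f.read().split(b'end_header')[0]
try:
    header.decode('utf-8')
except UnicodeDecodeError as e:
    # report the first byte that is not valid UTF-8 and its offset
    print('non-UTF-8 byte %s at header offset %d' % (hex(header[e.start]), e.start))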

AminSeffo commented on June 2, 2024

Hey @MartinSmeyer,
I checked that out: I replaced my 3D model with obj_30.ply from the BOP challenge and I still get the same error.
I will keep trying to fix it.
Thank you again.

MartinSmeyer commented on June 2, 2024

That's strange, I just tried it and it works for me, including T-LESS object 30:
[screenshot: Bildschirmfoto_2022-08-03_15-05-23]

Are you using the CAD models or the reconstructed ones? Could you try to change the model type to CAD in your training config before running the visualization?

[Dataset]
MODEL: cad

MartinSmeyer commented on June 2, 2024

Which pyassimp version are you using?

AminSeffo commented on June 2, 2024

Which pyassimp version are you using?

pyassimp: 3.3

MartinSmeyer commented on June 2, 2024

Try to update to

- pyassimp==4.1.3
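
For example:

pip install pyassimp==4.1.3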

AminSeffo commented on June 2, 2024

Try to update to

- pyassimp==4.1.3

I updated, but now I am getting this error :(

('renderer', 'Model paths: ', ['/home/amin/autoencoder_ws/cad_model/nuss_model.ply'])
[0]
100% |########################################|
Traceback (most recent call last):
  File "/home/amin/6d_pose_estimation/test_m3.py", line 64, in <module>
    pose_visualizer.render_poses(img, camK, pose_ests, bbs)
  File "/home/amin/6d_pose_estimation/visualization/render_pose.py", line 38, in render_poses
    far = 10000)
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/auto_pose/meshrenderer/meshrenderer.py", line 141, in render_many
    assert W <= Renderer.MAX_FBO_WIDTH and H <= Renderer.MAX_FBO_HEIGHT
AssertionError

AminSeffo commented on June 2, 2024

I think I have problems with my image dimensions... I will check it out and let you know.

AminSeffo commented on June 2, 2024

Hey @MartinSmeyer

I had problems with the dimensions of the bounding box from the 2D object detection; the rendering works now. Here is the output:

[rendered output: test_image]

I think I am facing a scaling issue. Here is my .ply model, my_model_aae.zip, which I scaled in Blender. I should have done everything correctly before training the autoencoder (please take a look at the .ply model if you have time).

Here is also a snapshot of my_autoencoder.cfg:

[Paths]
MODEL_PATH: /home/amin/autoencoder_ws/cad_model/my_model_aae.ply
BACKGROUND_IMAGES_GLOB: /home/amin/autoencoder_ws/VOCdevkit/VOC2012/JPEGImages/*.jpg


[Dataset]
MODEL: reconst
H: 128
W: 128
C: 3
RADIUS: 700

RENDER_DIMS: (720, 540)
K: [1075.65, 0, 720/2, 0, 1073.90, 540/2, 0, 0, 1]

#Azure Kinect parameters
#RENDER_DIMS: (720, 1280)
#K: [608.1231079101562, 0, 638.6071166992188, 0, 608.0382690429688, 368.2049560546875, 0, 0, 1]

# Scale vertices to mm
VERTEX_SCALE: 1
ANTIALIASING: 1
PAD_FACTOR: 1.
CLIP_NEAR: 10
CLIP_FAR: 10000
NOOF_TRAINING_IMGS: 20000
NOOF_BG_IMGS: 15000

[Augmentation]
REALISTIC_OCCLUSION: False
SQUARE_OCCLUSION: False
MAX_REL_OFFSET: 0.20
CODE: Sequential([
	#Sometimes(0.5, PerspectiveTransform(0.05)),
	#Sometimes(0.5, CropAndPad(percent=(-0.05, 0.1))),
	Sometimes(0.5, Affine(scale=(1.0, 1.2))),
	Sometimes(0.5, CoarseDropout( p=0.2, size_percent=0.05) ),
	Sometimes(0.5, GaussianBlur(1.2*np.random.rand())),
    Sometimes(0.5, Add((-25, 25), per_channel=0.3)),
    Sometimes(0.3, Invert(0.2, per_channel=True)),
    Sometimes(0.5, Multiply((0.6, 1.4), per_channel=0.5)),
    Sometimes(0.5, Multiply((0.6, 1.4))),
    Sometimes(0.5, ContrastNormalization((0.5, 2.2), per_channel=0.3))
	], random_order=False)

[Embedding]
EMBED_BB: True
MIN_N_VIEWS: 2562
NUM_CYCLO: 36

[Network]
BATCH_NORMALIZATION: False
AUXILIARY_MASK: False
VARIATIONAL: 0
LOSS: L2
BOOTSTRAP_RATIO: 4
NORM_REGULARIZE: 0
LATENT_SPACE_SIZE: 128
NUM_FILTER: [128, 256, 512, 512]
STRIDES: [2, 2, 2, 2]
KERNEL_SIZE_ENCODER: 5
KERNEL_SIZE_DECODER: 5


[Training]
OPTIMIZER: Adam
NUM_ITER: 30000
BATCH_SIZE: 64
LEARNING_RATE: 2e-4
SAVE_INTERVAL: 10000

[Queue]
# OPENGL_RENDER_QUEUE_SIZE: 500
NUM_THREADS: 10
QUEUE_SIZE: 50

And here is test_m3.py:

import cv2
import numpy as np
import os
import argparse
import object_detector
from auto_pose.m3_interface.m3_interfaces import BoundingBox
from auto_pose.m3_interface.ae_pose_estimator import AePoseEstimator
from webcam_video_stream import WebcamVideoStream


dir_name = os.path.dirname(os.path.abspath(__file__))
default_cfg = os.path.join(dir_name, '../cfg_m3vision/m3_template_pose.cfg')

parser = argparse.ArgumentParser()
parser.add_argument("--m3_config_path", type=str, default=default_cfg)
parser.add_argument("-vis", action='store_true', default=False)

args = parser.parse_args()

if os.environ.get('AE_WORKSPACE_PATH') == None:
    print('Please define a workspace path:\n')
    print('export AE_WORKSPACE_PATH=/path/to/workspace\n')
    exit(-1)

img = cv2.imread("/home/amin/6d_pose_estimation/image_test.png")
H,W,_ = img.shape


# Azure Kinect camera parameters
f_x=608.1231079101562
f_y=608.0382690429688
c_x=638.6071166992188
c_y=368.2049560546875
camK = np.array([f_x, 0., c_x, 0., f_y, c_y, 0., 0., 1.]).reshape(3, 3)


# gt boxes and classes (replace with your favorite detector)
classes = [1]

bboxes = [[723, 366, 89, 80]]
my_detector = object_detector.Detector()
nuss_detection = my_detector.prediction(img)
bbs = []
h, w = float(H), float(W)
for b, c in zip(bboxes, classes):
    bbs.append(BoundingBox(xmin=b[0]/w, xmax=(b[0]+b[2])/w, ymin=b[1]/h, ymax=(b[1]+b[3])/h, classes={(c):1.0}))
    # MultiPath Encoder Initialization
    aae_pose_estimator = AePoseEstimator("/home/amin/6d_pose_estimation/cfg_m3vision/m3_template_pose.cfg")
    # Predict 6-DoF poses
    pose_ests = aae_pose_estimator.process(bbs, img, camK)
    print(np.array([{p.name: p.trafo} for p in pose_ests]))
# Visualize
if args.vis:
    from visualization.render_pose import PoseVisualizer
    pose_visualizer = PoseVisualizer(aae_pose_estimator)
    pose_visualizer.render_poses(img, camK, pose_ests, bbs)

And finally, the m3_template used:

[methods]
object_detector = mask_rcnn
object_pose_estimator = auto_pose

[auto_pose]
gpu_memory_fraction = 0.5
color_format = bgr
color_data_type = np.float32
depth_data_type = np.float32
class_2_encoder = {1:"exp_group/my_autoencoder"}
camPose = False
upright = False
topk = 1
pose_visualization = False

[mask_rcnn]
path_to_masks =
inference_time = 0.15

AminSeffo commented on June 2, 2024

Hey @MartinSmeyer,
Do you have any suggested solutions?

MartinSmeyer commented on June 2, 2024

Hey,

#Azure Kinect parameters
#RENDER_DIMS: (720, 1280)
#K: [608.1231079101562, 0, 638.6071166992188, 0, 608.0382690429688, 368.2049560546875, 0, 0, 1]

It's best to use these parameters for training, but RENDER_DIMS is the wrong way around; it should be

RENDER_DIMS: (1280, 720)

The default PAD_FACTOR of 1.2 should not be changed.

The 3D model geometry seems okay at first glance. Try training again with the above parameters, and use the ae_train ... -d option to visualize the reconstruction targets before training.
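
Putting these corrections together, the relevant [Dataset] entries for the Azure Kinect (using the values you posted) would be:

RENDER_DIMS: (1280, 720)
K: [608.1231079101562, 0, 638.6071166992188, 0, 608.0382690429688, 368.2049560546875, 0, 0, 1]
PAD_FACTOR: 1.2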

AminSeffo commented on June 2, 2024

Thank you again @MartinSmeyer

I corrected the render dims and the K matrix as you suggested, but I am still getting the same visualization. Moreover, I centered the model using MeshLab; it now looks like this:
[Screenshot from 2022-08-16 10-36-08]

MartinSmeyer commented on June 2, 2024

Can you please post the image you get with ae_train ... -d?

AminSeffo commented on June 2, 2024

@MartinSmeyer Of course.
Here are the images generated with ae_train ... -d, using the model centered in MeshLab:

[Screenshot from 2022-08-16 10-52-33]
[Screenshot from 2022-08-16 10-52-19]

MartinSmeyer commented on June 2, 2024

Okay, although the 3D model is hollow and without texture, the size looks alright.

What is the pose that you print out?

MartinSmeyer commented on June 2, 2024

And did you retrain with the Azure Kinect camK and recreate the embedding?

MartinSmeyer commented on June 2, 2024

Shouldn't this classes={(c):1.0}) be this classes={str(c):1.0})?

AminSeffo commented on June 2, 2024

Sorry, I closed the issue by mistake.

AminSeffo commented on June 2, 2024

And did you retrain with the Azure Kinect camK and recreate the embedding?

Yes, I did.

AminSeffo commented on June 2, 2024

Shouldn't this classes={(c):1.0}) be this classes={str(c):1.0})?

With str I got some errors; we discussed that before: #113 (comment)

MartinSmeyer commented on June 2, 2024

With str I got some errors; we discussed that before: #113 (comment)

Ah yes. I would just also remove the parentheses.

AminSeffo commented on June 2, 2024

With str I got some errors; we discussed that before: #113 (comment)

Ah yes. I would just also remove the parentheses.

Oh okay, I removed them: bbs.append(BoundingBox(xmin=b[0]/w, xmax=(b[0]+b[2])/w, ymin=b[1]/h, ymax=(b[1]+b[3])/h, classes={c:1.0}))

AminSeffo commented on June 2, 2024

Okay, although the 3D model is hollow and without texture, the size looks alright.

What is the pose that you print out?

[Screenshot from 2022-08-16 11-20-58]

MartinSmeyer commented on June 2, 2024

Oh, it's in meters, although your 3D model is in mm. Try adding mm=True as an argument here:

pose_ests = aae_pose_estimator.process(bbs,img,camK)
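
That is, the call becomes:

pose_ests = aae_pose_estimator.process(bbs, img, camK, mm=True)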

AminSeffo commented on June 2, 2024

Oh, it's in meters, although your 3D model is in mm. Try adding mm=True as an argument here:

pose_ests = aae_pose_estimator.process(bbs,img,camK)

Okay, thanks, it looks better:
[Screenshot from 2022-08-16 11-35-04]

But where is the translation vector?

MartinSmeyer commented on June 2, 2024

It's a 4x4 homogeneous matrix. ;) t = [149.27, 45.84, 687.40] in mm
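
In code, the translation is the last column of the homogeneous matrix (a sketch, assuming trafo is the printed 4x4 numpy array):

t = pose_ests[0].trafo[:3, 3]  # [149.27, 45.84, 687.40] in mm; the rotation is trafo[:3, :3]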

Albertdalmen commented on June 2, 2024

Hi @AminSeffo, I'm glad that someone else is interested in using this as a real-time pose estimator. I'm currently trying to implement my own.

How were your results? I'm also curious why you chose Detectron2 instead of, for instance, Keras RetinaNet.

Is there, by any chance, a possibility that you could share your work/pipeline, or some indications of how you managed to make it work?

Thanks in advance.
