
Comments (10)

BeileiCui commented on May 28, 2024

Thanks a lot for your patience! I will try again accordingly and hopefully get better results!


anhquancao commented on May 28, 2024

Hi @BeileiCui,
Thanks for your interest. I answer your questions below:

  1. You don't need the registration if your poses are accurate. I only use registration to correct the transformation in KITTI since its poses are erroneous (see the sketch after this list).
  2. SceneRF doesn't require Velodyne; it reconstructs in the camera coordinates. T_velo_2_cam2 and T_cam0_2_cam2 are only used to compute the point cloud registration that fixes the transformation.
  3. We use these numbers following the SemanticKITTI SSC setting. You can select other numbers.
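To make point 1 concrete, here is a minimal sketch of that kind of pose correction via point cloud registration, assuming Open3D; the helper name and parameters are illustrative, not SceneRF's actual code:

import numpy as np
import open3d as o3d

def refine_relative_pose(source_pts, target_pts, T_init):
    # Refine an initial 4x4 relative transform by registering the source
    # point cloud onto the target point cloud with point-to-point ICP.
    source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(source_pts))
    target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(target_pts))
    result = o3d.pipelines.registration.registration_icp(
        source, target,
        max_correspondence_distance=0.5,  # tune to your scene scale
        init=T_init,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
    )
    return result.transformation  # corrected 4x4 transform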


BeileiCui commented on May 28, 2024

Hi, thanks for your reply! I just have another question. I have built up the training pipeline with SCARED now, but some of the training logs look a little odd compared to the logs you mentioned here. The settings are mostly the same as your original ones, except for details like the image size.
[training log screenshots]

Do you have any suggestions on what the problem might be and how I can resolve it?

Thanks for your time; I look forward to your reply!


anhquancao commented on May 28, 2024

Hi @BeileiCui,
It seems that your relative transformation is incorrect. I would suggest:

  1. Train without the reprojection loss (comment out this line) and render the RGB images to make sure everything other than the reprojection is correct.
  2. Check your transformation: pass ground-truth depth to this function and draw the projected image to see if you can recover the target image (a sketch follows this list). If the depth is sparse, as with LiDAR, you can draw it on top of the target image to see if they match.

I hope this helps.
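A minimal sketch of the check in step 2, assuming a 3x3 pinhole intrinsics matrix K and a 4x4 relative transform from source to target; the names are illustrative, not the linked SceneRF function:

import numpy as np

def warp_with_gt_depth(depth, K, T_source_2_target):
    # Back-project every source pixel using ground-truth depth, move the
    # points into the target frame, and return their projected pixel
    # coordinates. Scattering the source colors at these coordinates
    # should reproduce the target image if the transform is correct.
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # 3 x N
    cam_pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)           # source camera frame
    pts_h = np.vstack([cam_pts, np.ones((1, cam_pts.shape[1]))])        # homogeneous 3D
    tgt_pts = (T_source_2_target @ pts_h)[:3]                           # target camera frame
    proj = K @ tgt_pts
    uv = proj[:2] / np.clip(proj[2:3], 1e-6, None)                      # perspective divide
    return uv.reshape(2, H, W)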


BeileiCui commented on May 28, 2024

Hi @anhquancao, thanks a lot for your suggestions! I followed your advice, checked the code over the past few days, and found that it was indeed the transformation's problem; I have solved it now.
The performance is still not at its best, though. I notice that some metrics related to ray sampling are much higher than in your original training on KITTI, as shown below: my min_som_vars_epoch is about 160 (yours is about 26), my closest_std_epoch is about 12.5 (yours is about 4.5), and my dist_2_closest_gaussian_epoch is about 4 (yours is about 1.5).
[screenshots of the ray-sampling metric curves]

Do you have any suggestions on how I should fine-tune the network? Or could something else be causing this?

Looking forward to your reply!


BeileiCui commented on May 28, 2024

By the way, here are some of my current parameters (listed after the sketch below). The min and max distances of SCARED are about 10 mm and 250 mm, respectively. I down-sample the original images by a factor of 2 (1024×1280 to 512×640) and scale cam_K accordingly.
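Scaling cam_K for the factor-2 down-sample looks like this (a minimal sketch; the helper name is my own):

import numpy as np

def scale_intrinsics(cam_K, scale=0.5):
    # Scale a 3x3 pinhole intrinsics matrix after resizing the image.
    K = cam_K.astype(np.float64).copy()
    K[0, 0] *= scale  # fx
    K[1, 1] *= scale  # fy
    K[0, 2] *= scale  # cx
    K[1, 2] *= scale  # cy
    return K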

import click

@click.command()
@click.option('--dataset', default="kitti", help='experiment prefix')
@click.option('--logdir', default="", help='log directory')
@click.option('--root', default="/mnt/data-hdd2/Beilei/Dataset/SCARED", help='path to dataset folder')
@click.option('--preprocess_root', default="/mnt/data-hdd2/Beilei/Dataset/SCARED/preprocess", help='path to preprocess folder')
@click.option('--bs', default=1, help='Batch size')
@click.option('--lr', default=1e-5, help='learning rate')
@click.option('--wd', default=0, help='weight decay')
@click.option('--n_gpus', default=1, help='number of GPUs')
@click.option('--n_workers_per_gpu', default=1, help='number of workers per GPU')
@click.option('--enable_log', default=False, help='enable log')
@click.option('--exp_prefix', default="Train", help='experiment prefix')

@click.option('--n_rays', default=1200, help='Total number of rays')
@click.option('--frames_interval', default=0.5, help='Interval between supervision frames')
@click.option('--sample_train', default=5, help='Sample the train set at certain scale')

@click.option('--max_sample_depth', default=220, help='maximum sample depth')
@click.option('--eval_depth', default=200, help='cap depth at 200 for evaluation')

@click.option('--n_pts_per_gaussian', default=8, help='#points sampled for each gaussian')
@click.option('--n_gaussians', default=4, help='#gaussians')
@click.option('--n_pts_uni', default=32, help='#points sampled uniformly')
@click.option('--std', default=2.0, help='initial std of each gaussian')
@click.option('--add_fov_hor', default=16, help='Amount of angle in degree added to left and right of the horizontal FOV')
@click.option('--add_fov_ver', default=14, help='Amount of angle in degree added to top and bottom of the vertical FOV')
# ideally sphere_h and sphere_w should be img_H * 1.5, img_W * 1.5 (Because we increase the FOV by 1.5). 
# However, we empirically found that any sphere_h >= img_H and any sphere_w >= img_W have almost similar performance. 
@click.option('--sphere_h', default=600, help='The height of the discretized spherical grid') 
@click.option('--sphere_w', default=700, help='The width of the discretized spherical grid') 
@click.option('--sequence_distance', default=10, help='Distance between the input and the last frames in the sequence')
@click.option('--som_sigma', default=2.0, help='')
@click.option('--max_epochs', default=50, help='')
@click.option('--use_color', default=True, help='Use color loss')
@click.option('--use_reprojection', default=True, help='Use reprojection loss')
def main(**kwargs):
    # Training entry point; the body was omitted in the original post.
    ...


anhquancao commented on May 28, 2024

It means that the std of the Gaussians is quite high and that the Gaussian peaks are far from the depth.
Maybe you can increase the weight of the loss that minimizes the distance to the closest Gaussian.

The loss seems to still decrease sharply. I think you should train for longer.

However, the most important metrics are the depth metrics such as abs_rel, rmse, sq_rel, a1, a2, and a3, and they look good.

You can also make the network focus more on depth by increasing the weight of the reprojection loss.

I also found that the size of the images in the loss functions is quite important. You can decrease the size of the image input to the network, but for the images in the loss functions I would advise keeping the full size (see the sketch below).
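Schematically, the re-weighting looks like this (a minimal sketch; the loss names are placeholders for the actual terms in the training step, not SceneRF identifiers):

def total_loss(color_loss, reprojection_loss, gaussian_dist_loss,
               w_reproj=2.0, w_gauss=2.0):
    # Raise w_reproj to make the network focus more on depth, and
    # w_gauss to pull the Gaussian peaks toward the depth.
    return color_loss + w_reproj * reprojection_loss + w_gauss * gaussian_dist_loss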


BeileiCui commented on May 28, 2024

Hi, thanks for your advice! Sorry, I am not sure I understand how to decrease the size of the image input to the network while keeping the full size in the loss functions. Aren't the images handled together and sampled with rays? How can I feed images of different sizes to the network and the loss functions?
Do you mean I can, for example, prepare the data at the original size but make sphere_h and sphere_w smaller than the original size?


BeileiCui commented on May 28, 2024

The metrics like abs_rel, rmse, sq_rel, a1, a2, and a3 are decreasing, but the rate of decrease is already slow. I said the performance is still not at its best because it is weak compared to other SOTA self-supervised depth estimation methods: for example, my current best RMSE is about 16, while the SOTA method on this dataset reaches about 6. So I wonder whether I did not set something up correctly for training.
I also notice that my weights_at_depth_epoch is too small, around 0.05 and not increasing (yours is above 0.3 at the beginning and around 0.55 at the end). Could this be the problem? I'm just confused because I don't think the performance should degrade this much, so it must be something on my side.


anhquancao commented on May 28, 2024

These are the input image and the images used in the loss functions (see the sketch below).
The depth metrics are much lower because they are computed on a randomly selected frame within 10 m of the inferred frames, not on the input frame.
0.05 is indeed small; it is the weight of the point closest to the depth, probably because the sampled points are too far away.
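Schematically, the separation looks like this, assuming PyTorch (a minimal sketch, not the actual SceneRF code):

import torch.nn.functional as F

def prepare_batch(img_full, scale=0.5):
    # img_full: (B, 3, H, W) full-resolution frame. The network sees the
    # downsampled copy; rays are still sampled against the full-resolution
    # frame when computing the color/reprojection losses.
    img_net = F.interpolate(img_full, scale_factor=scale,
                            mode="bilinear", align_corners=False)
    return img_net, img_full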
