
sconlyshootery / featdepth


This is the official code for the method described in "Feature-metric Loss for Self-supervised Learning of Depth and Egomotion".

License: MIT License

Python 100.00%
depth-estimation kitti odometry representation-learning self-supervised-learning

featdepth's People

Contributors

asdf-sys, sconlyshootery


featdepth's Issues

Pretrained weights

Hi! I'm curious about the pretrained weights and configs: which config was used to produce this one and which for this one? I guess they are cfg_kitti_fm.py and cfg_kitti_fm_refine.py respectively. Am I right?

Resnet 18 & Resnet 50

Hello!

First and foremost I want to congratulate you on your work!

I want to run infer.py on some images, but I can't find links in the repo to the pretrained models for ResNet-50 and ResNet-18. Have you provided them, or should I download them from somewhere else?

PS: I was able to find the ResNet-50 one in the Monodepth2 repo from the link you provided. But still, no ResNet-18.

Thank you!

Question about Figure 2 in the paper

Hi, it's great work :)

I was surprised by the near-perfect qualitative results in the paper (Fig. 2) and am curious which model was used to visualize that disparity map (the online-refined model or not?).

Input image size

I want to train on a custom dataset of size 1600x900. Due to limited computing power, I have to lower the image size. So far I have tried the following: 320x1024 (KITTI size, does not keep the aspect ratio) and 576x1024 (keeps the aspect ratio). The problem is that with certain sizes (e.g. 288x512) I get black results, while other sizes (e.g. 360x640) produce an error.
I scaled the intrinsic parameters:

K[0, :] *= self.current_width/self.original_width
K[1, :] *= self.current_height/self.original_height

And I modified line 148 in net.py:
pose_feats = {f_i: F.interpolate(inputs["color_aug", f_i, 0], [self.opt.height, self.opt.width], mode="bilinear", align_corners=False) for f_i in self.opt.frame_ids}

My question is: is there any reasoning behind choosing the right size? I have observed that the width and height are multiples of 64 in every case that works; can you confirm whether this logic makes sense?
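
For reference, a tiny sketch of the check implied by that observation (the multiple-of-64 rule is the questioner's empirical finding, not a documented requirement of the network):

def snap_to_multiple(height, width, multiple=64):
    # round a desired training resolution to the nearest multiple of `multiple`;
    # 64 reflects the observation above, not a confirmed constraint of FeatDepth
    return (round(height / multiple) * multiple,
            round(width / multiple) * multiple)

print(snap_to_multiple(360, 640))   # -> (384, 640)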

Backpropagation

How does the backpropagation work? Can you point out the piece of code responsible for the backpropagation?
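
For context: training in this repo runs through the mmcv Runner (see the tracebacks further down this page), so the backward call lives in mmcv's optimizer hook rather than in the repo's own files. A minimal sketch of the equivalent explicit loop, for illustration only:

import torch

def train_step(model, data, optimizer):
    # where backpropagation happens in a plain PyTorch loop; in this repo the
    # equivalent backward()/step() calls are made by mmcv's optimizer hook,
    # driven by the Runner, rather than being written out in the project code.
    optimizer.zero_grad()
    model_out, losses = model(data)        # forward pass returns a dict of loss terms
    total_loss = sum(losses.values())      # combine the individual losses
    total_loss.backward()                  # backpropagation: gradients are computed here
    optimizer.step()                       # parameters are updated from those gradients
    return total_loss.item()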

Generate_features_pred room for optimization?

In the generate_features_pred function, as shown below:

outputs = self.generate_features_pred(inputs, outputs)

I see that you are using only the scale-0 features, but the calculation takes place for all 4 scales.
I believe it could be calculated only once.

disp = outputs[("disp", 0, 0)]

outputs[("feature", frame_id, 0)] = F.grid_sample(src_f, pix_coords, padding_mode="border")

perceptional_loss in scale loop:

for frame_id in self.opt.frame_ids[1:]:
    src_f = outputs[("feature", frame_id, 0)]
    tgt_f = self.Encoder(inputs[("color", 0, 0)])[0]
    perceptional_losses.append(self.compute_perceptional_loss(tgt_f, src_f))
perceptional_loss = torch.cat(perceptional_losses, 1)
min_perceptional_loss, outputs[("min_index", scale)] = torch.min(perceptional_loss, dim=1)
loss_dict[('min_perceptional_loss', scale)] = self.opt.perception_weight * min_perceptional_loss.mean() / len(self.opt.scales)

def generate_features_pred(self, inputs, outputs):
    disp = outputs[("disp", 0, 0)]
    disp = F.interpolate(disp, [int(self.opt.height/2), int(self.opt.width/2)], mode="bilinear", align_corners=False)
    _, depth = self.disp_to_depth(disp, self.opt.min_depth, self.opt.max_depth)

The perceptional_loss is only computed at scale [1] (1/2 w, 1/2 h), but the code block sits inside the scale loop, so I think the same loss is summed once per scale and then divided by len(self.opt.scales) at the end.
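
A hedged sketch of the rearrangement the question suggests (it mirrors the snippet quoted above and is not the repository's code): compute the scale-0 perceptional term once, outside the per-scale loop, and encode the target image a single time.

# sketch of the suggested optimisation, placed outside the per-scale loop;
# variable names follow the quoted snippet, the exact placement is an assumption
tgt_f = self.Encoder(inputs[("color", 0, 0)])[0]          # encode the target once
perceptional_losses = []
for frame_id in self.opt.frame_ids[1:]:
    src_f = outputs[("feature", frame_id, 0)]
    perceptional_losses.append(self.compute_perceptional_loss(tgt_f, src_f))
perceptional_loss = torch.cat(perceptional_losses, 1)
min_perceptional_loss, _ = torch.min(perceptional_loss, dim=1)
loss_dict['min_perceptional_loss'] = self.opt.perception_weight * min_perceptional_loss.mean()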

Pre-training

Hi,

A question regarding the ResNets.

The original Monodepth2 uses ImageNet pre-trained weights to initialise the ResNets, but it is not entirely clear to me whether the FeatDepth paper results were obtained with or without this pre-training.

Number of GPUs and performance

Hi,

I'm wondering whether performance will be affected if I do not have access to 8 GPUs as in your paper (apart from longer training times, of course). Are the weights updated asynchronously across the different GPUs?

Thank you

Online Refinement

First off, great work! Is there a script for online refinement? :) I could not find it.

How to get the value of odometry evaluation

Hi:
I am sorry to trouble you!
I used the "our model train on kitti odometry" model and “eval_pose.py” provided by you to get the following values without any changes in the parameters.
odom_9 Trajectory error: 0.025, std: 0.010
odom_10 Trajectory error: 0.020, std: 0.011
This seems not as good as "Seq_09 0.016 (std. 0.009) Seq_10 0.013 (std. 0.009)" of "sfmlearn" .
How did you get "Ours 8.75 2.11 10.67 4.91" and " 3.07 0.89 3.83 1.78"in the paper?
I follow your evaluation code, the code is as follows

eval_pose.py

from __future__ import absolute_import, division, print_function
import os
import sys
import numpy as np

import torch
from torch.utils.data import DataLoader

sys.path.append('.')
from mono.datasets.utils import readlines, dump_xyz, compute_ate, transformation_from_parameters
from mono.datasets.kitti_dataset import KITTIOdomDataset
from mono.model.mono_fm.pose_encoder import PoseEncoder
from mono.model.mono_fm.pose_decoder import PoseDecoder

def evaluate(data_path,model_path,sequence_id,height,width):
    filenames = readlines("/userhome/Feat/FeatDepth3/mono/datasets/splits/odom/test_files_{:02d}.txt".format(sequence_id))

    dataset = KITTIOdomDataset(data_path,
                           filenames,
                           height,
                           width,
                           [0, 1],
                           is_train=False,
                           img_ext='.png',
                           gt_depth_path=None)

    dataloader = DataLoader(dataset,
                        1,
                        shuffle=False,
                        num_workers=4,
                        pin_memory=True,
                        drop_last=False)


    pose_encoder = PoseEncoder(18, None, 2)
    pose_decoder = PoseDecoder(pose_encoder.num_ch_enc)

    checkpoint = torch.load(model_path)
    for name, param in pose_encoder.state_dict().items():
        pose_encoder.state_dict()[name].copy_(checkpoint['state_dict']['PoseEncoder.' + name])
    for name, param in pose_decoder.state_dict().items():
        pose_decoder.state_dict()[name].copy_(checkpoint['state_dict']['PoseDecoder.' + name])
    pose_encoder.cuda()
    pose_encoder.eval()
    pose_decoder.cuda()
    pose_decoder.eval()

    pred_poses = []

    print("-> Computing pose predictions")
    with torch.no_grad():
        for inputs in dataloader:
            for key, ipt in inputs.items():
                inputs[key] = ipt.cuda()
            all_color_aug = torch.cat([inputs[("color_aug", i, 0)] for i in [0, 1]], 1)
            features = pose_encoder(all_color_aug)
            axisangle, translation = pose_decoder(features)
            pred_poses.append(transformation_from_parameters(axisangle[:, 0], translation[:, 0]).cpu().numpy())
    pred_poses = np.concatenate(pred_poses)

    gt_poses_path = os.path.join(data_path, "poses", "{:02d}.txt".format(sequence_id))
    gt_global_poses = np.loadtxt(gt_poses_path).reshape(-1, 3, 4)
    gt_global_poses = np.concatenate((gt_global_poses, np.zeros((gt_global_poses.shape[0], 1, 4))), 1)
    gt_global_poses[:, 3, 3] = 1
    gt_xyzs = gt_global_poses[:, :3, 3]
    gt_local_poses = []
    for i in range(1, len(gt_global_poses)):
        gt_local_poses.append(np.linalg.inv(np.dot(np.linalg.inv(gt_global_poses[i - 1]), gt_global_poses[i])))

    ates = []
    num_frames = gt_xyzs.shape[0]
    track_length = 5
    for i in range(0, num_frames - 1):
        local_xyzs = np.array(dump_xyz(pred_poses[i:i + track_length - 1]))
        gt_local_xyzs = np.array(dump_xyz(gt_local_poses[i:i + track_length - 1]))
        ates.append(compute_ate(gt_local_xyzs, local_xyzs))

    print("\n  odom_{} Trajectory error: {:0.3f}, std: {:0.3f}\n".format(sequence_id, np.mean(ates), np.std(ates)))

    # save_path = os.path.join(load_weights_folder, "poses.npy")
    # np.save(save_path, pred_poses)
    # print("-> Predictions saved to", save_path)


if __name__ == "__main__":
    data_path = '/userhome/dataset/odometry_color'  # path to kitti odometry
    model_path = '/userhome/Feat/FeatDepth3/fm_depth_odom.pth'
    height = 320
    width = 1024
    sequence_id = 9
    evaluate(data_path, model_path, sequence_id, height, width)
    sequence_id = 10
    evaluate(data_path, model_path, sequence_id, height, width)

Some queries about training

Hi there,

This is a good paper, but I have a small question. During training, are the feature network, depth net, and pose net trained at the same time, or can the feature network be trained separately? Thanks.

learning rate when resuming

Hi,
Thank you for the code!

When I try to resume training from a checkpoint, setting a different learning rate in the cfg file doesn't propagate to the optimizer; it keeps using the learning rate stored in the checkpoint.
Is this on purpose? How could I change it?
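
A hedged workaround (standard PyTorch; where exactly to call it in the mmcv training flow is an assumption): overwrite the learning rate in each optimizer parameter group after the checkpoint has been loaded.

# force a new learning rate after resuming from a checkpoint; the value is a
# hypothetical example, and the call site within this codebase is an assumption
new_lr = 5e-5
for group in optimizer.param_groups:
    group['lr'] = new_lr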

training on a custom dataset

Hi, thank you so much for sharing this great project!

I have trained the model on my own dataset, whose image size is (1280, 720).
In the first training I used an input size of (512, 288) and got the results shown in figure 2.
In the second training I used an input size of (1024, 576); after 10 epochs I only got a black image, shown in figure 3, and I found that the loss does not converge, shown in figure 4.

I only modified the input size in the config file between the first and second training, because the input camera intrinsics are normalized.
Do you have an idea what is going on?

Thank you very much for your help!

error

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one.

Running

python -m torch.distributed.launch --master_port=9900 --nproc_per_node=1 train.py --config ./config/cfg_kitti_fm.py --work_dir logs --gpus "0,1"

gives the following RuntimeError:

./config/cfg_kitti_fm.py
cfg is  Config (path: ./config/cfg_kitti_fm.py): {'DEPTH_LAYERS': 18, 'POSE_LAYERS': 18, 'FRAME_IDS': [0, -1, 1], 'IMGS_PER_GPU': 1, 'HEIGHT': 192, 'WIDTH': 640, 'data': {'name': 'kitti', 'split': 'exp', 'height': 192, 'width': 640, 'frame_ids': [0, -1, 1], 'in_path': '../data/kitti-raw', 'gt_depth_path': '../easy2ride_pipeline/monodepth2/splits/eigen/gt_depths.npz', 'png': True, 'stereo_scale': False}, 'model': {'name': 'mono_fm', 'depth_num_layers': 18, 'pose_num_layers': 18, 'extractor_num_layers': 50, 'frame_ids': [0, -1, 1], 'imgs_per_gpu': 1, 'height': 192, 'width': 640, 'scales': [0, 1, 2, 3], 'min_depth': 0.1, 'max_depth': 100.0, 'depth_pretrained_path': None, 'pose_pretrained_path': None, 'extractor_pretrained_path': '/home/e2r/Downloads/autoencoder.pth', 'automask': True, 'disp_norm': True, 'perception_weight': 0.001, 'smoothness_weight': 0.001}, 'resume_from': None, 'finetune': None, 'total_epochs': 40, 'imgs_per_gpu': 1, 'learning_rate': 0.0001, 'workers_per_gpu': 4, 'validate': True, 'optimizer': {'type': 'Adam', 'lr': 0.0001, 'weight_decay': 0}, 'optimizer_config': {'grad_clip': {'max_norm': 35, 'norm_type': 2}}, 'lr_config': {'policy': 'step', 'warmup': 'linear', 'warmup_iters': 500, 'warmup_ratio': 0.3333333333333333, 'step': [20, 30], 'gamma': 0.5}, 'checkpoint_config': {'interval': 1}, 'log_config': {'interval': 50, 'hooks': [{'type': 'TextLoggerHook'}]}, 'dist_params': {'backend': 'nccl'}, 'log_level': 'INFO', 'load_from': None, 'workflow': [('train', 1)], 'work_dir': 'logs', 'gpus': [0, 1]}
2020-11-20 14:46:09,398 - INFO - Distributed training: True
2020-11-20 14:46:09,398 - INFO - Set random seed to 1024
/home/e2r/anaconda3/envs/e2r/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:364: UserWarning: Single-Process Multi-GPU is not the recommended mode for DDP. In this mode, each DDP instance operates on multiple devices and creates multiple module replicas within one process. The overhead of scatter/gather and GIL contention in every forward pass can slow down training. Please consider using one DDP instance per device or per module replica by explicitly setting device_ids or CUDA_VISIBLE_DEVICES. 
  "Single-Process Multi-GPU is not the recommended mode for "
cfg work dir is  logs
validate........................
2020-11-20 14:46:17,460 - INFO - Start running, host: e2r@e2r-Super-Server, work_dir: /home/e2r/Desktop/e2r/featdepth/logs
2020-11-20 14:46:17,460 - INFO - workflow: [('train', 1)], max: 40 epochs
/home/e2r/anaconda3/envs/e2r/lib/python3.7/site-packages/torch/nn/functional.py:3384: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
  warnings.warn("Default grid_sample and affine_grid behavior has changed "
/home/e2r/anaconda3/envs/e2r/lib/python3.7/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
Traceback (most recent call last):
  File "train.py", line 105, in <module>
    main()
  File "train.py", line 101, in main
    logger=logger)
  File "/home/e2r/Desktop/e2r/featdepth/mono/apis/trainer.py", line 68, in train_mono
    _dist_train(model, dataset_train, dataset_val, cfg, validate=validate)
  File "/home/e2r/Desktop/e2r/featdepth/mono/apis/trainer.py", line 177, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/e2r/anaconda3/envs/e2r/lib/python3.7/site-packages/mmcv/runner/runner.py", line 380, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/e2r/anaconda3/envs/e2r/lib/python3.7/site-packages/mmcv/runner/runner.py", line 278, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/home/e2r/Desktop/e2r/featdepth/mono/apis/trainer.py", line 29, in batch_processor
    model_out, losses = model(data)
  File "/home/e2r/anaconda3/envs/e2r/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/e2r/anaconda3/envs/e2r/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 528, in forward
    self.reducer.prepare_for_backward([])
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
Traceback (most recent call last):
  File "/home/e2r/anaconda3/envs/e2r/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/e2r/anaconda3/envs/e2r/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/e2r/anaconda3/envs/e2r/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home/e2r/anaconda3/envs/e2r/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/e2r/anaconda3/envs/e2r/bin/python', '-u', 'train.py', '--local_rank=0', '--config', './config/cfg_kitti_fm.py', '--work_dir', 'logs', '--gpus', '0,1']' returned non-zero exit status 1.
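
The error message itself points to the usual remedy. A hedged sketch of the change, assuming the model is wrapped with DistributedDataParallel somewhere in mono/apis/trainer.py (the exact wrapping site is an assumption):

# sketch of the fix suggested by the error message: enable unused-parameter
# detection when wrapping the model with DDP (the wrapping site is an assumption)
from torch.nn.parallel import DistributedDataParallel as DDP

model = DDP(model.cuda(), find_unused_parameters=True)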

How can I turn off saving the log files

I want to stop saving the log and log.json files and only save the per-epoch .pth checkpoint files, because writing the log files after every batch takes a lot of CPU and crashes my PC.
I tried to find where the torch.save command is, but I could not find it.
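
The saving is most likely handled by mmcv hooks configured in the cfg file rather than by an explicit torch.save in the project. A hedged sketch, assuming the mmcv-style log_config / checkpoint_config fields visible in the config dump earlier on this page:

# hedged cfg sketch (mmcv-style, as in the config dump shown above); raising the
# logging interval reduces how often the .log/.log.json files are written, while
# checkpoints stay per-epoch. Exact mmcv behaviour is an assumption.
log_config = dict(
    interval=500,                          # iterations between log writes (was 50)
    hooks=[dict(type='TextLoggerHook')])
checkpoint_config = dict(interval=1)       # one .pth checkpoint per epoch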

Evaluation and performance

Hi,
you got some great numbers, but I have several questions regarding the comparison to Monodepth2 and Depth-VO-Feat (Zhan et al).

First of all, your depth and feature encoders are ResNet50 with an input resolution of 320×1024, right? Why do you then compare to Monodepth2 with a ResNet18 encoder in Table 2? Table 6 of Monodepth2 shows that changing the encoder from ResNet18 to ResNet50 reduces RMSE by more than 0.22 even with lower-resolution input. Is there any performance gain when compared to Monodepth2 in a fair way (same encoder, same resolution)?

Depth-VO-Feat (Zhan et al.) introduces a very similar idea, but back then it was implemented on a significantly weaker baseline. How does FeatDepth compare to Depth-VO-Feat combined with a better reconstruction loss (e.g. the one from Monodepth2)?

Performance on NYUv2?

Hi, thanks for sharing the code.
I'm quite interested in the performance on NYUv2. Did you follow a similar setting to the one used for KITTI?
I found that you only released the NYU model, but not the cfg files or dataloaders.

OSError: [Errno 95] Operation not supported: 'epoch_1.pth' -> ...

I found the answer "some environments do not support os.symlink, so you can add an argument to the checkpoint_config field in the config files, like checkpoint_config=dict(create_symlink=False)" in the mmdetection project. However, when I set it like that, the models (epoch_n.pth) cannot be saved, which results in failing to evaluate and compute metrics.

An issue on Resume Training

Hi,

I have a question about resuming training from previously trained models. As my GPU server limits the running time of each job, I have to resume the training process to reach the desired epoch (40). I set the 'resume' parameter to the model path of the previous epoch.

Under these circumstances, are there other parameters I should change for the training (e.g. the learning rate? I wish to keep the learning rate set for the previous epoch). Should I change the 'total_epochs' parameter? Should the number '1' in 'workflow = [('train', 1)]' be reset to the number of the next epoch?

Thank you very much!

about feature visualization (PCA)

It's wonderful work!
I am confused about the feature visualization (Fig. 3 in your paper).
You said that one principal channel was selected using PCA decomposition. Did you mean, for example, the first channel of the 64-channel feature map?
Could you describe this in more detail?
Thank you so much!
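
One plausible reading of "one principal channel", as a hedged numpy sketch (an interpretation for visualization purposes, not necessarily the paper's exact procedure): project the C×H×W feature map onto its first principal component across channels.

import numpy as np

def pca_first_channel(feat):
    # feat: numpy array of shape (C, H, W); returns an (H, W) map obtained by
    # projecting each spatial location's C-dimensional feature onto the first
    # principal component. One interpretation of "one principal channel via PCA".
    C, H, W = feat.shape
    X = feat.reshape(C, -1).T                 # (H*W, C): one sample per pixel
    X = X - X.mean(axis=0, keepdims=True)     # centre the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ Vt[0]                          # projection onto the first component
    return proj.reshape(H, W)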

inference results in your Table 2 in your paper

Hi, thanks for your great work. I am a little confused about the test results reported in your Table 2. Why are the results of DORN and BTS much worse than those in their papers? Are the test splits used in their papers different from yours and Monodepth2's? Please help me figure it out.
Looking forward to your reply~ Thanks a lot~

RuntimeError

Trying to run with the distributed command shown in run.py gives the following error:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).

Change of scale for intrinsic parameters in the commit "update some settings"

Hello,

I noticed that a few weeks ago you updated the repository, and the matrix K of intrinsic parameters is now scaled according to the image size of the output of your autoencoder (half of the original image).

Was this change present when you generated the results reported in your article?

Thanks

Anaconda install steps

It's not easy for me to download your Anaconda environment; please share the required package versions.

Package Dependency Errors

Hi,

I tried to use the environment you uploaded, but I got errors. I then tried to install the packages one by one from the requirements.txt file, but when I install torch==1.2, mmcv==0.4.4 cannot be installed and keeps giving an error, the same as the spatial-correlation-sampler package. However, when I tried the latest version of torch (1.7), mmcv==0.4.4 was installed successfully, but the spatial-correlation-sampler package was not.

I tried both Ubuntu and Windows 10. Are there other versions that match each other, so I can try them as well?
Also, did you use both TensorFlow and PyTorch, or only PyTorch?

Eval Depth Performance

For the KITTI Eigen depth test with fm_depth.pth:

-> Evaluating
using mean scaling
Scaling ratios | med: 37.704 | std: 0.061

abs_rel | sq_rel |  rmse | rmse_log |    a1 |    a2 |    a3
  0.097 |  0.676 | 4.299 |    0.172 | 0.905 | 0.967 | 0.984

-> Done!

The numbers I get differ from your paper. Could you tell me which model they correspond to, or whether anything is wrong with my testing?

@sconlyshootery


A conda issue and An error when running the code of config file 'cfg_kitti_fm_joint'

Hi, thanks for your brilliant work. I'm planning to run the code but encountered two issues.

  1. Conda environment. The readme.md file and requirements.txt say the pytorch version should be >1.2, but I downloaded the provided py37t11.tar.gz conda env file and found that its torch version is 1.1.0. So I wonder what the right version of the required pytorch is.

  2. I use pytorch 1.7.1 and torchvision 0.8.2 to run the code. I chose to jointly train the depth network and the autoencoder using the cfg file 'cfg_kitti_fm_joint.py' with resolution 192x640, and I changed the paths in the file to my server's setting. When running the code, an error occurs:

  File "FeatDepth/mono/model/mono_fm_joint/layers.py", line 59, in forward
    cam_points = depth.view(self.batch_size, 1, -1) * cam_points
  RuntimeError: The size of tensor a (61440) must match the size of tensor b (122880) at non-singleton dimension 2

I checked the code and the reason may lie in an inconsistent batch size between 'depth' and 'cam_points'. Could you give me some guidance to resolve this error? Thank you very much!

Training on NYU

Can you provide training and testing codes on NYU and data preparation?

How to run this code?

Hello, does the path in the code need to be changed to my own path? Maybe my question is very basic, but when I run the command python run.py, many errors are reported. I am confused about the correct command to execute the program. Thanks for your help!

Inverting pose for frame -1

Hello,

I have two related questions:

  1. Does the PoseNet predict source-to-target or target-to-source relative pose?
  2. Why do you invert the predicted pose of the -1 frame, and not invert the inputs to the PoseNet?

Thank you,
Adi
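
For reference, a sketch of the Monodepth2-style convention this question refers to (an assumption about FeatDepth's code, using the Monodepth2 transformation_from_parameters helper with its invert flag): the pose net always sees the two frames in temporal order, and the prediction for frame -1 is inverted afterwards so that every stored transform maps the target frame to the source frame.

# Monodepth2-style sketch (assumed, not verified against this repository)
for f_i in [-1, 1]:
    if f_i < 0:
        pose_inputs = torch.cat([inputs[("color_aug", f_i, 0)],
                                 inputs[("color_aug", 0, 0)]], 1)   # (earlier, later)
    else:
        pose_inputs = torch.cat([inputs[("color_aug", 0, 0)],
                                 inputs[("color_aug", f_i, 0)]], 1)
    axisangle, translation = pose_decoder(pose_encoder(pose_inputs))
    # invert the prediction for the -1 frame instead of swapping its inputs
    outputs[("cam_T_cam", 0, f_i)] = transformation_from_parameters(
        axisangle[:, 0], translation[:, 0], invert=(f_i < 0))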

Error after first epoch

I tried to run the code a couple of times, but every time after the first epoch I got the error TypeError: Object of type float32 is not JSON serializable. Only the first epoch's result is saved, and then this error pops up. Any idea how to fix this problem?
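
A hedged workaround sketch (where exactly the float32 value originates is an assumption): convert numpy/torch scalars to built-in Python floats before they reach the JSON logger.

def to_jsonable(log_vars):
    # numpy float32 / torch scalar values are not JSON serializable; .item()
    # turns them into plain Python numbers before the logger dumps the dict
    return {k: (v.item() if hasattr(v, 'item') else v) for k, v in log_vars.items()}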

Issues when training with the pretrained autoencoder (config file: cfg_kitti_fm.py)

I trained the network with the fixed pretrained autoencoder (cfg_kitti_fm.py), and during training I found in the log that the min perceptional loss (and smooth loss) stays zero across all scales, while the reconstruction loss seems right. A random line of the log is shown below:

2021-02-15 11:14:17,379 - INFO - Epoch [1][450/2489] lr: 0.00009, eta: 5 days, 0:02:05, time: 2.996, data_time: 0.027, memory: 5717, ('min_reconstruct_loss', 0): 0.0328, ('min_perceptional_loss', 0): 0.0000, ('smooth_loss', 0): 0.0000, ('min_reconstruct_loss', 1): 0.0328, ('min_perceptional_loss', 1): 0.0000, ('smooth_loss', 1): 0.0000, ('min_reconstruct_loss', 2): 0.0328, ('min_perceptional_loss', 2): 0.0000, ('smooth_loss', 2): 0.0000, ('min_reconstruct_loss', 3): 0.0327, ('min_perceptional_loss', 3): 0.0000, ('smooth_loss', 3): 0.0000, loss: 0.1312

I wonder whether there is anything wrong in my setup for running the code, or whether this is natural (the losses are just very small) during correct training?

Thanks a lot!

Batch size issue

Hi there, I found that the parameter IMGS_PER_GPU varies between the .cfg files (=1 in cfg_kitti_fm.py and =2 in cfg_kitti_fm_joint.py). Since the batch size mentioned in the paper is 2 (on 8 GPUs), does that mean there are 2 training samples on each GPU and the total batch size is 16?
I wonder how I should set the batch size (IMGS_PER_GPU and number of GPUs) to reproduce the reported result on monocular KITTI (0.104 0.729 4.481 0.179 0.893 0.965 0.984)?
Thank you very much!

Problems on image reconstruction

Why do you use the sigmoid function in decoder.py for the final image reconstruction? And why do the loss functions use the reprojection loss instead of a general L1 or L2 loss? Is there any reasoning behind this?

Number of parameters of the model

Hi, I wonder whether it is possible to compute the number of parameters of the model from the pre-trained weights?
Looking forward to your reply.
Thanks
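
A minimal sketch (plain PyTorch, not repo-specific): count the parameters either from an instantiated model or directly from a checkpoint, assuming the 'state_dict' key layout used in the eval script above.

import torch

def count_params(model):
    # total number of learnable parameters of an instantiated module
    return sum(p.numel() for p in model.parameters())

# or directly from a saved checkpoint (the file name is just an example)
ckpt = torch.load('fm_depth.pth', map_location='cpu')
print(sum(v.numel() for v in ckpt['state_dict'].values()))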

batch_size for eval_depth when add post_process

First of all, thank you for your amazing paper!
I set IMGS_PER_GPU = 6 and workers_per_gpu = 4, training the model on 4 GPUs. When I evaluate my model with post_process added as in monodepth2, I get "RuntimeError: Caught RuntimeError in DataLoader worker process 1." and "RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 370 and 374 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:612"
Traceback (most recent call last):
  File "/home/pp/python/FeatDepth/scripts/eval_depth.py", line 128, in <module>
    evaluate(MODEL_PATH, CFG_PATH, GT_PATH)
  File "/home/pp/python/FeatDepth/scripts/eval_depth.py", line 61, in evaluate
    for batch_idx, inputs in enumerate(dataloader):
  File "/home/pp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/pp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/pp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/pp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/pp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/pp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/pp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/pp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/pp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 64, in default_collate
    return default_collate([torch.as_tensor(b) for b in batch])
  File "/home/pp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 370 and 374 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:612

Traceback (most recent call last):
  File "/home/pp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/pp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/pp/anaconda3/envs/pytorch/bin/python3.7', '-u', '/home/pp/python/FeatDepth/scripts/eval_depth.py', '--local_rank=0']' returned non-zero exit status 1.

I think this is caused by the batch-size setting. The batch size when I trained the model was equal to 24, right?
dataloader = DataLoader(dataset,
                        2,
                        shuffle=False,
                        num_workers=4,
                        pin_memory=True,
                        drop_last=False)
To apply post_process, the batch size must be an integral multiple of 2, so I did not find a suitable value.
If I want to successfully add post_process for the evaluation, how can I solve this problem?
Looking forward to your reply, thank you very much!

License

Hi,

I'm very interested in your work and would like to re-use some of it for my master's thesis. Would you be able to provide a license file, please?

Thanks in advance!

Best,
Patrick

How to choose featurenet when training separately?

Hi, this is an exciting work offering a novel loss.
I have a question about the FeatureNet. In your original paper, the FeatureNet is trained separately. How does one choose a FeatureNet to serve the subsequent training of the DepthNet and PoseNet? Is there any metric?
Thank you so much!

Testing Single Image

Hi,

Is there any code available to run on a single image and get the depth of the image as output?
I tried to use and modify infer.py for that, but it seems the code collects its input from the KITTI raw dataset and that cannot easily be changed.

Thank you in advance.
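
A heavily hedged single-image sketch: depth_encoder and depth_decoder are hypothetical stand-ins for this repository's depth modules (only the pose modules are imported in the eval script above), the ("disp", 0, 0) output key and the preprocessing are assumptions, and the resize must match the height/width in your config.

import numpy as np
import torch
from PIL import Image

def predict_disparity(img_path, depth_encoder, depth_decoder, height=320, width=1024):
    # depth_encoder / depth_decoder are hypothetical placeholders for the repo's
    # depth network; the output key mirrors the snippets quoted in the issues
    # above and is an assumption about the decoder's return value.
    img = Image.open(img_path).convert('RGB').resize((width, height))
    x = torch.from_numpy(np.asarray(img)).float().permute(2, 0, 1) / 255.0
    x = x.unsqueeze(0).cuda()
    with torch.no_grad():
        disp = depth_decoder(depth_encoder(x))[("disp", 0, 0)]
    return disp.squeeze().cpu().numpy()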

Question about weight between featuremetric and photometric

Hello, I have a question about your implementation

When running your code I see that the perception_loss (feature-metric) is 0.000. The likely culprit is that the config specifies a weight of 0.001 for the feature-metric loss, meaning the feature-metric loss is a thousand times smaller in magnitude than the photometric one. Is this intended?

Thank you
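
The arithmetic behind the observation, as a worked one-liner (the raw value is hypothetical, chosen to match the order of magnitude of the reconstruction loss in the log line quoted in the earlier issue):

# with perception_weight = 0.001 (see the cfg dump above), a raw feature-metric
# loss of roughly the same size as the reconstruction loss contributes ~3e-05,
# which a 4-decimal log format rounds down to 0.0000
raw_featuremetric = 0.03
weighted = 0.001 * raw_featuremetric
print(round(weighted, 4))   # 0.0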
