lliuz / arflow
The official PyTorch implementation of the paper "Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation".
License: MIT License
How is the learning-by-analogy training carried out? Is it reflected in the code?
Hi,
Could you please let me know which GPU you used to train the model? When I run the program with some modifications, CUDA runs out of memory.
During training, do gradients flow back through occu_mask? When occu_mask is all zeros, the loss is small.
If so, should occu_mask be replaced with occu_mask.detach()?
I installed the repo using the Dockerfile you uploaded, but I ran into some problems on my first hands-on try.
The first one might be the dataset path; what is the second?
python .\setup_win.py install
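Running the command above on Windows to install the correlation package failed. Even after replacing the gcc and CUDA paths with the Windows paths, it still fails. How can I run it on Windows?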
I'm having issues using the correlation package. So, I was thinking of the native implementation you've provided. I would like to understand the differences between them.
Hi, thanks for sharing the code.
I noticed that if I train without smoothing loss and without ternary loss, the losses explode at around epoch 7 and then eventually become NaN.
Did you also observe that in your experiments and do you have any ideas what could cause this behaviour?
I have my own unlabeled dataset (no ground truth) on which I want to train the ARFlow network.
How can I do it? Is there a rule of thumb for organizing the data in order to train the network?
Thanks for your work, I found some point cloud interpolation works using PWC-Net to extract optical flow between point clouds, but none of them provided source code. I currently plan to reproduce these results, but I found that PWC-Net cannot be directly used to extract the optical flow of point clouds. I'm guessing someone is working on a problem related to mine, can you provide an idea? thanks.
Using PWC-Net to extract optical flow between point clouds:
Zhao L, Zhu Z, Lin X, et al. RAI-Net: Range-adaptive LiDAR point cloud frame interpolation network[C]//2021 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB). IEEE, 2021: 1-6.
Hi, I found that with the given script to evaluate the Sintel model, the evaluation runs for many epochs. Part of the log:
./outputs/checkpoints/210107/231109/231109.log:9075:01-08 12:53:53 INFO - main_logger [base_trainer.py line 49] - * Epoch 164 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9130:01-08 12:58:50 INFO - main_logger [base_trainer.py line 49] - * Epoch 165 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9185:01-08 13:03:49 INFO - main_logger [base_trainer.py line 49] - * Epoch 166 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9240:01-08 13:08:52 INFO - main_logger [base_trainer.py line 49] - * Epoch 167 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9295:01-08 13:13:53 INFO - main_logger [base_trainer.py line 49] - * Epoch 168 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9350:01-08 13:18:46 INFO - main_logger [base_trainer.py line 49] - * Epoch 169 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9405:01-08 13:23:40 INFO - main_logger [base_trainer.py line 49] - * Epoch 170 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9460:01-08 13:28:31 INFO - main_logger [base_trainer.py line 49] - * Epoch 171 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9515:01-08 13:33:33 INFO - main_logger [base_trainer.py line 49] - * Epoch 172 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9570:01-08 13:38:19 INFO - main_logger [base_trainer.py line 49] - * Epoch 173 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9625:01-08 13:43:24 INFO - main_logger [base_trainer.py line 49] - * Epoch 174 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9680:01-08 13:48:09 INFO - main_logger [base_trainer.py line 49] - * Epoch 175 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9735:01-08 13:52:49 INFO - main_logger [base_trainer.py line 49] - * Epoch 176 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9790:01-08 13:57:33 INFO - main_logger [base_trainer.py line 49] - * Epoch 177 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9845:01-08 14:02:17 INFO - main_logger [base_trainer.py line 49] - * Epoch 178 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9900:01-08 14:06:59 INFO - main_logger [base_trainer.py line 49] - * Epoch 179 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:9955:01-08 14:11:54 INFO - main_logger [base_trainer.py line 49] - * Epoch 180 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:10010:01-08 14:16:43 INFO - main_logger [base_trainer.py line 49] - * Epoch 181 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:10065:01-08 14:21:27 INFO - main_logger [base_trainer.py line 49] - * Epoch 182 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:10120:01-08 14:26:11 INFO - main_logger [base_trainer.py line 49] - * Epoch 183 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:10175:01-08 14:30:56 INFO - main_logger [base_trainer.py line 49] - * Epoch 184 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:10230:01-08 14:35:34 INFO - main_logger [base_trainer.py line 49] - * Epoch 185 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:10285:01-08 14:41:58 INFO - main_logger [base_trainer.py line 49] - * Epoch 186 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:10340:01-08 14:46:42 INFO - main_logger [base_trainer.py line 49] - * Epoch 187 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:10395:01-08 14:51:35 INFO - main_logger [base_trainer.py line 49] - * Epoch 188 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:10450:01-08 14:56:20 INFO - main_logger [base_trainer.py line 49] - * Epoch 189 EPE_0: 2.79 EPE_1: 3.73
./outputs/checkpoints/210107/231109/231109.log:10505:01-08 15:00:59 INFO - main_logger [base_trainer.py line 49] - * Epoch 190 EPE_0: 2.79 EPE_1: 3.73
I'd like to test your models on my own images, but each time it only accepts three images. Will you support this option?
How can one use this code to train with the KITTI dataset?
Also, where can I find the sintel_raw.json file and the others? I downloaded the Sintel dataset but I cannot find the json files.
During training the program prints these two values. Could you please explain the difference?
ARFlow is great work. I learned a lot from it.
I have a question about the parameter in
Sintel_ft_ar.json : line 61
"rotate": [-0.2, 0.2, -0.015, 0.015]
and sp_transforms.py: line 229
phi.uniform_(-min_rotate, max_rotate)
So the rotation is always 0.2 or 0.015. Why not set it to a random number, as with "trans" or "zoom"?
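The behaviour described in this question can be reproduced with plain Python. This is a minimal sketch, not the repository's code: `random.uniform` stands in for the tensor method `uniform_`, and `sample_rotation` with its `negate_min` flag is a hypothetical helper introduced only for illustration. It shows that negating a lower bound that is already negative collapses the sampling range to a single value.

```python
import random

# Values copied from the config line quoted in the question.
rotate = [-0.2, 0.2, -0.015, 0.015]

def sample_rotation(min_rotate, max_rotate, negate_min):
    """Draw one angle; `negate_min` mimics the `-min_rotate` in the quoted call."""
    low = -min_rotate if negate_min else min_rotate
    return random.uniform(low, max_rotate)

# As quoted: the lower bound -0.2 is negated, so the range is [0.2, 0.2].
collapsed = [sample_rotation(rotate[0], rotate[1], negate_min=True) for _ in range(5)]

# Passing the bound through unchanged gives a real range [-0.2, 0.2].
spread = [sample_rotation(rotate[0], rotate[1], negate_min=False) for _ in range(5)]

print(collapsed)
print(spread)
```

Whether the bounds should be passed unchanged or the config should store positive magnitudes is a design choice for the maintainers; the sketch only illustrates the sign interaction.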
Hi,
first of all, great paper and great code, thank you for sharing it :)
I was wondering: why do you scale up the loss before the backward() call (multiplying by 1024.) and then divide it again before the weights update?
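One general motivation for this pattern is loss scaling, which keeps very small gradient values from underflowing in a reduced-precision format. Whether that is the reason in this repository is not stated in the thread, so the following is only a sketch of the general idea. It uses the standard library's half-precision packing, and the gradient value `1e-8` is a made-up number chosen for illustration.

```python
import struct

def to_half(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

SCALE = 1024.0      # factor mentioned in the question
true_grad = 1e-8    # hypothetical tiny gradient value

# Stored directly in half precision, the value underflows to zero.
unscaled = to_half(true_grad)

# Scaling first keeps it representable; dividing afterwards in full
# precision recovers a close approximation of the original value.
recovered = to_half(true_grad * SCALE) / SCALE

print(unscaled)
print(recovered)
```

Because the gradient is linear in the loss, multiplying the loss by a constant and dividing the gradients by the same constant leaves the update mathematically unchanged; only the intermediate numeric range differs.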
Thanks for your wonderful code!
I would like to know the EPE on the Sintel clean/final trainval set for the sintel_ft.json model without AR.
Could you also provide the sintel_ft.tar model?
By the way, training looks time-consuming with epoch_num=1000. Is it necessary to train the model for 1000 epochs? And is the setting with_bk=True important when training sintel_raw/sintel_ft?
Hi,
I would like to share an acceleration strategy that addresses slow optical flow estimation with PyTorch versions newer than 1.1. Since we want to predict dense optical flow for long 15 fps or 30 fps videos, time consumption is a big concern with respect to GPU utilization and training speed.
After analyzing the entire training procedure of PWC-Net, I found that the time bottleneck comes mainly from two parts: the correlation function and the ContextNetwork.
For the correlation function: since the Dockerfile uses torch 1.1.0 + CUDA 9.0, it cannot be used with CUDA 10.1 directly. A simple fix is to change the 'cuda path' in l.21 of setup.py to the current CUDA version. We have tested that 10.1, 10.2, and 11.0 all work. Note that higher CUDA versions require higher gcc and g++ versions. In our tests, gcc/g++ 7.3.0/7.3.1 works for almost all CUDA and torch versions, while gcc/g++ 4.9 only works for PyTorch 1.1.0 + torchvision 0.3.0.
If you update gcc/g++ directly from 4.9 to 7.3, an additional error may be thrown: "ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found". In this case, first check whether an applicable libstdc++ version is available with the following command: strings /data/anaconda3/bin/../lib/./libstdc++.so.6 | grep GLIBCXX_3.4.20. If it exists, add its path to the environment variables as follows: export LD_LIBRARY_PATH=/data/anaconda3/lib:$LD_LIBRARY_PATH. After these steps, the CUDA-accelerated correlation function works with almost any combination of CUDA/torch/torchvision versions, and it is around 2x~5x faster than the Python version, correlation_native.py.
The most time-consuming part, however, is the ContextNetwork, which is counter-intuitive and hard to find. We do not know whether this is caused by a PyTorch version conflict, but we did find a way to fix it. Specifically, if you run the pwclite network with pytorch>1.1.0, then regardless of whether correlation_cuda or correlation_native is used, the total optical flow estimation time is around 5x~10x higher than with PyTorch 1.1. We profiled each part of pwclite.py and traced the slowdown to the ContextNetwork class. This class is extremely simple, containing only a sequence of conv calls, so we tried modifying the conv function in l.12 of pwclite.py. With the bias parameter set to False, the speed matches the original version, while setting bias to True leads to slower estimation. Oddly, although this function is used in many classes, such as FeatureExtractor and FlowEstimatorDense, the time cost changes only for the ContextNetwork class. In any case, simply changing bias to False restores normal speed. Since we do not know whether this change affects accuracy, we decided to change the bias parameter only in the ContextNetwork class, which can be implemented by adding a control argument to the function conv. This way only the seven convolution layers in the ContextNetwork class change, which we hope will not hurt performance much.
After these two changes, optical flow estimation is 5x~20x faster. I am aware that simply using the Dockerfile and the fixed environment is a simpler way to reproduce the results, yet we hope our experience will help more researchers extend this nice work to more environments.
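The libstdc++ check from this post (the `strings ... | grep GLIBCXX_3.4.20` step) can also be done from Python, which may be convenient inside a setup script. This is a minimal sketch under stated assumptions: `has_version_string` is a hypothetical helper, and it does a plain byte search rather than parsing the ELF symbol-version table, so it is only a rough equivalent of the shell command. The demo runs on a stand-in temporary file, not on a real library.

```python
import os
import tempfile

def has_version_string(lib_path, version=b"GLIBCXX_3.4.20"):
    """Return True if the raw bytes of the file contain `version`."""
    with open(lib_path, "rb") as fh:
        return version in fh.read()

# Demo on a stand-in file, so the sketch runs without a real libstdc++.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fh:
    fh.write(b"\x00GLIBCXX_3.4.19\x00GLIBCXX_3.4.20\x00")

found = has_version_string(demo_path)
missing = has_version_string(demo_path, b"GLIBCXX_3.4.26")
os.remove(demo_path)
print(found, missing)
```

On a real system the function would be called with the library path from the post, for example `has_version_string("/data/anaconda3/bin/../lib/./libstdc++.so.6")`.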
I would appreciate it if you could clarify the following points about specific parts of the "sp_transforms.py" file:
In transform_flow(), what is the purpose of the following block of code?
    # inverse transform coords
    x0, y0 = self.inverse_transform_coords(
        width=width, height=height, thetas=theta1)
    x1, y1 = self.inverse_transform_coords(
        width=width, height=height, thetas=theta2, offset_x=u, offset_y=v)
    # subtract and create new flow
    u = x1 - x0
    v = y1 - y0
    new_flow = torch.stack([u, v], dim=1)
Why didn't you use only the following part of the code to transform the optical flow, just as in the transform_image() function?
    # transform coords
    xq, yq = self.transform_coords(width=width, height=height, thetas=theta1)
    # interp2
    transformed = self._flow_interp2(new_flow, xq, yq)
What is the difference between the functionality of inverse_transform_coords() and transform_coords() ?
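The arithmetic in the first quoted block can be illustrated for a single pixel. This is a sketch under stated assumptions, not the repository's implementation: `apply_affine` and `transform_flow_at` are hypothetical helpers, the 2x3 matrix layout is assumed, and whether the real `inverse_transform_coords` applies the forward or the inverse mapping is not confirmed here. The point it shows is general: a flow value is a displacement between two positions, so when the two frames are transformed, both the start point and the end point move and the vector has to be recomputed as their difference. Resampling alone relocates vectors without changing their values.

```python
def apply_affine(theta, x, y):
    """Apply a 2x3 affine matrix [[a, b, tx], [c, d, ty]] to the point (x, y)."""
    (a, b, tx), (c, d, ty) = theta
    return a * x + b * y + tx, c * x + d * y + ty

def transform_flow_at(theta1, theta2, x, y, u, v):
    """Flow vector at (x, y) after frame 1 is moved by theta1 and frame 2 by theta2."""
    x0, y0 = apply_affine(theta1, x, y)          # where the start point lands
    x1, y1 = apply_affine(theta2, x + u, y + v)  # where the end point lands
    return x1 - x0, y1 - y0

# Hypothetical example 1: both frames are scaled by 2 about the origin.
zoom2 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
zoomed = transform_flow_at(zoom2, zoom2, x=10.0, y=5.0, u=3.0, v=-1.0)

# Hypothetical example 2: only the second frame is shifted by one pixel in x.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
shift_x = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
shifted = transform_flow_at(identity, shift_x, x=10.0, y=5.0, u=3.0, v=-1.0)

print(zoomed)   # (6.0, -2.0)
print(shifted)  # (4.0, -1.0)
```

In the first example the flow doubles along with the image scale; in the second, a shift applied only to the second frame adds directly to the horizontal flow.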
How can I remove all images included in the KITTI 2012 and 2015 train and test sets from the KITTI raw dataset? Thanks!
In your code you have the following lines:
    if self.cfg.occ_from_back:  # True
        occu_mask1 = 1 - get_occu_mask_backward(flow[:, 2:], th=0.2)
        occu_mask2 = 1 - get_occu_mask_backward(flow[:, :2], th=0.2)
    else:
        occu_mask1 = 1 - get_occu_mask_bidirection(flow[:, :2], flow[:, 2:])
        occu_mask2 = 1 - get_occu_mask_bidirection(flow[:, 2:], flow[:, :2])
And in the training json it says:
"occ_from_back": true
in one part and
"stage1": {"epoch": 50,
"loss": {"occ_from_back": false,
"w_l1": 0.0,
"w_ssim": 0.0,
"w_ternary": 1.0}},
in the later part.
Does that mean that at epoch 50 you switch occ_from_back to false? What is the difference between get_occu_mask_backward and get_occu_mask_bidirection? Is it because at the start we don't have any meaningful flow, so you just use a threshold?
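The reading proposed in this question, that the "stage1" block overrides the base loss settings from a given epoch on, can be written down as a small sketch. Everything here is an assumption for illustration: `loss_cfg_for_epoch` is a hypothetical helper, the dictionaries contain only the keys quoted above, and whether the trainer switches at epoch 50 exactly or one epoch later is not confirmed by the thread.

```python
import copy

# Fragments taken from the config quoted in the question; other keys are omitted.
base_loss = {"occ_from_back": True}
stage1 = {"epoch": 50,
          "loss": {"occ_from_back": False,
                   "w_l1": 0.0,
                   "w_ssim": 0.0,
                   "w_ternary": 1.0}}

def loss_cfg_for_epoch(epoch, base, stage):
    """Return the loss settings in effect at `epoch` (hypothetical helper)."""
    cfg = copy.deepcopy(base)
    if epoch >= stage["epoch"]:
        cfg.update(stage["loss"])  # stage values override the base ones
    return cfg

before = loss_cfg_for_epoch(49, base_loss, stage1)
after = loss_cfg_for_epoch(50, base_loss, stage1)
print(before["occ_from_back"])  # True
print(after["occ_from_back"])   # False
```

Under this reading, the branch of the quoted `if self.cfg.occ_from_back` block that runs would change once the stage becomes active.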
Could you please explain how to get the testing results on the KITTI dataset?
Thanks.
Hi, first of all great work and fantastic code!
I'm trying to recreate your reported results on the Sintel Training:
Thanks!
Hi,
I noticed there are several config files and checkpoints for each dataset. Taking KITTI as an example, what is the correspondence between the config files kitti15_ft_ar.json, kitti15_ft.json, kitti_raw.json and the checkpoints pwclite_ar_mv.tar, pwclite_ar.tar, pwclite_raw.tar? Which config should I use if I want to reproduce these three models?
Besides, I tried to evaluate the checkpoint pwclite_ar_mv.tar with all three config files and always got the following error:
[INFO] => using pre-trained weights checkpoints/KITTI15/pwclite_ar_mv.tar.
Traceback (most recent call last):
File "train.py", line 50, in
basic_train.main(cfg, _log)
  File "/proj/xcdhdstaff1/wenjingk/SLAM/ARFlow-master/basic_train.py", line 53, in main
    train_loader, valid_loader, model, loss, _log, cfg.save_root, cfg.train)
  File "/proj/xcdhdstaff1/wenjingk/SLAM/ARFlow-master/trainer/kitti_trainer.py", line 13, in __init__
    train_loader, valid_loader, model, loss_func, _log, save_root, config)
  File "/proj/xcdhdstaff1/wenjingk/SLAM/ARFlow-master/trainer/base_trainer.py", line 26, in __init__
    self.model = self._init_model(model)
  File "/proj/xcdhdstaff1/wenjingk/SLAM/ARFlow-master/trainer/base_trainer.py", line 75, in _init_model
    model.load_state_dict(weights)
  File "/scratch/workspace/wenjingk/anaconda-3.6/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PWCLite:
size mismatch for flow_estimators.conv1.0.weight: copying a param with shape torch.Size([128, 198, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 115, 3, 3]).
size mismatch for context_networks.convs.0.0.weight: copying a param with shape torch.Size([128, 68, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 34, 3, 3]).
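The four channel counts in this traceback fit a simple pattern if the checkpoint was built for three input frames and the current model for two. The sketch below is a guess that happens to reproduce those numbers; it is not taken from the repository. The constants (81 correlation channels, 32 feature channels, 2 flow channels) and both formulas are assumptions, and the function names are hypothetical.

```python
DIM_CORR = 81  # assumed size of the correlation volume (9 x 9 window)
FEAT = 32      # assumed number of feature channels fed to the estimator
FLOW = 2       # (u, v)

def estimator_in_channels(n_frames):
    """Hypothetical input width of the first flow-estimator conv."""
    return FEAT + (DIM_CORR + FLOW) * (n_frames - 1)

def context_in_channels(n_frames):
    """Hypothetical input width of the first context-network conv."""
    return (FEAT + FLOW) * (n_frames - 1)

for n in (2, 3):
    print(n, estimator_in_channels(n), context_in_channels(n))
```

If this reading is right, the "mv" checkpoint would need a model constructed for three frames, so the setting that controls the number of input frames in the config is worth checking before loading it.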
I just need an explanation of which files I should upload to get the KITTI 2015 testing result, because I have uploaded them many times but always get this error:
ERROR: Zip file content is either invalid or too large (>500 MB). Please try again! (accumulated size of files in zip file: 0 MB)
I selected the Stereo / Flow / Scene Flow 2015 option.
I uploaded a zip file called flow that contains the flow images (200 PNG images numbered from 000000_10.png to 000199_10.png), which are the flow for the testing images in data_scene_flow/testing/image_2.
What could the error be? The instructions say: "Provide a zip file which contains the 'disp_0' directory (stereo), the 'flow' directory (flow), or the 'disp_0', 'disp_1' and 'flow' directories (scene flow) in its root folder. Use the file format and naming described in the readme.txt (000000_10.png,...,000199_10.png)."
How do I get disp_0? I only want to check the flow.
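The layout described in the quoted instructions (a 'flow' directory in the root of the archive) can be produced with the standard library. This is a minimal sketch, assuming the predictions already exist as PNG files in one folder; `make_flow_submission` is a hypothetical helper, and the demo uses empty placeholder files rather than real flow maps. It is not known from the thread whether the reported error comes from the folder layout, so this only shows how to build an archive of that shape.

```python
import os
import tempfile
import zipfile

def make_flow_submission(png_dir, zip_path):
    """Zip every .png in `png_dir` under a top-level 'flow/' folder."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(png_dir)):
            if name.endswith(".png"):
                zf.write(os.path.join(png_dir, name), arcname="flow/" + name)
    return zip_path

# Demo with empty placeholder files instead of real flow maps.
work = tempfile.mkdtemp()
src = os.path.join(work, "pred")
os.makedirs(src)
for i in range(3):
    open(os.path.join(src, "%06d_10.png" % i), "wb").close()

archive = make_flow_submission(src, os.path.join(work, "submission.zip"))
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)
```

Listing the archive with `namelist()` before uploading is a quick way to confirm that every entry starts with `flow/` and follows the `000000_10.png` naming.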
Dear @lliuz,
thanks for your great work. I am testing your networks on Sintel clean.
In the paper you show lower values, but the parentheses indicate that the network has been trained on that data.
Do you think I might be doing something wrong, or do my results look reliable to you?
Many Thanks,
Stefano
Hi, when I directly use the "sintel_ft.json" configuration, the loss is NaN from the beginning. But when I first use the "sintel_raw.json" configuration, training proceeds normally. Why? What is the difference between the two configurations, and why can't "sintel_ft.json" be used directly to train ARFlow?
First of all, thank you for sharing the code. Awesome work!
I was wondering if you could explain the settings of the different pre-trained models provided under the checkpoints folder. For example, there are three models for the Sintel dataset. I assume pwclite_raw.tar comes from pre-training on Sintel raw videos without AR. But what about pwclite_ar.tar? In the paper, it is mentioned that AR is not used for pre-training. Any clarification will be appreciated. Thanks.
I'm trying to apply ARFlow to the Sintel dataset, which has images of shape (436, 1024). What is the recommended test shape value?
In the example given, images are of shape (375, 1242), but the test shape used is (384, 640). I don't understand how you got that value.
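Pyramid-style flow networks usually need input sizes that divide evenly by the total downsampling factor. Both numbers in the quoted test shape, 384 and 640, are multiples of 64, which is consistent with that kind of constraint, although the thread does not confirm the rule the authors used. The sketch below assumes a divisor of 64 and uses a hypothetical helper `round_up` to find the nearest valid size at or above the Sintel resolution.

```python
def round_up(value, multiple=64):
    """Smallest multiple of `multiple` that is >= value."""
    return ((value + multiple - 1) // multiple) * multiple

sintel = (436, 1024)
sintel_shape = tuple(round_up(v) for v in sintel)
print(sintel_shape)  # (448, 1024)

# The shape quoted for the example already satisfies the assumed constraint.
example_ok = all(v % 64 == 0 for v in (384, 640))
print(example_ok)  # True
```

Note that the quoted (384, 640) is smaller than the (375, 1242) input in width, so the example appears to resize rather than pad; choosing between a nearby multiple and a reduced size trades accuracy against memory and speed.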
Hi, I read your paper and found it very impressive; thank you for sharing your code. I have been trying the code recently, without digging into the details yet, simply trying to reproduce the fine-tuning on the Sintel dataset. The loss doesn't go down and stays around 0.7, and the evaluation at epoch 68 is "EPE_0: 3.19 EPE_1: 4.22". Is this normal? All I have done is download the official Sintel dataset and run the command "python3 train.py -c sintel_ft_ar.json", and I also use "correlation_native".
if yes, how? what parameters should I change to do so?
Hi, I am having issues running correlation_native.py during the backward phase:
RuntimeError: CUDA error: an illegal memory access was encountered
I first modified your implementation to update it to PyTorch 1.6.0 and ran into this issue.
So then I tried to use your Dockerfile; however, jonathonf removed his python3.6 repository for Ubuntu 16.04. Consequently I made the following changes to the Dockerfile:
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
RUN pip3 install https://download.pytorch.org/whl/cu100/torch-1.1.0-cp36-cp36m-linux_x86_64.whl
This still resulted in errors during the backpropagation stages, specifically during:
correlation_backward_input1
correlation_backward_input2
I tried printing the dims to make sure the tensor shapes were correct in some of the functions:
(Pytorch backward)
Grad Output torch.Size([4, 81, 120, 120])
Input Dims torch.Size([4, 256, 128, 128])
(correlation_backward_cuda)
Input batch: 4 ch: 256 h: 128 w: 128
(correlation_backward_cuda_kernel, after channels_first calls)
rInput batch: 4 ch: 128 h: 128 w: 256
gradInput1 batch: 4 ch: 256 h: 128 w: 128
gradOutput batch: 4 ch: 81 h: 120 w: 120
Any idea where the issue is arising from? Is there a subtle difference in changing CUDA9->10 in the docker image?
Is the correlation module here the same as the one in IRR?
I have created an environment for IRR and can run the code smoothly, so I want to know:
if I have installed the correlation module following IRR's instructions, can I use it for ARFlow too?
should be
min_rotate=self.cfg.rotate[2], max_rotate=self.cfg.rotate[3],
@lliuz
Hello, thanks for sharing. When reading your paper, I found the following formula hard to understand. I want to know whether the tau_theta in the second formula should be the inverse of tau_theta. Thank you very much!
Could you please share how to train the multi-view model? It seems that there is no trainer for the multi-view version.
@lliuz
Hello, thanks very much for sharing! I can run the whole project without any problem. But when I try to measure the running time of your correlation_cuda module with python correlation_native.py, I encounter a CUDA error like this:
I don't know how to solve this problem.
I downloaded the following files: KITTI raw, data_stereo_flow for KITTI 2012, data_scene_flow for KITTI 2015, data_stereo_flow_multiview, and data_scene_flow_multiview.
First I want to train the model with KITTI raw and validate using KITTI stereo and KITTI scene flow, but I got a data assertion error. I think that may be because the wrong dataset is being used.
Could you please explain, according to your config files, which files should be specified for pre-training and fine-tuning if I want to pre-train with KITTI raw and fine-tune with multiview, as you explained in your paper?
Thanks in advance.
Hello, thank you for open-sourcing such complete code. While studying your code recently, I noticed that images are resized to 256×832 during KITTI training. What is the advantage of doing this? Would running the experiments at the original resolution have a large impact on accuracy? Looking forward to your answer, thank you!
I notice that ARFlow has two steps during training. At the 'raw data' step, you use 'stage1' as the second stage. But I observe that the hyperparameters in "stage1" are the same as in "ft.json". So I want to know why you train ARFlow twice with the same hyperparameters.
You saved two models, the last and the best. Which one did you use for fine-tuning, and which for testing?