gvnn's Introduction

gvnn: Neural Network Library for Geometric Computer Vision, ECCV Workshop on Geometry Meets Deep Learning, 2016

Ankur Handa, Michael Bloesch, Viorica Patraucean, Simon Stent, John McCormac, Andrew Davison

Link to the paper gvnn

What is gvnn?

gvnn is primarily intended for self-supervised learning using low-level vision. It is inspired by the Spatial Transformer Networks (STN) paper that appeared in NIPS in 2015 and its open-source code made available by Maxime Oquab. The code is self-contained, i.e. the original implementation of STN by Maxime is also within the repository.

STNs were mainly limited to applying only 2D transformations to the input. We have added a new set of transformations often needed for manipulating data in 3D geometric computer vision. These include the 3D counterparts of the transformations used in the original STN, together with many new transformations and different M-estimators:

  • SO3 layer - Rotations are expressed as an so(3) vector (v1, v2, v3)

  • Euler layer - Rotations can also be expressed as Euler angles

  • SE3 and Sim3 layer

  • Camera Pin-hole projection layer

  • 3D Grid Generator

  • Per-pixel 2D transformations

    • 2D optical flow
    • 6D Overparameterised optical flow
    • Per-pixel SE(2)
    • Slanted plane disparity
  • Per-pixel 3D transformations

    • 6D SE3/Sim3 transformations
    • 10D transformation
  • M-estimators

Below you will see some examples of how to use gvnn to set up architectures for self-supervised learning. We plan to make this a comprehensive and complete library to bridge the gap between geometry and deep learning.

We are also performing large-scale experiments on data collected both from the real world and from our previous work, SceneNet, to test different geometric computer vision algorithms, e.g. dense image registration, 3D reconstruction and place recognition for loop closure.

Recommendation

Please do a fresh pull if you spot any errors, since the repository is updated regularly.

Installation

luarocks make gvnn-scm-1.rockspec

How to run gvnn on the CPU only

  • Comment out require 'libcugvnn' in init.lua.
  • Use CMakeLists_CPU.txt, i.e. copy CMakeLists_CPU.txt to CMakeLists.txt.
  • Do a fresh install of gvnn and, if possible, uninstall the previous gvnn version first.

Unit tests - Forward/Backward pass checks

All the relevant unit tests are in test.lua. The gif image below shows how to run this file and check for any forward/backward pass errors in the layer implementations.

All the modules in the repository have been tested and pass the forward and backward pass checks defined in test.lua. If you find any errors or visible hot-spots in the code, please create an issue.
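
From the Torch REPL, running the checks amounts to the following (a minimal sketch; the actual gradient checks are defined inside test.lua):

require 'gvnn'
dofile('test.lua')   -- runs the forward/backward pass checks for every layer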

SO3 Layer

Rotations are represented as an so(3) 3-vector, which is turned into a rotation matrix via the exponential map. For a more detailed view of the so(3) representation and the exponential map, read this tutorial from Ethan Eade: Lie-Algebra Tutorial. This is what the exponential map looks like: Exponential Map. Also, Tom Drummond's notes on Lie algebras are a great source to learn about exponential maps: Tom Drummond's notes. The so(3) representation is chosen mainly for its appealing properties when linearising rotations (via Taylor series expansion) for iterative image alignment with the classic linearise-solve-update rule. The figure below shows how linearisation for SO3 amounts to fitting a local plane on the sphere.

Montage-2
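
Concretely, the exponential map is the standard Rodrigues formula (a textbook identity, stated here for reference):

R = \exp([\omega]_\times) = I + \frac{\sin\theta}{\theta}[\omega]_\times + \frac{1-\cos\theta}{\theta^2}[\omega]_\times^2, \qquad \omega = (v_1, v_2, v_3), \;\; \theta = \|\omega\|

where [\omega]_\times is the 3x3 skew-symmetric matrix built from \omega.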

The backprop derivatives of this rotation parameterisation are all you need to be able to insert this layer within a network - the derivatives are a bit involved, but they look like this

Montage-1

However, this derivative has a singularity at (0,0,0) because of the division by the norm of the vector. Therefore, we use a threshold to check whether the magnitude is small enough that we can use a first-order approximation of the exponential map. The derivatives of this linearised version are nothing but the generators of the exponential map: Generators
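
For reference, near the identity the first-order approximation and its derivatives reduce to (standard result):

R \approx I + [\omega]_\times, \qquad \frac{\partial R}{\partial v_i} = G_i

G_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad G_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad G_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}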

To set up 3D rotation warping, you first homogenise the (x, y) pixel positions to [x, y, 1]^T and apply the inverse camera calibration matrix to get the corresponding ray in 3D. This ray is rotated by the rotation and then projected back into the 2D plane with the PinHoleCameraProjection layer, and the image is sampled with bilinear interpolation.

require 'nn'
require 'gvnn'

concat = nn.ConcatTable()

height = 240
width  = 320
u0     = 160
v0     = 120

fx = 240
fy = 240

-- first branch is there to transpose inputs to BHWD, for the bilinear sampler
tranet=nn.Sequential()
tranet:add(nn.SelectTable(1))
tranet:add(nn.Identity())
tranet:add(nn.Transpose({2,3},{3,4}))

rotation_net = nn.Sequential()
rotation_net:add(nn.SelectTable(2))
rotation_net:add(nn.TransformationRotationSO3())
rotation_net:add(nn.Transform3DPoints_R(height, width, fx, fy, u0, v0))
rotation_net:add(nn.PinHoleCameraProjectionBHWD(height, width, fx, fy, u0, v0))
rotation_net:add(nn.ReverseXYOrder())

concat:add(tranet)
concat:add(rotation_net)

warping_net = nn.Sequential()
warping_net:add(concat)
warping_net:add(nn.BilinearSamplerBHWD())
warping_net:add(nn.Transpose({3,4},{2,3}))

This is how to use the previous network to warp and plot the image

require 'image'
require 'nn'
require 'torch'

dofile('imagewarpingSO3.lua')

x = image.loadPNG('linen1.png')
input = torch.Tensor(1,1,240,320)
input[1] = x

r = torch.Tensor(1,3):zero()
r[1][1] = 0.2
--r[1][2] = 0.3
--r[1][3] = 0.4

t = {input, r}

out_w = warping_net:forward(t)

w = out_w[1]

image.display(x)
image.display(w)

image.save('warped.png', w)

For running on CUDA, just call :cuda() wherever needed, e.g. warping_net = warping_net:cuda(), input = input:cuda() and r = r:cuda().
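
A minimal sketch of the same call on the GPU (assuming a CUDA-enabled Torch install and the default gvnn build):

require 'cunn'

warping_net = warping_net:cuda()
input       = input:cuda()
r           = r:cuda()

out_w = warping_net:forward({input, r})
w = out_w[1]:float()   -- copy back to the CPU before image.display/image.save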

Montage-0

SE3 Layer

The SE3 layer works analogously, but also takes a depth map so that the full 3D points can be transformed with a 6-DoF rigid-body motion before being projected back into the image:

require 'nn'
require 'gvnn'

--dofile('ReverseXYOrder.lua')

concat = nn.ConcatTable()
concat_Rt_depth = nn.ConcatTable()


height = 480--240
width  = 640--320
u0     = 320--160
v0     = 240--120

fx =  480 --240
fy = -480 --240

-- first branch is there to transpose inputs to BHWD, for the bilinear sampler
tranet=nn.Sequential()
tranet:add(nn.SelectTable(1))
tranet:add(nn.Identity())
tranet:add(nn.Transpose({2,3},{3,4}))

-- second branch converts the 6-vector (so3 3-vector for rotation and 3-vector for translation) into a 3x4 transformation matrix
Rt_net = nn.Sequential()
Rt_net:add(nn.SelectTable(2))
Rt_net:add(nn.TransformationMatrix3x4SO3(true,false,true))

depth = nn.Sequential()
depth:add(nn.SelectTable(3))

concat_Rt_depth:add(Rt_net)
concat_Rt_depth:add(depth)

Transformation3x4net = nn.Sequential()
Transformation3x4net:add(concat_Rt_depth)
Transformation3x4net:add(nn.Transform3DPoints_Rt(height, width, fx, fy, u0, v0))
Transformation3x4net:add(nn.PinHoleCameraProjectionBHWD(height, width, fx, fy, u0, v0))
Transformation3x4net:add(nn.ReverseXYOrder())

concat:add(tranet)
concat:add(Transformation3x4net)

warping_net = nn.Sequential()
warping_net:add(concat)
warping_net:add(nn.BilinearSamplerBHWD())
warping_net:add(nn.Transpose({3,4},{2,3}))

This is how to use the previous network to warp an RGB-D frame (here from the ICL-NUIM dataset):

require 'gvnn'
require 'torch'
require 'image'

dofile('imagewarpingSE3.lua')

--local height=480
--local width =360

ref_rgb_image   = image.load('iclnuim/rgb/100.png')

ref_depth_image = image.load('iclnuim/depth/100.png')
ref_depth_image = (ref_depth_image*65535)/5000.0

print(ref_rgb_image:size())
print(ref_depth_image:size())

--image.display(ref_rgb_image)
--image.display(ref_depth_image)

data_ref_rgb      = torch.Tensor(1,3,480,640)
data_ref_rgb[1]   = ref_rgb_image

data_ref_depth    = torch.Tensor(1,1,480,640)
data_ref_depth[1] = ref_depth_image

so3_t_vector      = torch.Tensor(1,6):uniform()

-- tx, ty, tz, rx, ry, rz
-- -0.00119339 -0.00449791 -0.00122229 0.00104319 -0.00694122 -0.00333668

--- so3 and translation vector

so3_t_vector[1][1] = 0--  0.00104319
so3_t_vector[1][2] = 0-- -0.00694122
so3_t_vector[1][3] = 0-- -0.00333668

so3_t_vector[1][4] = 0-- -0.00119339
so3_t_vector[1][5] = 0-- -0.00449791
so3_t_vector[1][6] = 0-- -0.00122229

inputTable = {data_ref_rgb:cuda(), so3_t_vector:cuda(), data_ref_depth:cuda()}

outImage = warping_net:cuda():forward(inputTable)

image.display(outImage[1])


Optical Flow

Optical flow is a 2D motion vector per pixel. In many standard computer vision formulations it is obtained by solving a partial differential equation involving a data term, which measures the pixel colour discrepancy between the reference image at time t and a new image at time t+1, and a regulariser, which helps smooth out the flow vectors at neighbouring pixels. We provide two formulations of the optical flow vector: the standard minimal-parameterisation 2D vector and an over-parameterised 6DoF optical flow. Below, we show an example of how to use this layer for self-supervised learning: the optical flow predicted by a convolutional LSTM is used to warp the frame at time t onto the frame at time t+1. The relevant paper and code are available here.
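
A minimal warping sketch in the spirit of the SO3 example above, assuming the predicted flow arrives as a 2-channel BDHW tensor (whether an nn.ReverseXYOrder() is needed after nn.OpticalFlow2DBHWD should be checked against test.lua):

require 'nn'
require 'gvnn'

height = 240
width  = 320

-- first branch transposes the image at time t to BHWD for the bilinear sampler
image_branch = nn.Sequential()
image_branch:add(nn.SelectTable(1))
image_branch:add(nn.Transpose({2,3},{3,4}))

-- second branch turns the predicted per-pixel 2D flow into a sampling grid
flow_branch = nn.Sequential()
flow_branch:add(nn.SelectTable(2))
flow_branch:add(nn.Transpose({2,3},{3,4}))
flow_branch:add(nn.OpticalFlow2DBHWD(height, width))

concat = nn.ConcatTable()
concat:add(image_branch)
concat:add(flow_branch)

flow_warping_net = nn.Sequential()
flow_warping_net:add(concat)
flow_warping_net:add(nn.BilinearSamplerBHWD())
flow_warping_net:add(nn.Transpose({3,4},{2,3}))   -- back to BDHW

-- usage (hypothetical tensors): warped = flow_warping_net:forward({frame_t, predicted_flow})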

Montage-0 Montage-1

Spatio-temporal video autoencoder with differentiable memory. Viorica Patraucean, Ankur Handa, Roberto Cipolla, ICLR Workshop Track, 2016.

Disparity

Again, standard low-level vision provides an intuitively appealing way to do self-supervised learning. Now imagine that, instead of two frames of a video, we have a stereo pair. We can then warp the left frame onto the right in a similar way, with the network instead predicting the disparity.
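
A minimal sketch along the same lines, assuming a single-channel BDHW disparity prediction; this mirrors how nn.Disparity1DBHWD is used elsewhere in the repository, but treat it as an illustration rather than a reference implementation:

require 'nn'
require 'gvnn'

height = 240
width  = 320

-- first branch transposes the source (left) image to BHWD for the bilinear sampler
left_branch = nn.Sequential()
left_branch:add(nn.SelectTable(1))
left_branch:add(nn.Transpose({2,3},{3,4}))

-- second branch turns the predicted 1D disparity into a horizontally shifted sampling grid
disp_branch = nn.Sequential()
disp_branch:add(nn.SelectTable(2))
disp_branch:add(nn.Transpose({2,3},{3,4}))
disp_branch:add(nn.Disparity1DBHWD(height, width))
disp_branch:add(nn.ReverseXYOrder())

concat = nn.ConcatTable()
concat:add(left_branch)
concat:add(disp_branch)

disparity_warping_net = nn.Sequential()
disparity_warping_net:add(concat)
disparity_warping_net:add(nn.BilinearSamplerBHWD())
disparity_warping_net:add(nn.Transpose({3,4},{2,3}))   -- back to BDHW

-- usage (hypothetical tensors): warped_left = disparity_warping_net:forward({left_image, predicted_disparity})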

Montage-0 Montage-1

Unsupervised CNN for Single View Depth Estimation: Geometry to the rescue. Ravi Garg, Vijay Kumar BG, Gustavo Carneiro, Ian Reid, ECCV 2016.

Projection Layer

The projection layer projects 3D data onto a 2D image plane via a projection matrix (in our case, the pin-hole camera projection matrix). This is extremely useful for data involving any 3D point cloud, depth map and/or mesh and their projections onto the 2D plane. It is differentiable only up to a point, i.e. the forward/backward pass checks fail if the z-coordinate is below a certain threshold.

![Montage-0](assets/projection_layer.png)
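
For reference, the standard pin-hole projection used here maps a 3D point (X, Y, Z) to pixel coordinates as

u = f_x \frac{X}{Z} + u_0, \qquad v = f_y \frac{Y}{Z} + v_0

Its derivatives contain 1/Z and 1/Z^2 terms, which is why the layer is only well behaved (and the gradient checks only pass) when Z stays above a small threshold.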

Lens Distortion

Montage-0

Nonrigid SO3


Nonrigid SE3

Tracking non-rigid, deformable objects is possible via a fully dense per-pixel SE3 motion field. We provide a non-rigid SE3 layer which takes a per-pixel se(3) vector and allows one depth image to be warped onto another as a means of self-supervised learning.

![Montage-0](assets/non-rigid.png)
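
For reference, each per-pixel 6-vector \xi = (\omega, t) can be mapped to a 3x4 rigid-body transform via the standard SE(3) exponential map (a textbook identity, stated here for context):

\exp\!\begin{pmatrix} [\omega]_\times & t \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} R & V t \\ 0 & 1 \end{pmatrix}, \qquad V = I + \frac{1-\cos\theta}{\theta^2}[\omega]_\times + \frac{\theta-\sin\theta}{\theta^3}[\omega]_\times^2

with R given by the Rodrigues formula above and \theta = \|\omega\|.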

SE3-Nets: Learning Rigid Body Motion using Deep Neural Networks, Arunkumar Byravan and Dieter Fox, arXiv, 2016.

M-estimators

M-estimators have a long history in traditional computer vision and statistics. Michael Black's early papers from the 90s provide a compendium of various M-estimators, showing how most of them are superior to the standard L2 loss function in their ability to cull outliers from the estimation of model parameters. We provide four different M-estimators, namely L2, Huber, Cauchy and Tukey.

Montage-0
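
For reference, the textbook forms of these penalties on a residual r, with tuning constants k and c (these are the standard definitions, not gvnn-specific notation), are:

\rho_{\mathrm{L2}}(r) = \tfrac{1}{2}r^2

\rho_{\mathrm{Huber}}(r) = \begin{cases} \tfrac{1}{2}r^2 & |r| \le k \\ k\,(|r| - \tfrac{k}{2}) & |r| > k \end{cases}

\rho_{\mathrm{Cauchy}}(r) = \tfrac{c^2}{2}\log\left(1 + \tfrac{r^2}{c^2}\right)

\rho_{\mathrm{Tukey}}(r) = \begin{cases} \tfrac{c^2}{6}\left(1 - \left(1 - \tfrac{r^2}{c^2}\right)^3\right) & |r| \le c \\ \tfrac{c^2}{6} & |r| > c \end{cases}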

Future Improvements

Bilinear interpolation could use the tex2D function in CUDA to speed up the interpolation. We also need to add interpolation via Taylor series expansion, as done in classic PDE-based variational optimisation methods. Warping should be done at a higher resolution and then blurred and downsampled, i.e. the DBW model used in Unger's super-resolution method.

License

GPL. We would like to thank Dyson Technologies Limited for supporting this work.

Contact

Ankur Handa (handa(dot)ankur(at)gmail(dot)com)

Acknowledgements

If you find the code useful, please consider citing the following:

@inproceedings{Handa:etal:ECCVW16,
  author    = {Ankur Handa and 
               Michael Bloesch and 
               Viorica P{\u a}tr{\u a}ucean and
               Simon Stent and
               John McCormac and
               Andrew Davison},
  title     = {gvnn: Neural Network Library for Geometric Computer Vision},
  booktitle = {ECCV Workshop on Geometry Meets Deep Learning},
  year      = {2016}
}
@Misc{STNImplementation,
    author = {Maxime Oquab},
    title={{Open Source Implementation of Spatial Transformer Networks}},
    howpublished={\url{https://github.com/qassemoquab/stnbhwd}},
    year={2015}
}


gvnn's Issues

access to nil value in updateGradInput of OpticalFlow2D.lua

Hi, I am using OpticalFlow2D.lua in the pix2pix framework. I use it at the very end of the generator network like this:

    local d8 = d7 - nn.ReLU(true) - nn.SpatialFullConvolution(ngf * 2, 2, 4, 4, 2, 2, 1, 1)
    local d8_transposed = d8 - nn.Transpose({3,4},{2,4})
    local flowGrid = d8_transposed - nn.OpticalFlow2DBHWD(height,width)
    local inp_transposed = input - nn.Transpose({3,4},{2,4})
    local o1 = {inp_transposed,flowGrid} - nn.BilinearSamplerBHWD() - nn.Transpose({2,4})
    netG = nn.gModule({input},{o1})

In the backward pass, I get the following error

transferring to gpu...
done
/home/msarkar/torch/install/bin/luajit: ...e/msarkar/torch/install/share/lua/5.1/cutorch/Tensor.lua:22: attempt to index local 'tensor' (a nil value)
stack traceback:
        ...e/msarkar/torch/install/share/lua/5.1/cutorch/Tensor.lua:22: in function 'typeAs'
        ...arkar/torch/install/share/lua/5.1/gvnn/OpticalFlow2D.lua:72: in function 'updateGradInput'
        .../msarkar/torch/install/share/lua/5.1/nngraph/gmodule.lua:420: in function 'neteval'
        .../msarkar/torch/install/share/lua/5.1/nngraph/gmodule.lua:454: in function 'updateGradInput'
        /home/msarkar/torch/install/share/lua/5.1/nn/Module.lua:31: in function 'backward'
        train.lua:307: in function 'opfunc'
        /home/msarkar/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
        train.lua:357: in main chunk
        [C]: in function 'dofile'
        ...rkar/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00406670

I commented out the :typeAs(optical_flow) in updateGradInput of OpticalFlow2D.lua. This makes the backward pass run smoothly, but I am not sure if it has any side effects. Could you please tell me the correct way to fix this problem?

Source Code For "Application: Training on RGB-D Visual Odometry"

Hi,
My name is Jack.
I wonder if you could provide me with the source code for the application described in part 3 of your paper "gvnn"? (If you have already uploaded the source code, would you mind telling me which file I should check, because I am very new to Torch.) I find your work very interesting and I want to know more about it. Thanks in advance!

"CUDA driver version is insufficient for CUDA runtime version" error

Hello,

I'm using Ubuntu 16.04 with the driver nvidia-387.
When I run "require 'gvnn'", I get the following error:

THCudaCheck FAIL file=/home/puren/torch/extra/cutorch/lib/THC/THCGeneral.c line=70 error=35 : CUDA driver version is insufficient for CUDA runtime version
/home/puren/torch/install/share/lua/5.1/trepl/init.lua:389: /home/puren/torch/install/share/lua/5.1/trepl/init.lua:389: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /home/puren/torch/extra/cutorch/lib/THC/THCGeneral.c:70
stack traceback:
	[C]: in function 'error'
	/home/puren/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
	[string "_RESULT={require 'gvnn'}"]:1: in main chunk
	[C]: in function 'xpcall'
	/home/puren/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
	...uren/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk
	[C]: at 0x00405d50

Has anyone got a similar error before and have any idea how to solve this issue?

Error: attempt to index field 'CudaByteStorage'

Hello, I am trying to run gvnn on AWS Deep Learning Ubuntu, but receive this error in torch:

th> require 'gvnn'
/home/ubuntu/torch/install/share/lua/5.2/trepl/init.lua:389: /home/ubuntu/torch/install/share/lua/5.2/trepl/init.lua:389: /home/ubuntu/torch/install/share/lua/5.2/cutorch/init.lua:4: attempt to index field 'CudaByteStorage' (a nil value)
stack traceback:
	/home/ubuntu/torch/install/share/lua/5.2/trepl/init.lua:506: in function </home/ubuntu/torch/install/share/lua/5.2/trepl/init.lua:499>
	[C]: in function 'error'
	/home/ubuntu/torch/install/share/lua/5.2/trepl/init.lua:389: in function 'require'
	[string "_RESULT={require 'gvnn'}"]:1: in main chunk
	[C]: in function 'xpcall'
	/home/ubuntu/torch/install/share/lua/5.2/trepl/init.lua:661: in function 'repl'
	...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk
	[C]: in ?	
                                                                      [0.0008s]	

Add to rocks

This looks to be pretty comprehensive - do you think it's good to add to the list of Torch rocks?

I've already added it to the cheatsheet for you ;)

How to install the package? Building NVCC (Device) object in CMakeFiles gives an error.

Hi @ankurhanda,
The following is the output after the command "luarocks make gvnn-scm-1.rockspec":
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/hpj/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/hpj/torch/install/lib/luarocks/rocks/gvnn/scm-1" && make

-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Torch7 in /home/hpj/torch/install
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- Compiling with OpenMP support
-- Found CUDA: /usr (found suitable version "7.5", minimum required is "4.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/hpj/gvnn/build
Scanning dependencies of target gvnn
[ 25%] Building C object CMakeFiles/gvnn.dir/init.c.o
[ 50%] Linking C shared module libgvnn.so
[ 50%] Built target gvnn
[ 75%] Building NVCC (Device) object CMakeFiles/cugvnn.dir/cugvnn_generated_init.cu.o
/usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
/usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
return (char *) memcpy (__dest, __src, __n) + __n;
^
CMake Error at cugvnn_generated_init.cu.o.cmake:267 (message):
Error generating file
/home/hpj/gvnn/build/CMakeFiles/cugvnn.dir//./cugvnn_generated_init.cu.o

CMakeFiles/cugvnn.dir/build.make:63: recipe for target 'CMakeFiles/cugvnn.dir/cugvnn_generated_init.cu.o' failed
make[2]: *** [CMakeFiles/cugvnn.dir/cugvnn_generated_init.cu.o] Error 1
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/cugvnn.dir/all' failed
make[1]: *** [CMakeFiles/cugvnn.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

Error: Build error: Failed building.
THX~

I have a question about this code.

Hi @ankurhanda,
Thank you for providing this awesome gvnn code.
I want to convert some of your code to Python, but I don't know what this line means:
inputImages.nn.BilinearSamplerBHWD_updateOutput(self, inputImages, grids)
I searched for information but did not find anything useful.
Could you please give me some hints?
Thanks!

detailed structure of your gvnn siamese network

Can you tell me the detailed structure of the siamese network in the section "Application: Training on RGB-D Visual Odometry"? You mention that the two heads are fused early, but don't say at which stage they are fused.
Thanks!
Siyuan

Invariances in SIFT

"... something not possible with pure geometric methods that either rely on pixel values or SIFT-like features"

Following the SIFT paper (Lowe, 2004), that descriptor should be robust to changes in illumination because it uses the orientation of (locally sampled) gradients as a low-level feature.
SIFT has also been used for many works on internet photo collections that use pictures from Flickr or other uncontrolled acquisition sources.

Maybe I am not understanding that statement...
(A GitHub issue may not be a good place for a theoretical question.)

Projecting the output of NonRigidPerPixelSE3

Hello!
I am using test_warpingSE3.lua as a basis for experimentation. If I provide NonRigidPerPixelSE3 with a BxWxHx3x3 tensor (3-rot, 3-trans, per pixel), it outputs a tensor of the form BxWxHx3x4 (which is likely a formal 3x4 transformation matrix).

The problem I have is I am not sure if gvnn provides a transform layer for data in this form. Transform3DPoints_depth, Transform3DPoints_r, Transform3DPoints_rt do not seem to be up to the task?

How does one utilize the output of NonRigidPerPixelSE3?

Best wishes,
Michael

A bug in derivatives derivation?

Hello,

There seems to be a small issue with the derivatives near the identity. The multiplication with the rotation matrix should be applied only when omega_mag > threshold; in the other case, since

R = I + w_x

A partial derivative will be just:

\partial R / \partial w_i = G_i

where G_i is a generator matrix for the i-th element (Section 3.3 Gallego et al. (2014)).

Best Regards,

Minh

update cmakelists NVCC code architecture

In case you are getting the following error:

Building NVCC (Device) object CMakeFiles/cugvnn.dir/cugvnn_generated_init.cu.o
nvcc fatal   : Value 'sm_20' is not defined for option 'gpu-architecture'
CMake Error at cugvnn_generated_init.cu.o.cmake:207 (message):
  Error generating
  /path/gvnn/build/CMakeFiles/cugvnn.dir//./cugvnn_generated_init.cu.o

Change line 54 of gvnn/CMakeLists.txt from LIST(APPEND CUDA_NVCC_FLAGS "-arch=sm_20")
to LIST(APPEND CUDA_NVCC_FLAGS "-arch=sm_30").

Where is the M-estimators?

Hi,

Thanks for sharing this great work.

I am trying to apply different loss functions. However, I have not found anything related to the M-estimators. Could you provide some information about them?

Thanks!

access to bad image in AffineTransform.

Hi, I use test_warpingSO3.lua to achieve three-dimensional rotation of an image. Next I intend to implement scaling and translation of the image. I used the following code:
require 'image'
require 'nn'
require 'torch'
dofile('imagewarpingSO3.lua')
x = image.loadPNG('linen1.png')
input = torch.Tensor(1,1,240,320)
input[1] = x
input = nn.Transpose({2,3},{3,4}):forward(input)

r = torch.Tensor(1,6):zero()
r[1][1] = 1
r[1][5] = 1

out_r = nn.AffineTransformMatrixGenerator(True,True,True):forward(r)
out_grid = nn.AffineGridGeneratorBHWD(240,320):forward(out_r)

t = {input, out_grid}
out_img = nn.BilinearSamplerBHWD():forward(t)
out_img = nn.Transpose({3,4},{2,3}):forward(out_img)
out_img = out_img[1]
image.display(out_img)

But I got a bad image and I don't know where it went wrong. Thank you!

Test in `test.lua` not passed

Hello,
I've run tests in gvnn with this result:

~/gvnn$ th

[Torch7 startup banner: Scientific computing for Lua - https://github.com/torch - http://torch.ch]

th> require 'gvnn'
{
  VolumetricMaxUnpooling : {...}
  ConcatTable : {...}

  ...

  TemporalRowConvolution : {...}
}
                                                                      [0.0882s]
th> dofile('test.lua')

Testing nn with type = cuda

Running 1 test
1/1 Disparity1D_single .................................................. [WAIT]Illegal instruction (core dumped)

Run on: ~/gvnn$ uname -a Linux ip-172-31-46-126 4.4.0-1061-aws #70-Ubuntu SMP Fri May 25 21:47:34 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
With LuaJIT: ~/gvnn$ luajit -v LuaJIT 2.1.0-beta1 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/

What is the disparity map output layer?

Hi everyone,

I have the following problem: I am training a network like the one in "Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue". After training, the warped left image looks almost identical to the right image, but when I try to extract the relevant disparity map, the disparity errors seem to be very high (I have the ground truth disparity map)!

This is the structure of the relevant part of my network (I am using nngraph):

local predict_flow0 = concat0
            - nn.SpatialConvolution(8,1,3,3,1,1,1,1) <-- this output I am currently taking as the disparity map
            
local predict_flow0_disp = predict_flow0
            - nn.Transpose({2,3},{3,4})
            - nn.Disparity1DBHWD(height, width)
            - nn.ReverseXYOrder()
  
local input_1 = input
            - nn.Transpose({2,3},{3,4})
            - nn.Narrow(4,1,3)
  
local warp = {input_1,predict_flow0_disp}
            - nn.BilinearSamplerBHWD()
            - nn.Transpose({3,4},{2,3}) <-- this output is very similar to the right image

I would like to note that I am also multiplying the disparity map by the image width before comparing to the ground truth disparity since the output of gvnn is normalized in [-1,1] to my understanding.

Any idea what I might be doing wrong?

Demo or example for 3D data

Hi, Ankur,

Thank you for providing this awesome gvnn code. I am wondering if you could provide an example of a network setup using 3D data, like 30x30x30 voxel data. The demo you provide is on 2D images, so I am confused about how to use it directly on 3D voxel data. Thank you very much.

Best,
Cindy Guo

How to rotate an image of a 3D object?

In Transform3DPoints_R and Transform3DPoints_Rt, all Z coordinates are set to 1...
In this setting, it assumes all the image pixels lie on a plane in 3D space, so these layers can only deal with images of planar scenes, right?

If I want to rotate an image of a 3D object/scene, there should be some way to feed these layers with the depth of each coordinate... so, can you give an example to explain how to do this?

Thanks..

Limitation value

The limit of one_minus_cos_div_theta_sqr is 0.5.
Should this line be 0.5?

From

one_minus_cos_div_theta_sqr_tensor[b] = one_minus_cos_div_theta_sqr_tensor[b]:fill(0)	

to

one_minus_cos_div_theta_sqr_tensor[b] = one_minus_cos_div_theta_sqr_tensor[b]:fill(0.5)	
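
For reference, the Taylor expansion confirms the limit quoted above:

\frac{1 - \cos\theta}{\theta^2} = \frac{1}{2} - \frac{\theta^2}{24} + O(\theta^4) \longrightarrow \frac{1}{2} \quad \text{as } \theta \to 0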
