
Reflection for improvement about KPConv (closed, 9 comments)

HuguesTHOMAS commented on July 17, 2024
Reflection for improvement


Comments (9)

HuguesTHOMAS commented on July 17, 2024

Hi @tchaton,

Thank you for your interest in KPConv. I agree with you that the model could still be improved. However, the modifications you propose would not be beneficial, in my opinion.

  1. Adding a term depending on the distance to the center?

As one of the kernel points is already forced to be at the center, we already have a weight that depends on this distance. Now, if I look at "pixel adaptive convolution", I believe they add a dependency on the feature located at the center, not on the distance to the center. In the figure you used, K depends on f(0,0) and not on (0,0). So the equation you propose is quite different from the example in the image you gave.

  2. Variable sigma, depending on the input density?

Using a variable sigma makes no sense in our convolution definition. Here is why. The input point cloud is subsampled so that its density is uniform. For example, let's imagine all input points are spaced by approximately 1 cm. Then sigma is also chosen as 1 cm or a bit more, so that the influence weight of each kernel point is applied only to a few input points, ideally just one. This gives us the smallest kernel possible, like the 3x3 pixel kernels in images. With this kind of kernel, any normalization (like the one in a Monte Carlo estimate) is useless.
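
To make this concrete, here is a rough sketch (toy values, not the actual KPConv code) of the linear influence from the paper, with sigma tied to the subsampling grid size so each kernel point only weights its one or two closest input points:

```python
import numpy as np

def kernel_point_influence(neighbors, kernel_points, sigma):
    """Linear influence h = max(0, 1 - d / sigma) between neighbors (n, 3)
    and kernel points (K, 3)."""
    dists = np.linalg.norm(neighbors[:, None, :] - kernel_points[None, :, :], axis=-1)
    return np.maximum(0.0, 1.0 - dists / sigma)

grid_size = 0.01                                      # points subsampled to be ~1 cm apart
sigma = 1.2 * grid_size                               # "1 cm or a bit more"
neighbors = np.random.randn(20, 3) * grid_size        # toy neighborhood
kernel_points = np.random.randn(15, 3) * grid_size    # toy kernel point positions
weights = kernel_point_influence(neighbors, kernel_points, sigma)
print(weights.shape)   # (20, 15); most entries are zero, each kernel point sees few points
```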

Now, that being said, I do not think our work is perfect, and there are many ways to look for improvement. If you want my opinion, I think one of the most promising directions is the relation between feature aggregation and geometric encoding. Any point convolution can be fed features or a constant value. If you feed a constant value, the convolution encodes the shape of the input; if you feed features, the convolution aggregates them into a more complex feature. In a standard network, the first layer does geometric encoding and the next layers do feature aggregation. But you could imagine ways to improve that.
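
A toy illustration of that remark (a hypothetical point convolution, not KPConv itself): the exact same aggregation encodes geometry when fed a constant input and aggregates features when fed real features.

```python
import numpy as np

def point_conv(center, neighbors, features, weight_fn):
    """Generic point convolution: g(x) = sum_i w(x_i - x) * f_i over a neighborhood."""
    w = weight_fn(neighbors - center)       # (n, 1) spatial weights
    return (w * features).sum(axis=0)

# Gaussian spatial weighting, purely illustrative
weight_fn = lambda rel: np.exp(-(np.linalg.norm(rel, axis=1, keepdims=True) ** 2) / 0.01)

center = np.zeros(3)
neighbors = np.random.randn(16, 3) * 0.05

geom = point_conv(center, neighbors, np.ones((16, 1)), weight_fn)        # constant input -> shape encoding
aggr = point_conv(center, neighbors, np.random.randn(16, 4), weight_fn)  # real features -> feature aggregation
print(geom.shape, aggr.shape)   # (1,) and (4,)
```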

Best,
Hugues


XuyangBai commented on July 17, 2024

Hi @HuguesTHOMAS

Great explanation! I have some thoughts on modification 2 (sorry if my example is not suitable). For CAD datasets (like ModelNet40) or very dense indoor scenes (S3DIS, Semantic3D, etc.), it is true that the density is uniform if you use a grid subsampling strategy. But when using KPConv for outdoor scenes like KITTI, there are still density variations even after grid subsampling, because regions far away from the lidar center are very sparse. Besides, I think that without normalisation over the number of neighbouring points in the local neighbourhood, KPConv might not be able to be trained and tested directly with a different first_subsampling_dl.
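
For concreteness, the kind of normalisation I mean would look roughly like this (toy code, all names illustrative):

```python
import numpy as np

def normalized_point_conv(center, neighbors, features, weight_fn):
    """Toy point convolution divided by the neighbor count, so sparse and dense
    neighborhoods produce responses of comparable magnitude."""
    w = weight_fn(neighbors - center)                      # (n, 1) spatial weights
    return (w * features).sum(axis=0) / max(len(neighbors), 1)

weight_fn = lambda rel: np.exp(-(np.linalg.norm(rel, axis=1, keepdims=True) ** 2) / 0.01)
dense = normalized_point_conv(np.zeros(3), np.random.randn(64, 3) * 0.05, np.ones((64, 1)), weight_fn)
sparse = normalized_point_conv(np.zeros(3), np.random.randn(2, 3) * 0.05, np.ones((2, 1)), weight_fn)
print(dense, sparse)   # comparable magnitudes despite very different neighbor counts
```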

Best,
Xuyang.


tchaton commented on July 17, 2024

@HuguesTHOMAS, thanks for your answer.

  1. Density. You might be right, but I am so amazed by t-SNE/UMAP's ability to perform dimensionality reduction that having a fixed sigma seemed weird to me. I was thinking we could use their definition of local density, or even conditional probabilities. Lots of things could be tried. For example: perform the radius search using this adaptive standard deviation from the points to the kernel weights, with the weights defined locally over the N neighboring points of the center. We could also perform nested Relation-Shape encoding: once for creating the weights (message passing from the kernel points to the neighboring points), and a second time to project them onto the center of the sphere. Or the weights for the message passing could use this density. Things to brainstorm there.

[Two screenshots attached]

  2. I have been working on ECC a lot. The best convolution I got was performing attention first between weights and features, then using the message transformed through an MLP to perform either a Hadamard product (similar to Relation-Shape without attention) or a matrix product. It got me 4 points up in mIoU on S3DIS.
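
Roughly, the Hadamard variant of point 2 looks like this (toy NumPy sketch with a random MLP; not the code that got the +4 mIoU):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32
W1, W2 = rng.standard_normal((3, 64)), rng.standard_normal((64, d))  # toy edge-MLP weights

def edge_conv(center, neighbors, features):
    """Per-edge filters from an MLP on relative positions, applied by Hadamard product."""
    rel = neighbors - center                    # (n, 3) edge geometry
    filt = np.maximum(rel @ W1, 0.0) @ W2       # (n, d) filters from a ReLU MLP
    return (filt * features).sum(axis=0)        # Hadamard product, then aggregation

neighbors = rng.standard_normal((16, 3)) * 0.05
features = rng.standard_normal((16, d))
out = edge_conv(np.zeros(3), neighbors, features)   # (d,)
```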


tchaton commented on July 17, 2024

@HuguesTHOMAS, one last thing. It would be great to export a much cleaner version of KPConv to PyTorch. It would be amazing to have it implemented within PyTorch Geometric. Do you have any plans to do so? Or would you like to participate in its implementation?

NB: I don't think we have good enough models for production right now. Getting around 75-80 mIoU would allow us to automate so many tasks. The current API of KPConv makes that hard, especially with the TensorFlow / CUDA troubles, and therefore improvements are limited. I think it is the best model out there, and it deserves a lean implementation for everyone to use.


HuguesTHOMAS commented on July 17, 2024

Hi guys, and thanks for your interest in this work.

I will try to answer all your questions.


HuguesTHOMAS commented on July 17, 2024

To @XuyangBai:
You are right, when dealing with varying densities, we have to take two problems into account:

  • The high densities: too many points means long computations
  • The low densities: too few points means no information

In my answer, I only explained how the subsampling deals with the first problem. Now, what about the second problem? Let's take the example of the image below (from my first article).

[Figure from the first article: spherical neighborhoods at multiple scales]

We can see that the density in neighborhood C is too low at the first scale. With only one or two points, you cannot get any information on the scene geometry. If you add normalization, it won't change anything, because the information is simply not there; you need to look at bigger scales. Because our network has multiple layers with increasing scales, the geometry of the sparser regions is still going to be detected in the network, just not in the first layer.

In practice, I have two examples to show that our method is very robust to density variations:

  • On Semantic3D (fixed laser scans with very sparse points far away from the laser), we have this problem and our performance is still very strong.
  • On SemanticKITTI (the example you took), the same problem occurs and we still rank among the top methods on their benchmark.

Finally, training and testing with a different first_subsampling_dl is not a problem, since the ratio between convolution radius and grid size is kept the same. I don't think adding normalization would help, as the grid already normalizes the data for a given radius.
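
To be explicit about what "the ratio is kept the same" means (the numbers below are illustrative, not the exact values of the config):

```python
# Illustrative only: the convolution radius follows the subsampling grid,
# so changing first_subsampling_dl rescales everything consistently.
radius_ratio = 2.5                                   # hypothetical fixed ratio

for first_subsampling_dl in (0.02, 0.04, 0.08):      # metres
    conv_radius = radius_ratio * first_subsampling_dl
    print(f"dl = {first_subsampling_dl:.2f} m -> conv radius = {conv_radius:.2f} m")
```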


HuguesTHOMAS commented on July 17, 2024

To @tchaton,

I agree with you, a lot could be tried to perform convolutions on point clouds with adaptive neighborhoods. The ideas you describe are similar to the convolutions described in Dynamic Graph CNN for Learning on Point Clouds. They let the network learn its own neighborhood relations, which helps capture long-range relationships between similar patterns in the point cloud. Using dimensionality reduction techniques to help such a network learn the best possible relations would definitely constitute a strong basis for an article. Furthermore, such a high-level operation might need to be placed in the last layers of the network, like deformable KPConv. As you said, many things to brainstorm here.
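
For reference, that "learned neighborhood" idea essentially comes down to recomputing k nearest neighbors in feature space at every layer, something like this minimal sketch:

```python
import numpy as np

def knn_in_feature_space(features, k):
    """Dynamic-graph style neighborhoods: k nearest neighbors in feature space,
    so distant points with similar features can become neighbors."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)  # (N, N) squared distances
    return np.argsort(d2, axis=1)[:, 1:k + 1]                               # drop self, keep k closest

features = np.random.randn(100, 32)
neighbor_idx = knn_in_feature_space(features, k=16)   # (100, 16), recomputed at each layer
```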

In the case of KPConv, the whole point of the method is to understand that geometric relationships need to be carefully used if you want a spatially consistent convolution and a "standard way" to convolve point clouds. Adding such an enhancement, even with kernel points like KPConv, would just make it a different method in my opinion, which would deserve a different name, because it is not a minor adjustment but a very big theoretical difference. I can only encourage you to look into such a direction, and I would be very glad to help and even collaborate if you want to.

This leads me to your second message. I totally agree, a clean version of KPConv in PyTorch would be great. Unfortunately, I have never used PyTorch and don't know its subtleties, and the code of KPConv needs a lot of implementation tricks to run efficiently. I would be happy to participate in its implementation, but without help, I won't have the time to do it.

Best,
Hugues


tchaton commented on July 17, 2024

@HuguesTHOMAS, I believe I added you on Facebook, so we can talk easily about all that.


XuyangBai commented on July 17, 2024

@HuguesTHOMAS Thanks a lot. The problem brought by varying densities is a really interesting point, and your explanation makes sense to me! I will take a look at your first paper and think about it.

Best,
Xuyang.

