kpconv's People

Contributors

huguesthomas

kpconv's Issues

Question related to Input Preparation

Hello, thank you for sharing. I really appreciate your KPConv work and I am working on implementing it in PyTorch. However, I got confused trying to understand the data preparation code because there are so many layers of encapsulation and wrappers. Would you please give me a high-level idea of the implementation? My questions are:

  1. About the flat_inputs variable: as I understand it, you build flat_inputs, which contains the points and their neighbors for each layer, before the session runs. But how can you compute the neighbors for each layer before the specific point cloud is sent into the network?
  2. For the convolution part, I already know how to build the matrix and compute the output features for a batch input, but before the convolution I need the neighborhood indices of each point. I have no idea how to compute the neighbors for each point cloud in parallel; computing the neighbor indices one cloud at a time and then concatenating them would obviously be very time-consuming. How does your implementation solve this problem?

ubuntu16.0+cuda8.0+cudnn6.0 training ModelNet40

Traceback (most recent call last):
File "training_ModelNet40.py", line 213, in <module>
model = KernelPointCNN(dataset.flat_inputs, config)
File "/home/liangpan/KPConv/models/KPCNN_model.py", line 103, in __init__
self.dropout_prob)
File "/home/liangpan/KPConv/models/network_blocks.py", line 1065, in assemble_CNN_blocks
training)
File "/home/liangpan/KPConv/models/network_blocks.py", line 414, in resnetb_deformable_block
config)
File "/home/liangpan/KPConv/models/network_blocks.py", line 122, in KPConv_deformable
modulated=config.modulated)
File "/home/liangpan/KPConv/kernels/convolution_ops.py", line 370, in KPConv_deformable
aggregation_mode)
File "/home/liangpan/KPConv/kernels/convolution_ops.py", line 438, in KPConv_deform_ops
new_neighbors_indices = tf.batch_gather(neighbors_indices, new_neighb_inds)
AttributeError: module 'tensorflow' has no attribute 'batch_gather'
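
The traceback stops because the installed TensorFlow does not expose tf.batch_gather. A likely cause is a TensorFlow release older than the one the code was written for, but that is an assumption, since the post only gives the CUDA and cuDNN versions. For anyone porting this step, here is a minimal NumPy sketch of what the failing line computes for 2-D inputs: each row of the index array picks entries from the matching row of the source array. The array values are made up for illustration.

```python
import numpy as np

# Hypothetical neighbor table: row i lists the neighbors of point i.
neighbors_indices = np.array([[10, 11, 12],
                              [20, 21, 22]])
# For each row, the positions to pick inside that same row.
new_neighb_inds = np.array([[2, 0],
                            [1, 1]])

# Row-wise gather: out[i, j] = neighbors_indices[i, new_neighb_inds[i, j]]
new_neighbors_indices = np.take_along_axis(neighbors_indices, new_neighb_inds, axis=1)
```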

Question about running 'training_S3DIS.py'

Hi, @HuguesTHOMAS ,
First, thanks for your great work on KPConv. I ran into some problems when running 'training_S3DIS.py'. The error output is below:

Traceback (most recent call last):
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,3], [?,3], [?,3], [?,3], [?,3], ..., [?], [?,3], [?,3,3], [?], [?]], output_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
[[Node: optimizer/gradients/KernelPointNetwork/layer_0/resnetb_1/conv2/concat_1_grad/GatherV2_2/axis/_222 = _HostSendT=DT_INT32, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1469_...rV2_2/axis", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/hwk/KPConv/utils/trainer.py", line 261, in train
_, L_out, L_reg, L_p, probs, labels, acc = self.sess.run(ops, {model.dropout_prob: 0.5})
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,3], [?,3], [?,3], [?,3], [?,3], ..., [?], [?,3], [?,3,3], [?], [?]], output_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
[[Node: optimizer/gradients/KernelPointNetwork/layer_0/resnetb_1/conv2/concat_1_grad/GatherV2_2/axis/_222 = _HostSendT=DT_INT32, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1469_...rV2_2/axis", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Caused by op 'IteratorGetNext', defined at:
File "training_S3DIS.py", line 213, in
dataset.init_input_pipeline(config)
File "/home/hwk/KPConv/datasets/common.py", line 749, in init_input_pipeline
self.flat_inputs = iter.get_next()
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 410, in get_next
name=name)), self._output_types,
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2069, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()

OutOfRangeError (see above for traceback): End of sequence
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,3], [?,3], [?,3], [?,3], [?,3], ..., [?], [?,3], [?,3,3], [?], [?]], output_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
[[Node: optimizer/gradients/KernelPointNetwork/layer_0/resnetb_1/conv2/concat_1_grad/GatherV2_2/axis/_222 = _HostSendT=DT_INT32, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1469_...rV2_2/axis", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "training_S3DIS.py", line 244, in
trainer.train(model, dataset)
File "/home/hwk/KPConv/utils/trainer.py", line 347, in train
self.cloud_validation_error(model, dataset)
File "/home/hwk/KPConv/utils/trainer.py", line 806, in cloud_validation_error
preds = (sub_preds[dataset.validation_proj[i_val]]).astype(np.int32)
IndexError: arrays used as indices must be of integer (or boolean) type

I am looking forward to your reply.
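
The final IndexError is what NumPy raises whenever an index array has a non-integer dtype. The sketch below reproduces the message with made-up data and shows the cast that avoids it. Why the projection array is not of integer type in this run is not established by the traceback; one possibility (an assumption) is an empty array, which NumPy creates as float64 by default.

```python
import numpy as np

sub_preds = np.array([3, 1, 4])            # hypothetical predictions on subsampled points
proj = np.array([0.0, 0.0, 2.0, 1.0])      # hypothetical projection indices stored as floats

try:
    sub_preds[proj]
except IndexError as err:
    message = str(err)                     # NumPy rejects float index arrays

# Casting the index array to an integer dtype makes the lookup valid.
preds = sub_preds[proj.astype(np.int64)].astype(np.int32)
```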

division by zero error

Hi, @HuguesTHOMAS

When I run python training_Semantic3D.py, I got the following error:

Initiating input pipelines
2019-05-22 21:55:09.646196: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-05-22 21:55:09.810266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 980 Ti major: 5 minor: 2 memoryClockRate(GHz): 1.2405
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.10GiB
2019-05-22 21:55:09.810299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0, compute capability: 5.2)
Calib Neighbors 00000000 : timings 1386.89 21.57
Calib Neighbors 00000003 : timings 759.88 20.74
Calib Neighbors 00000011 : timings 131.92 18.18


Traceback (most recent call last):
  File "training_Semantic3D.py", line 213, in <module>
    dataset.init_input_pipeline(config)
  File "/data/code9/KPConv/datasets/common.py", line 710, in init_input_pipeline
    gen_function, gen_types, gen_shapes = self.get_batch_gen('training', config)
  File "/data/code9/KPConv/datasets/Semantic3D.py", line 457, in get_batch_gen
    random_pick_n = int(np.ceil(epoch_n / (self.num_training * (config.num_classes))))
ZeroDivisionError: division by zero

Any hints to fix this problem?

THX!

Same point cloud input - different numeric results

Hi, @HuguesTHOMAS

First of all - great work, and extremely well documented!

After training your model using the ModelNet40 config, I am doing some testing on a new dataset (the remarks and readme were very helpful).
I receive 2 different outputs for the same point cloud as input to the network (using ModelNet40 config, deformable conv, output is the layer before the classification head).
The differences are significant in value (up to 0.25 of the vector's norm).

run command is
sess.run(ops, {model.dropout_prob: 1.0})

Is this normal behavior for the model? Is any part of the computation non-deterministic?

Thanks,
Ran

Problems about running training_ShapeNetPart.py

Hi @HuguesTHOMAS
When I run the training_ShapeNetPart.py script, the error below occurs many times and the program stops.
I followed the steps in your INSTALL.md.

Step 00009033 L_out=0.121 L_reg=0.080 L_p=0.282 Acc=0.96 --- 314.13 ms/batch (Averaged)
2019-10-22 11:18:28.785661: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2019-10-22 11:18:28.785715: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:274] Unexpected Event status: 1
Aborted (core dumped)

Can you help me to solve this problem?

Resource allocation

Hi,
I am trying to use your model on a dataset with frames of varying size. I have two main questions:

  • Is there an easy way to train on multiple GPUs with your code (i.e. processing different batches on several GPUs at the same time and averaging the gradients)?
  • With larger models, training crashes at random epochs because GPU memory runs out (I assume this is because frames have different sizes, so batches are not of fixed size). Is there a way to have the model skip those batches during training to avoid crashes?

Thanks a lot.
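
The second question can be illustrated with a generic skip-on-failure loop. This is only a sketch, not something taken from the repository: run_step and train_skipping_failures are hypothetical names, and MemoryError stands in for the framework's out-of-memory exception (with TensorFlow 1.x one would presumably pass tf.errors.ResourceExhaustedError instead). Whether a session stays usable after a GPU out-of-memory error is not guaranteed, so this should be tested before relying on it.

```python
def train_skipping_failures(run_step, batches, skip_on=(MemoryError,)):
    """Call run_step(batch) for every batch, skipping batches that raise skip_on.

    Returns a list of (batch_index, exception_name) for the skipped batches.
    """
    skipped = []
    for i, batch in enumerate(batches):
        try:
            run_step(batch)
        except skip_on as err:
            # Record the failure and move on to the next batch.
            skipped.append((i, type(err).__name__))
    return skipped
```

Logging the skipped indices, as done here, makes it possible to check afterwards whether the same oversized frames are always the ones being dropped.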

Cloud segmentation, reprojection indexing errors

Hi,

Thank you for your work, your paper and this project are really interesting.

I am testing the cloud segmentation on some custom dataset.
I encountered some indexing errors related to reprojection.
Branch Master: 5f9ceca

I had the following error:

TypeError: list indices must be integers or slices, not tuple

With the following lines:

  • utils/tester.py:772
probs = self.test_probs[i_test][dataset.test_proj[i_test], :]
  • utils/tester.py:784
pots = dataset.potentials['test'][i_test][dataset.test_proj[i_test]]
  • utils/trainer.py:806
preds = (sub_preds[dataset.validation_proj[i_val]]).astype(np.int32)

I fixed the errors with the following corresponding code:

  • For utils/tester.py:772
probs = np.zeros(
        (points.shape[0],
         self.test_probs[i_test].shape[1]),
        dtype=self.test_probs[i_test].dtype)
for pi, pv in enumerate(self.test_probs[i_test]):
    probs[dataset.test_proj[i_test][pi]] = pv
  • For utils/tester.py:784
pots = np.zeros(
        (points.shape[0],),
        dtype=dataset.potentials['test'][i_test].dtype)
for pi, pv in enumerate(dataset.potentials['test'][i_test]):
    pots[dataset.test_proj[i_test][pi]] = pv
  • For utils/trainer.py:806
preds = np.zeros(labels.shape, dtype=np.int32)
for si, sp in enumerate(sub_preds):
    preds[dataset.validation_proj[i_val][si]] = sp

Could you confirm I have done the right thing?
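
The TypeError suggests that the object being indexed is a plain Python list rather than a NumPy array; this is an inference from the message, not something confirmed in the thread. Under that assumption, converting to arrays keeps the original one-line gather. Note that the original lines gather (one output row per entry of the projection array), while the loops above write to position proj[pi] for each subsampled index pi, which is a scatter and is not in general the same operation. The data below is made up.

```python
import numpy as np

# Hypothetical data: 3 subsampled points with 2 class probabilities each,
# stored as a plain Python list (indexing it with [proj, :] raises the TypeError).
test_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
# Assumed layout: for each of the 5 original points, the index of the
# subsampled point it projects onto.
test_proj = [0, 0, 2, 1, 2]

# Convert once, then gather: row j of probs is the row of test_probs
# that original point j projects onto.
probs = np.asarray(test_probs)[np.asarray(test_proj, dtype=np.int64), :]
```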

Scene Segmentation on ScanNet

Thank you for your repo.

Could you please provide me with the script and the pre-trained model for scene segmentation on ScanNet?

BTW, about the instructions for Ubuntu 18.04: I did not have to remove the -D_GLIBCXX_USE_CXX11_ABI=0 flag. Instead, I needed to make two minor changes to cpp_wrappers/cpp_utils/cloud/cloud.h:

  1. add the <cmath> header
  2. on line 141, change
    return PointXYZ(floor(P.x), floor(P.y), floor(P.z));
    to
    return PointXYZ(std::floor(P.x), std::floor(P.y), std::floor(P.z));

Errors when running the training_Semantic3D.py script

Hi @HuguesTHOMAS

Before running the training_Semantic3D.py script, I modified the Semantic3D.py script as follows:

   # Path of the folder containing ply files
    self.path = 'Data/Semantic3D'

    # Original data path
    self.original_folder = 'original_data'

    # Path of the training files
    # self.train_path = join(self.path, 'ply_subsampled/train')
    # self.test_path = join(self.path, 'ply_subsampled/reduced-8')
    self.train_path = join(self.path, self.original_folder)
    self.test_path = join(self.path, self.original_folder)
    # self.test_path = join(self.path, 'ply_subsampled/semantic-8')

Then I ran the training_Semantic3D.py script, and the program reported the following error:

Calib Neighbors 00000000 : timings 20936.74 18.95
Calib Neighbors 00000001 : timings 3209.54 19.80
Calib Neighbors 00000002 : timings 2743.98 13.21
Calib Neighbors 00000005 : timings 18005.47 21.91
Calib Neighbors 00000008 : timings 2674.94 16.44
Calib Neighbors 00000009 : timings 5039.95 15.20
Calib Neighbors 00000011 : timings 3139.28 17.42
Calib Neighbors 00000013 : timings 2356.07 16.96


Traceback (most recent call last):
  File "training_Semantic3D.py", line 213, in <module>
    dataset.init_input_pipeline(config)
  File "/disk/tia/tia/KPConv/datasets/common.py", line 710, in init_input_pipeline
    gen_function, gen_types, gen_shapes = self.get_batch_gen('training', config)
  File "/disk/tia/tia/KPConv/datasets/Semantic3D.py", line 458, in get_batch_gen
    random_pick_n = int(np.ceil(epoch_n / (self.num_training * (config.num_classes))))
ZeroDivisionError: division by zero

It seems that config.num_classes is zero and was not overwritten by the dataset class when the input pipeline was initialized.
Can you help me solve this problem?
Waiting for your reply!
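
The failing expression divides by the product of self.num_training and config.num_classes, so either factor being zero triggers the error. The poster suspects num_classes; since the training path was also changed, an empty training set is another possibility (an assumption, not verified here). A small guard like the following sketch, with a hypothetical function name, would make the cause explicit:

```python
import numpy as np

def picks_per_class(epoch_n, num_training, num_classes):
    """Same arithmetic as the failing line, with an explicit check on both factors."""
    if num_training == 0:
        raise ValueError("no training clouds were loaded; check the training path")
    if num_classes == 0:
        raise ValueError("num_classes is 0; it was not set from the dataset labels")
    return int(np.ceil(epoch_n / (num_training * num_classes)))
```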

Problem with "sh compile_wrappers.sh"

(kpconv) czh@dirac:~/project/KPConv/cpp_wrappers$ sh compile_wrappers.sh
running build_ext
building 'grid_subsampling' extension
Warning: Can't read registry to find the necessary compiler setting
Make sure that Python modules winreg, win32api or win32con are installed.
C compiler: gcc -pthread -B /farm/czh/anaconda3/envs/kpconv/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC

creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/cpp_wrappers
creating build/temp.linux-x86_64-3.6/cpp_wrappers/cpp_utils
creating build/temp.linux-x86_64-3.6/cpp_wrappers/cpp_utils/cloud
creating build/temp.linux-x86_64-3.6/grid_subsampling
compile options: '-I/farm/czh/anaconda3/envs/kpconv/lib/python3.6/site-packages/numpy/core/include -I/farm/czh/anaconda3/envs/kpconv/include/python3.6m -c'
extra options: '-std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0'
gcc: ../cpp_utils/cloud/cloud.cpp
gcc: grid_subsampling/grid_subsampling.cpp
gcc: wrapper.cpp
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /farm/czh/anaconda3/envs/kpconv/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1830:0,
from /farm/czh/anaconda3/envs/kpconv/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /farm/czh/anaconda3/envs/kpconv/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from wrapper.cpp:2:
/farm/czh/anaconda3/envs/kpconv/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it with "
^~~~~~~
grid_subsampling/grid_subsampling.cpp: In function ‘void grid_subsampling(std::vector&, std::vector&, std::vector&, std::vector&, std::vector&, std::vector&, float, int)’:
grid_subsampling/grid_subsampling.cpp:99:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < ldim; i++)
~~^~~~~~
wrapper.cpp: In function ‘PyObject* grid_subsampling_compute(PyObject*, PyObject*, PyObject*)’:
wrapper.cpp:70:98: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
static char kwlist[] = {"points", "features", "classes", "sampleDl", "method", "verbose", NULL };
^
wrapper.cpp:70:98: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
wrapper.cpp:70:98: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
wrapper.cpp:70:98: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
wrapper.cpp:70:98: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
wrapper.cpp:70:98: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
g++ -pthread -shared -B /farm/czh/anaconda3/envs/kpconv/compiler_compat -L/farm/czh/anaconda3/envs/kpconv/lib -Wl,-rpath=/farm/czh/anaconda3/envs/kpconv/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/cpp_wrappers/cpp_utils/cloud/cloud.o build/temp.linux-x86_64-3.6/grid_subsampling/grid_subsampling.o build/temp.linux-x86_64-3.6/wrapper.o -o /farm/czh/project/KPConv/cpp_wrappers/cpp_subsampling/grid_subsampling.cpython-36m-x86_64-linux-gnu.so

CPU usage during training.

Hi @HuguesTHOMAS , I am training KPConv with 4 GPUs and 12 CPUs. However when I run my model it uses almost all the CPUs available even when I specify that I want to use only one thread:

            cProto = tf.ConfigProto(allow_soft_placement=True, 
                                    intra_op_parallelism_threads=1, 
                                    inter_op_parallelism_threads=1, 
                                    device_count={'CPU': 1})

And

    # Number of CPU threads for the input pipeline
    input_threads = 1

I use perf top to see what those threads are actually doing and find that most of the cpu is taken by tf_batch_neighbors.so.

Screenshot 2019-09-18 9:33:34 PM

This is really weird, because tf_batch_neighbors is called in the tf_classification_inputs or tf_segmentation_inputs function, which is used by get_tf_mapping, so I expected that setting input_threads to 1 would limit this operation to a single thread. Is there any way to limit the CPU usage of this TF module?
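
Independently of how the custom op spawns its threads (which the thread does not establish), CPU usage can be capped at the OS level by restricting which cores the process may run on. The sketch below is a generic, Linux-only workaround and not part of the repository; limit_cpu_cores is a hypothetical helper. It limits the cores used, not the number of threads, so extra threads simply share the allowed cores.

```python
import os

def limit_cpu_cores(max_cores):
    """Restrict the calling thread, and threads it starts afterwards, to max_cores cores."""
    allowed = sorted(os.sched_getaffinity(0))   # cores currently available to us
    chosen = set(allowed[:max_cores])
    os.sched_setaffinity(0, chosen)             # pid 0 refers to the caller
    return chosen
```

Calling this at the very top of the training script, before TensorFlow is imported, lets threads created later inherit the restriction.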

Problem during training using Semantic3D

Hello,
When I tried to run training on Semantic3D, I got the error below:

(tensorflow) C:\Users\plati\Desktop\KPConv>python training_Semantic3D.py
Traceback (most recent call last):
File "training_Semantic3D.py", line 39, in
from datasets.Semantic3D import Semantic3DDataset
File "C:\Users\plati\Desktop\KPConv\datasets\Semantic3D.py", line 46, in
from datasets.common import Dataset
File "C:\Users\plati\Desktop\KPConv\datasets\common.py", line 34, in
tf_neighbors_module = tf.load_op_library('tf_custom_ops/tf_neighbors.so')
File "C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\load_library.py", line 60, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: tf_custom_ops\tf_neighbors.so not found

...and I don't know why. How can I solve it?

Cannot reproduce reported results on S3DIS Area5

Hi @HuguesTHOMAS, thanks for releasing the code. We followed the default configuration and ran the code under TensorFlow 1.11, CUDA 9 and cuDNN 7.3, but the final result on S3DIS Area 5 was only 60.24 (much lower than the reported 65.4).
image

I wonder whether we've missed some details during the training. Or does it relate to the version of TensorFlow?

ResourceExhaustedError

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[104051,256] and type float

I am using a 6GB memory GTX1060 laptop.

How much GPU memory do I need?
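
For scale, the size of the tensor named in the error message can be worked out directly. TensorFlow's "float" type is 32-bit, so the arithmetic is:

```python
import numpy as np

shape = (104051, 256)                               # shape from the error message
bytes_per_value = np.dtype(np.float32).itemsize     # 4 bytes per 32-bit float
size_mib = shape[0] * shape[1] * bytes_per_value / 2**20
print(f"{size_mib:.1f} MiB")                        # prints "101.6 MiB"
```

A single tensor of about 100 MiB is small next to 6 GB, so the failure presumably reflects the total of many such activations rather than this one allocation. Another issue on this page mentions increasing first_subsampling_dl as a way to reduce GPU memory cost.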

Cannot reproduce the results on s3dis area 5, including pretrained model

Thanks for releasing the pretrained model on S3DIS. However, the mean IoU reported by test_any_model.py is only around 57 on area 5 (pretrained model), which is far from 65 reported by the paper.

image

In addition, we trained the model on s3dis area 5 with the default parameter setting, however, the mean IoU can only achieve around 59.
image

undefined symbol

Hi, thanks for your contribution, but I ran into this error: tensorflow.python.framework.errors_impl.NotFoundError: tf_custom_ops/tf_neighbors.so: undefined symbol: _ZN10tensorflow12OpDefBuilder5InputENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

I am using Ubuntu 18.04 with anaconda3, cuda-toolkit=9.0, cudnn=7.3.1, tf-gpu=1.12.0.
Can you help me with that?
Thanks
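
The mangled name itself carries a hint. The __cxx11 namespace in it means the custom op was compiled against the new libstdc++ std::string ABI, and the "undefined symbol" error suggests (as an inference) that the installed TensorFlow library only exports the old-ABI variant, so the _GLIBCXX_USE_CXX11_ABI setting used to build the op would need to match TensorFlow's. A tiny check, with a hypothetical helper name:

```python
symbol = ("_ZN10tensorflow12OpDefBuilder5InputE"
          "NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE")

def built_with_cxx11_abi(mangled_name):
    """True if the mangled name refers to libstdc++'s new-ABI std::string."""
    return "St7__cxx11" in mangled_name
```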

NaN values

hi @HuguesTHOMAS
Thanks for your work and open-source code!

Under CUDA 10.2, Ubuntu 18.04.3, TensorFlow 1.12.0 and a GeForce GTX 1080 Ti, I successfully compiled the cpp wrappers and tf_ops by removing the flag as mentioned. However, when I run train_ModelNet.py, everything goes well for roughly the first two epochs; then, after around 2000 steps, I get NaN values in the loss and accuracy. I compiled TensorFlow from source, and in the same environment I compiled other TF user ops without any problem.

Minor mistakes in trainer.py

Hi @HuguesTHOMAS,

There seem to be two minor mistakes in the trainer.py file.

  • 164 for ign_label in model.config.ignored_label_inds:
    165     ignored_bool = tf.logical_or(ignored_bool, model.labels == ign_label)
    This line should be changed to match the way it is written in other parts of your code:
    for ign_label in model.config.ignored_label_inds:
        ignored_bool = tf.logical_or(ignored_bool, tf.equal(model.labels, ign_label))
    The original line will simply output all zeros.

  • 182 self.prob_logits = tf.nn.softmax(new_logits)
    This line should be changed to match the way it is written in tester.py:
    self.prob_logits = tf.nn.softmax(model.logits)
    Otherwise the probabilities will be collected with the wrong indices in this step:
    696 # Eliminate shadow indices
        b = b[b < max_ind-0.5]
        # Get prediction (only for the concerned parts)
        probs = stacked_probs[b]
        inds = point_inds[b]
        c_i = cloud_inds[b_i]

The reason that the original code didn't complain here is that the ignored_bool was all zero.

Thank you again for sharing your work.

Question about Convolution ops

Hi, thank you for sharing. I read the functions KPConv and KPConv_ops in convolution_ops.py and found that the convolution defined in these two functions is for a single point cloud input. How does it work when I want to input a batch of point clouds, given that these point cloud fragments may have different numbers of points?
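
One common way to handle clouds of different sizes, sketched here under stated assumptions rather than taken from the repository, is to stack all clouds along the first axis, keep their lengths, and compute neighbor indices per cloud with an offset into the stacked array. Unused neighbor slots are padded with a "shadow" index one past the last point, similar in spirit to the shadow point mentioned elsewhere on this page. The function names and the brute-force search are illustrative only.

```python
import numpy as np

def stack_clouds(clouds):
    """Concatenate clouds of shape (N_i, 3) into one (sum N_i, 3) array plus lengths."""
    lengths = np.array([len(c) for c in clouds], dtype=np.int64)
    return np.concatenate(clouds, axis=0), lengths

def batch_radius_neighbors(points, lengths, radius, max_neighbors):
    """Brute-force radius search restricted to each cloud of a stacked batch.

    Returns an (N, max_neighbors) integer array of indices into points;
    unused slots hold the shadow index N (one past the last point).
    """
    n_total = points.shape[0]
    neighbors = np.full((n_total, max_neighbors), n_total, dtype=np.int64)
    start = 0
    for length in lengths:
        cloud = points[start:start + length]
        # Pairwise squared distances inside this cloud only.
        d2 = np.sum((cloud[:, None, :] - cloud[None, :, :]) ** 2, axis=-1)
        for i in range(length):
            idx = np.flatnonzero(d2[i] <= radius ** 2)
            # Closest neighbors first, truncated to the fixed table width.
            idx = idx[np.argsort(d2[i, idx])][:max_neighbors]
            neighbors[start + i, :len(idx)] = idx + start
        start += length
    return neighbors
```

Because every row has the same width, the result can be used with a single gather over the stacked features once a zero feature row is appended for the shadow index.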

Add instructions for compiling with Ubuntu 18.04 -- Anaconda

Thanks for sharing, this is really impressive work.

I got this running on Ubuntu 18.04 and can share my steps for getting it to work. Maybe you'd like to include these steps in INSTALL.md.

  1. Remove the -D_GLIBCXX_USE_CXX11_ABI=0 flag for each line in tf_custom_ops/compile_op.sh. I think this problem is from Ubuntu 18.04 having a new version of gcc.
  2. Also, the specific version of cuDNN you mentioned (cuDNN 7.4) wasn't available on Anaconda, so I found that 7.3.1 works fine with CUDA 9.0 and TF.12. The problem seems to be in 7.5 and 7.6 (and potentially future versions too).

release of trained model

Hi, @HuguesTHOMAS ,

Would you also release the trained model?
I want to do some further experiments on semantic segmentation on the S3DIS dataset. Would you release the trained model for the S3DIS dataset first?

thx!

Reflexion for improvement

Dear @HuguesTHOMAS,

I was wondering if you were going to keep working on improving the model.

I think KPConv could be improved in at least 3 ways.

  1. The convolution definition doesn't use the distance to the central point within its formula.

Screenshot from 2019-11-13 09-03-43

It could have been (a): g(yi) = sum (g(norm(yi) / R) * h(yi, xk) * Wk) or (b): g(yi) = sum (g(yi) * h(yi, xk) * Wk)
(a): g takes a value between 0 and 1 and uses it as guidance.
(b): g takes yi directly and does the same.

It will be similar to this one, as an analogy for image convolution.
Screenshot from 2019-11-13 09-02-00

  2. Sigma could be variable or the density of each point could be used as in

Screenshot from 2019-11-13 09-08-18
Screenshot from 2019-11-13 09-09-05
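
Proposal (a) from the first point can be made concrete with a small NumPy sketch. It assumes the linear correlation h(yi, xk) = max(0, 1 - ||yi - xk|| / sigma) from the KPConv paper; the guidance function and all names below are hypothetical and chosen only to illustrate the idea, not taken from the repository.

```python
import numpy as np

def linear_correlation(y, kernel_points, sigma):
    """h(y, x_k) = max(0, 1 - ||y - x_k|| / sigma) for every kernel point x_k."""
    dist = np.linalg.norm(y[None, :] - kernel_points, axis=1)
    return np.maximum(0.0, 1.0 - dist / sigma)

def kernel_weights(y, kernel_points, weights, sigma, radius=None, guidance=None):
    """sum_k h(y, x_k) * W_k, optionally scaled by guidance(||y|| / radius) as in (a)."""
    h = linear_correlation(y, kernel_points, sigma)   # shape (K,)
    w = np.tensordot(h, weights, axes=1)              # shape (D_in, D_out)
    if guidance is not None:
        # Variant (a): scale by a function of the normalized distance to the center.
        w = guidance(np.linalg.norm(y) / radius) * w
    return w
```

With guidance left as None the function reduces to the plain kernel, so the two variants can be compared on the same inputs.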

Effect on convolution of padding point clouds

Greetings @HuguesTHOMAS , in my extension of KPConv for partial point cloud completion it is necessary for the point clouds to be padded (via duplicating certain random existing points) in order for them to have a specific number of points.
I am doing this after the grid subsampling preprocess step of the current implementation. My question is: do you believe this will affect the feature extraction of the kernel point convolutions negatively in any way?

Problems during training.

Hi, thanks for sharing. I tried your code on my own dataset. Initially everything goes well, but after several epochs the training suddenly breaks down (accuracy becomes 1 and the loss becomes 0). I use TF 1.12.0, CUDA 9.0 and cuDNN 7.1.4.

# conda list | grep tensorflow
tensorflow-estimator      1.13.0                     py_0    anaconda
tensorflow-gpu            1.12.0                   pypi_0    pypi
tensorflow-tensorboard    0.4.0                    pypi_0    pypi

Have you encountered this kind of problem? Another potential problem is that training sometimes takes 4400 MB of GPU memory (as reported by nvidia-smi) and sometimes more than 7000 MB, even though I do not change the batch size or network architecture. I am quite confused by these problems. Could you give me some advice?

Question about Input Preparation

Hi! @HuguesTHOMAS
Thanks for your work and for sharing the code! I am currently working with it and trying to get a clearer understanding of your work. Here are some questions I have after looking through other issues.

  1. In convolution_ops.py:190, you add a fake point shadow_point to support_point. In my view, this shadow_point acts like padding when the query_point does not have enough neighbors, doesn't it?
  2. In issue #14, you mention that we can increase the parameter first_subsampling_dl to reduce GPU memory cost. I only know that this parameter divides the input points into a grid (whose cell size is determined by first_subsampling_dl). If I increase first_subsampling_dl, will the number of input points decrease and the sampling rate be affected?
  3. According to trainer.py:469, you compute a vote confusion. Could you tell me what a vote means and what the difference between C1 and C3 is?

Sorry to bother you with so many questions, but I really admire your wonderful work.
Thanks in advance.
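
The second question can be explored with a small NumPy sketch of grid subsampling, in which every occupied cubic cell of side dl is replaced by the barycenter of its points. This is an illustration written for this page, not the repository's C++ implementation, and grid_subsample is a hypothetical name. A larger cell size merges more points per cell, so fewer points come out.

```python
import numpy as np

def grid_subsample(points, dl):
    """Keep one barycenter per occupied cubic cell of side dl."""
    cells = np.floor(points / dl).astype(np.int64)
    _, inverse, counts = np.unique(cells, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)              # flat cell index for every input point
    sums = np.zeros((counts.shape[0], points.shape[1]))
    np.add.at(sums, inverse, points)           # accumulate coordinates per cell
    return sums / counts[:, None]
```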

Performance issue

Hi @HuguesTHOMAS :
Thanks for sharing the code. To get the performance reported in the paper (IoU on S3DIS Area 5), I changed self.validation_split = 4 in S3DIS.py, but I cannot reach the reported performance (mIoU 66). Do you have any suggestions? Training has not finished yet, but the mIoU does not seem to be going up.

Step 00089533 L_out=0.049 L_reg=0.033 L_p=0.000 Acc=0.98 --- 709.37 ms/batch (Averaged)
Step 00089535 L_out=0.061 L_reg=0.033 L_p=0.000 Acc=0.98 --- 711.77 ms/batch (Averaged)
Step 00089537 L_out=0.043 L_reg=0.033 L_p=0.000 Acc=0.98 --- 717.27 ms/batch (Averaged)
Step 00089539 L_out=0.058 L_reg=0.033 L_p=0.000 Acc=0.98 --- 721.15 ms/batch (Averaged)
Validation : 0.0% (timings : 165.89 6.77)
Validation : 6.0% (timings : 276.87 20.58)
Validation : 10.0% (timings : 301.69 28.65)
Validation : 16.0% (timings : 306.19 33.44)
Validation : 22.0% (timings : 313.93 38.35)
Validation : 26.0% (timings : 333.55 41.70)
Validation : 30.0% (timings : 350.38 45.12)
Validation : 32.0% (timings : 383.41 46.56)
Validation : 38.0% (timings : 379.15 50.89)
Validation : 42.0% (timings : 389.80 52.16)
Validation : 46.0% (timings : 404.30 56.77)
Validation : 48.0% (timings : 447.05 57.91)
Validation : 54.0% (timings : 432.56 60.22)
Validation : 60.0% (timings : 424.93 60.28)
Validation : 66.0% (timings : 414.28 59.36)
Validation : 72.0% (timings : 399.32 57.45)
S3DIS mean IoU = 56.9%
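
When comparing such numbers, it helps to be explicit about how mean IoU is derived from a confusion matrix. The following is a generic sketch; the repository may treat absent classes or reprojection onto the original points differently, which this example does not cover.

```python
import numpy as np

def mean_iou(confusion):
    """Mean IoU from a confusion matrix (rows: ground truth, columns: prediction)."""
    confusion = np.asarray(confusion, dtype=np.float64)
    tp = np.diag(confusion)
    fn = confusion.sum(axis=1) - tp
    fp = confusion.sum(axis=0) - tp
    denom = tp + fp + fn
    # Classes absent from both ground truth and predictions are left out of the mean.
    valid = denom > 0
    return float(np.mean(tp[valid] / denom[valid]))
```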

NPM3D dataset

Hi Hugues, did you use reflectance feature during training in NPM3D dataset? Thank you.

Question about sampling strategy and size limit

Hi!
Thanks for your wonderful paper and code! I am currently working with it, trying to reproduce the results reported in your work on the Semantic3D dataset. Below are two questions that came up while reading your code.

  1. In common.py, function calibrate_batches: I have no problem understanding the first half of this function, but I have trouble understanding the second part, starting from sum_s = 0. Would you please elaborate on the idea behind it?
  2. In Semantic3D.py, function spatially_regular_gen: you choose spherical samples from point clouds according to randomly generated potentials, and update the potentials each round so that the potential of the center point increases the most. Does this sampling technique have a specific name? Also, what is the advantage of this center point selection technique compared to others, such as uniform grid selection?

Thank you very much!
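
The sampling described in the second question can be sketched in a few lines, based only on the description given here: pick the point with the lowest potential as the sphere center, then raise the potentials of all points in the sphere, most at the center. The exact update used in the repository is not shown in this thread, so the quadratic falloff and the name pick_sphere below are assumptions.

```python
import numpy as np

def pick_sphere(points, potentials, radius):
    """Pick the lowest-potential point as center and update potentials in place."""
    center = int(np.argmin(potentials))
    d2 = np.sum((points - points[center]) ** 2, axis=1)
    inside = np.flatnonzero(d2 <= radius ** 2)
    # Increment is 1.0 at the center and falls to 0 at the sphere boundary.
    potentials[inside] += 1.0 - d2[inside] / radius ** 2
    return center, inside
```

Because recently visited regions end up with higher potentials, repeated calls move on to parts of the cloud that have not been sampled yet, which spreads the spheres over the whole scene.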

Error in debug block

Hello, I really appreciate your KPConv work and shared code. However, I ran into an error when running training_ModelNet50.py in debug mode. In particular, I changed this line to trainer.train(model, dataset, debug_NaN=False) and got the following error. Could you help me solve it? Thanks so much.
[Image: screenshot of the error]

question about the visualization code

Hi, @HuguesTHOMAS ,

I tested the three visualization scripts: visualize_ERFs.py, visualize_deformations.py, and visualize_features.py. All of them return the following error:

Segmentation fault (core dumped)

Any hints on how to fix this issue? Is there any other config I should modify?

thx!

Cloud completion

Thanks for the great implementation! The paper suggests that KPConv could be well suited for the task of partial point cloud completion. Have you experimented with this type of problem using the KPConv operator in a specific architecture?

NaN error during training

Hi @HuguesTHOMAS, sorry to bother you. Have you ever run into this kind of problem during training?

[Screenshot 2019-10-03, 10:05:06 PM]

My configuration is:

CUDA version 9.0.176, TF 1.12, GTX 1080

Like Issue15, this error also occurs randomly during training.

question about running "training_Semantic3D.py"

Hi, @HuguesTHOMAS ,

When I run python training_Semantic3D.py, I get the following error:

Dataset Preparation
*******************
sg27_station10_rgb_intensity-reduced already done

birdfountain_station1_xyz_intensity_rgb already done

castleblatten_station1_intensity_rgb already done

castleblatten_station5_xyz_intensity_rgb already done

marketplacefeldkirch_station1_intensity_rgb already done

marketplacefeldkirch_station4_intensity_rgb already done

MarketplaceFeldkirch_Station4_rgb_intensity-reduced already done

marketplacefeldkirch_station7_intensity_rgb already done

sg28_Station2_rgb_intensity-reduced already done

stgallencathedral_station1_intensity_rgb already done

stgallencathedral_station3_intensity_rgb already done

stgallencathedral_station6_intensity_rgb already done

StGallenCathedral_station6_rgb_intensity-reduced already done


Preparing KDTree for all scenes, subsampled at 0.060
Traceback (most recent call last):
  File "training_Semantic3D.py", line 210, in <module>
    dataset.load_subsampled_clouds(dl0)
  File "/data/code9/KPConv/datasets/Semantic3D.py", line 289, in load_subsampled_clouds
    sub_labels = data['class']
ValueError: no field of name class

In line 289 of Semantic3D.py, data is treated as a dict. However, when I step through the code in the debugger, data is actually a numpy array with shape (1034819,). Could you suggest how to fix this issue?

THX!
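
A note on the error message above: a one-dimensional numpy array that is indexed by name is a structured array, so the shape `(N,)` is expected, and `ValueError: no field of name class` means that the array has no field called `class`. The sketch below shows how to inspect the available field names. The array contents, the field name `scalar_class`, and the helper `get_field` are hypothetical and only serve as an illustration; the actual field names in the subsampled files may differ.

```python
import numpy as np

# Hypothetical stand-in for the loaded cloud: a structured array whose
# label field is NOT named 'class' (the name 'scalar_class' is made up)
data = np.zeros(5, dtype=[('x', np.float32), ('y', np.float32),
                          ('z', np.float32), ('scalar_class', np.int32)])

print(data.shape)        # 1-D: one record per point
print(data.dtype.names)  # the field names that can be used as data[name]


def get_field(arr, candidates):
    """Return the first field of a structured array found in candidates."""
    for name in candidates:
        if name in arr.dtype.names:
            return arr[name]
    raise KeyError("none of %r found in %r" % (candidates, arr.dtype.names))


labels = get_field(data, ('class', 'scalar_class'))

# Indexing with a missing field name reproduces the error from the traceback
try:
    data['class']
    missing_field_error = None
except ValueError as err:
    missing_field_error = str(err)
print(missing_field_error)
```

If `data.dtype.names` does not contain the expected label field, the file was probably written with different field names, or without labels.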

gpu memory cost && missing kernels.convolution_ops

Hi, Thomas. Thanks for sharing your code. I have two questions and would appreciate your suggestions:

  1. GPU memory cost: when training segmentation on the S3DIS dataset, with 3247 query points, 40 neighbors, 256 input channels and 256 output channels, the theoretical GPU memory should be 3247x40x256x256x4 (for float32) ≈ 32GB, which exceeds the 11GB capacity of a 2080Ti. Are there any tricks to reduce the memory cost?
  2. network_blocks.py imports kernels.convolution_ops, which is absent from the current repository. Is this file still being modified, or will it not be made public?
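
The estimate in the first question can be checked with a few lines of arithmetic. The sketch below also compares it with one possible way of avoiding the large tensor: summing neighbor features per kernel point before applying the weight matrices. The kernel point count `K = 15`, the helper `tensor_bytes`, and all array names are assumptions for illustration, and this is not presented as the repository's implementation.

```python
import numpy as np


def tensor_bytes(*shape, bytes_per_value=4):
    """Size in bytes of a dense float32 tensor with the given shape."""
    return int(np.prod(shape, dtype=np.int64)) * bytes_per_value


N, H, C_IN, C_OUT = 3247, 40, 256, 256  # numbers from the question
K = 15                                  # assumed number of kernel points

# One C_IN x C_OUT weight matrix per (query point, neighbor) pair
naive_bytes = tensor_bytes(N, H, C_IN, C_OUT)
# Neighbor features summed per kernel point: one C_IN vector per (point, kernel point)
aggregated_bytes = tensor_bytes(N, K, C_IN)

print("naive      : %.2f GB" % (naive_bytes / 1e9))
print("aggregated : %.2f MB" % (aggregated_bytes / 1e6))

# Small numerical check that both orders of operation give the same output
rng = np.random.default_rng(0)
n, h, k, ci, co = 6, 5, 4, 3, 2
feats = rng.normal(size=(n, h, ci))         # neighbor features
influence = rng.uniform(size=(n, h, k))     # kernel point influence per neighbor
weights = rng.normal(size=(k, ci, co))      # one weight matrix per kernel point

per_neighbor = np.einsum('nhk,kio->nhio', influence, weights)  # large tensor
out_naive = np.einsum('nhi,nhio->no', feats, per_neighbor)

per_kernel = np.einsum('nhk,nhi->nki', influence, feats)       # small tensor
out_aggregated = np.einsum('nki,kio->no', per_kernel, weights)

print("same result:", np.allclose(out_naive, out_aggregated))
```

Both orderings compute the same sum; only the size of the intermediate tensor changes.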

Unable to run python training_ModelNet40.py with the following error message

Hi,

I'm trying to train the network for classification on ModelNet40 and have downloaded the specified data, but when I run the python training_ModelNet40.py command I get this error:

tensorflow.python.framework.errors_impl.NotFoundError: tf_custom_ops/tf_neighbors.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv
even though tf_neighbors.so is right there in the tf_custom_ops folder.

Regards

Model store and reload.

Hi @HuguesTHOMAS, sorry to bother you.

I wonder whether it is possible to reload weights that were trained with a different first_subsampling_dl parameter (for example, I first train the network with 3 cm voxels and then want to use it on a larger scene with a 5 cm voxel size). Another question: when reloading the pretrained weights, if I change config.first_subsampling_dl, will the locations of the kernel points change accordingly? I tried it and found that they do, but I can't find the code that does this; the model reload is simply done by self.saver.restore.

Thanks a lot !

Transfer learning

Hi @HuguesTHOMAS

Thank you very much for your awesome work.

I saw the instructions for creating a new dataset to train your code on. However, I would like to perform transfer learning instead: reuse the learned weights of the network and retrain the input and output layers.

I have trained and tested the Semantic3D model. However, I am not able to actually find where the "model" is in order to freeze the intermediate layers and do the other modifications for using my data. I was wondering if you could give me some advice about this.

Many thanks!

Compile error: Could not convert const float to PointXYZ

Thanks for sharing code. Nice work! 👍

I have encountered a problem while compiling. I get the following errors (here are just a few):

error: could not convert 'P.PointXYZ::x' from 'const float' to 'PointXYZ'

In file included from tf_neighbors/neighbors/neighbors.h:3:0,
                 from tf_neighbors/neighbors/neighbors.cpp:2:
tf_neighbors/neighbors/../../cpp_utils/cloud/cloud.h: In function 'PointXYZ floor(PointXYZ)':
tf_neighbors/neighbors/../../cpp_utils/cloud/cloud.h:142:26: error: could not convert 'P.PointXYZ::x' from 'const float' to 'PointXYZ'
  return PointXYZ(floor(P.x), floor(P.y), floor(P.z));
  ~~^
tf_neighbors/neighbors/../../cpp_utils/cloud/cloud.h:142:38: error: could not convert 'P.PointXYZ::y' from 'const float' to 'PointXYZ'
  return PointXYZ(floor(P.x), floor(P.y), floor(P.z));
  ~~^
tf_neighbors/neighbors/../../cpp_utils/cloud/cloud.h:142:50: error: could not convert 'P.PointXYZ::z' from 'const float' to 'PointXYZ'
  return PointXYZ(floor(P.x), floor(P.y), floor(P.z));
  ~~^

OS: Ubuntu 18.04.
I tried two versions of g++ compilers:

  • g++-5 5.5.0 20171010
  • g++ 7.3.0

Other:
python 3.6.7
tensorflow-gpu 1.13.1

Any ideas or suggestions?

Thank you!

Issues regarding newly-edited Scannet.py

Hello,
Hope this post finds you well and thank you for your excellent work.
The newly edited file Scannet.py seems to have moved these two lines from the function load_subsampled_clouds into __init__ by mistake.

    # List of training and test files
    self.train_files = np.sort([join(self.train_path, f) for f in listdir(self.train_path) if f[-4:] == '.ply'])
    self.test_files = np.sort([join(self.test_path, f) for f in listdir(self.test_path) if f[-4:] == '.ply'])

This results in a "No such file or directory" error, since the directories have not been created yet at the initialization stage.

Dataset Preparation


Traceback (most recent call last):
  File "training_Scannet.py", line 207, in <module>
    dataset = ScannetDataset(config.input_threads, load_test=False)
  File "/home/zeyu/Projects/KPConv/datasets/Scannet.py", line 149, in __init__
    self.train_files = np.sort([join(self.train_path, f) for f in listdir(self.train_path) if f[-4:] == '.ply'])
FileNotFoundError: [Errno 2] No such file or directory: 'Data/Scannet/training_points'
(KPConv) zeyu@taigroup-System-Product-Name:~/Projects/KPConv$ python training_Scannet.py
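
Besides moving the two lines back into load_subsampled_clouds, the ordering problem described above could also be sidestepped by guarding the directory listing. The sketch below is a generic illustration of that idea; the helper name `list_ply_files` is hypothetical and is not part of the repository.

```python
import tempfile
from os import listdir, makedirs
from os.path import isdir, join


def list_ply_files(folder):
    """Return the sorted .ply paths in folder, or an empty list if the
    folder has not been created yet."""
    if not isdir(folder):
        return []
    return sorted(join(folder, f) for f in listdir(folder) if f.endswith('.ply'))


# Toy demonstration in a temporary directory
root = tempfile.mkdtemp()
train_path = join(root, 'training_points')

before = list_ply_files(train_path)  # folder does not exist yet

makedirs(train_path)
for name in ('scene_b.ply', 'scene_a.ply', 'notes.txt'):
    open(join(train_path, name), 'w').close()

after = list_ply_files(train_path)   # only the two .ply files, sorted
print(before)
print([p[len(train_path) + 1:] for p in after])
```

With such a guard, the listing can stay in the constructor and simply returns nothing until the preprocessing step has created the folders, although the file lists then need to be refreshed after preprocessing.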

Another issue: when I start training with the previous version of Scannet.py, the preprocessed data looks wrong. The ply files in the folder "training_meshes" cannot be read by MeshLab, and the ply files in the folder "training_points" are extremely small (around 7 KB). When I visualize the files in "training_points", all the points seem to lie along a single line. However, there seem to be no problems with the test files.
Screenshot from 2019-10-04 13-47-51
Screenshot from 2019-10-04 13-48-52
Thank you in advance for your time and help.
Wish you good luck~(●'◡'●)

question about testing NPM3D dataset by using test_any_model.py

Hi, @HuguesTHOMAS ,

I followed the pretrained model guide, downloaded the provided model for the NPM3D dataset, and modified test_any_model.py as follows:

chosen_log = '/data/code11/KPConv/trained_models/Log_pretrained_NPM3D'

Then I ran this command: python test_any_model.py.
However, the model seemed to start training again and finished after epoch 269.
The log is as follows:

Epoch 269, step 222 (timings : 621.59 12.76). min potential = 100.7
Epoch 269, step 224 (timings : 617.27 12.40). min potential = 100.7
Epoch 269, step 226 (timings : 615.46 12.10). min potential = 100.7
Epoch 269, step 228 (timings : 611.76 11.81). min potential = 100.7
Epoch 269, step 230 (timings : 604.60 11.50). min potential = 100.7
Epoch 269, step 232 (timings : 599.63 11.47). min potential = 100.7
Epoch 269, step 234 (timings : 598.45 11.32). min potential = 100.7
Epoch 269, step 236 (timings : 602.64 11.45). min potential = 100.7
Epoch 269, end. Min potential = 100.7
[114.3945913653916, 114.18811528408176, 113.78857637790165]
Saving clouds

Reproject Vote #100
Done in 339.9 s

How can I get the test results for the NPM3D dataset directly by modifying test_any_model.py?

THX!

NAN problem

Have you tested KPConv only with TF 1.12? Which CUDA version do you use?
I also had a NaN problem when training on the NPM3D data. My environment is TF 1.8 (CUDA 9.2). Can you give me some advice?
