huguesthomas / kpconv
Kernel Point Convolutions
License: MIT License
Hello, and thank you for sharing. I really appreciate your KPConv work, and I am working on implementing it in PyTorch. However, I got confused trying to understand the data preparation code because there are so many encapsulations and wrappers. Could you please give me a high-level idea of the implementation? Basically, my questions are:
The flat_inputs variable: as I understand it, you build flat_inputs, which consists of the points and their neighbors for each layer, before the session runs. But how can you calculate the neighbors for each layer before the specific point cloud is sent into the network?
Traceback (most recent call last):
File "training_ModelNet40.py", line 213, in
model = KernelPointCNN(dataset.flat_inputs, config)
File "/home/liangpan/KPConv/models/KPCNN_model.py", line 103, in init
self.dropout_prob)
File "/home/liangpan/KPConv/models/network_blocks.py", line 1065, in assemble_CNN_blocks
training)
File "/home/liangpan/KPConv/models/network_blocks.py", line 414, in resnetb_deformable_block
config)
File "/home/liangpan/KPConv/models/network_blocks.py", line 122, in KPConv_deformable
modulated=config.modulated)
File "/home/liangpan/KPConv/kernels/convolution_ops.py", line 370, in KPConv_deformable
aggregation_mode)
File "/home/liangpan/KPConv/kernels/convolution_ops.py", line 438, in KPConv_deform_ops
new_neighbors_indices = tf.batch_gather(neighbors_indices, new_neighb_inds)
AttributeError: module 'tensorflow' has no attribute 'batch_gather'
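The AttributeError above says that the installed TensorFlow module does not expose batch_gather, which usually points to a mismatch between the code and the installed library version (an assumption; the traceback does not state the version). To make clear what the failing line computes, here is a minimal pure-Python sketch of 2-D batch-gather semantics. The helper name batch_gather_2d and the toy data are hypothetical and not part of the repository.

```python
def batch_gather_2d(params, indices):
    """For each row i, pick params[i][j] for every j in indices[i]."""
    if len(params) != len(indices):
        raise ValueError("params and indices need the same number of rows")
    return [[row[j] for j in idx] for row, idx in zip(params, indices)]


# Toy data: two points with three neighbor indices each.
neighbors_indices = [[10, 11, 12], [20, 21, 22]]
# For every point, positions inside its own neighbor list.
new_neighb_inds = [[2, 0], [1, 1]]

gathered = batch_gather_2d(neighbors_indices, new_neighb_inds)
print(gathered)  # [[12, 10], [21, 21]]
```

For 2-D NumPy arrays, np.take_along_axis(params, indices, axis=1) gives the same result.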
Hi, @HuguesTHOMAS ,
First, thanks for your great work on KPConv. I ran into a problem when running 'training_S3DIS.py'. The error output is below:
Traceback (most recent call last):
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,3], [?,3], [?,3], [?,3], [?,3], ..., [?], [?,3], [?,3,3], [?], [?]], output_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
[[Node: optimizer/gradients/KernelPointNetwork/layer_0/resnetb_1/conv2/concat_1_grad/GatherV2_2/axis/_222 = _HostSendT=DT_INT32, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1469_...rV2_2/axis", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/hwk/KPConv/utils/trainer.py", line 261, in train
_, L_out, L_reg, L_p, probs, labels, acc = self.sess.run(ops, {model.dropout_prob: 0.5})
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,3], [?,3], [?,3], [?,3], [?,3], ..., [?], [?,3], [?,3,3], [?], [?]], output_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
[[Node: optimizer/gradients/KernelPointNetwork/layer_0/resnetb_1/conv2/concat_1_grad/GatherV2_2/axis/_222 = _HostSendT=DT_INT32, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1469_...rV2_2/axis", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Caused by op 'IteratorGetNext', defined at:
File "training_S3DIS.py", line 213, in
dataset.init_input_pipeline(config)
File "/home/hwk/KPConv/datasets/common.py", line 749, in init_input_pipeline
self.flat_inputs = iter.get_next()
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 410, in get_next
name=name)), self._output_types,
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2069, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/hwk/anaconda3/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()
OutOfRangeError (see above for traceback): End of sequence
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,3], [?,3], [?,3], [?,3], [?,3], ..., [?], [?,3], [?,3,3], [?], [?]], output_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
[[Node: optimizer/gradients/KernelPointNetwork/layer_0/resnetb_1/conv2/concat_1_grad/GatherV2_2/axis/_222 = _HostSendT=DT_INT32, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1469_...rV2_2/axis", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "training_S3DIS.py", line 244, in
trainer.train(model, dataset)
File "/home/hwk/KPConv/utils/trainer.py", line 347, in train
self.cloud_validation_error(model, dataset)
File "/home/hwk/KPConv/utils/trainer.py", line 806, in cloud_validation_error
preds = (sub_preds[dataset.validation_proj[i_val]]).astype(np.int32)
IndexError: arrays used as indices must be of integer (or boolean) type
I am looking forward to your reply.
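The final IndexError in the traceback above is the one NumPy raises when the index array does not have an integer dtype. The sketch below reproduces it under the assumption (not confirmed by the traceback) that dataset.validation_proj[i_val] ended up as a float array; all values are hypothetical.

```python
import numpy as np

sub_preds = np.array([3, 1, 4])              # predictions on subsampled points
proj_float = np.array([0.0, 2.0, 2.0, 1.0])  # hypothetical: indices stored as floats

try:
    sub_preds[proj_float]
except IndexError as err:
    print("fails:", err)

# Casting the indices to an integer dtype makes the lookup valid.
proj_int = proj_float.astype(np.int64)
preds = sub_preds[proj_int].astype(np.int32)
print(preds)  # [3 4 4 1]
```

Printing dataset.validation_proj[i_val].dtype just before the failing line would show whether this is what happens in the actual run.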
Hi, @HuguesTHOMAS
When I run python training_Semantic3D.py, I get the following error:
Initiating input pipelines
2019-05-22 21:55:09.646196: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-05-22 21:55:09.810266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 980 Ti major: 5 minor: 2 memoryClockRate(GHz): 1.2405
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.10GiB
2019-05-22 21:55:09.810299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0, compute capability: 5.2)
Calib Neighbors 00000000 : timings 1386.89 21.57
Calib Neighbors 00000003 : timings 759.88 20.74
Calib Neighbors 00000011 : timings 131.92 18.18
Traceback (most recent call last):
File "training_Semantic3D.py", line 213, in <module>
dataset.init_input_pipeline(config)
File "/data/code9/KPConv/datasets/common.py", line 710, in init_input_pipeline
gen_function, gen_types, gen_shapes = self.get_batch_gen('training', config)
File "/data/code9/KPConv/datasets/Semantic3D.py", line 457, in get_batch_gen
random_pick_n = int(np.ceil(epoch_n / (self.num_training * (config.num_classes))))
ZeroDivisionError: division by zero
Any hints to fix this problem?
THX!
Hi, @HuguesTHOMAS
First of all - great work, and extremely well documented!
After training your model using the ModelNet40 config, I am doing some testing on a new dataset (the remarks and readme were very helpful).
I receive 2 different outputs for the same input point cloud (using the ModelNet40 config with deformable convolutions; the output is the layer before the classification head).
The differences are significant (up to 0.25 of the vector's norm).
The run command is
sess.run(ops, {model.dropout_prob: 1.0})
Is this normal behavior for the model? Is any part of the computation non-deterministic?
Thanks,
Ran
Hi @HuguesTHOMAS
When I run the training_ShapeNetPart.py script, the error below occurs many times and the program stops.
I have followed the steps in your INSTALL.md.
Step 00009033 L_out=0.121 L_reg=0.080 L_p=0.282 Acc=0.96 --- 314.13 ms/batch (Averaged)
2019-10-22 11:18:28.785661: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2019-10-22 11:18:28.785715: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:274] Unexpected Event status: 1
Aborted (core dumped)
Can you help me to solve this problem?
Hi,
I am trying to use your model on a dataset with various size frames. I faced two main issues:
I am still searching for a more powerful computer.
Maybe you can try this dataset, :)
Hi,
Thank you for your work, your paper and this project are really interesting.
I am testing the cloud segmentation on some custom dataset.
I encountered some indexing errors related to reprojection.
Branch Master: 5f9ceca
I had the following error:
TypeError: list indices must be integers or slices, not tuple
With the following lines:
probs = self.test_probs[i_test][dataset.test_proj[i_test], :]
pots = dataset.potentials['test'][i_test][dataset.test_proj[i_test]]
preds = (sub_preds[dataset.validation_proj[i_val]]).astype(np.int32)
I fixed the errors with the following corresponding code:
probs = np.zeros(
    (points.shape[0],
     self.test_probs[i_test].shape[1]),
    dtype=self.test_probs[i_test].dtype)
for pi, pv in enumerate(self.test_probs[i_test]):
    probs[dataset.test_proj[i_test][pi]] = pv
pots = np.zeros(
    (points.shape[0],),
    dtype=dataset.potentials['test'][i_test].dtype)
for pi, pv in enumerate(dataset.potentials['test'][i_test]):
    pots[dataset.test_proj[i_test][pi]] = pv
preds = np.zeros(labels.shape, dtype=np.int32)
for si, sp in enumerate(sub_preds):
    preds[dataset.validation_proj[i_val][si]] = sp
Could you confirm I have done the right thing?
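One way to check the fix is to compare it with the original indexing on toy data. The sketch below assumes that test_proj holds, for every original point, the index of its nearest subsampled point; this is what the original indexing expressions suggest, but it is not confirmed here. Under that assumption the original lines are a gather, while the loop-based version is a scatter that fills only a few rows. All data and names below are hypothetical.

```python
import numpy as np

# Toy data: 3 subsampled points, 5 original points.
sub_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # a plain list, as the TypeError suggests
proj = np.array([0, 0, 1, 2, 2])                  # nearest subsampled point per original point

# Gather: converting the list to an array first makes the [proj, :] indexing valid
# and yields one row per ORIGINAL point.
probs = np.asarray(sub_probs)[proj, :]
print(probs.shape)  # (5, 2)

# Scatter, as in the loop-based fix: the loop runs over SUBSAMPLED points,
# so it writes at most 3 rows and leaves the rest at zero.
scattered = np.zeros((len(proj), 2))
for pi, pv in enumerate(sub_probs):
    scattered[proj[pi]] = pv
print(scattered)
```

If the assumption holds, wrapping the list in np.asarray (and casting the projection indices to an integer dtype where needed) keeps the original behavior, whereas the loops change the result.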
Thank you for your repo.
Could you please provide me with the script and the pre-trained model for scene segmentation on ScanNet?
BTW, about the instructions for Ubuntu 18.04: I didn't have to remove the -D_GLIBCXX_USE_CXX11_ABI=0 flag. Instead, I needed to make minor changes to cpp_wrappers/cpp_utils/cloud/cloud.h by including the <cmath> header and changing
return PointXYZ(floor(P.x), floor(P.y), floor(P.z));
to
return PointXYZ(std::floor(P.x), std::floor(P.y), std::floor(P.z));
Before running the training_Semantic3D.py script, I modified the Semantic3D.py script as follows:
# Path of the folder containing ply files
self.path = 'Data/Semantic3D'
# Original data path
self.original_folder = 'original_data'
# Path of the training files
# self.train_path = join(self.path, 'ply_subsampled/train')
# self.test_path = join(self.path, 'ply_subsampled/reduced-8')
self.train_path = join(self.path, self.original_folder)
self.test_path = join(self.path, self.original_folder)
# self.test_path = join(self.path, 'ply_subsampled/semantic-8')
Then I ran the training_Semantic3D.py script, and the program reported the following error:
Calib Neighbors 00000000 : timings 20936.74 18.95
Calib Neighbors 00000001 : timings 3209.54 19.80
Calib Neighbors 00000002 : timings 2743.98 13.21
Calib Neighbors 00000005 : timings 18005.47 21.91
Calib Neighbors 00000008 : timings 2674.94 16.44
Calib Neighbors 00000009 : timings 5039.95 15.20
Calib Neighbors 00000011 : timings 3139.28 17.42
Calib Neighbors 00000013 : timings 2356.07 16.96
Traceback (most recent call last):
File "training_Semantic3D.py", line 213, in <module>
dataset.init_input_pipeline(config)
File "/disk/tia/tia/KPConv/datasets/common.py", line 710, in init_input_pipeline
gen_function, gen_types, gen_shapes = self.get_batch_gen('training', config)
File "/disk/tia/tia/KPConv/datasets/Semantic3D.py", line 458, in get_batch_gen
random_pick_n = int(np.ceil(epoch_n / (self.num_training * (config.num_classes))))
ZeroDivisionError: division by zero
It seems that config.num_classes is zero, and that the value was not overwritten by the dataset class when initiating the input pipeline.
Can you help me to solve this problem?
Waiting for your reply!
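The denominator in the failing line is a product, so the division fails if either self.num_training or config.num_classes is zero; the traceback alone does not say which. A minimal sketch of a guard that reports the actual culprit is shown below. The function name random_pick_count and the example numbers are hypothetical.

```python
import math


def random_pick_count(epoch_n, num_training, num_classes):
    """Same arithmetic as the failing line, with a readable error for each zero factor."""
    if num_training == 0:
        raise ValueError("num_training == 0: no training clouds were loaded")
    if num_classes == 0:
        raise ValueError("num_classes == 0: the class count was never set")
    return int(math.ceil(epoch_n / (num_training * num_classes)))


print(random_pick_count(5000, 9, 8))  # 70
```

Since the training and test paths were both redirected to the original data folder, checking how many training files were actually found is a reasonable first step.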
(kpconv) czh@dirac:~/project/KPConv/cpp_wrappers$ sh compile_wrappers.sh
running build_ext
building 'grid_subsampling' extension
Warning: Can't read registry to find the necessary compiler setting
Make sure that Python modules winreg, win32api or win32con are installed.
C compiler: gcc -pthread -B /farm/czh/anaconda3/envs/kpconv/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/cpp_wrappers
creating build/temp.linux-x86_64-3.6/cpp_wrappers/cpp_utils
creating build/temp.linux-x86_64-3.6/cpp_wrappers/cpp_utils/cloud
creating build/temp.linux-x86_64-3.6/grid_subsampling
compile options: '-I/farm/czh/anaconda3/envs/kpconv/lib/python3.6/site-packages/numpy/core/include -I/farm/czh/anaconda3/envs/kpconv/include/python3.6m -c'
extra options: '-std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0'
gcc: ../cpp_utils/cloud/cloud.cpp
gcc: grid_subsampling/grid_subsampling.cpp
gcc: wrapper.cpp
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /farm/czh/anaconda3/envs/kpconv/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1830:0,
from /farm/czh/anaconda3/envs/kpconv/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /farm/czh/anaconda3/envs/kpconv/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from wrapper.cpp:2:
/farm/czh/anaconda3/envs/kpconv/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it with "
^~~~~~~
grid_subsampling/grid_subsampling.cpp: In function ‘void grid_subsampling(std::vector&, std::vector&, std::vector&, std::vector&, std::vector&, std::vector&, float, int)’:
grid_subsampling/grid_subsampling.cpp:99:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < ldim; i++)
~~^~~~~~
wrapper.cpp: In function ‘PyObject* grid_subsampling_compute(PyObject*, PyObject*, PyObject*)’:
wrapper.cpp:70:98: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
static char kwlist[] = {"points", "features", "classes", "sampleDl", "method", "verbose", NULL };
^
wrapper.cpp:70:98: warning: ISO C++ forbids converting a string constant to ‘char’ [-Wwrite-strings]
wrapper.cpp:70:98: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
wrapper.cpp:70:98: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
wrapper.cpp:70:98: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
wrapper.cpp:70:98: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
g++ -pthread -shared -B /farm/czh/anaconda3/envs/kpconv/compiler_compat -L/farm/czh/anaconda3/envs/kpconv/lib -Wl,-rpath=/farm/czh/anaconda3/envs/kpconv/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/cpp_wrappers/cpp_utils/cloud/cloud.o build/temp.linux-x86_64-3.6/grid_subsampling/grid_subsampling.o build/temp.linux-x86_64-3.6/wrapper.o -o /farm/czh/project/KPConv/cpp_wrappers/cpp_subsampling/grid_subsampling.cpython-36m-x86_64-linux-gnu.so
Hi @HuguesTHOMAS , I am training KPConv with 4 GPUs and 12 CPUs. However when I run my model it uses almost all the CPUs available even when I specify that I want to use only one thread:
cProto = tf.ConfigProto(allow_soft_placement=True,
                        intra_op_parallelism_threads=1,
                        inter_op_parallelism_threads=1,
                        device_count={'CPU': 1})
And
# Number of CPU threads for the input pipeline
input_threads = 1
I used perf top to see what those threads are actually doing and found that most of the CPU time is taken by tf_batch_neighbors.so.
This is really weird, because tf_batch_neighbors is called in the tf_classification_inputs or tf_segmentation_inputs function, which is used by get_tf_mapping, so I expected that setting input_threads to 1 would limit this operation to a single thread. Is there any way to limit the CPU usage of this TF module?
Hello,
When I tried to run training on Semantic3D, I got the error below:
(tensorflow) C:\Users\plati\Desktop\KPConv>python training_Semantic3D.py
Traceback (most recent call last):
File "training_Semantic3D.py", line 39, in
from datasets.Semantic3D import Semantic3DDataset
File "C:\Users\plati\Desktop\KPConv\datasets\Semantic3D.py", line 46, in
from datasets.common import Dataset
File "C:\Users\plati\Desktop\KPConv\datasets\common.py", line 34, in
tf_neighbors_module = tf.load_op_library('tf_custom_ops/tf_neighbors.so')
File "C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\load_library.py", line 60, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: tf_custom_ops\tf_neighbors.so not found
...and I don't know why. How can I solve it?
Hi @HuguesTHOMAS, thanks for releasing the code. We followed the default configuration and ran the code under TensorFlow 1.11, CUDA 9, cuDNN 7.3. But we found that the final result on S3DIS Area 5 was just 60.24 (much lower than the reported result of 65.4).
I wonder whether we missed some details during training, or whether it is related to the version of TensorFlow?
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[104051,256] and type float
I am using a GTX 1060 laptop with 6 GB of memory.
How much GPU memory do I need?
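As a rough orientation, the size of the tensor named in the error can be computed directly from its shape. The sketch below assumes 4 bytes per element, since the message says type float; the helper name tensor_mib is hypothetical.

```python
def tensor_mib(shape, bytes_per_element=4):
    """Size of a dense tensor in MiB, assuming float32 elements by default."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element / 2**20


size = tensor_mib((104051, 256))
print(f"{size:.1f} MiB")  # 101.6 MiB
```

A single tensor of about 100 MiB is small next to 6 GB, so the failure reflects the sum of all tensors held at that moment rather than this one allocation. The error therefore does not give a fixed memory requirement; the total depends on how many points enter the network per batch.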
@HuguesTHOMAS Thanks for sharing the code! I encountered this error when I ran the script Semantic3D.py:
"AttributeError: module 'cpp_wrappers.cpp_subsampling.grid_subsampling' has no attribute 'compute'"
Could you give me some help? Thank you very much!
Thanks for releasing the pretrained model on S3DIS. However, the mean IoU reported by test_any_model.py is only around 57 on Area 5 (pretrained model), which is far from the 65 reported in the paper.
In addition, we trained the model on S3DIS Area 5 with the default parameter settings, but the mean IoU only reaches around 59.
Hi, thanks for your contribution, but I ran into this error:
tensorflow.python.framework.errors_impl.NotFoundError: tf_custom_ops/tf_neighbors.so: undefined symbol: _ZN10tensorflow12OpDefBuilder5InputENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
I am using Ubuntu 18.04 with anaconda3, cuda-toolkit=9.0, cudnn=7.3.1, tf-gpu=1.12.0.
Can you help me with that?
thanks
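The mangled name in this error contains __cxx11, the namespace libstdc++ uses for its newer std::string ABI. That suggests, as an inference rather than a confirmed diagnosis, that the custom op was compiled with the new ABI while the installed TensorFlow binary exports the old-ABI version of the symbol. A small sketch of that check is shown below; the helper name expects_cxx11_abi is hypothetical.

```python
def expects_cxx11_abi(mangled_symbol):
    """True if the mangled C++ name refers to libstdc++'s new-ABI std::__cxx11 types."""
    return "__cxx11" in mangled_symbol


missing = ("_ZN10tensorflow12OpDefBuilder5InputE"
           "NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE")
print(expects_cxx11_abi(missing))  # True
```

The flags that the installed TensorFlow was built with, including its -D_GLIBCXX_USE_CXX11_ABI value, can be printed with tf.sysconfig.get_compile_flags(). The custom ops need to be compiled with the same value.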
hi @HuguesTHOMAS
Thanks for your work and open-source code!
Under CUDA 10.2, Ubuntu 18.04.3, TensorFlow 1.12.0, and a GeForce GTX 1080 Ti, I successfully compiled the cpp wrappers and tf_ops by removing the flag as mentioned. However, when I run train_ModelNet.py, everything goes well for roughly the first two epochs, and then after around 2000 steps I get NaN values in the loss and accuracy. I compiled TensorFlow from source, and in the same environment I compiled other TF user ops without any problem.
I want to migrate this to Windows, but I failed. I hope someone can help me.
Hi @HuguesTHOMAS,
There seem to be two minor mistakes in the trainer.py file.
Lines 164-165:
for ign_label in model.config.ignored_label_inds:
    ignored_bool = tf.logical_or(ignored_bool, model.labels == ign_label)
This line should be changed to the form you use in other parts of your code:
for ign_label in model.config.ignored_label_inds:
    ignored_bool = tf.logical_or(ignored_bool, tf.equal(model.labels, ign_label))
The original line will simply output all zeros.
Line 182:
self.prob_logits = tf.nn.softmax(new_logits)
This line should be changed to the form you use in tester.py:
self.prob_logits = tf.nn.softmax(model.logits)
Otherwise the probabilities will be collected with the wrong indices in this step (line 696):
# Eliminate shadow indices
b = b[b < max_ind-0.5]
# Get prediction (only for the concerned parts)
probs = stacked_probs[b]
inds = point_inds[b]
c_i = cloud_inds[b_i]
The reason that the original code didn't complain here is that ignored_bool was all zeros.
Thank you again for sharing your work.
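The first point can be illustrated without TensorFlow. In TensorFlow 1.x graph mode, the == operator on a tensor is not an elementwise operation; it falls back to Python's object comparison and yields a single False. The sketch below mimics this with a hypothetical stand-in class; GraphTensor and equal are illustrative names, not TensorFlow APIs.

```python
class GraphTensor:
    """Hypothetical stand-in for a symbolic tensor that does not overload ==."""

    def __init__(self, values):
        self.values = values


def equal(tensor, value):
    """Elementwise comparison, playing the role that tf.equal plays in the fix."""
    return [v == value for v in tensor.values]


labels = GraphTensor([0, 2, 1, 2])

print(labels == 2)       # False: one Python bool, not a per-point mask
print(equal(labels, 2))  # [False, True, False, True]
```

Combining that single False with the mask through a logical OR leaves the mask unchanged, which matches the observation that ignored_bool stays all zeros.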
Hi, thank you for sharing. I read the functions KPConv and KPConv_ops in convolution_ops.py and found that the convolution defined in these two functions is for a single point cloud input. How does it work when I input a batch of point clouds, since these point cloud fragments may have different numbers of points?
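One common way to batch clouds of different sizes is to stack them along the point axis and keep a list of per-cloud lengths, so that a single-cloud convolution can run on the stacked array. The [?,3] shapes in the iterator output and names such as stacked_probs quoted in other issues hint at such a layout, but this is an assumption and not something confirmed in this thread. A minimal sketch with hypothetical helper names:

```python
def stack_clouds(clouds):
    """Concatenate clouds along the point axis and record each cloud's length."""
    stacked, lengths = [], []
    for cloud in clouds:
        stacked.extend(cloud)
        lengths.append(len(cloud))
    return stacked, lengths


def unstack(stacked, lengths):
    """Recover the individual clouds from the stacked list."""
    clouds, start = [], 0
    for n in lengths:
        clouds.append(stacked[start:start + n])
        start += n
    return clouds


cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cloud_b = [(0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (0.0, 3.0, 0.0)]

stacked, lengths = stack_clouds([cloud_a, cloud_b])
print(len(stacked), lengths)  # 5 [2, 3]
```

With this layout, neighbor indices refer to rows of the stacked array, and the neighbor search has to be restricted to points of the same cloud so that clouds do not mix.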
Thanks for sharing, this is really impressive work.
I got this running on Ubuntu 18.04 and can share my steps for getting it to work. Maybe you'd like to include these steps in INSTALL.md.
Remove the -D_GLIBCXX_USE_CXX11_ABI=0 flag from each line in tf_custom_ops/compile_op.sh. I think this problem comes from Ubuntu 18.04 having a newer version of gcc.
Hi, @HuguesTHOMAS,
Would you also release the trained model?
I want to run some further experiments on semantic segmentation on the S3DIS dataset. Would you release the trained model for the S3DIS dataset first?
thx!
Dear @HuguesTHOMAS,
I was wondering if you were going to keep working on improving the model.
I think KPConv could be improved in at least 3 ways.
It could have been (a): g(yi) = sum (g(norm(yi) / R) * h(yi, xk) * Wk) or (b): g(yi) = sum (g(yi) * h(yi, xk) * Wk)
(a): g takes a value between 0 and 1 and performs a guidance on it.
(b): g takes yi directly and does the same
It will be similar to this one, as an analogy for image convolution.
Greetings @HuguesTHOMAS , in my extension of KPConv for partial point cloud completion it is necessary for the point clouds to be padded (via duplicating certain random existing points) in order for them to have a specific number of points.
I am doing this after the grid subsampling preprocess step of the current implementation. My question is: do you believe this will affect the feature extraction of the kernel point convolutions negatively in any way?
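For concreteness, the padding step described above can be sketched as follows. The function name pad_by_duplication, the fixed seed, and the toy cloud are hypothetical; this is only an illustration of duplicating random existing points up to a target size.

```python
import random


def pad_by_duplication(points, target_n, seed=0):
    """Pad a cloud to target_n points by repeating randomly chosen existing points."""
    if not points:
        raise ValueError("cannot pad an empty cloud")
    if len(points) >= target_n:
        return list(points)
    rng = random.Random(seed)
    extra = [rng.choice(points) for _ in range(target_n - len(points))]
    return list(points) + extra


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
padded = pad_by_duplication(cloud, 5)
print(len(padded), len(set(padded)))  # 5 3
```

The padded cloud has more points but no new locations. In any convolution that sums over neighbors, a duplicated point contributes once per copy, which is one mechanism by which such padding could bias the extracted features.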
Hi, thanks for sharing. I tried your code on my own dataset. Initially everything goes well, but after several epochs the training suddenly breaks down (accuracy becomes 1 and the loss becomes 0). I use TF 1.12.0, CUDA 9.0, and cuDNN 7.1.4.
# conda list | grep tensorflow
tensorflow-estimator 1.13.0 py_0 anaconda
tensorflow-gpu 1.12.0 pypi_0 pypi
tensorflow-tensorboard 0.4.0 pypi_0 pypi
Have you run into this kind of problem? Another potential problem is that the training sometimes takes 4400 MB of GPU memory (as shown by nvidia-smi), but sometimes more than 7000 MB, even though I do not change the batch size or network architecture. I am pretty confused by these problems. Could you give me some advice?
Hi! @HuguesTHOMAS
Thanks for your work and for sharing the code! I am currently working with it and trying to get a clearer understanding of your work. Here are some of my questions after looking through other issues.
1. In convolution_ops.py:190, you add a fake point shadow_point to the support_point. In my view, this shadow_point plays a role like padding when the query_point doesn't get enough neighbours, doesn't it?
2. Regarding first_subsampling_dl to reduce GPU memory cost: I only know that this parameter divides the input points into a grid (whose cell size is set by first_subsampling_dl). If I increase first_subsampling_dl, will the number of input points decrease and the sampling rate be affected?
3. In trainer.py:469, you compute a votes confusion. Could you tell me what vote means, and what the difference between C1 and C3 is?
Hi @HuguesTHOMAS:
Thanks for sharing the code. In order to get the performance reported in the paper (IoU on S3DIS Area 5), I set self.validation_split = 4 in S3DIS.py, but I can't reach the reported performance (mIoU 66). Do you have any suggestions? I haven't finished training, but the mIoU does not seem to be going up.
Step 00089533 L_out=0.049 L_reg=0.033 L_p=0.000 Acc=0.98 --- 709.37 ms/batch (Averaged)
Step 00089535 L_out=0.061 L_reg=0.033 L_p=0.000 Acc=0.98 --- 711.77 ms/batch (Averaged)
Step 00089537 L_out=0.043 L_reg=0.033 L_p=0.000 Acc=0.98 --- 717.27 ms/batch (Averaged)
Step 00089539 L_out=0.058 L_reg=0.033 L_p=0.000 Acc=0.98 --- 721.15 ms/batch (Averaged)
Validation : 0.0% (timings : 165.89 6.77)
Validation : 6.0% (timings : 276.87 20.58)
Validation : 10.0% (timings : 301.69 28.65)
Validation : 16.0% (timings : 306.19 33.44)
Validation : 22.0% (timings : 313.93 38.35)
Validation : 26.0% (timings : 333.55 41.70)
Validation : 30.0% (timings : 350.38 45.12)
Validation : 32.0% (timings : 383.41 46.56)
Validation : 38.0% (timings : 379.15 50.89)
Validation : 42.0% (timings : 389.80 52.16)
Validation : 46.0% (timings : 404.30 56.77)
Validation : 48.0% (timings : 447.05 57.91)
Validation : 54.0% (timings : 432.56 60.22)
Validation : 60.0% (timings : 424.93 60.28)
Validation : 66.0% (timings : 414.28 59.36)
Validation : 72.0% (timings : 399.32 57.45)
S3DIS mean IoU = 56.9%
Hi Hugues, did you use the reflectance feature during training on the NPM3D dataset? Thank you.
Hi!
Thanks for your wonderful paper and code! I'm currently working on it, trying to reproduce the results reported in your work on Semantic3D dataset. Following are two problems I have encountered recently while reading your code.
1. In common.py, function calibrate_batches: I have no problem understanding the first half of this function, but I have trouble understanding the second part, starting from sum_s = 0. Would you please elaborate on the ideas behind it?
2. In Semantic3D.py, function spatially_regular_gen: you choose spherical samples from the point clouds according to randomly generated potentials, and update the potentials each round so that the potential of the center point increases the most. Does this sampling technique have a specific name? Also, what is the advantage of this center point selection technique compared to others, such as uniform grid selection?
Thank you very much!
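The sampling scheme described in the second question can be sketched in a few lines. The version below assumes that the point with the lowest potential is chosen as the sphere center and that the increment falls off quadratically to zero at the sphere boundary. Both are assumptions made for illustration, not the repository's exact formula, and all names are hypothetical.

```python
import random


def pick_center_and_update(points, potentials, radius):
    """Pick the lowest-potential point as sphere center, then raise potentials in the sphere."""
    center_idx = min(range(len(points)), key=lambda i: potentials[i])
    cx, cy, cz = points[center_idx]
    for i, (x, y, z) in enumerate(points):
        d2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
        if d2 < radius ** 2:
            # The increment is 1 at the center and decays to 0 at the sphere boundary.
            potentials[i] += (1.0 - d2 / radius ** 2) ** 2
    return center_idx


rng = random.Random(0)
points = [(float(i), 0.0, 0.0) for i in range(10)]
# Small random initial potentials break ties between untouched points.
potentials = [rng.random() * 1e-3 for _ in points]

centers = [pick_center_and_update(points, potentials, radius=2.0) for _ in range(5)]
print(centers)
```

Because every pick raises the potentials around its center, the next pick lands in a region that has been visited less, so successive spheres spread over the cloud instead of clustering as purely random centers can.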
Hello, I really appreciate your KPConv work and the shared code. However, I got an error when running training_ModelNet40.py in debug mode. In particular, I changed this line to trainer.train(model, dataset, debug_NaN=False). I then got the following error. Could you help me solve it? Thanks so much.
Hi, @HuguesTHOMAS ,
I tested the three visualization scripts: visualize_ERFs.py, visualize_deformations.py, and visualize_features.py. All of them return the following error:
Segmentation fault (core dumped)
Any hints to fix this issue? Or any other config should I modify?
thx!
Thanks for the great implementation! The paper suggests that KPConv could be well suited for the task of partial point cloud completion. Have you experimented with this type of problem using the KPConv operator in a specific architecture?
Hi, @HuguesTHOMAS
Thank you for sharing this wonderful work !
I am wondering about the dataset preparation: what exactly should the Semantic3D, ScanNet, and NPM3D data look like under the directory /KPConv/Data/? How should they be organized?
Hi @HuguesTHOMAS, sorry to bother you. Have you ever run into this kind of problem during training?
My Configuration is:
CUDA Version 9.0.176, TF1.12, GTX 1080
Like Issue15, this error also occurs randomly during training.
Hi, @HuguesTHOMAS ,
When I run python training_Semantic3D.py, I get the following error:
Dataset Preparation
*******************
sg27_station10_rgb_intensity-reduced already done
birdfountain_station1_xyz_intensity_rgb already done
castleblatten_station1_intensity_rgb already done
castleblatten_station5_xyz_intensity_rgb already done
marketplacefeldkirch_station1_intensity_rgb already done
marketplacefeldkirch_station4_intensity_rgb already done
MarketplaceFeldkirch_Station4_rgb_intensity-reduced already done
marketplacefeldkirch_station7_intensity_rgb already done
sg28_Station2_rgb_intensity-reduced already done
stgallencathedral_station1_intensity_rgb already done
stgallencathedral_station3_intensity_rgb already done
stgallencathedral_station6_intensity_rgb already done
StGallenCathedral_station6_rgb_intensity-reduced already done
Preparing KDTree for all scenes, subsampled at 0.060
Traceback (most recent call last):
File "training_Semantic3D.py", line 210, in <module>
dataset.load_subsampled_clouds(dl0)
File "/data/code9/KPConv/datasets/Semantic3D.py", line 289, in load_subsampled_clouds
sub_labels = data['class']
ValueError: no field of name class
In line 289 of Semantic3D.py, data is treated as a dict. However, when I debugged the code, data was actually a numpy array with shape (1034819,). Could you give some suggestions to fix this issue?
THX!
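A 1-D array with shape (N,) that is indexed by name is typical of a NumPy structured array, which does support data['name'] lookups. Read that way, the ValueError means that this particular file has no field called class, for example because it was written without labels (an assumption). The sketch below shows how to inspect the available fields; the array is a hypothetical stand-in for what the PLY reader returns.

```python
import numpy as np

# Hypothetical stand-in for a loaded cloud: a structured array without labels.
data = np.zeros(4, dtype=[("x", np.float32), ("y", np.float32), ("z", np.float32)])

print(data.shape)        # (4,)
print(data.dtype.names)  # ('x', 'y', 'z')

field = "class"
if field in data.dtype.names:
    sub_labels = data[field]
else:
    sub_labels = None
    print(f"field '{field}' is missing; available fields: {data.dtype.names}")
```

Printing data.dtype.names for the file that fails shows which fields were actually written during preprocessing.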
When will you publish the segmentation code for ScanNet data?
Hi, Thomas. Thanks for sharing your code. I have two questions and hope for your suggestions:
Hi,
I'm trying to train the network for classification on ModelNet40 and have downloaded the specified data. But when I run the python training_ModelNet40.py command, I get the error
tensorflow.python.framework.errors_impl.NotFoundError: tf_custom_ops/tf_neighbors.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv
although tf_neighbors.so is right there in the tf_custom_ops folder.
Regards
Hi @HuguesTHOMAS, sorry to bother you.
I wonder whether it is possible to reload weights that were trained with a different first_subsampling_dl parameter (for example, I first train the network with 3 cm voxels and then want to use it on a larger scene with a 5 cm voxel size). Another question: when reloading the pretrained weights, if I change config.first_subsampling_dl, will the locations of the kernel points change accordingly? I tried it and found that they do, but I can't find the code that does this; the model reload is simply done by self.saver.restore.
Thanks a lot !
Thank you very much for your awesome work.
I saw the instructions for creating a new dataset for training your code on it. However, I would like to use it to perform transfer learning instead. I would like then to use the learned weights of the network but train the input and output layers.
I have trained and tested the Semantic3D model. However, I am not able to actually find where the "model" is in order to freeze the intermediate layers and do the other modifications for using my data. I was wondering if you could give me some advice about this.
Many thanks!
Incomplete code
Thanks for sharing code. Nice work! 👍
I encountered a problem while compiling. I get the following errors (here are just a few):
error: could not convert 'P.PointXYZ::x' from 'const float' to 'PointXYZ'
In file included from tf_neighbors/neighbors/neighbors.h:3:0,
from tf_neighbors/neighbors/neighbors.cpp:2:
tf_neighbors/neighbors/../../cpp_utils/cloud/cloud.h: In function 'PointXYZ floor(PointXYZ)':
tf_neighbors/neighbors/../../cpp_utils/cloud/cloud.h:142:26: error: could not convert 'P.PointXYZ::x' from 'const float' to 'PointXYZ'
return PointXYZ(floor(P.x), floor(P.y), floor(P.z));
tf_neighbors/neighbors/../../cpp_utils/cloud/cloud.h:142:38: error: could not convert 'P.PointXYZ::y' from 'const float' to 'PointXYZ'
return PointXYZ(floor(P.x), floor(P.y), floor(P.z));
tf_neighbors/neighbors/../../cpp_utils/cloud/cloud.h:142:50: error: could not convert 'P.PointXYZ::z' from 'const float' to 'PointXYZ'
return PointXYZ(floor(P.x), floor(P.y), floor(P.z));
OS: Ubuntu 18.04.
I tried two versions of g++ compilers:
Other:
python 3.6.7
tensorflow-gpu 1.13.1
Any ideas or suggestions?
Thank you!
Hello,
Hope this post finds you well and thank you for your excellent work.
The newly edited Scannet.py seems to have moved these two lines from the function "load_subsampled_clouds" to "__init__" by mistake.
# List of training and test files
self.train_files = np.sort([join(self.train_path, f) for f in listdir(self.train_path) if f[-4:] == '.ply'])
self.test_files = np.sort([join(self.test_path, f) for f in listdir(self.test_path) if f[-4:] == '.ply'])
This results in a "No such file or directory" error, since the folders have not been created yet at the initialization stage.
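A possible workaround, shown only as a minimal sketch: guard the listing so that a folder that does not exist yet yields an empty list instead of raising. The helper list_ply_files is hypothetical and not part of Scannet.py, and it returns a plain sorted list instead of the np.sort array used in the original lines.

```python
import os
from os.path import isdir, join

def list_ply_files(folder):
    """Return the sorted .ply paths in folder, or [] if the folder is missing."""
    if not isdir(folder):
        # The folder has not been created by the preparation step yet.
        return []
    return sorted(join(folder, f) for f in os.listdir(folder) if f.endswith(".ply"))

# Path taken from the error message; the result depends on the local setup.
train_files = list_ply_files("Data/Scannet/training_points")
print(len(train_files))
```

With this guard the file lists would still need to be refreshed once the preparation step has actually written the .ply files.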
Dataset Preparation
Traceback (most recent call last):
File "training_Scannet.py", line 207, in <module>
dataset = ScannetDataset(config.input_threads, load_test=False)
File "/home/zeyu/Projects/KPConv/datasets/Scannet.py", line 149, in __init__
self.train_files = np.sort([join(self.train_path, f) for f in listdir(self.train_path) if f[-4:] == '.ply'])
FileNotFoundError: [Errno 2] No such file or directory: 'Data/Scannet/training_points'
(KPConv) zeyu@taigroup-System-Product-Name:~/Projects/KPConv$ python training_Scannet.py
Another issue is that when I start training with the previous version of Scannet.py, the preprocessed data looks wrong. The ply files in the folder "training_meshes" cannot be read by MeshLab, and the ply files in the folder "training_points" are extremely small (around 7 KB). When I visualize the files in "training_points", all the points seem to lie along a single line. However, there seem to be no problems with the testing files.
Thank you in advance for your time and help.
Wish you good luck~(●'◡'●)
Dear @HuguesTHOMAS,
Thank you very much for your source code. After training, I want to visualize the results for the S3DIS dataset as in Figure 4. However, I have not managed to do so; I only get the output of the layer visualization.
Could you please help me visualize the final segmentation result as in Figure 4?
Thank you again!
Hi, @HuguesTHOMAS ,
I followed the pretrained model guide. I downloaded the provided model for the NPM3D dataset and modified test_any_model.py as follows:
chosen_log = '/data/code11/KPConv/trained_models/Log_pretrained_NPM3D'
I then ran this command: python test_any_model.py.
However, the model appeared to start training again and finished after epoch 269.
The log is as follows:
Epoch 269, step 222 (timings : 621.59 12.76). min potential = 100.7
Epoch 269, step 224 (timings : 617.27 12.40). min potential = 100.7
Epoch 269, step 226 (timings : 615.46 12.10). min potential = 100.7
Epoch 269, step 228 (timings : 611.76 11.81). min potential = 100.7
Epoch 269, step 230 (timings : 604.60 11.50). min potential = 100.7
Epoch 269, step 232 (timings : 599.63 11.47). min potential = 100.7
Epoch 269, step 234 (timings : 598.45 11.32). min potential = 100.7
Epoch 269, step 236 (timings : 602.64 11.45). min potential = 100.7
Epoch 269, end. Min potential = 100.7
[114.3945913653916, 114.18811528408176, 113.78857637790165]
Saving clouds
Reproject Vote #100
Done in 339.9 s
How can I get the test result for the NPM3D dataset directly by modifying test_any_model.py?
THX!
Have you tested KPConv only with TF 1.12? Which CUDA version do you use?
I also had a NaN problem when training on the NPM3D data. My environment is TF 1.8 (CUDA 9.2). Can you give me some advice?