I had been using the CLI version of Somoclu and getting results consistent with other

Differing results between R and CLI (somoclu issue, 14 comments, closed)

peterwittek commented on July 18, 2024

Differing results between R and CLI

Comments (14)

xgdgsc commented on July 18, 2024

Is the kernelType different? Have you tried using kernelType=0 in both?

peterwittek commented on July 18, 2024

The starting learning rate is also different.

brogie62 commented on July 18, 2024

I copied the wrong CLI command. Both runs were done with l = 1. (BTW, I was doing parameter optimization and found that, with the Gaussian neighborhood, the starting learning rate has little effect on quantization error, which I had seen previously.) The CLI run was on the GPU, so the kernel is 1. I had previously run the CLI on the CPU with kernel = 0 and got results identical to the corresponding GPU run.

peterwittek commented on July 18, 2024

Actually, if it is compiled without CUDA support, the GPU kernel (=1) silently falls back to the CPU kernel.
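A minimal sketch of what such a silent compile-time fallback can look like. The macro SOMOCLU_HAS_CUDA and the function effective_kernel are hypothetical names for illustration, not somoclu's actual code; the kernel numbers follow this thread (0 = CPU, 1 = GPU).

```cpp
#include <cassert>

// Returns the kernel that actually runs for a requested kernel type.
// 0 = CPU kernel, 1 = GPU kernel (numbering as used in this thread).
// SOMOCLU_HAS_CUDA is a hypothetical build macro used only for illustration.
int effective_kernel(int requested) {
    assert(requested >= 0);
#ifdef SOMOCLU_HAS_CUDA
    return requested;  // CUDA build: the GPU kernel is honoured as requested
#else
    // Non-CUDA build: a GPU request quietly becomes the CPU kernel.
    return requested == 1 ? 0 : requested;
#endif
}
```

In a build without the macro, effective_kernel(1) returns 0, so a non-CUDA build would give the same numbers for both kernel settings.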

In any case, the problem is odd. To comply with CRAN, the random number generator of the R version is the one from <R.h>:

(RAND_MAX * unif_rand())

which should be identical in effect to the rand() function in <cstdlib>. The generated integer random number is then transformed to the [0, 1] interval in both cases. I saw major discrepancies if and only if the data coordinates were not normalized to [0, 1]. So let's go through a couple of basic points:

Is your data normalized?
Did you set the environment variable OMP_NUM_THREADS? It should not have any impact on the actual result, but it is good to know how many cores you are using.
How do you evaluate quantization error?
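For reference, here is a sketch of the [0, 1] mapping described above. It assumes the integer draw is simply divided by RAND_MAX; to_unit_interval, cli_style_draw and r_style_draw are hypothetical helpers, and the u argument stands in for the value unif_rand() would return, so the sketch compiles without <R.h>.

```cpp
#include <cassert>
#include <cstdlib>

// Maps an integer draw in [0, RAND_MAX] onto the unit interval [0, 1].
double to_unit_interval(int r) {
    assert(r >= 0 && r <= RAND_MAX);
    return static_cast<double>(r) / RAND_MAX;
}

// CLI-style draw: integer from rand() in <cstdlib>, then scaled.
double cli_style_draw() {
    return to_unit_interval(std::rand());
}

// R-style draw mirroring (RAND_MAX * unif_rand()): `u` is a stand-in for
// the uniform value unif_rand() would return (hypothetical parameter).
double r_style_draw(double u) {
    assert(u >= 0.0 && u <= 1.0);
    return to_unit_interval(static_cast<int>(RAND_MAX * u));
}
```

Both paths end in the same division by RAND_MAX, which is why the two generators should behave alike once the integer has been produced.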

brogie62 commented on July 18, 2024

The CLI version was compiled with CUDA support.

Randomization also should not matter as I am using an initial codebook.

The data is not normalized and is identical in both cases.
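If normalization turns out to matter, per-column min-max scaling into [0, 1] is one common way to do it. This is a sketch under that assumption (normalize_columns is a hypothetical helper, not part of somoclu). Since an initial codebook is in use here, it would presumably have to be scaled with the same per-column minima and ranges so that it stays in the same space as the data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // one row per data instance

// Min-max scales every column of `data` into [0, 1] in place.
// Constant columns are mapped to 0 to avoid dividing by zero.
void normalize_columns(Matrix& data) {
    assert(!data.empty());
    const std::size_t dims = data.front().size();
    for (std::size_t j = 0; j < dims; ++j) {
        double lo = data[0][j];
        double hi = data[0][j];
        for (const auto& row : data) {
            lo = std::min(lo, row[j]);
            hi = std::max(hi, row[j]);
        }
        const double range = hi - lo;
        for (auto& row : data) {
            row[j] = range > 0.0 ? (row[j] - lo) / range : 0.0;
        }
    }
}
```

Note that quantization errors computed on scaled data are not directly comparable with errors on the raw data.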

I have not made any changes to OMP_NUM_THREADS.

I evaluate quantization error by averaging the Euclidean distance between each input vector and its BMU, using the rdist function from the fields package in R:

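library(fields)  # provides rdist()

weights <- res$codebook   # trained codebook
inputs <- dataSource      # input vectors, one per row
# Euclidean distances: rows = input vectors, columns = codebook vectors
distMatrix <- rdist(inputs, weights)
# Distance from each input vector to its best-matching unit (BMU)
result <- apply(distMatrix, 1, min)

MinM <- mean(result)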
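The same quantity can be cross-checked outside R. The sketch below is a hypothetical C++ equivalent (quantization_error and euclidean are illustrative names), computing the mean Euclidean distance from each input row to its nearest codebook row, as the rdist-based snippet does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // one vector per row

// Euclidean distance between two vectors of equal length.
double euclidean(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double d = a[k] - b[k];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Mean distance from each input vector to its best-matching unit (BMU).
double quantization_error(const Matrix& inputs, const Matrix& codebook) {
    assert(!inputs.empty() && !codebook.empty());
    double total = 0.0;
    for (const auto& x : inputs) {
        double best = std::numeric_limits<double>::infinity();
        for (const auto& w : codebook) {
            best = std::min(best, euclidean(x, w));  // nearest node so far
        }
        total += best;
    }
    return total / static_cast<double>(inputs.size());
}
```

For example, inputs {0, 0} and {3, 4} against a single codebook vector {0, 0} give distances 0 and 5, so a quantization error of 2.5.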

xgdgsc commented on July 18, 2024

With the above commit, I get a codebook that is more similar to the CLI version's than before. So the problem might be related to incorrect handling of column-major matrices when converting arrays between C and R. Please check whether this fixes the issue.
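To illustrate the kind of layout mismatch suspected here: R stores a matrix column by column, while C/C++ code usually indexes a flat buffer row by row. The helper below is a hypothetical sketch of that conversion, not the actual change in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts an nrow x ncol matrix stored column-major (R's layout: element
// (i, j) at i + j * nrow) into a row-major buffer (element (i, j) at
// i * ncol + j), the layout C/C++ code typically indexes.
std::vector<double> col_major_to_row_major(const std::vector<double>& src,
                                           std::size_t nrow, std::size_t ncol) {
    assert(src.size() == nrow * ncol);
    std::vector<double> dst(src.size());
    for (std::size_t i = 0; i < nrow; ++i) {
        for (std::size_t j = 0; j < ncol; ++j) {
            dst[i * ncol + j] = src[i + j * nrow];
        }
    }
    return dst;
}
```

For example, R's matrix(1:6, nrow = 2) is stored as 1, 2, 3, 4, 5, 6 with rows (1, 3, 5) and (2, 4, 6); in row-major order it becomes 1, 3, 5, 2, 4, 6. Reading the R buffer as if it were already row-major would scramble both the input vectors and the initial codebook.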

brogie62 commented on July 18, 2024

I deleted my previous comment with the images. Although they are accurate, I decided I am not ready to have my work in a public forum yet. I hope that is OK.

brogie62 commented on July 18, 2024

I'll give that commit a try and let you know. Does that commit include the neighborhood function parameter?

xgdgsc commented on July 18, 2024

It includes the neighborhood function parameter.

brogie62 commented on July 18, 2024

Tried #26. Gaussian gave a quantization error of 5.78, very much in line with the CLI. Bubble gave an improved quantization error of 5.34. With Matlab and the kohonen R package I was getting < 5, so I may need to optimize parameters.

Thanks
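The two neighborhood functions compared above weight codebook updates differently. The sketch below uses common textbook forms (a Gaussian whose width is the current radius, and a hard cutoff for bubble); somoclu's exact formulas, including any scaling coefficient, may differ, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Gaussian neighborhood: update weight decays smoothly with the grid
// distance `dist` between a node and the BMU.
double gaussian_neighborhood(double dist, double radius) {
    assert(radius > 0.0);
    return std::exp(-(dist * dist) / (2.0 * radius * radius));
}

// Bubble neighborhood: full update inside the radius, none outside.
double bubble_neighborhood(double dist, double radius) {
    assert(radius > 0.0);
    return dist <= radius ? 1.0 : 0.0;
}
```

With bubble, every node inside the radius receives the full update and nodes outside receive none, whereas the Gaussian spreads smaller updates further out, so the two can settle at different quantization errors for the same parameters.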

peterwittek commented on July 18, 2024

The R interface is a bit of a mistreated foster child, as we are inexperienced with it. Thanks for pointing out this bug.

xgdgsc commented on July 18, 2024

The fix is on CRAN now.

peterwittek commented on July 18, 2024

Thanks very much. I did some overdue cleanup and tagged version 1.5.1. The update is released on MLOSS and GitHub. Please update PyPI.

xgdgsc commented on July 18, 2024

OK. Just uploaded source to PyPI, will build the binaries later.
