Comments (9)
Probably not, unless you are extremely careful about all of the parameters that control learning (starting and finishing learning rate, starting and finishing radius).
from somoclu.
Another thing to keep in mind is that you must have a deterministic initialization of the initial codebook; the default is random. Furthermore, the radius parameters only accept integer values. If you choose linear cooling, this is how the value is calculated at a low level (the same formula applies to both the learning rate and the radius):
float linearCooling(float start, float end, float nEpoch, float epoch) {
    float diff = (start - end) / (nEpoch - 1);
    return start - (epoch * diff);
}
Here `start` is the starting value, `end` is the final value, `nEpoch` is the total number of training epochs requested, and `epoch` is the current epoch.
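As a sanity check, the same schedule can be written in Python (a minimal sketch, not the library code): with the `nEpoch - 1` denominator, the schedule hits `end` exactly on the last epoch (`epoch = nEpoch - 1`).

```python
def linear_cooling(start, end, n_epoch, epoch):
    # Python rendering of the linearCooling formula above (nEpoch - 1 denominator)
    diff = (start - end) / (n_epoch - 1)
    return start - epoch * diff

# A radius cooled from 10 to 1 over 10 epochs (epoch = 0 .. 9):
print([linear_cooling(10, 1, 10, e) for e in range(10)])
# → [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
```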
Awesome. I'm going to try this and report my results.
Actually, I have no idea why the denominator had `nEpoch - 1`: this will make the training overshoot. Commit f21cf01 fixes this. With this, the following Python code gives identical codebooks:
import somoclu
import numpy as np
data = np.float32(np.random.rand(50, 2))
n_rows, n_columns = 30, 50
som_a = somoclu.Somoclu(n_columns, n_rows, data=data, initialization="pca")
som_a.train(epochs=10, radius0=10, radiusN=1, scale0=0.1, scaleN=0.01)
som_b = somoclu.Somoclu(n_columns, n_rows, data=data, initialization="pca")
som_b.train(epochs=5, radius0=10, radiusN=6, scale0=0.1, scaleN=0.064)
som_b.train(epochs=5, radius0=5, radiusN=1, scale0=0.055, scaleN=0.01)
print(np.any(som_a.codebook != som_b.codebook))
At least most of the time. The single-precision floats allow for some uncertainty to creep in, but this is by design: SOM is a qualitative method.
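Given that single-precision drift, an exact element-wise comparison like the `np.any(... != ...)` above can flag differences that are numerically meaningless. A tolerance-based check is more forgiving; here is a sketch with synthetic float32 arrays standing in for the codebooks:

```python
import numpy as np

# Two float32 arrays that differ only by tiny noise, standing in for codebooks
a = np.random.rand(30, 50, 2).astype(np.float32)
b = a + np.float32(1e-6) * np.random.randn(30, 50, 2).astype(np.float32)

print(np.any(a != b))                # almost certainly True: exact comparison sees the noise
print(np.allclose(a, b, atol=1e-4))  # True: within tolerance, the arrays are equivalent
```

The `atol` threshold is a judgment call; for a qualitative method like SOM, anything well below the data scale is usually fine.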
What did I do wrong, and/or how can I improve this?
import somoclu
import numpy as np
import matplotlib.pyplot as plt

data = np.float32(np.random.rand(900, 3))
n_rows, n_columns = 30, 50
step_total = 20
checking_point = [3, 5, 7, 13, 17, 19, step_total]
codebook_trajectory = []

# Reference: one uninterrupted run over all epochs
som_a = somoclu.Somoclu(n_columns, n_rows, data=data, initialization="pca")
som_a.train(epochs=step_total, radius0=0, radiusN=1, scale0=0.1, scaleN=0.01)
ref = som_a.codebook

def linear_cooling_rate(epoch, start=[np.round(np.minimum(n_rows, n_columns)/2), 0.1],
                        end=[1, .01], nEpoch=step_total):
    diff = np.subtract(start, end) / nEpoch
    new = start - (epoch * diff)
    new[0] = np.round(new[0])
    return new

# Interrupted run: train in segments, resuming from the previous codebook
for index, interruption in enumerate(checking_point):
    if index > 0:
        last_checkpoint = checking_point[index - 1]
        little_steps = checking_point[index] - last_checkpoint
        som_b = somoclu.Somoclu(n_columns, n_rows, data=data, initialcodebook=som_b.codebook)
    else:
        little_steps = interruption
        last_checkpoint = 0
        som_b = somoclu.Somoclu(n_columns, n_rows, data=data, initialization="pca", initialcodebook=None)
    pars_linear = linear_cooling_rate(interruption)
    pars_last = linear_cooling_rate(last_checkpoint)
    som_b.train(epochs=little_steps, radius0=int(pars_last[0]), radiusN=int(pars_linear[0]),
                scale0=pars_last[1], scaleN=pars_linear[1])
    codebook_trajectory.append(som_b.codebook)

print(np.any(np.around(som_a.codebook.astype(float), decimals=3) != np.around(som_b.codebook.astype(float), decimals=3)))
Not much change:
for codebook in codebook_trajectory:
    plt.imshow(codebook[:, :, 2])
    plt.show()
How large is the difference?
def rsquare(vec):
    return np.sum(np.power(vec, 2))

diff = som_a.codebook - som_b.codebook
r2 = np.array([rsquare(v) for v in diff.reshape(n_rows * n_columns, 3)])
plt.imshow(r2.reshape(n_rows, n_columns))
plt.show()
One obvious thing is that in the next round, you should calculate the starting radius and learning rate at last step + 1 (see my example). I am uncertain whether the rounding of the radii will affect the result. Also, after the first iteration, it is unnecessary to recreate the `som_b` object again and again, although in principle this should have no bearing on the result.
I really got confused by the numbers: `radius0=10, radiusN=6` (difference is 4) vs. `radius0=5, radiusN=1` (difference is 4). These two differences are consistent, as each SOM is trained for 5 epochs. But `scale0=0.1, scaleN=0.064` (difference is 0.036) vs. `scale0=0.055, scaleN=0.01` (difference is 0.045) looks like a really weird "linear" to me. How did you compute 0.064 and 0.055? Even weirder, the SOMs match each other under these two numbers.
I feel `nEpoch-1` is correct, as it satisfies the boundary conditions: 10 to 1, and 0.1 to 0.01.
def linear_cooling_rate(epoch, start=[10, .1], end=[1, .01], nEpoch=step_total):
    diff = np.subtract(start, end) / (nEpoch - 1)
    new = start - (epoch * diff)
    new[0] = int(new[0])
    return new

for i in range(10):
    print(i, linear_cooling_rate(i))
0 [ 10. 0.1]
1 [ 9. 0.09]
2 [ 8. 0.08]
3 [ 7. 0.07]
4 [ 6. 0.06]
5 [ 5. 0.05]
6 [ 4. 0.04]
7 [ 3. 0.03]
8 [ 2. 0.02]
9 [ 1. 0.01]
But this will not give 0.064 and 0.055.
You are right, the denominator should be `nEpoch - 1`; I reverted that. Then I tried this:
som_b = somoclu.Somoclu(n_columns, n_rows, data=data, initialization="pca")
som_b.train(epochs=5, radius0=10, radiusN=6, scale0=0.1, scaleN=0.06)
som_b.train(epochs=5, radius0=5, radiusN=1, scale0=0.05, scaleN=0.01)
So this gives you the right step size (1 and 0.01 for the radius and the learning rate, respectively). I did a few runs, and `som_a` and `som_b` seem to be equivalent.
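For reference, the segment endpoints above can be derived mechanically. Here is a minimal sketch (the helper name `split_schedule` is mine, not somoclu's) that splits a full linear schedule with the `nEpoch - 1` denominator into per-segment `(epochs, start, end)` triples:

```python
def split_schedule(start, end, n_epochs, breakpoints):
    # Per-epoch step of the full schedule, matching linearCooling's
    # (start - end) / (nEpoch - 1) denominator
    step = (start - end) / (n_epochs - 1)
    value = lambda e: start - e * step
    edges = [0] + list(breakpoints) + [n_epochs]
    # Each segment of (b - a) epochs covers epochs a .. b - 1,
    # so it starts at value(a) and ends at value(b - 1)
    return [(b - a, value(a), value(b - 1)) for a, b in zip(edges[:-1], edges[1:])]

# Radius 10 -> 1 over 10 epochs, split at epoch 5:
print(split_schedule(10, 1, 10, [5]))
# → [(5, 10.0, 6.0), (5, 5.0, 1.0)]

# Learning rate 0.1 -> 0.01 over 10 epochs, split at epoch 5,
# gives (up to float rounding) segments 0.1 -> 0.06 and 0.05 -> 0.01:
print(split_schedule(0.1, 0.01, 10, [5]))
```

These triples match the `radius0`/`radiusN` and `scale0`/`scaleN` pairs in the two `train()` calls above.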
All right, thanks very much.