
about the treemap - randomforest-matlab - 31 comments - OPEN

tingliu commented on May 29, 2024
about the treemap


Comments (31)

Sorry, I use the package in MATLAB...

Original comment by [email protected] on 26 Sep 2012 at 2:36

hi zhang

treemap holds the left and right child-node information for the trees in the forest. The variable is used for navigating the tree.

In this code I used treemap to plot the tree:
http://code.google.com/p/randomforest-matlab/issues/attachmentText?id=18&aid=180001000&name=tutorial_plot_tree.m

Relevant information on the tree plotting:
http://code.google.com/p/randomforest-matlab/issues/detail?id=18&can=1

treemap - stores the tree structure; in the regression code it consists of the two variables ldau and rdau (left and right daughter node indices).
nodestatus - stores whether individual nodes are internal or leaf nodes.
nodeclass - the class of the leaf nodes.
bestvar - the variable that splits the node.
xbestsplit - the value of the variable that splits the node (greater than this value goes to the right side, else the left side).

The above variables are all NEEDED for prediction.
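For illustration only, here is a minimal MATLAB sketch (not code from the package) of how these arrays could be walked to classify one example with one tree. It assumes the per-tree arrays are stored one column per tree (nrnodes x ntree), matching the treemap snippet later in this thread, and uses the convention mentioned later that a child index of 0 means the node has no daughter, i.e. it is a leaf.

 % Illustration only: push one test example x (1 x D) down tree t of a
 % trained classification model.  The field layout is assumed, not guaranteed.
 t = 1;                                           % which tree to follow
 treemap = [model.treemap(:, t*2-1); model.treemap(:, t*2)];
 lDau = treemap(1:2:end);                         % left-daughter indices
 rDau = treemap(2:2:end);                         % right-daughter indices
 k = 1;                                           % start at the root node
 while lDau(k) ~= 0                               % a 0 child index marks a leaf
     v = model.bestvar(k, t);                     % variable used at this split
     if x(v) > model.xbestsplit(k, t)
         k = rDau(k);                             % above the split value: go right
     else
         k = lDau(k);                             % otherwise: go left
     end
 end
 predicted = model.nodeclass(k, t);               % class stored at the leaf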

Original comment by abhirana on 26 Sep 2012 at 5:59

Thank you for your reply.
When I further checked the values of the treemap, I found that model.treemap(:,tree_num*2) is always all zeros. What do these zeros stand for?

Original comment by [email protected] on 26 Sep 2012 at 7:10

Zeros mean that the nodes do not have a daughter. The values map the indices to child nodes, so for a node X, go to model.treemap(X,tree_num*2) to find the right child node.

By the way, I think your situation will happen only if the trees are one-sided, i.e. only growing left or right.

Original comment by abhirana on 26 Sep 2012 at 7:39

Thanks. But I tried several datasets, and the values model.treemap(X,tree_num*2) are all zeros. I am quite puzzled by this result.

Original comment by [email protected] on 26 Sep 2012 at 8:25

Attachments:

can you send me the model file if possible?

Original comment by abhirana on 26 Sep 2012 at 2:55

I know the issue.

treemap = [model.treemap(:,tree_num*2-1); model.treemap(:,tree_num*2);];  % both columns of this tree, stacked
lDau = treemap(1:2:end);  lDau = lDau(1:num_nodes);  % left-daughter indices (odd entries)
rDau = treemap(2:2:end);  rDau = rDau(1:num_nodes);  % right-daughter indices (even entries)

The two columns of treemap are concatenated to generate lDau and rDau; the lDau and rDau entries alternate in the stacked vector.

Most trees do not occupy nrnodes (the maximum size), which is why the second column is empty most of the time.

Original comment by abhirana on 26 Sep 2012 at 3:01

Oh, thanks for your reply.
I think I have a better understanding of the treemap now.
I have another question: when we use the model to predict the testing data, is there a way to find which node in each tree a test example lands in?
Thanks.

Original comment by [email protected] on 27 Sep 2012 at 1:39

i guess you are looking for node information

http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/tutorial_ClassRF.m#249

http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/classRF_predict.m#26


Original comment by abhirana on 27 Sep 2012 at 1:44

Hi, when I went through the details, I found that the node output is not an ntest by ntree matrix. In my example, with 50 test examples, the node matrix is 50 by 1.

Original comment by [email protected] on 27 Sep 2012 at 8:14

When I try the following code:
 model = classRF_train(X_trn,Y_trn);
 clear test_options
 test_options.predict_all = 1;
 test_options.proximity = 1;
 [Y_hat, votes, prediction_per_tree, proximity_ts] = classRF_predict(X_tst,model,test_options);
there is an error which says:
??? Error using ==> classRF_predict
Too many output arguments.

Original comment by [email protected] on 27 Sep 2012 at 8:26

hi zhang

Are you using the latest svn source? If not, sync to the svn source, or use this download link:
http://randomforest-matlab.googlecode.com/issues/attachment?aid=410008000&name=rf-rev55+-+20+Sep+2012.zip&token=DiBZ0BWzfgmEWFULfd4MDOaKvTo%3A1348784889462
(I uploaded it in a different issue: http://code.google.com/p/randomforest-matlab/issues/detail?id=41&can=1)

Original comment by abhirana on 27 Sep 2012 at 10:30

OK, thank you for your advice. I have updated the package now.
I still have a question: at each node, RF finds the best split among the randomly selected features, but in the package this is just like a black box.
Is there a way for me to modify the splitting rule in RF?

Original comment by [email protected] on 28 Sep 2012 at 3:08

Yeah, RF splits are based on the CART algorithm's splits.

Unfortunately, the splitting logic is so deeply embedded in the code that it might be hard to modify the splitting rule in RF.

Take a look at the findbestsplit function (reg_Rf.cpp for regression, rfsub.f for classification) and search for crit.

Original comment by abhirana on 28 Sep 2012 at 6:23

Hi abhirana,
In the file tutorial_Proximity_training_test, how do you calculate the proximity between the training samples and the test samples?
Do we need to find the node information for the testing and training samples, and if they are located in the same node, add 1 to the proximity between them, and then normalize the proximity matrix?
Is that the way to calculate the proximity between the test and training samples?

Original comment by [email protected] on 28 Sep 2012 at 7:36

Give me half a day; I need to fix a bug in the computeProximity routine.

If I remember correctly, proximity is calculated roughly as you described.

Original comment by abhirana on 28 Sep 2012 at 7:39

I just added the bug fix to the computeProximity routine and it is in the svn.

If you don't want to redownload the source, just change line 245 in RF_Class_C\src\rfutils.cpp (computeProximity)
from (inbag[i] > 0) ^ (inbag[j] > 0) to (inbag[i] > 0) || (inbag[j] > 0)

I guess you are correct: computeProximity calculates the proximity matrix.
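As a rough illustration of that idea only (this is not the package's computeProximity code): the proximity of two examples is the fraction of trees in which they land in the same terminal node. The nodes matrix below is a hypothetical input (nexample x ntree, terminal-node index per example and tree), for instance collected with the tree-walking sketch earlier in this thread.

 % Sketch of the idea only: prox(i,j) = fraction of trees in which
 % examples i and j share a terminal node.  'nodes' is a hypothetical
 % nexample x ntree matrix of terminal-node indices.
 [nexample, ntree] = size(nodes);
 prox = zeros(nexample);
 for t = 1:ntree
     same = bsxfun(@eq, nodes(:, t), nodes(:, t)');  % 1 where i and j share a node in tree t
     prox = prox + same;
 end
 prox = prox / ntree;                                % normalize by the number of trees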


Original comment by abhirana on 28 Sep 2012 at 7:50

OK, but I am not sure why
 prox:    n x n proximity matrix
I think prox should be length(Y_tst) by (length(Y_tst)+length(Y_trn)), where the first length(Y_tst) by length(Y_tst) block is the proximity between the test samples and the rest is the proximity between the training and test samples.

Original comment by [email protected] on 28 Sep 2012 at 8:35

Note that there are two cases described in the tutorial file tutorial_proximity_training_test.m.

In the first, training is done and the prediction step is not aware of the training examples. The proximity calculation REQUIRES training-example information, and when that is not available it will default to the proximity of only the test examples.

The second example is what you are looking for: pass the test examples and labels into classRF_train; the returned model will have the proximity information.
 model2 = classRF_train(X_trn,Y_trn, 2000, 0, extra_options,X_tst,Y_tst);
 model2.proximity_tst

Do post a snippet of code if you still have issues.

Original comment by abhirana on 28 Sep 2012 at 8:46

Hi abhirana,
Is there a way for us to find the margin for each tree?

Original comment by [email protected] on 28 Sep 2012 at 10:32

can you define margin?

Original comment by abhirana on 28 Sep 2012 at 5:55

The margin is defined by Breiman.
I guess we can get it from 'prediction_per_tree'.
Sorry, we can only get it for a collection of trees, not for each tree.

Original comment by [email protected] on 29 Sep 2012 at 2:27

Attachments:

When you use prediction_per_tree you will get an nexample x ntree matrix, so you get the individual tree prediction for each test example.
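If it helps, here is a minimal sketch (not part of the package) of estimating Breiman's margin for each test example from that matrix: the fraction of trees voting for the true class minus the largest fraction voting for any other class. It assumes prediction_per_tree holds predicted labels on the same scale as Y_tst, as in the snippets earlier in this thread.

 % Sketch only: per-example margin from the per-tree predictions.
 classes = unique(Y_tst);
 [nexample, ntree] = size(prediction_per_tree);
 marg = zeros(nexample, 1);
 for i = 1:nexample
     frac = zeros(numel(classes), 1);
     for c = 1:numel(classes)
         frac(c) = sum(prediction_per_tree(i,:) == classes(c)) / ntree;  % vote fraction for class c
     end
     true_idx = (classes == Y_tst(i));
     marg(i) = frac(true_idx) - max(frac(~true_idx));   % Breiman's margin for example i
 end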

Original comment by abhirana on 29 Sep 2012 at 2:29

Hi abhirana,
If I set extra_options.replace = 0, there are still many zeros in model.inbag, which means some samples are still out of bag. Why does this happen?

Code:
load data/twonorm

% modify so that training data is NxD and labels are Nx1, where N = # of
% examples, D = # of features
X = inputs';
Y = outputs;

[N D] = size(X);
% randomly split into 250 examples for training and 50 for testing
randvector = randperm(N);

X_trn = X(randvector(1:250),:);
Y_trn = Y(randvector(1:250));
X_tst = X(randvector(251:end),:);
Y_tst = Y(randvector(251:end));

extra_options.replace = 0;
extra_options.keep_inbag = 1; % (Default = 0)
model = classRF_train(X_trn,Y_trn, 100, 4, extra_options);

Original comment by [email protected] on 29 Sep 2012 at 7:09

replace will only change the sampling scheme from with replacement (the default, or 1) to without replacement (0). It doesn't have any effect on the number of out-of-bag examples, because that is controlled by the sampsize variable.

If you want to change how many examples are sampled per tree, change the sampsize variable.
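For example, a minimal sketch (assuming extra_options.sampsize is honored as described in the package's tutorial): drawing all N training examples per tree without replacement leaves nothing out of bag.

 % Sketch, assuming extra_options.sampsize works as in the tutorial:
 extra_options.replace = 0;                  % sample without replacement
 extra_options.sampsize = size(X_trn, 1);    % draw all N training examples per tree
 extra_options.keep_inbag = 1;               % keep the inbag matrix to check
 model = classRF_train(X_trn, Y_trn, 100, 4, extra_options);

Note that with nothing left out of bag, the out-of-bag error estimate is no longer meaningful.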

Original comment by abhirana on 29 Sep 2012 at 11:37

OK, but what is the default value for sampsize in your code? It does not seem to be mentioned in the tutorial file.

Original comment by [email protected] on 1 Oct 2012 at 7:05

The random forests default: sample N times with replacement from the N training examples (which is the same as what is done for bagging).

Original comment by abhirana on 1 Oct 2012 at 5:25

So, in this case, if replace = 0, why are there so many 0s in inbag?
I guess a 0 in inbag means that the sample is out of bag, but if we have to sample N times without replacement, then every sample should be in the bag.

Original comment by [email protected] on 2 Oct 2012 at 2:09

Note that the sampsize default is 0.632*N when sampling without replacement. That in-bag proportion is about the same as when sampling with replacement, so you end up with roughly the same number of out-of-bag examples either way. In your example with 250 training examples, about 0.632*250 ≈ 158 are drawn per tree, leaving roughly 92 zeros per column of inbag.

Original comment by abhirana on 2 Oct 2012 at 3:00

OK. So with replacement we sample N from N, and if replace = 0 we sample 0.632*N from N without replacement.
I have another question. When we select the mtry features from all the features, could we assign a weight vector and select the features according to their weights? If so, where should I change the code?
I can see there is a 'mexRF_train' function in your code; however, I could not find the source for this function in the package.
Thanks.

Original comment by [email protected] on 2 Oct 2012 at 3:26

You can always change how many examples are sampled by tweaking sampsize.

mexRF_train is compiled from a bunch of files in the src folder; you can find the list of files being compiled in compile_windows.m. You would have to modify the C/C++ and maybe the Fortran code to implement that.

Original comment by abhirana on 2 Oct 2012 at 3:47
