Comments (31)
sorry, I use the package in matlab...
Original comment by [email protected]
on 26 Sep 2012 at 2:36
from randomforest-matlab.
hi zhang
treemap has the left and right node information for the trees in the forest.
the variable is used for navigating the tree
in this code i used treemap to plot the tree
code:
http://code.google.com/p/randomforest-matlab/issues/attachmentText?id=18&aid=180001000&name=tutorial_plot_tree.m
relevant information on the treeplotting:
http://code.google.com/p/randomforest-matlab/issues/detail?id=18&can=1
treemap = stores the tree structure (the left and right daughter of each node). in the regression code the same information is held in two variables, ldau and rdau.
nodestatus = stores whether individual nodes are internal or leaf nodes
nodeclass = the class of the leaf nodes
bestvar = variable that splits the node
xbestsplit = value of the variable that splits the node (> goes to the right side, else the left side)
the above variables are all NEEDED for prediction.
Original comment by abhirana
on 26 Sep 2012 at 5:59
from randomforest-matlab.
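A minimal sketch of how the variables listed above could be used to walk one example down a single tree. The arrays are toy values made up for illustration, not output of classRF_train, and the leaf test (no daughter means index 0) is an assumption based on the description in this thread rather than the package's actual prediction code.

```matlab
% Toy tree with 3 nodes: node 1 is the root, nodes 2 and 3 are leaves.
% All values are invented for illustration; they are not from a trained model.
lDau       = [2 0 0];      % index of the left daughter (0 = no daughter)
rDau       = [3 0 0];      % index of the right daughter (0 = no daughter)
bestvar    = [1 0 0];      % feature used to split each internal node
xbestsplit = [0.5 0 0];    % split threshold of each internal node
nodeclass  = [0 1 2];      % class stored at the leaf nodes

x = [0.7 3.0];             % one example with two features

node = 1;                  % start at the root
while lDau(node) ~= 0      % a node without daughters is treated as a leaf
    if x(bestvar(node)) > xbestsplit(node)
        node = rDau(node); % greater than the threshold: go right
    else
        node = lDau(node); % otherwise: go left
    end
end
predicted_class = nodeclass(node);
```

Here the example has x(1) = 0.7, which is above the root threshold 0.5, so it moves to the right daughter and takes that leaf's class.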
Thank you for your reply.
When I checked the values of treemap further, I found that
model.treemap(:,tree_num*2) is always all zeros. So what do these zeros stand
for?
Original comment by [email protected]
on 26 Sep 2012 at 7:10
from randomforest-matlab.
zeros mean that the nodes do not have a daughter. the values map the indices to
child nodes. so X will mean go to the index model.treemap(X,tree_num*2) to find
the right child node
btw, i think your condition will happen only if the trees are one-sided, like
only growing left or right.
Original comment by abhirana
on 26 Sep 2012 at 7:39
from randomforest-matlab.
Thanks. But I tried several datasets, and the values in model.treemap(X,tree_num*2)
are all zeros. I am quite puzzled by this result.
Original comment by [email protected]
on 26 Sep 2012 at 8:25
Attachments:
- }0261BO3B D) U4LWWP~N8D.jpg
from randomforest-matlab.
can you send me the model file if possible?
Original comment by abhirana
on 26 Sep 2012 at 2:55
from randomforest-matlab.
i know the issue
treemap = [model.treemap(:,tree_num*2-1); model.treemap(:,tree_num*2);];
lDau = treemap(1:2:end); lDau = lDau(1:num_nodes);
rDau = treemap(2:2:end); rDau = rDau(1:num_nodes);
the two columns of treemap are concatenated, and lDau and rDau are read from
alternating entries of the result.
most trees do not use all nrnodes (the max size) entries, and that is the reason
why the second column is empty most of the time
Original comment by abhirana
on 26 Sep 2012 at 3:01
from randomforest-matlab.
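A small sketch of why the second column can come out all zeros. It assumes that the left/right daughters of a tree are stored interleaved and zero-padded up to 2*nrnodes entries, and that the first half lands in column tree_num*2-1 and the second half in column tree_num*2; that layout is inferred from the explanation above, and the numbers are toy values.

```matlab
% Toy setup: room for nrnodes = 6 nodes, but the tree only uses 3 of them.
nrnodes   = 6;
num_nodes = 3;
tree_num  = 1;

% Interleaved storage [l1 r1 l2 r2 l3 r3 ...] padded with zeros.
% Node 1 has daughters 2 and 3; nodes 2 and 3 are leaves (daughters = 0).
model.treemap = zeros(nrnodes, 2);
model.treemap(:, 1) = [2 3 0 0 0 0]';   % first half of the interleaved data
% The second column stays all zero because the tree is far below max size.

% Same extraction as in the comment above.
treemap = [model.treemap(:, tree_num*2-1); model.treemap(:, tree_num*2)];
lDau = treemap(1:2:end); lDau = lDau(1:num_nodes);
rDau = treemap(2:2:end); rDau = rDau(1:num_nodes);
```

With these values lDau is [2 0 0] and rDau is [3 0 0], even though model.treemap(:,2) contains only zeros.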
Oh, thanks for your reply.
I guess I have a better understanding of the treemap now.
I have another question: when we use the model to predict the testing data,
is there a way to find which node in each tree each test example ends up
in?
Thanks.
Original comment by [email protected]
on 27 Sep 2012 at 1:39
from randomforest-matlab.
i guess you are looking for node information
http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/tutorial_ClassRF.m#249
http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/classRF_predict.m#26
Original comment by abhirana
on 27 Sep 2012 at 1:44
from randomforest-matlab.
Hi, when I went through the details, I found that the node output is not an ntest by
ntree matrix. In my example, if the number of test examples is 50, then the node
matrix is 50 by 1.
Original comment by [email protected]
on 27 Sep 2012 at 8:14
from randomforest-matlab.
When I try the following code
model = classRF_train(X_trn,Y_trn);
clear test_options
test_options.predict_all = 1;
test_options.proximity = 1;
[Y_hat, votes, prediction_per_tree, proximity_ts] = classRF_predict(X_tst,model,test_options);
Then there is an error which says
??? Error using ==> classRF_predict
Too many output arguments.
Original comment by [email protected]
on 27 Sep 2012 at 8:26
from randomforest-matlab.
hi zhang
are you using the latest svn source? if not, sync to the svn source, or use this
download link:
http://randomforest-matlab.googlecode.com/issues/attachment?aid=410008000&name=rf-rev55+-+20+Sep+2012.zip&token=DiBZ0BWzfgmEWFULfd4MDOaKvTo%3A1348784889462
(i uploaded it in a different issue:
http://code.google.com/p/randomforest-matlab/issues/detail?id=41&can=1)
Original comment by abhirana
on 27 Sep 2012 at 10:30
from randomforest-matlab.
Ok. Thank you for your advice. I have updated the package now.
I still have a question. In each node, RF finds the best split among the randomly
selected features. But in the package this is just like a black box.
So is there some way for me to modify the splitting rule in the RF?
Original comment by [email protected]
on 28 Sep 2012 at 3:08
from randomforest-matlab.
yeh, RF splits are based on the CART algorithm splits.
nah, the splitting code is so deeply embedded that it might be hard to modify the
splitting rule in RF.
take a look into the findbestsplit function (reg_Rf.cpp for regression, rfsub.f for
classification) and search for crit
Original comment by abhirana
on 28 Sep 2012 at 6:23
from randomforest-matlab.
Hi, abhirana,
In the file tutorial_Proximity_training_test, how do you calculate the
proximity between the training samples and the test samples?
Do we need to find the node information for the testing and training samples,
and if two samples are located in the same node, then the proximity between them is
increased by 1?
Then we normalize the proximity matrix.
Is that the way to calculate the proximity between the test and train samples?
Original comment by [email protected]
on 28 Sep 2012 at 7:36
from randomforest-matlab.
give me half a day. i need to fix a bug in the computeproximity routine.
if i remember correctly, proximity is calculated roughly as you described
Original comment by abhirana
on 28 Sep 2012 at 7:39
from randomforest-matlab.
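The procedure described in the question can be sketched roughly as follows. This is only an illustration under stated assumptions: nodes is a hypothetical matrix holding the terminal node reached by each example in each tree (toy values here), and the in-bag handling that computeProximity performs is deliberately left out.

```matlab
% nodes(i,t) = terminal node reached by example i in tree t (toy values).
nodes = [1 4;
         1 5;
         2 4];
[n, ntree] = size(nodes);

prox = zeros(n, n);
for t = 1:ntree
    for i = 1:n
        for j = 1:n
            % Two examples that land in the same leaf of tree t get +1.
            if nodes(i, t) == nodes(j, t)
                prox(i, j) = prox(i, j) + 1;
            end
        end
    end
end

% Normalize by the number of trees so that every entry lies in [0, 1].
prox = prox / ntree;
```

Examples 1 and 2 share a leaf in one of the two trees, so their proximity is 0.5, while examples 2 and 3 never share a leaf and get 0.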
i just added the bug fix in the computeproximity routine and it's in the svn.
if you dont want to redownload the source, just change
line 245 in RF_Class_C\src\rfutils.cpp (computeProximity)
from (inbag[i] > 0) ^ (inbag[j] > 0) to (inbag[i] > 0) || (inbag[j] > 0)
i guess you are correct. computeProximity calculates the proximity matrix
Original comment by abhirana
on 28 Sep 2012 at 7:50
from randomforest-matlab.
Ok. But I am not sure why the documentation says
prox: n x n proximity matrix
I guess prox should be length(Y_tst) by (length(Y_tst)+length(Y_trn)),
where the first length(Y_tst) by length(Y_tst) block is the proximity
between the test samples and the rest is the proximity between the train and
test samples.
Original comment by [email protected]
on 28 Sep 2012 at 8:35
from randomforest-matlab.
note that there are two cases described in the tutorial file
tutorial_proximity_training_test.m
one where training is done and the testing is not aware of the training
examples. the proximity calculation REQUIRES training example information and
when that is not available will default to proximity of only the test examples
the second example is what you are looking for
pass test examples and labels into classRF_train.. the returned model will have
the proximity information
model2 = classRF_train(X_trn,Y_trn, 2000, 0, extra_options,X_tst,Y_tst);
model2.proximity_tst
do post a snippet of code if you still have issues.
Original comment by abhirana
on 28 Sep 2012 at 8:46
from randomforest-matlab.
Hi,abhirana,
Is there some method for us to find the margin for each tree?
Original comment by [email protected]
on 28 Sep 2012 at 10:32
from randomforest-matlab.
can you define margin?
Original comment by abhirana
on 28 Sep 2012 at 5:55
from randomforest-matlab.
the margin is defined by Breiman.
I guess we can get it from 'prediction_per_tree'.
sorry, we can only get it for a collection of trees, not for each tree.
Original comment by [email protected]
on 29 Sep 2012 at 2:27
Attachments:
- P}9D0{E32TNY86PL}@GE)P1.jpg
from randomforest-matlab.
when you use prediction_per_tree you will get a nexample x ntree matrix, so you
will get the individual tree prediction for each test example
Original comment by abhirana
on 29 Sep 2012 at 2:29
from randomforest-matlab.
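A rough sketch of computing Breiman's margin from such a per-tree prediction matrix: for each example, the fraction of trees voting for the true class minus the largest fraction voting for any other class. The matrix and labels below are toy values, not output of classRF_predict, and the code is not part of the package.

```matlab
% Toy per-tree predictions: 3 test examples (rows) x 3 trees (columns).
prediction_per_tree = [1 1 2;
                       2 2 2;
                       1 2 2];
Y_tst = [1; 2; 1];                        % true labels of the test examples

classes = unique([Y_tst; prediction_per_tree(:)]);
n = size(prediction_per_tree, 1);
margin = zeros(n, 1);
for i = 1:n
    % Fraction of trees voting for each class on example i.
    votes = zeros(numel(classes), 1);
    for c = 1:numel(classes)
        votes(c) = mean(prediction_per_tree(i, :) == classes(c));
    end
    true_idx = find(classes == Y_tst(i));
    p_true = votes(true_idx);
    votes(true_idx) = -Inf;               % exclude the true class from the max
    margin(i) = p_true - max(votes);      % positive means a correct majority vote
end
```

For the toy data the margins are 1/3, 1 and -1/3; the negative value marks the example that the forest misclassifies.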
Hi,abhirana.
If I set extra_options.replace=0, there are still many zeros in model.inbag,
which means some samples are still out of bag. Why does this happen?
code:
load data/twonorm
%modify so that training data is NxD and labels are Nx1, where N=#of
%examples, D=# of features
X = inputs';
Y = outputs;
[N D] =size(X);
%randomly split into 250 examples for training and 50 for testing
randvector = randperm(N);
X_trn = X(randvector(1:250),:);
Y_trn = Y(randvector(1:250));
X_tst = X(randvector(251:end),:);
Y_tst = Y(randvector(251:end));
extra_options.replace = 0 ;
extra_options.keep_inbag = 1; %(Default = 0)
model = classRF_train(X_trn,Y_trn, 100, 4, extra_options);
Original comment by [email protected]
on 29 Sep 2012 at 7:09
from randomforest-matlab.
replace will only change the sampling scheme from with
replacement (default or 1) to without replacement (0). it doesn't have any effect
on the number of out-of-bag examples because that's controlled by the sampsize
variable
if you want to change how many examples are sampled per tree, change the
sampsize variable
Original comment by abhirana
on 29 Sep 2012 at 11:37
from randomforest-matlab.
Ok. But what is the default value for sampsize in your code? It does not
seem to be mentioned in the tutorial file.
Original comment by [email protected]
on 1 Oct 2012 at 7:05
from randomforest-matlab.
randomforests default: sampling N times with replacement from N training
examples (which are the same as what is done for bagging).
Original comment by abhirana
on 1 Oct 2012 at 5:25
from randomforest-matlab.
So, in this case, if replace=0, why are there so many 0s in inbag?
I guess a 0 in inbag means the sample is out of bag. But we have to
sample N times without replacement, so every sample should be in the bag.
Original comment by [email protected]
on 2 Oct 2012 at 2:09
from randomforest-matlab.
Note the sampsize default is .632*N when sampling without replacement. That
proportion is around the same as the fraction of distinct examples you get when
sampling with replacement. So you end up with about the same number of out-of-bag
examples both ways
Original comment by abhirana
on 2 Oct 2012 at 3:00
from randomforest-matlab.
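A quick simulation of the two schemes, showing that both leave roughly 37% of the examples out of bag. This is only a sketch of the sampling idea and does not call the package; the use of round for the 0.632*N sample size is an assumption, since the exact rounding used in the source is not stated in this thread.

```matlab
N = 250;                                   % number of training examples

% With replacement: draw N indices from 1..N and count the draws per example.
idx_with   = randi(N, N, 1);
inbag_with = accumarray(idx_with, 1, [N 1]);

% Without replacement: draw about 0.632*N distinct indices.
sampsize      = round(0.632 * N);          % rounding rule is an assumption
perm          = randperm(N);
inbag_without = zeros(N, 1);
inbag_without(perm(1:sampsize)) = 1;

% Fraction of examples that were never drawn (out of bag).
oob_with    = mean(inbag_with == 0);       % random, close to exp(-1) = 0.368
oob_without = mean(inbag_without == 0);    % exactly (N - sampsize) / N
```

With replacement an example can be drawn several times, so inbag_with holds counts; without replacement each entry is 0 or 1, but about 36.8% of the entries are still 0 because only sampsize examples are drawn.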
Ok. So when with replacement, we sample N from N. If replace=0, we sample
0.632*N from N without replacement.
I have another question. When we select mtry feature from all the features,
could we assign a weight vector and select the feature according to their
weight? If so, where could I change the code?
I can see there is a 'mexRF_train' function in your code. However, I could not
find the code for this function in the package.
Thanks.
Original comment by [email protected]
on 2 Oct 2012 at 3:26
from randomforest-matlab.
you can always change how many examples are being sampled by tweaking sampsize
mexRF_train is compiled from a bunch of files in the src folder. you can find
the list of files being compiled in compile_windows.m. you will have to modify
the c/c++ and maybe fortran code to implement that.
Original comment by abhirana
on 2 Oct 2012 at 3:47
from randomforest-matlab.