
Comments (16)

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
I want to use 10-fold cross-validation with your RandomForest, but I am not
sure whether your program supports it.

Original comment by [email protected] on 28 Feb 2010 at 12:02

from randomforest-matlab.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
Comment 1:
This package works like any other MATLAB function or variable.

Say model_RF = classRF_train()

Then you can save model_RF to a file with save in MATLAB,

and load it later to use it again with classRF_predict()
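
As a minimal sketch of that save/load round trip (an illustration, not part of the original reply): the struct below is only a placeholder standing in for the output of classRF_train, and the file name model_file is a hypothetical choice.

```matlab
% Placeholder standing in for model_RF = classRF_train(X_trn, Y_trn);
% a real model is also an ordinary MATLAB variable.
model_RF = struct('note', 'placeholder for a trained forest');

model_file = fullfile(tempdir, 'model_RF.mat');   % hypothetical file name
save(model_file, 'model_RF');    % write the variable to disk

clear model_RF;                  % simulate a fresh session
loaded = load(model_file);       % struct with one field per saved variable
model_RF = loaded.model_RF;      % ready to pass to classRF_predict(X_tst, model_RF)
```

Loading into a struct (loaded) rather than straight into the workspace avoids silently overwriting a variable that happens to share the name.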



Comment 2:
RF computes an out-of-bag (OOB) error estimate, which acts as a built-in check on overfitting:
http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#ooberr

I think that, at most, you will be able to report an error rate.

You should be able to do something like CV quite simply in MATLAB:


% assume X is your data and Y the labels
N = size(X,1);           % number of examples
num_random_exp = 100;    % number of random train/test experiments to run
type_of_CV = 10;         % 10 holds out about 1/10 of the data, as in 10-fold CV
n_train = floor(N - N/type_of_CV);   % size of each training set

results_array = zeros(num_random_exp,1);   % error rate of each experiment

for i=1:num_random_exp
   indices = randperm(N); % shuffle the indices
   train_indices = indices(1:n_train);
   test_indices = indices(n_train+1:end);

   % training set
   X_trn = X(train_indices,:);
   Y_trn = Y(train_indices);

   % test set
   X_tst = X(test_indices,:);
   Y_tst = Y(test_indices);

   model_RF = classRF_train(X_trn,Y_trn);
   Y_hat = classRF_predict(X_tst,model_RF);

   % fraction of misclassified test examples (a rate, not a count)
   results_array(i) = length(find(Y_hat~=Y_tst))/length(Y_tst);
end

error_rate = mean(results_array)

Original comment by abhirana on 28 Feb 2010 at 12:22

from randomforest-matlab.
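
The loop in the reply above draws a fresh random 90/10 split on every iteration, so test sets can overlap between iterations. For 10-fold cross-validation in the strict sense, where each example is tested exactly once, a sketch along the following lines might work. It is an illustration under assumptions, not the package author's method: train_fn and predict_fn are hypothetical wrappers, the majority-label stand-ins exist only so the sketch runs without the package, and the toy data is made up.

```matlab
% Sketch of k-fold cross-validation with disjoint folds.
% With the package on the path, the two wrappers would be:
%   train_fn   = @(Xt, Yt) classRF_train(Xt, Yt);
%   predict_fn = @(Xs, model) classRF_predict(Xs, model);
% The stand-ins below (always predict the majority training label) only
% keep this sketch runnable on its own.
train_fn   = @(Xt, Yt) mode(Yt);
predict_fn = @(Xs, model) repmat(model, size(Xs, 1), 1);

% Toy data, purely illustrative: 50 examples, 3 features, labels 1 or 2.
X = rand(50, 3);
Y = [ones(30, 1); 2 * ones(20, 1)];

k = 10;                              % number of folds
N = size(X, 1);
perm = randperm(N);                  % shuffle the examples once
fold_id = mod(0:N-1, k) + 1;         % fold label for each shuffled position
fold_err = zeros(k, 1);

for f = 1:k
    test_idx  = perm(fold_id == f);  % this fold is held out
    train_idx = perm(fold_id ~= f);  % the other k-1 folds are used for training

    model = train_fn(X(train_idx, :), Y(train_idx));
    Y_hat = predict_fn(X(test_idx, :), model);

    Y_tst = Y(test_idx);
    fold_err(f) = mean(Y_hat(:) ~= Y_tst(:));   % error rate in [0, 1]
end

cv_error = mean(fold_err);           % average error over the k folds
fprintf('%d-fold CV error rate: %f\n', k, cv_error);
```

Shuffling once and then assigning fold labels keeps the folds disjoint and nearly equal in size even when N is not a multiple of k.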

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
About Comment 2: I read about the OOB error, but in my test, the same data run in Weka 3.7
with 10-fold CV training gave an error_rate of only 0.2, while with your program I got an
error_rate of 0.51 without 10-fold CV.

Original comment by [email protected] on 28 Feb 2010 at 1:24

from randomforest-matlab.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
Are you sure that the number of trees and mtry are the same? Sometimes 1000 trees
is the minimum number of trees that you need.

The results are also affected by whether Weka is treating some of the data as
categorical; in its current form, this package requires you to declare
categorical data explicitly in the program.

Original comment by abhirana on 28 Feb 2010 at 1:26

from randomforest-matlab.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
Abhirana, thanks for your help. I am testing it, and I will report back to you.

Original comment by [email protected] on 28 Feb 2010 at 1:36

from randomforest-matlab.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
I used 10-fold CV with trees = 100, mtry = 6 and got error_rate = 0.238068, a little higher than Weka.
Under Weka: trees = 10, mtry = 6, error_rate = 0.1932.

Original comment by [email protected] on 28 Feb 2010 at 8:53

from randomforest-matlab.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
Do not draw conclusions from fewer than 1000 trees (or at least 500 if your
data is large).

Random forests depend heavily on random numbers, and you often need at least
1000 trees before the forest stabilizes.


Original comment by abhirana on 28 Feb 2010 at 9:00

from randomforest-matlab.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
My test data has more than 20000 examples, so if I use trees = 1000 it will run very slowly.
I will try it, but I am afraid it may produce an "out of memory" error in MATLAB.

Original comment by [email protected] on 28 Feb 2010 at 9:09

from randomforest-matlab.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
Oh, it might run slowly, but I have routinely used large files, around
30000+ data points and 50 dimensions, on a reasonable machine.

Try a short experiment: gradually increase the number of trees and plot the OOB
rate.

If your OOB rate still seems to go down as you increase the number of trees, it
means your forest has not yet stabilized. After a while the OOB rate will level
off, and at least that many trees are required to get a decent, stable answer.

See example 16 in the tutorial. The dataset is simple, and around 100 trees
suffice to bring the OOB rate to a steady value.

% example 16: getting the OOB rate. The model will have errtr, whose first
% column is the OOB rate; the second column is for the 1st class, and
% so on
    model = classRF_train(X_trn,Y_trn);
    Y_hat = classRF_predict(X_tst,model);
    fprintf('\nexample 16: error rate %f\n', length(find(Y_hat~=Y_tst))/length(Y_tst));

    figure('Name','OOB error rate');
    plot(model.errtr(:,1)); title('OOB error rate'); xlabel('iteration (# trees)'); ylabel('OOB error rate');

Original comment by abhirana on 28 Feb 2010 at 9:23

from randomforest-matlab.
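
The advice above is to keep adding trees until the OOB rate stops falling. One way to turn "stops falling" into a check, offered only as a rough sketch: compare the mean OOB rate over the last block of trees with the block before it. The names window and tol and their values are hypothetical choices, and the synthetic curve stands in for model.errtr(:,1) so that the sketch runs without the package.

```matlab
% Sketch: decide whether an OOB error curve has flattened out.
% In practice oob would be model.errtr(:,1) from classRF_train; the
% synthetic decaying curve below is illustrative, not a real result.
ntree = 500;
oob = 0.2 + 0.3 * exp(-(1:ntree)' / 40);

window = 50;     % hypothetical: number of trees per comparison block
tol = 0.005;     % hypothetical: largest change still counted as "stable"

last_mean = mean(oob(end-window+1:end));            % last block of trees
prev_mean = mean(oob(end-2*window+1:end-window));   % block just before it
is_stable = abs(prev_mean - last_mean) < tol;

if is_stable
    fprintf('OOB rate looks stable at about %f\n', last_mean);
else
    fprintf('OOB rate is still changing; consider more trees\n');
end
```

Averaging over blocks rather than comparing single entries smooths out the tree-to-tree noise that a real OOB curve shows.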

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
Hi abhirana. I trained on my dataset with trees = 500 and got an error_rate of 0.003, but
when I used this model to test another dataset, I got an error_rate of 0.240596. I have
uploaded an OOB rate plot for you.

When I use trees = 1000, an "Out of memory" error occurs, and I can't solve it.
Do you have any good ideas? My training dataset is 25000x42.

Original comment by [email protected] on 28 Feb 2010 at 3:59

from randomforest-matlab.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
Well, if you are getting worse error rates on a different dataset, it might
just mean that the two datasets are a bit different.

Hmm, 500 trees should be good enough, but if you want a larger number of trees,
you should look into getting a 64-bit OS with 64-bit MATLAB. I think you are
using a 32-bit OS, which limits a process to around 2GB of memory.
http://support.microsoft.com/kb/555223


Original comment by abhirana on 28 Feb 2010 at 8:21

from randomforest-matlab.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
That's right, I am using a 32-bit OS. I want to use PCA or ICA to reduce the
dimension of the datasets; maybe that would give a good result.

Original comment by [email protected] on 1 Mar 2010 at 12:55

from randomforest-matlab.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024
Well, I do not think that you have too many dimensions (and so PCA and ICA
might just remove important dimensions). 40 is a reasonable number, and you have
lots and lots of examples. Dimensionality reduction usually helps if you have
many dimensions but comparatively few examples.

Do try out an SVM (such as the http://www.csie.ntu.edu.tw/~cjlin/libsvm/ or
http://svmlight.joachims.org/ toolbox) and see if it helps in your case. That
will let you build a baseline accuracy to compare against the random forest (try
both linear and non-linear SVMs and note the results). Sometimes the data just
doesn't have enough information, either in the variety of examples or in the
features, to make the predictions better, or the test and training sets are too
different. If the SVMs give you approximately the same results, it means that
you might have to look into getting more types of data.

Original comment by abhirana on 1 Mar 2010 at 1:04

from randomforest-matlab.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 8, 2024

Original comment by abhirana on 17 Mar 2010 at 6:22

  • Changed state: Done

from randomforest-matlab.
