randomforest-matlab's Introduction

randomforest-matlab

This is a fork of the Google Code project randomforest-matlab (https://code.google.com/archive/p/randomforest-matlab) by Abhishek Jaiantilal, released under the GNU GPL v2. Please cite the original project if you use it. The original README is reproduced below:

Random Forest (Regression, Classification and Clustering) implementation for MATLAB (and Standalone)

This is a Matlab (and standalone application) port of the excellent machine learning algorithm Random Forests by Leo Breiman et al., ported from the R source by Andy Liaw et al. (http://cran.r-project.org/web/packages/randomForest/index.html; Fortran original by Leo Breiman and Adele Cutler, R port by Andy Liaw and Matthew Wiener). The current code version is based on 4.5-29 of the randomForest package source.

I am especially grateful for all the help I got from Andy Liaw. This project would not have been possible without the previous code by Andy Liaw, Matthew Wiener, Leo Breiman, and Adele Cutler.

The wiki has short articles on using rfImpute to impute missing values and on basic installation procedures.

Early 2012 bug: if you are training a lot of trees and you tend to get an error, try the SVN source; it replaces allocation via calloc with mxCalloc. http://code.google.com/p/randomforest-matlab/issues/detail?id=21

1-March-2010 note: the inputs to the package must be doubles, so make sure you are sending in doubles (the package simply assumes that you are).
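
For example, if your features or labels are stored as single or integer types, cast them before calling the trainer (the variable names here are just illustrative):

X = double(X);   % feature matrix, N x D
Y = double(Y);   % labels / targets, N x 1
model = classRF_train(X, Y, 500);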

6-Feb-2010: Added a separate precompiled Windows binary (MEX files) that allows running on Windows without compiling anything. It will probably require installing a Microsoft redistributable package. The installation page has been edited to reflect this: http://code.google.com/p/randomforest-matlab/wiki/Introduction

Known bug: there is currently an existing bug that was fixed in R version 4.5-33 (http://cran.r-project.org/web/packages/randomForest/NEWS). If both importance=TRUE and proximity=TRUE, the proximity matrix returned is incorrect. Those computed with importance=FALSE, or with proximity=TRUE only, are correct. This will be fixed sometime.

Version History:

SVN source (for now) contains unsupervised RF. Explanation in tutorial_ClusterRF.m

v0.02 (May-16-09) [Major Update] Supports classification- and regression-based RFs and allows changing many parameters, including mtry, ntree, nodesize, proximity measure, importance, etc. Roughly on par, with regard to functionality, with the R version. Stratified sampling from the R version is not yet supported. Categorical variables are also not tested (I will do that once I get that type of data from somewhere). Added tutorial files showing how to get the different measures and examine the RF. Decreased the download file size, as the earlier version had a .svn folder present. (A short usage sketch of these options appears after the version history.)

v0.01-preview (Deprecated) Supports classification- and regression-based RFs and allows changing mtry (the number of variables to split on) and ntree (the number of trees). Many secondary features are not yet supported. Based on 4.5-29 of the randomForest source.
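
As a rough usage sketch of the options mentioned above (the field names follow the tutorials shipped with the package and may differ slightly between versions; X_trn and Y_trn are illustrative variables):

extra_options.importance = 1;   % estimate variable importance
extra_options.proximity  = 1;   % compute the proximity matrix
extra_options.nodesize   = 5;   % minimum size of terminal nodes
mtry  = floor(sqrt(size(X_trn, 2)));                     % common classification default
model = classRF_train(X_trn, Y_trn, 500, mtry, extra_options);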

Source and README are inside the code package. Works on 32- and 64-bit Windows and Linux.

-Abhishek Jaiantilal

Questions or comments? Direct them to abhishek.jaiantilal (at) colorado.edu or post in the issues section.

randomforest-matlab's People

Contributors

ajaiantilal, tingliu

randomforest-matlab's Issues

Stratified Sampling


What version of the product are you using? On what operating system?

Windows-Precompiled-RF_MexStandalone-v0.02-\RF_MexStandalone-v0.02-precompiled

If I want to use Stratified Sampling for splitting data into testing and 
training sets in this randomforest package, can you please suggest anything?

Original issue reported on code.google.com by [email protected] on 1 Nov 2012 at 7:07

Nodesize Selection

Hi 

Could you please let me know how nodesize affects the classification
(regression) result when using RF?

It doesn't mean that the higher the nodesize, the more accurate the result,
correct? How can we determine nodesize?

For the number of trees and the choice of mtry you mentioned some hints, so I
want to know how I can choose a reasonable mtry for classification (regression).

Thanks,
Saleh

Original issue reported on code.google.com by [email protected] on 15 May 2012 at 11:02

Class Probability

Hi

I have a question regarding the output of RF classification. You use majority
vote to assign a class label to a query. I was wondering whether it is possible
to get the probability that the query belongs to each class, rather than just
the class label?

Let me give you an example to better convey my meaning. Assume that we have 
three classes A, B, and C I'd like to see the probability that a query x 
belongs to class A, the probability that x belongs to B and the probability 
that x belongs to C. If each tree produces the class probabilities for query x, 
we can average the class probabilities to have the total class probabilities.

If the class label that your code assigns to x is A, it is reasonable to expect
a higher probability for x belonging to A than to the other two classes.

Thanks,
Saleh

Original issue reported on code.google.com by [email protected] on 11 May 2012 at 2:34
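
A note on this issue: the R version derives predict(..., type="prob") by averaging the per-tree votes. A minimal sketch of doing the same with this port, under the assumption that classRF_predict exposes per-class vote counts as its second output and that the vote columns follow sort(unique(Y_trn)):

model = classRF_train(X_trn, Y_trn, 500);
[Y_hat, votes] = classRF_predict(X_tst, model);   % votes: N_tst x nclass (assumed)
votes = double(votes);
class_prob = votes ./ repmat(sum(votes, 2), 1, size(votes, 2));  % fraction of trees voting for each class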

Undefined function or method 'mexClassRF_train' for input arguments of type 'int32'

When I run the tutorial 'tutorial_ClassRF.m', I get the error:

??? Undefined function or method 'mexClassRF_train' for input arguments of
type 'int32'.

Error in ==> classRF_train at 347
[nrnodes,ntree,xbestsplit,classwt,cutoff,treemap,nodestatus,nodeclass,bestvar,nd
bigtree,mtry
...

I am running the student version of Matlab 7.4.0.287 (R2007a) on a MacBook
Pro with an Intel Core 2 Duo (64-bit).  I downloaded
RF_MexStandalone-v0.02.zip and also MacOS_precompiled-WITHOUT_SOURCE-v0.02.
 As directed I copied the files from the '2009b 64-bit' folder from
MacOS_precompiled-WITHOUT_SOURCE-v0.02 into the 'RF_Class_C' and the
'RF_Reg_C' folders produced from RF_MexStandalone-v0.02.zip.  I added all
the folders to my path.  Then when I run the tutorial file I get the error
above.

Based on the other email I saw concerning this same issue, I guess this is
some sort of compiler issue.  Do I need to recompile everything, i.e.,
not use the precompiled files?

Thank you for your help!
Corinne





Original issue reported on code.google.com by [email protected] on 17 May 2010 at 12:12

Segfault when calling training many times

I call RF training more than 10000 times, consecutively or in parallel. Around
iteration 10000 it always fails with a segfault.

Try to execute
parfor i = 1:10000, classRF_train(features, cols, 20, 3);  end

to reproduce. I'm not sure whether the specific input matters, but it failed
for various inputs (which were all quite large). One of them is attached.

The program leaks a bit, so it looks as if there is no memory available, but
that is not the case (there is still 5 GB of free memory when it fails).
Possibly a 64-bit issue?

Win7, Matlab 7.12, 12 GB of RAM
The dump file is attached.

Original issue reported on code.google.com by [email protected] on 10 Nov 2011 at 5:09

Attachments:

Training speed of Regression Forest

First thank you very much for this wonderful software!

I notice that for the same number of samples and features, if the only
difference is the label type, so that one problem is classification and the
other is regression, the time taken to construct the regression forest is
considerably longer than for the classification forest (using default
parameters for msplit and keeping ntree the same; we also estimate variable
importance along the way). Is there any reason behind this?

Thanks a lot!


Original issue reported on code.google.com by [email protected] on 27 Sep 2012 at 8:15

Calling this code from C++ or C#

What steps will reproduce the problem?
1. I want to avoid the "out of memory" error in Matlab, so I want to call
this code from C++ or C#. What would you advise?

Original issue reported on code.google.com by [email protected] on 17 Mar 2010 at 1:59

segmentation violation on high-dim dataset

What steps will reproduce the problem?
1. attempt to train a RF with a high-dimensional dataset (34
1300-dimensional vectors), using 101 trees and mtry=200 features:

myRF = classRF_train(foo_vecs(2:35,1:1300),foo_classLabels(2:35,:),101,200);

foo_vecs is a 36 x 4005 matrix of doubles
foo_classLabels is a 36 x 1 vector of doubles (-1,+1)
(see attached file)


What is the expected output? What do you see instead?
Expected: a trained RF.
Instead: a segmentation violation, with stack trace:

  [0] mexClassRF_train.mexmaci64:makeA(double*, int, int, int*, int*,
int*)~ + 151 bytes
  [1] mexClassRF_train.mexmaci64:classRF(double*, int*, int*, int*, int*,
int*, int*, int*, int*, int*, int*, int*, double*, double*, int*, int*,
int*, double*, double*, double*, double*, int*, int*, int*, int*, int*,
int*, double*, double*, int*, double*, int*, int*, double*, int*, int,
double*, double*, int*)~ + 2673 bytes
  [2] mexClassRF_train.mexmaci64:mexFunction~ + 3192 bytes
 ... more stuff


What version of the product are you using? On what operating system?
Version svn-v8? (0.02), MacOSX 10.6.2, Matlab 7.9.0.529 (R2009b) 64 bits, 
Intel Core 2 Duo (x86 Family 6 Model 7 Stepping 10). Mex file compiled from
source.

Note: the mex file works fine until the training set reaches about 34 x
1200, thereafter crashes. Could it be a memory allocation issue?

Original issue reported on code.google.com by [email protected] on 26 Feb 2010 at 4:22

Attachments:

How to save the model? Help me!

What steps will reproduce the problem?
1. When I train a random forest, how do I save the model so I can predict
with new data next time?

Original issue reported on code.google.com by [email protected] on 27 Feb 2010 at 2:29
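
A note on this issue: the returned model is an ordinary MATLAB struct, so a minimal sketch is simply to save and reload it with save/load (file and variable names below are illustrative):

model = classRF_train(X_trn, Y_trn, 500);
save('rf_model.mat', 'model');               % store the trained forest

% later, possibly in a new MATLAB session:
S = load('rf_model.mat');
Y_hat = classRF_predict(X_new, S.model);     % predict on new data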

Memory management

What steps will reproduce the problem?
1. With large datasets, I get an out of memory error; is there any fix for
this in Matlab?


Original issue reported on code.google.com by [email protected] on 8 Apr 2012 at 5:10

Citation

I plan to cite your algorithm.  Is there any particular way I should do this?  

Original issue reported on code.google.com by [email protected] on 17 Jun 2010 at 11:55

Can't compile on Windows or Unix

When compiling using the .m script files provided with MS Visual Studio 2010's 
cl on a PCWIN64 machine or with g++ on a Unix64 machine, I get several errors 
like the following:

src\mex_ClassificationRF_train.cpp(179) : error C2664: 
'mxCreateNumericMatrix_730' : cannot convert parameter 4 from 'int' to 
'mxComplexity'. Conversion to enumeration type requires an explicit cast 
(static_cast, C-style cast or function-style cast) 

My understanding is that C++ does not support implicit casting from int to enum 
types. I also tried using OPTIMFLAGS="$OPTIMFLAGS /Tc" in mex but the code does 
not seem to be C compatible either.


Original issue reported on code.google.com by [email protected] on 31 Jan 2011 at 12:41

Categorical Predictor Variables

Great work on the random forest implementation.  Coming from the R version, 
this was an easy adjustment.  I have been using it effectively on matrices of 
continuous data, but how does it handle categorical data?  I can't pass in an 
array of strings, nor can I assign integer values to categories because I don't 
want them to be treated as continuous.  Any suggestions?

Original issue reported on code.google.com by [email protected] on 23 Aug 2012 at 7:06
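
A workaround sketch for this issue (the port itself has not been tested with categorical variables, as the README notes): encode each category as its own 0/1 indicator column so that no artificial ordering is imposed. The variable names below, including X_continuous, are illustrative.

colors = {'red'; 'green'; 'blue'; 'red'};          % one categorical column as strings
[cats, firstidx, code] = unique(colors);           % code maps each row to a category index
onehot = zeros(numel(colors), numel(cats));
onehot(sub2ind(size(onehot), (1:numel(colors))', code)) = 1;
X = [X_continuous, onehot];                        % append to the continuous features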

Incorrect relabeling of training set caused crash (Bugs in r54)

A. What steps will reproduce the problem?
1. Use RF to perform a classification task. We run the training program.
2. For the training set, the labels are [0, 2].
3. We do not specify testing data when performing training.

B. What is the expected output? What do you see instead?
Expected: successful training.
However, the current version contains a bug that causes a crash.

C. What version of the product are you using? On what operating system?
R54. On Windows 8.

D. A possible cause is the following relabeling in classRF_train.m:

if exist('Xtst','var') && exist('Ytst','var') 
        if(size(Xtst,1)~=length(Ytst))
            error('Size of Xtst and Ytst dont match');
        end
        fprintf('Test data available\n');
        tst_available=1;
        tst_size = length(Ytst);
    else
        Xtst=1;
        Ytst=1;
        tst_available=0;
        tst_size=0;
    end

    TRUE=1;
    FALSE=0;

    orig_labels = sort(unique([Y; Ytst]));
    Y_new = Y;
    Y_new_tst = Ytst;
    new_labels = 1:length(orig_labels);

    for i=1:length(orig_labels)
        Y_new(find(Y==orig_labels(i)))=Inf;
        Y_new(isinf(Y_new))=new_labels(i);

        Y_new_tst(find(Ytst==orig_labels(i)))=Inf;
        Y_new_tst(isinf(Y_new_tst))=new_labels(i);
    end

    Y = Y_new;
    Ytst = Y_new_tst;

When running the code with the above input:
orig_labels=[0 1 2] and unique(Y_new)=[1 3];

However, after relabeling, unique(Y_new) should become [1 2].

E. The correction is:
Change the lines:
  Xtst=1;
  Ytst=1;
into:
  Xtst=X(1,:);
  Ytst=Y(1);

Again, thanks for the wonderful software!
Original issue reported on code.google.com by [email protected] on 18 Sep 2012 at 8:06

Compiling RF on Ubuntu

What steps will reproduce the problem?
1.MatLab + Ubuntu
2.Run compile_linux.m

What is the expected output? What do you see instead?
A wonderful mex file

What version of the product are you using? On what operating system?
Ubuntu, gcc 4.4, matlab 2011


Two problems in my case:
1. mex calls the LaTeX mex instead of the Matlab one => change the makefile to
the one attached, which calls the Matlab mex in the Matlab bin directory (the
path will be different on other platforms).
2. Matlab complains about gcc version 4.4, so you need to follow the
instructions [http://ubuntuforums.org/showthread.php?t=1413330 here]:
Code:
> sudo mv /usr/bin/gcc /usr/bin/gcc_mybackup
> sudo ln -s /usr/bin/gcc-'_what ever compliant version of gcc_' /usr/bin/gcc
*Compile your mex here*
and restore gcc:
> sudo mv /usr/bin/gcc_mybackup /usr/bin/gcc

Original issue reported on code.google.com by [email protected] on 5 Mar 2012 at 4:49

Attachments:

Matlab crashes after training several RFs

Hi all, 

I'm using the RF toolbox applied to supervised classification with
active learning (AL). A feature of this method is that a classifier gets
iteratively retrained, in this case an RF classifier. When it reaches something
like 20,000 retrains, Matlab crashes and displays the attached image.

I tested on Windows 7 running Matlab 2008b and 2011a, obtaining a similar
response.

Here's the Error Message:

MATLAB crash file:C:\Users\Hyper!\AppData\Local\Temp\matlab_crash_dump.4280
------------------------------------------------------------------------
       Segmentation violation detected at Wed Feb 01 01:20:23 2012
------------------------------------------------------------------------

Configuration:
  MATLAB Version:   7.7.0.471 (R2008b)
  Window System:    Version 6.1 (Build 7600)
  Processor ID:     x86 Family 6 Model 15 Stepping 13, GenuineIntel
  Virtual Machine:  Java 1.6.0_21 with Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM mixed mode
  Default Encoding:  windows-1252

Fault Count: 1

Register State:
  rax = 0000000000000022   rbx = 0000000017b44700
  rcx = 000000ffffffffff   rdx = 0000000031e270c0
  rbp = 0000000000000001   rsi = 0000000031e20000
  rdi = 0000000031e270d0   rsp = 000000000102a650
   r8 = 0000000000000000    r9 = 0000031f270c0001
  r10 = 0000000000000010   r11 = 000000000000fa12
  r12 = 0000000000000000   r13 = 0000000100000001
  r14 = ffffffff00007fff   r15 = 00000000ffff0000
  rip = 0000000077891612   flg = 0000000000010202

Stack Trace:
  [  0] 0000000077891612                ntdll.dll+333330 (RtlFreeHeap+000306)
  [  1] 0000000077742A8A             kernel32.dll+141962 (HeapFree+000010)
  [  2] 00000000715BC7BC              MSVCR90.dll+313276 (free+000028)

This error was detected while a MEX-file was running. If the MEX-file
is not an official MathWorks function, please examine its source code
for errors. Please consult the External Interfaces Guide for information
on debugging MEX-files.

If it is an official MathWorks function, please
follow these steps to report this problem to The MathWorks so we
have the best chance of correcting it:

The next time MATLAB is launched under typical usage, a dialog box will
open to help you send the error log to The MathWorks. Alternatively, you
can send an e-mail to [email protected] with the following file attached:
    AppData\Local\Temp\matlab_crash_dump.4280

If the problem is reproducible, please submit a Service Request via:
    http://www.mathworks.com/support/contact_us/ts/help_request_1.html

A technical support engineer might contact you with further information.

Thank you for your help. MATLAB may attempt to recover, but even if recovery 
appears successful,
we recommend that you save your workspace and restart MATLAB as soon as 
possible.

I appreciate any comments.

Regards



Original issue reported on code.google.com by [email protected] on 1 Feb 2012 at 4:37

Attachments:

how to use random forest in matlab

I need to know the basic steps required to use random forest in MATLAB.
Any complete documentation on the programs given here (in the standalone
version) would be much appreciated, so that any novice could understand the
capabilities of random forest and how much it can help in MATLAB.

Original issue reported on code.google.com by [email protected] on 21 Dec 2011 at 7:46
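
For reference, a minimal classification sketch assuming the RF_Class_C folder (with its compiled mex files) is on the MATLAB path; it uses the twonorm dataset shipped with the package, as in the tutorials:

load data/twonorm                         % provides 'inputs' and 'outputs'
X = inputs'; Y = outputs;                 % X: N x D doubles, Y: N x 1 labels
n = size(X, 1); ntrn = floor(2*n/3);
model = classRF_train(X(1:ntrn, :), Y(1:ntrn), 500);        % train
Y_hat = classRF_predict(X(ntrn+1:end, :), model);           % predict
test_err = mean(Y_hat ~= Y(ntrn+1:end))                     % misclassification rate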

Modelling the trees

Hi Abhishek Jaiantilal 

Thanks for a great code. Well described!

My dataset is a 50x10000 matrix and I am using classification trees to
determine the variables with the greatest importance. However, RF is known to
work as a black box, so I was wondering...

if there is any way to view each tree and include it in a report? Or is the
closest thing your "extra_options.do_trace" option, which outputs:

tree      OOB      1      2      3      4      5      6      7      8      9    
 10     11     12     13     14     15     16     17     18     19     20     
21     22     23     24     25     26     27     28     29     30     31     32 
    33     34     35     36     37     38     39     40     41     42     43    
 44     45     46     47     48
    1:  56.13%  0.00%  0.00% 14.29% -1.#J% -1.#J%100.00%  0.00% 27.27% 72.73% 61.54% 54.55% 71.43% 92.31% 77.78% 90.00% 87.50% 71.43% 70.59% 70.27% 72.41% 71.43% 70.21% 96.30% 96.88% 91.67% 94.12% 94.44% 86.96% 88.24% 90.48% 92.86% 81.82% 84.85% 84.85% 72.62% 51.44% 50.80% 55.32% 56.55% 72.97% 67.01% 50.00% 50.96% 48.45% 27.18% 28.13% 20.45% 27.27% 
(as two rows).

The reason I'm asking is that I found this video showing each tree and its
importance, at 4:18:

http://www.youtube.com/watch?v=RE7VO_AB7PI&feature=player_embedded

Thanks again for your program; I am writing about your citation in another topic.
Regards Thomas

Original issue reported on code.google.com by [email protected] on 9 Jul 2011 at 2:00

variable importance

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


If you are trying to run via some custom arguments and parameters, i.e. for
your own datasets etc. Can you post in the argument size and type (you can
get that via whos('argumentname')?

Please provide any additional information below.
How did you calculate the variable importance? My data set is in binary form.
I want to know the importance score of the variables.

Original issue reported on code.google.com by [email protected] on 29 Apr 2012 at 7:38

Cutting-point selection

Hello,
Thanks for submitting this implementation which is great.

I'm wondering how, in this implementation, you determine the best
cutting point at each node given a randomly selected attribute.

Does it search through all possible cutting points and use the one with best 
score in certain metric?

Thanks

Original issue reported on code.google.com by [email protected] on 30 Mar 2012 at 11:34

Dimension of output variable ndbigtree incorrect

Hi,

first of all thank you for enabling to use RF in Matlab.

This is nothing big, but initially I was a little bit confused by the size of 
the returned model.ndbigtree, which is [nrnodes x ntrees].

I found a description of Andy Liaw stating that the size should be a vector of 
size ntree, containing the number of nodes for each tree.
https://stat.ethz.ch/pipermail/r-help/2003-April/032256.html

I suppose changing the mex_ClassificationRF_train.cpp line 114 might solve that 
problem:
plhs[9] = mxCreateNumericMatrix(1, nt, mxINT32_CLASS, mxREAL);

You might also want to consider adding the above-mentioned descriptions of the
output variables to your .m file. I had a hard time finding out what these
variables hold.

Regards,
Johannes

Original issue reported on code.google.com by [email protected] on 24 May 2011 at 2:39

Training hangs on controversial output

classRF_train([1 0; 1 0], [1 2]', 10, 2)

hangs my machine with probability 1.

Windows 7, Matlab 7.12.0.

It would be better to add some error handling in this case.

Thanks!

Original issue reported on code.google.com by [email protected] on 24 Oct 2011 at 4:59
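
A sketch of the kind of pre-check that could catch this particular reproduction before calling the trainer (this guard is not part of the package): identical feature rows carrying different labels can never be separated by any split.

[urows, ia, ic] = unique(X, 'rows');       % group identical feature rows
for k = 1:size(urows, 1)
    if numel(unique(Y(ic == k))) > 1
        error('Identical rows of X carry conflicting labels; refusing to train.');
    end
end
model = classRF_train(X, Y, 10, 2);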

Matlab crashes

Hello Abhishek, 

First of all, thank you very much for making your code publicly available for
research. I am using a hierarchical object recognition model that I have
created, which reduces to vectors learned by a classifier. I use an SVM but I
am experimenting with other classifiers.

I therefore tried to apply random forests, but in both regression and
classification, as soon as the code hits your MEX files (which I compiled
successfully), Matlab crashes to the desktop.

I am running Windows 7 and the Matlab version is 2011a at 64bit.

Many Thanks
Aris

Original issue reported on code.google.com by [email protected] on 22 Nov 2012 at 4:51

Confusion Matrix

Hi,
I'd like to make an accuracy assessment for each class I'm using.
In many papers I've read that it is possible to compute a confusion matrix,
from which I could calculate my per-class accuracy...
Unfortunately I don't know how to implement the confusion matrix, although
it's mentioned in the readme.
I'm using v0.02 from RF_MexStandalone-v0.02.zip.

It would be awesome if you could help me out, because I need it badly for my BA thesis ;)

greetings

Original issue reported on code.google.com by [email protected] on 25 May 2011 at 11:39
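
A sketch using only base MATLAB: build the confusion matrix from the true and predicted labels, then read per-class accuracy off its diagonal. Y_tst and Y_hat below are illustrative names for the test labels and the classRF_predict output.

labels = sort(unique([Y_tst; Y_hat]));         % common label set
[tf, yt] = ismember(Y_tst, labels);            % map true labels to 1..K
[tf, yp] = ismember(Y_hat, labels);            % map predicted labels to 1..K
K  = numel(labels);
cm = accumarray([yt(:), yp(:)], 1, [K K]);     % rows: true class, columns: predicted class
per_class_acc = diag(cm) ./ sum(cm, 2)         % accuracy for each true class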

some questions

Hi,

  First of all many thanks for this wonderful code. I have some questions.

1) Is it possible to set the depth of each tree in random forest?
2) Each node in the tree should have one of the two node status values (1, -1),
non-terminal and terminal. But when I run tutorial_ClassRF.m I find a lot of
nodes with a nodestatus of zero. What does this zero mean?

Kindly guide me.

Original issue reported on code.google.com by [email protected] on 23 Oct 2012 at 11:19

Bootstrap sample for each tree

Hi Abhishek,

I am trying to extract the exact bootstrap sample used in each tree.
The return value of inbag tells me which samples were in bag for a certain
tree. However, it does not tell me how often each sample was selected.
Is there a way to find this out?

I would need this to be able to reproduce the gini impurity value of each node 
of a tree.

Thank you for your answer.
Johannes



Original issue reported on code.google.com by [email protected] on 11 May 2012 at 10:57

invalid precompiled mex files

Hi,

I am running Matlab version 7.1.0.246 (R14) Service Pack 3 on a 64-bit machine;
however, Matlab is installed in C:\Program Files (x86), which indicates that
it is a 32-bit installation. I have downloaded the precompiled files, and when
I run tutorial_RegRF.m I get the following error:

Setting to defaults 500 trees and mtry=3
??? Invalid MEX-file 
'C:\Users\Igor\Projects\Windows-Precompiled-RF_MexStandalone-v0.02-\RF_MexStanda
lone-v0.02-precompiled\randomforest-matlab\RF_Reg_C\mexRF_train.mexw32': The 
specified procedure could not be found.

.

Error in ==> regRF_train at 283
    [ldau,rdau,nodestatus,nrnodes,upper,avnode,...

I have checked that my machine has Microsoft visual C++ 2005 redistributable 
installed in the Control Panel. 

My version of Matlab is quite old.

Do you think compiling the files myself will solve the issue? 

Thank you in advance. 

Original issue reported on code.google.com by [email protected] on 17 Apr 2012 at 4:05

non-existent xbestsplit field in classRF_predict

What steps will reproduce the problem?
1. After training, the regression model does not include xbestsplit.
2. classRF_predict gives the error: ??? Reference to non-existent field
'xbestsplit'.

version: Windows-Precompiled-RF_MexStandalone-v0.02-.zip   445 

Original issue reported on code.google.com by [email protected] on 8 Apr 2012 at 4:22
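
A guess at the cause, not confirmed here: a model trained with regRF_train stores its split values differently (regRF_train returns 'upper' rather than 'xbestsplit'), so it has to be evaluated with regRF_predict; classRF_predict only works with models from classRF_train. A minimal sketch with illustrative variable names:

model = regRF_train(X_trn, Y_trn, 500);        % regression forest
Y_hat = regRF_predict(X_tst, model);           % use the matching predictor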

Cannot compile for Octave in Ubuntu12.04

What steps will reproduce the problem?
1. Just run the command 'make'; it then reports that 'mex cannot be found'.
2. If I modify the Makefile by replacing 'mex' with 'mkoctfile --mex' and run
'make' again, there are compile errors in the source code.

What is the expected output? What do you see instead?
I cannot install it.

What version of the product are you using? On what operating system?
The latest version, on Ubuntu 12.04.

If you are trying to run via some custom arguments and parameters, i.e. for
your own datasets etc. Can you post in the argument size and type (you can
get that via whos('argumentname')?

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 19 Oct 2012 at 3:52

Non-Efficient Memory Allocation for RF Storage

This is a great MATLAB port of Random Forest. Thank you very much.

However, I found that the memory allocated to store the RF (classification)
model is very inefficient. Specifically, there are a lot of unnecessary
elements stored in treemap, nodestatus, nodeclass, bestvar, xbestsplit, and
ndbigtree.

I will take twonorm.mat as an example for illustration purposes, where only one
tree is trained for simplicity.
    >> load ./data/twonorm.mat
    >> model = classRF_train(inputs', outputs, 1);
Dimensions of the variables in the model are listed below (on my computer with
64-bit Windows and MATLAB 2011b):

Variable        Value
nrnodes     601
treemap     <601*2 int32>
nodestatus  <601*1 int32>
nodeclass   <601*1 int32>
bestvar     <601*2 int32>
xbestsplit  <601*2 int32>
ndbigtree   <601*2 int32>

Actually, ndbigtree denotes the number of nodes in each tree; only the first
#model.ntree (here only 1) elements are useful and the rest are all zero. In my
output, ndbigtree(1) is 63. I checked the predictClassTree() function in
classTree.cpp to see how the prediction is made based on the tree hierarchy. I
found that only the first #model.ntree (here only 1) elements in nodestatus,
nodeclass, bestvar and xbestsplit are useful. The index of the left and right
child of the kth node is treemap[2*k]-1 and treemap[2*k+1]-1, respectively.

I am not sure why there are only 63 nodes in the tree, but we have to store as
many as 601 (#nrnodes) elements in, say, nodestatus, and 2*63 elements in
treemap. It will cost a great deal of extra memory to store an RF with many
trees trained on a large number of samples. Is it possible to improve the
memory allocation?

Original issue reported on code.google.com by [email protected] on 7 Apr 2012 at 10:42

Confidence Measure

Hello there,
   First, thanks for the wonderful code for random forest. I would like to know about the functionality for calculating a confidence measure for each prediction. Can you tell me a way to measure the confidence of each prediction made by the random forest? Does your code include this functionality? It would be great if you could let us know a way to do it.
Thanks

Original issue reported on code.google.com by gayumahalingam on 7 Oct 2010 at 2:17

Stabilizing Number of Trees.

Hi Abhishek,

I am using the random forest package for my thesis. It's great and simple.
I have a few questions regarding the initial settings. It would be a great help
if you could answer them.

I am trying to stabilize the number of trees to be used. My professor wants
me to use as few trees as possible without compromising performance. Is there
any standard way I can accomplish this?

Second question: what is the role of the seed value in the algorithm? Is it
used to draw the bootstrap sample for growing each tree? Can I change the seed
value?


Original issue reported on code.google.com by [email protected] on 31 Jan 2012 at 3:40

matlab2009a errors

What steps will reproduce the problem?
1. When I run "tutorial_ClassRF.m" in Matlab 2009a, I got these errors:
Random
Forest\nversion\Windows-Precompiled-RF_MexStandalone-v0.02-
\RF_MexStandalone-v0.02-precompiled\randomforest-
matlab\RF_Class_C\mexClassRF_train.mexw32':
The application has failed to start because its configuration is incorrect.
Reinstalling the application may fix this problem.

Error in ==> classRF_train at 347    
[nrnodes,ntree,xbestsplit,classwt,cutoff,treemap,nodestatus,nodeclass,bestv
ar,ndbigtree,mtry    ...

Error in ==> tutorial_ClassRF at 39
    model = classRF_train(X_trn,Y_trn);

How can I fix this?

Original issue reported on code.google.com by [email protected] on 25 Feb 2010 at 2:27

Feature Selection

I want to use random forest for biological sequence classification. Is it
possible to use this code for sequence classification? I have some positions as
features. Is it possible to use those features in this classifier?
Original issue reported on code.google.com by [email protected] on 1 Feb 2012 at 8:08

how to do voting with random forest?

Hi all,
I am doing some project work using random forest where I need to use random
forest for voting purposes: every tree would vote for a desired feature and the
best feature is taken into account at the end.
How can I use this random forest for this purpose? Would the given code help
me to do so? If not, how should I approach it?
Kindly reply to guide me.

thank you

Original issue reported on code.google.com by [email protected] on 1 Feb 2012 at 7:04

Addition of new training examples to an existing RF classifier?

Hello,

first of all, thank you very much for this code, I've been using it more and 
more for various research projects and will hopefully soon be able to cite it 
in a paper!

I was wondering if there was a way to add additional training examples to a 
previously trained RF classifier (using the same set of features, of course). I 
am interested in creating an interactive classification tool and being able to 
add additional examples without having to re-train the whole classifier would 
be _very_ useful!

I haven't investigated the fundamental aspect of random forests yet so it might 
be obvious that it is impossible but I thought it would be easier to ask before 
trying to figure it out by myself.

Thanks again for this code!


Regards,

Nicolas


Original issue reported on code.google.com by [email protected] on 7 Sep 2012 at 3:59

what is Y_train in classRF_train?

In the function classRF_train(X, Y, ntree, mtry, extra_options), what are X and Y?
As per the readme file, they are X: data matrix, Y: target values. Could you
please explain their individual roles more clearly?
As far as I understand, for xtrain and xtest the features are taken as input,
but what about ytrain and ytest? What should the input be there? Is that some
kind of index? Please correct me if I am wrong.
Also tell me when to use RF_Class_C and when RF_Reg_C, with some example.
thank you.

Original issue reported on code.google.com by [email protected] on 7 Mar 2012 at 3:54
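
A small sketch to make the roles concrete (all names illustrative): each row of X is one sample's feature vector, and the corresponding entry of Y is that sample's target, i.e. a class label for RF_Class_C and a real value for RF_Reg_C.

X = rand(100, 5);                    % 100 samples, 5 features, stored as doubles
Y_class = ceil(3*rand(100, 1));      % class labels 1..3  -> use classRF_train / classRF_predict
Y_reg   = randn(100, 1);             % continuous targets -> use regRF_train  / regRF_predict
model_c = classRF_train(X, Y_class, 100);
model_r = regRF_train(X, Y_reg, 100);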

random number generator initialization

Initialize random number generator with srand() in classRF.cpp, otherwise
seedMT is seeded deterministically.

#include <time.h> 
srand ( time(NULL) ); 

prior to:

seedMT(2*rand()+1);


Original issue reported on code.google.com by [email protected] on 5 Jan 2010 at 7:54

about the categorical feature

Hi abhirana,
Thanks for your nice code.
I am not sure how you treat categorical features. I mean, if there are some
categorical features in my dataset, how can I transform them into numerical
ones that your package can use?
Kindly guide me, please.

Original issue reported on code.google.com by [email protected] on 16 Nov 2012 at 6:27

Classification Validation

Hi

I'm trying to use RF to do pixel classification for images of size 101*101
pixels.

There are 18 features corresponding to each pixel and the number of classes is 
3. Also, my dataset contains 70 images.

Reading Leo Breiman and Adele Cutler website: 

http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

They said: "In random forests, there is no need for cross-validation or a 
separate test set to get an unbiased estimate of the test set error. It is 
estimated internally, during the run"

I was going to use k-fold cross-validation (if k=5, build my model using 65
images and test it on the 5 left-over images) to validate the RF classification
result, which is time consuming, but it seems I don't need to do that.

But I noticed that in tutorial_ClassRF.m you split your dataset into training
and test parts and, after building the model, you run the model on the test
set. Could you please clarify this? How can I use this property of random
forests via your code?

Best,
Saleh



Original issue reported on code.google.com by [email protected] on 11 May 2012 at 2:20
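
A sketch of reading that internal (out-of-bag) estimate, assuming this port exposes it as model.errtr the way the R version does (first column = cumulative OOB error of the forest after each tree); the field name may differ between versions, so check the returned struct with fieldnames(model).

model = classRF_train(X_trn, Y_trn, 500);
oob_err_curve = model.errtr(:, 1);        % assumed layout: ntree x (1 + nclass)
oob_estimate  = oob_err_curve(end)        % OOB estimate of the test error, no held-out set needed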

Problem in RF

I am trying to run an image database in RF-MATLAB so that it can classify from
a given database, but it keeps giving the error that it needs two classes for
classification. I want it to run on my database the way it runs on the twonorm
database. I have attached the file along with the database: the file is
vatsnewrf.m and the database is yale_database_B.mat. Please help me.

Original issue reported on code.google.com by [email protected] on 29 Apr 2012 at 8:00

Attachments:

about the treemap

Hi everyone,
When I run the random forest with, say, ntree set to 500, then model.treemap
will be a matrix of size 501 x 1000, and most of the elements are zeros. So
what does this treemap mean?
Thank you.

Original issue reported on code.google.com by [email protected] on 26 Sep 2012 at 2:34
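
A hedged reading of that layout, inferred from the prediction code rather than from documentation: every tree is padded to nrnodes rows (hence the zeros), columns 2*t-1 and 2*t appear to hold the left/right child indices for tree t, and only the first ndbigtree(t) rows of that tree are real nodes.

t = 1;                                        % inspect the first tree
n = model.ndbigtree(t);                       % number of nodes actually used by tree t
children = model.treemap(1:n, 2*t-1:2*t);     % [left child, right child] for each node (assumed layout)
status   = model.nodestatus(1:n, t);          % 1 = split node, -1 = terminal, 0 = unused padding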

using this code in c#

Hi, 
I want to run this code from C#.
I don't have knowledge of Matlab; I am using .NET, and I don't have Matlab
installed, but I can install it if necessary (I'm a student). Can you please
tell me what steps I need to take in order to make it work?

Thanks

Original issue reported on code.google.com by [email protected] on 12 Apr 2011 at 4:43

number of trees

I am using MATLAB 7.5 on Windows 7

I am trying to use the code with 2500 trees for my training set, and it runs
out of memory.
So my question is: does the code retain ALL the trees during training, or
should it (or does it) retain only the best one so far? The second option
would not cause memory issues.

This is related to file classRF.cpp: and specifically the line: 

for(jb = 0; jb < Ntree; jb++) {

I will be grateful for a reply.

Original issue reported on code.google.com by [email protected] on 7 Mar 2011 at 11:41
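
A workaround sketch, not a feature of the package: a random forest keeps every tree by design, so 2500 trees may simply not fit in memory at once. Growing several smaller forests and summing their votes at prediction time is statistically equivalent to one large forest. This assumes classRF_predict returns per-class vote counts as its second output, with columns ordered as sort(unique(Y_trn)); variable names are illustrative.

nchunks = 5; ntree_per_chunk = 500;
total_votes = 0;
for c = 1:nchunks
    model = classRF_train(X_trn, Y_trn, ntree_per_chunk);
    [Yh, votes] = classRF_predict(X_tst, model);      % votes assumed: N_tst x nclass
    total_votes = total_votes + double(votes);
    clear model                                       % free this chunk's trees
end
[maxv, idx] = max(total_votes, [], 2);                % combined majority vote
labels = sort(unique(Y_trn));
Y_hat  = labels(idx);                                 % map back to the original labels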

Too less/many parameters: You supplied 15??? One or more output arguments not assigned during call to "mexClassRF_train".

Basically, I just downloaded MacOS_precompile_WITHOUT_SOURCE_v0.02.tar and
I tried it. I copied mexClassRF_predict.mexmaci64 and
mexClassRF_train.mexmaci64 to the right position and ran it. However, it
produced the following message:

Too less/many parameters: You supplied 15??? One or more output arguments
not assigned during call to "mexClassRF_train".

Error in ==> classRF_train at 353

[nrnodes,ntree,xbestsplit,classwt,cutoff,treemap,nodestatus,nodeclass,bestvar,nd
bigtree,mtry
    ...  

I have not made any changes to this package. So, could anyone help me fix
this issue?


What version of the product are you using? On what operating system?
My MATLAB is R2009b(64-bit), MACi64 and my system is MAC OS X 10.6.2.

Thanks a lot


Original issue reported on code.google.com by [email protected] on 16 Feb 2010 at 1:48

problem with rfsub.f

What steps will reproduce the problem?

When I'm trying to compile the code I get the following error:

Compiling rfsub.f (fortran subroutines)
gfortran   -O2 -fpic -march=native -c src/rfsub.f -o rfsub.o
src/rfsub.f:0: error: bad value (native) for -march= switch
src/rfsub.f:0: error: bad value (native) for -mtune= switch


Have you ever come across the same issue?
Any help is appreciated.

MJ

Original issue reported on code.google.com by [email protected] on 5 Jun 2012 at 11:07

Matlab died when randomforest finished its process

What steps will reproduce the problem?
I am using randomforest on a 64-bit Linux machine with Matlab version
7.9.0.529 (R2009b) through SSH Secure Shell Version 3.2.9.
Everything is fine, but after randomforest finishes its job, Matlab just gets
stuck; it does not respond.
I looked at the process: Matlab seems stuck but the process is still alive. I
cannot do anything else, so I have to disconnect the terminal and log in again.
Once I disconnect the terminal, the Matlab process is killed and there is no
dump file.


Original issue reported on code.google.com by [email protected] on 21 Apr 2010 at 9:09
