Comments (14)
Hello,
i think i put in categories somewhere (via the ncat and cat variables), but at
that point i didn't have the time or a dataset to test both the r version (i am
somewhat r-challenged) and the matlab version. if you have a simple mixed
categorical/numerical dataset, can you send it to me? i can try it in matlab
and r; if not, that's ok, let me see if i can get one from somewhere.
the issue with strings is that they are converted to categorical integers
within the r code anyway, and i don't know if matlab even supports mixing
integers and strings in a single matrix (for me to process within the training
code).
so if you can do some kind of preprocessing (like converting the strings into
categorical integers) before sending the data to the rf training code, then
maybe that works out?
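
A minimal sketch of the preprocessing suggested above, written in Python purely for illustration (it is not part of randomforest-matlab). The helper name `encode_categories` and the sample `smoker` column are hypothetical. Each distinct string gets a unique integer code, and the list of levels is kept so the codes can be mapped back later.

```python
def encode_categories(values):
    """Map each distinct label to an integer code 1..K.

    Levels are sorted so the coding is reproducible between runs.
    Returns the list of codes and the list of levels (code i -> levels[i - 1]).
    """
    levels = sorted(set(values))
    code_of = {level: i + 1 for i, level in enumerate(levels)}
    codes = [code_of[v] for v in values]
    return codes, levels


# Hypothetical string-valued column from a mixed dataset.
smoker = ["yes", "no", "no", "yes", "unknown"]
codes, levels = encode_categories(smoker)

# Decode again to check that no information was lost.
decoded = [levels[c - 1] for c in codes]
```

The integer codes carry no order of their own, so the training code still has to be told that this column is categorical.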
Original comment by abhirana
on 24 Aug 2012 at 6:00
- Changed state: Accepted
- Added labels: Priority-High
- Removed labels: Priority-Medium
from randomforest-matlab.
Thanks for the quick response!
Yeah, combining strings and integers in one matrix is not possible. After a
bit more reading, it seems the best bet would probably be to construct a
dataset array (http://www.mathworks.com/help/toolbox/stats/bqziht7-1.html)
which is similar to R's data tables. They can hold cells, categorical,
ordinal, and numeric columns, and columns can be accessed by name or by index.
This seems like it would be the most flexible, but they are relatively new
(R2007a) and require the Statistics Toolbox so I have never seen them used.
Anyway, back to random forests: yes it would be simple to convert to integer
codes for categorical variables. I just need to be sure that they are being
treated as categorical instead of continuous so that the order of the coding
doesn't bias the splits. I didn't see anything in the tutorial about
categorical variables, but if there's already a way to do it that's great,
could you explain?
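
To make the concern about coding order concrete, here is a small sketch in Python (an illustration only, not how randomforest-matlab chooses its splits). It compares the best single-threshold split, which treats the codes as continuous, with the best subset split, which treats them as categories. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def side_errors(labels):
    """Misclassified count if this side predicts its majority class."""
    if not labels:
        return 0
    majority = max(labels.count(c) for c in set(labels))
    return len(labels) - majority


def best_threshold_split(x, y):
    """Fewest errors over splits of the form x <= t (codes as numbers)."""
    best = len(y)
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        best = min(best, side_errors(left) + side_errors(right))
    return best


def best_subset_split(x, y):
    """Fewest errors over splits of the form x in S (codes as categories)."""
    levels = sorted(set(x))
    best = len(y)
    for k in range(1, len(levels)):
        for subset in combinations(levels, k):
            left = [yi for xi, yi in zip(x, y) if xi in subset]
            right = [yi for xi, yi in zip(x, y) if xi not in subset]
            best = min(best, side_errors(left) + side_errors(right))
    return best


# Category 2 is the odd one out, but its code sits between the other two.
x = [1, 1, 2, 2, 3, 3]
y = ["A", "A", "B", "B", "A", "A"]

# Same data with the codes of categories 1 and 2 swapped.
x_recoded = [2, 2, 1, 1, 3, 3]

threshold_errors = best_threshold_split(x, y)
threshold_errors_recoded = best_threshold_split(x_recoded, y)
subset_errors = best_subset_split(x, y)
subset_errors_recoded = best_subset_split(x_recoded, y)
```

With the first coding no single threshold isolates category 2, so the threshold split leaves two samples misclassified, while the recoded data separates perfectly. The subset split gives the same result under either coding, which is the behaviour wanted for a categorical variable.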
If not, I don't have my dataset yet, but the hospital dataset in the statistics
toolbox has mixed data I think, as does census income from the UCI repository.
I haven't looked at these too much, but I hope they help. I looked at
TreeBagger again, and it has an option to enter a logical array to identify
categorical variables, but I would prefer to use your package as I have read it
is considerably faster.
Thanks for the help!
Original comment by [email protected]
on 24 Aug 2012 at 6:14
from randomforest-matlab.
Hey
i just added new code to the svn (both classification/regression) and i think
categorical data is now handled within the code.
how do i know it's being handled? shorter and more accurate trees are being
created.
just make sure that each category of a categorical feature gets its own unique
number (a unique integer should suffice).
the example code is at the end of the tutorial files (i converted existing
datasets into categorical data). it basically tells the code which features
are categorical via an option, extra_options.categorical_feature = 1xD vector
with a mapping of which features to consider as categorical
do tell if you run into any issues.
yeh, i guess i will skip the mixed matrix till it is available in base matlab.
Original comment by abhirana
on 26 Aug 2012 at 12:02
- Changed state: Fixed
from randomforest-matlab.
Awesome, thanks! It's great to see such a quick update.
Original comment by [email protected]
on 27 Aug 2012 at 3:43
from randomforest-matlab.
So I finally got my dataset and want to run the random forest, but I'm not
seeing the example in the tutorial. Did you upload the changes?
Original comment by [email protected]
on 20 Sep 2012 at 7:27
from randomforest-matlab.
oh its at the end of the tutorial file
http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/tutorial_ClassRF.m#256
Original comment by abhirana
on 20 Sep 2012 at 7:54
from randomforest-matlab.
Ah, I see. I had just redownloaded the precompiled .zip from the download
link. What do I need to update from the files in the source tab?
Original comment by [email protected]
on 20 Sep 2012 at 9:15
from randomforest-matlab.
actually the svn version is somewhat ahead of the precompiled version in the
download link.
attached file is an extract
Original comment by abhirana
on 20 Sep 2012 at 9:24
Attachments:
- [rf-rev55 - 20 Sep 2012.zip](https://storage.googleapis.com/google-code-attachments/randomforest-matlab/issue-41/comment-8/rf-rev55 - 20 Sep 2012.zip)
from randomforest-matlab.
Awesome, thank you. I really appreciate all the work you've put in here. I'll
check back in once I've tested it out.
Original comment by [email protected]
on 20 Sep 2012 at 9:44
from randomforest-matlab.
Looks good, the forests seem to be working as expected. Thanks for all your
work!
Original comment by [email protected]
on 1 Oct 2012 at 9:03
from randomforest-matlab.
Hello,
Could you please provide me with the pre-compiled version of the code shared
above?
I tried many times but failed to generate the mex file.
My compilation gives various errors in the classRF.cpp code.
Original comment by [email protected]
on 24 Jun 2013 at 6:18
from randomforest-matlab.
attached is the latest pre-compiled version of the code
Original comment by abhirana
on 25 Jun 2013 at 3:13
Attachments:
from randomforest-matlab.
Ah, thanks a lot !
Original comment by [email protected]
on 25 Jun 2013 at 4:54
from randomforest-matlab.