kingfengji / gcforest

This is the official implementation for the paper 'Deep forest: Towards an alternative to deep neural networks'

Home Page: http://lamda.nju.edu.cn/code_gcForest.ashx

Python 100.00%
machine-learning random-forest ensemble-learning deep-forest

gcforest's Introduction

Update (Feb 1, 2021)

ATTENTION!

This repository is no longer maintained. Please check our new repository for Deep Forest, which brings great improvements in efficiency.

Details at:


You can install the newer version of gcForest via pip

pip install deep-forest

The older version (gcForest v1.1.1) in this repo is kept only as an illustration of the algorithm.


gcForest v1.1.1 Is Here!

This is the official clone of the gcForest implementation. (The university's webserver is sometimes unstable, so we keep the official clone here on GitHub.)

Package Official Website: http://lamda.nju.edu.cn/code_gcForest.ashx

This package is provided "AS IS" and is free for academic use. Use it at your own risk. For other purposes, please contact Prof. Zhi-Hua Zhou ([email protected]).

Description: A Python 2.7 implementation of gcForest as proposed in [1], together with demo client scripts that demonstrate how to use the code.
The implementation is flexible enough to modify the model or fit your own datasets.

Reference: [1] Z.-H. Zhou and J. Feng. Deep Forest: Towards an Alternative to Deep Neural Networks.
In IJCAI-2017. (https://arxiv.org/abs/1702.08835v2 )

Requirements: This package was developed with Python 2.7. Please make sure all the dependencies listed in requirements.txt are installed.

ATTN: This package was developed and is maintained by Mr. Ji Feng (http://lamda.nju.edu.cn/fengj/). For any problem concerning the code, please feel free to contact Mr. Feng ([email protected]) or open an issue here.

What's NEW:

  • Scikit-Learn style API
  • Some more detailed examples
  • GPU support when using xgboost as a base estimator
  • Python 3.5 support (v1.1.1)

v1.1.1 Python 3.5 Compatibility: The package should work with Python 3.5. We haven't checked everything yet, but it seems OK.

v1.1.1 Bug fixed: when making multiple predictions with the same model, the results are now consistent when a pooling layer is used. The bug only affected the scikit-learn-style APIs and is now also fixed for the new API.

Quick start

The simplest way of using the library is as follows:

from gcforest.gcforest import GCForest
gc = GCForest(config) # should be a dict
X_train_enc = gc.fit_transform(X_train, y_train)
y_pred = gc.predict(X_test)

And that's it. Please see /examples/demo_mnist.py for detailed usage.
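
For a self-contained starting point, the sketch below adapts get_toy_config from /examples/demo_mnist.py (quoted in the issues further down) and uses scikit-learn's digits dataset as a stand-in for your own data; the hyper-parameter values are illustrative only, not the settings from the paper.

    from sklearn.datasets import load_digits
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    from gcforest.gcforest import GCForest

    # Cascade-only config, adapted from get_toy_config in examples/demo_mnist.py.
    config = {
        "cascade": {
            "random_state": 0,
            "max_layers": 100,
            "early_stopping_rounds": 3,
            "n_classes": 10,
            "estimators": [
                {"n_folds": 5, "type": "RandomForestClassifier",
                 "n_estimators": 10, "max_depth": None, "n_jobs": -1},
                {"n_folds": 5, "type": "ExtraTreesClassifier",
                 "n_estimators": 10, "max_depth": None, "n_jobs": -1},
            ],
        }
    }

    # The digits dataset is only a stand-in: any 2-D (n_samples, n_features) matrix works here.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    gc = GCForest(config)
    X_train_enc = gc.fit_transform(X_train, y_train)   # augmented (class-probability) features from the cascade
    y_pred = gc.predict(X_test)
    print("accuracy:", accuracy_score(y_test, y_pred))

Since the digits features are already a flat 2-D matrix, this exercises only the cascade structure; fine-grained scanning needs the JSON-style configuration described below.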

For older versions and more of the model configs reported in the original paper, please refer to:

Supported Base Classifiers

The base classifiers inside gcForest can be any classifier. This library supports the following ones:

  • RandomForestClassifier
  • XGBClassifier
  • ExtraTreesClassifier
  • LogisticRegression
  • SGDClassifier

To add other classifiers, register them manually in lib/gcforest/estimators/__init__.py.

Define your own structure

Define your model with a single json file.

  • If you only need the cascading forest structure, you only need to write one JSON file; see /examples/demo_mnist-ca.json for a reference (here -ca stands for cascading). A minimal cascade-only sketch follows this list.

  • If you need both fine-grained and cascading forests, you also need to specify the fine-grained structure of your model; see /examples/demo_mnist-gc.json for a reference.

  • Then, use gcforest.utils.config_utils.load_json to load your json file.

    config = load_json(your_json_file)
    gc = GCForest(config) # that's it
    

    and run python examples/demo_mnist.py --model examples/yourmodel.json
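
For instance, a cascade-only JSON might look like the sketch below; the exact contents of /examples/demo_mnist-ca.json may differ, and the values here are illustrative (they mirror the cascade configs quoted in the issues further down).

    {
      "cascade": {
        "random_state": 0,
        "max_layers": 100,
        "early_stopping_rounds": 3,
        "n_classes": 10,
        "estimators": [
          {"n_folds": 5, "type": "ExtraTreesClassifier", "n_estimators": 500, "max_depth": null, "n_jobs": -1},
          {"n_folds": 5, "type": "RandomForestClassifier", "n_estimators": 500, "max_depth": null, "n_jobs": -1}
        ]
      }
    }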

Define your model inside your Python script.

  • You can also define the model structure inside your Python script. The model config should be a Python dictionary; see get_toy_config in /examples/demo_mnist.py for a reference.

Supported APIs

  • fit_transform(X_train,y_train)
  • fit_transform(X_train,y_train, X_test=X_test, y_test=y_test), this allows you to evaluate your model during training.
  • set_keep_model_in_mem(False): if you do not have enough RAM, set this to False (the default is True). If you set it to False, you have to use fit_transform(X_train, y_train, X_test=X_test, y_test=y_test) to evaluate your model (see the sketch after this list).
  • predict(X_test)
  • transform(X_test)
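
A short usage sketch combining these calls (X_train, y_train, X_test, y_test and config as in the Quick start above):

    gc = GCForest(config)

    # If RAM is tight, do not keep the fitted estimators in memory; evaluation then
    # has to happen during training by passing the test split to fit_transform.
    # gc.set_keep_model_in_mem(False)

    # Train, and evaluate on the test split while fitting.
    X_train_enc = gc.fit_transform(X_train, y_train, X_test=X_test, y_test=y_test)

    # With keep_model_in_mem left at its default (True), the model can be reused afterwards.
    y_pred = gc.predict(X_test)        # predicted class labels
    X_test_enc = gc.transform(X_test)  # encoded (class-probability) features for X_test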

Supported Data Types

If you wish to use Cascade Layer only, the legal data type for X_train, X_test can be:

  • 2-D numpy array of shape (n_samples, n_features).
  • A 3-D or 4-D numpy array is also acceptable. For example, passing an X_train of shape (60000, 28, 28) or (60000, 3, 28, 28) will automatically be reshaped to (60000, 784) or (60000, 2352).

If you need to use the fine-grained layers, X_train and X_test MUST be 4-D numpy arrays (a small shape sketch follows this list).

  • For image-like data, the shape should be (n_samples, n_channels, n_height, n_width).
  • For sequence-like data, the shape should be (n_samples, n_features, seq_len, 1); e.g. for IMDB data n_features is 1, and for music MFCC data n_features is 13.
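
A small shape sketch for the cases above (the sizes are only examples; the 3000-length signal follows the uci_semg layout mentioned in the issues below):

    import numpy as np

    # Cascade-only input: 2-D, or a 3-D/4-D array that gets flattened automatically.
    X_flat = np.random.rand(60000, 784)          # (n_samples, n_features)
    X_gray = np.random.rand(60000, 28, 28)       # reshaped internally to (60000, 784)

    # Fine-grained input must be 4-D: (n_samples, n_channels, n_height, n_width).
    X_img = np.random.rand(60000, 1, 28, 28)     # image-like data (e.g. MNIST)
    X_mfcc = np.random.rand(1000, 13, 300, 1)    # sequence-like data: 13 MFCC features, seq_len=300

    # A flat 1-D signal dataset can be brought into the 4-D layout like this:
    signals = np.random.rand(1000, 3000)         # e.g. 1-D signals of length 3000
    X_sig = signals.reshape(1000, 1, 3000, 1)    # (n_samples, n_features=1, seq_len, 1)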

Others

Please read examples/demo_mnist.py for a detailed walk-through.

package dependencies

The package was developed in Python 2.7; higher versions of Python are not recommended for the current release.

Run the following command to install the dependencies before running the code: pip install -r requirements.txt

Older Versions

For older versions, please refer to:

Happy Hacking.

gcforest's People

Contributors

kingfengji, xuyxu


gcforest's Issues

AttributeError: 'NoneType' object has no attribute 'n_classes_'

There was an error when I ran 'python tools/train_fg.py --model models/mnist/gcforest/fg-tree500-depth100-3folds.json --log_dir logs/gcforest/mnist/fg --save_outputs'

Traceback (most recent call last):
File "tools/train_fg.py", line 49, in
net.fit_transform(data_train.X, data_train.y, data_test.X, data_test.y, train_config)
File "lib/gcforest/fgnet.py", line 53, in fit_transform
layer.fit_transform(train_config)
File "lib/gcforest/layers/fg_win_layer.py", line 106, in fit_transform
keep_model_in_mem=train_config.keep_model_in_mem)
File "lib/gcforest/estimators/kfold_wrapper.py", line 103, in fit_transform
y_proba = est.predict_proba(X[val_idx].reshape((-1, n_dims)), cache_dir=cache_dir)
File "lib/gcforest/estimators/base_estimator.py", line 80, in predict_proba
batch_size = batch_size or self.default_predict_batch_size(est, X)
File "lib/gcforest/estimators/sklearn_estimators.py", line 44, in default_predict_batch_size
return forest_predict_batch_size(clf, X)
File "lib/gcforest/estimators/sklearn_estimators.py", line 23, in forest_predict_batch_size
mem_size_1 = clf.n_classes_ * clf.n_estimators * 16
AttributeError: 'NoneType' object has no attribute 'n_classes_'

I tried to debug the code, but I could not solve the problem. I would appreciate it if anyone could help me.
Thanks a lot!

signal data training

Hi,
Currently, I am trying to use this model for EEG signal training; the data is 1-D, not 2-D (not an image).

Then, according to the data format requirement: "Please refer to lib/datasets/mnist.py as an example. The dataset should have attributes X, y to represent the data, and the label y should be a 1-D array. For fine-grained scanning, X should be a 4-D array (N x channel x H x W), e.g. cifar10 should be Nx3x32x32, mnist should be Nx1x28x28, uci_semg should be Nx1x3000x1."

I'm a bit confused about this; could you help me change the code to fit my own dataset for training?
Great thanks.

Why does the model trained with the gc config always output exactly 100 test results?

Hello!
I trained a model with demo_mnist-gc.json.
At test time, the situation shown in the screenshot below occurred.
For testing, I fed in 788 RGB color images of size 32x32, but y_pred only output 100 values. Why? Shouldn't it output 788 values?
Looking forward to your reply! Thanks!
screenshot from 2018-01-29 18 19 23

Fail to install TensorFlow

TensorFlow can only be installed with Python 3.x here, but the other packages are developed with Python 2.7. How can I use TensorFlow with Python 2.7? Looking forward to your reply.

How to improve the accuracy by fine-tuning the parameters of gcForest?

Hi, I am sorry about the previous question; gcForest was working well in my environment. However, I am confused about the result after using gcForest on a multi-class classification problem. Here is the related code:

 def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", dest="model", type=str, default=None, help="gcfoest Net Model File")
    args = parser.parse_args()
    return args


def get_toy_config():
    config = {}
    ca_config = {}
    ca_config["random_state"] = 0
    ca_config["max_layers"] = 100
    ca_config["early_stopping_rounds"] = 3
    ca_config["n_classes"] = 3
    ca_config["estimators"] = []
    ca_config["estimators"].append(
        {"n_folds": 5, "type": "XGBClassifier", "n_estimators": 10, "max_depth": 5,
         "objective": "multi:softprob", "silent": True, "nthread": -1, "learning_rate": 0.1})
    ca_config["estimators"].append({"n_folds": 5, "type": "RandomForestClassifier", "n_estimators": 10, "max_depth": None, "n_jobs": -1})
    ca_config["estimators"].append({"n_folds": 5, "type": "ExtraTreesClassifier", "n_estimators": 10, "max_depth": None, "n_jobs": -1})
    ca_config["estimators"].append({"n_folds": 5, "type": "LogisticRegression"})
    config["cascade"] = ca_config
    return config

In my case, I just changed the number of classes to three. BTW, the input was a matrix of shape (1280, 320), and the labels were a vector of shape (1280,). The leave-one-group-out accuracy turned out as shown below.
screenshot from 2018-10-17 10-14-37
I also used an MLP on my data, and its result is much better than gcForest's. Do you have any clue about this problem? Maybe the hyper-parameters of gcForest? Thanks for your patience.
Best regards
Irving

Can gcforest only run on CPU?

When I run the demos, it seems that the program doesn't use the GPU.
I saw the program output "Using TensorFlow backend.",
but I didn't find an "import tensorflow" in the code.
So, is there something wrong with my understanding, can it not use the GPU, or how do I use the GPU?
Thank you so much!

The process is killed when I run the fine-grained model on the MNIST dataset, but the cascade model works. Why does this happen?

[ 2017-07-11 18:42:43,583][fgnet.fit_transform] X_train.shape=(60000, 1, 28, 28), y_train.shape=(60000,), X_test.shape=(10000, 1, 28, 28), y_test.shape=(10000,)
[ 2017-07-11 18:42:43,584][fg_win_layer.fit_transform] [data][win1/7x7], bottoms=[u'X', u'y'], tops=[u'win1/7x7/ets', u'win1/7x7/rf']
[ 2017-07-11 18:42:43,584][fg_win_layer.fit_transform] [progress][win1/7x7] ti=0/2, top_name=win1/7x7/ets
[ 2017-07-11 18:42:43,621][fg_win_layer.fit_transform] [data][win1/7x7,train] bottoms.shape=60000x1x28x28,60000
[ 2017-07-11 18:42:43,634][win_utils.get_windows] get_windows_start: X.shape=(60000, 1, 28, 28), X_win.shape=(49, 7260000), nw=11, nh=11, c=1, win_x=7, win_y=7, stride_x=2, stride_y=2
[ 2017-07-11 18:42:44,037][win_utils.get_windows] get_windows_end
[ 2017-07-11 18:42:44,134][fg_win_layer.fit_transform] [data][win1/7x7,test] bottoms.shape=10000x1x28x28,10000
[ 2017-07-11 18:42:44,139][win_utils.get_windows] get_windows_start: X.shape=(10000, 1, 28, 28), X_win.shape=(49, 1210000), nw=11, nh=11, c=1, win_x=7, win_y=7, stride_x=2, stride_y=2
[ 2017-07-11 18:42:44,371][win_utils.get_windows] get_windows_end
[ 2017-07-11 18:42:46,279][base_estimator.fit] X_train.shape=(4839516, 49), y_train.shape=(4839516,)
Killed

Usage questions

So I ran the UCI letter training data with:

python tools/train_cascade.py --model models/uci_letter/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/uci_letter/ca

After the training finished I am looking at
In [1]: object

Can I pass images of letters into gcTree for classification at this point?

Generally speaking I'm not sure what to do once training is finished...

A gcForest example on my own dataset

Hi, I used gcForest to train on the Boiler Water-Wall Tube Dataset. The code is here, and the results and my question are below:

Without multi-grained forests

77.5%

With multi-grained forests

(Sliding window size:{32}): 81.25%
(Sliding window size:{16}, {32}): 82.50%
(Sliding window size:{16}, {32}, {64}): 82.50%
(Sliding window size:{8}, {16}, {32}, {64}): 83.75%
(Sliding window size:{8}, {16}, {32}, {64}, {128}): 82.5%
(Sliding window size:{4}, {8}, {16}, {32}, {64}, {128}): 83.75%

How to use gcforest to fit my own dataset?

In the README, it says I can refer to lib/datasets/mnist.py as an example for my own dataset.
But the MNIST dataset downloaded by mnist.py is a .npz file,
and my own dataset consists of pictures.
So, could you please tell me more about how to use gcforest on my own dataset?
Thank you so much.

Use gcForest to train my own image dataset

Hi, thanks for your excellent work. I want to use gcForest to train on my own image dataset, but I can't find an example to refer to. I tried to write it myself as follows:

using gcForest to classify normal and defect metal images.
image size: 256*256
train size: 320(160 normal + 160 defect)
test size: 80(40 normal + 40 defect)
folder tree:
 data
   --train
     --normal
     --defect
   --test
     --normal
     --decect
 main.py
 cascade.json

main.py

import os
import numpy as np
from PIL import Image
from sklearn.metrics import accuracy_score
from gcforest.gcforest import GCForest
from gcforest.utils.config_utils import load_json

parent_path=os.path.dirname(os.path.realpath(__file__))
train_data_dir = parent_path + '/data/train/'
validation_data_dir = parent_path + '/data/test/'
X_train=[]
Y_train=[]
X_test=[]
Y_test=[]

for directory in os.listdir(train_data_dir):
    for file in os.listdir(train_data_dir+directory):
        print(train_data_dir+directory+"/"+file)
        img=Image.open(train_data_dir+directory+"/"+file).convert('L')
        featurevector=np.array(img).flatten() 
        X_train.append(featurevector)
        Y_train.append(directory)

for directory in os.listdir(validation_data_dir):
    for file in os.listdir(validation_data_dir+directory):
        print(validation_data_dir+directory+"/"+file)
        img=Image.open(validation_data_dir+directory+"/"+file).convert('L')
        featurevector=np.array(img).flatten() 
        X_test.append(featurevector)
        Y_test.append(directory)

config = load_json('cascade.json')
gc = GCForest(config)  # should be a dict

X_train = np.array(X_train)
Y_train = np.array(Y_train)
Y_train = Y_train.reshape(320, 1)
X_test = np.array(X_test)
Y_test = np.array(Y_test)
Y_test = Y_test.reshape(80, 1)

X_train_enc = gc.fit_transform(X_train, Y_train)
pred_X = gc.predict(X_test)
print(pred_X)
# evaluating accuracy
accuracy = accuracy_score(y_true=Y_test, y_pred=pred_X)
print('gcForest accuracy : {}'.format(accuracy))

cascade.json

{
"cascade": {
    "random_state": 0,
    "max_layers": 100,
    "early_stopping_rounds": 3,
    "n_classes": 2,
    "estimators": [
        {"n_folds":5,"type":"XGBClassifier","n_estimators":2,"max_depth":5,"objective":"multi:softprob", "silent":true, "nthread":-1, "learning_rate":0.1},
        {"n_folds":5,"type":"RandomForestClassifier","n_estimators":2,"max_depth":null,"n_jobs":-1},
        {"n_folds":5,"type":"ExtraTreesClassifier","n_estimators":2,"max_depth":null,"n_jobs":-1},
        {"n_folds":5,"type":"LogisticRegression"}
    ]
}
}

But when I run main.py, there is an error:

File "/home/maroon/tmp/gcForest/lib/gcforest/estimators/kfold_wrapper.py", line 71, in fit_transform
assert len(X.shape) == len(y.shape) + 1
AssertionError

I have spent a lot of time on it but still can't solve it. What should I do to handle an image dataset, and is there more material I can refer to?

gcForest 1.1 is here

Hi all.
I've updated the package. Previously I was working on a project about extracting information along decision paths. The paper just got accepted at AAAI-18 (called encoderForest). Tonight I finally got time to update the code.
Please give me feedback if you run into trouble.

Cannot find the '*.pkl' data files

An error came up when I ran 'python tools/train_cascade.py --model models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json' after running 'python tools/train_fg.py --model models/mnist/gcforest/fg-tree500-depth100-3folds.json --log_dir logs/gcforest/mnist/fg --save_outputs':

Loading data from /data/unagi0/CT_Anomaly_Detection/Huang/gcforest/cache/mnist/fg-tree300-depth0/datas/train/outputs.pkl
Traceback (most recent call last):
  File "tools/train_cascade.py", line 38, in <module>
    data_train = get_dataset(config["dataset"]["train"])
  File "lib/gcforest/datasets/__init__.py", line 52, in get_dataset
    return ds_class(**ds_config)
  File "lib/gcforest/datasets/ds_pickle2.py", line 17, in __init__
    with open(data_path) as f:
IOError: [Errno 2] No such file or directory: u'/data/unagi0/CT_Anomaly_Detection/Huang/gcforest/cache/mnist/fg-tree300-depth0/datas/train/outputs.pkl'

I thought the 'outputs.pkl' file would be generated after running train_cascade.py, not required before running it; however, I may be wrong. Is there some setting for the path where outputs.pkl is saved?

Best regards!

plans for gcForest V1.1

For the next version, I plan to write some more wrappers such as
model.fit() / model.predict()
for a more user-friendly API if your goal here is to train the model out-of-box.
(again, currently you actually can extract predictions)

Other suggestions are welcome.
Thanks.

Image Segmentation

Does anyone use gcForest for segmentation? I am a beginner with gcForest and have no idea how to process the image and label. So far I use demo_mnist.py and load the image and label with OpenCV.
The label is a binary image: foreground is 1, background is 0.

Cascade gcForest not working due to missing xrange() function in Python 3.5

I am trying to run a minimal example of a cascade classifier. However, it fails because the xrange() function does not exist in Python 3.5. The example goes as follows (omitting the config and the definition of the training data for clarity):

from gcforest.gcforest import GCForest
gc = GCForest(config)
gc.fit_transform(X_train, y_train, X_test, y_test)

kernel died

When I ran gc.fit_transform(X_train, y_train), the kernel died.

I don't know how to solve this problem...

I modified the model and log dir in the train_fg.py file but encountered some problems

I modified the code in the train_fg.py to test on the mnist dataset and the code is as follows:
parser.add_argument('--model', dest='models/mnist/gcforest/fg-tree500-depth100-3folds.json', type=str, required=True, help='gcfoest Net Model File')

parser.add_argument('--save_outputs', dest='save_outputs', action="store_true", help="Save outputs")

parser.add_argument('--log_dir', dest='logs/gcforest/mnist/fg', type=str, default=None, help='Log file directory')
But the program reported an error:
usage: train_fg.py [-h] --model

MODELS/MNIST/GCFOREST/FG-TREE500-DEPTH100-3FOLDS.JSON

[--save_outputs] [--log_dir LOGS/GCFOREST/MNIST/FG]

train_fg.py: error: argument --model is required
I don't know how to solve this problem and need help. Thank you!

ImportError: No module named gcforest.gcforest

I don't know how to fix it. I ran pip install -r requirements.txt successfully, but it reports another error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "/home/mingyi/gcForest-master/examples/demo_mnist.py", line 19, in <module>
    from gcforest.gcforest import GCForest
ImportError: No module named gcforest.gcforest
Thanks!

Hi, How could I reuse the model after training?

That is to say, how do I save the whole trained model to disk so that I can reuse it on other test sets to evaluate performance? At the moment, every evaluation on a new test set requires retraining the model, which wastes too much time.
Thank you.

Is the `argparse` package still needed for the Python 3.5 version?

When I deploy the gcForest in the Anaconda environment with Python 3.5 using the following command

conda create -n gcForest python=3.5 argparse

the following error message is displayed:

Solving environment: failed

UnsatisfiableError: The following specifications were found to be in conflict:
  - argparse -> python=2.6
  - python=3.5
Use "conda info <package>" to see the dependencies for each package.

It seems that there is a conflict between the Python 3.5 environment and the argparse package. So I am wondering whether argparse is still required for the Python 3.5 version of the code?

I am wondering why there is no field to control trees' scale in the cascade config --- "demo_mnist-ca.json"

The original cascade config:

"cascade": {
"random_state": 0,
"max_layers": 100,
"early_stopping_rounds": 3,
"look_indexs_cycle": [
[0, 1],
[2, 3],
[4, 5]
],
"n_classes": 10,
"estimators": [
{"n_folds":5,"type":"ExtraTreesClassifier","n_estimators":1000,"max_depth":null,"n_jobs":-1},
{"n_folds":5,"type":"RandomForestClassifier","n_estimators":1000,"max_depth":null,"n_jobs":-1}
]
}
}

Sometimes we can set "min_samples_leaf": 10 or "max_depth": 10 in the estimators to control the size of each tree in the forest.

fg-tree500-depth100-3folds.json

"train":{
"keep_model_in_mem":0,
//"model_cache_dir":"/mnt/raid/fengji/gcforest/mnist/fg-tree500-depth100-3folds/models",
"random_state":0,
"data_cache":{
"cache_in_disk":{
"default":1
},
"keep_in_mem":{
"default":0
},
"cache_dir":"/mnt/raid/fengji/gcforest/mnist/fg-tree500-depth100-3folds/datas"
}
}

I would like to ask: does "keep_model_in_mem": 0 above mean the trained models are saved to external storage, while "keep_model_in_mem": 1 means they are kept in memory? Also, I don't know what "data_cache" and "cache_dir" mean, or what they are used to store. Could you help explain?

Issue about the demo MNIST

@kingfengji
Hi!
I'm trying to run the demo following the readme.txt.
When I tried to train on MNIST, I input this:
image
And the program begins running.
BUT, it stopped here:
image
I've been waiting for a long time, but it just stopped, no error or warning.
What happened to the demo? Did I do something wrong?
Looking forward to your reply.
Thank you.

How can I install the package?

I've been waiting for the "official package" for a long while since the original paper was released. Nice job!!

Just trying to get my head around the installation, as I couldn't seem to find where to get started. The readme.md mentions that one needs to install the dependencies to make it work, but what about the main package itself? My apologies if the question sounds naive, but if you could add a setup.py or similar, that would be very helpful.

Training the fine-grained forest with the cifar10 model stops right after [win_utils.get_windows] get_windows_end with a MemoryError!

I ran the example cifar10 model to train a fine-grained forest. The environment is Ubuntu 16.04 LTS, PyCharm Community Edition, Python 2.7, 8 GB memory. When the code runs to
[win_utils.get_windows] get_windows_start: X.shape=(10000, 3, 32, 32), X_win.shape=(192, 1690000), nw=13, nh=13, c=3, win_x=8, win_y=8, stride_x=2, stride_y=2
[win_utils.get_windows]get_windows_end
the terminal shows the following Error:
Traceback (most recent call last):
File "tools/train_fg.py", line 49, in
net.fit_transform(data_train.X, data_train.y, data_test.X, data_test.y, train_config)
File "lib/gcforest/fgnet.py", line 53, in fit_transform
layer.fit_transform(train_config)
File "lib/gcforest/layers/fg_win_layer.py", line 106, in fit_transform
keep_model_in_mem=train_config.keep_model_in_mem)
File "lib/gcforest/estimators/kfold_wrapper.py", line 98, in fit_transform
est.fit(X[train_idx].reshape((-1, n_dims)), y[train_idx].reshape(-1), cache_dir=cache_dir)
MemoryError

I would be grateful to anyone who can tell me where the problem is.

I intended to run MNIST after making the corresponding changes according to README.md, but some problems have arisen

root@ballonflower-HP-Pavilion-Notebook:/home/ballonflower/gcForest-master# python tools/train_fg.py --model models/mnist/gcforest/fg-tree500-depth100-3folds.json --log_dir logs/gcforest/mnist/fg --save_outputs
Using TensorFlow backend.
[ 2017-09-07 16:19:28,970][train_fg.] tools.train_fg
[ 2017-09-07 16:19:28,971][train_fg.]
{
"dataset":{
"test":{
"data_set":"test",
"layout_x":"tensor",
"type":"mnist"
},
"train":{
"data_set":"train",
"layout_x":"tensor",
"type":"mnist"
}
},
"net":{
"layers":[
{
"bottoms":[
"X",
"y"
],
"estimators":[
{
"max_depth":100,
"min_samples_leaf":10,
"n_estimators":500,
"n_folds":3,
"n_jobs":-1,
"type":"ExtraTreesClassifier"
},
{
"max_depth":100,
"min_samples_leaf":10,
"n_estimators":500,
"n_folds":3,
"n_jobs":-1,
"type":"RandomForestClassifier"
}
],
"n_classes":10,
"name":"win1/7x7",
"stride_x":2,
"stride_y":2,
"tops":[
"win1/7x7/ets",
"win1/7x7/rf"
],
"type":"FGWinLayer",
"win_x":7,
"win_y":7
},
{
"bottoms":[
"X",
"y"
],
"estimators":[
{
"max_depth":100,
"min_samples_leaf":10,
"n_estimators":500,
"n_folds":3,
"n_jobs":-1,
"type":"ExtraTreesClassifier"
},
{
"max_depth":100,
"min_samples_leaf":10,
"n_estimators":500,
"n_folds":3,
"n_jobs":-1,
"type":"RandomForestClassifier"
}
],
"n_classes":10,
"name":"win1/10x10",
"stride_x":2,
"stride_y":2,
"tops":[
"win1/10x10/ets",
"win1/10x10/rf"
],
"type":"FGWinLayer",
"win_x":10,
"win_y":10
},
{
"bottoms":[
"X",
"y"
],
"estimators":[
{
"max_depth":100,
"min_samples_leaf":10,
"n_estimators":500,
"n_folds":3,
"n_jobs":-1,
"type":"ExtraTreesClassifier"
},
{
"max_depth":100,
"min_samples_leaf":10,
"n_estimators":500,
"n_folds":3,
"n_jobs":-1,
"type":"RandomForestClassifier"
}
],
"n_classes":10,
"name":"win1/13x13",
"stride_x":2,
"stride_y":2,
"tops":[
"win1/13x13/ets",
"win1/13x13/rf"
],
"type":"FGWinLayer",
"win_x":13,
"win_y":13
},
{
"bottoms":[
"win1/7x7/ets",
"win1/7x7/rf",
"win1/10x10/ets",
"win1/10x10/rf",
"win1/13x13/ets",
"win1/13x13/rf"
],
"name":"pool1",
"pool_method":"avg",
"tops":[
"pool1/7x7/ets",
"pool1/7x7/rf",
"pool1/10x10/ets",
"pool1/10x10/rf",
"pool1/13x13/ets",
"pool1/13x13/rf"
],
"type":"FGPoolLayer",
"win_x":2,
"win_y":2
}
],
"outputs":[
"pool1/7x7/ets",
"pool1/7x7/rf",
"pool1/10x10/ets",
"pool1/10x10/rf",
"pool1/13x13/ets",
"pool1/13x13/rf"
]
},
"train":{
"data_cache":{
"cache_dir":"/mnt/raid/fengji/gcforest/mnist/fg-tree500-depth100-3folds/datas",
"cache_in_disk":{
"default":1
},
"keep_in_mem":{
"default":0
}
},
"keep_model_in_mem":0,
"random_state":0
}
}
Traceback (most recent call last):
  File "tools/train_fg.py", line 45, in <module>
    data_train = get_dataset(config["dataset"]["train"])
  File "lib/gcforest/datasets/__init__.py", line 52, in get_dataset
    return ds_class(**ds_config)
  File "lib/gcforest/datasets/mnist.py", line 24, in __init__
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
  File "/usr/local/lib/python2.7/dist-packages/keras/datasets/mnist.py", line 16, in load_data
    f = np.load(path)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 412, in load
    pickle_kwargs=pickle_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 171, in __init__
    _zip = zipfile_factory(fid)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 101, in zipfile_factory
    return zipfile.ZipFile(file, *args, **kwargs)
  File "/usr/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/usr/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
Exception AttributeError: "'NpzFile' object has no attribute 'zip'" in <bound method NpzFile.__del__ of <numpy.lib.npyio.NpzFile object at 0x7fa6a707ef10>> ignored

No module named joblib

Doing the import

from gcforest.gcforest import GCForest

gives the error:

No module named 'joblib'

Solved by changing the win_utils.py to include

from sklearn.externals.joblib import Parallel, delayed

NameError: name 'basestring' is not defined

I downloaded the latest version of gcForest and built the environment. When I run the command "python examples/demo_mnist.py --model examples/demo_mnist-gc.json", the message is as follows:
Using TensorFlow backend.

Traceback (most recent call last):
  File "/home/cf/Documents/gcForest/examples/demo_mnist.py", line 55, in <module>
    gc = GCForest(config)
  File "lib/gcforest/gcforest.py", line 16, in __init__
    self.fg = FGNet(self.config["net"], self.train_config.data_cache)
  File "lib/gcforest/fgnet.py", line 36, in __init__
    layer = get_layer(layer_config, self.data_cache)
  File "lib/gcforest/layers/__init__.py", line 32, in get_layer
    layer = layer_class(layer_config, data_cache)
  File "lib/gcforest/layers/fg_pool_layer.py", line 27, in __init__
    self.pool_method = self.get_value("pool_method", "avg", basestring)
NameError: name 'basestring' is not defined

I think the problem may be the variable 'basestring'. Could you fix this bug?

Besides this, there is another bug in this version: line 26 in base_layer.py

def get_value(self, key, default_value, value_types, required=False, config=None):
    return get_config_value(config or self.layer_config, key, default_value, value_types,
                            required=required, config_name=self.name)
    return value

This function has two return statements.
@kingfengji

After I modified the file "demo_mnist-gc.json" and tried to save the model, the following problem occurs:

[ 2018-03-06 14:31:54,717][fgnet.fit_transform] X_train.shape=(60000, 1, 28, 28), y_train.shape=(60000,), X_test.shape=(10000, 1, 28, 28), y_test.shape=(10000,)
[ 2018-03-06 14:31:54,719][fg_win_layer.fit_transform] [data][win1/7x7], bottoms=[u'X', u'y'], tops=[u'win1/7x7/ets', u'win1/7x7/rf']
[ 2018-03-06 14:31:54,719][fg_win_layer.fit_transform] [progress][win1/7x7] ti=0/2, top_name=win1/7x7/ets
[ 2018-03-06 14:31:54,721][fg_win_layer.fit_transform] [data][win1/7x7,train] bottoms.shape=60000x1x28x28,60000
[ 2018-03-06 14:31:54,753][win_utils.get_windows] get_windows_start: X.shape=(60000, 1, 28, 28), X_win.shape=(49, 7260000), nw=11, nh=11, c=1, win_x=7, win_y=7, stride_x=2, stride_y=2
[ 2018-03-06 14:31:54,994][win_utils.get_windows] get_windows_end
[ 2018-03-06 14:31:55,043][fg_win_layer.fit_transform] [data][win1/7x7,test] bottoms.shape=10000x1x28x28,10000
[ 2018-03-06 14:31:55,049][win_utils.get_windows] get_windows_start: X.shape=(10000, 1, 28, 28), X_win.shape=(49, 1210000), nw=11, nh=11, c=1, win_x=7, win_y=7, stride_x=2, stride_y=2
[ 2018-03-06 14:31:55,164][win_utils.get_windows] get_windows_end
[ 2018-03-06 14:32:54,977][base_estimator.fit] Save estimator to /home/zhaopanpan/gcforest/mnist/fg-tree50-depth10-3folds/models/win1-7x7-ets-3_folds/win1-7x7-ets-3_folds-0.pkl ...
[ 2018-03-06 14:32:55,601][base_estimator.predict_proba] Load estimator from /home/zhaopanpan/gcforest/mnist/fg-tree50-depth10-3folds/models/win1-7x7-ets-3_folds/win1-7x7-ets-3_folds-0.pkl ...
[ 2018-03-06 14:32:55,603][base_estimator.predict_proba] done ...
Traceback (most recent call last):
File "examples/demo_mnist.py", line 65, in
X_All_enc = gc.fit_transform(X_train, y_train, X_test=X_test, y_test=y_test)
File "lib/gcforest/gcforest.py", line 31, in fit_transform
self.fg.fit_transform(X_train, y_train, X_test, y_test, train_config)
File "lib/gcforest/fgnet.py", line 54, in fit_transform
layer.fit_transform(train_config)
File "lib/gcforest/layers/fg_win_layer.py", line 107, in fit_transform
keep_model_in_mem=train_config.keep_model_in_mem)
File "lib/gcforest/estimators/kfold_wrapper.py", line 101, in fit_transform
y_proba = est.predict_proba(X[val_idx].reshape((-1, n_dims)), cache_dir=cache_dir)
File "lib/gcforest/estimators/base_estimator.py", line 77, in predict_proba
batch_size = batch_size or self.default_predict_batch_size(est, X)
File "lib/gcforest/estimators/sklearn_estimators.py", line 44, in default_predict_batch_size
return forest_predict_batch_size(clf, X)
File "lib/gcforest/estimators/sklearn_estimators.py", line 23, in forest_predict_batch_size
mem_size_1 = clf.n_classes_ * clf.n_estimators * 16
AttributeError: 'NoneType' object has no attribute 'n_classes_'

The modified demo_mnist-gc.json content is as follows:
"dataset":{
"train": {"type": "mnist", "data_set": "train", "layout_x": "tensor"},
"test": {"type": "mnist", "data_set": "test", "layout_x": "tensor"}
},
"train":{
"keep_model_in_mem":1,
"model_cache_dir":"/home/zhaopanpan/gcforest/mnist/fg-tree50-depth10-3folds/models",
"random_state":0,
"data_cache":{
"cache_in_disk":{
"default":1
},
"keep_in_mem":{
"default":0
},
"cache_dir":"/home/zhaopanpan/gcforest/mnist/fg-tree50-depth10-3folds/datas"
}
},
"dataset": {
"train": {
"type": "ds_pickle2",
"data_path": "/home/zhaopanpan/gcforest/mnist/fg-tree50-depth10-3folds/datas/train/outputs.pkl",
"X_keys": ["pool1/7x7/ets", "pool1/7x7/rf", "pool1/10x10/ets", "pool1/10x10/rf", "pool1/13x13/ets", "pool1/13x13/rf"]
},
"test": {
"type": "ds_pickle2",
"data_path": "/home/zhaopanpan/gcforest/mnist/fg-tree50-depth10-3folds/datas/test/outputs.pkl",
"X_keys": ["pool1/7x7/ets", "pool1/7x7/rf", "pool1/10x10/ets", "pool1/10x10/rf", "pool1/13x13/ets", "pool1/13x13/rf"]
}
},
The rest is the same as the original on GitHub. Since the problem occurred, I have checked the code several times and found no errors. Hoping someone can help me.

How could I predict the test set?

I'm not talking about the validation set with golden labels, but the test set.
When I have trained the model, I want to predict the test set. But I could not find the predict function or any configuration.

Maybe there is something wrong in "gcforest-master/lib/gcforest/cascade/cascade_classifier.py"?

in the function "def _check_group_dims (self, X_groups, is_fit):"

if is_fit:
    group_dims.append(X_group.shape[1])
    # According to the paper, I think the next line should be
    # group_starts.append(i if i == 0 else group_starts[i - 1] + group_dims[i - 1])
    group_starts.append(i if i == 0 else group_starts[i - 1] + group_dims[i])
    group_ends.append(group_starts[i] + group_dims[i])

Because the look_indexs_cycle is [[0,1],[2,3],[4,5]], i.e. consecutive, the original inputs (the six pool1 groups) should sit next to each other. If it is written as "group_starts.append(i if i == 0 else group_starts[i - 1] + group_dims[i])", then one of the look_indexs_cycle groups fed to the layer is not consecutive.
Is that right? I am still thinking about it.

ValueError: look_indexs unlegal!!! look_indexs=[6, 7]

When I try to run "python tools/train_cascade.py --model models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json", there is the following error:

File "lib/gcforest/cascade/cascade_classifier.py", line 123, in fit_transform
raise ValueError("look_indexs unlegal!!! look_indexs={}".format(look_indexs))
ValueError: look_indexs unlegal!!! look_indexs=[6, 7]

I don't know what the problem is; can anyone help me?

Can't run with Python 3

Environment: Python 3.6, CentOS 7.

code in lib/gcforest/layers/fg_pool_layer.py:

class FGPoolLayer(BaseLayer):
    def __init__(self, layer_config, data_cache):
        """
        Pooling Layer (MaxPooling, AveragePooling)
        """
        super(FGPoolLayer, self).__init__(layer_config, data_cache)
        self.win_x = self.get_value("win_x", None, int, required=True)
        self.win_y = self.get_value("win_y", None, int, required=True)
        self.pool_method = self.get_value("pool_method", "avg", basestring)
.....

When running with the config examples/demo_mnist-gc.json, it throws an error:

Traceback (most recent call last):
  File "examples/demo_mnist.py", line 55, in <module>
    gc = GCForest(config)
  File "lib/gcforest/gcforest.py", line 16, in __init__
    self.fg = FGNet(self.config["net"], self.train_config.data_cache)
  File "lib/gcforest/fgnet.py", line 36, in __init__
    layer = get_layer(layer_config, self.data_cache)
  File "lib/gcforest/layers/__init__.py", line 32, in get_layer
    layer = layer_class(layer_config, data_cache)
  File "lib/gcforest/layers/fg_pool_layer.py", line 27, in __init__
    self.pool_method = self.get_value("pool_method", "avg", basestring)
NameError: name 'basestring' is not defined

The basestring type no longer exists in Python 3.

How to train the model on my own dataset?

@kingfengji
Thank you for your effort on the v1.1 !
I would like to train the model on my own RGB image dataset.
But I noticed the data in the examples are all .pkl files.
So should I convert my dataset to .pkl? Or is there another way to train the model on my own dataset?
Waiting for your reply!
Thanks again for the v1.1 !
