flennerhag / mlens

831 stars · 29 watchers · 106 forks · 10.18 MB

ML-Ensemble – high-performance ensemble learning

Home Page: http://ml-ensemble.com

License: MIT License

Languages: Python 99.97%, Shell 0.03%
Topics: ensemble-learning, machine-learning, ensemble, learners, stacking, stack, ensembles, python

mlens's People

Contributors

flennerhag, jlopezpena


mlens's Issues

Loading pickled models causes NotFittedError

Thanks for this model, it is a great idea.
I build a multi-layer ensemble and use pickle.dump to store the trained model, but when I load the stored model the following error occurs:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/ensemble/base.py", line 615, in predict
    return self._backend.predict(X, **kwargs)
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/ensemble/base.py", line 206, in predict
    out = self._predict(X, 'predict', **kwargs)
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/ensemble/base.py", line 266, in _predict
    out = manager.stack(self, job, X, return_preds=r, **kwargs)
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/parallel/backend.py", line 655, in stack
    return self.process(caller=caller, out=out, **kwargs)
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/parallel/backend.py", line 700, in process
    self._partial_process(task, parallel, **kwargs)
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/parallel/backend.py", line 721, in _partial_process
    task(self.job.args(**kwargs), parallel=parallel)
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/parallel/layer.py", line 123, in __call__
    "Layer instance (%s) not fitted." % self.name)
mlens.utils.exceptions.NotFittedError: Layer instance (layer-1) not fitted.

So how can I store and reuse my trained model?
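For reference, a minimal sketch of the store-and-reload round trip, assuming the ensemble is fitted before it is stored (the estimators, data, and file name here are illustrative, not taken from the report):

import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from mlens.ensemble import SuperLearner

X, y = make_classification(n_samples=200, random_state=0)

# Fit the ensemble BEFORE persisting it; pickling an unfitted
# ensemble and reloading it raises NotFittedError at predict time.
ensemble = SuperLearner()
ensemble.add([LogisticRegression()])
ensemble.add_meta(LogisticRegression())
ensemble.fit(X, y)

# Store the fitted object ...
with open("ensemble.pkl", "wb") as f:
    pickle.dump(ensemble, f)

# ... and reload it for prediction.
with open("ensemble.pkl", "rb") as f:
    loaded = pickle.load(f)

preds = loaded.predict(X)  # should not raise NotFittedError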

Error while using AUC-ROC as the scoring metric

When adding any model to the ensemble layer with roc_auc_score as the scoring metric, the following error is thrown:

RuntimeError: Cannot clone object Learner(attr='predict', backend='threading', dtype=<class 'numpy.float32'>,
estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=6, max_features='sqrt', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=300, n_jobs=1,
oob_score=True, random_state=1, verbose=0, warm_start=False),
indexer=FoldIndex(X=None, folds=2, raise_on_exception=True), n_jobs=-1,
name='randomforestclassifier', preprocess=None, proba=False,
raise_on_exception=True, scorer=make_scorer(roc_auc_score)), as the constructor does not seem to set parameter scorer
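A hedged workaround, assuming the clone failure comes from passing a sklearn make_scorer object (which the Learner constructor cannot round-trip as a parameter): hand mlens a plain (y_true, y_pred) callable instead.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from mlens.ensemble import SuperLearner

def auc(y_true, y_pred):
    # A plain (y_true, y_pred) callable survives cloning as an
    # ordinary parameter, unlike a make_scorer(...) object.
    return roc_auc_score(y_true, y_pred)

ensemble = SuperLearner(scorer=auc)
ensemble.add([RandomForestClassifier(n_estimators=300, random_state=1)])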

Silence subprocess warnings on Windows

Unit testing on Windows with multiprocessing yields the warning

ResourceWarning: unclosed file <_io.BufferedReader name=4>

which is due to how mlens.parallel.ParallelProcessing.close() handles cache destruction.

Find the cause of the warning and amend it. If it is due to nosetests, capture the warning when running the unit tests.
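A minimal sketch of capturing the warning in a test (standard library only; run_multiprocessing_job is a placeholder for whatever mlens job the test exercises):

import warnings

def test_cache_destruction_closes_files():
    # Record every ResourceWarning emitted while the job runs.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ResourceWarning)
        run_multiprocessing_job()  # placeholder for the job under test
    leaks = [w for w in caught if issubclass(w.category, ResourceWarning)]
    assert not leaks, "unclosed file handles after cache destruction"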

Multithreading issue - IndexError

Here is my code:

# Imports needed to run this snippet; seed = 2017 and np.random.seed(seed)
# appear in the notebook namespace dumped in the traceback below.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from mlens.ensemble import SuperLearner

seed = 2017
np.random.seed(seed)

class MyClass(LinearRegression):

    def __init__(self, **kwargs):
        super(MyClass, self).__init__(**kwargs)

    def fit(self, X, y):
        """Fit estimator."""
        super(MyClass, self).fit(X, y)
        return self

    def predict(self, X):
        """Generate partition"""
        p = super(MyClass, self).predict(X)
        return 1 * (p > p.mean())

def build_ensemble(incl_meta, propagate_features=None):
    """Return an ensemble."""
    if propagate_features:
        n = len(propagate_features)
        propagate_features_1 = propagate_features
        propagate_features_2 = [i for i in range(n)]
    else:
        propagate_features_1 = propagate_features_2 = None

    estimators = [RandomForestRegressor(random_state=seed, n_jobs=6), RandomForestRegressor(n_jobs=5)]

    ensemble = SuperLearner()
    ensemble.add(estimators, propagate_features=propagate_features_1)
    ensemble.add(estimators, propagate_features=propagate_features_2)

    if incl_meta:
        ensemble.add_meta(MyClass())
    return ensemble

# X and y are the training data loaded in an earlier notebook cell
base = build_ensemble(False, [1, 3])
base.fit(X, y)
pred = base.predict(X)[:5]
print("Input to meta learner :\n %r" % pred)

I think the reason for the error is the n_jobs selection. Any thoughts about it? The full error output is reproduced after the sketch below.
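One way to test that hypothesis, sketched on the assumption that the clash comes from nesting forest workers inside the ensemble's own worker pool: keep parallelism in one place by making the inner forests single-threaded and letting SuperLearner drive the workers (same build_ensemble as above, only the n_jobs values change).

# Inner estimators run serially; the ensemble parallelizes across them.
estimators = [RandomForestRegressor(random_state=seed, n_jobs=1),
              RandomForestRegressor(n_jobs=1)]
ensemble = SuperLearner(n_jobs=-1)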

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/_parallel_backends.py in __call__(self, *args, **kwargs)
    349         try:
--> 350             return self.func(*args, **kwargs)
    351         except KeyboardInterrupt:

~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in __call__(self)
    134     def __call__(self):
--> 135         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    136 

~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in <listcomp>(.0)
    134     def __call__(self):
--> 135         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    136 

~/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in __call__(self)
    124         """Launch job"""
--> 125         return getattr(self, self.job)()
    126 

~/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in fit(self, path)
    133 
--> 134         self._fit(transformers)
    135 

~/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in _fit(self, transformers)
    179         # Fit estimator
--> 180         self.estimator.fit(xtemp, ytemp)
    181         self.fit_time_ = time() - t0

~/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/forest.py in fit(self, X, y, sample_weight)
    315                 tree = self._make_estimator(append=False,
--> 316                                             random_state=random_state)
    317                 trees.append(tree)

~/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/base.py in _make_estimator(self, append, random_state)
    126         estimator.set_params(**dict((p, getattr(self, p))
--> 127                                     for p in self.estimator_params))
    128 

~/anaconda3/lib/python3.6/site-packages/sklearn/base.py in set_params(self, **params)
    264             return self
--> 265         valid_params = self.get_params(deep=True)
    266 

~/anaconda3/lib/python3.6/site-packages/sklearn/base.py in get_params(self, deep)
    240             finally:
--> 241                 warnings.filters.pop(0)
    242 

IndexError: pop from empty list

During handling of the above exception, another exception occurred:

TransportableException                    Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in retrieve(self)
    702                 if getattr(self._backend, 'supports_timeout', False):
--> 703                     self._output.extend(job.get(timeout=self.timeout))
    704                 else:

~/anaconda3/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    643         else:
--> 644             raise self._value
    645 

~/anaconda3/lib/python3.6/multiprocessing/pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
    118         try:
--> 119             result = (True, func(*args, **kwds))
    120         except Exception as e:

~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/_parallel_backends.py in __call__(self, *args, **kwargs)
    358             text = format_exc(e_type, e_value, e_tb, context=10, tb_offset=1)
--> 359             raise TransportableException(text, e_type)
    360 

TransportableException: TransportableException
___________________________________________________________________________
IndexError                                         Thu Jan 25 17:26:56 2018
PID: 3404                   Python 3.6.3: /home/pyybor/anaconda3/bin/python
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in __call__(self=<mlens.externals.joblib.parallel.BatchedCalls object>)
    130     def __init__(self, iterator_slice):
    131         self.items = list(iterator_slice)
    132         self._size = len(self.items)
    133 
    134     def __call__(self):
--> 135         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<mlens.parallel.learner.SubLearner object>, (), {})]
    136 
    137     def __len__(self):
    138         return self._size
    139 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    130     def __init__(self, iterator_slice):
    131         self.items = list(iterator_slice)
    132         self._size = len(self.items)
    133 
    134     def __call__(self):
--> 135         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <mlens.parallel.learner.SubLearner object>
        args = ()
        kwargs = {}
    136 
    137     def __len__(self):
    138         return self._size
    139 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in __call__(self=<mlens.parallel.learner.SubLearner object>)
    120         else:
    121             self.processing_index = ''
    122 
    123     def __call__(self):
    124         """Launch job"""
--> 125         return getattr(self, self.job)()
        self = <mlens.parallel.learner.SubLearner object>
        self.job = 'fit'
    126 
    127     def fit(self, path=None):
    128         """Fit sub-learner"""
    129         if not path:

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in fit(self=<mlens.parallel.learner.SubLearner object>, path=[])
    129         if not path:
    130             path = self.path
    131         t0 = time()
    132         transformers = self._load_preprocess(path)
    133 
--> 134         self._fit(transformers)
        self._fit = <bound method SubLearner._fit of <mlens.parallel.learner.SubLearner object>>
        transformers = None
    135 
    136         if self.out_array is not None:
    137             self._predict(transformers, self.scorer is not None)
    138 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in _fit(self=<mlens.parallel.learner.SubLearner object>, transformers=None)
    175         t0 = time()
    176         if transformers:
    177             xtemp, ytemp = transformers.transform(xtemp, ytemp)
    178 
    179         # Fit estimator
--> 180         self.estimator.fit(xtemp, ytemp)
        self.estimator.fit = <bound method BaseForest.fit of RandomForestRegr... random_state=2017, verbose=0, warm_start=False)>
        xtemp = array([[  1.00000000e+00,   3.63000000e+02,   2....04,   2.66834227e-08,
          5.41752459e+02]])
        ytemp = array([ -1.95638686e-04,  -3.79831391e-03,  -2.9...9152394e-05,   3.94220996e-05,  -8.21904840e-06])
    181         self.fit_time_ = time() - t0
    182 
    183     def _load_preprocess(self, path):
    184         """Load preprocessing pipeline"""

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/forest.py in fit(self=RandomForestRegressor(bootstrap=True, criterion=..., random_state=2017, verbose=0, warm_start=False), X=array([[  1.00000000e+00,   3.63000000e+02,   2....5e-08,
          5.41752441e+02]], dtype=float32), y=array([[ -1.95638686e-04],
       [ -3.79831391e...  [  3.94220996e-05],
       [ -8.21904840e-06]]), sample_weight=None)
    311                 random_state.randint(MAX_INT, size=len(self.estimators_))
    312 
    313             trees = []
    314             for i in range(n_more_estimators):
    315                 tree = self._make_estimator(append=False,
--> 316                                             random_state=random_state)
        random_state = <mtrand.RandomState object>
    317                 trees.append(tree)
    318 
    319             # Parallel loop: we use the threading backend as the Cython code
    320             # for fitting the trees is internally releasing the Python GIL

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/base.py in _make_estimator(self=RandomForestRegressor(bootstrap=True, criterion=..., random_state=2017, verbose=0, warm_start=False), append=False, random_state=<mtrand.RandomState object>)
    122         Warning: This method should be used to properly instantiate new
    123         sub-estimators.
    124         """
    125         estimator = clone(self.base_estimator_)
    126         estimator.set_params(**dict((p, getattr(self, p))
--> 127                                     for p in self.estimator_params))
        self.estimator_params = ('criterion', 'max_depth', 'min_samples_split', 'min_samples_leaf', 'min_weight_fraction_leaf', 'max_features', 'max_leaf_nodes', 'min_impurity_decrease', 'min_impurity_split', 'random_state')
    128 
    129         if random_state is not None:
    130             _set_random_states(estimator, random_state)
    131 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/base.py in set_params(self=DecisionTreeRegressor(criterion='mse', max_depth...resort=False, random_state=None, splitter='best'), **params={'criterion': 'mse', 'max_depth': None, 'max_features': 'auto', 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_impurity_split': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'random_state': 2017})
    260         self
    261         """
    262         if not params:
    263             # Simple optimization to gain speed (inspect is slow)
    264             return self
--> 265         valid_params = self.get_params(deep=True)
        valid_params = undefined
        self.get_params = <bound method BaseEstimator.get_params of Decisi...esort=False, random_state=None, splitter='best')>
    266 
    267         nested_params = defaultdict(dict)  # grouped by prefix
    268         for key, value in params.items():
    269             key, delim, sub_key = key.partition('__')

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/base.py in get_params(self=DecisionTreeRegressor(criterion='mse', max_depth...resort=False, random_state=None, splitter='best'), deep=True)
    236                     value = getattr(self, key, None)
    237                 if len(w) and w[0].category == DeprecationWarning:
    238                     # if the parameter is deprecated, don't show it
    239                     continue
    240             finally:
--> 241                 warnings.filters.pop(0)
    242 
    243             # XXX: should we rather test if instance of estimator?
    244             if deep and hasattr(value, 'get_params'):
    245                 deep_items = value.get_params().items()

IndexError: pop from empty list
___________________________________________________________________________

During handling of the above exception, another exception occurred:

JoblibIndexError                          Traceback (most recent call last)
<ipython-input-49-f8f29c8588e9> in <module>()
----> 1 score_no_prep = evaluate_ensemble(None)
      2 score_prep = evaluate_ensemble([1,3])
      3 print("Test set score no feature propagation  : %.3f" % score_no_prep)
      4 print("Test set score with feature propagation: %.3f" % score_prep)

<ipython-input-46-a9bf13defd9a> in evaluate_ensemble(propagate_features)
      2     """Wrapper for ensemble evaluation."""
      3     ens = build_ensemble(True, propagate_features)
----> 4     ens.fit(X.iloc[:75].values, Y.iloc[:75].values)
      5     pred = ens.predict(X.iloc[75:2000].values)
      6     #print(pred[:5])

~/anaconda3/lib/python3.6/site-packages/mlens/ensemble/base.py in fit(self, X, y, **kwargs)
    514             self._id_train.fit(X)
    515 
--> 516         out = self._backend.fit(X, y, **kwargs)
    517         if out is not self._backend:
    518             # fit_transform

~/anaconda3/lib/python3.6/site-packages/mlens/ensemble/base.py in fit(self, X, y, **kwargs)
    156         with ParallelProcessing(self.backend, self.n_jobs,
    157                                 max(self.verbose - 4, 0)) as manager:
--> 158             out = manager.stack(self, 'fit', X, y, **kwargs)
    159 
    160         if self.verbose:

~/anaconda3/lib/python3.6/site-packages/mlens/parallel/backend.py in stack(self, caller, job, X, y, path, return_preds, wart_start, split, **kwargs)
    653             job=job, X=X, y=y, path=path, warm_start=wart_start,
    654             return_preds=return_preds, split=split, stack=True)
--> 655         return self.process(caller=caller, out=out, **kwargs)
    656 
    657     def process(self, caller, out, **kwargs):

~/anaconda3/lib/python3.6/site-packages/mlens/parallel/backend.py in process(self, caller, out, **kwargs)
    698                 self.job.clear()
    699 
--> 700                 self._partial_process(task, parallel, **kwargs)
    701 
    702                 if task.name in return_names:

~/anaconda3/lib/python3.6/site-packages/mlens/parallel/backend.py in _partial_process(self, task, parallel, **kwargs)
    719             self._gen_prediction_array(task, self.job.job, self.__threading__)
    720 
--> 721         task(self.job.args(**kwargs), parallel=parallel)
    722 
    723         if not task.__no_output__ and getattr(task, 'n_feature_prop', 0):

~/anaconda3/lib/python3.6/site-packages/mlens/parallel/layer.py in __call__(self, args, parallel)
    150 
    151         parallel(delayed(sublearner, not _threading)()
--> 152                  for learner in self.learners
    153                  for sublearner in learner(args, 'main'))
    154 

~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in __call__(self, iterable)
    791                 # consumption.
    792                 self._iterating = False
--> 793             self.retrieve()
    794             # Make sure that we get a last message telling us we are done
    795             elapsed_time = time.time() - self._start_time

~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in retrieve(self)
    742                     exception = exception_type(report)
    743 
--> 744                     raise exception
    745 
    746     def __call__(self, iterable):

JoblibIndexError: JoblibIndexError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
[Frames from the IPython kernel plumbing elided: runpy, ipykernel_launcher, traitlets, ipykernel.kernelbase, the zmq/tornado event loop, and IPython.core.interactiveshell only relay the executed cell down to the user code below.]
...........................................................................
/home/pyybor/g_ch/<ipython-input-49-f8f29c8588e9> in <module>()
----> 1 score_no_prep = evaluate_ensemble(None)
      2 score_prep = evaluate_ensemble([1,3])
      3 print("Test set score no feature propagation  : %.3f" % score_no_prep)
      4 print("Test set score with feature propagation: %.3f" % score_prep)

...........................................................................
/home/pyybor/g_ch/<ipython-input-46-a9bf13defd9a> in evaluate_ensemble(propagate_features=None)
      1 def evaluate_ensemble(propagate_features):
      2     """Wrapper for ensemble evaluation."""
      3     ens = build_ensemble(True, propagate_features)
----> 4     ens.fit(X.iloc[:75].values, Y.iloc[:75].values)
      5     pred = ens.predict(X.iloc[75:2000].values)
      6     #print(pred[:5])
      7     return np.sqrt(mean_squared_error(pred, Y.iloc[75:2000].values))
      8 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/ensemble/base.py in fit(self=SuperLearner(array_check=2, backend=None, folds=...scorer=None, shuffle=False,
       verbose=False), X=array([[  1.00000000e+00,   3.63000000e+02,   2....04,   2.66834227e-08,
          5.41752459e+02]]), y=array([ -1.95638686e-04,  -3.79831391e-03,  -2.9...9152394e-05,   3.94220996e-05,  -8.21904840e-06]), **kwargs={})
    511         X, y = check_inputs(X, y, self.array_check)
    512 
    513         if self.model_selection:
    514             self._id_train.fit(X)
    515 
--> 516         out = self._backend.fit(X, y, **kwargs)
        out = undefined
        self._backend.fit = <bound method Sequential.fit of Sequential(backe...rmers=[])],
   verbose=0)],
      verbose=False)>
        X = array([[  1.00000000e+00,   3.63000000e+02,   2....04,   2.66834227e-08,
          5.41752459e+02]])
        y = array([ -1.95638686e-04,  -3.79831391e-03,  -2.9...9152394e-05,   3.94220996e-05,  -8.21904840e-06])
        kwargs = {}
    517         if out is not self._backend:
    518             # fit_transform
    519             return out
    520         else:

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/ensemble/base.py in fit(self=Sequential(backend='threading', dtype=<class 'nu...ormers=[])],
   verbose=0)],
      verbose=False), X=array([[  1.00000000e+00,   3.63000000e+02,   2....04,   2.66834227e-08,
          5.41752459e+02]]), y=array([ -1.95638686e-04,  -3.79831391e-03,  -2.9...9152394e-05,   3.94220996e-05,  -8.21904840e-06]), **kwargs={})
    153 
    154         f, t0 = print_job(self, "Fitting")
    155 
    156         with ParallelProcessing(self.backend, self.n_jobs,
    157                                 max(self.verbose - 4, 0)) as manager:
--> 158             out = manager.stack(self, 'fit', X, y, **kwargs)
        out = undefined
        manager.stack = <bound method ParallelProcessing.stack of <mlens.parallel.backend.ParallelProcessing object>>
        self = Sequential(backend='threading', dtype=<class 'nu...ormers=[])],
   verbose=0)],
      verbose=False)
        X = array([[  1.00000000e+00,   3.63000000e+02,   2....04,   2.66834227e-08,
          5.41752459e+02]])
        y = array([ -1.95638686e-04,  -3.79831391e-03,  -2.9...9152394e-05,   3.94220996e-05,  -8.21904840e-06])
        kwargs = {}
    159 
    160         if self.verbose:
    161             print_time(t0, "{:<35}".format("Fit complete"), file=f)
    162 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/backend.py in stack(self=<mlens.parallel.backend.ParallelProcessing object>, caller=Sequential(backend='threading', dtype=<class 'nu...ormers=[])],
   verbose=0)],
      verbose=False), job='fit', X=array([[  1.00000000e+00,   3.63000000e+02,   2....04,   2.66834227e-08,
          5.41752459e+02]]), y=array([ -1.95638686e-04,  -3.79831391e-03,  -2.9...9152394e-05,   3.94220996e-05,  -8.21904840e-06]), path=None, return_preds=False, wart_start=False, split=True, **kwargs={})
    650             Prediction array(s).
    651         """
    652         out = self.initialize(
    653             job=job, X=X, y=y, path=path, warm_start=wart_start,
    654             return_preds=return_preds, split=split, stack=True)
--> 655         return self.process(caller=caller, out=out, **kwargs)
        self.process = <bound method ParallelProcessing.process of <mlens.parallel.backend.ParallelProcessing object>>
        caller = Sequential(backend='threading', dtype=<class 'nu...ormers=[])],
   verbose=0)],
      verbose=False)
        out = {}
        kwargs = {}
    656 
    657     def process(self, caller, out, **kwargs):
    658         """Process job.
    659 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/backend.py in process(self=<mlens.parallel.backend.ParallelProcessing object>, caller=Sequential(backend='threading', dtype=<class 'nu...ormers=[])],
   verbose=0)],
      verbose=False), out=None, **kwargs={})
    695                       backend=self.backend) as parallel:
    696 
    697             for task in caller:
    698                 self.job.clear()
    699 
--> 700                 self._partial_process(task, parallel, **kwargs)
        self._partial_process = <bound method ParallelProcessing._partial_proces...lens.parallel.backend.ParallelProcessing object>>
        task = Layer(backend='threading', dtype=<class 'numpy.f..._exception=True, transformers=[])],
   verbose=0)
        parallel = Parallel(n_jobs=-1)
        kwargs = {}
    701 
    702                 if task.name in return_names:
    703                     out.append(self.get_preds(dtype=_dtype(task)))
    704 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/backend.py in _partial_process(self=<mlens.parallel.backend.ParallelProcessing object>, task=Layer(backend='threading', dtype=<class 'numpy.f..._exception=True, transformers=[])],
   verbose=0), parallel=Parallel(n_jobs=-1), **kwargs={})
    716         task.setup(self.job.predict_in, self.job.y, self.job.job)
    717 
    718         if not task.__no_output__:
    719             self._gen_prediction_array(task, self.job.job, self.__threading__)
    720 
--> 721         task(self.job.args(**kwargs), parallel=parallel)
        task = Layer(backend='threading', dtype=<class 'numpy.f..._exception=True, transformers=[])],
   verbose=0)
        self.job.args = <bound method Job.args of <mlens.parallel.backend.Job object>>
        kwargs = {}
        parallel = Parallel(n_jobs=-1)
    722 
    723         if not task.__no_output__ and getattr(task, 'n_feature_prop', 0):
    724             self._propagate_features(task)
    725 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/layer.py in __call__(self=Layer(backend='threading', dtype=<class 'numpy.f..._exception=True, transformers=[])],
   verbose=0), args={'auxiliary': {'P': None, 'X': array([[  1.00000000e+00,   3.63000000e+02,   2....04,   2.66834227e-08,
          5.41752459e+02]]), 'y': array([ -1.95638686e-04,  -3.79831391e-03,  -2.9...9152394e-05,   3.94220996e-05,  -8.21904840e-06])}, 'dir': [('randomforestregressor-2.0.0', <mlens.parallel.learner.IndexedEstimator object>), ('randomforestregressor-2.0.2', <mlens.parallel.learner.IndexedEstimator object>), ('randomforestregressor-1.0.2', <mlens.parallel.learner.IndexedEstimator object>)], 'job': 'fit', 'main': {'P': array([[ -6.91946599e-36,   6.62666976e-01],
   ....72381141e-05,   6.66164560e-05]], dtype=float32), 'X': array([[  1.00000000e+00,   3.63000000e+02,   2....04,   2.66834227e-08,
          5.41752459e+02]]), 'y': array([ -1.95638686e-04,  -3.79831391e-03,  -2.9...9152394e-05,   3.94220996e-05,  -8.21904840e-06])}}, parallel=Parallel(n_jobs=-1))
    147         if self.verbose >= 2:
    148             safe_print(msg.format('Learners ...'), file=f, end=e2)
    149             t1 = time()
    150 
    151         parallel(delayed(sublearner, not _threading)()
--> 152                  for learner in self.learners
        self.learners = [Learner(attr='predict', backend='threading', dty...=False,
    raise_on_exception=True, scorer=None), Learner(attr='predict', backend='threading', dty...=False,
    raise_on_exception=True, scorer=None)]
    153                  for sublearner in learner(args, 'main'))
    154 
    155         if self.verbose >= 2:
    156             print_time(t1, 'done', file=f)

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object Layer.__call__.<locals>.<genexpr>>)
    788             if pre_dispatch == "all" or n_jobs == 1:
    789                 # The iterable was consumed all at once by the above for loop.
    790                 # No need to wait for async callbacks to trigger to
    791                 # consumption.
    792                 self._iterating = False
--> 793             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=-1)>
    794             # Make sure that we get a last message telling us we are done
    795             elapsed_time = time.time() - self._start_time
    796             self._print('Done %3i out of %3i | elapsed: %s finished',
    797                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
IndexError                                         Thu Jan 25 17:26:56 2018
PID: 3404                   Python 3.6.3: /home/pyybor/anaconda3/bin/python
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in __call__(self=<mlens.externals.joblib.parallel.BatchedCalls object>)
    130     def __init__(self, iterator_slice):
    131         self.items = list(iterator_slice)
    132         self._size = len(self.items)
    133 
    134     def __call__(self):
--> 135         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<mlens.parallel.learner.SubLearner object>, (), {})]
    136 
    137     def __len__(self):
    138         return self._size
    139 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    130     def __init__(self, iterator_slice):
    131         self.items = list(iterator_slice)
    132         self._size = len(self.items)
    133 
    134     def __call__(self):
--> 135         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <mlens.parallel.learner.SubLearner object>
        args = ()
        kwargs = {}
    136 
    137     def __len__(self):
    138         return self._size
    139 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in __call__(self=<mlens.parallel.learner.SubLearner object>)
    120         else:
    121             self.processing_index = ''
    122 
    123     def __call__(self):
    124         """Launch job"""
--> 125         return getattr(self, self.job)()
        self = <mlens.parallel.learner.SubLearner object>
        self.job = 'fit'
    126 
    127     def fit(self, path=None):
    128         """Fit sub-learner"""
    129         if not path:

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in fit(self=<mlens.parallel.learner.SubLearner object>, path=[])
    129         if not path:
    130             path = self.path
    131         t0 = time()
    132         transformers = self._load_preprocess(path)
    133 
--> 134         self._fit(transformers)
        self._fit = <bound method SubLearner._fit of <mlens.parallel.learner.SubLearner object>>
        transformers = None
    135 
    136         if self.out_array is not None:
    137             self._predict(transformers, self.scorer is not None)
    138 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in _fit(self=<mlens.parallel.learner.SubLearner object>, transformers=None)
    175         t0 = time()
    176         if transformers:
    177             xtemp, ytemp = transformers.transform(xtemp, ytemp)
    178 
    179         # Fit estimator
--> 180         self.estimator.fit(xtemp, ytemp)
        self.estimator.fit = <bound method BaseForest.fit of RandomForestRegr... random_state=2017, verbose=0, warm_start=False)>
        xtemp = array([[  1.00000000e+00,   3.63000000e+02,   2....04,   2.66834227e-08,
          5.41752459e+02]])
        ytemp = array([ -1.95638686e-04,  -3.79831391e-03,  -2.9...9152394e-05,   3.94220996e-05,  -8.21904840e-06])
    181         self.fit_time_ = time() - t0
    182 
    183     def _load_preprocess(self, path):
    184         """Load preprocessing pipeline"""

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/forest.py in fit(self=RandomForestRegressor(bootstrap=True, criterion=..., random_state=2017, verbose=0, warm_start=False), X=array([[  1.00000000e+00,   3.63000000e+02,   2....5e-08,
          5.41752441e+02]], dtype=float32), y=array([[ -1.95638686e-04],
       [ -3.79831391e...  [  3.94220996e-05],
       [ -8.21904840e-06]]), sample_weight=None)
    311                 random_state.randint(MAX_INT, size=len(self.estimators_))
    312 
    313             trees = []
    314             for i in range(n_more_estimators):
    315                 tree = self._make_estimator(append=False,
--> 316                                             random_state=random_state)
        random_state = <mtrand.RandomState object>
    317                 trees.append(tree)
    318 
    319             # Parallel loop: we use the threading backend as the Cython code
    320             # for fitting the trees is internally releasing the Python GIL

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/base.py in _make_estimator(self=RandomForestRegressor(bootstrap=True, criterion=..., random_state=2017, verbose=0, warm_start=False), append=False, random_state=<mtrand.RandomState object>)
    122         Warning: This method should be used to properly instantiate new
    123         sub-estimators.
    124         """
    125         estimator = clone(self.base_estimator_)
    126         estimator.set_params(**dict((p, getattr(self, p))
--> 127                                     for p in self.estimator_params))
        self.estimator_params = ('criterion', 'max_depth', 'min_samples_split', 'min_samples_leaf', 'min_weight_fraction_leaf', 'max_features', 'max_leaf_nodes', 'min_impurity_decrease', 'min_impurity_split', 'random_state')
    128 
    129         if random_state is not None:
    130             _set_random_states(estimator, random_state)
    131 

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/base.py in set_params(self=DecisionTreeRegressor(criterion='mse', max_depth...resort=False, random_state=None, splitter='best'), **params={'criterion': 'mse', 'max_depth': None, 'max_features': 'auto', 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_impurity_split': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'random_state': 2017})
    260         self
    261         """
    262         if not params:
    263             # Simple optimization to gain speed (inspect is slow)
    264             return self
--> 265         valid_params = self.get_params(deep=True)
        valid_params = undefined
        self.get_params = <bound method BaseEstimator.get_params of Decisi...esort=False, random_state=None, splitter='best')>
    266 
    267         nested_params = defaultdict(dict)  # grouped by prefix
    268         for key, value in params.items():
    269             key, delim, sub_key = key.partition('__')

...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/base.py in get_params(self=DecisionTreeRegressor(criterion='mse', max_depth...resort=False, random_state=None, splitter='best'), deep=True)
    236                     value = getattr(self, key, None)
    237                 if len(w) and w[0].category == DeprecationWarning:
    238                     # if the parameter is deprecated, don't show it
    239                     continue
    240             finally:
--> 241                 warnings.filters.pop(0)
    242 
    243             # XXX: should we rather test if instance of estimator?
    244             if deep and hasattr(value, 'get_params'):
    245                 deep_items = value.get_params().items()

IndexError: pop from empty list
___________________________________________________________________________


Map different preprocess to estimators

As mentioned in the docs, we can map different preprocessing to different estimators as

preprocessing_cases = {"case-1": [trans_1, trans_2],
                                          "case-2": [alt_trans_1, alt_trans_2]}

estimators = {"case-1": [est_a, est_b],
                       "case-2": [est_c, est_d]}

Doing this, do we use trans_1 for est_a and trans_2 for est_b? And if I instead use

preprocessing_cases = {"case-1": [trans_1],
                                          "case-2": [alt_trans_1]}

estimators = {"case-1": [est_a, est_b],
                       "case-2": [est_c, est_d]}

Does this mean I will use trans_1 for both est_a and est_b, and alt_trans_1 for both est_c and est_d?

If a plain list can already express the mapping between preprocessing steps and estimators, why do we need the dictionary? (See the sketch below.)
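
As I read the docs, the transformers in each case's list form a single pipeline applied in sequence, and every estimator mapped to that case receives that pipeline's output; the dictionary exists to run several such pipelines side by side. A minimal sketch of the mapping, assuming scikit-learn transformers:

from mlens.ensemble import SuperLearner
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Each case maps one preprocessing pipeline to one group of estimators:
# the transformers listed under a case run in sequence before every
# estimator assigned to that case.
preprocessing = {"case-1": [StandardScaler()],
                 "case-2": [MinMaxScaler()]}
estimators = {"case-1": [LogisticRegression()],
              "case-2": [SVC()]}

ensemble = SuperLearner()
ensemble.add(estimators, preprocessing)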

Add logger

Switch from print messages to logger for greater control.
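
A minimal sketch of what the switch could look like, assuming Python's standard logging module; the logger name and helper are illustrative only:

import logging

logger = logging.getLogger("mlens")

def report(msg, level=logging.INFO):
    """Route progress messages through the logger instead of print,
    so users control verbosity with standard handlers and levels."""
    logger.log(level, msg)

logging.basicConfig(level=logging.INFO)
report("Fitting 2 layers")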

Documentation

Need:

  1. A section on the use of joblib

  2. An ensemble primer

  3. Benchmarks

  4. A more full tutorial

mapping predict_proba columns to class labels

When building a model (e.g. SuperLearner) with proba=True on the meta estimator, how can I access the class labels for the model.predict() output columns?

sklearn classifiers generally expose a classes_ property with the class labels, however this doesn't appear to be available on the mlens ensemble classes.

What is the preferred way to map the prediction output columns to class labels?
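
A minimal sketch of one workaround, assuming scikit-learn's convention that probability columns follow the sorted unique training labels (np.unique(y)); mlens ensembles do not appear to expose classes_ directly:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from mlens.ensemble import SuperLearner

X, y = load_iris(return_X_y=True)
classes = np.unique(y)  # column i of the probability output ~ classes[i]

ensemble = SuperLearner()
ensemble.add([RandomForestClassifier()], proba=True)
ensemble.add_meta(LogisticRegression())
ensemble.fit(X, y)

proba = ensemble.predict_proba(X)
labels = classes[np.argmax(proba, axis=1)]  # map columns back to labels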

Doesn't pickle scorers

mlens does not seem to allow pickling a scorer created manually by the user through the make_scorer interface; the issue lies with the underlying callable function.

Does it support Dataframe as input?

The estimator I am trying to fit accepts a pandas DataFrame as input to its fit method and makes use of the column labels. However, when using the SuperLearner, the data is converted to a numpy.ndarray before being passed to the estimator's fit method. Is there a way to preserve the column label data?
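
A minimal sketch of one workaround, assuming the estimator only needs the column labels inside fit and predict; DataFrameWrapper is a hypothetical helper, not part of mlens:

import pandas as pd
from sklearn.base import BaseEstimator

class DataFrameWrapper(BaseEstimator):
    """Re-attach stored column labels before delegating to the estimator."""

    def __init__(self, estimator, columns):
        self.estimator = estimator
        self.columns = columns

    def fit(self, X, y=None):
        self.estimator.fit(pd.DataFrame(X, columns=self.columns), y)
        return self

    def predict(self, X):
        return self.estimator.predict(pd.DataFrame(X, columns=self.columns))

Wrapped estimators can then be added to the ensemble as usual, e.g. ensemble.add([DataFrameWrapper(my_estimator, df.columns)]) with hypothetical my_estimator and df.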

Errors in documentation

I am running the Anaconda release of python 3.6 under Windows 10. Anaconda does not yet have an mlens package. The installation instructions given on the ml-ens website (http://ml-ensemble.com/) suggest the command to install mlens is
pip install -U mlens
That causes Anaconda to issue several error messages and installation fails.
The links from ml-ensemble.com to MIT license and to Installation Details both result in 404 errors.

The command
pip install mlens
does successfully install mlens in the Anaconda environment.

Evaluator summary DataFrame ordering jumps

Apparently, the ordering of the metrics in the summary attribute of an Evaluator jumps around when put into a DataFrame, likely because some underlying dictionary is not ordered. (A workaround sketch follows.)
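
Until the ordering is fixed upstream, a minimal workaround sketch, assuming evaluator is a fitted Evaluator whose summary attribute converts cleanly to a DataFrame:

import pandas as pd

df = pd.DataFrame(evaluator.summary)
df = df.sort_index(axis=0).sort_index(axis=1)  # deterministic row/column order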

Subsemble NotFittedError

I'm trying the Subsemble implementation but always get:

Fitting 1 layers
Fit complete                        | 00:00:04

Predicting 1 layers
---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)

[...]

NotFittedError: Layer instance (layer-1) not fitted.


I'm currently on mlens 0.2.1 and I am testing it with the example in the docs (see below) and I still get the same issue. Any ideas?

from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from mlens.ensemble import Subsemble

class SimplePartitioner():
    def __init__(self):
        pass

    def our_custom_function(self, X, y=None):
        """Split the data in half based on the sum of features"""
        # Labels should be numerical
        return 1 * (X.sum(axis=1) > X.sum(axis=1).mean())

# Note that the number of partitions the estimator creates *must* match the
# ``partitions`` argument passed to the Subsemble.
# The ``folds`` option is completely independent.

sub = Subsemble(partitions=2, folds=3, verbose=1)
sub.add([SVC(), LogisticRegression()],
        partition_estimator=SimplePartitioner(),
        fit_estimator=False,
        attr="our_custom_function")

sub.fit(x_train, y_train)
sub.predict(x_test)

Kernel keep dying when running lightGBM

I ran the sample kernel script from kaggle: https://www.kaggle.com/flennerhag/ml-ensemble-scikit-learn-style-ensemble-learning

The script runs without issue. But when I add a lightGBM to the base learners, the script runs for hours without finishing the calculation. I changed nthread to 1 (from -1), set the Evaluator to backend='threading' and n_jobs=1, and removed xgboost from the base learners, but the kernel still dies whenever the fit method runs.

Here are the parameters for the lightgbm:

lgb = LGBMRegressor(objective='regression', nthread=1, seed=SEED)

'lgb': {'learning_rate': uniform(0.02, 0.04),
        'num_leaves': randint(50, 60),
        'n_estimators': randint(150, 200),
        'min_child_weight': randint(30, 60)}

setup for the Evaluator:

evl = Evaluator(scorer,
                cv=2,
                random_state=SEED,
                verbose=5,
                backend='threading',
                n_jobs=1)

Please let me know if I need to provide any more info.
best
Mike

Once the Evaluator has fitted, Can I export the best ensemble directly?

Hi, mlens is a great tool! It saves my life and I use it to do kaggle competition.

The evaluator is a nice tool for model selection of the meta layer, but evaluating an ensemble network consumes a lot of time. Can I export the best ensemble directly once the evaluator has been fitted?

Now my work flow using mlens is like:

  1. build the ensemble transformer
  2. build the evaluator for meta-layer selection
  3. fit the evaluator, and find the best meta-layer for the ensemble
  4. build the ensemble with best meta-layer, and fit the ensemble again.

Step 4) is duplicated work and doubles the time cost.

Is there any way to avoid step 4)? Am I using mlens correctly?

Check if estimators has a [n_jobs] or [nthread] setting and set to 1

If estimators in layers have n_jobs>1 or nthread>1, the joblib routine is likely to crash if the user has not set the 'forkserver' start method in the main process.

Before estimation, we could easily check what the multiprocessing context is and, if it is not 'forkserver', change all n_jobs / nthread settings to 1. (A sketch follows.)
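
A minimal sketch of such a pre-flight check, assuming scikit-learn style get_params/set_params and Python 3's multiprocessing module:

import multiprocessing as mp

def force_single_thread(estimator):
    """Set any n_jobs / nthread parameter (including nested ones) to 1
    unless the 'forkserver' start method is active."""
    if mp.get_start_method(allow_none=True) == 'forkserver':
        return estimator
    params = estimator.get_params(deep=True)
    overrides = {k: 1 for k in params
                 if k.split('__')[-1] in ('n_jobs', 'nthread')}
    if overrides:
        estimator.set_params(**overrides)
    return estimator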

Type Error When Running Example

Hi,

I'm trying to run the Getting Started example and am hitting the following error, code and trace below:

Code:

import numpy as np
from pandas import DataFrame
from mlens.metrics import make_scorer
from sklearn.metrics import f1_score
from sklearn.datasets import load_iris

seed = 2017
np.random.seed(seed)

f1 = make_scorer(f1_score, average='micro', greater_is_better=True)

data = load_iris()
idx = np.random.permutation(150)
X = data.data[idx]
y = data.target[idx]

from mlens.ensemble import SuperLearner
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# --- Build ---

# Passing a scorer will create cv scores during fitting
ensemble = SuperLearner(scorer=f1, random_state=seed)

# Build the first layer
ensemble.add([RandomForestClassifier(random_state=seed), SVC()])

# Attach the final meta estimator
ensemble.add_meta(LogisticRegression())

# --- Use ---

# Fit ensemble
ensemble.fit(X[:75], y[:75])

# Predict
preds = ensemble.predict(X[75:])

Error:

TypeError Traceback (most recent call last)
in ()
18
19 # Fit ensemble
---> 20 ensemble.fit(X[:75], y[:75])
21
22 # Predict

C:\Anaconda2\lib\site-packages\mlens\ensemble\base.pyc in fit(self, X, y)
714 X, y = X[idx], y[idx]
715
--> 716 self.scores_ = self.layers.fit(X, y)
717
718 return self

C:\Anaconda2\lib\site-packages\mlens\ensemble\base.pyc in fit(self, X, y, return_preds, **process_kwargs)
232 # Fit ensemble
233 try:
--> 234 processor.process()
235
236 if self.verbose:

C:\Anaconda2\lib\site-packages\mlens\parallel\manager.pyc in process(self)
216
217 for n, lyr in enumerate(self.layers.layers.values()):
--> 218 self._partial_process(n, lyr, parallel)
219
220 self.fitted = 1

C:\Anaconda2\lib\site-packages\mlens\parallel\manager.pyc in _partial_process(self, n, lyr, parallel)
306 kwargs['P'] = self.job.P[n + 1]
307
--> 308 f(**kwargs)
309
310

C:\Anaconda2\lib\site-packages\mlens\parallel\estimation.pyc in fit(self, X, y, P, dir, parallel)
198 # Load instances from cache and store as layer attributes
199 # Typically, as layer.estimators_, layer.preprocessing_
--> 200 self._assemble(dir)
201
202 if self.verbose:

C:\Anaconda2\lib\site-packages\mlens\parallel\estimation.pyc in _assemble(self, dir)
93
94 if self.scorer is not None and self.layer.cls is not 'full':
---> 95 self.layer.scores_ = self._build_scores(s)
96
97 def _build_scores(self, s):

C:\Anaconda2\lib\site-packages\mlens\parallel\estimation.pyc in _build_scores(self, s)
126 # Aggregate to get cross-validated mean scores
127 for k, v in scores.items():
--> 128 scores[k] = (np.mean(v), np.std(v))
129
130 return scores

C:\Anaconda2\lib\site-packages\numpy\core\fromnumeric.pyc in mean(a, axis, dtype, out, keepdims)
2940
2941 return _methods._mean(a, axis=axis, dtype=dtype,
-> 2942 out=out, **kwargs)
2943
2944

C:\Anaconda2\lib\site-packages\numpy\core\_methods.pyc in _mean(a, axis, dtype, out, keepdims)
63 dtype = mu.dtype('f8')
64
---> 65 ret = umr_sum(arr, axis, dtype, out, keepdims)
66 if isinstance(ret, mu.ndarray):
67 ret = um.true_divide(

TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'

Error when propagating features from sparse matrix

I'm trying to use mlens in a system I'm developing but, based on the documentation and the code, it's not really clear to me what propagate_features values I should use given my data. Could you offer a bit of additional explanation in the tutorial so I know what should go in?
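
For reference, a minimal sketch of the documented usage: propagate_features takes the column indices of the layer's input that should be passed through to the next layer, alongside the layer's predictions:

from mlens.ensemble import SuperLearner
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

ensemble = SuperLearner()
# Pass input columns 0 and 1 through to the meta layer, next to the
# base learners' predictions.
ensemble.add([LogisticRegression(), SVC()], propagate_features=[0, 1])
ensemble.add_meta(LogisticRegression())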

Layer and estimator info in traceback

Include layer name, case name, estimator name in trace back if estimation fails during fitting.

Need to build some sort of try-except block that retains the traceback stack so that joblib can pick it up.

Does it support different (multi-view) data input?

In a multi-view learning situation, we have multiple feature sets for one object. For instance, for images and videos, color information and texture information are two different kinds of features, which can be regarded as two-view data. Can mlens accept such multi-view input? That is to say, we would like to use an SVM for the color features and KNN for the texture features.
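
One possible route is to give each view its own preprocessing case that selects the relevant column block; ColumnSelector below is a hypothetical helper, not part of mlens, and the column slices are assumptions about how the views are stored:

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from mlens.ensemble import SuperLearner

class ColumnSelector(BaseEstimator, TransformerMixin):
    """Select the column block corresponding to one view."""

    def __init__(self, cols):
        self.cols = cols

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[:, self.cols]

ensemble = SuperLearner()
ensemble.add({"color": [SVC()], "texture": [KNeighborsClassifier()]},
             preprocessing={"color": [ColumnSelector(slice(0, 10))],
                            "texture": [ColumnSelector(slice(10, 20))]})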

[BUG] Printing evaluator results on WIN with Py 2.7 fails for ``str`` keys.

When running the start.py jupyter notebook, codeblock 11 fails with

ValueError Traceback (most recent call last)
in ()
----> 1 print(evaluator.results.keys)
2 print("Score comparison with best params founds:\n\n%r" % evaluator.results)

C:\Users\sguest\Anaconda2\lib\site-packages\mlens-0.2.1-py2.7.egg\mlens\metrics\utils.pyc in __repr__(self)
92
93 def __repr__(self):
---> 94 return assemble_table(self, self.padding, self.decimals)
95
96

C:\Users\sguest\Anaconda2\lib\site-packages\mlens-0.2.1-py2.7.egg\mlens\metrics\utils.pyc in assemble_table(data, padding, decimals)
138 continue
139
--> 140 v_ = len(_get_string(v, decimals))
141 if v_ > max_col_len[key]:
142 max_col_len[key] = v_

C:\Users\sguest\Anaconda2\lib\site-packages\mlens-0.2.1-py2.7.egg\mlens\metrics\utils.pyc in _get_string(obj, dec)
22 """Stringify object"""
23 try:
---> 24 return '{0:.{dec}f}'.format(obj, dec=dec)
25 except TypeError:
26 return obj.__str__()

ValueError: Unknown format code 'f' for object of type 'str'

scoring function support

Does the scoring function support an extra vector as input, i.e. a custom scoring function that takes (predicted_Y, true_Y, weight), where the weight is used to calculate the final float score? (See the sketch below.)
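
mlens's make_scorer mirrors the scikit-learn interface, which passes only (y_true, y_pred) at score time but binds extra keyword arguments at construction. A minimal sketch under that assumption; note the bound weight vector is not subset along with CV folds, so this only makes sense when scoring a fixed evaluation set:

import numpy as np
from mlens.metrics import make_scorer

def weighted_mae(y_true, y_pred, weight=None):
    """Mean absolute error, optionally weighted per observation."""
    err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return float(np.average(err, weights=weight))

weights = np.ones(100)  # hypothetical per-observation weights
scorer = make_scorer(weighted_mae, greater_is_better=False, weight=weights)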

Supervised Subsemble partitioning

Allow the user to pass an estimator instead of an integer as n_partitions, and use the estimator to predict partition labels on the training set.

Final K-Fold Score

One thing I have been unable to figure out is how to get a k-fold cross validation score for the whole ensemble.

I have used sklearn's built-in cross_val_score, but this is very slow (I think because it ends up running a CV loop inside another CV loop!).

How can I get a final k-fold cross validation score for the final ensemble please? (great package btw :) )

Ensemble learner through hill climbing

Suppose:

  1. We initialize an ensemble with some estimators having default settings.
  2. Fit and Predict. Compute the training error.
  3. Randomly pick new parameter settings.
  4. do 2.

Set a learning rate e.

while error > threshold:
a. update every parameter i with -(error(n) - error(n-1)) * (param_i(n) - param_i(n-1)) * e
b. Fit and predict; compute the training error.

Problem:
All parameter changes are directed by the total error score. So if some parameter a is on its way to a good place, but some other parameter throws the score off, a will be thrown off too. (A sketch of the update rule follows.)
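
A minimal sketch of the proposed update rule, not an mlens API; error_fn stands for a hypothetical closure that refits the ensemble with the given parameters and returns the training error, and params is a numpy array of parameter values:

import numpy as np

def hill_climb(error_fn, params, e=0.1, threshold=1e-3, max_iter=100):
    """Move each parameter against the error slope estimated from the
    last two (error, parameter) pairs."""
    rng = np.random.default_rng(0)
    prev = params + rng.normal(scale=0.1, size=params.shape)  # random start point
    prev_err = error_fn(prev)
    err = error_fn(params)
    for _ in range(max_iter):
        if err < threshold:
            break
        # -(error(n) - error(n-1)) * (param_i(n) - param_i(n-1)) * e
        step = -(err - prev_err) * (params - prev) * e
        prev, prev_err = params, err
        params = params + step
        err = error_fn(params)
    return params

This also makes the stated problem concrete: the scalar (err - prev_err) multiplies every coordinate, so one bad parameter drags all the others with it.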

Input contains NaN, infinity or a value too large for dtype('float64')

File "/root/miniconda2/lib/python2.7/site-packages/mlens/ensemble/base.py", line 614, in predict X, _ = check_inputs(X, check_level=self.array_check) File "/root/miniconda2/lib/python2.7/site-packages/mlens/utils/validation.py", line 562, in check_inputs X = _check_array(X) File "/root/miniconda2/lib/python2.7/site-packages/mlens/utils/validation.py", line 510, in _check_array warn_on_dtype=False # Mute as 'dtype' is 'None' File "/root/miniconda2/lib/python2.7/site-packages/mlens/externals/sklearn/validation.py", line 388, in check_array _assert_all_finite(array) File "/root/miniconda2/lib/python2.7/site-packages/mlens/externals/sklearn/validation.py", line 46, in _assert_all_finite " or a value too large for %r." % X.dtype) ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I have checked the data which is no nan and i can train the data with single model.my code show in the below:
ensemble.add('stack',model['ex']) ensemble.add('blend',model['ex']) ensemble.add('subsemble',model['ex']) meta = "ls" ensemble.add_meta(model[meta]) ensemble.fit(X_train,y_train) preds1 = np.exp(ensemble.predict(X))

Base layer Transformer

If we could build a base layer transformer class, it would be easy to convert every ensemble class to a base transformer class.

KerasClassifier "can't pickle _thread.RLock objects" message when predicting

I'm able to fit a model that includes KerasClassifier as a model in the ensemble. However, at prediction time I get the following error. I've tried changing the backend as well as the number of jobs but to no avail. Any ideas?

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
----> 1 preds = ensemble.predict_proba(X[294:])

D:\Continuum\anaconda3\lib\site-packages\mlens\ensemble\base.py in predict_proba(self, X, **kwargs)
633 """
634 kwargs.pop('proba', None)
--> 635 return self.predict(X, proba=True, **kwargs)
636
637 def _build_layer(self, estimators, indexer, preprocessing, **kwargs):

D:\Continuum\anaconda3\lib\site-packages\mlens\ensemble\base.py in predict(self, X, **kwargs)
613 return
614 X, _ = check_inputs(X, check_level=self.array_check)
--> 615 return self._backend.predict(X, **kwargs)
616
617 def predict_proba(self, X, **kwargs):

D:\Continuum\anaconda3\lib\site-packages\mlens\ensemble\base.py in predict(self, X, **kwargs)
199 predictions from final layer.
200 """
--> 201 if not self.fitted:
202 NotFittedError("Instance not fitted.")
203

D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\base.py in fitted(self)
359 if not self.stack or not self._check_static_params():
360 return False
--> 361 return all([g.fitted for g in self.stack])
362
363 @property

[... the base.py fitted frame and its generator expression repeat as the fitted check recurses down the stack ...]

D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\handles.py in fitted(self)
256 if not self._check_static_params():
257 return False
--> 258 return all([o.fitted for o in self.learners + self.transformers])
259
260 def get_params(self, deep=True):

D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\handles.py in (.0)
256 if not self._check_static_params():
257 return False
--> 258 return all([o.fitted for o in self.learners + self.transformers])
259
260 def get_params(self, deep=True):

D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\learner.py in fitted(self)
743 # Check estimator param overlap
744 fitted = self.learner + self.sublearners
--> 745 fitted_params = fitted[0].estimator.get_params(deep=True)
746 model_estimator_params = self.estimator.get_params(deep=True)
747

D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\learner.py in estimator(self)
64 def estimator(self):
65 """Deep copy of estimator"""
---> 66 return deepcopy(self._estimator)
67
68 @estimator.setter

D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
178 y = x
179 else:
--> 180 y = _reconstruct(x, memo, *rv)
181
182 # If is its own copy, don't memoize.

[... the copy.py deepcopy / _reconstruct / _deepcopy_dict / _deepcopy_list frames repeat as deepcopy descends through the estimator's nested state ...]

D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
167 reductor = getattr(x, "__reduce_ex__", None)
168 if reductor:
--> 169 rv = reductor(4)
170 else:
171 reductor = getattr(x, "__reduce__", None)

TypeError: can't pickle _thread.RLock objects

time series indexer

A K-fold-like time series indexer for time series cross-validation. The indexer should produce non-overlapping folds (see the sketch after the table):

fold    train obs    test obs
0       0            1
1       0, 1         2
2       0, 1, 2      3
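
A minimal sketch of the expanding-window scheme in the table above; this is an illustration, not the eventual mlens API:

def time_series_folds(n_samples, n_folds):
    """Yield (train, test) index lists with an expanding train window
    and one non-overlapping test block per fold."""
    fold_size = n_samples // (n_folds + 1)
    for i in range(1, n_folds + 1):
        train = list(range(0, i * fold_size))
        test = list(range(i * fold_size, (i + 1) * fold_size))
        yield train, test

for train, test in time_series_folds(8, 3):
    print(train, test)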

Add layer API

Allow an ensemble to be initialized without estimators, and add an add_layer method to construct ensembles with a general layer structure.

TypeError in Model Selection

Hi again, I'm working through the model selection section and I think I've hit a bug. I'm passing scipy distributions as indicated in the examples, but omitting them below as there are quite a lot.

Truncated Code:

from mlens.metrics import make_scorer
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

evaluator = Evaluator(mae_scorer, cv=5, random_state=seed, verbose=1)
evaluator.evaluate(X_mean, y, estimators, params, n_iter=10)

Error:

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
65
66 evaluator = Evaluator(mae_scorer, cv=5, random_state=seed, verbose=1)
---> 67 evaluator.evaluate(X_mean, y, estimators, params, n_iter=10)

C:\Anaconda2\lib\site-packages\mlens\model_selection\model_selection.pyc in evaluate(self, X, y, estimators, param_dicts, n_iter)
387 self.estimators = check_instances(estimators)
388 self.n_iter = n_iter
--> 389 self._param_sets(param_dicts)
390
391 if self.verbose > 0:

C:\Anaconda2\lib\site-packages\mlens\model_selection\model_selection.pyc in _param_sets(self, param_dicts)
494 # the expected param_dicts key is 'est_name'
495 for est_name, _ in self.estimators:
--> 496 self._set_params(param_dicts, est_name)
497 else:
498 # Preprocessing

C:\Anaconda2\lib\site-packages\mlens\model_selection\model_selection.pyc in _set_params(self, param_dicts, key)
478 try:
479 self.params[key] = \
--> 480 self._draw_params(param_dicts[key])
481 except KeyError:
482 # No param draws desired. Set empty dict.

C:\Anaconda2\lib\site-packages\mlens\model_selection\model_selection.pyc in _draw_params(self, param_dists)
469 draws = dist.rvs(self.n_iter, random_state=self.random_state)
470
--> 471 for i, draw in enumerate(draws):
472 param_draws[i][param] = draw
473

TypeError: 'int' object is not iterable

Looks to me as though draws = dist.rvs(self.n_iter, random_state=self.random_state) should be draws = dist.rvs(size=self.n_iter, random_state=self.random_state) ?

Pre checks

Fit: check layer exists
Fit: check meta estimator exists
Predict: check fitted (hasattr layer_)

Allow dictionary of input arrays

Allow users to pass a dictionary of input arrays to any class that inherits mlens.parallel.base.BaseStacker, where keys should be the names of the items in the stack attribute.

For ensembles and estimators, will need some form of exception handling for check_inputs.

AttributeError: 'SuperLearner' object has no attribute 'scores_'

Hi, I have found an issue with SuperLearner.scores_.

I have fitted a SuperLearner ensemble and wanted to check the CV scores of the base learners by typing pd.DataFrame(ensemble.scores_). However, an error occurs:
AttributeError: 'SuperLearner' object has no attribute 'scores_'

This is weird. 1) I have checked my instantiated and fitted ensemble; there is indeed no scores_ attribute. 2) I've never seen this issue before.

(And what makes it worse is that I spent a long time fitting this ensemble, only to find I can't see how my base learners behave...)

Anyway, here is my code:

ensemble = SuperLearner(scorer=mean_absolute_error, folds=5, random_state=seed, n_jobs=-1, shuffle=True)

ensemble.add([('et01', ExtraTreesRegressor(n_estimators=..., max_depth=..., n_jobs=-1)),
              ('et02', ExtraTreesRegressor(n_estimators=..., max_depth=..., n_jobs=-1)),
              ('et03', ExtraTreesRegressor(n_estimators=..., max_depth=..., n_jobs=-1)),
              ('xgb01', XGBRegressor(n_estimators=..., max_depth=..., learning_rate=..., nthread=20)),
              ('xgb02', XGBRegressor(n_estimators=..., max_depth=..., learning_rate=..., nthread=20)),
              ('xgb03', XGBRegressor(n_estimators=..., learning_rate=..., max_depth=..., gamma=..., nthread=20)),
              ('rf01', RandomForestRegressor(n_estimators=..., n_jobs=-1)),
              ('rf02', RandomForestRegressor(n_estimators=..., n_jobs=-1)),
              ('rf03', RandomForestRegressor(n_estimators=..., n_jobs=-1)),
              ('ridge01', Ridge(alpha=...)),
              ('ridge02', Ridge(alpha=...)),
              ('ridge03', Ridge(alpha=...)),
              ('lasso01', Lasso(alpha=...)),
              ('lasso02', Lasso(alpha=...)),
              ('lasso03', Lasso(alpha=...)),
              ('lgbm01', LGBMRegressor(n_estimators=..., learning_rate=...)),
              ('lgbm02', LGBMRegressor(n_estimators=..., learning_rate=...)),
              ('lgbm03', LGBMRegressor(n_estimators=..., learning_rate=...)),
              ('mlp01', MLPRegressor(hidden_layer_sizes=(...,))),
              ('mlp02', MLPRegressor(hidden_layer_sizes=(...,)))
             ])

ensemble.add_meta(Ridge(alpha=..., fit_intercept=False))

ensemble.fit(X, y)

print pd.DataFrame(ensemble.scores_)

So, my question is:

Q1. Is this a problem with my code or with mlens? I'm using version 0.1.6.

Q2. If it is a problem with my code, what should I change?

Upgrade joblib

Upgrading to joblib > 9 seems to reduce parallelism. This doesn't seem to be a big issue on large datasets, but on smaller datasets parallelization appears to be significantly lower.

Improve search engine visibility

This may seem irrelevant, but I think this is a great package that should have far more interest around it. One reason it might get overlooked is that it doesn't appear in searches on Google or GitHub for stack or stacking, whereas some other ensembling/stacking packages do!

Maybe adding these key search words to the readme and the docs would help :) Hopefully this is a useful suggestion.

FYI, I noticed this whilst looking for other packages to benchmark mlens against.
