flennerhag / mlens
ML-Ensemble – high performance ensemble learning
Home Page: http://ml-ensemble.com
License: MIT License
Thanks for your model, this is a great idea.
I use the model with multiple layers and store the trained model with pickle dump, but when I load the stored model and call predict, an error is raised:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/ensemble/base.py", line 615, in predict
    return self._backend.predict(X, **kwargs)
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/ensemble/base.py", line 206, in predict
    out = self._predict(X, 'predict', **kwargs)
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/ensemble/base.py", line 266, in _predict
    out = manager.stack(self, job, X, return_preds=r, **kwargs)
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/parallel/backend.py", line 655, in stack
    return self.process(caller=caller, out=out, **kwargs)
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/parallel/backend.py", line 700, in process
    self._partial_process(task, parallel, **kwargs)
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/parallel/backend.py", line 721, in _partial_process
    task(self.job.args(**kwargs), parallel=parallel)
  File "/usr/python3.5/lib/python3.5/site-packages/mlens/parallel/layer.py", line 123, in __call__
    "Layer instance (%s) not fitted." % self.name)
mlens.utils.exceptions.NotFittedError: Layer instance (layer-1) not fitted.
So how can I store and reuse my trained model?
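One way to narrow this down is to check whether fitted state survives a pickle round trip at all. Below is a minimal sketch using only the standard library. `TinyModel` and `roundtrip` are hypothetical stand-ins for illustration, not part of mlens; a fitted ensemble would take the place of `TinyModel`.

```python
import pickle


class TinyModel:
    """Hypothetical stand-in for a fitted estimator (not an mlens class)."""

    def __init__(self):
        self.mean_ = None  # set by fit(); None means "not fitted"

    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        return self

    def predict(self, values):
        if self.mean_ is None:
            raise RuntimeError("TinyModel instance not fitted.")
        # Label each value 1 if it is above the training mean, else 0.
        return [1 if v > self.mean_ else 0 for v in values]


def roundtrip(obj):
    """Serialize obj to bytes and load it back, as dump/load to a file would."""
    return pickle.loads(pickle.dumps(obj))


model = TinyModel().fit([1.0, 2.0, 3.0])
restored = roundtrip(model)

# The restored copy should predict exactly like the original.
same = restored.predict([0.5, 2.5]) == model.predict([0.5, 2.5])
print("predictions match after round trip:", same)
```

If the same comparison fails for a fitted ensemble, the fitted state is probably being lost during serialization rather than during prediction. That is an inference from the NotFittedError above, not a confirmed diagnosis.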
Adding any model to an ensemble layer throws the following error:
RuntimeError: Cannot clone object Learner(attr='predict', backend='threading', dtype=<class 'numpy.float32'>,
estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=6, max_features='sqrt', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=300, n_jobs=1,
oob_score=True, random_state=1, verbose=0, warm_start=False),
indexer=FoldIndex(X=None, folds=2, raise_on_exception=True), n_jobs=-1,
name='randomforestclassifier', preprocess=None, proba=False,
raise_on_exception=True, scorer=make_scorer(roc_auc_score)), as the constructor does not seem to set parameter scorer
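The message points at the `scorer` parameter. As far as I understand it, scikit-learn's `clone` rebuilds an object from its constructor parameters and then verifies that the constructor stored each parameter unchanged. Below is a simplified stand-in for that check in plain Python. `naive_clone`, `Plain` and `Wrapping` are hypothetical names for illustration; this is not the scikit-learn or mlens implementation.

```python
def naive_clone(obj):
    """Rebuild obj from get_params() and check the constructor kept each value."""
    params = obj.get_params()
    new_obj = type(obj)(**params)
    stored = new_obj.get_params()
    for name, value in params.items():
        # Identity check: the constructor must store the argument as passed.
        if stored[name] is not value:
            raise RuntimeError(
                "Cannot clone object: constructor does not set parameter %s" % name
            )
    return new_obj


class Plain:
    """Stores the scorer exactly as given, so cloning succeeds."""

    def __init__(self, scorer=None):
        self.scorer = scorer

    def get_params(self):
        return {"scorer": self.scorer}


class Wrapping(Plain):
    """Wraps the scorer in __init__, so the stored value is a new object."""

    def __init__(self, scorer=None):
        self.scorer = (lambda *args: scorer(*args)) if scorer else None


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


clone_ok = naive_clone(Plain(scorer=accuracy))

try:
    naive_clone(Wrapping(scorer=accuracy))
    wrapping_failed = False
except RuntimeError as exc:
    wrapping_failed = True
    print(exc)
```

Whether the Learner constructor treats `scorer` in a way that trips this kind of check is an assumption; it would need to be verified against the installed mlens and scikit-learn versions.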
Unit testing on Windows with multiprocessing yields a warning
ResourceWarning: unclosed file <_io.BufferedReader name=4>
which is due to how mlens.parallel.ParallelProcessing.close() handles cache destruction.
Find the cause of the warning and amend. If due to nosetests, capture the warning when running unit tests.
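Capturing the warning in a test could look roughly like the sketch below, which relies only on the standard `warnings` module. `call_capturing_resource_warnings` and `noisy_close` are hypothetical helpers written for this illustration; `noisy_close` merely imitates a cleanup routine that emits the warning and is not the mlens `close()` method.

```python
import warnings


def call_capturing_resource_warnings(func, *args, **kwargs):
    """Run func and return (result, list of ResourceWarning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # ResourceWarning is ignored by default, so force it to be recorded.
        warnings.simplefilter("always", ResourceWarning)
        result = func(*args, **kwargs)
    # Keep only the ResourceWarning entries from everything that was recorded.
    messages = [str(w.message) for w in caught
                if issubclass(w.category, ResourceWarning)]
    return result, messages


def noisy_close():
    """Hypothetical stand-in for a cleanup routine that leaks a file handle."""
    warnings.warn("unclosed file <_io.BufferedReader name=4>", ResourceWarning)
    return "closed"


result, messages = call_capturing_resource_warnings(noisy_close)
print(result, messages)
```

A unit test could then assert on `messages` directly, either to document the known warning or to fail once an unexpected one appears.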
Here is my code:
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from mlens.ensemble import SuperLearner

seed = 2017  # matches random_state=2017 shown in the traceback below
# X and y are the training features and targets, loaded elsewhere (not shown).

class MyClass(LinearRegression):
    def __init__(self, **kwargs):
        super(MyClass, self).__init__(**kwargs)

    def fit(self, X, y):
        """Fit estimator."""
        super(MyClass, self).fit(X, y)
        return self

    def predict(self, X):
        """Generate partition: 1 where the prediction exceeds its mean, else 0."""
        p = super(MyClass, self).predict(X)
        return 1 * (p > p.mean())

def build_ensemble(incl_meta, propagate_features=None):
    """Return an ensemble."""
    if propagate_features:
        n = len(propagate_features)
        propagate_features_1 = propagate_features
        # The propagated features are the first n inputs to the second layer.
        propagate_features_2 = [i for i in range(n)]
    else:
        propagate_features_1 = propagate_features_2 = None
    estimators = [RandomForestRegressor(random_state=seed, n_jobs=6), RandomForestRegressor(n_jobs=5)]
    ensemble = SuperLearner()
    ensemble.add(estimators, propagate_features=propagate_features_1)
    ensemble.add(estimators, propagate_features=propagate_features_2)
    if incl_meta:
        ensemble.add_meta(MyClass())
    return ensemble

base = build_ensemble(False, [1, 3])
base.fit(X, y)
pred = base.predict(X)[:5]
print("Input to meta learner :\n %r" % pred)
And here is the error output. I think the error is caused by the n_jobs setting. Any thoughts?
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/_parallel_backends.py in __call__(self, *args, **kwargs)
349 try:
--> 350 return self.func(*args, **kwargs)
351 except KeyboardInterrupt:
~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in __call__(self)
134 def __call__(self):
--> 135 return [func(*args, **kwargs) for func, args, kwargs in self.items]
136
~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in <listcomp>(.0)
134 def __call__(self):
--> 135 return [func(*args, **kwargs) for func, args, kwargs in self.items]
136
~/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in __call__(self)
124 """Launch job"""
--> 125 return getattr(self, self.job)()
126
~/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in fit(self, path)
133
--> 134 self._fit(transformers)
135
~/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in _fit(self, transformers)
179 # Fit estimator
--> 180 self.estimator.fit(xtemp, ytemp)
181 self.fit_time_ = time() - t0
~/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/forest.py in fit(self, X, y, sample_weight)
315 tree = self._make_estimator(append=False,
--> 316 random_state=random_state)
317 trees.append(tree)
~/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/base.py in _make_estimator(self, append, random_state)
126 estimator.set_params(**dict((p, getattr(self, p))
--> 127 for p in self.estimator_params))
128
~/anaconda3/lib/python3.6/site-packages/sklearn/base.py in set_params(self, **params)
264 return self
--> 265 valid_params = self.get_params(deep=True)
266
~/anaconda3/lib/python3.6/site-packages/sklearn/base.py in get_params(self, deep)
240 finally:
--> 241 warnings.filters.pop(0)
242
IndexError: pop from empty list
During handling of the above exception, another exception occurred:
TransportableException Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in retrieve(self)
702 if getattr(self._backend, 'supports_timeout', False):
--> 703 self._output.extend(job.get(timeout=self.timeout))
704 else:
~/anaconda3/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
643 else:
--> 644 raise self._value
645
~/anaconda3/lib/python3.6/multiprocessing/pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
118 try:
--> 119 result = (True, func(*args, **kwds))
120 except Exception as e:
~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/_parallel_backends.py in __call__(self, *args, **kwargs)
358 text = format_exc(e_type, e_value, e_tb, context=10, tb_offset=1)
--> 359 raise TransportableException(text, e_type)
360
TransportableException: TransportableException
___________________________________________________________________________
IndexError Thu Jan 25 17:26:56 2018
PID: 3404 Python 3.6.3: /home/pyybor/anaconda3/bin/python
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in __call__(self=<mlens.externals.joblib.parallel.BatchedCalls object>)
130 def __init__(self, iterator_slice):
131 self.items = list(iterator_slice)
132 self._size = len(self.items)
133
134 def __call__(self):
--> 135 return [func(*args, **kwargs) for func, args, kwargs in self.items]
self.items = [(<mlens.parallel.learner.SubLearner object>, (), {})]
136
137 def __len__(self):
138 return self._size
139
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
130 def __init__(self, iterator_slice):
131 self.items = list(iterator_slice)
132 self._size = len(self.items)
133
134 def __call__(self):
--> 135 return [func(*args, **kwargs) for func, args, kwargs in self.items]
func = <mlens.parallel.learner.SubLearner object>
args = ()
kwargs = {}
136
137 def __len__(self):
138 return self._size
139
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in __call__(self=<mlens.parallel.learner.SubLearner object>)
120 else:
121 self.processing_index = ''
122
123 def __call__(self):
124 """Launch job"""
--> 125 return getattr(self, self.job)()
self = <mlens.parallel.learner.SubLearner object>
self.job = 'fit'
126
127 def fit(self, path=None):
128 """Fit sub-learner"""
129 if not path:
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in fit(self=<mlens.parallel.learner.SubLearner object>, path=[])
129 if not path:
130 path = self.path
131 t0 = time()
132 transformers = self._load_preprocess(path)
133
--> 134 self._fit(transformers)
self._fit = <bound method SubLearner._fit of <mlens.parallel.learner.SubLearner object>>
transformers = None
135
136 if self.out_array is not None:
137 self._predict(transformers, self.scorer is not None)
138
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in _fit(self=<mlens.parallel.learner.SubLearner object>, transformers=None)
175 t0 = time()
176 if transformers:
177 xtemp, ytemp = transformers.transform(xtemp, ytemp)
178
179 # Fit estimator
--> 180 self.estimator.fit(xtemp, ytemp)
self.estimator.fit = <bound method BaseForest.fit of RandomForestRegr... random_state=2017, verbose=0, warm_start=False)>
xtemp = array([[ 1.00000000e+00, 3.63000000e+02, 2....04, 2.66834227e-08,
5.41752459e+02]])
ytemp = array([ -1.95638686e-04, -3.79831391e-03, -2.9...9152394e-05, 3.94220996e-05, -8.21904840e-06])
181 self.fit_time_ = time() - t0
182
183 def _load_preprocess(self, path):
184 """Load preprocessing pipeline"""
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/forest.py in fit(self=RandomForestRegressor(bootstrap=True, criterion=..., random_state=2017, verbose=0, warm_start=False), X=array([[ 1.00000000e+00, 3.63000000e+02, 2....5e-08,
5.41752441e+02]], dtype=float32), y=array([[ -1.95638686e-04],
[ -3.79831391e... [ 3.94220996e-05],
[ -8.21904840e-06]]), sample_weight=None)
311 random_state.randint(MAX_INT, size=len(self.estimators_))
312
313 trees = []
314 for i in range(n_more_estimators):
315 tree = self._make_estimator(append=False,
--> 316 random_state=random_state)
random_state = <mtrand.RandomState object>
317 trees.append(tree)
318
319 # Parallel loop: we use the threading backend as the Cython code
320 # for fitting the trees is internally releasing the Python GIL
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/base.py in _make_estimator(self=RandomForestRegressor(bootstrap=True, criterion=..., random_state=2017, verbose=0, warm_start=False), append=False, random_state=<mtrand.RandomState object>)
122 Warning: This method should be used to properly instantiate new
123 sub-estimators.
124 """
125 estimator = clone(self.base_estimator_)
126 estimator.set_params(**dict((p, getattr(self, p))
--> 127 for p in self.estimator_params))
self.estimator_params = ('criterion', 'max_depth', 'min_samples_split', 'min_samples_leaf', 'min_weight_fraction_leaf', 'max_features', 'max_leaf_nodes', 'min_impurity_decrease', 'min_impurity_split', 'random_state')
128
129 if random_state is not None:
130 _set_random_states(estimator, random_state)
131
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/base.py in set_params(self=DecisionTreeRegressor(criterion='mse', max_depth...resort=False, random_state=None, splitter='best'), **params={'criterion': 'mse', 'max_depth': None, 'max_features': 'auto', 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_impurity_split': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'random_state': 2017})
260 self
261 """
262 if not params:
263 # Simple optimization to gain speed (inspect is slow)
264 return self
--> 265 valid_params = self.get_params(deep=True)
valid_params = undefined
self.get_params = <bound method BaseEstimator.get_params of Decisi...esort=False, random_state=None, splitter='best')>
266
267 nested_params = defaultdict(dict) # grouped by prefix
268 for key, value in params.items():
269 key, delim, sub_key = key.partition('__')
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/base.py in get_params(self=DecisionTreeRegressor(criterion='mse', max_depth...resort=False, random_state=None, splitter='best'), deep=True)
236 value = getattr(self, key, None)
237 if len(w) and w[0].category == DeprecationWarning:
238 # if the parameter is deprecated, don't show it
239 continue
240 finally:
--> 241 warnings.filters.pop(0)
242
243 # XXX: should we rather test if instance of estimator?
244 if deep and hasattr(value, 'get_params'):
245 deep_items = value.get_params().items()
IndexError: pop from empty list
___________________________________________________________________________
During handling of the above exception, another exception occurred:
JoblibIndexError Traceback (most recent call last)
<ipython-input-49-f8f29c8588e9> in <module>()
----> 1 score_no_prep = evaluate_ensemble(None)
2 score_prep = evaluate_ensemble([1,3])
3 print("Test set score no feature propagation : %.3f" % score_no_prep)
4 print("Test set score with feature propagation: %.3f" % score_prep)
<ipython-input-46-a9bf13defd9a> in evaluate_ensemble(propagate_features)
2 """Wrapper for ensemble evaluation."""
3 ens = build_ensemble(True, propagate_features)
----> 4 ens.fit(X.iloc[:75].values, Y.iloc[:75].values)
5 pred = ens.predict(X.iloc[75:2000].values)
6 #print(pred[:5])
~/anaconda3/lib/python3.6/site-packages/mlens/ensemble/base.py in fit(self, X, y, **kwargs)
514 self._id_train.fit(X)
515
--> 516 out = self._backend.fit(X, y, **kwargs)
517 if out is not self._backend:
518 # fit_transform
~/anaconda3/lib/python3.6/site-packages/mlens/ensemble/base.py in fit(self, X, y, **kwargs)
156 with ParallelProcessing(self.backend, self.n_jobs,
157 max(self.verbose - 4, 0)) as manager:
--> 158 out = manager.stack(self, 'fit', X, y, **kwargs)
159
160 if self.verbose:
~/anaconda3/lib/python3.6/site-packages/mlens/parallel/backend.py in stack(self, caller, job, X, y, path, return_preds, wart_start, split, **kwargs)
653 job=job, X=X, y=y, path=path, warm_start=wart_start,
654 return_preds=return_preds, split=split, stack=True)
--> 655 return self.process(caller=caller, out=out, **kwargs)
656
657 def process(self, caller, out, **kwargs):
~/anaconda3/lib/python3.6/site-packages/mlens/parallel/backend.py in process(self, caller, out, **kwargs)
698 self.job.clear()
699
--> 700 self._partial_process(task, parallel, **kwargs)
701
702 if task.name in return_names:
~/anaconda3/lib/python3.6/site-packages/mlens/parallel/backend.py in _partial_process(self, task, parallel, **kwargs)
719 self._gen_prediction_array(task, self.job.job, self.__threading__)
720
--> 721 task(self.job.args(**kwargs), parallel=parallel)
722
723 if not task.__no_output__ and getattr(task, 'n_feature_prop', 0):
~/anaconda3/lib/python3.6/site-packages/mlens/parallel/layer.py in __call__(self, args, parallel)
150
151 parallel(delayed(sublearner, not _threading)()
--> 152 for learner in self.learners
153 for sublearner in learner(args, 'main'))
154
~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in __call__(self, iterable)
791 # consumption.
792 self._iterating = False
--> 793 self.retrieve()
794 # Make sure that we get a last message telling us we are done
795 elapsed_time = time.time() - self._start_time
~/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in retrieve(self)
742 exception = exception_type(report)
743
--> 744 raise exception
745
746 def __call__(self, iterable):
JoblibIndexError: JoblibIndexError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/runpy.py in _run_module_as_main(mod_name='ipykernel_launcher', alter_argv=1)
188 sys.exit(msg)
189 main_globals = sys.modules["__main__"].__dict__
190 if alter_argv:
191 sys.argv[0] = mod_spec.origin
192 return _run_code(code, main_globals, None,
--> 193 "__main__", mod_spec)
mod_spec = ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py')
194
195 def run_module(mod_name, init_globals=None,
196 run_name=None, alter_sys=False):
197 """Execute a module's code without importing it
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/runpy.py in _run_code(code=<code object <module> at 0x7fde65662420, file "/...3.6/site-packages/ipykernel_launcher.py", line 5>, run_globals={'__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__cached__': '/home/pyybor/anaconda3/lib/python3.6/site-packages/__pycache__/ipykernel_launcher.cpython-36.pyc', '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': '/home/pyybor/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': '', '__spec__': ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py'), 'app': <module 'ipykernel.kernelapp' from '/home/pyybor.../python3.6/site-packages/ipykernel/kernelapp.py'>, ...}, init_globals=None, mod_name='__main__', mod_spec=ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py'), pkg_name='', script_name=None)
80 __cached__ = cached,
81 __doc__ = None,
82 __loader__ = loader,
83 __package__ = pkg_name,
84 __spec__ = mod_spec)
---> 85 exec(code, run_globals)
code = <code object <module> at 0x7fde65662420, file "/...3.6/site-packages/ipykernel_launcher.py", line 5>
run_globals = {'__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__cached__': '/home/pyybor/anaconda3/lib/python3.6/site-packages/__pycache__/ipykernel_launcher.cpython-36.pyc', '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': '/home/pyybor/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': '', '__spec__': ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py'), 'app': <module 'ipykernel.kernelapp' from '/home/pyybor.../python3.6/site-packages/ipykernel/kernelapp.py'>, ...}
86 return run_globals
87
88 def _run_module_code(code, init_globals=None,
89 mod_name=None, mod_spec=None,
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py in <module>()
11 # This is added back by InteractiveShellApp.init_path()
12 if sys.path[0] == '':
13 del sys.path[0]
14
15 from ipykernel import kernelapp as app
---> 16 app.launch_new_instance()
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
653
654 If a global instance already exists, this reinitializes and starts it
655 """
656 app = cls.instance(**kwargs)
657 app.initialize(argv)
--> 658 app.start()
app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
659
660 #-----------------------------------------------------------------------------
661 # utility functions, for convenience
662 #-----------------------------------------------------------------------------
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/ipykernel/kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
472 return self.subapp.start()
473 if self.poller is not None:
474 self.poller.start()
475 self.kernel.start()
476 try:
--> 477 ioloop.IOLoop.instance().start()
478 except KeyboardInterrupt:
479 pass
480
481 launch_new_instance = IPKernelApp.launch_instance
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/zmq/eventloop/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
172 )
173 return loop
174
175 def start(self):
176 try:
--> 177 super(ZMQIOLoop, self).start()
self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
178 except ZMQError as e:
179 if e.errno == ETERM:
180 # quietly return on ETERM
181 pass
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/tornado/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
883 self._events.update(event_pairs)
884 while self._events:
885 fd, events = self._events.popitem()
886 try:
887 fd_obj, handler_func = self._handlers[fd]
--> 888 handler_func(fd_obj, events)
handler_func = <function wrap.<locals>.null_wrapper>
fd_obj = <zmq.sugar.socket.Socket object>
events = 1
889 except (OSError, IOError) as e:
890 if errno_from_exception(e) == errno.EPIPE:
891 # Happens when the client closes the connection
892 pass
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 1), **kwargs={})
272 # Fast path when there are no active contexts.
273 def null_wrapper(*args, **kwargs):
274 try:
275 current_state = _state.contexts
276 _state.contexts = cap_contexts[0]
--> 277 return fn(*args, **kwargs)
args = (<zmq.sugar.socket.Socket object>, 1)
kwargs = {}
278 finally:
279 _state.contexts = current_state
280 null_wrapper._wrapped = True
281 return null_wrapper
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=1)
435 # dispatch events:
436 if events & IOLoop.ERROR:
437 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
438 return
439 if events & IOLoop.READ:
--> 440 self._handle_recv()
self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
441 if not self.socket:
442 return
443 if events & IOLoop.WRITE:
444 self._handle_send()
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
467 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
468 else:
469 if self._recv_callback:
470 callback = self._recv_callback
471 # self._recv_callback = None
--> 472 self._run_callback(callback, msg)
self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
callback = <function wrap.<locals>.null_wrapper>
msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
473
474 # self.update_state()
475
476
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function wrap.<locals>.null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
409 close our socket."""
410 try:
411 # Use a NullContext to ensure that all StackContexts are run
412 # inside our blanket exception handler rather than outside.
413 with stack_context.NullContext():
--> 414 callback(*args, **kwargs)
callback = <function wrap.<locals>.null_wrapper>
args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
kwargs = {}
415 except:
416 gen_log.error("Uncaught exception, closing connection.",
417 exc_info=True)
418 # Close the socket on an uncaught exception from a user callback
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
272 # Fast path when there are no active contexts.
273 def null_wrapper(*args, **kwargs):
274 try:
275 current_state = _state.contexts
276 _state.contexts = cap_contexts[0]
--> 277 return fn(*args, **kwargs)
args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
kwargs = {}
278 finally:
279 _state.contexts = current_state
280 null_wrapper._wrapped = True
281 return null_wrapper
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
278 if self.control_stream:
279 self.control_stream.on_recv(self.dispatch_control, copy=False)
280
281 def make_dispatcher(stream):
282 def dispatcher(msg):
--> 283 return self.dispatch_shell(stream, msg)
msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
284 return dispatcher
285
286 for s in self.shell_streams:
287 s.on_recv(make_dispatcher(s), copy=False)
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {'allow_stdin': True, 'code': 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2018, 1, 25, 17, 26, 56, 820132, tzinfo=tzlocal()), 'msg_id': '97F4C39E677248CE8B483E3539F82292', 'msg_type': 'execute_request', 'session': 'B2C82C6B86CB489C834937CEBD68684B', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '97F4C39E677248CE8B483E3539F82292', 'msg_type': 'execute_request', 'parent_header': {}})
230 self.log.warn("Unknown message type: %r", msg_type)
231 else:
232 self.log.debug("%s: %s", msg_type, msg)
233 self.pre_handler_hook()
234 try:
--> 235 handler(stream, idents, msg)
handler = <bound method Kernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
stream = <zmq.eventloop.zmqstream.ZMQStream object>
idents = [b'B2C82C6B86CB489C834937CEBD68684B']
msg = {'buffers': [], 'content': {'allow_stdin': True, 'code': 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2018, 1, 25, 17, 26, 56, 820132, tzinfo=tzlocal()), 'msg_id': '97F4C39E677248CE8B483E3539F82292', 'msg_type': 'execute_request', 'session': 'B2C82C6B86CB489C834937CEBD68684B', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '97F4C39E677248CE8B483E3539F82292', 'msg_type': 'execute_request', 'parent_header': {}}
236 except Exception:
237 self.log.error("Exception in message handler:", exc_info=True)
238 finally:
239 self.post_handler_hook()
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=[b'B2C82C6B86CB489C834937CEBD68684B'], parent={'buffers': [], 'content': {'allow_stdin': True, 'code': 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2018, 1, 25, 17, 26, 56, 820132, tzinfo=tzlocal()), 'msg_id': '97F4C39E677248CE8B483E3539F82292', 'msg_type': 'execute_request', 'session': 'B2C82C6B86CB489C834937CEBD68684B', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '97F4C39E677248CE8B483E3539F82292', 'msg_type': 'execute_request', 'parent_header': {}})
394 if not silent:
395 self.execution_count += 1
396 self._publish_execute_input(code, parent, self.execution_count)
397
398 reply_content = self.do_execute(code, silent, store_history,
--> 399 user_expressions, allow_stdin)
user_expressions = {}
allow_stdin = True
400
401 # Flush output before sending the reply.
402 sys.stdout.flush()
403 sys.stderr.flush()
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/ipykernel/ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code='score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', silent=False, store_history=True, user_expressions={}, allow_stdin=True)
191
192 self._forward_input(allow_stdin)
193
194 reply_content = {}
195 try:
--> 196 res = shell.run_cell(code, store_history=store_history, silent=silent)
res = undefined
shell.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
code = 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)'
store_history = True
silent = False
197 finally:
198 self._restore_input()
199
200 if res.error_before_exec is not None:
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/ipykernel/zmqshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, *args=('score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)',), **kwargs={'silent': False, 'store_history': True})
528 )
529 self.payload_manager.write_payload(payload)
530
531 def run_cell(self, *args, **kwargs):
532 self._last_traceback = None
--> 533 return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
self.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
args = ('score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)',)
kwargs = {'silent': False, 'store_history': True}
534
535 def _showtraceback(self, etype, evalue, stb):
536 # try to preserve ordering of tracebacks and print statements
537 sys.stdout.flush()
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell='score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', store_history=True, silent=False, shell_futures=True)
2693 self.displayhook.exec_result = result
2694
2695 # Execute the user code
2696 interactivity = "none" if silent else self.ast_node_interactivity
2697 has_raised = self.run_ast_nodes(code_ast.body, cell_name,
-> 2698 interactivity=interactivity, compiler=compiler, result=result)
interactivity = 'last_expr'
compiler = <IPython.core.compilerop.CachingCompiler object>
2699
2700 self.last_execution_succeeded = not has_raised
2701
2702 # Reset this so later displayed values do not modify the
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_ast_nodes(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, nodelist=[<_ast.Assign object>, <_ast.Assign object>, <_ast.Expr object>, <_ast.Expr object>], cell_name='<ipython-input-49-f8f29c8588e9>', interactivity='last', compiler=<IPython.core.compilerop.CachingCompiler object>, result=<ExecutionResult object at 7fde107b1d30, executi..._before_exec=None error_in_exec=None result=None>)
2797
2798 try:
2799 for i, node in enumerate(to_run_exec):
2800 mod = ast.Module([node])
2801 code = compiler(mod, cell_name, "exec")
-> 2802 if self.run_code(code, result):
self.run_code = <bound method InteractiveShell.run_code of <ipykernel.zmqshell.ZMQInteractiveShell object>>
code = <code object <module> at 0x7fde1829bf60, file "<ipython-input-49-f8f29c8588e9>", line 1>
result = <ExecutionResult object at 7fde107b1d30, executi..._before_exec=None error_in_exec=None result=None>
2803 return True
2804
2805 for i, node in enumerate(to_run_interactive):
2806 mod = ast.Interactive([node])
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_code(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, code_obj=<code object <module> at 0x7fde1829bf60, file "<ipython-input-49-f8f29c8588e9>", line 1>, result=<ExecutionResult object at 7fde107b1d30, executi..._before_exec=None error_in_exec=None result=None>)
2857 outflag = True # happens in more places, so it's easier as default
2858 try:
2859 try:
2860 self.hooks.pre_run_code_hook()
2861 #rprint('Running code', repr(code_obj)) # dbg
-> 2862 exec(code_obj, self.user_global_ns, self.user_ns)
code_obj = <code object <module> at 0x7fde1829bf60, file "<ipython-input-49-f8f29c8588e9>", line 1>
self.user_global_ns = {'DataFrame': <class 'pandas.core.frame.DataFrame'>, 'In': ['', "import numpy as np\nimport pandas as pd\ndf = pd.r...ay'],1)\nX_test = df_test.drop(['Day'],1)\nY = df.y", 'from sklearn.linear_model import LinearRegressio...elf).predict(X)\n return 1 * (p > p.mean())', 'import numpy as np\nfrom pandas import DataFrame\n...mport load_iris\n\nseed = 2017\nnp.random.seed(seed)', 'from mlens.ensemble import SuperLearner\nfrom skl... ensemble.add_meta(MyClass())\n return ensemble', 'import sklearn as sk\nsk.__version__', "get_ipython().system('pip install -U sklearn')", 'base = build_ensemble(False,[1, 3])\nbase.fit(X, ...[:5]\nprint("Input to meta learner :\\n %r" % pred)', 'def evaluate_ensemble(propagate_features):\n "...(mean_squared_error(pred, Y.iloc[75:200].values))', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'def evaluate_ensemble(propagate_features):\n "...mean_squared_error(pred, Y.iloc[75:2000].values))', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'def evaluate_ensemble(propagate_features):\n "...mean_squared_error(pred, Y.iloc[75:2000].values))', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'def evaluate_ensemble(propagate_features):\n "...mean_squared_error(pred, Y.iloc[75:2000].values))', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'def evaluate_ensemble(propagate_features):\n "...mean_squared_error(pred, Y.iloc[75:2000].values))', ...], 'LinearRegression': <class 
'sklearn.linear_model.base.LinearRegression'>, 'MyClass': <class '__main__.MyClass'>, 'Out': {5: '0.19.1', 22: '0.19.1', 30: '0.19.1', 37: '0.19.1', 40: '0.19.1', 43: '0.19.1'}, 'RandomForestRegressor': <class 'sklearn.ensemble.forest.RandomForestRegressor'>, 'SVC': <class 'sklearn.svm.classes.SVC'>, 'SuperLearner': <class 'mlens.ensemble.super_learner.SuperLearner'>, 'X': Market Stock x0 x1 ...24e-05 100.000000
[623817 rows x 13 columns], 'X_test': Market Stock x0 x1 ...603e-05 104.118389
[640430 rows x 13 columns], ...}
self.user_ns = {'DataFrame': <class 'pandas.core.frame.DataFrame'>, 'In': ['', "import numpy as np\nimport pandas as pd\ndf = pd.r...ay'],1)\nX_test = df_test.drop(['Day'],1)\nY = df.y", 'from sklearn.linear_model import LinearRegressio...elf).predict(X)\n return 1 * (p > p.mean())', 'import numpy as np\nfrom pandas import DataFrame\n...mport load_iris\n\nseed = 2017\nnp.random.seed(seed)', 'from mlens.ensemble import SuperLearner\nfrom skl... ensemble.add_meta(MyClass())\n return ensemble', 'import sklearn as sk\nsk.__version__', "get_ipython().system('pip install -U sklearn')", 'base = build_ensemble(False,[1, 3])\nbase.fit(X, ...[:5]\nprint("Input to meta learner :\\n %r" % pred)', 'def evaluate_ensemble(propagate_features):\n "...(mean_squared_error(pred, Y.iloc[75:200].values))', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'def evaluate_ensemble(propagate_features):\n "...mean_squared_error(pred, Y.iloc[75:2000].values))', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'def evaluate_ensemble(propagate_features):\n "...mean_squared_error(pred, Y.iloc[75:2000].values))', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'def evaluate_ensemble(propagate_features):\n "...mean_squared_error(pred, Y.iloc[75:2000].values))', 'score_no_prep = evaluate_ensemble(None)\nscore_pr...ore with feature propagation: %.3f" % score_prep)', 'def evaluate_ensemble(propagate_features):\n "...mean_squared_error(pred, Y.iloc[75:2000].values))', ...], 'LinearRegression': <class 
'sklearn.linear_model.base.LinearRegression'>, 'MyClass': <class '__main__.MyClass'>, 'Out': {5: '0.19.1', 22: '0.19.1', 30: '0.19.1', 37: '0.19.1', 40: '0.19.1', 43: '0.19.1'}, 'RandomForestRegressor': <class 'sklearn.ensemble.forest.RandomForestRegressor'>, 'SVC': <class 'sklearn.svm.classes.SVC'>, 'SuperLearner': <class 'mlens.ensemble.super_learner.SuperLearner'>, 'X': Market Stock x0 x1 ...24e-05 100.000000
[623817 rows x 13 columns], 'X_test': Market Stock x0 x1 ...603e-05 104.118389
[640430 rows x 13 columns], ...}
2863 finally:
2864 # Reset our crash handler in place
2865 sys.excepthook = old_excepthook
2866 except SystemExit as e:
...........................................................................
/home/pyybor/g_ch/<ipython-input-49-f8f29c8588e9> in <module>()
----> 1 score_no_prep = evaluate_ensemble(None)
2 score_prep = evaluate_ensemble([1,3])
3 print("Test set score no feature propagation : %.3f" % score_no_prep)
4 print("Test set score with feature propagation: %.3f" % score_prep)
...........................................................................
/home/pyybor/g_ch/<ipython-input-46-a9bf13defd9a> in evaluate_ensemble(propagate_features=None)
1 def evaluate_ensemble(propagate_features):
2 """Wrapper for ensemble evaluation."""
3 ens = build_ensemble(True, propagate_features)
----> 4 ens.fit(X.iloc[:75].values, Y.iloc[:75].values)
5 pred = ens.predict(X.iloc[75:2000].values)
6 #print(pred[:5])
7 return np.sqrt(mean_squared_error(pred, Y.iloc[75:2000].values))
8
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/ensemble/base.py in fit(self=SuperLearner(array_check=2, backend=None, folds=...scorer=None, shuffle=False,
verbose=False), X=array([[ 1.00000000e+00, 3.63000000e+02, 2....04, 2.66834227e-08,
5.41752459e+02]]), y=array([ -1.95638686e-04, -3.79831391e-03, -2.9...9152394e-05, 3.94220996e-05, -8.21904840e-06]), **kwargs={})
511 X, y = check_inputs(X, y, self.array_check)
512
513 if self.model_selection:
514 self._id_train.fit(X)
515
--> 516 out = self._backend.fit(X, y, **kwargs)
out = undefined
self._backend.fit = <bound method Sequential.fit of Sequential(backe...rmers=[])],
verbose=0)],
verbose=False)>
X = array([[ 1.00000000e+00, 3.63000000e+02, 2....04, 2.66834227e-08,
5.41752459e+02]])
y = array([ -1.95638686e-04, -3.79831391e-03, -2.9...9152394e-05, 3.94220996e-05, -8.21904840e-06])
kwargs = {}
517 if out is not self._backend:
518 # fit_transform
519 return out
520 else:
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/ensemble/base.py in fit(self=Sequential(backend='threading', dtype=<class 'nu...ormers=[])],
verbose=0)],
verbose=False), X=array([[ 1.00000000e+00, 3.63000000e+02, 2....04, 2.66834227e-08,
5.41752459e+02]]), y=array([ -1.95638686e-04, -3.79831391e-03, -2.9...9152394e-05, 3.94220996e-05, -8.21904840e-06]), **kwargs={})
153
154 f, t0 = print_job(self, "Fitting")
155
156 with ParallelProcessing(self.backend, self.n_jobs,
157 max(self.verbose - 4, 0)) as manager:
--> 158 out = manager.stack(self, 'fit', X, y, **kwargs)
out = undefined
manager.stack = <bound method ParallelProcessing.stack of <mlens.parallel.backend.ParallelProcessing object>>
self = Sequential(backend='threading', dtype=<class 'nu...ormers=[])],
verbose=0)],
verbose=False)
X = array([[ 1.00000000e+00, 3.63000000e+02, 2....04, 2.66834227e-08,
5.41752459e+02]])
y = array([ -1.95638686e-04, -3.79831391e-03, -2.9...9152394e-05, 3.94220996e-05, -8.21904840e-06])
kwargs = {}
159
160 if self.verbose:
161 print_time(t0, "{:<35}".format("Fit complete"), file=f)
162
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/backend.py in stack(self=<mlens.parallel.backend.ParallelProcessing object>, caller=Sequential(backend='threading', dtype=<class 'nu...ormers=[])],
verbose=0)],
verbose=False), job='fit', X=array([[ 1.00000000e+00, 3.63000000e+02, 2....04, 2.66834227e-08,
5.41752459e+02]]), y=array([ -1.95638686e-04, -3.79831391e-03, -2.9...9152394e-05, 3.94220996e-05, -8.21904840e-06]), path=None, return_preds=False, wart_start=False, split=True, **kwargs={})
650 Prediction array(s).
651 """
652 out = self.initialize(
653 job=job, X=X, y=y, path=path, warm_start=wart_start,
654 return_preds=return_preds, split=split, stack=True)
--> 655 return self.process(caller=caller, out=out, **kwargs)
self.process = <bound method ParallelProcessing.process of <mlens.parallel.backend.ParallelProcessing object>>
caller = Sequential(backend='threading', dtype=<class 'nu...ormers=[])],
verbose=0)],
verbose=False)
out = {}
kwargs = {}
656
657 def process(self, caller, out, **kwargs):
658 """Process job.
659
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/backend.py in process(self=<mlens.parallel.backend.ParallelProcessing object>, caller=Sequential(backend='threading', dtype=<class 'nu...ormers=[])],
verbose=0)],
verbose=False), out=None, **kwargs={})
695 backend=self.backend) as parallel:
696
697 for task in caller:
698 self.job.clear()
699
--> 700 self._partial_process(task, parallel, **kwargs)
self._partial_process = <bound method ParallelProcessing._partial_proces...lens.parallel.backend.ParallelProcessing object>>
task = Layer(backend='threading', dtype=<class 'numpy.f..._exception=True, transformers=[])],
verbose=0)
parallel = Parallel(n_jobs=-1)
kwargs = {}
701
702 if task.name in return_names:
703 out.append(self.get_preds(dtype=_dtype(task)))
704
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/backend.py in _partial_process(self=<mlens.parallel.backend.ParallelProcessing object>, task=Layer(backend='threading', dtype=<class 'numpy.f..._exception=True, transformers=[])],
verbose=0), parallel=Parallel(n_jobs=-1), **kwargs={})
716 task.setup(self.job.predict_in, self.job.y, self.job.job)
717
718 if not task.__no_output__:
719 self._gen_prediction_array(task, self.job.job, self.__threading__)
720
--> 721 task(self.job.args(**kwargs), parallel=parallel)
task = Layer(backend='threading', dtype=<class 'numpy.f..._exception=True, transformers=[])],
verbose=0)
self.job.args = <bound method Job.args of <mlens.parallel.backend.Job object>>
kwargs = {}
parallel = Parallel(n_jobs=-1)
722
723 if not task.__no_output__ and getattr(task, 'n_feature_prop', 0):
724 self._propagate_features(task)
725
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/layer.py in __call__(self=Layer(backend='threading', dtype=<class 'numpy.f..._exception=True, transformers=[])],
verbose=0), args={'auxiliary': {'P': None, 'X': array([[ 1.00000000e+00, 3.63000000e+02, 2....04, 2.66834227e-08,
5.41752459e+02]]), 'y': array([ -1.95638686e-04, -3.79831391e-03, -2.9...9152394e-05, 3.94220996e-05, -8.21904840e-06])}, 'dir': [('randomforestregressor-2.0.0', <mlens.parallel.learner.IndexedEstimator object>), ('randomforestregressor-2.0.2', <mlens.parallel.learner.IndexedEstimator object>), ('randomforestregressor-1.0.2', <mlens.parallel.learner.IndexedEstimator object>)], 'job': 'fit', 'main': {'P': array([[ -6.91946599e-36, 6.62666976e-01],
....72381141e-05, 6.66164560e-05]], dtype=float32), 'X': array([[ 1.00000000e+00, 3.63000000e+02, 2....04, 2.66834227e-08,
5.41752459e+02]]), 'y': array([ -1.95638686e-04, -3.79831391e-03, -2.9...9152394e-05, 3.94220996e-05, -8.21904840e-06])}}, parallel=Parallel(n_jobs=-1))
147 if self.verbose >= 2:
148 safe_print(msg.format('Learners ...'), file=f, end=e2)
149 t1 = time()
150
151 parallel(delayed(sublearner, not _threading)()
--> 152 for learner in self.learners
self.learners = [Learner(attr='predict', backend='threading', dty...=False,
raise_on_exception=True, scorer=None), Learner(attr='predict', backend='threading', dty...=False,
raise_on_exception=True, scorer=None)]
153 for sublearner in learner(args, 'main'))
154
155 if self.verbose >= 2:
156 print_time(t1, 'done', file=f)
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object Layer.__call__.<locals>.<genexpr>>)
788 if pre_dispatch == "all" or n_jobs == 1:
789 # The iterable was consumed all at once by the above for loop.
790 # No need to wait for async callbacks to trigger to
791 # consumption.
792 self._iterating = False
--> 793 self.retrieve()
self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=-1)>
794 # Make sure that we get a last message telling us we are done
795 elapsed_time = time.time() - self._start_time
796 self._print('Done %3i out of %3i | elapsed: %s finished',
797 (len(self._output), len(self._output),
---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
IndexError Thu Jan 25 17:26:56 2018
PID: 3404 Python 3.6.3: /home/pyybor/anaconda3/bin/python
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in __call__(self=<mlens.externals.joblib.parallel.BatchedCalls object>)
130 def __init__(self, iterator_slice):
131 self.items = list(iterator_slice)
132 self._size = len(self.items)
133
134 def __call__(self):
--> 135 return [func(*args, **kwargs) for func, args, kwargs in self.items]
self.items = [(<mlens.parallel.learner.SubLearner object>, (), {})]
136
137 def __len__(self):
138 return self._size
139
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
130 def __init__(self, iterator_slice):
131 self.items = list(iterator_slice)
132 self._size = len(self.items)
133
134 def __call__(self):
--> 135 return [func(*args, **kwargs) for func, args, kwargs in self.items]
func = <mlens.parallel.learner.SubLearner object>
args = ()
kwargs = {}
136
137 def __len__(self):
138 return self._size
139
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in __call__(self=<mlens.parallel.learner.SubLearner object>)
120 else:
121 self.processing_index = ''
122
123 def __call__(self):
124 """Launch job"""
--> 125 return getattr(self, self.job)()
self = <mlens.parallel.learner.SubLearner object>
self.job = 'fit'
126
127 def fit(self, path=None):
128 """Fit sub-learner"""
129 if not path:
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in fit(self=<mlens.parallel.learner.SubLearner object>, path=[])
129 if not path:
130 path = self.path
131 t0 = time()
132 transformers = self._load_preprocess(path)
133
--> 134 self._fit(transformers)
self._fit = <bound method SubLearner._fit of <mlens.parallel.learner.SubLearner object>>
transformers = None
135
136 if self.out_array is not None:
137 self._predict(transformers, self.scorer is not None)
138
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/mlens/parallel/learner.py in _fit(self=<mlens.parallel.learner.SubLearner object>, transformers=None)
175 t0 = time()
176 if transformers:
177 xtemp, ytemp = transformers.transform(xtemp, ytemp)
178
179 # Fit estimator
--> 180 self.estimator.fit(xtemp, ytemp)
self.estimator.fit = <bound method BaseForest.fit of RandomForestRegr... random_state=2017, verbose=0, warm_start=False)>
xtemp = array([[ 1.00000000e+00, 3.63000000e+02, 2....04, 2.66834227e-08,
5.41752459e+02]])
ytemp = array([ -1.95638686e-04, -3.79831391e-03, -2.9...9152394e-05, 3.94220996e-05, -8.21904840e-06])
181 self.fit_time_ = time() - t0
182
183 def _load_preprocess(self, path):
184 """Load preprocessing pipeline"""
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/forest.py in fit(self=RandomForestRegressor(bootstrap=True, criterion=..., random_state=2017, verbose=0, warm_start=False), X=array([[ 1.00000000e+00, 3.63000000e+02, 2....5e-08,
5.41752441e+02]], dtype=float32), y=array([[ -1.95638686e-04],
[ -3.79831391e... [ 3.94220996e-05],
[ -8.21904840e-06]]), sample_weight=None)
311 random_state.randint(MAX_INT, size=len(self.estimators_))
312
313 trees = []
314 for i in range(n_more_estimators):
315 tree = self._make_estimator(append=False,
--> 316 random_state=random_state)
random_state = <mtrand.RandomState object>
317 trees.append(tree)
318
319 # Parallel loop: we use the threading backend as the Cython code
320 # for fitting the trees is internally releasing the Python GIL
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/base.py in _make_estimator(self=RandomForestRegressor(bootstrap=True, criterion=..., random_state=2017, verbose=0, warm_start=False), append=False, random_state=<mtrand.RandomState object>)
122 Warning: This method should be used to properly instantiate new
123 sub-estimators.
124 """
125 estimator = clone(self.base_estimator_)
126 estimator.set_params(**dict((p, getattr(self, p))
--> 127 for p in self.estimator_params))
self.estimator_params = ('criterion', 'max_depth', 'min_samples_split', 'min_samples_leaf', 'min_weight_fraction_leaf', 'max_features', 'max_leaf_nodes', 'min_impurity_decrease', 'min_impurity_split', 'random_state')
128
129 if random_state is not None:
130 _set_random_states(estimator, random_state)
131
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/base.py in set_params(self=DecisionTreeRegressor(criterion='mse', max_depth...resort=False, random_state=None, splitter='best'), **params={'criterion': 'mse', 'max_depth': None, 'max_features': 'auto', 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_impurity_split': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'random_state': 2017})
260 self
261 """
262 if not params:
263 # Simple optimization to gain speed (inspect is slow)
264 return self
--> 265 valid_params = self.get_params(deep=True)
valid_params = undefined
self.get_params = <bound method BaseEstimator.get_params of Decisi...esort=False, random_state=None, splitter='best')>
266
267 nested_params = defaultdict(dict) # grouped by prefix
268 for key, value in params.items():
269 key, delim, sub_key = key.partition('__')
...........................................................................
/home/pyybor/anaconda3/lib/python3.6/site-packages/sklearn/base.py in get_params(self=DecisionTreeRegressor(criterion='mse', max_depth...resort=False, random_state=None, splitter='best'), deep=True)
236 value = getattr(self, key, None)
237 if len(w) and w[0].category == DeprecationWarning:
238 # if the parameter is deprecated, don't show it
239 continue
240 finally:
--> 241 warnings.filters.pop(0)
242
243 # XXX: should we rather test if instance of estimator?
244 if deep and hasattr(value, 'get_params'):
245 deep_items = value.get_params().items()
IndexError: pop from empty list
___________________________________________________________________________
As mentioned in the docs, we can map different preprocessing pipelines to different estimators as follows:
preprocessing_cases = {"case-1": [trans_1, trans_2],
"case-2": [alt_trans_1, alt_trans_2]}
estimators = {"case-1": [est_a, est_b],
"case-2": [est_c, est_d]}
Doing this, do we use trans_1 for est_a and trans_2 for est_b? Then if I use
preprocessing_cases = {"case-1": [trans_1],
"case-2": [alt_trans_1]}
estimators = {"case-1": [est_a, est_b],
"case-2": [est_c, est_d]}
Does this mean I will use trans_1 for both est_a and est_b, and alt_trans_1 for both est_c and est_d?
If the first usage can already handle the mapping between preprocessing steps and the estimators in the list, why do we need the dictionary?
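For what it's worth, my reading of the mlens docs is that each case is a pipeline: the transformers listed under a case key are applied in sequence, and every estimator under the same key is fitted on that pipeline's output. The snippet below is only a toy model of that reading, not mlens code; `run_cases` and the stand-in functions are made up for illustration.

```python
def run_cases(preprocessing_cases, estimators, X):
    """Toy model of case-wise preprocessing (not mlens code).

    The transformers of a case are chained in order, and every estimator
    registered under the same case key receives the chained output.
    """
    results = {}
    for case, transformers in preprocessing_cases.items():
        Z = X
        for transform in transformers:  # trans_1 first, then trans_2, ...
            Z = transform(Z)
        for name, estimator in estimators[case]:
            results[(case, name)] = estimator(Z)
    return results


# Hypothetical stand-ins: transformers and estimators are plain functions.
def double(xs):
    return [2 * x for x in xs]


def add_one(xs):
    return [x + 1 for x in xs]


preprocessing_cases = {"case-1": [double, add_one],
                       "case-2": [add_one]}
estimators = {"case-1": [("est_a", sum), ("est_b", max)],
              "case-2": [("est_c", sum), ("est_d", min)]}

out = run_cases(preprocessing_cases, estimators, [1, 2, 3])
```

Under that reading, the dictionary keys are what tie a group of estimators to a pipeline; the position of a transformer inside the list only determines the order in which it is applied.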
Switch from print messages to logger for greater control.
Need:
A section on the use of joblib
An ensemble primer
Benchmarks
A fuller tutorial
When building a model (e.g. SuperLearner) with proba=True on the meta estimator, how can I access the class labels for the model.predict() output columns?
sklearn classifiers generally expose a classes_ property with the class labels; however, this doesn't appear to be available on the mlens ensemble classes.
What is the preferred way to map the prediction output columns to class labels?
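A possible workaround, sketched under an assumption I have not verified for mlens: scikit-learn classifiers store `classes_` as the sorted unique training labels, so if the ensemble's probability columns follow the meta estimator's order, the labels can be rebuilt from the training targets. `label_probabilities` and the sample data are hypothetical.

```python
def label_probabilities(proba_rows, y_train):
    """Attach class labels to probability columns.

    Assumes (unverified for mlens) that column j corresponds to the j-th
    label in sorted order, as with scikit-learn's ``classes_``.
    """
    classes = sorted(set(y_train))
    return [dict(zip(classes, row)) for row in proba_rows]


y_train = ["setosa", "virginica", "versicolor", "setosa"]
proba = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # made-up ensemble output

labelled = label_probabilities(proba, y_train)
best = [max(row, key=row.get) for row in labelled]
```

If the assumption does not hold for a given meta estimator, the mapping will be silently wrong, so it is worth checking against a sample whose class is known.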
When training a model, how do I use sample_weight? I haven't seen a weight parameter in the fit function.
Does not seem to allow pickling a scorer created manually by the user through the make_scorer interface. Issue with the underlying callable function.
The estimator I am trying to fit accepts a pandas DataFrame as input to its fit method and uses the column labels. However, when using the SuperLearner, the data is converted to a numpy.ndarray before it is passed to the estimator's fit method. Is there a way to preserve the column labels?
I am running the Anaconda release of python 3.6 under Windows 10. Anaconda does not yet have an mlens package. The installation instructions given on the ml-ens website (http://ml-ensemble.com/) suggest the command to install mlens is
pip install -U mlens
That causes Anaconda to issue several error messages and installation fails.
The links from ml-ensemble.com to MIT license and to Installation Details both result in 404 errors.
The command
pip install mlens
does successfully install mlens in the Anaconda environment.
Apparently, the ordering of the metrics in the summary attribute of an Evaluator seems to jump around when put into a DataFrame. Likely due to some dictionary not being ordered.
Occasional key error on Windows for param draws. Refactor and debug.
I'm trying the Subsemble implementation but always get:
Fitting 1 layers
Fit complete | 00:00:04
Predicting 1 layers
---------------------------------------------------------------------------
NotFittedError Traceback (most recent call last)
[...]
NotFittedError: Layer instance (layer-1) not fitted.
I'm currently on mlens 0.2.1 and I am testing it with the example in the docs (see below) and I still get the same issue. Any ideas?
from mlens.ensemble import Subsemble
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

class SimplePartitioner():

    def __init__(self):
        pass

    def our_custom_function(self, X, y=None):
        """Split the data in half based on the sum of features"""
        # Labels should be numerical
        return 1 * (X.sum(axis=1) > X.sum(axis=1).mean())

# Note that the number of partitions the estimator creates *must* match the
# ``partitions`` argument passed to the Subsemble.
# The ``folds`` option is completely independent.
sub = Subsemble(partitions=2, folds=3, verbose=1)
sub.add([SVC(), LogisticRegression()],
        partition_estimator=SimplePartitioner(),
        fit_estimator=False,
        attr="our_custom_function")

# x_train, y_train and x_test are my own data (not shown here)
sub.fit(x_train, y_train)
sub.predict(x_test)
I ran the sample kernel script from kaggle: https://www.kaggle.com/flennerhag/ml-ensemble-scikit-learn-style-ensemble-learning
The script runs without issue. But when I add a LightGBM model to the base learners, it runs for hours without finishing the calculation. After I changed nthread to 1 (from -1), set the Evaluator to backend='threading' and n_jobs=1, and removed XGBoost from the base learners, the kernel dies whenever I run the fit method.
Here are the parameters for LightGBM:
lgb = LGBMRegressor(objective='regression', nthread=1,seed=SEED)
'lgb':
{'learning_rate': uniform(0.02,0.04),
'num_leaves': randint(50, 60),
'n_estimators': randint(150,200),
'min_child_weight': randint(30,60)}
setup for the Evaluator:
evl = Evaluator(scorer,
cv=2,
random_state=SEED,
verbose=5,
backend='threading',
n_jobs=1
)
Please let me know if I need to provide any more info.
best
Mike
Hi, mlens is a great tool! It has been a lifesaver, and I use it for Kaggle competitions.
The Evaluator is a nice tool for model selection of the meta layer, but evaluating an ensemble network takes a lot of time. Can I export the best ensemble directly once the evaluator has been fitted?
My current workflow with mlens is:
Step 4) is somewhat redundant and doubles the time cost.
Is there any way to avoid step 4)? Am I using mlens correctly?
Requires adding an isinstance test in the parallel modules to ensure the right fold object is used for splits.
If estimators in layers have n_jobs>1 or nthread>1, the joblib routine is likely to crash if the user has not set the 'forkserver' start method in the main fork.
Before estimation, we could easily check what the multiprocessing context is and, if it is not 'forkserver', change all n_jobs / nthread settings to 1.
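A rough sketch of what such a check could look like, using only the standard library's `multiprocessing.get_start_method`. The helper name `force_single_thread`, the `start_method` override, and the `ToyEstimator` stand-in are assumptions for illustration; real estimators would need to expose `get_params` / `set_params` in the scikit-learn style.

```python
import multiprocessing as mp

# Parameter names taken from the issue text above.
PARALLEL_PARAMS = ("n_jobs", "nthread")


def force_single_thread(estimators, start_method=None):
    """Set n_jobs / nthread to 1 unless the start method is 'forkserver'.

    ``start_method`` can be passed explicitly; otherwise the method that
    has been fixed for this process is used (None if none was set).
    Returns the class names of the estimators that were changed.
    """
    if start_method is None:
        start_method = mp.get_start_method(allow_none=True)
    if start_method == "forkserver":
        return []

    changed = []
    for est in estimators:
        params = est.get_params()
        overrides = {key: 1 for key in PARALLEL_PARAMS
                     if key in params and params[key] not in (None, 1)}
        if overrides:
            est.set_params(**overrides)
            changed.append(type(est).__name__)
    return changed


class ToyEstimator:
    """Hypothetical stand-in exposing the scikit-learn parameter API."""

    def __init__(self, n_jobs=-1):
        self.n_jobs = n_jobs

    def get_params(self, deep=True):
        return {"n_jobs": self.n_jobs}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self
```

Returning the names of the modified estimators would make it easy to warn the user that their settings were overridden rather than changing them silently.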
Hi,
I'm trying to run the Getting Started example and am hitting the following error, code and trace below:
Code:
`import numpy as np
from pandas import DataFrame
from mlens.metrics import make_scorer
from sklearn.metrics import f1_score
from sklearn.datasets import load_iris
seed = 2017
np.random.seed(seed)
f1 = make_scorer(f1_score, average='micro', greater_is_better=True)
data = load_iris()
idx = np.random.permutation(150)
X = data.data[idx]
y = data.target[idx]
from mlens.ensemble import SuperLearner
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
ensemble = SuperLearner(scorer=f1, random_state=seed)
ensemble.add([RandomForestClassifier(random_state=seed), SVC()])
ensemble.add_meta(LogisticRegression())
ensemble.fit(X[:75], y[:75])
preds = ensemble.predict(X[75:])
`
TypeError Traceback (most recent call last)
in ()
18
19 # Fit ensemble
---> 20 ensemble.fit(X[:75], y[:75])
21
22 # Predict
C:\Anaconda2\lib\site-packages\mlens\ensemble\base.pyc in fit(self, X, y)
714 X, y = X[idx], y[idx]
715
--> 716 self.scores_ = self.layers.fit(X, y)
717
718 return self
C:\Anaconda2\lib\site-packages\mlens\ensemble\base.pyc in fit(self, X, y, return_preds, **process_kwargs)
232 # Fit ensemble
233 try:
--> 234 processor.process()
235
236 if self.verbose:
C:\Anaconda2\lib\site-packages\mlens\parallel\manager.pyc in process(self)
216
217 for n, lyr in enumerate(self.layers.layers.values()):
--> 218 self._partial_process(n, lyr, parallel)
219
220 self.fitted = 1
C:\Anaconda2\lib\site-packages\mlens\parallel\manager.pyc in _partial_process(self, n, lyr, parallel)
306 kwargs['P'] = self.job.P[n + 1]
307
--> 308 f(**kwargs)
309
310
C:\Anaconda2\lib\site-packages\mlens\parallel\estimation.pyc in fit(self, X, y, P, dir, parallel)
198 # Load instances from cache and store as layer attributes
199 # Typically, as layer.estimators_, layer.preprocessing_
--> 200 self._assemble(dir)
201
202 if self.verbose:
C:\Anaconda2\lib\site-packages\mlens\parallel\estimation.pyc in assemble(self, dir)
93
94 if self.scorer is not None and self.layer.cls is not 'full':
---> 95 self.layer.scores = self._build_scores(s)
96
97 def _build_scores(self, s):
C:\Anaconda2\lib\site-packages\mlens\parallel\estimation.pyc in _build_scores(self, s)
126 # Aggregate to get cross-validated mean scores
127 for k, v in scores.items():
--> 128 scores[k] = (np.mean(v), np.std(v))
129
130 return scores
C:\Anaconda2\lib\site-packages\numpy\core\fromnumeric.pyc in mean(a, axis, dtype, out, keepdims)
2940
2941 return _methods._mean(a, axis=axis, dtype=dtype,
-> 2942 out=out, **kwargs)
2943
2944
C:\Anaconda2\lib\site-packages\numpy\core_methods.pyc in _mean(a, axis, dtype, out, keepdims)
63 dtype = mu.dtype('f8')
64
---> 65 ret = umr_sum(arr, axis, dtype, out, keepdims)
66 if isinstance(ret, mu.ndarray):
67 ret = um.true_divide(
TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'
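The final TypeError suggests that the per-fold score lists contain None entries by the time they are averaged. Why the scorer returned None is not visible in the trace, so the following is only a defensive sketch of the aggregation step; `aggregate_scores` is a hypothetical helper, not the mlens function.

```python
from statistics import mean, pstdev


def aggregate_scores(scores):
    """Return {name: (mean, std)} over fold scores, skipping None entries.

    Uses the population standard deviation, which matches the default of
    numpy.std. Names with no valid score map to (None, None).
    """
    out = {}
    for name, values in scores.items():
        valid = [v for v in values if v is not None]
        out[name] = (mean(valid), pstdev(valid)) if valid else (None, None)
    return out


summary = aggregate_scores({"rf": [0.5, 1.5], "svc": [None, None]})
```

This only hides the symptom; the underlying question of why no score was produced for those folds would still need to be answered.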
Need some benchmarks to show the power of ensembles!
I'm trying to use mlens in a system I'm developing, but based on the documentation and the code, it's not really clear to me what propagate_features values I should use given my data. Could you offer a bit of additional explanation in the tutorial so I know what should go in?
Allows rapid deployment of other ensemble types on backend base estimator fitting
Include layer name, case name, estimator name in trace back if estimation fails during fitting.
Need to build some sort of try-except block that retains the traceback stack so that joblib can pick it up.
predict_proba just does the same thing as predict.
In a multi-view learning setting, we have multiple feature sets for one object. For instance, for images and videos, color information and texture information are two different kinds of features, which can be regarded as two-view data. Can mlens accept this kind of multi-view input? That is to say, we would like to use an SVM for the color features and KNN for the texture features.
ValueError Traceback (most recent call last)
in ()
----> 1 print(evaluator.results.keys)
2 print("Score comparison with best params founds:\n\n%r" % evaluator.results)
C:\Users\sguest\Anaconda2\lib\site-packages\mlens-0.2.1-py2.7.egg\mlens\metrics\utils.pyc in __repr__(self)
92
93 def __repr__(self):
---> 94 return assemble_table(self, self.padding, self.decimals)
95
96
C:\Users\sguest\Anaconda2\lib\site-packages\mlens-0.2.1-py2.7.egg\mlens\metrics\utils.pyc in assemble_table(data, padding, decimals)
138 continue
139
--> 140 v_ = len(get_string(v, decimals))
141 if v > max_col_len[key]:
142 max_col_len[key] = v_
C:\Users\sguest\Anaconda2\lib\site-packages\mlens-0.2.1-py2.7.egg\mlens\metrics\utils.pyc in _get_string(obj, dec)
22 """Stringify object"""
23 try:
---> 24 return '{0:.{dec}f}'.format(obj, dec=dec)
25 except TypeError:
26 return obj.__str__()
ValueError: Unknown format code 'f' for object of type 'str'
I wonder whether the Nth layer is built only on the predictions from the models of the (N-1)th layer, or whether it also uses the original raw input data.
Does the scoring function support an extra vector as input?
I.e., a custom scoring function that takes (predicted_Y, true_Y, weight),
where the weight is used to calculate the final float score.
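One way to get a weighted score out of a two-argument scorer interface is to bind the weight vector in advance. The sketch below is only an illustration: `weighted_mae` and `sample_weights` are hypothetical names, and nothing in this thread confirms that mlens forwards extra arguments to a scorer. It also assumes the scorer is called on the full arrays in their original order; a scorer applied per cross-validation fold would need weights subset to match each fold.

```python
from functools import partial


def weighted_mae(y_true, y_pred, weight):
    """Weighted mean absolute error (hypothetical example metric)."""
    total = sum(weight)
    return sum(w * abs(t - p) for t, p, w in zip(y_true, y_pred, weight)) / total


# Hypothetical weights, one per observation.
sample_weights = [1.0, 1.0, 2.0]

# Bind the weights so the result has the usual (y_true, y_pred) signature.
scorer = partial(weighted_mae, weight=sample_weights)

score = scorer([1.0, 2.0, 3.0], [1.0, 3.0, 5.0])
```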
Allow the user to pass an estimator instead of an integer in n_partitions
and use the estimator to predict class labels on the training set.
One thing I have been unable to figure out is how to get a k-fold cross-validation score for the whole ensemble.
I have used sklearn's built-in cross_val_score, but this is very slow (I think because it ends up running cross-validation inside another cross-validation loop!).
How can I get a final k-fold cross-validation score for the final ensemble, please? (Great package, btw :) )
Passing a scorer to a Subsemble results in an error during estimation (when setting scores).
Code snippets with example use cases to get new users started.
Suppose:
Set a learning rate e.
While error > threshold:
a. Update every parameter i with -(error(n) - error(n-1)) * (param_i(n) - param_i(n-1)) * e.
b. Fit and predict. Compute the training error.
Problem:
All parameter changes are driven by the total error score. So if some parameter a is on its way to a good place, but some other parameter throws the score off, a will be thrown off too.
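The update rule described above can be sketched in a few lines. This is only an illustration of the formula as written in the issue; `update_params` and its argument names are hypothetical. Note that every parameter is scaled by the same change in total error, which is exactly the coupling the problem statement points at.

```python
def update_params(params, prev_params, error, prev_error, lr):
    """Apply the update -(error(n) - error(n-1)) * (param_i(n) - param_i(n-1)) * lr."""
    d_error = error - prev_error  # shared by all parameters
    return [p - d_error * (p - q) * lr for p, q in zip(params, prev_params)]


# The error fell from 1.0 to 0.8, so each parameter keeps moving in the
# direction it moved last time, regardless of which one caused the gain.
new_params = update_params(
    params=[1.0, 2.0],
    prev_params=[0.5, 2.5],
    error=0.8,
    prev_error=1.0,
    lr=0.1,
)
```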
File "/root/miniconda2/lib/python2.7/site-packages/mlens/ensemble/base.py", line 614, in predict
X, _ = check_inputs(X, check_level=self.array_check)
File "/root/miniconda2/lib/python2.7/site-packages/mlens/utils/validation.py", line 562, in check_inputs
X = _check_array(X)
File "/root/miniconda2/lib/python2.7/site-packages/mlens/utils/validation.py", line 510, in _check_array
warn_on_dtype=False # Mute as 'dtype' is 'None'
File "/root/miniconda2/lib/python2.7/site-packages/mlens/externals/sklearn/validation.py", line 388, in check_array
_assert_all_finite(array)
File "/root/miniconda2/lib/python2.7/site-packages/mlens/externals/sklearn/validation.py", line 46, in _assert_all_finite
" or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I have checked the data, which contains no NaN values, and I can train on the data with a single model. My code is shown below:
ensemble.add('stack',model['ex'])
ensemble.add('blend',model['ex'])
ensemble.add('subsemble',model['ex'])
meta = "ls"
ensemble.add_meta(model[meta])
ensemble.fit(X_train,y_train)
preds1 = np.exp(ensemble.predict(X))
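The traceback is raised by the input check inside predict, and the snippet fits on X_train but predicts on X, so it may be worth checking X itself for non-finite values, not only the training data. A minimal sketch of such a check follows; `find_non_finite` is a hypothetical helper, not part of mlens.

```python
import numpy as np


def find_non_finite(X):
    """Return (row, column) positions of NaN or infinite entries in a 2-D array."""
    arr = np.asarray(X, dtype=np.float64)
    bad = np.argwhere(~np.isfinite(arr))
    return [tuple(int(i) for i in pos) for pos in bad]


# Toy array standing in for the X passed to ensemble.predict.
X_demo = np.array([[1.0, np.nan], [np.inf, 2.0]])
bad_positions = find_non_finite(X_demo)
```

An empty list means every entry is finite; any returned positions point at the cells the validation step would reject.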
If we could build a base layer transformer class, it would be easy to convert every ensemble class to a base transformer class.
I'm able to fit a model that includes KerasClassifier as a model in the ensemble. However, at prediction time I get the following error. I've tried changing the backend as well as the number of jobs but to no avail. Any ideas?
`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
----> 1 preds = ensemble.predict_proba(X[294:])
D:\Continuum\anaconda3\lib\site-packages\mlens\ensemble\base.py in predict_proba(self, X, **kwargs)
633 """
634 kwargs.pop('proba', None)
--> 635 return self.predict(X, proba=True, **kwargs)
636
637 def _build_layer(self, estimators, indexer, preprocessing, **kwargs):
D:\Continuum\anaconda3\lib\site-packages\mlens\ensemble\base.py in predict(self, X, **kwargs)
613 return
614 X, _ = check_inputs(X, check_level=self.array_check)
--> 615 return self._backend.predict(X, **kwargs)
616
617 def predict_proba(self, X, **kwargs):
D:\Continuum\anaconda3\lib\site-packages\mlens\ensemble\base.py in predict(self, X, **kwargs)
199 predictions from final layer.
200 """
--> 201 if not self.fitted:
202 NotFittedError("Instance not fitted.")
203
D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\base.py in fitted(self)
359 if not self.stack or not self._check_static_params():
360 return False
--> 361 return all([g.fitted for g in self.stack])
362
363 @property
D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\base.py in (.0)
359 if not self.stack or not self._check_static_params():
360 return False
--> 361 return all([g.fitted for g in self.stack])
362
363 @property
D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\base.py in fitted(self)
359 if not self.stack or not self._check_static_params():
360 return False
--> 361 return all([g.fitted for g in self.stack])
362
363 @property
D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\base.py in (.0)
359 if not self.stack or not self._check_static_params():
360 return False
--> 361 return all([g.fitted for g in self.stack])
362
363 @property
D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\handles.py in fitted(self)
256 if not self._check_static_params():
257 return False
--> 258 return all([o.fitted for o in self.learners + self.transformers])
259
260 def get_params(self, deep=True):
D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\handles.py in (.0)
256 if not self._check_static_params():
257 return False
--> 258 return all([o.fitted for o in self.learners + self.transformers])
259
260 def get_params(self, deep=True):
D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\learner.py in fitted(self)
743 # Check estimator param overlap
744 fitted = self.learner + self.sublearners
--> 745 fitted_params = fitted[0].estimator.get_params(deep=True)
746 model_estimator_params = self.estimator.get_params(deep=True)
747
D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\learner.py in estimator(self)
64 def estimator(self):
65 """Deep copy of estimator"""
---> 66 return deepcopy(self._estimator)
67
68 @estimator.setter
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
178 y = x
179 else:
--> 180 y = _reconstruct(x, memo, *rv)
181
182 # If is its own copy, don't memoize.
D:\Continuum\anaconda3\lib\copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
278 if state is not None:
279 if deep:
--> 280 state = deepcopy(state, memo)
281 if hasattr(y, 'setstate'):
282 y.setstate(state)
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
148 copier = _deepcopy_dispatch.get(cls)
149 if copier:
--> 150 y = copier(x, memo)
151 else:
152 try:
D:\Continuum\anaconda3\lib\copy.py in _deepcopy_dict(x, memo, deepcopy)
238 memo[id(x)] = y
239 for key, value in x.items():
--> 240 y[deepcopy(key, memo)] = deepcopy(value, memo)
241 return y
242 d[dict] = _deepcopy_dict
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
178 y = x
179 else:
--> 180 y = _reconstruct(x, memo, *rv)
181
182 # If is its own copy, don't memoize.
D:\Continuum\anaconda3\lib\copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
278 if state is not None:
279 if deep:
--> 280 state = deepcopy(state, memo)
281 if hasattr(y, 'setstate'):
282 y.setstate(state)
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
148 copier = _deepcopy_dispatch.get(cls)
149 if copier:
--> 150 y = copier(x, memo)
151 else:
152 try:
D:\Continuum\anaconda3\lib\copy.py in _deepcopy_dict(x, memo, deepcopy)
238 memo[id(x)] = y
239 for key, value in x.items():
--> 240 y[deepcopy(key, memo)] = deepcopy(value, memo)
241 return y
242 d[dict] = _deepcopy_dict
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
178 y = x
179 else:
--> 180 y = _reconstruct(x, memo, *rv)
181
182 # If is its own copy, don't memoize.
D:\Continuum\anaconda3\lib\copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
278 if state is not None:
279 if deep:
--> 280 state = deepcopy(state, memo)
281 if hasattr(y, 'setstate'):
282 y.setstate(state)
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
148 copier = _deepcopy_dispatch.get(cls)
149 if copier:
--> 150 y = copier(x, memo)
151 else:
152 try:
D:\Continuum\anaconda3\lib\copy.py in _deepcopy_dict(x, memo, deepcopy)
238 memo[id(x)] = y
239 for key, value in x.items():
--> 240 y[deepcopy(key, memo)] = deepcopy(value, memo)
241 return y
242 d[dict] = _deepcopy_dict
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
148 copier = _deepcopy_dispatch.get(cls)
149 if copier:
--> 150 y = copier(x, memo)
151 else:
152 try:
D:\Continuum\anaconda3\lib\copy.py in _deepcopy_list(x, memo, deepcopy)
213 append = y.append
214 for a in x:
--> 215 append(deepcopy(a, memo))
216 return y
217 d[list] = _deepcopy_list
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
178 y = x
179 else:
--> 180 y = _reconstruct(x, memo, *rv)
181
182 # If is its own copy, don't memoize.
D:\Continuum\anaconda3\lib\copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
278 if state is not None:
279 if deep:
--> 280 state = deepcopy(state, memo)
281 if hasattr(y, 'setstate'):
282 y.setstate(state)
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
148 copier = _deepcopy_dispatch.get(cls)
149 if copier:
--> 150 y = copier(x, memo)
151 else:
152 try:
D:\Continuum\anaconda3\lib\copy.py in _deepcopy_dict(x, memo, deepcopy)
238 memo[id(x)] = y
239 for key, value in x.items():
--> 240 y[deepcopy(key, memo)] = deepcopy(value, memo)
241 return y
242 d[dict] = _deepcopy_dict
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
178 y = x
179 else:
--> 180 y = _reconstruct(x, memo, *rv)
181
182 # If is its own copy, don't memoize.
D:\Continuum\anaconda3\lib\copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
278 if state is not None:
279 if deep:
--> 280 state = deepcopy(state, memo)
281 if hasattr(y, 'setstate'):
282 y.setstate(state)
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
148 copier = _deepcopy_dispatch.get(cls)
149 if copier:
--> 150 y = copier(x, memo)
151 else:
152 try:
D:\Continuum\anaconda3\lib\copy.py in _deepcopy_dict(x, memo, deepcopy)
238 memo[id(x)] = y
239 for key, value in x.items():
--> 240 y[deepcopy(key, memo)] = deepcopy(value, memo)
241 return y
242 d[dict] = _deepcopy_dict
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
178 y = x
179 else:
--> 180 y = _reconstruct(x, memo, *rv)
181
182 # If is its own copy, don't memoize.
D:\Continuum\anaconda3\lib\copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
278 if state is not None:
279 if deep:
--> 280 state = deepcopy(state, memo)
281 if hasattr(y, 'setstate'):
282 y.setstate(state)
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
148 copier = _deepcopy_dispatch.get(cls)
149 if copier:
--> 150 y = copier(x, memo)
151 else:
152 try:
D:\Continuum\anaconda3\lib\copy.py in _deepcopy_dict(x, memo, deepcopy)
238 memo[id(x)] = y
239 for key, value in x.items():
--> 240 y[deepcopy(key, memo)] = deepcopy(value, memo)
241 return y
242 d[dict] = _deepcopy_dict
D:\Continuum\anaconda3\lib\copy.py in deepcopy(x, memo, _nil)
167 reductor = getattr(x, "__reduce_ex__", None)
168 if reductor:
--> 169 rv = reductor(4)
170 else:
171 reductor = getattr(x, "__reduce__", None)
TypeError: can't pickle _thread.RLock objects`
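The last frame shows copy.deepcopy failing on a thread lock held somewhere inside the estimator's state. The sketch below reproduces that failure mode in isolation with the standard library only; `HoldsLock` is a hypothetical stand-in and says nothing about where Keras keeps its locks.

```python
import copy
import threading


class HoldsLock:
    """Hypothetical object whose state includes a non-copyable lock."""

    def __init__(self):
        self.weights = [1.0, 2.0]
        self._lock = threading.RLock()


def can_deepcopy(obj):
    """Return True if copy.deepcopy succeeds on obj, False on a TypeError."""
    try:
        copy.deepcopy(obj)
        return True
    except TypeError:
        return False


plain_ok = can_deepcopy([1.0, 2.0])
lock_ok = can_deepcopy(HoldsLock())
```

A check like this on the wrapped estimator, before adding it to an ensemble, shows whether the estimator itself is what cannot be deep-copied.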
To avoid excessive memory strain. Will require changing how parallel fitting of base estimators behaves.
Ensure that parallel pulls through and behaves as if the failed estimator had never been passed.
A K-fold like time series indexer for time series cross-validation. The indexer should do non-overlapping:
fold | train obs | test obs |
---|---|---|
0 | 0 | 1 |
1 | 0, 1 | 2 |
2 | 0, 1, 2 | 3 |
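The fold layout in the table can be sketched as a small generator. This is only an illustration of the requested behaviour; `expanding_window_folds` is a hypothetical name, not an existing mlens indexer.

```python
def expanding_window_folds(n_obs):
    """Yield (train_indices, test_indices) with a growing training window.

    Each fold trains on all observations before the test observation,
    so test sets never overlap and no future data leaks into training.
    """
    for test_idx in range(1, n_obs):
        yield list(range(test_idx)), [test_idx]


folds = list(expanding_window_folds(4))
for fold, (train, test) in enumerate(folds):
    print(fold, train, test)
```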
Will different layers of models share the same split? Or do we need to specify the split?
Thanks
Make the ensemble initialize without an estimator, and add an add_layer method to construct ensembles with a general layer structure.
Hi again, I'm working through the model selection section and I think I've hit a bug. I'm passing scipy distributions as indicated in the examples, but I'm omitting them below as there are quite a lot.
Truncated Code:
`from mlens.metrics import make_scorer
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
evaluator = Evaluator(mae_scorer, cv=5, random_state=seed, verbose=1)
evaluator.evaluate(X_mean, y, estimators, params, n_iter=10)`
Error:
`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
65
66 evaluator = Evaluator(mae_scorer, cv=5, random_state=seed, verbose=1)
---> 67 evaluator.evaluate(X_mean, y, estimators, params, n_iter=10)
C:\Anaconda2\lib\site-packages\mlens\model_selection\model_selection.pyc in evaluate(self, X, y, estimators, param_dicts, n_iter)
387 self.estimators = check_instances(estimators)
388 self.n_iter = n_iter
--> 389 self._param_sets(param_dicts)
390
391 if self.verbose > 0:
C:\Anaconda2\lib\site-packages\mlens\model_selection\model_selection.pyc in _param_sets(self, param_dicts)
494 # the expected param_dicts key is 'est_name'
495 for est_name, _ in self.estimators:
--> 496 self._set_params(param_dicts, est_name)
497 else:
498 # Preprocessing
C:\Anaconda2\lib\site-packages\mlens\model_selection\model_selection.pyc in _set_params(self, param_dicts, key)
478 try:
479 self.params[key] = \
--> 480 self._draw_params(param_dicts[key])
481 except KeyError:
482 # No param draws desired. Set empty dict.
C:\Anaconda2\lib\site-packages\mlens\model_selection\model_selection.pyc in _draw_params(self, param_dists)
469 draws = dist.rvs(self.n_iter, random_state=self.random_state)
470
--> 471 for i, draw in enumerate(draws):
472 param_draws[i][param] = draw
473
TypeError: 'int' object is not iterable
`
It looks to me as though draws = dist.rvs(self.n_iter, random_state=self.random_state) should be draws = dist.rvs(size=self.n_iter, random_state=self.random_state)?
Fit: check layer exists
Fit: check meta estimator exists
Predict: check fitted (hasattr layer_)
Allow users to pass a dictionary of input arrays to any class that inherits mlens.parallel.base.BaseStacker, where the keys should be the names of the items in the stack attribute.
For ensembles and estimators, this will need some form of exception handling for check_inputs.
Hi, I have found an issue with SuperLearner.scores_.
I have fitted a SuperLearner ensemble and wanted to check the CV scores of the base learners by typing pd.DataFrame(ensemble.scores_). However, an error occurs:
AttributeError: 'SuperLearner' object has no attribute 'scores_'
This is weird. 1) I have checked my instantiated and fitted ensemble. There is indeed no scores_ attribute. 2) I've never seen this issue before.
(And what really crushed me is that I spent a long time fitting this ensemble, only to find that I can't see how my base learners behave...)
Anyway, here is my code:
ensemble = SuperLearner(scorer=mean_absolute_error, folds=5, random_state=seed, n_jobs=-1, shuffle=True)
ensemble.add([('et01', ExtraTreesRegressor(n_estimators=..., max_depth=..., n_jobs=-1)),
('et02', ExtraTreesRegressor(n_estimators=..., max_depth=..., n_jobs=-1)),
('et03', ExtraTreesRegressor(n_estimators=..., max_depth=..., n_jobs=-1)),
('xgb01', XGBRegressor(n_estimators=..., max_depth=..., learning_rate=..., nthread=20)),
('xgb02', XGBRegressor(n_estimators=..., max_depth=..., learning_rate=..., nthread=20)),
('xgb03', XGBRegressor(n_estimators=..., learning_rate=..., max_depth=..., gamma=..., nthread=20)),
('rf01', RandomForestRegressor(n_estimators=..., n_jobs=-1)),
('rf02', RandomForestRegressor(n_estimators=..., n_jobs=-1)),
('rf03', RandomForestRegressor(n_estimators=..., n_jobs=-1)),
('ridge01', Ridge(alpha=...)),
('ridge02', Ridge(alpha=...)),
('ridge03', Ridge(alpha=...)),
('lasso01', Lasso(alpha=...)),
('lasso02', Lasso(alpha=...)),
('lasso03', Lasso(alpha=...)),
('lgbm01', LGBMRegressor(n_estimators=..., learning_rate=...)),
('lgbm02', LGBMRegressor(n_estimators=..., learning_rate=...)),
('lgbm03', LGBMRegressor(n_estimators=..., learning_rate=...)),
('mlp01', MLPRegressor(hidden_layer_sizes=(...,))),
('mlp02', MLPRegressor(hidden_layer_sizes=(...,)))
])
ensemble.add_meta(Ridge(alpha=..., fit_intercept=False))
ensemble.fit(X, y)
print pd.DataFrame(ensemble.scores_)
So, my questions are:
Q1. Is this a problem with my code or with mlens? I'm using version 0.1.6.
Q2. If this is a problem with my code, what should I change?
Upgrading to joblib > 9 seems to reduce parallelism. This doesn't seem to be a big issue on large datasets, but on smaller datasets parallelization appears to be significantly lower.
Landscape code quality hooks don't seem to be working. Need a new code quality CI.
This may seem irrelevant, but I think this is a great package and it should have way more interest surrounding it. One of the reasons it might get overlooked is that it doesn't appear in searches on Google or on GitHub when you search for stack or stacking, whereas some other ensembling/stacking packages do!
Maybe adding this key search word to the readme and the docs would help :) Hopefully this is a useful suggestion.
FYI, I noticed this whilst looking for other packages to benchmark mlens against.