embedder's People

Contributors

dkn22, lukyanenkomax

embedder's Issues

Setting as_df=True in embedder.transform causes an error

I tried to set as_df=True in fit_transform but it causes an error:
Traceback (most recent call last):
File "emb_test.py", line 157, in <module>
emb_data=embedder.fit_transform(data_encoded,y,as_df=True)
File "/opt/conda/lib/python3.6/site-packages/embedder/classification.py", line 73, in fit_transform
return self.transform(X, as_df=as_df)
File "/opt/conda/lib/python3.6/site-packages/embedder/base.py", line 136, in transform
names = [var + '_{}'.format(x) for x in range(emb_dim)
NameError: name 'emb_dim' is not defined
This looks like a problem with the order of the for clauses in the list comprehension: emb_dim is used before it is defined.
It can be fixed by replacing lines 136 and 137 with
names = [var + '_{}'.format(x) for var, emb_dim in sizes for x in range(emb_dim)]
After applying this fix, the next error is:
Traceback (most recent call last):
File "emb_test.py", line 157, in <module>
emb_data=embedder.fit_transform(data_encoded,y,as_df=True)
File "/opt/conda/lib/python3.6/site-packages/embedder/classification.py", line 73, in fit_transform
return self.transform(X, as_df=as_df)
File "/opt/conda/lib/python3.6/site-packages/embedder/base.py", line 140, in transform
embedded = pd.DataFrame(embedded, columns=names)
NameError: name 'pd' is not defined
It can be fixed by adding import pandas as pd at the top of base.py.
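
A minimal sketch of why the reordered comprehension works, independent of the library. The sizes list here is an assumed stand-in (pairs of variable name and embedding dimension); the real structure in base.py may differ. In a comprehension with several for clauses, the clauses run left to right, so emb_dim has to be bound by an earlier clause before range(emb_dim) can use it.

```python
# Assumed stand-in for `sizes`: (variable name, embedding dimension) pairs.
sizes = [("DayOfWeek", 3), ("StoreType", 2)]


def broken_names(sizes):
    # range(emb_dim) is evaluated first, before any clause has bound emb_dim.
    return [var + "_{}".format(x) for x in range(emb_dim) for var, emb_dim in sizes]


def fixed_names(sizes):
    # The first clause binds var and emb_dim; the second clause can then use emb_dim.
    return [var + "_{}".format(x) for var, emb_dim in sizes for x in range(emb_dim)]


try:
    broken_names(sizes)
    broken_raised = False
except NameError:
    broken_raised = True

names = fixed_names(sizes)
print(broken_raised)
print(names)
```

The fixed version yields one column name per embedding dimension, grouped by variable, which is the layout the DataFrame constructor in the traceback expects for its columns argument.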

Returned embedded values

Hey, thanks for the wrapper.
Are the returned values sorted after embedding?
For example, the DayOfWeek column in the Rossmann dataset can have values ranging from 1 to 7. After the embedding is done, I get a matrix that contains the embedded values. Is it safe to assume that the first embedded value corresponds to DayOfWeek=1, the second to DayOfWeek=2, and so on?

Thanks
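
One way to check this rather than assume it is to look at how the category values were mapped to integer codes, since row k of an embedding matrix belongs to whichever category was encoded as k. The sketch below uses plain Python and assumes a sorted label encoding (codes assigned in ascending order of the category value); whether the encoders returned by encode_categorical behave this way is an assumption that should be verified against the actual encoder objects. The embedding matrix is random and only stands in for learned weights.

```python
import random

# Example column values, in arbitrary row order.
day_of_week = [3, 1, 7, 2, 2, 5, 1, 6, 4]

# Assumption: codes are assigned in sorted order of the distinct values.
classes = sorted(set(day_of_week))
code_of = {value: code for code, value in enumerate(classes)}
encoded = [code_of[value] for value in day_of_week]

# Stand-in for learned weights: one row per category, emb_dim columns.
emb_dim = 2
rng = random.Random(0)
embedding_matrix = [[rng.random() for _ in range(emb_dim)] for _ in classes]

# Row k of the matrix belongs to classes[k], not to the k-th row of the data.
row_for = {value: embedding_matrix[code_of[value]] for value in classes}

print(classes)
print(encoded)
print(row_for[1] == embedding_matrix[0])
```

Under that assumption the first row does correspond to DayOfWeek=1, but only because 1 is the smallest value; if the encoder assigns codes in order of first appearance instead, the mapping would differ.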

train, test split

How would you do a train/test split with the pipeline? I seem to get an error if I run the xgboost regression when passing the X_train and X_test like this after encoding:

X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, test_size=0.2, random_state=2001)
xgb_train = xgboost.DMatrix(X_train, label=y_train)
xgb_test = xgboost.DMatrix(X_test, label=y_test)

eval_set = [(X_train, y_train), (X_test, y_test)]
xgb_model.fit(X_train, y_train, model__eval_set=eval_set, model__verbose=True,
              model__early_stopping_rounds=50);

The error tells me that the variable names 'f0', etc. (which correspond to the embedded categories) can't be found in the list of variables that weren't encoded.
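
One plausible cause, offered as an assumption rather than a diagnosis: a scikit-learn Pipeline transforms X before it reaches the final step, but anything passed through model__eval_set is handed to the final step untouched. The model would then train on embedded features and be asked to evaluate on raw ones. The sketch below shows that behaviour with two hypothetical stand-in classes (AddOnes for the embedding step, RecordingModel for the XGBoost step) and one possible workaround: fit the preprocessing steps first and transform the evaluation data by hand.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, TransformerMixin
from sklearn.pipeline import Pipeline


class AddOnes(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the embedding step: appends one column."""

    def fit(self, X, y=None):
        self.n_features_in_ = np.asarray(X).shape[1]
        return self

    def transform(self, X):
        X = np.asarray(X)
        return np.hstack([X, np.ones((X.shape[0], 1))])


class RecordingModel(BaseEstimator, RegressorMixin):
    """Hypothetical stand-in for the model step: records the widths it sees."""

    def fit(self, X, y, eval_set=None):
        self.train_width_ = np.asarray(X).shape[1]
        self.eval_widths_ = [np.asarray(Xe).shape[1] for Xe, _ in (eval_set or [])]
        return self

    def predict(self, X):
        return np.zeros(np.asarray(X).shape[0])


X_train, y_train = np.zeros((8, 2)), np.zeros(8)
X_test, y_test = np.zeros((4, 2)), np.zeros(4)
pipe = Pipeline([("embed", AddOnes()), ("model", RecordingModel())])

# Raw eval_set: the model trains on 3 columns but is given 2-column eval data.
pipe.fit(X_train, y_train, model__eval_set=[(X_test, y_test)])
raw_eval_widths = pipe.named_steps["model"].eval_widths_

# Workaround: fit the preprocessing steps, then transform the eval data first.
preprocess = pipe[:-1].fit(X_train, y_train)
eval_set = [(preprocess.transform(X_test), y_test)]
pipe.fit(X_train, y_train, model__eval_set=eval_set)
fixed_eval_widths = pipe.named_steps["model"].eval_widths_
train_width = pipe.named_steps["model"].train_width_

print(train_width, raw_eval_widths, fixed_eval_widths)
```

With a real pipeline the same idea would apply to every (X, y) pair in eval_set, so that training and evaluation data carry the same columns by the time they reach the model.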

MXNetError while fitting Embedder

In trying to run the following flow:

from embedder.preprocessing import categorize, pick_emb_dim, encode_categorical
from embedder.classification import Embedder
categorical_variable_count = categorize(X)
dict_embedding_size = pick_emb_dim(categorical_variable_count, max_dim=50)
X_encoded, encoders = encode_categorical(X)
embedder = Embedder(dict_embedding_size, model_json=None)
embedder.fit(X_encoded, y)

I get the following exception:

---------------------------------------------------------------------------
MXNetError                                Traceback (most recent call last)
<ipython-input-11-0ccee6a2b185> in <module>
      5 from embedder.classification import Embedder
      6 embedder = Embedder(dict_embedding_size, model_json=None)
----> 7 embedder.fit(X_encoded, y)
      8 
      9 print('PreProcessing time: '+ str(datetime.now() - start_time))

~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/embedder/classification.py in fit(self, X, y, batch_size, epochs, checkpoint, early_stop)
     31         '''
     32 
---> 33         nnet = self._create_model(X, model_json=self.model_json)
     34 
     35         nnet.compile(loss='binary_crossentropy',

~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/embedder/base.py in _create_model(self, X, model_json)
     85         if model_json is None:
     86             if hasattr(self, '_default_nnet'):
---> 87                 nnet = self._default_nnet(X)
     88             else:
     89                 raise ValueError('No model architecture provided.')

~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/embedder/classification.py in _default_nnet(self, X)
     97         flatten = concatenate(flatten_layers, axis=-1)
     98 
---> 99         fc1 = Dense(1000, kernel_initializer='normal')(flatten)
    100         fc1 = Activation('relu')(fc1)
    101         # fc1 = BatchNormalization(fc1)

~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/keras/engine/base_layer.py in __call__(self, inputs, **kwargs)
    468             # Actually call the layer,
    469             # collecting output(s), mask(s), and shape(s).
--> 470             output = self.call(inputs, **kwargs)
    471             output_mask = self.compute_mask(inputs, previous_mask)
    472 

~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/keras/layers/core.py in call(self, inputs)
    891         output = K.dot(inputs, self.kernel)
    892         if self.use_bias:
--> 893             output = K.bias_add(output, self.bias, data_format='channels_last')
    894         if self.activation is not None:
    895             output = self.activation(output)

~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/keras/backend/mxnet_backend.py in func_wrapper(*args, **kwargs)
     92                 # Create Train Symbol
     93                 set_learning_phase(1)
---> 94                 train_symbol = func(*args, **kwargs)
     95                 # Create Test Symbol
     96                 set_learning_phase(0)

~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/keras/backend/mxnet_backend.py in bias_add(x, bias, data_format)
   3980         raise ValueError('MXNet Backend: Unknown data_format ' + str(data_format))
   3981     bias_shape = int_shape(bias)
-> 3982     x_dim = ndim(x)
   3983     if len(bias_shape) != 1 and len(bias_shape) != x_dim - 1:
   3984         raise ValueError('MXNet Backend: Unexpected bias dimensions %d, expect to be 1 or %d dimensions'

~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/keras/backend/mxnet_backend.py in ndim(x)
    533     ```
    534     """
--> 535     shape = x.shape
    536     if shape is not None:
    537         return len(shape)

~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/keras/backend/mxnet_backend.py in shape(self)
   4393     @property
   4394     def shape(self):
-> 4395         return self._get_shape()
   4396 
   4397     def eval(self):

~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/keras/backend/mxnet_backend.py in _get_shape(self)
   4402             return self._keras_shape
   4403         else:
-> 4404             _, out_shape, _ = self.symbol.infer_shape_partial()
   4405             return out_shape[0]
   4406 

~/anaconda3/envs/mxnet_latest_p37/cpu/lib/python3.7/site-packages/mxnet/symbol/symbol.py in infer_shape_partial(self, *args, **kwargs)
   1175             The order is same as the order of list_auxiliary_states().
   1176         """
-> 1177         return self._infer_shape_impl(True, *args, **kwargs)
   1178 
   1179     def _infer_shape_impl(self, partial, *args, **kwargs):

~/anaconda3/envs/mxnet_latest_p37/cpu/lib/python3.7/site-packages/mxnet/symbol/symbol.py in _infer_shape_impl(self, partial, *args, **kwargs)
   1263                 ctypes.byref(aux_shape_ndim),
   1264                 ctypes.byref(aux_shape_data),
-> 1265                 ctypes.byref(complete)))
   1266         if complete.value != 0:
   1267             arg_shapes = [tuple(arg_shape_data[i][:arg_shape_ndim[i]])

~/anaconda3/envs/mxnet_latest_p37/cpu/lib/python3.7/site-packages/mxnet/base.py in check_call(ret)
    244     """
    245     if ret != 0:
--> 246         raise get_last_ffi_error()
    247 
    248 

MXNetError: MXNetError: Error in operator dot0: [19:57:48] src/operator/tensor/./dot-inl.h:1241: Check failed: L[!Ta].Size() == R[Tb].Size() (76 vs. 292) : dot shape error: [-1,76] X [292,1000]

Have you seen this before?
