keras-pandas's People

Contributors

bjherger

keras-pandas's Issues

requirements.txt missing from manifest, setup.py fails

$ pip install keras-pandas
Collecting keras-pandas
  Using cached https://files.pythonhosted.org/packages/b6/4f/cd2e9c9d25024bc76d8806966bc128d4f24e37d7fb64d6ab8f7ed9422601/keras-pandas-1.3.3.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-v828ze62/keras-pandas/setup.py", line 12, in <module>
        with open('requirements.txt') as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-v828ze62/keras-pandas/

This fails on Python 2.7 and 3.5.2.

requirements.txt seems to be missing from the tarball. You probably need to explicitly create a MANIFEST.in to pick it up.
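
A defensive read in setup.py might look like the sketch below. It assumes setup.py loads its dependencies from requirements.txt, as the traceback suggests; the helper name `read_requirements` is hypothetical. Shipping the file through MANIFEST.in (a line such as `include requirements.txt`) is still the underlying fix, since this fallback would install the package without its dependencies.

```python
import os


def read_requirements(path='requirements.txt'):
    """Return the requirement strings in `path`, or [] if the file is absent."""
    if not os.path.exists(path):
        # The sdist did not ship the file; fall back to no declared requirements.
        return []
    with open(path) as f:
        lines = [line.strip() for line in f]
    # Skip blank lines and comment lines.
    return [line for line in lines if line and not line.startswith('#')]
```

In setup.py this would be passed along as `install_requires=read_requirements()`.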

tf.keras instead of keras

Hi,

Thanks for the time you are putting in to make life easier between pandas and Keras!
I was checking out the code and the current issues, and I saw that you are using the original keras module and not the version inside TensorFlow core.

Is there any particular reason for this? If not, what are your thoughts on changing it?

Add support for Categorical data types

Transformations

  • Categorical data types have null handling
  • Categorical data types are normalized
  • Automater can appropriately transform categorical-only dataframes

Modeling

  • Automater can produce input nubs for categorical-only dataframes
  • Automater can produce output nub for categorical-only dataframes

Code base should still meet existing unittests, including those for numeric data types.
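
The null handling and normalization criteria above could be sketched roughly as follows. This is plain Python for illustration, not the project's implementation; the function names and the `<NULL>` / `<UNK>` tokens are assumptions.

```python
def _is_null(value):
    # None, or NaN (the only value that is not equal to itself).
    return value is None or value != value


def fit_categorical_encoder(values, null_token='<NULL>', unknown_token='<UNK>'):
    """Build a level -> integer index mapping, reserving slots for nulls and unseen levels."""
    levels = sorted({str(v) for v in values if not _is_null(v)})
    vocabulary = [null_token, unknown_token] + levels
    return {level: index for index, level in enumerate(vocabulary)}


def transform_categorical(values, mapping, null_token='<NULL>', unknown_token='<UNK>'):
    """Map each value to its index; nulls go to the null slot, unseen levels to the unknown slot."""
    encoded = []
    for v in values:
        key = null_token if _is_null(v) else str(v)
        encoded.append(mapping.get(key, mapping[unknown_token]))
    return encoded
```

Reserving fixed indices for null and unseen levels keeps the encoding stable between fit and transform, which matters if the indices later feed an embedding layer.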

Video tutorial

Create a screen-capture tutorial, with voice-over, showing how to use the project.

Contributing.md

Move contributing info from README to a separate contributing.md file

Consistent examples

Consistent examples, including:

  • Train / test / validate split
  • Examples for all supported data types
  • Add example requirement to new data type workflow in contributing.md

Add CI/CD PyPi links

  • README & setup.py should have CI/CD and PyPi links
  • README should have about author section (?)

Examples

This project should have at least two examples

Add links to README

Add following links to README:

  • Source Code
  • Documentation
  • PyPi registration
  • CI / travis (?)

Add support for numeric data types

Acceptance criteria:

  • Numeric data types have null handling
  • Numeric data types are normalized
  • Automater can appropriately transform numeric-only dataframes
  • Automater can produce input nubs for numeric-only dataframes
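
As a rough illustration of the first two criteria, here is a minimal sketch that imputes nulls with the mean and then standardizes. The function names are hypothetical, and mean imputation is just one possible null-handling choice.

```python
import math
import statistics


def _is_null(value):
    return value is None or math.isnan(value)


def fit_numeric_scaler(values):
    """Return (mean, std) computed over the non-null values."""
    observed = [v for v in values if not _is_null(v)]
    mean = statistics.mean(observed)
    std = statistics.pstdev(observed)
    # Guard against constant columns, which would otherwise divide by zero.
    if std == 0:
        std = 1.0
    return mean, std


def transform_numeric(values, mean, std):
    """Impute nulls with the mean, then standardize using the fitted mean and std."""
    filled = [mean if _is_null(v) else v for v in values]
    return [(v - mean) / std for v in filled]
```

Fitting the mean and standard deviation once and reusing them at transform time keeps train and test data on the same scale.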

Create README

Create README file, with the following sections:

  • Quick start
  • Project purpose
  • Installation guide
  • Guiding principles
  • Contributing

Spaces and parentheses in columns cause "not a valid scope name"

If DataFrame column names contain spaces or parentheses, the fit() function raises an exception:

import pandas
import keras_pandas.Automater

data_good = pandas.DataFrame({'length': [1.0]})
data_bad = pandas.DataFrame({'length (cm)': [1.0]})

auto_good = keras_pandas.Automater.Automater(numerical_vars=['length'])
auto_bad = keras_pandas.Automater.Automater(numerical_vars=['length (cm)'])

auto_good.fit(data_good)
auto_bad.fit(data_bad)

Full traceback:

>>> auto_bad.fit(data_bad)
/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras_pandas/constants.py:33: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  transformed = input_dataframe[variable].as_matrix()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras_pandas/Automater.py", line 92, in fit
    input_layers, input_nub = self._create_input_nub(self._variable_type_dict, input_variables_df)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras_pandas/Automater.py", line 272, in _create_input_nub
    variable_input, variable_input_nub_tip = variable_type_handler(variable, input_dataframe)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras_pandas/constants.py", line 42, in input_nub_numeric_handler
    input_layer = keras.Input(shape=(input_sequence_length,), dtype='float32', name='input_{}'.format(variable))
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras/engine/input_layer.py", line 178, in Input
    input_tensor=tensor)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras/engine/input_layer.py", line 87, in __init__
    name=self.name)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 517, in placeholder
    x = tf.placeholder(dtype, shape=shape, name=name)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1745, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5020, in placeholder
    "Placeholder", dtype=dtype, shape=shape, name=name)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 394, in _apply_op_helper
    with g.as_default(), ops.name_scope(name) as scope:
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6040, in __enter__
    return self._name_scope.__enter__()
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4004, in name_scope
    raise ValueError("'%s' is not a valid scope name" % name)
ValueError: 'input_length (cm)' is not a valid scope name

I ran into this while using the Iris dataset from scikit-learn. If it's an intended constraint, it should be documented, but I suspect it's not.

This is using Python 3.6.5, keras-pandas 2.2.0, pandas 0.23.4.
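
A possible workaround, until the library handles this itself, is to sanitize column names before they are used as layer names. The sketch below is hypothetical: `sanitize_layer_name` is not part of keras-pandas, and the character class is my recollection of what TensorFlow 1.x accepted in scope names, so treat it as an assumption.

```python
import re

# Characters assumed to be valid in a TensorFlow 1.x scope name.
_INVALID_CHARS = re.compile(r'[^A-Za-z0-9_.\-/]+')


def sanitize_layer_name(column_name):
    """Replace each run of unsupported characters with a single underscore."""
    cleaned = _INVALID_CHARS.sub('_', column_name).strip('_')
    # Avoid returning an empty name for columns made only of invalid characters.
    return cleaned or 'unnamed'
```

With this, 'length (cm)' becomes 'length_cm', so the layer would be named 'input_length_cm'. Two different columns can collapse to the same sanitized name, so a real fix would also need to check for collisions.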

Read the docs versioning

Read the Docs seems to capture only the latest version. It would be helpful to keep documentation for all versions.

Address transient issue

Transient unit test failure:

======================================================================
FAIL: test_transform_no_response (testtext.TestText)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/bjherger/keras-pandas/tests/testtext.py", line 46, in test_transform_no_response
    self.assertCountEqual([2, 3, 4, 5], list(X[0][0]))
AssertionError: Element counts were not equal:
First has 1, Second has 0:  5
First has 0, Second has 1:  894

----------------------------------------------------------------------

Consistent variable -> var type mapper

There should be a single function, which:

  • Validates that the variable is in the set of available variables
  • Provides the variable type for that variable
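
A minimal sketch of such a function follows. It assumes the mapping has the shape `{type name: [variable names]}`; the real `_variable_type_dict` may be structured differently, and the function name is hypothetical.

```python
def get_variable_type(variable, variable_type_dict):
    """Return the type of `variable`, raising if it is unknown or ambiguous.

    `variable_type_dict` maps a type name to the list of variables of that type.
    """
    matches = [var_type for var_type, variables in variable_type_dict.items()
               if variable in variables]
    if not matches:
        raise ValueError('Unknown variable: {!r}'.format(variable))
    if len(matches) > 1:
        raise ValueError('Variable {!r} has multiple types: {}'.format(variable, matches))
    return matches[0]
```

Routing every lookup through one function like this means the validation rule only has to be written, and tested, once.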

Rename test files

Rename test files to be consistent with Google's Python Style Guide

No Module 'sklearn_pandas' for Automater

When trying to replicate the example, I receive this error. It looks like the sklearn_pandas module cannot be found when Automater is imported:

from keras import Model
from keras.layers import Dense

from keras_pandas.Automater import Automater
from keras_pandas.lib import load_titanic

observations = load_titanic()

# Transform the data set, using keras_pandas
categorical_vars = ['pclass', 'sex', 'survived']
numerical_vars = ['age', 'siblings_spouses_aboard', 'parents_children_aboard', 'fare']
text_vars = ['name']

auto = Automater(categorical_vars=categorical_vars, numerical_vars=numerical_vars, text_vars=text_vars,
                 response_var='survived')
X, y = auto.fit_transform(observations)

# Start model with provided input nub
x = auto.input_nub

# Fill in your own hidden layers
x = Dense(32)(x)
x = Dense(32, activation='relu')(x)
x = Dense(32)(x)

# End model with provided output nub
x = auto.output_nub(x)

model = Model(inputs=auto.input_layers, outputs=x)
model.compile(optimizer='Adam', loss=auto.loss, metrics=['accuracy'])

# Train model
model.fit(X, y, epochs=4, validation_split=.2)

The traceback:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-509-d3a513b032e9> in <module>()
      2 from keras.layers import Dense
      3 
----> 4 from keras_pandas.Automater import Automater
      5 from keras_pandas.lib import load_titanic
      6 

~\Documents\python_hub\lib\site-packages\keras_pandas\Automater.py in <module>()
      5 from keras.engine import Layer
      6 from keras.layers import Concatenate, Dense
----> 7 from sklearn_pandas import DataFrameMapper
      8 
      9 from keras_pandas import constants, lib

ModuleNotFoundError: No module named 'sklearn_pandas'
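
To confirm which dependencies are absent from the active environment before importing keras_pandas, a quick check like the one below can help. It is a generic sketch using only the standard library; the function name is made up for illustration.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]
```

For example, `find_missing_modules(['keras', 'sklearn_pandas'])` lists whichever of the two is not installed. The sklearn_pandas module is distributed on PyPI as sklearn-pandas, so `pip install sklearn-pandas` should resolve the error above.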

Datatype level classes

Currently, a single datatype will have information in multiple locations:

Current state

Required

  • Automater.__init__ for passing in variable list
  • constants.py for default pipeline
  • constants.py for input handler
  • constants.py for input handler lookup

Optional: Output data type

  • constants.py Suggested loss
  • Automater._create_output_nub for creating an output nub
  • Automater.inverse_transform_output for inverse transforming the output data type.

Future state

This is absurd, and difficult to support / maintain. Another path might be to create an interface class, VariableTypeHandler, which includes the following methods:

  • __init__
  • default_transformation_pipeline
  • input_nub_generator
  • output_nub_generator (optional)
  • output_inverse_transform (optional)
  • output_suggested_loss (optional)
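
The proposed interface might be sketched as an abstract base class like the one below. Only the method names come from the list above; the signatures, arguments, and return values are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class VariableTypeHandler(ABC):
    """Everything the Automater needs to know about one data type, in one place."""

    def __init__(self, variables):
        self.variables = list(variables)

    @abstractmethod
    def default_transformation_pipeline(self):
        """Return the default preprocessing steps for this data type."""

    @abstractmethod
    def input_nub_generator(self, variable, input_dataframe):
        """Return the input layer and input nub tip for one variable."""

    # Optional methods: only data types usable as a response need to override these.
    def output_nub_generator(self, variable, input_dataframe):
        raise NotImplementedError(
            '{} is not supported as an output'.format(type(self).__name__))

    def output_inverse_transform(self, predictions):
        raise NotImplementedError(
            '{} is not supported as an output'.format(type(self).__name__))

    def output_suggested_loss(self):
        raise NotImplementedError(
            '{} is not supported as an output'.format(type(self).__name__))
```

Making the two input methods abstract forces every new data type to supply them, while the optional output methods fail loudly only when a type is actually used as a response variable.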

Boolean input handler

Boolean input handler should be the same as categorical, or boolean types should be removed

Time series support

Build out time series support:

  • Follow new data type workflow, described in contributing.md
  • Can be based on text var handlers (No text preprocessing, similar padding, similar input nub)
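
The "similar padding" step could look like the sketch below, which pads or truncates each series to a fixed length. It is a plain-Python illustration with a hypothetical function name; padding and truncating at the front is an assumption chosen to match the 'pre' defaults of Keras's pad_sequences.

```python
def pad_sequence(sequence, length, pad_value=0.0):
    """Left-pad (or left-truncate) `sequence` so it has exactly `length` elements."""
    # Keep only the most recent `length` observations.
    sequence = list(sequence)[-length:] if length > 0 else []
    # Fill any remaining positions at the front with the pad value.
    return [pad_value] * (length - len(sequence)) + sequence
```

Truncating from the front keeps the most recent observations, which is usually what a time series model should see.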

Support for variable list structure & checking

Acceptance criteria:

  • Confirm that variables appear in only one variable list
  • Add variable lists from __init__ to internal state
  • Implement _check_input_dataframe_columns_, to check that the input dataframe has the required columns
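
The first and third criteria might be sketched as follows. This assumes the variable lists are held as a dict of `{list name: [variable names]}`; the function names and signatures are hypothetical rather than the project's own.

```python
from collections import Counter


def check_variable_lists(variable_lists):
    """Raise if any variable is listed more than once across the variable lists."""
    counts = Counter(v for variables in variable_lists.values() for v in variables)
    duplicated = sorted(v for v, n in counts.items() if n > 1)
    if duplicated:
        raise ValueError('Variables listed more than once: {}'.format(duplicated))


def check_input_dataframe_columns(columns, variable_lists):
    """Raise if the dataframe's columns do not include every declared variable."""
    required = {v for variables in variable_lists.values() for v in variables}
    missing = sorted(required - set(columns))
    if missing:
        raise ValueError('Input dataframe is missing columns: {}'.format(missing))
```

With pandas, the second function would be called as `check_input_dataframe_columns(df.columns, variable_lists)`.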

Failing text test

Find and fix transient issue:

======================================================================
FAIL: test_transform_no_response (testtext.TestText)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/bjherger/keras-pandas/tests/testtext.py", line 46, in test_transform_no_response
    self.assertCountEqual([2, 3, 4, 5], list(X[0][0]))
AssertionError: Element counts were not equal:
First has 1, Second has 0:  5
First has 0, Second has 1:  43

----------------------------------------------------------------------
Ran 23 tests in 84.650s

Timestamp output layer

Support for timestamp output type, via a default timestamp output layer

  • Probably just a single node dense layer, predicting the time in epoch
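
The target for such a layer would be the timestamp expressed as epoch seconds, with an inverse transform to turn predictions back into timestamps. A minimal sketch of that pair, assuming naive datetimes are meant as UTC (function names are hypothetical):

```python
from datetime import datetime, timezone


def to_epoch_seconds(timestamp):
    """Convert a datetime to seconds since 1970-01-01 UTC (naive values are treated as UTC)."""
    if timestamp.tzinfo is None:
        timestamp = timestamp.replace(tzinfo=timezone.utc)
    return timestamp.timestamp()


def from_epoch_seconds(seconds):
    """Inverse transform: turn a predicted epoch value back into a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Raw epoch values are large, so in practice they would probably be scaled before training a single-node dense output against them.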
