rushter / mlalgorithms

Minimal and clean examples of machine learning algorithms implementations

License: MIT License

Python 99.86% Dockerfile 0.14%

machine-learning deep-learning neural-networks machine-learning-algorithms python

mlalgorithms's Introduction

Machine learning algorithms

A collection of minimal and clean implementations of machine learning algorithms.

Why?

This project targets people who want to learn the internals of ML algorithms or implement them from scratch.
The code is much easier to follow and to experiment with than that of optimized libraries.
All algorithms are implemented in Python, using numpy, scipy and autograd.

Implemented:

Installation

        git clone https://github.com/rushter/MLAlgorithms
        cd MLAlgorithms
        pip install scipy numpy
        python setup.py develop

How to run examples without installation

        cd MLAlgorithms
        python -m examples.linear_models

How to run examples within Docker

        cd MLAlgorithms
        docker build -t mlalgorithms .
        docker run --rm -it mlalgorithms bash
        python -m examples.linear_models

Contributing

Your contributions are always welcome!
Feel free to improve existing code or documentation, or to implement a new algorithm.
Please open an issue to propose your changes if they are big enough.

mlalgorithms's Issues

Questions about linear_models.py

https://github.com/rushter/MLAlgorithms/blob/master/mla/linear_models.py#L53
When calculating L2 regularization, the formula used is (0.5 * self.C) * (w[1:] ** 2).mean(), but according to https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c, it should be their sum.
https://github.com/rushter/MLAlgorithms/blob/master/mla/linear_models.py#L130
The sigmoid function used is 0.5 * (np.tanh(x) + 1) rather than the traditional 1/(1+np.exp(-x)). What is the motivation for doing so?
Thanks in advance.
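
A small numerical check may help with both questions. The sketch below assumes NumPy only and uses made-up weights; it is not the repository's code. It suggests that a mean penalty and a sum penalty differ only by a constant factor, which can be folded into C, and that the tanh form of the sigmoid matches the exponential form when the argument is halved. Written exactly as quoted, without the halving, 0.5 * (np.tanh(x) + 1) equals sigmoid(2x) rather than sigmoid(x).

```python
import numpy as np

# Hypothetical weights for illustration; w[0] stands for the intercept
# and is excluded from the penalty, as in the quoted formula (w[1:]).
w = np.array([0.5, -1.0, 2.0, 3.0])
C = 0.1

penalty_mean = (0.5 * C) * (w[1:] ** 2).mean()
penalty_sum = (0.5 * C) * (w[1:] ** 2).sum()
n_penalized = w[1:].size  # mean == sum / n_penalized


def sigmoid_exp(x):
    """Textbook logistic function; np.exp(-x) overflows for very negative x."""
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_tanh(x):
    """Equivalent form via tanh; note the factor 0.5 inside tanh."""
    return 0.5 * (np.tanh(0.5 * x) + 1.0)


x = np.linspace(-10, 10, 201)
same_function = np.allclose(sigmoid_tanh(x), sigmoid_exp(x))
# Without the inner 0.5 the curve is sigmoid(2x), i.e. twice as steep.
steeper_variant = np.allclose(0.5 * (np.tanh(x) + 1.0), sigmoid_exp(2 * x))
```

The tanh form stays finite for inputs such as -1000, where np.exp(1000) would overflow, which is one plausible motivation for using it.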

PCA: the calculation of the proportion of variance

In https://github.com/rushter/MLAlgorithms/blob/master/mla/pca.py#L48, the variance ratio of a component is calculated as the square of its eigenvalue divided by the sum of all squared eigenvalues.
But according to https://stats.stackexchange.com/questions/31908/what-is-percentage-of-variance-in-pca, the proportion of variance of a component should be its eigenvalue divided by the sum of all eigenvalues. Any ideas?
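
One possible reading, offered as an assumption rather than a statement about pca.py, is that the values being squared are singular values of the centered data rather than eigenvalues of the covariance matrix. The sketch below uses synthetic data and NumPy only, and compares the two routes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with clearly different variance per axis (illustrative only).
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])
X_centered = X - X.mean(axis=0)

# Route 1: eigenvalues of the covariance matrix, sorted in descending order.
eigenvalues = np.linalg.eigvalsh(np.cov(X_centered.T))[::-1]
ratio_from_eigenvalues = eigenvalues / eigenvalues.sum()

# Route 2: singular values of the centered data matrix.
# The covariance eigenvalues equal s**2 / (n_samples - 1), so the constant
# cancels in the ratio and the singular values have to be squared.
singular_values = np.linalg.svd(X_centered, compute_uv=False)
ratio_from_svd = singular_values ** 2 / (singular_values ** 2).sum()
```

Under this reading, squaring is correct for the SVD route, while squaring values that are already covariance eigenvalues would overweight the leading components.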

Meta Tags

I suggest using this website to generate good meta tags:

https://megatags.co/

I also suggest this one for meta tags:

https://www.seoptimer.com/meta-tag-generator

RNN output value calculation

According to Andrew Ng's deep learning course (a for the hidden state, y for the output value):

We get the output values by multiplying the hidden state by a weight matrix Wya, adding the bias by, and then passing the result through an activation function.

But from

MLAlgorithms/mla/neuralnet/layers/recurrent/rnn.py

Lines 55 to 63 in 6e383f7

    
    for i in range(n_timesteps):
        states[:, i, :] = np.tanh(np.dot(X[:, i, :], p['W']) + np.dot(states[:, i - 1, :], p['U']) + p['b'])
    self.states = states
    self.hprev = states[:, n_timesteps - 1, :].copy()
    if self.return_sequences:
        return states[:, 0:-1, :]
    else:
        return states[:, -2, :]

it seems the hidden state is returned directly.

@rushter Can you please share your reference for this RNN implementation, or confirm that this is a bug? If it's a bug, I'd like to create a PR to fix it. 😄
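
For comparison, the output step described above can be sketched as a separate projection applied to the hidden states. This is a minimal illustration with NumPy and random values; the names W_ya and b_y follow the course notation, the shapes are assumptions, and nothing here is taken from rnn.py. In many libraries this projection is a separate dense layer stacked on the recurrent layer, which may be the intent here as well.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
n_samples, n_timesteps, n_hidden, n_out = 4, 5, 8, 3

# Stand-in for the hidden states a recurrent layer would return
# (tanh keeps them in [-1, 1], as in the quoted loop).
states = np.tanh(rng.normal(size=(n_samples, n_timesteps, n_hidden)))

# Hypothetical output parameters, named after the course notation.
W_ya = rng.normal(scale=0.1, size=(n_hidden, n_out))
b_y = np.zeros(n_out)

# y<t> = g(a<t> W_ya + b_y), applied at every time step at once.
y_hat = softmax(states @ W_ya + b_y)
```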

Why am I getting overflows?

Hi! I don't know why I'm getting overflows training a NeuralNet. Actually I don't know if the problem is related to this project or to autograd. I think @mattjj could help, too.

With this sample data I'm getting overflows in exp and power, getting NaN as predictions. I can't understand where the problem is.

With Python3 you can execute this:

from io import StringIO

import pandas as pd

from mla.neuralnet import NeuralNet
from mla.neuralnet.layers import Activation, Dense
from mla.neuralnet.optimizers import Adam
from mla.neuralnet.parameters import Parameters

X_train_string = '-0.10410  6.106   6.232   97.50 -0.17490  0.16960 -0.4665  0.4242  289.4  723.8  2.6  2.6\n0.51340  5.517  12.160   90.83 -0.07771  0.07383 -0.2153  0.2133  361.6  706.9  2.6  2.6\n0.40280  5.359   9.403   79.45 -0.05685  0.06042 -0.2095  0.1873  331.9  676.0  2.6  2.6\n-0.02972  5.821   8.682   70.49 -0.05592  0.04690 -0.2113  0.2078  323.2  649.6  2.6  2.6\n-1.06600  6.893   3.798  132.00 -0.11060  0.10940 -0.3516  0.3494  245.4  800.6  2.6  2.6\n-0.66730  5.795   4.824   63.08 -0.04957  0.05160 -0.1446  0.1654  265.7  626.0  2.6  2.6\n0.11930  8.804  18.630  253.90 -0.17100  0.17270 -0.3759  0.3657  416.9  995.8  2.6  2.6'

y_train_string = '164.0\n156.5\n134.5\n142.5\n195.6\n134.2\n272.0'

X_test_string = '-0.1041  6.106   6.232  97.50 -0.17490  0.16960 -0.4665  0.4242  289.4  723.8  2.6  2.6\n0.5134  5.517  12.160  90.83 -0.07771  0.07383 -0.2153  0.2133  361.6  706.9  2.6  2.6\n0.4028  5.359   9.403  79.45 -0.05685  0.06042 -0.2095  0.1873  331.9  676.0  2.6  2.6'

y_test_string = '164.0\n156.5\n134.5'

# sep=r'\s+' splits on runs of whitespace, like the older delim_whitespace=True
X_train = pd.read_csv(StringIO(X_train_string), header=None, sep=r'\s+')
y_train = pd.read_csv(StringIO(y_train_string), header=None, sep=r'\s+')
X_test = pd.read_csv(StringIO(X_test_string), header=None, sep=r'\s+')
y_test = pd.read_csv(StringIO(y_test_string), header=None, sep=r'\s+')

net = NeuralNet(
    layers=[
        Dense(20, Parameters(init='normal')),
        Activation('sigmoid'),
        Dense(1),
    ],
    loss='mse',
    optimizer=Adam(),
    metric='mse',
    batch_size=256,
    max_epochs=200,
)

net.fit(X_train, y_train)

predictions = net.predict(X_test)

I'm getting this

/home/antonio/.conda/envs/intervalML/lib/python3.6/site-packages/autograd/core.py:81: RuntimeWarning: overflow encountered in exp
  result_value = self.fun(*argvals, **kwargs)
/home/antonio/.conda/envs/intervalML/lib/python3.6/site-packages/autograd/core.py:81: RuntimeWarning: overflow encountered in power
  result_value = self.fun(*argvals, **kwargs)
/home/antonio/.conda/envs/intervalML/lib/python3.6/site-packages/autograd/core.py:81: RuntimeWarning: invalid value encountered in multiply
  result_value = self.fun(*argvals, **kwargs)

So predictions is all NaN:

In [2]: predictions
Out[2]: 
array([[ nan],
       [ nan],
       [ nan]])

Where's the problem?

Thank you! Your projects are really useful :)
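
One plausible but unverified cause is the scale of the data: several input columns are in the hundreds and the targets are around 150 to 270, which can push activations and squared errors into overflow. The sketch below standardizes features with NumPy before training. The helper name standardize is hypothetical, and the three rows are the last three columns of the first training rows above; constant columns such as the 2.6 ones need special handling to avoid dividing by zero.

```python
import numpy as np


def standardize(train, other=None):
    """Scale columns to zero mean and unit variance using training statistics.

    Hypothetical helper for illustration. Constant columns (std == 0) are
    left at zero after centering instead of producing NaN.
    """
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero
    scaled_train = (train - mean) / std
    if other is None:
        return scaled_train, mean, std
    scaled_other = (np.asarray(other, dtype=float) - mean) / std
    return scaled_train, scaled_other, mean, std


# Three columns from the first three training rows; the last is constant.
X_small = np.array([
    [289.4, 723.8, 2.6],
    [361.6, 706.9, 2.6],
    [331.9, 676.0, 2.6],
])
X_scaled, x_mean, x_std = standardize(X_small)
```

If the targets are scaled the same way, predictions have to be mapped back with prediction * y_std + y_mean before comparing them with the original values.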

'c' argument looks like a single numeric RGB or RGBA sequence,

Clean work!

Some examples return the following:

'c' argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with 'x' & 'y'.  Please use a 2-D array with a single row if you really want to specify the same RGB or RGBA value for all points.
'c' argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with 'x' & 'y'.  Please use a 2-D array with a single row if you really want to specify the same RGB or RGBA value for all points.
'c' argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with 'x' & 'y'.  Please use a 2-D array with a single row if you really want to specify the same RGB or RGBA value for all points.
'c' argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with 'x' & 'y'.  Please use a 2-D array with a single row if you really want to specify the same RGB or RGBA value for all points.
<Figure size 640x480 with 1 Axes>

Try the following in a Google Colab notebook to reproduce:

from google.colab import drive
drive.mount('/content/drive')
!git clone https://github.com/rushter/MLAlgorithms "/content/drive/My Drive/cloned/MLAlgorithms/"
%cd "/content/drive/My Drive/cloned/MLAlgorithms/"
!python setup.py develop

!python -m examples.kmeans

Best regards,
D
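
The warning text itself points at two ways to pass a single colour to scatter. The sketch below shows both with random points; it assumes matplotlib and NumPy are installed, uses a headless backend, and is not the code in examples/kmeans.py.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(30, 2))
rgb = (0.2, 0.4, 0.8)  # one colour intended for every point

fig, ax = plt.subplots()

# Option 1: 'color' is never value-mapped, so a bare RGB tuple is unambiguous.
sc1 = ax.scatter(points[:, 0], points[:, 1], color=rgb)

# Option 2: a 2-D array with a single row, as the warning recommends.
sc2 = ax.scatter(points[:, 0], points[:, 1], c=np.array([rgb]))

plt.close(fig)
```

Either form avoids the ambiguity between "one RGB colour" and "three values to map through a colormap" that triggers the message.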

Do you have examples for factorization machines?

Do you have examples for factorization machines, like the ones that can be run without installation?

    cd MLAlgorithms
    python -m examples.linear_models

about linear_models.py

The function _add_intercept(X) in BasicRegression returns np.concatenate([b, X], axis=1).
I can't figure out the reason, since I thought it should be np.concatenate([X, b], axis=1) to match the function _add_penalty.
Please check it.
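
A short sketch may clarify how the two pieces can fit together. It assumes that the penalty skips the first weight, as in the w[1:] expression quoted in the earlier issue about linear_models.py; the data and weights are made up.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.ones((X.shape[0], 1))

# Intercept column first: the bias weight ends up at index 0.
X_with_intercept = np.concatenate([b, X], axis=1)

# Hypothetical weights: w[0] is the intercept, w[1:] are feature weights.
w = np.array([10.0, 0.5, -0.5])
C = 0.1

# A penalty over w[1:] therefore leaves the intercept unregularized.
penalty = (0.5 * C) * (w[1:] ** 2).mean()
predictions = X_with_intercept @ w
```

With np.concatenate([X, b], axis=1) the intercept weight would sit at the last index instead, and a w[1:] penalty would regularize it while sparing the first feature weight.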

error

There is a problem with the call to ax.scatter() in kmeans.py:
TypeError: scatter() got multiple values for argument 'c'

unit tests

As we start to add more dimensionality reduction algorithms (e.g. t-SNE, PCA, sparse projections), should we include a separate set of unit tests?

It could be something along the lines of "test_reduction", to complement test_classification and test_regression.

Let me know.

I would like to implement GANs

Project dependencies may have API risk issues

Hi. In MLAlgorithms, inappropriate dependency version constraints can cause risks.

Below are the dependencies and version constraints that the project is using:

tqdm
matplotlib>=1.5.1
numpy>=1.11.1
scikit-learn>=0.18
scipy>=0.18.0
seaborn>=0.7.1
autograd>=1.1.7
gym

The version constraint == introduces a risk of dependency conflicts because the allowed range is too strict.
Constraints with no upper bound, or *, introduce a risk of missing-API errors because the latest version of a dependency may remove some APIs.

After further analysis, in this project,
The version constraint of dependency gym can be changed to >=0.6.0,<=0.22.0.

The suggested modification reduces dependency conflicts as much as possible
and allows the latest versions as far as possible without causing errors in the project.

The current project invokes all of the methods listed below.

In version gym-0.5.7, the API gym.wrappers.Monitor, which is used by the current project in mla/rl/dqn.py, is missing.

Methods called from gym

gym.wrappers.Monitor

Methods called across the whole project

self.update
mla.knn.KNNClassifier
f_entropy
sklearn.metrics.accuracy_score
self.env.render
ax.scatter
self._sample.sum
axes.scatter
classification
sample
mla.linear_models.LinearRegression
output.append
mla.svm.svm.SVM
numpy.sign
self.cost_func
autograd.numpy.concatenate
autograd.numpy.ones
itertools.combinations
mla.ensemble.tree.Tree.train
self.arg_max.flatten
mla.neuralnet.layers.Dropout
y.x.dist.cdist.self.gamma.np.exp.flatten
self._params.reshape
numpy.ceil
self._closest
col.transpose.reshape.reshape
i.self.assignments.sum
self._find_bounds
load
X.copy.copy
numpy.bincount
self.init_grad
self.y.take
KMeans_and_GMM
gym.wrappers.Monitor
numpy.random.random
self.distance_func
numpy.dot.sum
sentences.append
classification_error
autograd.numpy.zeros_like
numpy.asarray
self._find_splits
self._factor_step
get_filename.open.read
grid_gaussian_pdf
numpy.zeros.sum
mla.ensemble.gbm.GradientBoostingRegressor.fit
X_train.shape.X_train.reshape.astype
self.likelihood.append
loss_history.append
sklearn.model_selection.train_test_split
self.loss.grad
self._backward_pass.reshape
mla.linear_models.LogisticRegression.predict
self._grad
col.transpose.reshape
mla.ensemble.gbm.GradientBoostingRegressor
self._binary_search
numpy.cumsum
os.path.dirname
self.responsibilities.sum
self.activation_d
affines.clip.clip
mla.metrics.accuracy
zip
math.sqrt
mla.svm.svm.SVM.fit
network.error
self.X.min
mla.svm.kernerls.Linear
matplotlib.pyplot.plot
autograd.numpy.pad
self.left_child._train
mla.neuralnet.regularizers.L2
mla.neuralnet.layers.recurrent.LSTM
i.delta.sum
closest.self.clusters.append
i.y.X.dot
autograd.numpy.zeros_like.keys
self.model.fit
logging.info
min
gains.clip.clip
cov.mean.grid_array.multivariate_normal.pdf.reshape
self.optimizer.optimize
self.FMClassifier.super.fit
self.sigmoid
predicted.actual.self.hess.sum
layer.setup
numpy.max
numpy.eye
self._E_step
numpy.clip
matplotlib.pyplot.scatter
width.height.n_images.out.reshape.transpose
print
numpy.apply_along_axis
numpy.dot
mla.datasets.load_mnist
numpy.zeros
isinstance
mla.knn.KNNRegressor
self.theta.flatten
uniform
self.X.max
sigmoid
mla.pca.PCA.transform
mla.kmeans.KMeans.predict
get_filename
type
i.self.clusters.remove
scipy.linalg.svd
layer.parameters.keys
self.kernel
mla.neuralnet.NeuralNet
numpy.atleast_2d
AttributeError
numpy.random.multinomial
mla.naive_bayes.NaiveBayesClassifier
self.optimizer.setup
squared_log_error
mla.metrics.distance.l2_distance
y.append
mla.utils.batch_iterator
absolute_error
n_channels.out_width.out_height.n_images.out.reshape.transpose
y_test.flatten
predicted.actual.self.grad.sum
numpy.zeros.mean
preds.np.asarray.astype
autograd.numpy.logaddexp
delta.transpose.transpose
self.transform
scipy.stats.entropy
numpy.random.seed
numpy.unique
mla.rl.dqn.DQN.init_environment
y.reshape.reshape
self.hprev.copy
numpy.random.shuffle
logging.getLogger
tree.train
mla.rl.dqn.DQN.init_model
self._is_converged
self._params.init
sys.stdout.flush
self._get_pairwise_affinities
f.read.split
w.np.abs.sum
mla.ensemble.base.split_dataset
autograd.numpy.mean
mean_squared_log_error
self._dist_from_centers
mla.neuralnet.loss.get_loss
delta.transpose.reshape
X.reshape
self._backward_pass
collections.Counter
self.dense.backward_pass
logging.getLogger.setLevel
int
bool
x.y.self._pdf.np.log.sum
self._calculate_leaf_value
os.path.join
self.metric
mla.neuralnet.optimizers.Adam
mla.knn.KNNRegressor.predict
mla.knn.KNNClassifier.predict
self._get_centroid
gym.make
mla.ensemble.base.xgb_criterion
delta.self.col.T.np.dot.transpose
self._forward_pass
autograd.numpy.dot
self.loss_grad.mean
assignment.self.means.self.X.T.dot
self.sigmoid_d
delta.reshape
matplotlib.pyplot.subplots
tree.predict_row
mla.neuralnet.layers.get_activation
self._choose_next_center
layer.shape
mla.neuralnet.layers.Flatten
array.array
mla.tsne.TSNE.fit_transform
setuptools.setup
numpy.random.normal
function
mla.knn.KNNRegressor.fit
self._params.update_grad
self._params.keys
hasattr
scipy.spatial.distance.cdist
self._q_distribution
numpy.cov
X_train.shape.X_train.reshape.astype.reshape
seaborn.set
addition_problem
str.format
axis_X.flatten
autograd.numpy.sign
numpy.random.choice
numpy.arange
self.env.step
reversed
Tree
numpy.zeros_like
mla.neuralnet.NeuralNet.fit
self._cost.append
mla.ensemble.random_forest.RandomForestClassifier
numpy.random.random_sample
layer.parameters.step
name.self.regularizers
axis_Y.flatten
x.dot
numpy.log2
autograd.numpy.linalg.norm
autograd.numpy.clip
autograd.numpy.repeat
mla.metrics.metrics.accuracy
network.shuffle_dataset
numpy.fill_diagonal
mla.svm.svm.SVM.predict
Dense
self.train_epoch
regression
itertools.islice
cmap
self.predict
self.FMRegressor.super.fit
self.criterion
numpy.sum
self._find_bprop_entry
self.right_child.predict_row
axes.set_title
self._cost
self.loss_grad
self.X.resp.sum
numpy.take
numpy.reshape
self.init_cost
numpy.prod
self.grad
print_curve
super
mla.rl.dqn.DQN.train
model.predict.flatten
autograd.numpy.full
self.RandomForestClassifier.super.__init__
mla.linear_models.LogisticRegression.fit
numpy.maximum
logging.basicConfig
float
self._sample
self.sigmoid_d.sum
get_filename.open.read.lower
self._get_likelihood
autograd.numpy.arange
mla.ensemble.gbm.GradientBoostingRegressor.predict
self.shape
mla.kmeans.KMeans.fit
mla.rbm.RBM
autograd.numpy.maximum
list
numpy.mean
mla.datasets.load_nietzsche
mla.rbm.RBM.fit
sigmoid.sum
numpy.concatenate
mla.neuralnet.activations.get_activation
autograd.elementwise_grad
self._params.setup_weights
ax.contour
self._params.init_grad
random.sample
self.dense.setup
numpy.random.rand
losses.append
self._train
mla.ensemble.base.split
logging.getLogger.info
mla.neuralnet.activations.sigmoid
self._decompose
numpy.linalg.svd
ValueError
n_timesteps.self.states.copy
cost_d
moving_average
self._init_weights
w.sum
f_width.f_height.n_channels.out_width.out_height.n_images.columns.reshape.transpose
self.init
mla.tsne.TSNE
tree.predict
self.model.predict
convoltuion_shape
self._predict
random.seed
neighbors_targets.Counter.most_common
self.predict_row
autograd.numpy.random.seed
check_data
model
numpy.exp
LeastSquaresLoss
sum
sklearn.cross_validation.train_test_split
mla.knn.KNNClassifier.fit
globals
X_test.shape.X_test.reshape.astype
self.centroids.append
mla.ensemble.random_forest.RandomForestRegressor.fit
self._M_step
struct.unpack
x.strip.replace
layer.forward_pass
numpy.abs
make_clusters
collections.defaultdict
numpy.linalg.norm
self._add_penalty
n_timesteps.self.outputs.copy
os.path.abspath
numpy.linalg.eig
X.transpose
random.randint
self.random_index
params.append
mla.ensemble.random_forest.RandomForestRegressor.predict
mla.ensemble.tree.Tree
NotImplementedError
target.keys
numpy.packbits
network.update
self.dense.forward_pass
mla.utils.one_hot
autograd.numpy.array
self.activation
self._find_splits.add
squared_distances.sum
name.self.constraints.clip
image_to_column
super.__init__
seaborn.color_palette
model.predict.max
batch.sum
autograd.numpy.abs
enumerate
batch.sum.np.asarray.squeeze
n_timesteps.states.copy
mla.rl.dqn.DQN.play
self.replay.append
mla.linear_models.LogisticRegression
model.predict.min
resp.sum
numpy.ones_like
self._error
pooling_shape
setuptools.find_packages
autograd.numpy.sum
sklearn.datasets.make_regression
self._setup_input
mla.rl.dqn.DQN
numpy.packbits.astype
format
self.covs.append
mla.naive_bayes.NaiveBayesClassifier.fit
autograd.numpy.max
time.time
scipy.special.expit
self.trees.append
mla.gaussian_mixture.GaussianMixture
predicted.actual.sum
self.clip
numpy.round
X_c.var
self.loss.approximate
self._get_weighted_likelihood.argmax
numpy.full
layer.backward_pass
loss.gain
numpy.where
delta.transpose.flatten
str
self.responsibilities.sum.sum
self.RandomForestRegressor.super.__init__
self.GradientBoostingClassifier.super.fit
sorted
numpy.meshgrid
mla.neuralnet.layers.MaxPooling
mla.utils.one_hot.flatten
self.GradientBoostingRegressor.super.fit
f.read
self.left_child.predict_row
mla.naive_bayes.NaiveBayesClassifier.predict
numpy.sqrt
mla.neuralnet.parameters.Parameters
sklearn.datasets.make_classification
numpy.copy
max
self.env.close
autograd.numpy.prod
x.startswith
self._get_weighted_likelihood.sum
mla.metrics.metrics.mean_squared_error
autograd.numpy.argmax
self.activation_d.sum
p.keys
scipy.stats.multivariate_normal.pdf
self.weight
self.inner_init
sklearn.datasets.make_blobs
random.choice
squared_error
normal
autograd.numpy.exp
self._initialize
mla.metrics.distance.euclidean_distance
autograd.numpy.random.normal
numpy.array
numpy.random.randint
open.close
numpy.random.randn
addition_dataset
self.loss.transform
matplotlib.pyplot.show
next_chars.append
open.read
mla.kmeans.KMeans.plot
delta.reshape.reshape
self._add_intercept.dot
autograd.numpy.log
delta.self.col.T.np.dot.transpose.reshape
X.transpose.reshape
self.env.reset
mla.pca.PCA
len
x.strip
self._find_best_split
_glorot_fan
numpy.ones
X_c.mean
get_split_mask
mla.ensemble.tree.Tree.predict
self.X.take
autograd.numpy.random.uniform
self.replay.pop
autograd.numpy.amax
self._initialize_centroids
s_squared.sum
col.transpose.reshape.transpose
mla.neuralnet.constraints.MaxNorm
self.errors.append
self._add_intercept
mla.metrics.metrics.get_metric
numpy.random.uniform
f_width.f_height.n_channels.out_width.out_height.n_images.columns.reshape.transpose.reshape
self.hess
numpy.amax
numpy.log
mla.neuralnet.layers.Dense
cols.rows.i.ind.cols.rows.i.ind.img.array.reshape
autograd.numpy.sqrt
numpy.argmax
autograd.numpy.clip.argmax
y_max.reshape.reshape
column_to_image
X_test.shape.X_test.reshape.astype.reshape
autograd.grad
mla.gaussian_mixture.GaussianMixture.plot
t.y.astype
tqdm.tqdm
logging.debug
self.weights.sum
mla.pca.PCA.fit
self._predict_x
self.v.x.dot.dot
mla.neuralnet.layers.TimeDistributedDense
self._predict_row
mla.neuralnet.layers.Convolution
sys.stdout.write
mean_squared_error
autograd.numpy.tanh
self._gradient_descent
mla.neuralnet.layers.Activation
self._setup_layers
kmeans_example
sklearn.metrics.roc_auc_score
LogisticLoss
Q.clip.clip
numpy.empty
mla.neuralnet.optimizers.RMSprop
mla.neuralnet.initializations.get_initializer
mla.ensemble.gbm.GradientBoostingClassifier
self._pdf
mla.neuralnet.NeuralNet.predict
mla.neuralnet.activations.softmax
codecs.open
self.fprop
self._get_predictions
actual.argmax.argmax
mla.gaussian_mixture.GaussianMixture.fit
autograd.numpy.zeros
self.loss
mla.kmeans.KMeans
self._get_weighted_likelihood
set
mla.neuralnet.optimizers.Adadelta
autograd.numpy.max.reshape
self.aggregate
mla.ensemble.random_forest.RandomForestRegressor
C.W.H.N.out_flat.reshape.transpose
self.right_child._train
open
mla.svm.kernerls.RBF
range
self._assign

@developer
Could you please help me check this issue?
May I open a pull request to fix it?
Thank you very much.

Could you share an example for MLAlgorithms/mla/fm.py?

It is really great code, but how do I run it?
MLAlgorithms/mla/fm.py
Could you add an example, as you did for the other models?

Import by filename not supported

I cloned the repo and tried running the example in README.md without explicit installation:

python -m examples.linear_models

I get the following result (using Mac/Anaconda/Python 2.7x):

/anaconda/bin/python: Import by filename is not supported.

Any ideas as to why the standard example won't work?

Implementing linear regression from scratch with gradient descent, without any sklearn libraries.

Since the current code relies on predefined libraries, I would like to add a module that implements linear regression from scratch, so that users can understand how it works. It would be implemented as a class, so it could be used in a style similar to the sklearn library.

ImportError: No module named mla

When I run nnet_convnet_mnist.py in the IDE, it throws this error.

Thank you @rushter

About function 'f_entropy' in ensemble.base

Excuse me, but I think there is something wrong with the function f_entropy in ensemble.base, since it uses scipy.stats.entropy to calculate the information entropy directly. That function expects an array that represents a distribution, but in the tree class, f_entropy is called with an array of labels, which is passed straight to scipy.stats.entropy. This leads to a different result.
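
The difference can be illustrated without scipy. The sketch below is a NumPy-only approximation with hypothetical helper names: entropy_of_distribution normalizes its input and uses the natural logarithm, which is how scipy.stats.entropy behaves by default, and entropy_of_labels first converts labels to class counts.

```python
import numpy as np


def entropy_of_distribution(p):
    """Shannon entropy (in nats) of a non-negative vector, normalized to sum to 1."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return float(-(p * np.log(p)).sum())


def entropy_of_labels(labels):
    """Entropy of the empirical class distribution of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    return entropy_of_distribution(counts)


labels = np.array([1, 1, 1, 2])

# Treating the raw labels as if they were a distribution: [1, 1, 1, 2] / 5.
entropy_from_raw_labels = entropy_of_distribution(labels)

# Counting classes first: three of class 1, one of class 2 -> [0.75, 0.25].
entropy_from_counts = entropy_of_labels(labels)
```

The two numbers differ, and only the count-based one reflects how mixed the labels are.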

Implement Gradient

I notice you use a 3rd party module to evaluate the gradient of your cost function in your GD routine. What was the reasoning behind this and why not implement it?
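
For a simple cost such as mean squared error, a hand-written gradient is short. The sketch below derives it for a linear model and checks it against central finite differences. It uses NumPy only, with synthetic data and illustrative function names, and is not the repository's GD routine.

```python
import numpy as np


def mse_loss(w, X, y):
    """Mean squared error of the linear model X @ w."""
    residual = X @ w - y
    return float(np.mean(residual ** 2))


def mse_grad(w, X, y):
    """Analytical gradient: d/dw mean((Xw - y)^2) = 2 X^T (Xw - y) / n."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / X.shape[0]


def numerical_grad(f, w, eps=1e-6):
    """Central finite differences, used here only as a correctness check."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
w = rng.normal(size=3)

analytic = mse_grad(w, X, y)
numeric = numerical_grad(lambda v: mse_loss(v, X, y), w)
```

An automatic differentiation library removes the need to derive and maintain such formulas for every loss and penalty, which may be part of the reasoning.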

Decision Tree Algorithm

Can I work on this issue?

Add missing neural network optimizers

Currently, we are missing:

NAdam
Adamax
Adabound
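
As a starting point for discussion, here is a minimal sketch of the Adamax update rule on a toy problem. It is standalone NumPy code with a hypothetical function name and does not follow the interface of the optimizers in mla/neuralnet/optimizers.py; the small eps term is added here only to guard against division by zero.

```python
import numpy as np


def adamax_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999,
                    eps=1e-8, n_steps=500):
    """Minimize a function given its gradient, using the Adamax update."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # exponential moving average of gradients
    u = np.zeros_like(x)  # exponentially weighted infinity norm
    for t in range(1, n_steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        u = np.maximum(beta2 * u, np.abs(g))
        # Only the first moment needs bias correction in Adamax.
        x = x - (lr / (1 - beta1 ** t)) * m / (u + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_min = adamax_minimize(lambda x: 2.0 * (x - 3.0), x0=np.array([0.0]))
```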

unhashable type - errors when using the GBM.predict() method after running train_test_split with a pandas DataFrame

Wanted to first say thanks for these examples!

If you're using the GBM example (https://github.com/rushter/MLAlgorithms/blob/master/examples/gbm.py) after splitting a DataFrame you will get an error saying:

unhashable type

Debugging this it looks like it's in the tree's predict method:

MLAlgorithms/mla/ensemble/tree.py

Line 150 in 7fe1fea

result[i] = self.predict_row(X[i, :])

Some sample tests to reproduce it:

import pandas as pd
x = pd.DataFrame({
                    "Alpha" : [1, 2, 3],
                    "Beta" : [4, 5, 6]
                })
i = 0
print x[i, :]


$ python ./debugging.py 
Traceback (most recent call last):
  File "./debugging.py", line 8, in <module>
    print x[i, :]
  File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 2057, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 2064, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib64/python2.7/site-packages/pandas/core/generic.py", line 1384, in _get_item_cache
    res = cache.get(item)
TypeError: unhashable type
$

vs a version that converts the DataFrame before selecting indices:

import pandas as pd
x = pd.DataFrame({
                    "Alpha" : [1, 2, 3],
                    "Beta" : [4, 5, 6]
                })
i = 0
print x.values[i, :]

$ python ./debugging.py 
[1 4]
$

Not sure if there should be a comment/another example or code that can detect it's a DataFrame vs a numpy type...

For those hitting the same issue, here's how I got it working:

model = GradientBoostingClassifier(n_estimators=50, max_depth=4,
                                       max_features=8, learning_rate=0.1)
model.fit(X_train.values, Y_train.values)
predictions = model.predict(X_test.values)
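
An alternative to calling .values at every call site is to coerce inputs once. The sketch below shows a hypothetical helper, as_numpy, built on np.asarray; it is not part of the library, and where such a call would best live inside the estimators is left open.

```python
import numpy as np


def as_numpy(data):
    """Hypothetical helper: return `data` as a NumPy array.

    np.asarray returns ndarrays unchanged and converts array-likes such as
    nested lists, or objects that expose __array__ (pandas DataFrames do).
    """
    return np.asarray(data)


# Same values as the DataFrame in the report, written as rows.
X = as_numpy([[1, 4], [2, 5], [3, 6]])
i = 0
row = X[i, :]  # positional indexing now works: array([1, 4])
```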

Typo in filename

https://github.com/rushter/MLAlgorithms/blob/master/mla/svm/kernerls.py

Shouldn't it be "kernels"? (remove the "r" before the final "l").

Add Recommender System algorithms

I want to add some of the algorithms used in recommendation systems. I graduated recently, and for my final paper I wrote a recommender system, a hybrid implementation using some simple algorithms. I would like to share them in your repo if you permit.

Ml

Missing Random Forest

I would like to implement random forest

Dead link in optimizers.py docstring

MLAlgorithms/mla/neuralnet/optimizers.py

Line 13 in 3c8e16b

    
               Gradient descent optimization algorithms  http://sebastianruder.com/optimizing-gradient-descent/index.html

mla/base/base.py predict function

I think there's a problem here.

def predict(self, X=None):
        if not isinstance(X, np.ndarray):
            X = np.array(X)
        if self.X is not None or not self.fit_required:
            return self._predict(X)
        else:
            raise ValueError("You must call `fit` before `predict`")

The following code shows that self.X is not None is always true.

>>> X=None
>>> isinstance(X,np.ndarray)
False
>>> X=np.array(X)
>>> X
array(None, dtype=object)
>>> X is not None
True
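
Note that the session above shows the argument X after conversion, while the guard in predict tests self.X, which is presumably set during fit. One way to make both checks unambiguous is to validate the argument before converting it. The class below is a hypothetical stand-in, not the code in mla/base/base.py.

```python
import numpy as np


class SketchEstimator:
    """Hypothetical stand-in for an estimator base class."""

    fit_required = True

    def __init__(self):
        self.X = None  # set by fit()

    def fit(self, X):
        self.X = np.asarray(X)
        return self

    def predict(self, X=None):
        # Check the raw argument first: np.array(None) is a 0-d object
        # array, so a later `is not None` test would always pass.
        if X is None:
            raise ValueError("`X` must be provided")
        if self.fit_required and self.X is None:
            raise ValueError("You must call `fit` before `predict`")
        return self._predict(np.asarray(X))

    def _predict(self, X):
        # Placeholder prediction so the sketch is runnable.
        return X.sum(axis=1)
```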

random_forest.py multi-classification

After changing n_classes to 4 and running classification(), I got this error:
File "G:\ml\MLAlgorithms-master\mla\ensemble\tree.py", line 149, in predict_row
return self.outocome
AttributeError: 'Tree' object has no attribute 'outocome'

How can I fix this error?
