- Twitter: @rushter
- Blog: https://rushter.com/blog/
rushter / mlalgorithms
Minimal and clean examples of machine learning algorithms implementations
License: MIT License
Hi! I don't know why I'm getting overflows training a NeuralNet. Actually I don't know if the problem is related to this project or to autograd. I think @mattjj could help, too.
With this sample data I'm getting overflows in exp and power, getting NaN as predictions. I can't understand where the problem is.
With Python3 you can execute this:
from io import StringIO
import pandas as pd
from mla.neuralnet import NeuralNet
from mla.neuralnet.layers import Activation, Dense
from mla.neuralnet.optimizers import Adam
from mla.neuralnet.parameters import Parameters
X_train_string = '-0.10410 6.106 6.232 97.50 -0.17490 0.16960 -0.4665 0.4242 289.4 723.8 2.6 2.6\n0.51340 5.517 12.160 90.83 -0.07771 0.07383 -0.2153 0.2133 361.6 706.9 2.6 2.6\n0.40280 5.359 9.403 79.45 -0.05685 0.06042 -0.2095 0.1873 331.9 676.0 2.6 2.6\n-0.02972 5.821 8.682 70.49 -0.05592 0.04690 -0.2113 0.2078 323.2 649.6 2.6 2.6\n-1.06600 6.893 3.798 132.00 -0.11060 0.10940 -0.3516 0.3494 245.4 800.6 2.6 2.6\n-0.66730 5.795 4.824 63.08 -0.04957 0.05160 -0.1446 0.1654 265.7 626.0 2.6 2.6\n0.11930 8.804 18.630 253.90 -0.17100 0.17270 -0.3759 0.3657 416.9 995.8 2.6 2.6'
y_train_string = '164.0\n156.5\n134.5\n142.5\n195.6\n134.2\n272.0'
X_test_string = '-0.1041 6.106 6.232 97.50 -0.17490 0.16960 -0.4665 0.4242 289.4 723.8 2.6 2.6\n0.5134 5.517 12.160 90.83 -0.07771 0.07383 -0.2153 0.2133 361.6 706.9 2.6 2.6\n0.4028 5.359 9.403 79.45 -0.05685 0.06042 -0.2095 0.1873 331.9 676.0 2.6 2.6'
y_test_string = '164.0\n156.5\n134.5'
X_train = pd.read_csv(StringIO(X_train_string), header=None, delim_whitespace=True)
y_train = pd.read_csv(StringIO(y_train_string), header=None, delim_whitespace=True)
X_test = pd.read_csv(StringIO(X_test_string), header=None, delim_whitespace=True)
y_test = pd.read_csv(StringIO(y_test_string), header=None, delim_whitespace=True)
net = NeuralNet(
    layers=[
        Dense(20, Parameters(init='normal')),
        Activation('sigmoid'),
        Dense(1),
    ],
    loss='mse',
    optimizer=Adam(),
    metric='mse',
    batch_size=256,
    max_epochs=200)
net.fit(X_train,y_train)
predictions = net.predict(X_test)
I'm getting this
/home/antonio/.conda/envs/intervalML/lib/python3.6/site-packages/autograd/core.py:81: RuntimeWarning: overflow encountered in exp
result_value = self.fun(*argvals, **kwargs)
/home/antonio/.conda/envs/intervalML/lib/python3.6/site-packages/autograd/core.py:81: RuntimeWarning: overflow encountered in power
result_value = self.fun(*argvals, **kwargs)
/home/antonio/.conda/envs/intervalML/lib/python3.6/site-packages/autograd/core.py:81: RuntimeWarning: invalid value encountered in multiply
result_value = self.fun(*argvals, **kwargs)
So predictions is full of NaN:
In [2]: predictions
Out[2]:
array([[ nan],
[ nan],
[ nan]])
Where's the problem?
Thank you! Your projects are really useful :)
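One common cause of this kind of overflow, though not confirmed for this report, is feeding unscaled features into a sigmoid layer: columns with values in the hundreds can push `exp` past the float range. The sketch below standardizes each column before training. The helper name `standardize` and the `eps` threshold are illustrative assumptions, not part of `mla`.

```python
import numpy as np

def standardize(X, mean=None, std=None, eps=1e-12):
    """Scale each column to zero mean and unit variance.

    Pass the training mean/std back in to scale test data consistently.
    """
    X = np.asarray(X, dtype=float)
    if mean is None:
        mean = X.mean(axis=0)
    if std is None:
        std = X.std(axis=0)
    # (Near-)constant columns, such as the two 2.6 columns in the sample,
    # have a std of about 0; divide by 1 instead so they map to ~0, not NaN.
    safe_std = np.where(std < eps, 1.0, std)
    return (X - mean) / safe_std, mean, std

# Three rows shaped like the sample: large magnitudes plus a constant column.
X_train = np.array([[289.4, 723.8, 2.6],
                    [361.6, 706.9, 2.6],
                    [331.9, 676.0, 2.6]])
X_scaled, mean, std = standardize(X_train)

# Reuse the training statistics for unseen rows.
X_test_scaled, _, _ = standardize(X_train[:2], mean=mean, std=std)
```

Scaling the targets in the same way, and mapping predictions back afterwards, may also help when the loss is MSE and the targets are in the hundreds.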
According to the Andrew Ng's deep learning course (a for hidden state, y for output value):
We get output values by multiplying the hidden state by a weight matrix Wya, adding bias by onto it, and then go through an activation function.
But from
MLAlgorithms/mla/neuralnet/layers/recurrent/rnn.py
Lines 55 to 63 in 6e383f7
@rushter Can you please share the reference you used for the RNN, or confirm this as a bug? If it's a bug, I'd like to create a PR to fix it.
Hi, In MLAlgorithms, inappropriate dependency versioning constraints can cause risks.
Below are the dependencies and version constraints that the project is using
tqdm
matplotlib>=1.5.1
numpy>=1.11.1
scikit-learn>=0.18
scipy>=0.18.0
seaborn>=0.7.1
autograd>=1.1.7
gym
The version constraint == will introduce the risk of dependency conflicts because the scope of dependencies is too strict.
The version constraint No Upper Bound and * will introduce the risk of the missing API Error because the latest version of the dependencies may remove some APIs.
After further analysis, in this project,
The version constraint of dependency gym can be changed to >=0.6.0,<=0.22.0.
The above modification suggestions can reduce the dependency conflicts as much as possible,
and introduce the latest version as much as possible without calling Error in the projects.
The invocation of the current project includes all the following methods.
In version gym-0.5.7, the API gym.wrappers.Monitor, which is used by the current project in mla/rl/dqn.py, is missing.
gym.wrappers.Monitor
self.update mla.knn.KNNClassifier f_entropy sklearn.metrics.accuracy_score self.env.render ax.scatter self._sample.sum axes.scatter classification sample mla.linear_models.LinearRegression output.append mla.svm.svm.SVM numpy.sign self.cost_func autograd.numpy.concatenate autograd.numpy.ones itertools.combinations mla.ensemble.tree.Tree.train self.arg_max.flatten mla.neuralnet.layers.Dropout y.x.dist.cdist.self.gamma.np.exp.flatten self._params.reshape numpy.ceil self._closest col.transpose.reshape.reshape i.self.assignments.sum self._find_bounds load X.copy.copy numpy.bincount self.init_grad self.y.take KMeans_and_GMM gym.wrappers.Monitor numpy.random.random self.distance_func numpy.dot.sum sentences.append classification_error autograd.numpy.zeros_like numpy.asarray self._find_splits self._factor_step get_filename.open.read grid_gaussian_pdf numpy.zeros.sum mla.ensemble.gbm.GradientBoostingRegressor.fit X_train.shape.X_train.reshape.astype self.likelihood.append loss_history.append sklearn.model_selection.train_test_split self.loss.grad self._backward_pass.reshape mla.linear_models.LogisticRegression.predict self._grad col.transpose.reshape mla.ensemble.gbm.GradientBoostingRegressor self._binary_search numpy.cumsum os.path.dirname self.responsibilities.sum self.activation_d affines.clip.clip mla.metrics.accuracy zip math.sqrt mla.svm.svm.SVM.fit network.error self.X.min mla.svm.kernerls.Linear matplotlib.pyplot.plot autograd.numpy.pad self.left_child._train mla.neuralnet.regularizers.L2 mla.neuralnet.layers.recurrent.LSTM i.delta.sum closest.self.clusters.append i.y.X.dot autograd.numpy.zeros_like.keys self.model.fit logging.info min gains.clip.clip cov.mean.grid_array.multivariate_normal.pdf.reshape self.optimizer.optimize self.FMClassifier.super.fit self.sigmoid predicted.actual.self.hess.sum layer.setup numpy.max numpy.eye self._E_step numpy.clip matplotlib.pyplot.scatter width.height.n_images.out.reshape.transpose print numpy.apply_along_axis numpy.dot 
mla.datasets.load_mnist numpy.zeros isinstance mla.knn.KNNRegressor self.theta.flatten uniform self.X.max sigmoid mla.pca.PCA.transform mla.kmeans.KMeans.predict get_filename type i.self.clusters.remove scipy.linalg.svd layer.parameters.keys self.kernel mla.neuralnet.NeuralNet numpy.atleast_2d AttributeError numpy.random.multinomial mla.naive_bayes.NaiveBayesClassifier self.optimizer.setup squared_log_error mla.metrics.distance.l2_distance y.append mla.utils.batch_iterator absolute_error n_channels.out_width.out_height.n_images.out.reshape.transpose y_test.flatten predicted.actual.self.grad.sum numpy.zeros.mean preds.np.asarray.astype autograd.numpy.logaddexp delta.transpose.transpose self.transform scipy.stats.entropy numpy.random.seed numpy.unique mla.rl.dqn.DQN.init_environment y.reshape.reshape self.hprev.copy numpy.random.shuffle logging.getLogger tree.train mla.rl.dqn.DQN.init_model self._is_converged self._params.init sys.stdout.flush self._get_pairwise_affinities f.read.split w.np.abs.sum mla.ensemble.base.split_dataset autograd.numpy.mean mean_squared_log_error self._dist_from_centers mla.neuralnet.loss.get_loss delta.transpose.reshape X.reshape self._backward_pass collections.Counter self.dense.backward_pass logging.getLogger.setLevel int bool x.y.self._pdf.np.log.sum self._calculate_leaf_value os.path.join self.metric mla.neuralnet.optimizers.Adam mla.knn.KNNRegressor.predict mla.knn.KNNClassifier.predict self._get_centroid gym.make mla.ensemble.base.xgb_criterion delta.self.col.T.np.dot.transpose self._forward_pass autograd.numpy.dot self.loss_grad.mean assignment.self.means.self.X.T.dot self.sigmoid_d delta.reshape matplotlib.pyplot.subplots tree.predict_row mla.neuralnet.layers.get_activation self._choose_next_center layer.shape mla.neuralnet.layers.Flatten array.array mla.tsne.TSNE.fit_transform setuptools.setup numpy.random.normal function mla.knn.KNNRegressor.fit self._params.update_grad self._params.keys hasattr scipy.spatial.distance.cdist 
self._q_distribution numpy.cov X_train.shape.X_train.reshape.astype.reshape seaborn.set addition_problem str.format axis_X.flatten autograd.numpy.sign numpy.random.choice numpy.arange self.env.step reversed Tree numpy.zeros_like mla.neuralnet.NeuralNet.fit self._cost.append mla.ensemble.random_forest.RandomForestClassifier numpy.random.random_sample layer.parameters.step name.self.regularizers axis_Y.flatten x.dot numpy.log2 autograd.numpy.linalg.norm autograd.numpy.clip autograd.numpy.repeat mla.metrics.metrics.accuracy network.shuffle_dataset numpy.fill_diagonal mla.svm.svm.SVM.predict Dense self.train_epoch regression itertools.islice cmap self.predict self.FMRegressor.super.fit self.criterion numpy.sum self._find_bprop_entry self.right_child.predict_row axes.set_title self._cost self.loss_grad self.X.resp.sum numpy.take numpy.reshape self.init_cost numpy.prod self.grad print_curve super mla.rl.dqn.DQN.train model.predict.flatten autograd.numpy.full self.RandomForestClassifier.super.__init__ mla.linear_models.LogisticRegression.fit numpy.maximum logging.basicConfig float self._sample self.sigmoid_d.sum get_filename.open.read.lower self._get_likelihood autograd.numpy.arange mla.ensemble.gbm.GradientBoostingRegressor.predict self.shape mla.kmeans.KMeans.fit mla.rbm.RBM autograd.numpy.maximum list numpy.mean mla.datasets.load_nietzsche mla.rbm.RBM.fit sigmoid.sum numpy.concatenate mla.neuralnet.activations.get_activation autograd.elementwise_grad self._params.setup_weights ax.contour self._params.init_grad random.sample self.dense.setup numpy.random.rand losses.append self._train mla.ensemble.base.split logging.getLogger.info mla.neuralnet.activations.sigmoid self._decompose numpy.linalg.svd ValueError n_timesteps.self.states.copy cost_d moving_average self._init_weights w.sum f_width.f_height.n_channels.out_width.out_height.n_images.columns.reshape.transpose self.init mla.tsne.TSNE tree.predict self.model.predict convoltuion_shape self._predict random.seed 
neighbors_targets.Counter.most_common self.predict_row autograd.numpy.random.seed check_data model numpy.exp LeastSquaresLoss sum sklearn.cross_validation.train_test_split mla.knn.KNNClassifier.fit globals X_test.shape.X_test.reshape.astype self.centroids.append mla.ensemble.random_forest.RandomForestRegressor.fit self._M_step struct.unpack x.strip.replace layer.forward_pass numpy.abs make_clusters collections.defaultdict numpy.linalg.norm self._add_penalty n_timesteps.self.outputs.copy os.path.abspath numpy.linalg.eig X.transpose random.randint self.random_index params.append mla.ensemble.random_forest.RandomForestRegressor.predict mla.ensemble.tree.Tree NotImplementedError target.keys numpy.packbits network.update self.dense.forward_pass mla.utils.one_hot autograd.numpy.array self.activation self._find_splits.add squared_distances.sum name.self.constraints.clip image_to_column super.__init__ seaborn.color_palette model.predict.max batch.sum autograd.numpy.abs enumerate batch.sum.np.asarray.squeeze n_timesteps.states.copy mla.rl.dqn.DQN.play self.replay.append mla.linear_models.LogisticRegression model.predict.min resp.sum numpy.ones_like self._error pooling_shape setuptools.find_packages autograd.numpy.sum sklearn.datasets.make_regression self._setup_input mla.rl.dqn.DQN numpy.packbits.astype format self.covs.append mla.naive_bayes.NaiveBayesClassifier.fit autograd.numpy.max time.time scipy.special.expit self.trees.append mla.gaussian_mixture.GaussianMixture predicted.actual.sum self.clip numpy.round X_c.var self.loss.approximate self._get_weighted_likelihood.argmax numpy.full layer.backward_pass loss.gain numpy.where delta.transpose.flatten str self.responsibilities.sum.sum self.RandomForestRegressor.super.__init__ self.GradientBoostingClassifier.super.fit sorted numpy.meshgrid mla.neuralnet.layers.MaxPooling mla.utils.one_hot.flatten self.GradientBoostingRegressor.super.fit f.read self.left_child.predict_row mla.naive_bayes.NaiveBayesClassifier.predict 
numpy.sqrt mla.neuralnet.parameters.Parameters sklearn.datasets.make_classification numpy.copy max self.env.close autograd.numpy.prod x.startswith self._get_weighted_likelihood.sum mla.metrics.metrics.mean_squared_error autograd.numpy.argmax self.activation_d.sum p.keys scipy.stats.multivariate_normal.pdf self.weight self.inner_init sklearn.datasets.make_blobs random.choice squared_error normal autograd.numpy.exp self._initialize mla.metrics.distance.euclidean_distance autograd.numpy.random.normal numpy.array numpy.random.randint open.close numpy.random.randn addition_dataset self.loss.transform matplotlib.pyplot.show next_chars.append open.read mla.kmeans.KMeans.plot delta.reshape.reshape self._add_intercept.dot autograd.numpy.log delta.self.col.T.np.dot.transpose.reshape X.transpose.reshape self.env.reset mla.pca.PCA len x.strip self._find_best_split _glorot_fan numpy.ones X_c.mean get_split_mask mla.ensemble.tree.Tree.predict self.X.take autograd.numpy.random.uniform self.replay.pop autograd.numpy.amax self._initialize_centroids s_squared.sum col.transpose.reshape.transpose mla.neuralnet.constraints.MaxNorm self.errors.append self._add_intercept mla.metrics.metrics.get_metric numpy.random.uniform f_width.f_height.n_channels.out_width.out_height.n_images.columns.reshape.transpose.reshape self.hess numpy.amax numpy.log mla.neuralnet.layers.Dense cols.rows.i.ind.cols.rows.i.ind.img.array.reshape autograd.numpy.sqrt numpy.argmax autograd.numpy.clip.argmax y_max.reshape.reshape column_to_image X_test.shape.X_test.reshape.astype.reshape autograd.grad mla.gaussian_mixture.GaussianMixture.plot t.y.astype tqdm.tqdm logging.debug self.weights.sum mla.pca.PCA.fit self._predict_x self.v.x.dot.dot mla.neuralnet.layers.TimeDistributedDense self._predict_row mla.neuralnet.layers.Convolution sys.stdout.write mean_squared_error autograd.numpy.tanh self._gradient_descent mla.neuralnet.layers.Activation self._setup_layers kmeans_example sklearn.metrics.roc_auc_score LogisticLoss 
Q.clip.clip numpy.empty mla.neuralnet.optimizers.RMSprop mla.neuralnet.initializations.get_initializer mla.ensemble.gbm.GradientBoostingClassifier self._pdf mla.neuralnet.NeuralNet.predict mla.neuralnet.activations.softmax codecs.open self.fprop self._get_predictions actual.argmax.argmax mla.gaussian_mixture.GaussianMixture.fit autograd.numpy.zeros self.loss mla.kmeans.KMeans self._get_weighted_likelihood set mla.neuralnet.optimizers.Adadelta autograd.numpy.max.reshape self.aggregate mla.ensemble.random_forest.RandomForestRegressor C.W.H.N.out_flat.reshape.transpose self.right_child._train open mla.svm.kernerls.RBF range self._assign
@developer
Could you please help me check this issue?
May I open a pull request to fix it?
Thank you very much.
as we start to add more dimensionality reduction algorithms (eg t-sne, pca, sparse projections), should we include a separate set of unit tests?
it could be something along the lines of "test_reduction", to complement test_classification and test_regression.
lmk
Change n_classes=4, and run classification()
got error:
File "G:\ml\MLAlgorithms-master\mla\ensemble\tree.py", line 149, in predict_row
return self.outocome
AttributeError: 'Tree' object has no attribute 'outocome'
How can I fix this error?
Wanted to first say thanks for these examples!
If you're using the GBM example (https://github.com/rushter/MLAlgorithms/blob/master/examples/gbm.py) after splitting a DataFrame you will get an error saying:
unhashable type
Debugging this it looks like it's in the tree's predict method:
MLAlgorithms/mla/ensemble/tree.py
Line 150 in 7fe1fea
Some sample tests to reproduce it:
import pandas as pd
x = pd.DataFrame({
"Alpha" : [1, 2, 3],
"Beta" : [4, 5, 6]
})
i = 0
print x[i, :]
$ python ./debugging.py
Traceback (most recent call last):
File "./debugging.py", line 8, in <module>
print x[i, :]
File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 2057, in __getitem__
return self._getitem_column(key)
File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 2064, in _getitem_column
return self._get_item_cache(key)
File "/usr/lib64/python2.7/site-packages/pandas/core/generic.py", line 1384, in _get_item_cache
res = cache.get(item)
TypeError: unhashable type
$
vs a version that converts the DataFrame before selecting indices:
import pandas as pd
x = pd.DataFrame({
"Alpha" : [1, 2, 3],
"Beta" : [4, 5, 6]
})
i = 0
print x.values[i, :]
$ python ./debugging.py
[1 4]
$
Not sure if there should be a comment/another example or code that can detect it's a DataFrame vs a numpy type...
For those hitting the same issue, here's how I got it working:
model = GradientBoostingClassifier(n_estimators=50, max_depth=4,
max_features=8, learning_rate=0.1)
model.fit(X_train.values, Y_train.values)
predictions = model.predict(X_test.values)
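On the question of detecting a DataFrame: one option is to skip detection and normalize every input with `np.asarray`, which accepts lists, arrays, and objects that implement `__array__` (pandas DataFrames do). The sketch below uses a hypothetical `FrameLike` class as a stand-in for a DataFrame so it runs without pandas; `as_ndarray` is likewise an illustrative name, not an `mla` function.

```python
import numpy as np

class FrameLike:
    """Hypothetical stand-in for a DataFrame: exposes data only via __array__."""

    def __init__(self, rows):
        self._rows = rows

    def __array__(self, dtype=None, copy=None):
        return np.array(self._rows, dtype=dtype)

def as_ndarray(data):
    """Return `data` as a NumPy array so that data[i, :] indexing works."""
    return np.asarray(data)

rows = [[1, 4], [2, 5], [3, 6]]

# Plain nested lists and array-like objects both end up as ndarrays.
first_row_from_list = as_ndarray(rows)[0, :]
first_row_from_frame = as_ndarray(FrameLike(rows))[0, :]
```

Calling such a helper at the top of `fit` and `predict` would let callers pass either type without remembering `.values`.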
https://github.com/rushter/MLAlgorithms/blob/master/mla/svm/kernerls.py
Shouldn't it be "kernels"? (remove the "r" before the final "l").
In https://github.com/rushter/MLAlgorithms/blob/master/mla/pca.py#L48, the variance ratio of a component is calculated as the square of its eigenvalue divided by the sum of all squared eigenvalues.
But according to https://stats.stackexchange.com/questions/31908/what-is-percentage-of-variance-in-pca, the proportion of variance of a component should be its eigenvalue divided by the sum of all eigenvalues. Any ideas?
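The two descriptions can agree if the values being squared are singular values of the centered data rather than eigenvalues of the covariance matrix. That is an assumption here, since the linked line is not reproduced on this page. A small numerical check on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with very different variance per column.
X = rng.normal(size=(50, 3)) * np.array([3.0, 1.0, 0.2])
X_centered = X - X.mean(axis=0)

# Route 1: squared singular values of the centered data matrix.
s = np.linalg.svd(X_centered, compute_uv=False)  # returned in descending order
ratio_svd = s**2 / np.sum(s**2)

# Route 2: eigenvalues of the sample covariance matrix.
eigvals = np.linalg.eigvalsh(np.cov(X_centered, rowvar=False))[::-1]  # descending
ratio_eig = eigvals / eigvals.sum()

same = np.allclose(ratio_svd, ratio_eig)
```

The squared singular values equal the covariance eigenvalues up to a factor of `n - 1`, which cancels in the ratio.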
(0.5 * self.C) * (w[1:] ** 2).mean(), but according to https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c, it should be their sum.
0.5 * (np.tanh(x) + 1), rather than the traditional 1/(1+np.exp(-x)), what's the motivation of doing so?
Can I work on this issue?
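On the tanh question: a likely motivation, not confirmed by the author, is numerical stability. `np.tanh` saturates at ±1, while `np.exp(-x)` overflows for large negative `x`. The exact identity is sigmoid(x) = 0.5 * (tanh(x / 2) + 1), so the quoted form without the `/ 2` equals sigmoid(2x); the snippet is cut off, so it is unclear whether `x` is scaled beforehand. A quick check:

```python
import numpy as np

def sigmoid_exp(x):
    """Textbook logistic function; exp(-x) overflows for very negative x."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_tanh(x):
    """Same function via the identity sigmoid(x) = 0.5 * (tanh(x / 2) + 1)."""
    return 0.5 * (np.tanh(0.5 * x) + 1.0)

x = np.linspace(-10.0, 10.0, 101)
identity_holds = np.allclose(sigmoid_exp(x), sigmoid_tanh(x))

# Without the division by 2, the tanh form is a sigmoid of 2x.
unscaled_is_sigmoid_2x = np.allclose(0.5 * (np.tanh(x) + 1.0), sigmoid_exp(2.0 * x))

# The tanh form stays finite and warning-free far outside exp's safe range.
with np.errstate(over="raise"):
    extreme = sigmoid_tanh(np.array([-1000.0, 1000.0]))
```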
I notice you use a 3rd party module to evaluate the gradient of your cost function in your GD routine. What was the reasoning behind this and why not implement it?
There is a problem with the call to ax.scatter() in kmeans.py:
TypeError: scatter() got multiple values for argument 'c'
The function _add_intercept(X) in BasicRegression returns np.concatenate([b, X], axis=1).
I can't figure out the reason for this, since I thought it should be np.concatenate([X, b], axis=1) to match the function _add_penalty.
Please check it.
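One reading that makes the two functions consistent: with the intercept column first, the bias weight sits at `w[0]`, so a penalty over `w[1:]` skips it. The sketch below assumes the penalty slices `w[1:]` (as the snippet quoted elsewhere on this page suggests) and uses a sum for simplicity; the function names are illustrative, not the project's code.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones, so the bias weight is w[0]."""
    b = np.ones((X.shape[0], 1))
    return np.concatenate([b, X], axis=1)

def l2_penalty(w, C=1.0):
    """L2 penalty over every weight except the bias at index 0 (assumed layout)."""
    return 0.5 * C * np.sum(w[1:] ** 2)

X = np.array([[2.0, 3.0],
              [4.0, 5.0]])
w = np.array([10.0, 1.0, 2.0])  # [bias, weight_1, weight_2]

X_b = add_intercept(X)
predictions = X_b @ w      # the bias is added to every row
penalty = l2_penalty(w)    # the large bias value does not contribute
```

With `[X, b]` instead, the bias would land at `w[-1]` and a `w[1:]` slice would penalize it while leaving the first feature weight unpenalized.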
do you have examples for
Factorization machines
like
How to run examples without installation
cd MLAlgorithms
python -m examples.linear_models
I suggest using this website for good meta tags.
I also suggest this one for meta tags.
I would like to implement random forest
Since the project relies on predefined libraries, I would like to add a module with a linear regression implementation written from scratch, so that users can understand how it works. It would be implemented as a class, so it can be used in a style similar to the sklearn library.
Clean work!
Some examples return the following:
'c' argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with 'x' & 'y'. Please use a 2-D array with a single row if you really want to specify the same RGB or RGBA value for all points.
'c' argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with 'x' & 'y'. Please use a 2-D array with a single row if you really want to specify the same RGB or RGBA value for all points.
'c' argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with 'x' & 'y'. Please use a 2-D array with a single row if you really want to specify the same RGB or RGBA value for all points.
'c' argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with 'x' & 'y'. Please use a 2-D array with a single row if you really want to specify the same RGB or RGBA value for all points.
<Figure size 640x480 with 1 Axes>
try the following in a notebook in google colab to reproduce:
from google.colab import drive
drive.mount('/content/drive')
!git clone https://github.com/rushter/MLAlgorithms "/content/drive/My Drive/cloned/MLAlgorithms/"
%cd "/content/drive/My Drive/cloned/MLAlgorithms/"
!python setup.py develop
!python -m examples.kmeans
Best regards,
D
I want to add some of the algorithms used in recommendation systems. I graduated recently, and for my final paper I wrote a hybrid recommender system using some simple algorithms. I would like to share them in your repo if you permit.
Currently, we are missing:
I cloned the repo and tried running the example in README.md without explicit installation:
python -m examples.linear_models
I get the following result (using Mac/Anaconda/Python 2.7x):
/anaconda/bin/python: Import by filename is not supported.
Any ideas as to why the standard example won't work?
Excuse me, but I think there is something wrong with the function f_entropy in ensemble.base: it uses scipy.stats.entropy to calculate the information entropy directly. That function expects an array that represents a distribution, but when the tree class calls f_entropy, it actually feeds an array of labels to scipy.stats.entropy, which leads to a different result.
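To illustrate the difference described above, here is a minimal sketch. `entropy_of_distribution` is a NumPy stand-in that normalizes its input to sum to 1 and treats it as probabilities, which is how `scipy.stats.entropy` handles its `pk` argument. `label_entropy` is a hypothetical corrected version that counts class frequencies first. Neither is the project's code.

```python
import numpy as np

def entropy_of_distribution(p):
    """Shannon entropy (natural log) of nonnegative weights, normalized to sum to 1."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log(p)))

def label_entropy(labels):
    """Entropy of the class distribution implied by an array of labels."""
    _, counts = np.unique(labels, return_counts=True)
    return entropy_of_distribution(counts)

labels = np.array([1, 1, 1, 2])

# Treats the label values themselves as weights: [1, 1, 1, 2] / 5.
raw = entropy_of_distribution(labels)

# Uses the class frequencies: three of class 1, one of class 2.
counted = label_entropy(labels)
```

The two values differ because the first depends on the numeric label values, so relabeling the classes would change it, while the second depends only on how often each class occurs.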
This is really great code, but how do I run it?
MLAlgorithms/mla/fm.py
Could you add an example, as you did for the other models?
MLAlgorithms/mla/neuralnet/optimizers.py
Line 13 in 3c8e16b
I think there's a problem here.
def predict(self, X=None):
    if not isinstance(X, np.ndarray):
        X = np.array(X)
    if self.X is not None or not self.fit_required:
        return self._predict(X)
    else:
        raise ValueError("You must call `fit` before `predict`")
The following code shows that self.X is not None is always true.
>>> X=None
>>> isinstance(X,np.ndarray)
False
>>> X=np.array(X)
>>> X
array(None, dtype=object)
>>> X is not None
True
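The session above shows that `np.array(None)` is a 0-d object array, so any `None` check made after the conversion can never fire. One possible fix is to validate before converting. The class below is a hypothetical stand-in for the base estimator, with a placeholder `_predict`; it is not the project's actual code.

```python
import numpy as np

class Estimator:
    """Hypothetical stand-in for the base class quoted above."""

    fit_required = True

    def __init__(self):
        self.X = None

    def fit(self, X):
        self.X = np.asarray(X)
        return self

    def _predict(self, X):
        # Placeholder model: row sums, just so the sketch runs end to end.
        return X.sum(axis=1)

    def predict(self, X=None):
        # Check for missing input BEFORE np.asarray turns None into an array.
        if X is None:
            raise ValueError("`predict` needs input data")
        # Same condition as the original, written as the failure case.
        if self.fit_required and self.X is None:
            raise ValueError("You must call `fit` before `predict`")
        return self._predict(np.asarray(X))

model = Estimator()
try:
    model.predict([[1, 2]])
    unfitted_error = None
except ValueError as exc:
    unfitted_error = str(exc)

fitted_output = model.fit([[1, 2], [3, 4]]).predict([[1, 2], [3, 4]])
```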