mlfromscratch's Introduction

Hi I'm Patrick

I'm a Software Engineer, YouTuber, and Developer Advocate

  • 250K+ subscribers on YouTube
  • I create free educational content about Python and Machine Learning
  • On my channel you find FREE courses about Python, PyTorch, TensorFlow, and much more
  • I also post articles on my Website
  • If you want to connect with more Python and ML enthusiasts, you can join my Discord server
  • For most tutorials the corresponding code is here on GitHub

If you enjoy my content, I would be very happy if you subscribed to my channel 😊


Popular YouTube Videos 📺

➡️ More videos...




Support Me!

You can show support by starring my repos, liking and sharing my videos, and subscribing to my channel.

If you really, really, really enjoy my work, you can also support me on Patreon.

Thank you all so much 🙏

mlfromscratch's People

Contributors

dependabot[bot], janasunrise, patrickloeber

mlfromscratch's Issues

euclidean distance should be sqrt((x1-x2)**2 + (y1-y2)**2)

The Euclidean distance should be sqrt((x1-x2)**2 + (y1-y2)**2),
or, more generally:
euclid_dist = np.linalg.norm(np.array(feature) - np.array(predict))

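import warnings
from collections import Counter

import numpy as np


def knn(data, predict, k=3):
    # data maps each group label to a list of feature vectors
    if len(data) >= k:
        warnings.warn("the number of groups should be less than k")
    distances = []
    for group in data:
        for feature in data[group]:
            # Euclidean distance between a training point and the query point
            euclid = np.linalg.norm(np.array(feature) - np.array(predict))
            distances.append([euclid, group])
    # majority vote among the k nearest neighbors
    vote = [i[1] for i in sorted(distances)[:k]]
    vote_result = Counter(vote).most_common(1)[0][0]
    return vote_result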
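
As a quick check of the two formulations above, the following sketch compares the explicit square-root formula with np.linalg.norm on two made-up points. The function names and sample values are illustrative assumptions, not taken from the repository.

```python
import numpy as np


def euclidean_explicit(p, q):
    # Square root of the sum of squared coordinate differences
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(np.sum((p - q) ** 2))


def euclidean_norm(p, q):
    # Same distance, computed as the 2-norm of the difference vector
    return np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float))


# Illustrative points forming a 3-4-5 right triangle
a, b = [1.0, 2.0], [4.0, 6.0]
print(euclidean_explicit(a, b), euclidean_norm(a, b))  # both 5.0
```

Both forms generalize to any number of dimensions, so the norm-based version is mainly a matter of brevity.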

Explanation of `get_hyperplane_value`

Hi, I find your tutorials on SVM very helpful but I do not understand the get_hyperplane_value method in svm_tests.py. May I have an explanation of it? Thank you!

IndexError: index 6 is out of bounds for axis 0 with size 6

Hi!

I tried your naive Bayes classifier on
https://archive.ics.uci.edu/ml/machine-learning-databases/glass/glass.data

There, self._classes contains labels like [1, 2, 3, 4, 5, 6], unlike iris, where they are [0, 1, 2], so n_classes = len(self._classes) is 6.

In the loop

    for c in self._classes:
        X_c = X[y == c]
        self._mean[c, :] = X_c.mean(axis=0)
        self._var[c, :] = X_c.var(axis=0)
        self._priors[c] = X_c.shape[0] / float(n_samples)

the code will try to access self._mean[6, :], which is out of bounds.
Shouldn't it be

    for index, c in enumerate(self._classes):

with index instead of c in the calculations?
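
A minimal sketch of the enumerate-based loop proposed above, written as a standalone function rather than the repository's class. The function name fit_gaussian_nb and the toy data are assumptions made for illustration.

```python
import numpy as np


def fit_gaussian_nb(X, y):
    """Per-class mean, variance and prior, stored by position rather than by label."""
    n_samples, n_features = X.shape
    classes = np.unique(y)
    n_classes = len(classes)
    mean = np.zeros((n_classes, n_features), dtype=np.float64)
    var = np.zeros((n_classes, n_features), dtype=np.float64)
    priors = np.zeros(n_classes, dtype=np.float64)
    for idx, c in enumerate(classes):
        # idx runs over 0..n_classes-1 even when the labels themselves do not
        X_c = X[y == c]
        mean[idx, :] = X_c.mean(axis=0)
        var[idx, :] = X_c.var(axis=0)
        priors[idx] = X_c.shape[0] / float(n_samples)
    return classes, mean, var, priors


# Toy data whose labels (1 and 6) are not valid row indices for a 2-row array
X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]])
y = np.array([1, 1, 6, 6])
classes, mean, var, priors = fit_gaussian_nb(X, y)
print(classes, priors)
```

Prediction would then need the same positional indexing, mapping the winning position back to a label through the classes array.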

Clustering: predict a single tuple

Hey guys, I understand how the clustering works, but how do I save this model and predict on a new tuple? I want to put this model into production, hence the hassle.
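
One possible approach, sketched under assumptions: for k-means, the fitted centroids are all that is needed to assign a new sample, so it may be enough to persist them and look up the nearest one. The helper predict_cluster and the centroid values below are hypothetical and not part of the repository.

```python
import io

import numpy as np


def predict_cluster(sample, centroids):
    # Index of the centroid closest to the sample (Euclidean distance)
    distances = np.linalg.norm(centroids - np.asarray(sample, dtype=float), axis=1)
    return int(np.argmin(distances))


# Hypothetical centroids, standing in for those of a fitted model
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

# Persist and reload; an in-memory buffer stands in for a file on disk
buffer = io.BytesIO()
np.save(buffer, centroids)
buffer.seek(0)
restored = np.load(buffer)

print(predict_cluster([9.0, 8.5], restored))  # 1
```

In a real deployment the buffer would be replaced by a file path, and any preprocessing applied before fitting would have to be applied to the new sample as well.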

Project dependencies may have API risk issues

Hi, in MLfromscratch, inappropriate dependency version constraints can cause risks.

Below are the dependencies and version constraints that the project is using:

numpy==1.22.0
scikit-learn==0.24.2
matplotlib==3.4.2
pandas==1.2.4

The == constraint risks dependency conflicts because it pins each dependency too strictly.
Constraints with no upper bound (or *) risk missing-API errors because the latest version of a dependency may have removed some APIs.

After further analysis of this project:
The version constraint for numpy can be changed to >=1.8.0,<=1.23.0rc3.
The version constraint for matplotlib can be changed to >=1.3.0,<=3.0.3.
The version constraint for pandas can be changed to >=0.4.0,<=1.2.5.

These changes reduce dependency conflicts as much as possible
while allowing the latest versions that do not cause errors in the project.

The current project calls all of the following methods.

Methods called from numpy:
numpy.linalg.inv
numpy.linalg.eig
Methods called from matplotlib:
matplotlib.colors.ListedColormap
Methods called from pandas:
pandas.read_csv
All methods called:
numpy.argwhere
self._grow_tree
self._best_criteria
self.plot
numpy.unique
numpy.amin
LDA.transform
RandomForest.predict
numpy.mean
range
numpy.exp
numpy.argsort
numpy.dot
sklearn.datasets.make_blobs
df.fillna.fillna
self._create_clusters
self._traverse_tree
numpy.log
self._approximation
numpy.sign
matplotlib.pyplot.figure
self._is_converged
numpy.linalg.eig
numpy.where
NaiveBayes
matplotlib.pyplot.show
numpy.sum
DecisionTree
mean_overall.mean_c.reshape.dot
SVM
matplotlib.colors.ListedColormap
SW.np.linalg.inv.dot
numpy.empty
csv.reader
centroid_idx.clusters.append
most_common_label
numpy.argmax
sklearn.datasets.make_classification
ax.scatter
matplotlib.pyplot.cm.get_cmap
matplotlib.pyplot.figure.add_subplot
KNN.predict
numpy.genfromtxt
bootstrap_sample
Node
LinearRegression
self._predict
fig.add_subplot.plot
Adaboost.fit
LinearRegression.predict
Perceptron.predict
enumerate
list
SVM.fit
Adaboost.predict
KMeans.predict
node.is_leaf_node
numpy.sqrt
self.trees.append
sum
matplotlib.pyplot.plot
numpy.swapaxes
self._pdf
DecisionTree.predict
numpy.random.seed
self._information_gain
matplotlib.pyplot.xlabel
KNN.fit
numpy.amax
DecisionStump
Perceptron
len
posteriors.append
numpy.log2
numpy.argmin
numpy.linalg.inv
self.clfs.append
self._get_cluster_labels
Perceptron.fit
numpy.cov
abs
accuracy
LogisticRegression.predict
numpy.array
mean_c.X_c.T.dot
visualize_svm
numpy.bincount
decision_tree.DecisionTree.fit
float
entropy
RandomForest.fit
sklearn.datasets.make_regression
mean_overall.mean_c.reshape
sklearn.datasets.load_iris
LinearRegression.fit
mean_squared_error
NaiveBayes.fit
KMeans.plot
PCA.transform
k_neighbor_labels.Counter.most_common
numpy.loadtxt
cmap
self._sigmoid
RandomForest
decision_tree.DecisionTree
numpy.zeros
sklearn.model_selection.train_test_split
self._split
pandas.read_csv
X_c.mean
X_c.var
self._get_centroids
df.fillna.to_numpy
LDA
fig.add_subplot.set_ylim
split_thresh.X_column.np.argwhere.flatten
collections.Counter.most_common
numpy.full
euclidean_distance
decision_tree.DecisionTree.predict
min
matplotlib.pyplot.scatter
self._most_common_label
print
get_hyperplane_value
matplotlib.pyplot.ylabel
PCA
Adaboost
numpy.corrcoef
self.activation_func
matplotlib.pyplot.subplots
numpy.ones
r2_score
matplotlib.pyplot.get_cmap
LogisticRegression.fit
KNN
open
sklearn.datasets.load_breast_cancer
NaiveBayes.predict
numpy.random.choice
DecisionTree.fit
self._closest_centroid
matplotlib.pyplot.colorbar
collections.Counter
KMeans
LDA.fit
PCA.fit
LogisticRegression

@developer
Could you please help me check this issue?
May I open a pull request to fix it?
Thank you very much.

AdaBoost suggestions

Thanks so much for putting this material together!

Looking at Lucky 13: AdaBoost. A few items are a bit unclear for us newbies.

First, in the fit() method, there is just a single pass over the data, X, while the original Freund & Schapire (1995) paper suggests looping for T iterations, refitting the classifiers on each pass based on the evolving weights. It looks like the version here is based on Zhu et al. (2009). It might be worth a few words to explain the source of the algorithm, and also why this version needs to make only one pass over the samples.

Second, just from a learning perspective, it would be great to provide a data set that mimics the illustrations in the video, so we can verify that things work as expected. For extra credit, use Matplotlib to create the decision boundary visualization from the video.

Third, it might be worthwhile pointing out refinements a real design would need. For example, here are the decision stumps created from the test code. Notice that feature 23 is used twice: same polarity, just a different threshold. Is this a limitation of this simple example, or actually a useful quirk of AdaBoost?

0: {'polarity': -1, 'feature_idx': 27, 'threshold': 0.1424, 'alpha': 1.2271759901553476}
1: {'polarity': -1, 'feature_idx': 23, 'threshold': 728.3, 'alpha': 0.9273811402788633}
2: {'polarity': -1, 'feature_idx': 1, 'threshold': 19.98, 'alpha': 0.7916733128875748}
3: {'polarity': -1, 'feature_idx': 23, 'threshold': 876.5, 'alpha': 0.6099992009200025}
4: {'polarity': -1, 'feature_idx': 26, 'threshold': 0.2177, 'alpha': 0.5775069918855832}
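
For reference, the round-by-round scheme described in the first point can be sketched as below. This is a minimal illustration of boosting decision stumps with reweighting over a fixed number of rounds, not the repository's implementation; all function names, the tiny data set, and the 1e-10 smoothing constant are assumptions.

```python
import numpy as np


def stump_predict(X, feature_idx, threshold, polarity):
    # Predict -1 on one side of the threshold and +1 on the other
    return np.where(polarity * X[:, feature_idx] < polarity * threshold, -1, 1)


def fit_stump(X, y, w):
    # Exhaustively search for the stump with the lowest weighted error
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                err = np.sum(w[stump_predict(X, j, t, polarity) != y])
                if err < best[3]:
                    best = (j, t, polarity, err)
    return best


def adaboost_fit(X, y, n_rounds=5):
    n_samples = X.shape[0]
    w = np.full(n_samples, 1.0 / n_samples)  # start from uniform sample weights
    stumps = []
    for _ in range(n_rounds):
        j, t, polarity, err = fit_stump(X, y, w)
        # Stump weight; the small constant avoids division by zero
        alpha = 0.5 * np.log((1.0 - err + 1e-10) / (err + 1e-10))
        pred = stump_predict(X, j, t, polarity)
        # Increase the weight of misclassified samples, then renormalize
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        stumps.append((j, t, polarity, alpha))
    return stumps


def adaboost_predict(X, stumps):
    # Sign of the alpha-weighted vote over all stumps
    scores = sum(alpha * stump_predict(X, j, t, p) for j, t, p, alpha in stumps)
    return np.sign(scores)


# Tiny illustrative data set with labels in {-1, +1}
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([-1, -1, 1, 1])
stumps = adaboost_fit(X, y, n_rounds=3)
print(adaboost_predict(X, stumps))
```

Because the sample weights change after every round, nothing in this scheme prevents the same feature from being selected again with a different threshold.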

Repository has no License

Thanks for the wonderful work on this repository! Unfortunately your code does not have a license, could you please add one so it's clear whether we are allowed to reuse it for teaching students etc.?

Thank you!

Multiclass SVM classifier

Hello,
Could you please provide an example implementation of a multiclass SVM classifier from scratch?
Thanks!

Regression tree

Hey,
Could you please provide an implementation of a regression tree from scratch?
Thanks!
