cthorey / cs231

My corrections for the Stanford class assignments CS231n - Convolutional Neural Networks for Visual Recognition

Shell 0.01% Python 1.73% Jupyter Notebook 98.24% TeX 0.01% Emacs Lisp 0.01%


cs231's Issues

Can't view the svm.ipynb file in assignment one

I can't view the svm.ipynb file on GitHub in assignment one. When I open it in Jupyter Notebook, it gives this error:

Unreadable Notebook: C:\Users\Elixir\Documents\Github\CS231\assignment1\svm.ipynb TypeError("argument of type 'NoneType' is not iterable")

Shouldn't the Batch Norm derivatives be normalized by batch_size?

Hey @cthorey ,

I recently went through your batch normalization tutorial here: What does gradient flowing through ... . First off, thank you so much for such an amazing post about batch normalization. I was implementing batch normalization in a fully connected DNN but could find only a few resources that give code as well as derivations, like your blog does. Even though my implementation was successful, my derivations for the affine transformations were slightly off, and your post cleared up a few bugs I had.

I do have one question, though, about the derivatives of beta and gamma here: CS231/assignment2/cs231n/layers.py. I was wondering whether

the dbeta values should be normalized by the training batch size, like so:

dbeta = np.sum(dout, axis=0) / batch_size

and similarly for dgamma:

dgamma = np.sum(va2 * dva3, axis=0) / batch_size

In my own implementation I was using the full training set (a very naive implementation), and once I had found the derivatives of gamma and beta, I always divided them by the number of rows in the training set. The results I got were virtually identical to the same architecture built with Keras:

[Attached plots: Beta_Dist, Gamma_Dist]

I looked at the CS231n notes and several other implementations of batch norm online, and none of them divide the gradients of gamma and beta by the batch_size. Could you please share your thoughts on why that is the case?

I feel they should be divided in order to normalize the gradients. I also tried not dividing the gradients of my beta and gamma, and as expected they exploded and diverged from the optimal values (my distributions for beta and gamma and Keras' were way off) ... I understand that if I use the entire training set I almost always have to divide by the training set size, but I feel that should also be the case when using mini-batches. Curious to know your thoughts :)
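
For concreteness, here is a minimal standalone sketch of the two conventions being compared (my own illustration with made-up shapes, not code from the repository; x_hat and dout stand in for va2 and dva3 above):

import numpy as np

N, D = 4, 3                     # hypothetical batch size and feature dimension
dout = np.random.randn(N, D)    # upstream gradient reaching the batch norm layer
x_hat = np.random.randn(N, D)   # stand-in for the normalized activations

# Convention in the assignment code and the CS231n notes: sum over the batch axis only.
dbeta_plain = np.sum(dout, axis=0)
dgamma_plain = np.sum(x_hat * dout, axis=0)

# Convention proposed in this issue: additionally divide by the batch size.
dbeta_scaled = dbeta_plain / N
dgamma_scaled = dgamma_plain / N

The two only differ by the constant factor 1/N.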

Thanks again for your time! :D

assignment1 KNN classifier L2 distance

The functions
compute_distances_two_loops
compute_distances_one_loop
compute_distances_no_loops
are all required to compute the L2 distance, so the dists matrix should be wrapped in np.sqrt in all three cases (see the sketch below).
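
A minimal sketch of the suggested fix, assuming dists already holds the squared distances accumulated by any of the three functions:

# Final step in each compute_distances_* function (sketch, not the repo's exact code):
dists = np.sqrt(dists)   # convert squared distances to true L2 distances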

Non-linearity for h0 in rnn.py

Please correct me if I am wrong:
the path from the image features to h1 becomes a single affine transformation without a non-linearity in between. There should be a non-linearity on h0 before it is passed to the RNN cell, as sketched below.
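
A standalone sketch of what this issue proposes, with made-up shapes (W_proj and b_proj are assumed to be the feature-to-hidden projection parameters; this is an illustration, not the repository code):

import numpy as np

N, F, H = 2, 5, 4                  # hypothetical batch size, feature dim, hidden dim
features = np.random.randn(N, F)   # image features
W_proj = np.random.randn(F, H)     # feature-to-hidden projection weights
b_proj = np.zeros(H)

# As described above: h0 is a plain affine projection of the features.
h0_affine = features.dot(W_proj) + b_proj

# What the issue proposes: apply a non-linearity before h0 enters the RNN cell.
h0_proposed = np.tanh(h0_affine)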

lstm_step_backward line #343: is += a mistake or intended?

Your lstm_step_backward, line #343, is written as

# Backprop into step 5
dnext_c += o * (1 - np.tanh(next_c)**2) * dnext_h

which means,
dnext_c = dnext_c + o * (1 - np.tanh(next_c)**2) * dnext_h

But from your lstm_step_forward, line #304:
next_h = o * np.tanh(next_c)

I think the dnext_c implied by line #304 should just be
dnext_c = o * (1 - np.tanh(next_c)**2) * dnext_h

If the += is intentional, what am I missing?
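
For context only, and without settling whether the += here was intended (which is exactly the question above), this is a standalone sketch of the gradient-accumulation pattern that += usually expresses, when a variable receives gradient both directly from upstream and indirectly through next_h = o * np.tanh(next_c):

import numpy as np

# Made-up shapes for illustration; this is not the repository code.
next_c = np.random.randn(3, 4)
o = np.random.rand(3, 4)
dnext_h = np.random.randn(3, 4)   # upstream gradient w.r.t. next_h
dnext_c = np.random.randn(3, 4)   # upstream gradient w.r.t. next_c itself

# Contribution flowing into next_c through next_h = o * np.tanh(next_c):
dnext_c_via_h = o * (1 - np.tanh(next_c)**2) * dnext_h

# If next_c influences the loss along both paths, the total gradient is the sum
# of the two contributions, which is what an in-place += computes.
dnext_c += dnext_c_via_h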

Fully Connected Net gradient issue

When the fully connected net has more than 3 layers, the backprop gradient and the numerical gradient show a significant difference. The issue can be reproduced in Dropout.ipynb (in the cell "Fully-connected nets with Dropout").

# Reproduction snippet from Dropout.ipynb; it assumes the notebook's usual imports
# (numpy as np, FullyConnectedNet, eval_numerical_gradient) and its rel_error helper.
N, D, H1, H2, C = 2, 15, 20, 30, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=(N,))
s = np.random.randint(1)
for dropout in [0, 0.25, 1.0]:
  print 'Running check with dropout = ', dropout
  model = FullyConnectedNet([H1, 10,10,10,10,10,H2], input_dim=D, num_classes=C,
                            weight_scale=5e-2, dtype=np.float64,
                            dropout=dropout, seed=s)

  loss, grads = model.loss(X, y)
  print 'Initial loss: ', loss

  for name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)
    print '%s relative error: %.2e' % (name, rel_error(grad_num, grads[name]))
  print

The output of this would be:

Running check with dropout =  0
Initial loss:  2.30258505897
W1 relative error: 2.41e-03
W2 relative error: 1.21e-03
W3 relative error: 1.60e-03
W4 relative error: 2.15e-03
W5 relative error: 1.75e-03
W6 relative error: 2.10e-03
W7 relative error: 1.89e-03
W8 relative error: 1.37e-03
b1 relative error: 1.76e-03
b2 relative error: 1.69e-02
b3 relative error: 6.03e-01
b4 relative error: 1.00e+00
b5 relative error: 1.00e+00
b6 relative error: 1.00e+00
b7 relative error: 1.00e+00
b8 relative error: 7.83e-11

Running check with dropout =  0.25
We use dropout with p =0.250000
Initial loss:  2.30258509299
W1 relative error: 0.00e+00
W2 relative error: 0.00e+00
W3 relative error: 0.00e+00
W4 relative error: 0.00e+00
W5 relative error: 0.00e+00
W6 relative error: 0.00e+00
W7 relative error: 0.00e+00
W8 relative error: 0.00e+00
b1 relative error: 0.00e+00
b2 relative error: 0.00e+00
b3 relative error: 0.00e+00
b4 relative error: 0.00e+00
b5 relative error: 1.00e+00
b6 relative error: 1.00e+00
b7 relative error: 1.00e+00
b8 relative error: 6.99e-11

Running check with dropout =  1.0
We use dropout with p =1.000000
Initial loss:  2.30258510213
W1 relative error: 3.55e-03
W2 relative error: 2.40e-03
W3 relative error: 2.44e-03
W4 relative error: 1.94e-03
W5 relative error: 1.98e-03
W6 relative error: 1.89e-03
W7 relative error: 2.13e-03
W8 relative error: 2.68e-03
b1 relative error: 2.36e-03
b2 relative error: 6.30e-04
b3 relative error: 7.33e-02
b4 relative error: 2.98e-01
b5 relative error: 1.00e+00
b6 relative error: 1.00e+00
b7 relative error: 1.00e+00
b8 relative error: 1.44e-10

I have tried several random seeds, and the bias gradient on the last layer is always correct, while the bias errors become extremely large starting from the middle hidden layers. The errors on the W gradients, however, seem fine all the time. I first noticed this odd behaviour in my own implementation, and it seems the same thing happens in yours. Any ideas?
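
For reference, the rel_error helper used in the snippet above is typically defined near the top of the assignment notebooks roughly as follows (a sketch from memory; check the notebook for the exact version used here):

import numpy as np

def rel_error(x, y):
    """Maximum relative error between two arrays."""
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))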

assignment1 k_nearest_neighbor.py memory overflow

In the function compute_distances_no_loops(self, X):

T = np.sum(X**2,axis = 1)
F = np.sum(self.X_train**2,axis = 1).T
F = np.tile(F,(500,5000))
FT = X.dot(self.X_train.T)
print T.shape,F.shape,FT.shape,X.shape,self.X_train.shape
dists = T+F-2*FT

The line

F = np.tile(F,(500,5000))

creates a matrix that contains 500 * 5000 * 5000 elements. Maybe the code should be modified like this:

T = np.reshape(np.sum(X**2,axis = 1),(num_test,1))
F = np.sum(self.X_train**2,axis = 1).T
F = np.tile(F,(num_test,1))
FT = X.dot(self.X_train.T)
print T.shape,F.shape,FT.shape,X.shape,self.X_train.shape
dists = T+F-2*FT
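
For what it's worth, a broadcasting-based sketch (my own illustration, not the repository code) avoids the tile altogether: NumPy broadcasts the (num_test, 1) column of test norms and the (num_train,) row of train norms against the (num_test, num_train) cross-term matrix, so no intermediate array is larger than the final distance matrix. The np.sqrt at the end also covers the L2 point raised in the earlier issue.

# Inside compute_distances_no_loops (sketch): X is (num_test, D), self.X_train is (num_train, D).
test_sq = np.sum(X**2, axis=1).reshape(-1, 1)     # (num_test, 1) squared test norms
train_sq = np.sum(self.X_train**2, axis=1)        # (num_train,) squared train norms
cross = X.dot(self.X_train.T)                     # (num_test, num_train) cross terms
dists = np.sqrt(test_sq + train_sq - 2 * cross)   # broadcasts to (num_test, num_train)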
