cthorey / cs231
My corrections for the Stanford class assignments CS231n - Convolutional Neural Networks for Visual Recognition
To predict the labels, simply use:
y_predict[i] = np.argmax(np.bincount(closest_y))
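A minimal self-contained sketch of this majority vote (the `closest_y` values here are made up for illustration): `np.bincount` counts how often each label occurs among the k nearest neighbors, and `np.argmax` picks the most frequent one.

```python
import numpy as np

# closest_y holds the labels of the k nearest training points
# for one test example (illustrative values)
closest_y = np.array([2, 1, 2, 2, 0])

counts = np.bincount(closest_y)   # occurrences of each label: [1, 1, 3]
y_pred = np.argmax(counts)        # most frequent label wins the vote
```

Note that `np.argmax` breaks ties by returning the smallest label index, which matches the assignment's convention of breaking ties in favor of the smaller label.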
I can't view the svm.ipynb file from assignment one on GitHub. When opening it in Jupyter Notebook, it gives this error:
Unreadable Notebook: C:\Users\Elixir\Documents\Github\CS231\assignment1\svm.ipynb TypeError("argument of type 'NoneType' is not iterable")
Hey @cthorey ,
I recently went through your batch normalization tutorial here: What does gradient flowing through ... . First off, thank you so much for such an amazing post about batch normalization. I was implementing batch normalization in a FC-DNN but could find only a few resources that give code as well as derivations the way your blog does. Even though my implementation was successful, my derivations for the affine transformations were slightly off, and your post cleared up a few bugs I had.
I do have one question about the derivatives of beta and gamma here: CS231/assignment2/cs231n/layers.py. I was wondering whether the
dbeta values should be normalized by the training batch size, like so:
dbeta = np.sum(dout, axis=0) / batch_size
and similarly for dgamma:
dgamma = np.sum(va2 * dva3, axis=0) / batch_size
In my implementation I was using the full training set (a very naive implementation), and once I found the derivatives of gamma and beta, I always divided them by the number of rows in the training set. The results I got were nearly identical to the same architecture built in Keras.
I looked at the CS231 notes and several other implementations of batch norm online, and none of them divide the gradients of gamma and beta by the batch_size. Could you please give your thoughts on why that should be the case?
I feel they should be divided in order to normalize the gradients. I also tried not dividing the gradients of my beta and gamma, and as expected they exploded and diverged from the optimum values (my distributions for beta and gamma and Keras' were way off). I understand that if I use the entire training set, then I almost always have to divide by the training-set size, but I feel the same should hold when using mini-batches as well. Curious to know your thoughts :)
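One way to settle this numerically (a toy check, not the assignment code; the loss and shapes here are made up for illustration): when the loss is already a *mean* over the batch, the upstream gradient `dout` carries the 1/N factor itself, so `dbeta = np.sum(dout, axis=0)` matches the numerical gradient with no extra division.

```python
import numpy as np

np.random.seed(0)
N, D = 8, 3
x = np.random.randn(N, D)
gamma = np.random.randn(D)
beta = np.random.randn(D)

def loss(beta_):
    # batchnorm forward followed by a mean-over-batch loss
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    xhat = (x - mu) / np.sqrt(var + 1e-5)
    out = gamma * xhat + beta_
    return out.mean()  # mean over all N*D entries

# Because the loss averages over N*D entries, every entry of dout is 1/(N*D);
# dbeta is then just the column sum of dout -- no further division by N.
dout = np.full((N, D), 1.0 / (N * D))
dbeta = np.sum(dout, axis=0)

# centered-difference numerical gradient of the loss w.r.t. beta
h = 1e-6
dbeta_num = np.zeros(D)
for j in range(D):
    bp = beta.copy(); bp[j] += h
    bm = beta.copy(); bm[j] -= h
    dbeta_num[j] = (loss(bp) - loss(bm)) / (2 * h)
```

If the 1/N division were applied again inside the backward pass, the analytic gradient would be N times too small relative to the numerical one.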
Thanks for your time again ! :D
The functions
compute_distances_two_loops
compute_distances_one_loop
compute_distances_no_loops
are all required to compute the L2 distance, so the dists matrix should be wrapped in np.sqrt
in all three cases.
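A tiny self-contained illustration of the point (toy data, not the assignment arrays): the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y yields *squared* distances, so the final matrix must pass through np.sqrt to be a true L2 distance.

```python
import numpy as np

X = np.array([[0.0, 0.0], [3.0, 4.0]])   # two "test" points
X_train = np.array([[0.0, 0.0]])         # one "train" point

# squared L2 distances via the standard expansion
sq = (np.sum(X**2, axis=1, keepdims=True)
      + np.sum(X_train**2, axis=1)
      - 2 * X.dot(X_train.T))

dists = np.sqrt(sq)   # the missing np.sqrt the issue is pointing at
# dists[1, 0] is 5.0 (the 3-4-5 right triangle)
```

Skipping the sqrt does not change which neighbor is nearest (sqrt is monotonic), but the dists matrix the assignment checks against is defined as the actual Euclidean distance.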
If you change it to key you will see significantly slower convergence. It looks like this typo comes from the original class, though.
CS231/assignment2/cs231n/classifiers/cnn.py
Line 143 in 11f0521
Should be:
bn_param['mode'] = mode
Please correct me if I am wrong:
Features to h1 becomes one single affine transformation without a non-linearity in between. There should be a non-linearity on h0 before it is passed to the RNN cell.
Your lstm_step_backward line# 343 is written as
# Backprop into step 5
dnext_c += o * (1 - np.tanh(next_c)**2) * dnext_h
which means,
dnext_c = dnext_c + o * (1 - np.tanh(next_c)**2) * dnext_h
But, from your lstm_step_forward line# 304:
next_h = o * np.tanh(next_c)
I think dnext_c from line# 304 is just
dnext_c = o * (1 - np.tanh(next_c)**2) * dnext_h
If this is intentional, what am I missing?
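For what it's worth, the `+=` can be checked numerically with a toy example (illustrative code, not the assignment's): in an LSTM step, next_c feeds two downstream paths, (1) directly into the next time step's cell state and (2) through tanh into next_h, so its total gradient is the sum of both contributions.

```python
import numpy as np

np.random.seed(0)
next_c = np.random.randn(4)
o = np.random.randn(4)
a = np.random.randn(4)  # weights of a toy scalar loss on next_h
b = np.random.randn(4)  # weights of a toy scalar loss on the direct cell path

def toy_loss(c):
    next_h = o * np.tanh(c)                     # path (2): through tanh
    return np.sum(a * next_h) + np.sum(b * c)   # path (1): c used directly too

# analytic gradient: upstream dnext_c from path (1), then accumulate path (2)
dnext_h = a
dnext_c = b.copy()                                   # arrives via path (1)
dnext_c += o * (1 - np.tanh(next_c) ** 2) * dnext_h  # the `+=` in question

# centered-difference numerical gradient agrees with the accumulated one
h = 1e-6
dnum = np.zeros(4)
for i in range(4):
    cp = next_c.copy(); cp[i] += h
    cm = next_c.copy(); cm[i] -= h
    dnum[i] = (toy_loss(cp) - toy_loss(cm)) / (2 * h)
```

If `dnext_c` were overwritten with `=` instead of accumulated with `+=`, the gradient arriving from path (1) would be silently dropped and the numerical check would fail.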
When a fully-connected net has more than 3 layers, the backprop gradient and the numerical gradient show a significant difference. This issue can be reproduced in Dropout.ipynb (in the cell "Fully-connected nets with Dropout"):
N, D, H1, H2, C = 2, 15, 20, 30, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=(N,))
s = np.random.randint(1)  # note: randint(1) always returns 0
for dropout in [0, 0.25, 1.0]:
    print('Running check with dropout = ', dropout)
    model = FullyConnectedNet([H1, 10, 10, 10, 10, 10, H2], input_dim=D, num_classes=C,
                              weight_scale=5e-2, dtype=np.float64,
                              dropout=dropout, seed=s)
    loss, grads = model.loss(X, y)
    print('Initial loss: ', loss)
    for name in sorted(grads):
        f = lambda _: model.loss(X, y)[0]
        grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)
        print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))
    print()
The output of this would be:
Running check with dropout = 0
Initial loss: 2.30258505897
W1 relative error: 2.41e-03
W2 relative error: 1.21e-03
W3 relative error: 1.60e-03
W4 relative error: 2.15e-03
W5 relative error: 1.75e-03
W6 relative error: 2.10e-03
W7 relative error: 1.89e-03
W8 relative error: 1.37e-03
b1 relative error: 1.76e-03
b2 relative error: 1.69e-02
b3 relative error: 6.03e-01
b4 relative error: 1.00e+00
b5 relative error: 1.00e+00
b6 relative error: 1.00e+00
b7 relative error: 1.00e+00
b8 relative error: 7.83e-11
Running check with dropout = 0.25
We use dropout with p =0.250000
Initial loss: 2.30258509299
W1 relative error: 0.00e+00
W2 relative error: 0.00e+00
W3 relative error: 0.00e+00
W4 relative error: 0.00e+00
W5 relative error: 0.00e+00
W6 relative error: 0.00e+00
W7 relative error: 0.00e+00
W8 relative error: 0.00e+00
b1 relative error: 0.00e+00
b2 relative error: 0.00e+00
b3 relative error: 0.00e+00
b4 relative error: 0.00e+00
b5 relative error: 1.00e+00
b6 relative error: 1.00e+00
b7 relative error: 1.00e+00
b8 relative error: 6.99e-11
Running check with dropout = 1.0
We use dropout with p =1.000000
Initial loss: 2.30258510213
W1 relative error: 3.55e-03
W2 relative error: 2.40e-03
W3 relative error: 2.44e-03
W4 relative error: 1.94e-03
W5 relative error: 1.98e-03
W6 relative error: 1.89e-03
W7 relative error: 2.13e-03
W8 relative error: 2.68e-03
b1 relative error: 2.36e-03
b2 relative error: 6.30e-04
b3 relative error: 7.33e-02
b4 relative error: 2.98e-01
b5 relative error: 1.00e+00
b6 relative error: 1.00e+00
b7 relative error: 1.00e+00
b8 relative error: 1.44e-10
I have tried several random seeds, and the bias gradient on the last layer is always correct, while the bias errors become extremely large from the last hidden layers onward. However, the errors on the W gradients seem correct all the time. I first noticed this odd behavior in my own implementation, and it seems the same thing occurs in yours. Any ideas?
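For anyone debugging this, it can help to see what the checker actually computes. A minimal sketch of a centered-difference gradient check in the spirit of eval_numerical_gradient (the names here are illustrative, not the assignment's exact code):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered-difference numerical gradient of scalar function f at x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h          # evaluate f(x + h) at this coordinate
        fxph = f(x)
        x[ix] = old - h          # evaluate f(x - h)
        fxmh = f(x)
        x[ix] = old              # restore the original value
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

def rel_error(a, b):
    """Max relative error, guarded against division by zero."""
    return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))
```

One caveat worth checking in cases like the one above: if the gradient of a parameter is analytically zero (e.g. a bias whose ReLU units are all dead for the tiny N=2 batch), both analytic and numerical values sit near floating-point noise, and the relative error can read as 1.00e+00 even when nothing is wrong.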
In the function compute_distances_no_loops(self, X):
T = np.sum(X**2, axis=1)
F = np.sum(self.X_train**2, axis=1).T
F = np.tile(F, (500, 5000))
FT = X.dot(self.X_train.T)
print(T.shape, F.shape, FT.shape, X.shape, self.X_train.shape)
dists = T + F - 2*FT
the line
F = np.tile(F, (500, 5000))
creates a matrix containing 500 * 5000 * 5000 elements.
Perhaps the code should be modified like this:
T = np.reshape(np.sum(X**2, axis=1), (num_test, 1))
F = np.sum(self.X_train**2, axis=1).T
F = np.tile(F, (num_test, 1))
FT = X.dot(self.X_train.T)
print(T.shape, F.shape, FT.shape, X.shape, self.X_train.shape)
dists = T + F - 2*FT
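In fact, with T reshaped to a column vector, NumPy broadcasting makes the np.tile call unnecessary altogether. A self-contained sketch of the fully vectorized version (function name is illustrative):

```python
import numpy as np

def l2_dists(X, X_train):
    """All-pairs Euclidean distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y."""
    test_sq = np.sum(X ** 2, axis=1, keepdims=True)   # (num_test, 1)
    train_sq = np.sum(X_train ** 2, axis=1)           # (num_train,)
    cross = X.dot(X_train.T)                          # (num_test, num_train)
    sq = test_sq + train_sq - 2 * cross               # broadcasts, no tile needed
    return np.sqrt(np.maximum(sq, 0))                 # clip tiny negative round-off
```

The (num_test, 1) and (num_train,) shapes broadcast together into (num_test, num_train), so memory use stays at the size of the output matrix instead of the enormous tiled intermediate.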