mantasu / cs231n
Shortest solutions for CS231n 2021-2024
Hi, I think it should be "np.random.rand" instead of "np.random.randn" on line 482, since "rand" returns numbers from the uniform distribution on [0, 1) whereas "randn" returns numbers from the standard normal distribution. For implementing dropout, "rand" seems more reasonable. (P.S. the sample code in the CS231n class notes also uses "np.random.rand"; please check the link, dropout section.)
Please correct me if I am wrong. Thank you!
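To illustrate the difference, here is a minimal sketch of inverted dropout (assuming a keep probability p; this is not the repo's exact code):

```python
import numpy as np

np.random.seed(0)
p = 0.5                      # assumed keep probability
x = np.random.randn(4, 4)    # stand-in activations

# rand: uniform on [0, 1), so P(keep) = p exactly; dividing by p keeps
# the expected activation unchanged at test time (inverted dropout).
mask = (np.random.rand(*x.shape) < p) / p
out = x * mask

# With randn, the comparison "< p" would keep ~69% of units for p = 0.5,
# since P(N(0, 1) < 0.5) ≈ 0.69 — not the intended dropout rate.
```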
Regarding point 4: it will change the performance, because each feature is scaled differently.
In https://github.com/mantasu/cs231n/blob/master/assignment2/Dropout.ipynb, the outputs under "Dropout: Forward Pass" seem wrong.
As a result, the plots under "Regularization Experiment" may also be wrong.
I checked your implementation of the dropout forward/backward layers and they seem fine. Perhaps it's worth rerunning the notebook?
P.S. thanks for sharing this repo! It's been helpful for cross-checking answers.
I have encountered issues when trying to run the solver with the default learning rate of 1e-2. Looking at other people's answers, I notice that basically everyone changes it to 1e-3 instead. That works, but I wonder why everyone does it so uniformly — is there any official announcement fixing it at that value?
In the "Affine layer: backward" section of assignment1/two_layer_net.ipynb, why are we multiplying by dout, which is just some random array?
np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)
dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)
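The multiplication isn't arbitrary in a meaningful sense: dout stands in for the upstream gradient dL/d(out) of some downstream scalar loss, and the numerical checker differentiates the scalar sum(f(x) * dout), which by the chain rule is exactly what the backward pass should reproduce. A self-contained sketch of this idea (my own affine and finite-difference helpers, not the course utilities):

```python
import numpy as np

def affine_forward(x, w, b):
    # Flatten all but the first dimension, then apply a linear map.
    return x.reshape(x.shape[0], -1) @ w + b

def affine_backward(dout, x, w):
    # Chain rule: dout is dL/d(out) for some downstream scalar loss L.
    dx = (dout @ w.T).reshape(x.shape)
    dw = x.reshape(x.shape[0], -1).T @ dout
    db = dout.sum(axis=0)
    return dx, dw, db

def num_grad(f, a, dout, h=1e-6):
    # Numerical gradient of the scalar sum(f(a) * dout) w.r.t. a.
    grad = np.zeros_like(a)
    it = np.nditer(a, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = a[idx]
        a[idx] = old + h; pos = (f(a) * dout).sum()
        a[idx] = old - h; neg = (f(a) * dout).sum()
        a[idx] = old
        grad[idx] = (pos - neg) / (2 * h)
        it.iternext()
    return grad

np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)   # stands in for an upstream gradient dL/dout

dx, dw, db = affine_backward(dout, x, w)
dx_num = num_grad(lambda x_: affine_forward(x_, w, b), x, dout)
assert np.allclose(dx, dx_num, atol=1e-5)
```

Any dout works here — the check verifies that backward computes the correct Jacobian-vector product for that particular upstream gradient, so a random one is a good stress test.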
https://github.com/naya0000/cs231n/blob/e1192dc8cbaf078c3cfb691e12b8d6d2ec40c8fa/assignment1/cs231n/classifiers/linear_svm.py#L110
Can someone explain why this subtraction is done? An explanation of the derivative calculation would be appreciated.
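I can't quote the linked line here, but in the usual multiclass hinge-loss gradient the subtraction comes from the minus sign on the correct-class score inside every margin term. A hedged sketch for a single sample (my own variable names, not the linked code verbatim):

```python
import numpy as np

# L_i = sum_{j != y} max(0, s_j - s_y + 1), with s = x @ W.
# For each violated margin: d(s_j - s_y)/dW[:, j] = x and d(s_j - s_y)/dW[:, y] = -x,
# so x is ADDED to the wrong class's column and SUBTRACTED from the correct one's.
np.random.seed(0)
D, C = 4, 3
W = np.random.randn(D, C)
x = np.random.randn(D)
y = 1

scores = x @ W
margins = scores - scores[y] + 1

dW = np.zeros_like(W)
for j in range(C):
    if j != y and margins[j] > 0:
        dW[:, j] += x   # wrong class: push its score down
        dW[:, y] -= x   # correct class: push its score up (the subtraction in question)

# Numerical spot check of one entry (the j = y term of the sum is a constant 1, so it drops out)
loss = lambda W_: np.maximum(0, (x @ W_) - (x @ W_)[y] + 1).sum() - 1
h = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0, y] += h
Wm[0, y] -= h
assert abs((loss(Wp) - loss(Wm)) / (2 * h) - dW[0, y]) < 1e-4
```

In vectorized form this is why the correct-class column accumulates minus the count of violated margins times x.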
I'm using a MacBook Pro M1.
I am working through Transformer_Captioning.ipynb in assignment3. After running the cell that tests MultiHeadAttention, I get incorrect results:
self_attn_output error: 0.449382070034207
masked_self_attn_output error: 1.0
attn_output error: 1.0
I even copied your MultiHeadAttention code, but I still get the same result:
self_attn_output error: 0.449382070034207
masked_self_attn_output error: 1.0
attn_output error: 1.0
I even downloaded your assignment3 code, and I still get the same output.
Is there anything else I missed?
In this function, the author uses np.insert() to restore the missing rows and columns in dout.
Like this: dout = np.insert(dout, range(1, H_o), [[0]]*(stride-1), axis=2) if stride > 1 else dout
But when the stride is 3 or more, this does not work: dout is then missing at least two rows/columns between each pair of elements, and np.insert() called this way inserts only one row before each listed index, not several.
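One possible workaround, assuming the goal is to re-insert (stride - 1) zero rows/columns between the elements of dout: np.insert accepts repeated indices, so repeating each insertion index (stride - 1) times with np.repeat inserts the required number of zeros in a single call. A 1-D sketch:

```python
import numpy as np

# Assumed goal: upsample by re-inserting (stride - 1) zeros between elements.
stride = 3
dout = np.arange(1, 4, dtype=float)   # stand-in for one row: [1., 2., 3.]

# np.insert accepts repeated indices: each repetition inserts one more zero
# before that position, so repeating (stride - 1) times handles any stride.
idx = np.repeat(np.arange(1, dout.shape[0]), stride - 1)   # [1, 1, 2, 2]
up = np.insert(dout, idx, 0.0)

print(up)   # [1. 0. 0. 2. 0. 0. 3.]
```

The same idx construction should work on the 4-D dout with axis=2 and axis=3, though I haven't run it against this repo's conv backward.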
Here is the partial-derivative Jacobian matrix for softmax:
This simplifies to:
softmax[y[i]] -= 1 # update for gradient
I didn't get this step. Can someone explain? For reference, see the blog.
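For what it's worth, the `softmax[y[i]] -= 1` step is the standard softmax/cross-entropy derivative dL/ds_j = p_j - 1[j = y]: differentiating L = -log p_y through the softmax makes all the Jacobian cross-terms collapse into this one subtraction. A small sketch with a numerical check (the helper names are mine):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

np.random.seed(0)
z = np.random.randn(5)       # stand-in scores for one sample
y = 2                        # correct class index

# Analytic gradient of L = -log(softmax(z)[y]) w.r.t. the scores z:
p = softmax(z)
grad = p.copy()
grad[y] -= 1                 # the step in question: p_j - 1[j == y]

# Central-difference numerical check
h = 1e-6
num = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += h
    zm[i] -= h
    num[i] = (-np.log(softmax(zp)[y]) + np.log(softmax(zm)[y])) / (2 * h)

assert np.allclose(grad, num, atol=1e-5)
```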