mantasu / cs231n
Shortest solutions for CS231n 2021-2024
Hi, I think it should be "np.random.rand" instead of "np.random.randn" on line 482, since "rand" returns numbers from the uniform distribution on [0, 1) whereas "randn" returns numbers from the standard normal distribution. For implementing dropout, "rand" seems more reasonable. (P.S. the sample code in the CS231n class notes also uses "np.random.rand"; please check the link, dropout section.)
Please correct me if I am wrong. Thank you!
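To illustrate the difference, here is a minimal sketch of inverted dropout (assuming a keep probability p; this is not the repo's exact code):

```python
import numpy as np

np.random.seed(0)
p = 0.5                      # assumed keep probability
x = np.random.randn(4, 4)    # stand-in activations

# rand: uniform on [0, 1), so P(keep) = p exactly; dividing by p keeps
# the expected activation unchanged at test time (inverted dropout).
mask = (np.random.rand(*x.shape) < p) / p
out = x * mask

# With randn, the comparison "< p" would keep ~69% of units for p = 0.5,
# since P(N(0, 1) < 0.5) ≈ 0.69 — not the intended dropout rate.
```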
Regarding point 4: it will change the performance, because each feature is scaled differently.
In https://github.com/mantasu/cs231n/blob/master/assignment2/Dropout.ipynb, the outputs under "Dropout: Forward Pass" seem wrong.
As a result, the plots under "Regularization Experiment" may also be wrong.
I checked your implementation of the dropout forward/backward layers and they seem fine. Perhaps it's worth rerunning the notebook?
P.S. thanks for sharing this repo! It's been helpful for cross-checking answers.
I have encountered issues when trying to run the solver with the default learning rate of 1e-2. Looking at other people's answers, I notice that basically everyone changes it to 1e-3 instead. That works, but I wonder why everyone does it so uniformly — is there any official announcement fixing it at that value?
In the "Affine layer: backward" section of assignment1/two_layer_net.ipynb, why are we multiplying by dout, which is just some random array?
np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)
dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)
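The multiplication isn't arbitrary in a meaningful sense: dout stands in for the upstream gradient dL/d(out) of some downstream scalar loss, and the numerical checker differentiates the scalar sum(f(x) * dout), which by the chain rule is exactly what the backward pass should reproduce. A self-contained sketch of this idea (my own affine and finite-difference helpers, not the course utilities):

```python
import numpy as np

def affine_forward(x, w, b):
    # Flatten all but the first dimension, then apply a linear map.
    return x.reshape(x.shape[0], -1) @ w + b

def affine_backward(dout, x, w):
    # Chain rule: dout is dL/d(out) for some downstream scalar loss L.
    dx = (dout @ w.T).reshape(x.shape)
    dw = x.reshape(x.shape[0], -1).T @ dout
    db = dout.sum(axis=0)
    return dx, dw, db

def num_grad(f, a, dout, h=1e-6):
    # Numerical gradient of the scalar sum(f(a) * dout) w.r.t. a.
    grad = np.zeros_like(a)
    it = np.nditer(a, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = a[idx]
        a[idx] = old + h; pos = (f(a) * dout).sum()
        a[idx] = old - h; neg = (f(a) * dout).sum()
        a[idx] = old
        grad[idx] = (pos - neg) / (2 * h)
        it.iternext()
    return grad

np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)   # stands in for an upstream gradient dL/dout

dx, dw, db = affine_backward(dout, x, w)
dx_num = num_grad(lambda x_: affine_forward(x_, w, b), x, dout)
assert np.allclose(dx, dx_num, atol=1e-5)
```

Any dout works here — the check verifies that backward computes the correct Jacobian-vector product for that particular upstream gradient, so a random one is a good stress test.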
https://github.com/naya0000/cs231n/blob/e1192dc8cbaf078c3cfb691e12b8d6d2ec40c8fa/assignment1/cs231n/classifiers/linear_svm.py#L110
Can someone explain why this subtraction is done? An explanation of the derivative calculation would be appreciated.
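I can't quote the linked line here, but in the usual multiclass hinge-loss gradient the subtraction comes from the minus sign on the correct-class score inside every margin term. A hedged sketch for a single sample (my own variable names, not the linked code verbatim):

```python
import numpy as np

# L_i = sum_{j != y} max(0, s_j - s_y + 1), with s = x @ W.
# For each violated margin: d(s_j - s_y)/dW[:, j] = x and d(s_j - s_y)/dW[:, y] = -x,
# so x is ADDED to the wrong class's column and SUBTRACTED from the correct one's.
np.random.seed(0)
D, C = 4, 3
W = np.random.randn(D, C)
x = np.random.randn(D)
y = 1

scores = x @ W
margins = scores - scores[y] + 1

dW = np.zeros_like(W)
for j in range(C):
    if j != y and margins[j] > 0:
        dW[:, j] += x   # wrong class: push its score down
        dW[:, y] -= x   # correct class: push its score up (the subtraction in question)

# Numerical spot check of one entry (the j = y term of the sum is a constant 1, so it drops out)
loss = lambda W_: np.maximum(0, (x @ W_) - (x @ W_)[y] + 1).sum() - 1
h = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0, y] += h
Wm[0, y] -= h
assert abs((loss(Wp) - loss(Wm)) / (2 * h) - dW[0, y]) < 1e-4
```

In vectorized form this is why the correct-class column accumulates minus the count of violated margins times x.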
I'm using a MacBook Pro M1.
I am working through Transformer_Captioning.ipynb in assignment3. After running the cell that tests MultiHeadAttention, I get incorrect results:
self_attn_output error: 0.449382070034207
masked_self_attn_output error: 1.0
attn_output error: 1.0
I even copied your MultiHeadAttention code, but I still get the same result:
self_attn_output error: 0.449382070034207
masked_self_attn_output error: 1.0
attn_output error: 1.0
I even downloaded your assignment3 code, and I still get the same output.
Is there anything else I missed?
In this function, the author uses np.insert() to restore the missing rows and columns in dout.
Like this: dout = np.insert(dout, range(1, H_o), [[0]]*(stride-1), axis=2) if stride > 1 else dout
But when the stride is 3 or more, this does not work: dout is then missing at least two rows/columns between each pair of elements, and np.insert() called this way inserts only one row before each listed index, not several.
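One possible workaround, assuming the goal is to re-insert (stride - 1) zero rows/columns between the elements of dout: np.insert accepts repeated indices, so repeating each insertion index (stride - 1) times with np.repeat inserts the required number of zeros in a single call. A 1-D sketch:

```python
import numpy as np

# Assumed goal: upsample by re-inserting (stride - 1) zeros between elements.
stride = 3
dout = np.arange(1, 4, dtype=float)   # stand-in for one row: [1., 2., 3.]

# np.insert accepts repeated indices: each repetition inserts one more zero
# before that position, so repeating (stride - 1) times handles any stride.
idx = np.repeat(np.arange(1, dout.shape[0]), stride - 1)   # [1, 1, 2, 2]
up = np.insert(dout, idx, 0.0)

print(up)   # [1. 0. 0. 2. 0. 0. 3.]
```

The same idx construction should work on the 4-D dout with axis=2 and axis=3, though I haven't run it against this repo's conv backward.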
Here is the partial-derivative Jacobian matrix for softmax:
This simplifies to:
softmax[y[i]] -= 1 # update for gradient
I didn't get this step. Can someone explain? For reference, see the blog.
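For what it's worth, the `softmax[y[i]] -= 1` step is the standard softmax/cross-entropy derivative dL/ds_j = p_j - 1[j = y]: differentiating L = -log p_y through the softmax makes all the Jacobian cross-terms collapse into this one subtraction. A small sketch with a numerical check (the helper names are mine):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

np.random.seed(0)
z = np.random.randn(5)       # stand-in scores for one sample
y = 2                        # correct class index

# Analytic gradient of L = -log(softmax(z)[y]) w.r.t. the scores z:
p = softmax(z)
grad = p.copy()
grad[y] -= 1                 # the step in question: p_j - 1[j == y]

# Central-difference numerical check
h = 1e-6
num = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += h
    zm[i] -= h
    num[i] = (-np.log(softmax(zp)[y]) + np.log(softmax(zm)[y])) / (2 * h)

assert np.allclose(grad, num, atol=1e-5)
```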