STAT 453: Intro to Deep Learning @ UW-Madison (Spring 2021)
License: MIT License
Hello @rasbt,
First of all, thanks for making all this material available online, along with your video lectures. A really helpful resource!
A small issue and fix: the classic softmax regression implementation in L08/code/softmax-regression_scratch.ipynb has a small error in the bias computation (I think). The training output (cell 8) shows the same value for all bias terms:
```
Epoch: 049 | Train ACC: 0.858 | Cost: 0.484
Epoch: 050 | Train ACC: 0.858 | Cost: 0.481

Model parameters:
  Weights: tensor([[ 0.5582, -1.0240],
                   [-0.5462,  0.0258],
                   [-0.0119,  0.9982]])
  Bias: tensor([-1.2020e-08, -1.2020e-08, -1.2020e-08])
```
whereas the second implementation, which uses the `nn.Module` API, gives distinct bias terms.
The problem lies in the `torch.sum` call in `SoftmaxRegression1.backward`: it computes a single sum over all residuals, which is later broadcast across all bias terms. You can fix this by changing
```python
def backward(self, x, y, probas):
    grad_loss_wrt_w = -torch.mm(x.t(), y - probas).t()
    grad_loss_wrt_b = -torch.sum(y - probas)
    return grad_loss_wrt_w, grad_loss_wrt_b
```
to
```python
def backward(self, x, y, probas):
    grad_loss_wrt_w = -torch.mm(x.t(), y - probas).t()
    grad_loss_wrt_b = -torch.sum(y - probas, dim=0)
    return grad_loss_wrt_w, grad_loss_wrt_b
```
With this change, it learns the toy problem slightly better.
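For what it's worth, here is a small NumPy sketch (toy numbers, not from the notebook) of why the unfixed version stalls: each row of `y - probas` sums to zero, because the one-hot `y` and the softmax `probas` each sum to 1 per sample, so summing over everything is numerically ~0. That matches the biases sitting at about -1.2e-08 in the output above.

```python
import numpy as np

# Toy residuals (y - probas) for 2 samples and 3 classes. Each row sums to 0,
# since both the one-hot labels and the softmax outputs sum to 1 per sample.
resid = np.array([[ 0.2, -0.1, -0.1],
                  [-0.4,  0.3,  0.1]])

grad_b_scalar = -np.sum(resid)             # one number, broadcast to all biases
grad_b_perclass = -np.sum(resid, axis=0)   # one gradient per class bias

print(grad_b_scalar)    # ~0.0, so the biases barely move
print(grad_b_perclass)  # [ 0.2 -0.2 -0.0], a distinct gradient per class
```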
Why is `train_dp_list` passed as the first argument to every DataLoader here, including the validation and test loaders?
```python
train_loader = DataLoader(train_dp_list,
                          batch_sampler=BatchSamplerSimilarLength(dataset=train_dp_list,
                                                                  batch_size=BATCH_SIZE),
                          collate_fn=collate_batch)
valid_loader = DataLoader(train_dp_list,
                          batch_sampler=BatchSamplerSimilarLength(dataset=valid_dp_list,
                                                                  batch_size=BATCH_SIZE,
                                                                  shuffle=False),
                          collate_fn=collate_batch)
test_loader = DataLoader(train_dp_list,
                         batch_sampler=BatchSamplerSimilarLength(dataset=test_dp_list,
                                                                 batch_size=BATCH_SIZE,
                                                                 shuffle=False),
                         collate_fn=collate_batch)
```
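In case it helps to see why the first DataLoader argument matters, here is a minimal plain-Python sketch of the hazard the question points at: a batch sampler built over one dataset yielding indices that are then used to index a different dataset. (`train_data`, `valid_data`, and `index_batches` are made-up stand-ins for the datapipes and `BatchSamplerSimilarLength`.)

```python
train_data = ["t0", "t1", "t2", "t3", "t4", "t5"]
valid_data = ["v0", "v1", "v2"]

def index_batches(sampler_dataset, batch_size):
    # Yields batches of indices over sampler_dataset, as a batch_sampler would.
    idxs = list(range(len(sampler_dataset)))
    for i in range(0, len(idxs), batch_size):
        yield idxs[i:i + batch_size]

# The sampler is built over valid_data, but the loader indexes train_data:
batches = [[train_data[i] for i in b] for b in index_batches(valid_data, 2)]
print(batches)  # → [['t0', 't1'], ['t2']], i.e. training samples, not validation ones
```

So if the loaders are meant to iterate over the validation and test sets, presumably `valid_dp_list` and `test_dp_list` should also be the first argument of the corresponding DataLoaders.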
More of an FYI: I tried to reproduce the L17 4_VAE_celeba-inspect notebook. When loading the dataset, I got the error "Unable to load CelebA dataset" with "BadZipFile: File is not a zip file". TorchVision issue #2262 identified the problem as exceeding the daily download quota on Google Drive, punted the issue back to the dataset authors, and was closed. A future version of TorchVision should give a more descriptive error message.
So, FYI to your students. The work-around is to...
Hi!
I'd like to ask: why should we use avgpool instead of maxpool?
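To make the question concrete, here is a tiny toy sketch (made-up numbers, plain Python) of what each pooling operation keeps from a window of activations:

```python
# Hypothetical activations in one pooling window (not from the lecture code).
window = [0.1, 0.9, 0.2, 0.0]

avg = sum(window) / len(window)  # every activation contributes to the summary
mx = max(window)                 # only the single strongest activation survives

print(avg)  # 0.3
print(mx)   # 0.9
```

Roughly, average pooling summarizes the whole window (e.g. global average pooling before a classifier head), while max pooling keeps only the strongest response; which one is preferable depends on the architecture being discussed in the lecture.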